The goal of the R package geoparser is to detect country names in a text document (e.g. a PDF file imported with the R package pdftools).

Installation

You can install the development version from GitHub with:

# install.packages("remotes")
remotes::install_github("frbcesab/geoparser")

Then you can attach the package geoparser:

library("geoparser")

Overview

The package geoparser contains the function geoparser(), which detects in a text the country names listed in the internal dataset world_countries.

The function geoparser() returns a data.frame with the following columns:

  • geographic_entity: the name of the country according to GADM;
  • n_pages: the total number of pages in the document;
  • page: the number of the page on which the country was detected;
  • count: the number of occurrences of the country on that page.
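To make the column definitions concrete, here is a minimal base R sketch that builds a table with the same four columns from a character vector holding one string per page (the shape returned by pdftools::pdf_text()). It is not the package's implementation: the pages vector, the short countries list (a stand-in for world_countries), and the helper count_matches() are all hypothetical, and the naive fixed-string matching shown here would, for instance, count "Niger" inside "Nigeria".

```r
# Hypothetical stand-in for the output of pdftools::pdf_text():
# a character vector with one element per page
pages <- c(
  "Canada and the United States share a border. Canada is large.",
  "No country is mentioned on this page.",
  "Denmark, Canada, Canada."
)

# Hypothetical short list used in place of the internal world_countries dataset
countries <- c("Canada", "Denmark", "United States")

# Hypothetical helper: count fixed-string matches of one name in one page
count_matches <- function(name, text) {
  hits <- gregexpr(name, text, fixed = TRUE)[[1]]
  if (hits[1] == -1) 0L else length(hits)
}

# Evaluate every (page, country) pair
result <- expand.grid(
  page              = seq_along(pages),
  geographic_entity = countries,
  stringsAsFactors  = FALSE
)
result$count <- mapply(
  function(p, n) count_matches(n, pages[p]),
  result$page, result$geographic_entity
)
result$n_pages <- length(pages)

# Keep only detected countries and use the documented column order
result <- result[result$count > 0,
                 c("geographic_entity", "n_pages", "page", "count")]
result <- result[order(result$page, result$count), ]
rownames(result) <- NULL

print(result)
```

Each row corresponds to one country on one page, so a country mentioned on several pages appears several times, while n_pages repeats the document length on every row.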

Example:

geographic_entity  n_pages  page  count
Canada             11       1     2
Canada             11       2     5
United States      11       3     1
Canada             11       3     5
United States      11       4     1
United States      11       5     1
Canada             11       9     1
Denmark            11       10    1
United Kingdom     11       10    2
United States      11       10    2
Canada             11       10    5
Australia          11       11    1
Bangladesh         11       11    1
Estonia            11       11    1
Canada             11       11    2
United Kingdom     11       11    2
United States      11       11    5
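Because the output has one row per country and page, a per-document summary takes one extra step. The sketch below copies the example rows above into a data.frame and sums the counts per country with base R. The object names out and totals are illustrative only and are not part of the package.

```r
# Rows copied from the example output above
out <- read.csv(text = "geographic_entity,n_pages,page,count
Canada,11,1,2
Canada,11,2,5
United States,11,3,1
Canada,11,3,5
United States,11,4,1
United States,11,5,1
Canada,11,9,1
Denmark,11,10,1
United Kingdom,11,10,2
United States,11,10,2
Canada,11,10,5
Australia,11,11,1
Bangladesh,11,11,1
Estonia,11,11,1
Canada,11,11,2
United Kingdom,11,11,2
United States,11,11,5", stringsAsFactors = FALSE)

# Total number of mentions per country across the whole document
totals <- aggregate(count ~ geographic_entity, data = out, FUN = sum)

# Most frequently mentioned countries first
totals <- totals[order(-totals$count), ]
rownames(totals) <- NULL

print(totals)
```

With these rows, Canada comes first with 20 mentions, followed by the United States with 10.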

Citation

Please cite this package as:

Casajus Nicolas (2023) geoparser: An R package to detect country names in documents. R package version 0.1. https://frbcesab.github.io/geoparser.

Code of Conduct

Please note that the geoparser project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.