The goal of the R package geoparser is to detect country names in a text document (e.g. a PDF file imported with the R package pdftools).
You can install the development version from GitHub with:
```r
# install.packages("remotes")
remotes::install_github("frbcesab/geoparser")
```

Then you can attach the package geoparser with `library("geoparser")`.
The package geoparser contains the function geoparser() used to detect countries listed in the internal dataset world_countries.
The function geoparser() returns a data.frame with the following columns:
- geographic_entity: the name of the country according to GADM;
- n_pages: the total number of pages in the document;
- page: the page number;
- count: the number of occurrences of the country on the given page.

Example:

| geographic_entity | n_pages | page | count |
|---|---|---|---|
| Canada | 11 | 1 | 2 |
| Canada | 11 | 2 | 5 |
| United States | 11 | 3 | 1 |
| Canada | 11 | 3 | 5 |
| United States | 11 | 4 | 1 |
| United States | 11 | 5 | 1 |
| Canada | 11 | 9 | 1 |
| Denmark | 11 | 10 | 1 |
| United Kingdom | 11 | 10 | 2 |
| United States | 11 | 10 | 2 |
| Canada | 11 | 10 | 5 |
| Australia | 11 | 11 | 1 |
| Bangladesh | 11 | 11 | 1 |
| Estonia | 11 | 11 | 1 |
| Canada | 11 | 11 | 2 |
| United Kingdom | 11 | 11 | 2 |
| United States | 11 | 11 | 5 |
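
Because the output has one row per country and page, a per-country summary is often the next step. The following is a minimal sketch using base R only. It assumes a data.frame named `res` with the columns described above; `res` is built by hand here from the first four rows of the example table, not produced by a call to geoparser(), and the names `totals`, `pages`, `n_pages_mentioned` and `summary_tbl` are illustrative.

```r
# Hand-built stand-in for a geoparser() result (first four rows of the
# example table); in practice `res` would be the returned data.frame.
res <- data.frame(
  geographic_entity = c("Canada", "Canada", "United States", "Canada"),
  n_pages           = c(11, 11, 11, 11),
  page              = c(1, 2, 3, 3),
  count             = c(2, 5, 1, 5)
)

# Total number of mentions per country across all pages
totals <- aggregate(count ~ geographic_entity, data = res, FUN = sum)

# Number of distinct pages on which each country appears
pages <- aggregate(page ~ geographic_entity, data = res,
                   FUN = function(x) length(unique(x)))
names(pages)[2] <- "n_pages_mentioned"

# Combine both summaries and sort by decreasing number of mentions
summary_tbl <- merge(totals, pages, by = "geographic_entity")
summary_tbl <- summary_tbl[order(-summary_tbl$count), ]
```

Dividing `n_pages_mentioned` by `n_pages` would give the share of the document in which a country is mentioned, which can be easier to compare across documents of different lengths.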
Please cite this package as:
Casajus Nicolas (2023) geoparser: An R package to detect country names in documents. R package version 0.1. https://frbcesab.github.io/geoparser.
Please note that the geoparser project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.