The goal of the R package geoparser is to detect country names in a text document (e.g. a PDF file imported with the R package pdftools).
You can install the development version from GitHub with:
# install.packages("remotes")
remotes::install_github("frbcesab/geoparser")
Then you can attach the package geoparser:
library("geoparser")
The package geoparser contains the function geoparser(), used to detect countries listed in the internal dataset world_countries.
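To make the idea of page-wise country detection concrete, here is a minimal base R sketch. It is not the implementation used by geoparser: the helper `count_countries()`, the toy `pages` vector, and the short `countries` list are all hypothetical and stand in for an imported document and for the internal dataset. The only assumption about the input is that it is a character vector with one element per page, which is the shape returned by `pdftools::pdf_text()`.

```r
# Hypothetical helper (not part of geoparser): count literal occurrences
# of each country name on each page of a document.
count_countries <- function(pages, countries) {
  rows <- list()
  for (page in seq_along(pages)) {
    for (country in countries) {
      # fixed = TRUE matches the name literally, not as a regular expression
      hits <- gregexpr(country, pages[page], fixed = TRUE)[[1]]
      n <- sum(hits > 0)  # gregexpr() returns -1 when there is no match
      if (n > 0) {
        rows[[length(rows) + 1]] <- data.frame(
          geographic_entity = country,
          n_pages           = length(pages),
          page              = page,
          count             = n,
          stringsAsFactors  = FALSE
        )
      }
    }
  }
  # Returns NULL when no country is found on any page
  do.call(rbind, rows)
}

# Toy input standing in for an imported two-page document
pages <- c(
  "Canada and the United States share a border. Canada is large.",
  "Denmark is in Europe."
)
countries <- c("Canada", "Denmark", "United States")

result <- count_countries(pages, countries)
print(result)
```

Literal matching is deliberately naive: a short name that is a substring of a longer one (for example "Niger" inside "Nigeria") would be counted twice, so a real detector would need word boundaries or a more careful matching strategy.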
The function geoparser() returns a data.frame with the following columns:
- geographic_entity: the name of the country according to GADM;
- n_pages: the total number of pages in the document;
- page: the page number;
- count: the number of occurrences of the country on the given page.

Example:

geographic_entity | n_pages | page | count |
---|---|---|---|
Canada | 11 | 1 | 2 |
Canada | 11 | 2 | 5 |
United States | 11 | 3 | 1 |
Canada | 11 | 3 | 5 |
United States | 11 | 4 | 1 |
United States | 11 | 5 | 1 |
Canada | 11 | 9 | 1 |
Denmark | 11 | 10 | 1 |
United Kingdom | 11 | 10 | 2 |
United States | 11 | 10 | 2 |
Canada | 11 | 10 | 5 |
Australia | 11 | 11 | 1 |
Bangladesh | 11 | 11 | 1 |
Estonia | 11 | 11 | 1 |
Canada | 11 | 11 | 2 |
United Kingdom | 11 | 11 | 2 |
United States | 11 | 11 | 5 |
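
Because the result has one row per country and page, a per-document total takes one aggregation step. The sketch below uses base R only and rebuilds the first four rows of the example table by hand; the object names `res` and `totals` are illustrative and not part of the package.

```r
# First four rows of the example table, rebuilt by hand
res <- data.frame(
  geographic_entity = c("Canada", "Canada", "United States", "Canada"),
  n_pages           = 11,
  page              = c(1, 2, 3, 3),
  count             = c(2, 5, 1, 5),
  stringsAsFactors  = FALSE
)

# Sum the per-page counts for each country
totals <- aggregate(count ~ geographic_entity, data = res, FUN = sum)

# Most frequently mentioned countries first
totals <- totals[order(-totals$count), ]
print(totals)
```

The same pattern works on the full data.frame returned for a document, for instance to rank countries by how often they are mentioned.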
Please cite this package as:
Casajus Nicolas (2023) geoparser: An R package to detect country names in documents. R package version 0.1. https://frbcesab.github.io/geoparser.
Please note that the geoparser project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.