class: right, middle, title-slide

.title[
# Corpus management
]
.subtitle[
## Good practices, tips & tricks
]
.author[
### .author[Nicolas Casajus] .email[Data scientist @ FRB-CESAB
nicolas.casajus@fondationbiodiversite.fr
]
]
.date[
### .date[December 2023]
]

---

## Table of contents

### References acquisition

<br>

### References management

<br>

### References access

<br>

### Remove duplicates

---

## Table of contents

### References acquisition

<br>

### .greyish[References management]

<br>

### .greyish[References access]

<br>

### .greyish[Remove duplicates]

---

## References acquisition

### Search equation

`TS=("Salmo salar" AND Conservation) AND PY=(2010-2021)`

---

## References acquisition

### Web interfaces

--

- Web of Science

<img src="img/wos-0.png" width="95%" style="display: block; margin: auto;" />

---

## References acquisition

### Web interfaces

- Web of Science

<img src="img/wos-1.png" width="95%" style="display: block; margin: auto;" />

---

## References acquisition

### Web interfaces

- Web of Science

<img src="img/wos-2.png" width="95%" style="display: block; margin: auto;" />

---

## References acquisition

### Web interfaces

- Web of Science

<img src="img/wos-3.png" width="95%" style="display: block; margin: auto;" />

---

## References acquisition

### File formats

- Plain text (`.txt`)

```latex
FN Clarivate Analytics Web of Science
VR 1.0
PT J
AU Buisson, L
   Thuiller, W
   Casajus, N
   Lek, S
   Grenouillet, G
AF Buisson, Laetitia
   Thuiller, Wilfried
   Casajus, Nicolas
   Lek, Sovan
   Grenouillet, Gael
TI Uncertainty in ensemble forecasting of species distribution
SO GLOBAL CHANGE BIOLOGY
PY 2010
VL 16
IS 4
BP 1145
EP 1157
DI 10.1111/j.1365-2486.2009.02000.x
UT WOS:000274813800001
ER

EF
```

---

## References acquisition

### File formats

- RIS format (`.ris`)

```latex
TY  - JOUR
AU  - Buisson, L
AU  - Thuiller, W
AU  - Casajus, N
AU  - Lek, S
AU  - Grenouillet, G
TI  - Uncertainty in ensemble forecasting of species distribution
T2  - GLOBAL CHANGE BIOLOGY
PY  - 2010
VL  - 16
IS  - 4
SP  - 1145
EP  - 1157
DO  - 10.1111/j.1365-2486.2009.02000.x
AN  - WOS:000274813800001
ER  -
```

---

## References acquisition

### File formats

- BibTeX format (`.bib`)

```latex
@article{WOS:000274813800001,
  author  = {Buisson, Laetitia and Thuiller, Wilfried and Casajus, Nicolas and
             Lek, Sovan and Grenouillet, Gael},
  title   = {Uncertainty in ensemble forecasting of species distribution},
  journal = {GLOBAL CHANGE BIOLOGY},
  year    = {2010},
  volume  = {16},
  number  = {4},
  pages   = {1145-1157},
  doi     = {10.1111/j.1365-2486.2009.02000.x}
}
```

---

## References acquisition

### File formats

- BibTeX Article

```latex
@article{mouillot2021,
  author  = {Mouillot, David and Loiseau, Nicolas and Greni{\'{e}}, Matthias and
             Algar, Adam C. and Allegra, Michele and Cadotte, Marc W. and
             Casajus, Nicolas and Denelle, Pierre and Gu{\'{e}}guen, Maya and
             Maire, Anthony and Maitner, Brian and McGill, Brian J. and
             McLean, Matthew and Mouquet, Nicolas and Munoz, Fran{\c{c}}ois and
             Thuiller, Wilfried and Vill{\'{e}}ger, S{\'{e}}bastien and
             Violle, Cyrille and Auber, Arnaud},
  year    = {2021},
  title   = {The dimensionality and structure of species trait spaces},
  journal = {Ecology Letters},
  volume  = {24},
  number  = {9},
  pages   = {1988--2009},
  doi     = {10.1111/ele.13778}
}
```

---

## References acquisition

### File formats

- BibTeX Book

```latex
@book{berteaux2014,
  author    = {Berteaux, Dominique and Casajus, Nicolas and de Blois, Sylvie},
  year      = {2014},
  title     = {Changements climatiques et biodiversit{\'{e}} du {Q}u{\'{e}}bec:
               vers un nouveau patrimoine naturel},
  publisher = {Presses de l'Universit{\'{e}} du Qu{\'{e}}bec, Qu{\'{e}}bec, Canada},
  pages     = {202}
}
```

---

## References acquisition

### File formats

- BibTeX Book Chapter

```latex
@incollection{buisson2010,
  author    = {Buisson, La{\"{e}}titia and Grenouillet, Ga{\"{e}}l and
               Casajus, Nicolas and Lek, Sovan},
  year      = {2010},
  title     = {Predicting the potential impacts of climate change on stream
               fish assemblages},
  booktitle = {Community {E}cology of {N}orth {A}merican {S}tream {F}ishes:
               {C}oncepts, {A}pproaches, and {T}echniques},
  publisher = {American Fisheries Society Symposium 73},
  pages     = {327--346}
}
```

<br>

--

- Resources: [https://www.bibtex.com](https://www.bibtex.com)

---

## References acquisition

### API (Application Programming Interface)

---

## References acquisition

### Web navigation

<img src="img/api-1.png" width="95%" style="display: block; margin: auto;" />

- Click to download file(s)

---

## References acquisition

### API (Application Programming Interface)

<img src="img/api-2.png" width="95%" style="display: block; margin: auto;" />

--

- Access to the raw data (JSON, XML, etc.)
- Access using a client (implemented in R, Python, etc.)
- Available for Web of Science and Scopus (subscription required)
- No API for Google Scholar (web scraping is the only option)
- Some limitations:
  - Number of records per request
  - Number of requests per month
  - Incomplete data (e.g. missing abstracts)

---

## References acquisition

### API

- R package [`rwoslite`](https://github.com/frbcesab/rwoslite) for the [WOS Lite API](https://developer.clarivate.com/apis/woslite)

<img src="img/rwoslite.png" width="75%" style="display: block; margin: auto;" />

> Available at: [https://github.com/frbcesab/rwoslite/](https://github.com/frbcesab/rwoslite/)

--

- Installation

```r
install.packages("remotes")
remotes::install_github("frbcesab/rwoslite")
```

> But... you need an API key!

---

## References acquisition

### API

- Usage of `rwoslite`

```r
## Write the query ----
query <- 'AU=("Casajus N")'
```

--

```r
## Get the total number of records ----
rwoslite::wos_search(query, database = "WOS")
## [1] 20
```

--

```r
## Download records metadata ----
refs <- rwoslite::wos_get_records(query, database = "WOS")

dim(refs)
## [1] 20 21
```

---

## References acquisition

### API

- Usage of `rwoslite`

```r
## Preview of records ----
View(refs)
```

<img src="img/rwoslite-output.png" width="95%" style="display: block; margin: auto;" />

--

A lot of metadata, but no abstract...
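
Outside RStudio, the same check can be done in the console. The sketch below uses a toy data frame as a stand-in for `refs`: its column names and values are hypothetical and do not reproduce the 21 columns returned by `rwoslite::wos_get_records()`.

```r
## Toy stand-in for `refs` (hypothetical columns and values) ----
refs <- data.frame(
  title = c("Paper A", "Paper B"),
  year  = c(2010, 2021),
  doi   = c("10.1000/a", NA),
  stringsAsFactors = FALSE
)

## Column names and types ----
str(refs)

## Number of missing values per column ----
na_per_column <- colSums(is.na(refs))
na_per_column

## Is there an abstract column? ----
has_abstract <- "abstract" %in% colnames(refs)
has_abstract
```
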

---

## References acquisition

### API

- Usage of `rwoslite`

```r
## Write complex queries ----
query <- 'TS=("Salmo salar" AND Conservation) AND PY=(2010-2021)'
query <- "TS=salmo+salar AND TS=conservation AND PY=(2010-2021)"
```

--

<br>

- Export as `.csv` (BibTeX not supported yet)

```r
## Export records ----
write.csv(refs, "path/to/filename.csv", row.names = FALSE)
```

---

## Table of contents

### .greyish[References acquisition]

<br>

### References management

<br>

### .greyish[References access]

<br>

### .greyish[Remove duplicates]

---

## References management

### What is reference management software?

--

It enables you to:

- store, organize, and annotate references
- retrieve metadata or full text (connection to databases)
- easily add new references (plugins for Web browsers)
- insert in-text citations (plugins for Word, Writer, RStudio)
- generate a bibliography
- share libraries and collaborate (online account)

--

<br />

Different products:

- [EndNote](https://endnote.com/) (Clarivate Analytics) - Not free
- Mendeley Desktop (Elsevier) - No longer supported
- [Mendeley Reference Manager](https://www.mendeley.com/reference-management/reference-manager) (Elsevier) - The new Mendeley
- [Zotero](https://www.zotero.org/) (open source) - Pick me!

---

## References management

### Quick comparison

<br>

| | [`EndNote`](https://endnote.com/) | [`Mendeley`](https://www.mendeley.com/reference-management/reference-manager) | [`Zotero`](https://www.zotero.org/) |
|:----------------:|:----------------------:|:----------------:|:------------------:|
| `OS` | | | |
| `License` | Proprietary<br>(Clarivate) | Proprietary<br>(Elsevier) | Open source<br>(AGPL) |
| `Pricing` | \> 200 $ | Free | Free |
| `Online storage` | 2 GB<br>(+ pricing options) | 2 GB<br>(+ pricing options) | 300 MB<br>(+ pricing options) |
| `Citation styles` | \> 7000 | \> 7000 | \> 10,000 |
| `Google Docs` | | | |
| `LaTeX` | | | |
| `Customization` | | | |

---

## References management

### Introducing...

.pull-left-2[
<img src="img/zotero-logo.png" width="100%" style="display: block; margin: auto;" />
]

.pull-right-2[
<img src="img/zotero-ui.png" width="100%" style="display: block; margin: auto;" />
]

- [`Open source`](https://github.com/zotero/zotero) and free
- [`Well documented`](https://www.zotero.org/support/) and active [`community`](https://forums.zotero.org/discussions)
- [`Web browser`](https://www.zotero.org/download/connectors) and [`Word processor`](https://www.zotero.org/support/word_processor_integration) connectors
- A lot of [`plugins`](https://www.zotero.org/support/plugins) and [`styles`](https://www.zotero.org/styles)
- Support for LaTeX, BibTeX, and [`RStudio`](https://rstudio.github.io/visual-markdown-editing/citations.html)

---

## References management

### Zotero plugins

--

- `Zotero connector`<br>
.small[<https://www.zotero.org/download/connectors>]
  - Available for the most popular Web browsers
  - Add items to your Zotero library in one click
  - Add PDF (if open access)

--

- `Word processor plugin`<br>
.small[<https://www.zotero.org/support/word_processor_integration>]
  - Available for `Microsoft Word`, `LibreOffice Writer` and `Google Docs`
  - Insert in-text citations
  - Generate a bibliography according to a selected style

--

- Citation picker for text editors
  - [`zotero-citations`](https://atom.io/packages/zotero-citations) for Atom
  - [`Citation Picker for Zotero`](https://marketplace.visualstudio.com/items?itemName=mblode.zotero) for Visual Studio Code (and VS Codium)

---

## References management

### Zotero plugins

<https://www.zotero.org/support/plugins>

<br>

--

- `Better BibTeX`<br>
.small[<https://retorque.re/zotero-better-bibtex/>]
  - Improve the compatibility with LaTeX, Markdown, and R
  - Auto-generate citation keys
  - Auto-export BibTeX files (one file per collection)

<br>

--

- `ZotFile`<br>
.small[<http://zotfile.com/>]
  - Rename attachments (with a lot of naming rules)
  - Move attachments (to a specific location)

---

## References management

### Zotero and RStudio

<img src="img/zotero-rstudio.png" width="60%" style="display: block; margin: auto;" />

<br>

- Requires a recent version of [`RStudio`](https://www.rstudio.com/products/rstudio/)
- Edit R Markdown files using the visual markdown editing mode
- Easily insert in-text citations
- Auto-generate the BibTeX file

.small[<https://rstudio.github.io/visual-markdown-editing/citations.html>]

---

## Table of contents

### .greyish[References acquisition]

<br>

### .greyish[References management]

<br>

### References access

<br>

### .greyish[Remove duplicates]

---
class: inverse, middle, center

## On the side

---

## Bibliography with R Markdown

--

**What do you need?**

- The R packages [`rmarkdown`](https://rmarkdown.rstudio.com/) and [`tinytex`](https://yihui.org/tinytex/) (for PDF output)
- The software [`Pandoc`](https://pandoc.org/installing.html)

--

<br>

and the following file structure:

```txt
.
├── index.Rmd               # Main Rmd document
├── references/
│   └── references.bib      # A list of references (BibTeX) [exported from Zotero]
└── styles/
    └── style.csl           # A bibliography style (CSL) [https://www.zotero.org/styles]
```

---

## Bibliography with R Markdown

.pull-left[
```yaml
---
title: 'Title of my article'
author: 'N. Casajus'
date: 'October 2022'
output: pdf_document
documentclass: article
classoption: a4paper
bibliography: 'references/references.bib'
csl: 'styles/style.csl'
---

Lorem ipsum dolor sit amet, consectetur adipis
elit, sed do eiusmod tempor incididunt ut labo
dolore magna aliqua. Ut enim ad minim veniam,
nostrud exercitation ullamco laboris nisi ut
consequat [@Ballesteros2020; @Basset2017].

@Decaens2021 aute irure dolor in reprehenderit

# Bibliography
```
]

--

.pull-right[
<img src="img/cited-biblio-pdf.png" width="80%" style="display: block; margin: auto;" />
]

---

## Bibliography with R Markdown

.pull-left[
```yaml
---
title: 'Appendix A - Bibliography'
output: pdf_document
documentclass: article
classoption: a4paper
bibliography: 'references/references.bib'
csl: 'styles/style.csl'
nocite: '@*'
---
```
]

--

.pull-right[
<img src="img/biblio-pdf.png" width="80%" style="display: block; margin: auto;" />
]

.small[`nocite: '@*'` adds every reference listed in `references/references.bib` to the bibliography, even without any in-text citation. Ideal for an appendix!]

---
class: inverse, middle, center

## Back on topic

---

## References access

--

### Option 1 - SQL query

Zotero stores references data in an [`SQLite`](https://www.sqlite.org) database

--

### Option 2 - Zotero API

.small[<https://www.zotero.org/support/dev/web_api/v3/start>]

Zotero provides an API to access online data (including public groups)

--

### Option 3 - Export .csv from Zotero

Maybe not a good choice...

--

### Option 4 - Import BibTeX

Directly read `.bib` files in
R with the package [`rbibtools`](https://github.com/frbcesab/rbibtools)

---

## References access

### Read `.bib` files in R

- R package [`rbibtools`](https://github.com/frbcesab/rbibtools)

<img src="img/rbibtools.png" width="75%" style="display: block; margin: auto;" />

> Available at: <https://github.com/frbcesab/rbibtools/>

--

- Installation

```r
install.packages("remotes")
remotes::install_github("frbcesab/rbibtools")
```

---

## References access

### Read `.bib` files in R

- Usage of [`rbibtools`](https://github.com/frbcesab/rbibtools)

```r
## Read .bib file(s) ----
refs <- rbibtools::read_bib(path = "data/")
```

---

## References access

### Read `.bib` files in R

- Usage of `rbibtools`

<img src="img/rbibtools-output.png" width="95%" style="display: block; margin: auto;" />

---

## References access

### Read `.bib` files in R

- Usage of `rbibtools`

Additional parameters of `rbibtools::read_bib()`:

- `tags`: select BibTeX fields (e.g. authors, title, keywords)
- `categories`: filter reference types (e.g. article, book, chapter)

<br>

See [`?rbibtools::read_bib`](https://frbcesab.github.io/rbibtools/reference/read_bib.html) for further information

---

## Table of contents

### .greyish[References acquisition]

<br>

### .greyish[References management]

<br>

### .greyish[References access]

<br>

### Remove duplicates

---

## Remove duplicates

### Deduplication

.pull-left-2[
<img src="img/revtools-hex.png" width="80%" style="display: block; margin: auto;" />
]

.pull-right-2[
- An R package to work on evidence synthesis
- Can be used for `deduplication` and `screening`
- Can also be used to import `.bib` files, but...

<!-- end -->

- Available at: <https://revtools.net/>
]

--

To detect duplicates, you can use the function [`revtools::find_duplicates()`](https://revtools.net/deduplication.html)

By default, duplicates are detected based on the DOI (this can be changed)

<br>

--

A Shiny app is also available with [`revtools::screen_duplicates()`](https://revtools.net/deduplication.html#screening-duplicates)

--

Duplicated references will have the same ID (no deletion)

---

## Wrap-up

<img src="img/wrap-up.png" width="100%" style="display: block; margin: auto;" />

---
class: inverse, middle, center

## Exercise

---

### Exercise

1. Download the following two `.bib` files:
   .small[<https://raw.githubusercontent.com/literaturesynthesis/corpus-management/main/data/exercise/refs-scopus.bib>]
   <br />
   .small[<https://raw.githubusercontent.com/literaturesynthesis/corpus-management/main/data/exercise/refs-webofscience.bib>]

2. Import the two `.bib` files into Zotero
   .small[Create one collection per file (source)]

3. Export two new `.bib` files from Zotero

4. Import the new `.bib` files in R
   .small[Use the package `rbibtools`: <https://github.com/frbcesab/rbibtools>]

5. Detect duplicated references
   .small[Use the package `revtools`: <https://revtools.net/>]

6. Export the final table
   .small[Use the package `writexl`: <https://cran.r-project.org/package=writexl>]

---

### Correction

```r
## Folder to save .bib files ----
path <- file.path("~", "Documents", "Demo")
dir.create(path, recursive = TRUE, showWarnings = FALSE)

## Download .bib files ----
repo_url <- paste0("https://raw.githubusercontent.com/literaturesynthesis/",
                   "corpus-management/main/data/exercise/")

filename_1 <- "refs-scopus.bib"
filename_2 <- "refs-webofscience.bib"

download.file(url      = paste0(repo_url, filename_1),
              destfile = file.path(path, filename_1))

download.file(url      = paste0(repo_url, filename_2),
              destfile = file.path(path, filename_2))

## Import .bib files ----
refs <- rbibtools::read_bib(path)

## Detect duplicates (based on DOI) ----
refs$unique_id <- revtools::find_duplicates(refs)

## Number of duplicates ----
sum(duplicated(refs$unique_id))

## Create .xlsx file ----
writexl::write_xlsx(refs, file.path(path, "unique_references.xlsx"))
```
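
Because duplicated references only share an ID and nothing is deleted, the table still needs to be reduced to one row per ID if a deduplicated corpus is wanted. A minimal base R sketch, assuming the first record of each group is the one to keep, and using a toy table whose titles, DOIs, and IDs are made up for illustration:

```r
## Toy table standing in for `refs` after the detection step
## (hypothetical values) ----
refs <- data.frame(
  title     = c("Paper A", "Paper A", "Paper B"),
  doi       = c("10.1000/a", "10.1000/a", "10.1000/b"),
  unique_id = c(1, 1, 2),
  stringsAsFactors = FALSE
)

## Keep the first record of each group of duplicates ----
unique_refs <- refs[!duplicated(refs$unique_id), ]

nrow(unique_refs)
## [1] 2
```

Keeping the first record is only one possible rule. Which record of a group is the most complete depends on the sources, so it is worth checking before dropping rows.
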