Working with a Compendium
PLEASE READ THE GET STARTED VIGNETTE FIRST
Compendium content
First, create a new empty RStudio project. Let's call it comp. To create a new compendium structure, run rcompendium::new_compendium().
By default, the following content is created:
comp/                           # Root of the compendium
│
├── comp.Rproj                  # RStudio project (created by user, optional)
│
├── .git/                       # Git tracking folder
├── .gitignore                  # List of files/folders to be ignored by Git
│                               # (specific to R language)
│
├── R/                          # R functions location
│   ├── fun-demo.R              # Example of an R function (to remove)
│   └── comp-package.R          # Dummy R file for high-level documentation
│
├── man/                        # Help files for R functions (automatically updated)
│   ├── print_msg.Rd            # Documentation of the demo R function
│   └── comp-package.Rd         # High-level documentation
│
├── DESCRIPTION                 # Project metadata [*]
├── LICENSE.md                  # Content of the GPL (>= 2) license (default)
├── NAMESPACE                   # Automatically generated
├── .Rbuildignore               # List of files/folders to be ignored while
│                               # checking/installing the package
│
├── README.md                   # GitHub README (automatically generated)
├── README.Rmd                  # GitHub README [*]
│
├── data/                       # User raw data (.csv, .gpkg, etc.)
│   ├── raw-data/               # Read-only files
│   └── derived-data/           # Modified data derived from raw data
│
├── analyses/                   # R scripts (not functions) to run analyses
│
├── outputs/                    # Outputs (R objects, .csv, etc.)
├── figures/                    # Figures (.png, .pdf, etc.)
│
└── make.R                      # Main R script to source all R scripts
                                # stored in analyses/
[*] These files are created automatically, but the user needs to add some information manually.
If create_repo = TRUE (default), a new GitHub repository will be created directly from R. It will be available at https://github.com/{{account}}/comp/ (where {{account}} is either your GitHub account or a GitHub organization). This repository will be private if you set private = TRUE.
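As a minimal sketch of the call described above, the two repository options can be passed explicitly. The values are illustrative and the list name compendium_args is hypothetical. The call is guarded so that nothing is created unless the code is run interactively with rcompendium installed, and creating the repository assumes that GitHub credentials are already configured.

```r
# Options discussed above (illustrative values; `compendium_args` is a
# hypothetical name used only in this sketch)
compendium_args <- list(
  create_repo = TRUE,  # create the GitHub repository from R (the default)
  private     = TRUE   # make that repository private
)

# Run from the root of the empty 'comp' project. The guard keeps the sketch
# from creating files or a repository when it is sourced non-interactively.
if (interactive() && requireNamespace("rcompendium", quietly = TRUE)) {
  do.call(rcompendium::new_compendium, compendium_args)
}
```

Setting create_repo = FALSE in the same list would keep the compendium local.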
Compendium metadata
DESCRIPTION
The DESCRIPTION file contains important compendium metadata. By default, rcompendium creates the following file:
Package: comp
Type: Package
Title: The Title of the Project [*]
Version: 0.0.0.9000
Authors@R: c(
    person(given = "John",
           family = "Doe",
           role = c("aut", "cre", "cph"),
           email = "john.doe@domain.com",
           comment = c(ORCID = "9999-9999-9999-9999")))
Description: A paragraph providing a full description of the project (on [*]
    several lines...)
URL: https://github.com/jdoe/comp
BugReports: https://github.com/jdoe/comp/issues
License: GPL (>= 2)
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.1.2
Imports:
    devtools,
    here
[*] Title and Description must be adapted by the user.
This DESCRIPTION file is specific to R packages, but it can also be used to work with research compendia (see below). For further information on how to edit this file, please read https://r-pkgs.org/description.html.
README
The README.md is the homepage of your repository on GitHub. Its purpose is to help visitors understand your project. Always edit the README.Rmd (not the .md version).
For further information, please read https://r-pkgs.org/release.html?q=README#readme.
Recommendations
- Keep the root of the project as clean as possible
- Store your raw data in data/raw-data/
- Document raw data modifications (with R scripts and/or R functions)
- Export modified raw data in data/derived-data/ (or outputs/)
- Store only R functions in R/
- Store only R scripts and/or Rmd in analyses/
- Control your project with the make.R file (add lines to source R scripts)
- Build relative paths using here::here()
- Call external functions as package::function()
- Alternatively add #' @import package (or #' @importFrom package function) in R/comp-package.R to call external functions as function()
- Use devtools::document() to update the NAMESPACE
- Use rcompendium::add_dependencies(".") to update the list of required dependencies in DESCRIPTION
- Do not use install.packages() but remotes::install_deps() (this will install required dependencies listed in DESCRIPTION)
- Do not use library() but devtools::load_all() (this will load required dependencies listed in DESCRIPTION and R functions stored in R/)
- Do not source your functions but use devtools::load_all() instead
- And document everything!
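Several of these recommendations meet in make.R. The sketch below is not the file generated by rcompendium: run_project() and the script names are hypothetical, and the install and load steps are shown as comments because they require remotes and devtools. It only illustrates how analysis scripts might be sourced in a fixed order from paths built relative to the project root.

```r
# Hypothetical helper (not part of rcompendium): source analysis scripts in
# the order given, resolving them relative to the compendium root.
run_project <- function(scripts, root = here::here()) {
  paths <- file.path(root, "analyses", scripts)

  # Fail early, before running anything, if a script is missing
  not_found <- paths[!file.exists(paths)]
  if (length(not_found) > 0) {
    stop("Script(s) not found: ", paste(not_found, collapse = ", "))
  }

  for (path in paths) {
    source(path)
  }
  invisible(paths)
}

# In make.R this would typically follow the dependency and loading steps:
# remotes::install_deps(upgrade = "never")   # dependencies from DESCRIPTION
# devtools::load_all(here::here())           # functions stored in R/
# run_project(c("01-clean-data.R", "02-fit-model.R"))  # hypothetical names
```

Checking that every script exists before sourcing any of them avoids a half-finished run when a file has been renamed or removed.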
Advanced features
Using renv
The default structure created by rcompendium::new_compendium() is a good starting point for making analyses reproducible. You can increase reproducibility by using the package renv.
renv will freeze the exact package versions you depend on (in renv.lock). This ensures that each collaborator (or you in the future) will use the exact same versions of these packages. Moreover, renv gives each project its own private package library, which isolates projects from one another.
To initialize renv for your compendium, use renv = TRUE in rcompendium::new_compendium() or call the function rcompendium::add_renv() after rcompendium::new_compendium().
The make.R file will also be updated: remotes::install_deps() is replaced by renv::restore().
Working with renv
- Work as usual
- Update NAMESPACE with devtools::document()
- Update required dependencies in DESCRIPTION with rcompendium::add_dependencies(".")
- Install required dependencies locally with renv::install()
- Save the local environment with renv::snapshot()
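The update steps above can be gathered into a single function. This is only a sketch: update_compendium_env() is a hypothetical name, not a function provided by rcompendium or renv, and it assumes that devtools, rcompendium and renv are installed and that the working directory is the root of the compendium.

```r
# Hypothetical wrapper (not part of rcompendium or renv): run the update
# steps listed above in order, from the root of the compendium.
update_compendium_env <- function() {
  devtools::document()                 # update NAMESPACE
  rcompendium::add_dependencies(".")   # update dependencies in DESCRIPTION
  renv::install()                      # install those dependencies locally
  renv::snapshot()                     # record package versions in renv.lock
  invisible(TRUE)
}
```

Calling update_compendium_env() after adding or removing a package in your code keeps NAMESPACE, DESCRIPTION and renv.lock in step with each other.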
Using docker
Docker is a tool that creates self-contained environments (containers) in which the operating system, system libraries and software (e.g. R, RStudio Server) are frozen and ready to use. Using a container requires a recipe (Dockerfile) that describes the steps needed to create the environment. From this recipe, a Docker image is built: this is a template from which a container will be created.
More information on Docker for R users: https://environments.rstudio.com/docker
To add a Dockerfile to your compendium, use dockerfile = TRUE in rcompendium::new_compendium() or call the function rcompendium::add_dockerfile() after rcompendium::new_compendium().
By default, rcompendium creates the following Dockerfile:
FROM rocker/rstudio:4.1.3
MAINTAINER John Doe <john.doe@domain.com>
## Install system dependencies (for devtools) ----
RUN sudo apt update -yq \
 && sudo apt install --no-install-recommends libxml2-dev -yq \
 && sudo apt clean all \
 && sudo apt purge \
 && sudo rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
## Copy local project ----
ENV folder="/home/rstudio/"
COPY . $folder
RUN chown -R rstudio:rstudio $folder
## Set working directory ----
WORKDIR $folder
## Install R packages ----
RUN R -e "install.packages('remotes', repos = c(CRAN = 'https://cloud.r-project.org'))" \
 && R -e "remotes::install_deps(upgrade = 'never')"
If you use renv with Docker (recommended), the last step will look like:
## Install R packages ----
ENV RENV_VERSION 0.15.4
RUN R -e "install.packages('remotes', repos = c(CRAN = 'https://cloud.r-project.org'))" \
 && R -e "remotes::install_github('rstudio/renv@${RENV_VERSION}')" \
 && sudo -u rstudio R -e "renv::restore()"
By default, the Docker image is based on rocker/rstudio. You can customize this Dockerfile (e.g. by adding system dependencies) and use a different base image (e.g. tidyverse, verse, geospatial, etc.). For more information: https://github.com/rocker-org/rocker-versioned2
The versions of R and renv (if applicable) specified in the Dockerfile match those of the local system.
When the image is built, the whole project is added to it. When a container is launched, the default working directory is the root of the project.
Once the project is ready to be shared, the collaborator (or you) must follow these steps:
- Clone the repository
- Build the Docker image (in a terminal) from the Dockerfile by running: docker build -t "image_name" . (this can take time)
- Start a container (in a terminal) from this new Docker image image_name by running: docker run --rm -p 127.0.0.1:8787:8787 -e DISABLE_AUTH=true image_name
- Open a Web browser and visit 127.0.0.1:8787: a new instance of RStudio Server is available with everything ready to use (data, code, packages, etc.)
- To reproduce the analysis, run: source("make.R")