Version control

Introduction to git & GitHub

May 2024

Nicolas Casajus

Introduction

Motivations

Project content (without git)

Questions

  • Which version of analyses.R is the final one?
  • What about data.csv?
  • What are the differences between versions?
  • Who has contributed to these versions? When?

 We need a tool that deals with versions for us

Project content (with git)

Presentation of git

git is a Version Control System (VCS). With git you can:

  • keep your working copy clean
  • make contributions transparent
    (what | who | when | why)
  • keep the entire history of a file (and project)
  • inspect a file throughout its lifetime
  • revert to a previous version
  • handle multiple versions (branches)
  • facilitate collaborations w/ code hosting platforms
    (GitHub, GitLab, Bitbucket, etc.)
  • backup your project



A word of warning

git and GitHub are not the same thing

  • git is free and open-source software
  • GitHub (and co) is a web platform to host and share projects tracked by git


In other words:

You do not need GitHub to use git but you cannot use GitHub without using git

git as a CLI

RStudio and git

Git main panel

Stage files, view differences and commit changes

View history and versions
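From a terminal, the same actions roughly correspond to a handful of git commands. This is a minimal sketch, run in a throwaway folder and assuming git 2.28 or later (needed for git init -b). The file content and the identity Jane Doe are hypothetical.

```shell
set -euo pipefail
cd "$(mktemp -d)"                        # throwaway folder for the demo
git init -q -b main
git config user.name "Jane Doe"          # hypothetical identity
git config user.email "jane@example.com"

echo 'x <- 1' > analyses.R               # hypothetical file content
git add analyses.R                       # ~ ticking the "Staged" box
git commit -q -m "add analyses script"   # ~ clicking "Commit"

echo 'y <- 2' >> analyses.R
git --no-pager diff                      # view differences (unstaged changes)
git add analyses.R
git commit -q -m "add second variable"

git --no-pager log --oneline             # view history: one line per version
git show HEAD~1:analyses.R               # view the file as it was one version ago
```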

Using git

How does git work?

  • git takes a sequence of snapshots
  • Each snapshot can contain changes for one or many file(s)
  • The user chooses which files to ‘save’ in a snapshot, and when
    (unlike file hosting services such as Dropbox, Google Drive, etc.)


 In the git universe, a snapshot is a version, i.e. the state of the whole project at a specific point in time


Creating a snapshot is a two-step process:

  • Stage files: select which files to add to the version
  • Commit changes: save the version and add metadata (commit message)

Basic workflow

 Initialize git in an (empty) folder (the repository)


git init


The three areas of a git repository:

  • working copy: current state of the directory (what you actually see)
  • staging area: selected files that will be added to the next version
  • repository: area w/ all the versions
    (the .git/ subdirectory)
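Here is a minimal sketch of this step in a terminal. It assumes git 2.28 or later and uses a hypothetical folder name (my-project). Without -b main (or the init.defaultBranch setting), git names the first branch master rather than main, which is the name used in the outputs of this tutorial.

```shell
set -euo pipefail
cd "$(mktemp -d)"
mkdir my-project && cd my-project   # hypothetical project folder
git init -q -b main                 # create the repository, first branch named main
ls -a                               # the repository is the hidden .git/ subdirectory
git status                          # staging area and repository are still empty
```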

Basic workflow

 Add new files in the repository


git status

# On branch main
# 
# No commits yet
# 
# Untracked files:
#   README.md
#   analyses.R
#   data.csv
# 
# Nothing added to commit but untracked files present
# Use "git add <file>..." to track

Basic workflow

 Stage (select) one file


git add data.csv


git status

# On branch main
# 
# No commits yet
# 
# Changes to be committed:
#   (use "git rm --cached <file>..." to unstage)
#   new file:   data.csv
# 
# Untracked files:
#   (use "git add <file>..." to track)
#   README.md
#   analyses.R

Basic workflow

 Stage (select) several files


git add data.csv analyses.R


git status

# On branch main
# 
# No commits yet
# 
# Changes to be committed:
#   (use "git rm --cached <file>..." to unstage)
#   new file:   analyses.R
#   new file:   data.csv
# 
# Untracked files:
#   (use "git add <file>..." to track)
#   README.md

Basic workflow

 Stage (select) all files


git add .


git status

# On branch main
# 
# No commits yet
# 
# Changes to be committed:
#   (use "git rm --cached <file>..." to unstage)
#   new file:   analyses.R
#   new file:   data.csv
#   new file:   README.md

Basic workflow

 Commit changes to create a new version


git commit -m "a good commit message"

Basic workflow

 Now we are up-to-date


git status

# On branch main
# nothing to commit, working tree clean
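The whole sequence can be replayed as one script. This sketch assumes git 2.28 or later, a throwaway folder, and hypothetical file contents and identity. git status --short prints one line per file: ?? for untracked, A for a new file that is staged.

```shell
set -euo pipefail
cd "$(mktemp -d)"
git init -q -b main
git config user.name "Jane Doe"               # hypothetical identity
git config user.email "jane@example.com"

# Hypothetical project files, named as in the slides
printf '# My project\n' > README.md
printf 'x <- read.csv("data.csv")\n' > analyses.R
printf 'id,value\n1,42\n' > data.csv

git status --short                # ?? for the three untracked files
git add data.csv                  # stage one file
git add analyses.R                # stage another one
git add .                         # stage everything else in the folder
git status --short                # A for the three new files
git commit -q -m "add data, analysis script and README"
git status                        # nothing to commit, working tree clean
```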

The status of a file

With git a file can be untracked or tracked. If it’s tracked, it can be:

  • unmodified
  • modified and unstaged
  • modified and staged

When you create a new file, by default it’s untracked.

To tell git to track this new file, you have to stage it.

After committing your changes, the file becomes unmodified (up-to-date with the latest version).

If you edit a tracked file, it becomes modified.

When you decide to create a new version, stage the modified file.

After committing your changes, the file becomes unmodified again (up-to-date with the latest version).
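git status --porcelain reports these states as a two-letter code per file. The first letter describes the staging area and the second the working copy. This sketch walks through the cycle above. It assumes git 2.28 or later, a throwaway folder, and a hypothetical file and identity. The expected code is noted next to each step.

```shell
set -euo pipefail
cd "$(mktemp -d)"
git init -q -b main
git config user.name "Jane Doe"              # hypothetical identity
git config user.email "jane@example.com"

echo 'x <- 1' > analyses.R
s1="$(git status --porcelain)"   # "?? analyses.R"  untracked
git add analyses.R
s2="$(git status --porcelain)"   # "A  analyses.R"  new file, staged
git commit -q -m "add analyses script"
s3="$(git status --porcelain)"   # ""               unmodified
echo 'y <- 2' >> analyses.R
s4="$(git status --porcelain)"   # " M analyses.R"  modified, unstaged
git add analyses.R
s5="$(git status --porcelain)"   # "M  analyses.R"  modified, staged
git commit -q -m "add second variable"
s6="$(git status --porcelain)"   # ""               unmodified again

printf '[%s]\n' "$s1" "$s2" "$s3" "$s4" "$s5" "$s6"
```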

The .gitignore

 We can also tell git to ignore specific files: this is the purpose of the .gitignore file


Which files? For instance:

  • passwords, tokens and other secrets
  • temporary files
  • large files

The syntax is simple:

# Ignore a specific file
README.html

# Ignore all PDF files
*.pdf

# Ignore a folder
data/

# Ignore a subfolder
data/raw-data/

# Ignore a specific file in a subfolder
data/raw-data/raw-data.csv
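Two commands help check the rules. git check-ignore -v shows which line of .gitignore matches a path. git ls-files --others --exclude-standard lists the untracked files that are not ignored. This sketch uses a subset of the rules above, and the file names are hypothetical.

```shell
set -euo pipefail
cd "$(mktemp -d)"
git init -q -b main

# A subset of the rules above (lines starting with # are comments)
cat > .gitignore <<'EOF'
# Ignore a specific file
README.html
# Ignore all PDF files
*.pdf
# Ignore a folder
data/
EOF

# Hypothetical files to test the rules against
mkdir -p data/raw-data
touch README.html report.pdf analyses.R data/raw-data/raw-data.csv

git check-ignore -v report.pdf              # prints the .gitignore line that matched
git ls-files --others --exclude-standard    # files git will still offer to track
```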


 Template for projects available here

Commits

When committing a new version (w/ git commit), git records the following information:

  • WHO - the person who has made the changes
    (automatically added by git)
  • WHEN - the date of the commit
    (automatically added by git)
  • WHAT - the files that have been modified
    (selected by the user w/ git add)
  • WHY - the reason for the commit, i.e. what has been done compared to the previous version
    (added by the user w/ git commit)

A commit message has a title line and an optional body:

# Commit message w/ title
git commit -m "title"


What is a good commit message?

A good commit title:

  • should be short (fewer than 50 characters)
  • should be informative and unambiguous
  • should use active voice and present tense


An optional body can be added to provide detailed information and to link external references (e.g. issue, pull request, etc.)
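From the command line, one way to add a body is to repeat -m. The first value becomes the title and each following value becomes a separate paragraph of the body. This sketch uses a hypothetical file, identity and issue number (#12).

```shell
set -euo pipefail
cd "$(mktemp -d)"
git init -q -b main
git config user.name "Jane Doe"              # hypothetical identity
git config user.email "jane@example.com"

echo 'fit <- lm(mass ~ temp, data = df)' > analyses.R   # hypothetical content
git add analyses.R

# Title (short, present tense), then a body paragraph with a reference
# (issue #12 is hypothetical)
git commit -q -m "Add linear model of body mass" \
              -m "Model mass as a function of temperature. Refs #12."

git log -1 --format='%s'    # title
git log -1 --format='%b'    # body
```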

When should you commit?


  • Commit a new version when you reach a milestone
  • Create small and atomic commits
  • Commit a state that is actually working

Using GitHub

Code hosting platforms

GitHub and co are cloud-based git repository hosting services

  Ideal for collaborating on projects tracked by git


Services

  • Full integration of version control (commits, history, differences)
  • Easy collaboration w/ branches, forks, pull requests
  • Issue tracking system
  • Enhanced documentation rendering (README, Wiki)
  • Static website hosting
  • Automation & monitoring (CI/CD)

Main platforms

Presentation of GitHub

Overview

  • Created in 2008
  • For-profit company (property of Microsoft since 2018)
  • Used by more than 100 million developers around the world


Advantages

  • User-friendly interface for git
  • Free account w/ unlimited public/private repositories
  • Organization account (w/ free plan)
  • Advanced tools for collaboration

GitHub - Account homepage

GitHub - Organization homepage

GitHub - Repository homepage

Create a repository

Clone a repository w/ RStudio


Select Version Control

Select Git

Copy the URL and fill all the fields

Get the URL to clone

Local copy of a repository
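Outside RStudio, cloning is a single git clone command. In this sketch a local bare repository (a repository without a working copy) stands in for the GitHub one, so the script runs offline. This is an assumption for the demo; with GitHub you would pass the URL copied from the repository page.

```shell
set -euo pipefail
cd "$(mktemp -d)"

# Stand-in for the GitHub repository (a bare repository has no working copy)
git init -q --bare -b main projectname.git

# With GitHub: git clone <URL copied from the repository page>
git clone -q projectname.git projectname
cd projectname
git remote -v        # "origin" is the name git gives to the cloned remote
```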

Working w/ GitHub

 Add a new file: README.md


git status

# On branch main
# Your branch is up to date with 'origin/main'
#
# Untracked files:
#   README.md
# 
# Nothing added to commit but untracked files present
# Use "git add <file>..." to track

Working w/ GitHub

 Stage changes


git add .


git status

# On branch main
# Your branch is up to date with 'origin/main'
#
# Changes to be committed:
#   (use "git restore --staged <file>..." to unstage)
#   new file:   README.md

Working w/ GitHub

 Commit changes


git commit -m "add README"


git status

# On branch main
# Your branch is ahead of 'origin/main' by 1 commit.
#   (use "git push" to publish your local commits)
# 
# nothing to commit, working tree clean

Working w/ GitHub

 Push changes to remote


git push

# If git reports that the current branch has no upstream branch (e.g. first push):
git push -u origin main


git status

# On branch main
# Your branch is up to date with 'origin/main'.
# 
# nothing to commit, working tree clean
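The -u option sets an upstream: it pushes the branch and records origin/main as the branch it tracks. After that, a plain git push (or git pull) works. You typically need this when the local repository was created with git init and the remote was added later. This sketch runs offline, assuming a local bare repository as a stand-in for GitHub and a hypothetical identity.

```shell
set -euo pipefail
base="$(mktemp -d)" && cd "$base"
git init -q --bare -b main remote.git      # stand-in for the GitHub repository

git init -q -b main local && cd local
git config user.name "Jane Doe"            # hypothetical identity
git config user.email "jane@example.com"
git remote add origin "$base/remote.git"   # what git clone sets up for you

echo '# My project' > README.md
git add README.md
git commit -q -m "add README"
git push -q -u origin main                 # first push: also sets the upstream

echo 'More details.' >> README.md
git commit -q -am "expand README"
git push -q                                # upstream known: plain push works
git status                                 # up to date with 'origin/main'
```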

Working w/ GitHub

 Pull changes from remote


git pull


git status

# On branch main
# Your branch is up to date with 'origin/main'.
# 
# nothing to commit, working tree clean
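Pulling only does something when someone else has pushed. This sketch uses two hypothetical collaborators (alice and bob) sharing a local bare repository that stands in for GitHub.

```shell
set -euo pipefail
base="$(mktemp -d)" && cd "$base"
git init -q --bare -b main remote.git            # stand-in for GitHub

# alice publishes a first version (hypothetical collaborator)
git init -q -b main alice && cd alice
git config user.name "Alice" && git config user.email "alice@example.com"
echo '# My project' > README.md
git add README.md && git commit -q -m "add README"
git remote add origin "$base/remote.git" && git push -q -u origin main

# bob gets his own copy (hypothetical collaborator)
git clone -q "$base/remote.git" "$base/bob"

# alice pushes a new version...
echo 'More details.' >> README.md
git commit -q -am "expand README" && git push -q

# ...and bob pulls it (fetch + integrate, here a simple fast-forward)
cd "$base/bob"
git config user.name "Bob" && git config user.email "bob@example.com"
git pull -q
git --no-pager log --oneline                     # both of alice's commits
```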

Help me, I can’t push!

When you try to push, you might see the following error message:

git push

# To github.com:ahasverus/projectname.git
#  ! [rejected]        main -> main (fetch first)
#
# error: failed to push some refs to 'github.com:ahasverus/projectname.git'
#
# hint: Updates were rejected because the remote contains work that you do
# hint: not have locally. This is usually caused by another repository pushing
# hint: to the same ref. You may want to first integrate the remote changes
# hint: (e.g., 'git pull ...') before pushing again.
# hint: See the 'Note about fast-forwards' in 'git push --help' for details.


 Run git pull to integrate the remote changes, then git push again
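This sketch reproduces the situation offline: alice pushes first, so bob's push is rejected until he pulls. alice, bob and the local bare repository standing in for GitHub are all hypothetical. The script passes --rebase explicitly because recent git versions refuse to pull diverging branches until a strategy is chosen (e.g. git config pull.rebase true).

```shell
set -euo pipefail
base="$(mktemp -d)" && cd "$base"
git init -q --bare -b main remote.git          # stand-in for GitHub

# alice publishes a first version, bob clones it (hypothetical collaborators)
git init -q -b main alice && cd alice
git config user.name "Alice" && git config user.email "alice@example.com"
echo '# My project' > README.md
git add README.md && git commit -q -m "add README"
git remote add origin "$base/remote.git" && git push -q -u origin main
git clone -q "$base/remote.git" "$base/bob"

# alice pushes a new version of README.md
echo 'More details.' >> README.md
git commit -q -am "expand README" && git push -q

# Meanwhile, bob commits another file and tries to push
cd "$base/bob"
git config user.name "Bob" && git config user.email "bob@example.com"
echo 'x <- 1' > analyses.R
git add analyses.R && git commit -q -m "add analyses script"

if ! git push -q 2>/dev/null; then    # rejected: the remote has alice's commit
  git pull -q --rebase                 # replay bob's commit on top of it
  git push -q                          # now accepted
fi
```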

Help me, I can’t pull!

When you try to pull, you might see the following error message:

git pull

# [...]
# Auto-merging README.md
# CONFLICT (content): Merge conflict in README.md
#
# error: could not apply b8302e6... edit README
#
# hint: Resolve all conflicts manually, mark them as resolved with
# hint: "git add/rm <conflicted_files>", then run "git rebase --continue".
# hint: You can instead skip this commit: run "git rebase --skip".
# hint: To abort and get back to the state before "git rebase", 
# hint: run "git rebase --abort".


 Welcome to the wonderful world of git conflicts

Resolving conflicts

What is a (lexical) conflict?

A git conflict occurs when git cannot merge two versions automatically because both have changed the same lines.


README.md - Version A

# The SURPRISE pizza

An amazing surprise of the dev team dedicated just to your fancy
thirst for fortune and originality.

README.md - Version B

# The Surprise Pizza

An amazing surprise of the dev team dedicated just to your fancy
thirst for fortune and originality.


Git will identify conflicts in files:

<<<<<<< HEAD
# The SURPRISE pizza
=======
# The Surprise Pizza
>>>>>>> b8302e6 (edit README)

An amazing surprise of the dev team dedicated just to your fancy
thirst for fortune and originality.


README.md - Final version

# My wonderful pizza

An amazing surprise of the dev team dedicated just to your fancy
thirst for fortune and originality.


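 You have to decide which version to keep (or write a new one), delete the conflict markers, mark the file as resolved with git add, then run git rebase --continue.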
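This sketch reproduces and resolves the conflict offline. It uses two hypothetical collaborators and a local bare repository standing in for GitHub, with a shortened README.md. printf writes the final version in place of a manual edit. GIT_EDITOR=true keeps the commit message as is instead of opening an editor.

```shell
set -euo pipefail
base="$(mktemp -d)" && cd "$base"
git init -q --bare -b main remote.git            # stand-in for GitHub

# Shared starting point (hypothetical collaborators alice and bob)
git init -q -b main alice && cd alice
git config user.name "Alice" && git config user.email "alice@example.com"
printf '%s\n' '# The surprise pizza' '' 'An amazing surprise.' > README.md
git add README.md && git commit -q -m "add README"
git remote add origin "$base/remote.git" && git push -q -u origin main
git clone -q "$base/remote.git" "$base/bob"

# Version A: alice edits the title and pushes first
printf '%s\n' '# The SURPRISE pizza' '' 'An amazing surprise.' > README.md
git commit -q -am "edit README title" && git push -q

# Version B: bob edits the same line
cd "$base/bob"
git config user.name "Bob" && git config user.email "bob@example.com"
printf '%s\n' '# The Surprise Pizza' '' 'An amazing surprise.' > README.md
git commit -q -am "edit README"

git push -q 2>/dev/null || true                    # rejected: alice pushed first
if ! git pull -q --rebase; then                    # stops on the conflict
  grep -nE '^(<<<<<<<|=======|>>>>>>>)' README.md  # show the conflict markers
  # Keep a final version (here neither A nor B), then mark it as resolved
  printf '%s\n' '# My wonderful pizza' '' 'An amazing surprise.' > README.md
  git add README.md
  GIT_EDITOR=true git rebase --continue
fi
git push -q
```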

Time to practice