Git has firmly established itself as the leading version control system for development teams in both open source and enterprise settings. According to a Stack Overflow developer survey, over 87% of developers use Git, far surpassing alternatives like SVN, TFS, and Mercurial.
Git's distributed architecture, cheap branching, and flexible workflows, combined with the issue tracking and code review that hosting platforms such as GitHub build on top of it, change the dynamics of collaborating on software projects large and small. Given how dominant the tool has become, mastering Git is close to mandatory for a professional developer.
In this advanced guide to Git, I'll cover:
- Installing Git on Ubuntu
- Configuring user information
- Initializing repositories
- Staging changes, diffs and commit workflow
- Branch management, merging vs rebasing
- Git hooks, submodules, internals
- GitHub integration and collaboration
Along the way, the guide covers specific commands and explains the underlying Git mechanics, the foundation you need to use Git at an expert level.
Installing the Latest Git Version on Ubuntu
The first step is installing a current, maintained Git version. The Ubuntu 20.04 repositories include Git, but the packaged version often lags behind the latest stable release.
We'll install Git from the git-core PPA, which packages up-to-date stable releases:
sudo apt update
sudo apt install software-properties-common
sudo add-apt-repository ppa:git-core/ppa
sudo apt update
sudo apt install git
Verify that the installation succeeded and check the Git version:
git --version
# git version 2.36.1 (your exact version will differ)
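If you want a script to confirm that the installed Git is recent enough, for example for git switch (used later in this guide), which needs Git 2.23 or newer, here is a minimal sketch. It assumes git --version prints the usual "git version X.Y.Z" line and that GNU sort with -V is available, as on Ubuntu; the function name git_version_at_least is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the installed Git is at least version $1.
# Assumes "git --version" prints "git version X.Y.Z..." and GNU sort -V exists.
git_version_at_least() {
    required="$1"
    # Third field of "git version 2.36.1" is the version number
    installed=$(git --version | awk '{print $3}')
    # sort -V orders version strings; if the lowest of the two is the
    # required version, the installed one is greater than or equal to it
    lowest=$(printf '%s\n%s\n' "$required" "$installed" | sort -V | head -n 1)
    [ "$lowest" = "$required" ]
}

if git_version_at_least 2.23; then
    echo "git switch is available"
else
    echo "Git is older than 2.23; use git checkout instead of git switch"
fi
```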
With Git itself ready, we can configure our identity before starting to use it.
Configuring Identity and Defaults
Git records your name and email address in every commit, so configure them first:
git config --global user.name "John Doe"
git config --global user.email johndoe@example.com
Some useful additional customizations include:
git config --global color.ui auto # colored output
git config --global core.editor nano # set text editor
git config --global merge.tool vimdiff # set merge tool
The global configuration is stored in ~/.gitconfig. You can edit this file directly in any editor, or open it with git config --global --edit.
Running git config --list --show-origin prints every setting along with the file that defines it.
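To experiment with configuration without touching your real ~/.gitconfig, the sketch below writes a repository-local identity into a throwaway repository and reads it back. Local settings (stored in .git/config) override global ones, but only for that repository. The temporary directory and identity are placeholders.

```shell
#!/bin/sh
set -eu
# Throwaway repository (placeholder) so the real ~/.gitconfig is not modified
demo=$(mktemp -d)
git init -q "$demo"

# --local writes to $demo/.git/config and overrides --global for this repo only
git -C "$demo" config --local user.name "John Doe"
git -C "$demo" config --local user.email johndoe@example.com

# Read one value back; Git resolves it across system, global and local files
git -C "$demo" config user.name

# List the local settings together with the file each one comes from
git -C "$demo" config --local --list --show-origin
```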
With identity set up, we can initialize repositories and understand the core Git workflow.
Git Repositories, Add/Commit/Push Cycle
To start version controlling a project, navigate into the directory and initialize a new Git repository:
cd myproject
git init
This creates a .git subdirectory holding everything Git needs for the repository: the object database, references such as branches, and repository-level configuration.
After making changes to files, the typical commit cycle is:
- Check the status with git status
- Stage files with git add <files>
- Review the staged changes with git diff --staged (plain git diff shows only changes that are not yet staged)
- Commit with git commit -m "message"
- Push commits to the remote repository with git push
Repeating these steps captures your work regularly in descriptive commits. Below I'll expand on the common commands for managing staged and unstaged changes.
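The cycle can be tried end to end in a scratch repository. This sketch covers status, add, diff and commit; pushing needs a remote, which is covered later. The file name, identity and message are placeholders.

```shell
#!/bin/sh
set -eu
# Scratch repository; file name, identity and messages are placeholders
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
# Local identity so the commit works even without a global config
git config user.name "John Doe"
git config user.email johndoe@example.com

echo "# My project" > README.md
git status --short      # ?? README.md  (untracked)
git add README.md
git status --short      # A  README.md  (staged new file)
git diff --staged       # review exactly what will be committed
git commit -q -m "Add README"
git log --oneline       # a single commit
```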
Viewing Repository Status
The git status command gives an overview of the current state: staged changes, unstaged modifications, untracked files, and how the current branch compares with its upstream:
$ git status
On branch master
Your branch is up-to-date with 'origin/master'.
Changes to be committed:
(use "git reset HEAD <file>..." to unstage)
new file: README
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git checkout -- <file>..." to discard changes in working directory)
modified: benchmarks.rb
git status makes it easy to see which changes are staged for the next commit and which exist only in the working tree.
Staging Changes
Before changes can be committed, they must be staged (added to the index) with git add:
git add README.md # stages a single file
git add docs/ # stages all docs subdirectory changes
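git add . # stages all changes in the current directory and below
Only staged changes are recorded by the next commit. The exception is git commit -a, which also stages modifications to files Git already tracks.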
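A quick way to confirm that only staged changes are committed: modify two tracked files, stage one, and commit. In this sketch (scratch repository, placeholder file names) the second change stays in the working tree, still unstaged.

```shell
#!/bin/sh
set -eu
# Scratch repository; a.txt and b.txt are placeholder files
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "John Doe"
git config user.email johndoe@example.com

echo one > a.txt
echo one > b.txt
git add .
git commit -q -m "Initial commit"

# Change both files, but stage only a.txt
echo two >> a.txt
echo two >> b.txt
git add a.txt
git commit -q -m "Update a.txt only"

git show --stat --oneline HEAD   # the commit touches a.txt only
git status --short               # b.txt is still modified but unstaged
```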
Committing Updates
git commit records the current contents of the index (the staging area) as a new commit, together with a message:
git commit -m "Implement new analytics algorithm"
Each commit adds a new entry to the repository's history, which you can browse with git log.
More technically, each commit adds a node to Git's directed acyclic graph (DAG) of history: the commit object records its parent commit(s) and the hash of the tree snapshot it captures. More on internals later.
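You can inspect the graph node a commit creates with git cat-file -p, which prints a raw object. In this sketch (scratch repository, placeholder messages) the second commit's object contains a tree line and a parent line pointing at the first commit.

```shell
#!/bin/sh
set -eu
# Scratch repository; file name and messages are placeholders
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "John Doe"
git config user.email johndoe@example.com

echo v1 > file.txt && git add file.txt && git commit -q -m "First"
echo v2 > file.txt && git add file.txt && git commit -q -m "Second"

# Raw commit object: tree <hash>, parent <hash>, author, committer, message
git cat-file -p HEAD

# The parent recorded inside HEAD is the previous commit
first=$(git rev-parse HEAD~1)
parent=$(git cat-file -p HEAD | sed -n 's/^parent //p')
echo "first=$first parent=$parent"
```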
Next, let's look at a powerful Git feature: customizable hooks.
Git Hooks for Automation
Hooks are custom scripts that Git runs at specific points in its workflow, such as pre-commit, post-checkout, and post-merge. Running checks, tests, linting, or notifications around commits this way can improve team development.
For example, the following hook verifies code style before accepting changes:
.git/hooks/pre-commit:
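#!/bin/sh
# Abort the commit if flake8 reports style violations.
# Note: this checks the whole working tree, not only the staged files.
flake8 --max-line-length=120 || exit 1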
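Once the file is executable (chmod +x .git/hooks/pre-commit), it runs before every commit, and a non-zero exit status aborts the commit. git commit --no-verify bypasses it.
Hooks live in each repository's .git/hooks directory, and each one runs when its triggering event occurs. They are not copied when a repository is cloned, so teams often keep shared hooks in a tracked directory and point core.hooksPath at it. Common use cases: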
- Run frontend build processes before push
- Block commits on test failures
- Enforce commit message formats
- Update tickets after merge to main
Client-side hooks complement the continuous integration pipelines covered later.
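As an example of enforcing commit message formats, here is a minimal commit-msg hook. Git passes this hook the path of a file holding the proposed message, and a non-zero exit rejects the commit. The required pattern (a type prefix such as feat: or fix:) is a hypothetical team convention, not something Git defines.

```shell
#!/bin/sh
# Scratch repository; file names and messages are placeholders
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "John Doe"
git config user.email johndoe@example.com
mkdir -p .git/hooks

# Hypothetical convention: first line must start with "feat: ", "fix: " or "docs: "
cat > .git/hooks/commit-msg <<'EOF'
#!/bin/sh
# $1 is the path to the file containing the proposed commit message
if ! head -n 1 "$1" | grep -Eq '^(feat|fix|docs): .+'; then
    echo "commit-msg: first line must look like 'feat: summary'" >&2
    exit 1
fi
EOF
chmod +x .git/hooks/commit-msg

echo hello > hello.txt
git add hello.txt
git commit -q -m "added stuff" || echo "rejected as expected"
git commit -q -m "feat: add hello.txt" && echo "accepted"
```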
Now we'll explore how Git branches enable concurrent, isolated workstreams.
Branching and Merging
Branching is a cornerstone of Git workflows. Because history is a directed acyclic graph (DAG) of commits and a branch is simply a movable pointer to one commit, creating and switching branches are cheap, local operations. Each line of work can therefore proceed in isolation from a common base.
Listing branches shows which one HEAD currently points to:
$ git branch
development
* master
The asterisk marks the currently checked-out branch. Branches cost almost nothing: a branch is just a ref holding a single commit hash, and commits are shared between branches rather than copied.
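To see how little a branch is, the sketch below creates one and compares it with HEAD: both resolve to the same commit hash, and no files are copied. The branch name development mirrors the listing above; the repository and file are placeholders.

```shell
#!/bin/sh
set -eu
# Scratch repository with master as its initial branch; names are placeholders
repo=$(mktemp -d)
git -c init.defaultBranch=master init -q "$repo"
cd "$repo"
git config user.name "John Doe"
git config user.email johndoe@example.com
echo hi > file.txt && git add file.txt && git commit -q -m "Initial commit"

# Creating a branch only writes a new ref pointing at the current commit
git branch development

git rev-parse HEAD           # the current commit hash
git rev-parse development    # the same hash
git symbolic-ref HEAD        # refs/heads/master: HEAD points at a branch
git branch                   # * marks the checked-out branch
```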
Starting feature work from master takes a single command:
git switch -c new-feature # on Git older than 2.23: git checkout -b new-feature
# Edit files
git add .
git commit -m "Start new widget feature"
These commits are recorded on new-feature only; master is unchanged. Switch between branches with git switch <branch>.
When the feature is complete, merge it back into the mainline:
git switch master
git merge new-feature
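If master has not moved since new-feature branched off, Git simply fast-forwards master to the feature branch's latest commit, and no merge commit is created. If both branches have new commits, Git creates a merge commit with two parents. Use git merge --no-ff to force a merge commit even when a fast-forward is possible.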
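The difference between a fast-forward and a true merge shows up in the number of parents of the resulting commit. The sketch below (scratch repository, placeholder branch and file names, Git 2.23+ for git switch) merges one branch while master has not moved and another after it has.

```shell
#!/bin/sh
set -eu
# Scratch repository; branch and file names are placeholders
repo=$(mktemp -d)
git -c init.defaultBranch=master init -q "$repo"
cd "$repo"
git config user.name "John Doe"
git config user.email johndoe@example.com
echo base > base.txt && git add . && git commit -q -m "Base"

# Case 1: master has not moved, so the merge is a fast-forward
git switch -q -c new-feature
echo a > feature.txt && git add . && git commit -q -m "Feature work"
git switch -q master
git merge -q new-feature
ff_parents=$(git rev-list --parents -n 1 HEAD | awk '{print NF - 1}')

# Case 2: both branches have new commits, so Git creates a merge commit
git switch -q -c other-feature
echo b > other.txt && git add . && git commit -q -m "Other work"
git switch -q master
echo c > main.txt && git add . && git commit -q -m "Mainline work"
git merge -q --no-edit other-feature
merge_parents=$(git rev-list --parents -n 1 HEAD | awk '{print NF - 1}')

echo "fast-forward tip: $ff_parents parent(s); merge commit: $merge_parents parent(s)"
```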
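Git combines diverging changes with a three-way merge: it compares both branch tips against their common ancestor (the merge base). Changes made on only one side are applied automatically. When both branches change the same lines of a file, Git stops with a conflict, marks the conflicting sections in the file, and leaves you to resolve them; you then git add the file and commit to finish the merge.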
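To watch Git stop on a conflict, the sketch below changes the same line on two branches. git merge-base prints the common ancestor used for the three-way comparison, the merge exits with a non-zero status, and the file keeps conflict markers until it is edited, staged, and committed. The repository, file and resolution (keeping master's version) are placeholders.

```shell
#!/bin/sh
# Scratch repository; settings.txt and its contents are placeholders
repo=$(mktemp -d)
git -c init.defaultBranch=master init -q "$repo"
cd "$repo"
git config user.name "John Doe"
git config user.email johndoe@example.com
echo "color = blue" > settings.txt
git add settings.txt && git commit -q -m "Base"

# Both branches change the same line
git switch -q -c new-feature
echo "color = green" > settings.txt && git commit -q -am "Use green"
git switch -q master
echo "color = red" > settings.txt && git commit -q -am "Use red"

git merge-base master new-feature    # hash of the shared "Base" commit

if ! git merge new-feature; then
    echo "Conflict: Git left markers in settings.txt"
    cat settings.txt
    # Resolve by hand (here: keep master's version), then stage and commit
    echo "color = red" > settings.txt
    git add settings.txt
    git commit -q --no-edit
fi
```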
In more complex flows, rebasing a feature branch onto the updated mainline keeps history linear:
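# While working on new-feature, replay it onto the latest mainline
git fetch origin
git rebase origin/master
# On a conflict: fix the files, git add them, then run git rebase --continue
# When the feature is finished, rebase once more
git fetch origin
git rebase origin/master
# new-feature now sits on top of origin/master, so master can fast-forward
git switch master
git merge --ff-only new-feature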
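Rebasing replays your commits on top of the latest upstream commit. The replayed commits are new objects with new hashes, and the resulting history is linear and easy to read. Because the hashes change, avoid rebasing commits that others have already pulled.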
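The sketch below makes both effects visible. It stands in for origin/master with a local master branch, so no remote is needed; the repository, branch and file names are placeholders. After the rebase the feature commit has a new hash, there are no merge commits, and master can fast-forward.

```shell
#!/bin/sh
set -eu
# Scratch repository; local master stands in for origin/master (placeholder setup)
repo=$(mktemp -d)
git -c init.defaultBranch=master init -q "$repo"
cd "$repo"
git config user.name "John Doe"
git config user.email johndoe@example.com
echo base > base.txt && git add . && git commit -q -m "Base"

git switch -q -c new-feature
echo feature > feature.txt && git add . && git commit -q -m "Feature work"
before=$(git rev-parse HEAD)

# Meanwhile the mainline moves on
git switch -q master
echo main > main.txt && git add . && git commit -q -m "Mainline work"

# Replay the feature commit on top of the new master tip
git switch -q new-feature
git rebase -q master
after=$(git rev-parse HEAD)
echo "feature commit: $before -> $after"   # same change, new hash

git log --oneline --graph                  # one straight line of history
git switch -q master
git merge -q --ff-only new-feature
```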
Git Submodules
Submodules let a repository embed another Git repository, pinned to a specific commit:
git submodule add https://github.com/libgit2/libgit2 mylib
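This clones the external project into mylib/, records its URL and path in a .gitmodules file, and makes the parent repository track the exact commit checked out in mylib/. Anyone cloning the parent later needs git clone --recurse-submodules (or git submodule update --init) to populate it.
Submodules are useful for vendoring dependencies and for keeping modular components in separate repositories.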
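To see what the parent repository actually records, the sketch below adds a local repository as a submodule; both repositories are scratch placeholders. Recent Git security releases block submodules cloned from local paths by default (the protocol.file.allow setting), so the sketch allows them for this one command only; remote URLs like the GitHub one above do not need this.

```shell
#!/bin/sh
set -eu
base=$(mktemp -d)

# Placeholder "external library" repository
git -c init.defaultBranch=master init -q "$base/lib"
git -C "$base/lib" config user.name "John Doe"
git -C "$base/lib" config user.email johndoe@example.com
echo "lib code" > "$base/lib/lib.c"
git -C "$base/lib" add lib.c
git -C "$base/lib" commit -q -m "Library v1"

# Placeholder parent project that embeds it
git -c init.defaultBranch=master init -q "$base/app"
cd "$base/app"
git config user.name "John Doe"
git config user.email johndoe@example.com
# Allow the local-path clone for this command only
git -c protocol.file.allow=always submodule add "$base/lib" mylib
git commit -q -m "Add mylib submodule"

cat .gitmodules               # records the submodule path and URL
git ls-files --stage mylib    # mode 160000: a commit pointer, not files
git submodule status          # pinned commit hash and path
```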
Now that we understand standard branching techniques, we'll dive into some internals.
Under the Hood: Git Objects and DAG
Fundamentally, Git stores snapshots of your files, and the history connecting them, as a Merkle DAG (directed acyclic graph). Every object is identified by a hash of its content, and objects refer to one another by those hashes, which makes Git's storage content-addressable.
There are four core object types:
- Commits: a pointer to the tree representing the project state, plus metadata
- Trees: directory listings that map names to blobs and to other trees
- Blobs: raw file contents
- Tags: annotated tag objects that name and describe another object (usually a commit)
A commit object includes metadata such as its parent commits, author, and message, plus the top-level tree representing the snapshot's contents. Each tree entry records a filename, a mode, and the SHA-1 hash of either a blob (file contents) or another tree (subdirectory).
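Content addressing is easy to verify by hand. For a blob, Git hashes a header of the form "blob <size in bytes>" followed by a NUL byte, and then the file's contents. This sketch assumes a SHA-1 repository format (the default) and the sha1sum tool from GNU coreutils; the function name blob_sha1 is hypothetical.

```shell
#!/bin/sh
set -eu
# Hypothetical helper: compute a file's blob ID the way Git does (SHA-1 format)
blob_sha1() {
    size=$(wc -c < "$1" | tr -d ' ')
    # Header "blob <size>" plus a NUL byte, followed by the raw file bytes
    { printf 'blob %s\000' "$size"; cat "$1"; } | sha1sum | cut -d ' ' -f 1
}

tmp=$(mktemp)
echo "hello" > "$tmp"
blob_sha1 "$tmp"                    # manual calculation
git hash-object --no-filters "$tmp" # Git's own calculation: the same ID
```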
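Each commit records a complete snapshot, not a diff. Storage stays small because unchanged files and directories have identical hashes, so consecutive snapshots reuse the same blob and tree objects. On top of that, git gc packs objects into packfiles, where similar objects are stored as compressed deltas against each other.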
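The sketch below checks both points in a scratch repository: an unchanged file has the same blob ID in two consecutive commits, and after git gc the objects are counted as packed rather than loose. File names and contents are placeholders.

```shell
#!/bin/sh
set -eu
# Scratch repository; file names and contents are placeholders
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "John Doe"
git config user.email johndoe@example.com

echo "stable" > unchanged.txt
echo "v1" > changing.txt
git add . && git commit -q -m "First snapshot"
echo "v2" > changing.txt
git commit -q -am "Second snapshot"

# <commit>:<path> names the blob stored for that path in that snapshot
git rev-parse HEAD~1:unchanged.txt HEAD:unchanged.txt   # identical IDs
git rev-parse HEAD~1:changing.txt HEAD:changing.txt     # different IDs

git count-objects -v    # before: objects are loose ("count")
git gc --quiet
git count-objects -v    # after: objects are counted under "in-pack"
```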
Git's DAG is what enables sophisticated merging, checking out any historical version, integrity checking, and exchanging history between repositories. Most Git commands ultimately read or write this object database.
While diving deeper into engine internals is outside our current scope, understanding these concepts builds mental models for expert usage.
Up next – remote repository collaboration.
Remote Repositories, Cloning and Collaboration
So far we've worked locally, but sharing code requires remote repositories, which live on servers accessible over SSH or HTTPS. GitHub offers free hosting for unlimited public and private repositories.
Alternatives like GitLab and Bitbucket offer similar repository services. Once code lives on a shared remote, teams can coordinate their work much more easily.
Adding Remotes
A bare repository has no working tree; it contains only the repository data that normally lives in .git. Create one on the server like this:
git init --bare new-project.git
By convention, bare repository names end in .git, signaling that they exist only to share history. Clients then register the server under a remote name:
git remote add origin git@remote-host:new-project.git
Verify with git remote -v. A repository can track multiple remotes.
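The whole round trip can be rehearsed locally, with a bare repository in a temporary directory standing in for the server. In this sketch the paths and identities are placeholders; on a real server you would use an SSH or HTTPS URL as above.

```shell
#!/bin/sh
set -eu
base=$(mktemp -d)

# Placeholder "server": a bare repository with no working tree
git -c init.defaultBranch=master init -q --bare "$base/new-project.git"

# Developer A creates history and pushes it
git -c init.defaultBranch=master init -q "$base/alice"
cd "$base/alice"
git config user.name "Alice"
git config user.email alice@example.com
echo "hello" > README.md
git add README.md && git commit -q -m "Initial commit"
git remote add origin "$base/new-project.git"
git remote -v                   # fetch and push URLs for origin
git push -q -u origin master    # -u records origin/master as the upstream

# Developer B clones: full history plus an origin remote, set up automatically
git clone -q "$base/new-project.git" "$base/bob"
git -C "$base/bob" log --oneline
```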
Cloning Repositories
Cloning a project with git clone sets up the origin remote automatically:
git clone https://github.com/libgit2/libgit2 mylib
cd mylib
Cloning copies the entire history, names the source remote origin, and configures the checked-out branch to track its counterpart on origin. With that upstream in place, a plain git pull or git push knows where to sync.
Fetching and Pull Requests
Updating from a remote means fetching, then merging or rebasing (git pull performs both steps in one command):
git fetch origin
# In integration branch
git merge origin/master
# Or rebase on top of fetched branches
git rebase origin/master
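The point of fetching separately is that git fetch only updates remote-tracking branches such as origin/master; your own branch moves only when you merge or rebase. This sketch shows that with two local clones of a scratch bare repository; all paths and identities are placeholders.

```shell
#!/bin/sh
set -eu
base=$(mktemp -d)
# Placeholder shared repository
git -c init.defaultBranch=master init -q --bare "$base/shared.git"

# Alice starts the project and publishes a first commit
git -c init.defaultBranch=master init -q "$base/alice"
cd "$base/alice"
git config user.name "Alice"
git config user.email alice@example.com
git remote add origin "$base/shared.git"
echo one > notes.txt && git add . && git commit -q -m "One"
git push -q -u origin master

# Bob clones, then Alice pushes a second commit
git clone -q "$base/shared.git" "$base/bob"
echo two >> notes.txt && git commit -q -am "Two"
git push -q

# Bob fetches: origin/master moves, his own master does not
cd "$base/bob"
git fetch -q origin
behind=$(git rev-list --count master..origin/master)
echo "master is $behind commit(s) behind origin/master"

# Merging (here a fast-forward) brings master up to date
git merge -q origin/master
```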
Pull requests on GitHub let developers openly review code before merging features.
Overall, Git's distributed model serves huge open source projects and enterprises alike. Every clone holds the full history, so most work (committing, branching, diffing, browsing the log) happens without contacting a central server.
Finally, we'll integrate Git with GitHub for enhanced collaboration.
Integrations with GitHub
GitHub builds on Git with features such as GitHub Actions for continuous integration, issues and project boards, pull request code review, and team permissions.
These tools extend Git's version control foundation: workflows run automatically on repository events, checks enforce code quality, and reviews happen next to the code. A large share of open source development takes place in public GitHub repositories.
Here is a basic CI workflow, stored in .github/workflows/main.yml, that runs on every push to master:
name: CI
on:
  push:
    branches: [ "master" ]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Run tests
        run: |
          ./configure
          make
          make test
Workflows for web backends might also start database services, deploy to cloud platforms like AWS, and notify team channels when a job fails.
Integrations like these extend plain version control with automation, which speeds up delivery.
This frees engineering teams to focus on the product. GitHub did much to bring pull-request-based collaboration into the mainstream.
Wrapping Up
We've covered a wide range of Git techniques, from fundamental commits and branches to rebasing, hooks, submodules, and the directed acyclic graph that underpins them all. Integrations with services like GitHub take team collaboration further.
There's always more depth to absorb from the Git documentation on specific commands, options, and configuration formats. But this guide covers the key theoretical and practical concepts for taking your Git skills toward an expert level.
Git skills translate directly into more reliable, faster development. I encourage experimenting with throwaway test repositories to build muscle memory for each workflow discussed here.
Which aspect of our Git journey resonated most? Which integrations seem worth exploring further? Let me know in the comments!


