Git has firmly established itself as the leading version control system for teams of developers across both open source and enterprise. According to the latest Stack Overflow developer survey, over 87% of developers now use Git, far more than use competitors like SVN, TFS, and Mercurial.

With its distributed architecture, decentralized workflows, and cheap feature branching, plus the issue tracking and code review that hosting platforms such as GitHub layer on top, Git changes the dynamics of collaborating on software projects both large and small. Mastering Git as a professional developer now feels almost mandatory given the dominance of the tool.

In this comprehensive advanced guide on Git, I'll cover:

  • Installing Git on Ubuntu
  • Configuring user information
  • Initializing repositories
  • Staging changes, diffs and commit workflow
  • Branch management, merging vs rebasing
  • Git hooks, submodules, internals
  • GitHub integration and collaboration

Along the way I'll include specific commands and detailed explanations of the underlying Git mechanics, everything needed to use Git at an expert level.

Installing the Latest Git Version on Ubuntu

The first step is installing a current, maintained Git version. Ubuntu 20.04 repositories contain Git, but it's often behind the latest stable release.

We'll install Git from the official PPA, which keeps an up-to-date version packaged:

sudo apt update
sudo apt install software-properties-common
sudo add-apt-repository ppa:git-core/ppa
sudo apt update
sudo apt install git

Verify the installation was successful and check Git version:

git --version
# git version 2.36.1

With Git itself ready, we can configure our identity before starting to use it.

Configuring Identity and Defaults

Git associates commits with your name and email address – this info should be configured as a first step:

git config --global user.name "John Doe"
git config --global user.email johndoe@example.com

Some useful additional customizations include:

git config --global color.ui auto # colored output
git config --global core.editor nano # set text editor 
git config --global merge.tool vimdiff # set merge tool

The global configuration lives in ~/.gitconfig. You can edit this file directly in an editor, or open it with git config --global --edit.

Running git config --list --show-origin prints every setting along with the file that defines it.

With identity set up, we can initialize repositories and understand the core Git workflow.

Git Repositories, Add/Commit/Push Cycle

To start version controlling a project, navigate into the directory and initialize a new Git repository:

cd myproject
git init

This creates a .git subdirectory containing all required metadata objects for the repo.
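If you're curious what git init actually created, a quick look inside a scratch repository helps. This is only an illustrative sketch: the temporary directory is an assumption, and the exact set of files under .git varies a little between Git versions.

```shell
set -eu
repo=$(mktemp -d)                                   # scratch directory, not a real project
git -c init.defaultBranch=master init -q "$repo"
cd "$repo"

ls .git                    # typically includes HEAD, config, objects/ and refs/
git rev-parse --git-dir    # prints the location of the metadata directory
head_ref=$(cat .git/HEAD)  # HEAD is a symbolic ref naming the current branch
echo "$head_ref"
```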

After making changes to files, the typical lifecycle for committing work is:

  1. Check status with git status
  2. Stage files with git add <files>
  3. Verify the staged diff with git diff --staged
  4. Commit changes with git commit -m "message"
  5. Push commits to remote repo

Repeating these steps ensures changes are regularly captured via descriptive commits. I'll expand on some common commands for managing staged and unstaged changes.
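The cycle above can be sketched end to end in a scratch repository. The file name and commit message are placeholders, and the push step is left as a comment because it needs a configured remote.

```shell
set -eu
repo=$(mktemp -d)
git -c init.defaultBranch=master init -q "$repo"
cd "$repo"
git config user.name "John Doe"
git config user.email "johndoe@example.com"

echo "# My Project" > README.md
git status --short              # 1. the new file is listed as untracked
git add README.md               # 2. stage it
git diff --staged --stat        # 3. review exactly what will be committed
git commit -q -m "Add README"   # 4. record the snapshot
# 5. git push origin master     #    needs a remote, so it is not run here

commit_count=$(git rev-list --count HEAD)
pending=$(git status --porcelain)   # empty once everything is committed
git log --oneline
```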

Viewing Repository Status

The git status command provides an overview of the current state: changes yet to be committed, untracked files, branch tracking status, and more:

$ git status
On branch master
Your branch is up to date with 'origin/master'.

Changes to be committed:
  (use "git restore --staged <file>..." to unstage)

    new file:   README

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)

    modified:   benchmarks.rb

Status enables easy understanding of modified and staged contents in the working tree.

Staging Changes

Before a change can be committed, it must be "staged" with git add, which copies the current content of the file into the index:

git add README.md # stages a single file
git add docs/ # stages all docs subdirectory changes
git add . # stages all changes in the current directory and below

Only changes added to the staging area (the index) are captured by the next commit.
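A small experiment makes the point concrete: edit a file again after staging it and see which version gets committed. This is a sketch in a scratch repository; notes.txt and its contents are invented for the demo.

```shell
set -eu
repo=$(mktemp -d)
git -c init.defaultBranch=master init -q "$repo"
cd "$repo"
git config user.name "John Doe"
git config user.email "johndoe@example.com"

echo "first draft" > notes.txt
git add notes.txt                          # the index now holds "first draft"
echo "second draft" >> notes.txt           # the working tree moves ahead of the index

staged=$(git diff --staged --name-only)    # index compared with the last commit
unstaged=$(git diff --name-only)           # working tree compared with the index
git commit -q -m "Add notes"

committed=$(git show HEAD:notes.txt)       # only the staged content was recorded
echo "committed content: $committed"
git status --short                         # notes.txt still shows as modified
```

The same file appears in both diffs because the index and the working tree hold different versions of it.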

Committing Updates

The commit command permanently records the current state of the index as a snapshot, together with a message:

git commit -m "Implement new analytics algorithm"

Each commit extends the repository's history. Repeating the stage, commit, and push cycle builds up a log of the project's development.

Speaking more technically, each commit adds a node to Git's directed acyclic graph (DAG): the commit object records its parent commit(s) and the hash of a tree that fingerprints the file contents. More on internals later.

I'll now give an overview of a powerful Git feature: customizable hooks.
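You can look at these parent and tree links directly. The sketch below makes two commits in a scratch repository and inspects the second one; app.txt and the messages are placeholders.

```shell
set -eu
repo=$(mktemp -d)
git -c init.defaultBranch=master init -q "$repo"
cd "$repo"
git config user.name "John Doe"
git config user.email "johndoe@example.com"

echo "v1" > app.txt && git add app.txt && git commit -q -m "First commit"
first=$(git rev-parse HEAD)
echo "v2" > app.txt && git add app.txt && git commit -q -m "Second commit"

git cat-file -p HEAD                   # raw commit: tree, parent, author, committer, message
parent=$(git rev-parse HEAD^)          # the parent recorded in the second commit
tree=$(git rev-parse 'HEAD^{tree}')    # the root tree (snapshot) the commit points to
tree_type=$(git cat-file -t "$tree")
git log --graph --oneline              # the same history drawn as a graph
```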

Git Hooks for Automation

Hooks allow custom scripts to plug into certain Git actions like pre-commit, post-checkout, post-merge, and more. Implementing checks, tests, linting, or notifications around commits can improve team development.

For example, the following hook verifies code style before accepting changes:

.git/hooks/pre-commit:

#!/bin/sh
# Lint the project; a non-zero exit status aborts the commit.
flake8 --max-line-length=120 || exit 1

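With executable permissions (chmod +x .git/hooks/pre-commit), this now runs on every commit!

Hooks reside in .git/hooks of each repo, and each one is an executable named after the event that triggers it. They are local to the repository and are not copied by git clone. Common use cases: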

  • Run frontend build processes before push
  • Block commits on test failures
  • Enforce commit message formats
  • Update tickets after merge to main

Hooks feed into continuous integration pipelines covered later.
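As a sketch of the "enforce commit message formats" use case, here is a commit-msg hook tried out in a scratch repository. The feat:/fix:/docs: prefix rule is a made-up convention for illustration, not something Git requires, and the sketch assumes core.hooksPath has not been redirected elsewhere.

```shell
set -eu
repo=$(mktemp -d)
git -c init.defaultBranch=master init -q "$repo"
cd "$repo"
git config user.name "John Doe"
git config user.email "johndoe@example.com"

mkdir -p .git/hooks
# Git passes the path of the commit message file as the first argument.
cat > .git/hooks/commit-msg <<'EOF'
#!/bin/sh
if ! grep -qE '^(feat|fix|docs): .+' "$1"; then
    echo "commit message must start with feat:, fix: or docs:" >&2
    exit 1
fi
EOF
chmod +x .git/hooks/commit-msg

echo "hello" > file.txt
git add file.txt

if git commit -q -m "updated stuff" 2>/dev/null; then
    rejected=no
else
    rejected=yes        # the hook exited non-zero, so no commit was created
fi

git commit -q -m "feat: add greeting file"
accepted_subject=$(git log -1 --format=%s)
echo "bad message rejected: $rejected; accepted: $accepted_subject"
```

Anyone can bypass a client-side hook with git commit --no-verify, so treat it as a convenience rather than an enforcement mechanism.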

Now we'll explore how Git branches enable concurrent, isolated workstreams.

Branching and Merging

Branching is a cornerstone of Git workflows. Because history is a directed acyclic graph (DAG) of commits and branch operations are cheap and local, each line of work can live in its own branch that stems from a shared base.

Listing branches shows which one HEAD currently points to:

$ git branch
  development
* master

The star marks the checked-out branch. Branches consume almost no space because a branch is just a named pointer to a commit.

Feature development trivially starts from master:

git switch -c new-feature # or: git checkout -b new-feature
# Edit files
git add .
git commit -m "Start new widget feature"

This adds new commits to an independently tracked branch. Switch between branches with git switch branchname.

When the topic branch completes, changes can merge back to mainline:

git switch master
git merge new-feature

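If master has not moved since the branch was created, this fast-forwards master to the tip of new-feature without creating a merge commit. If master has gained commits of its own, Git creates a merge commit with two parents instead.

Git combines separate changes with a three-way merge: it compares each branch tip against their common ancestor and applies both sets of changes. Conflicts arise when two branches alter the same section of a file. In those cases Git pauses the merge and asks you to resolve the discrepancies manually.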
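To see both merge outcomes side by side, the following sketch runs a fast-forward merge and then a true merge in a scratch repository. The branch and file names are illustrative.

```shell
set -eu
repo=$(mktemp -d)
git -c init.defaultBranch=master init -q "$repo"
cd "$repo"
git config user.name "John Doe"
git config user.email "johndoe@example.com"

echo "base" > base.txt && git add . && git commit -q -m "Base"

# Case 1: master has not moved, so the merge is a fast-forward.
git switch -q -c new-feature
echo "widget" > widget.txt && git add . && git commit -q -m "Add widget"
git switch -q master
git merge -q new-feature
ff_parents=$(git log -1 --format=%P | wc -w)      # number of parents of the tip commit

# Case 2: both branches gained commits, so Git creates a merge commit.
git switch -q -c second-feature
echo "gadget" > gadget.txt && git add . && git commit -q -m "Add gadget"
git switch -q master
echo "notes" > notes.txt && git add . && git commit -q -m "Add notes"
git merge -q --no-edit second-feature
merge_parents=$(git log -1 --format=%P | wc -w)

git log --graph --oneline
```

After the first merge the tip of master is simply the feature commit with its single parent. After the second, the tip is a new commit with two parents.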

In more complex flows, rebasing branches on updated mainline code keeps history linear:

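# While working on feature
git fetch origin
git rebase origin/master

# Resolve any conflicts, then: git add <files> && git rebase --continue

# Finished, rebase again
git fetch origin
git rebase origin/master

# Update local master, then fast-forward it to the feature branch
git switch master
git merge --ff-only origin/master
git merge --ff-only new-feature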

Rebasing reapplies your commits on top of the latest mainline code, producing new commits with new hashes. This keeps project history linear, but because it rewrites commits, avoid rebasing branches that others have already pulled.
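Here is a local sketch of the same idea. To stay self-contained it rebases onto a local master instead of origin/master, and the branch and file names are placeholders.

```shell
set -eu
repo=$(mktemp -d)
git -c init.defaultBranch=master init -q "$repo"
cd "$repo"
git config user.name "John Doe"
git config user.email "johndoe@example.com"

echo "base" > base.txt && git add . && git commit -q -m "Base"
git switch -q -c new-feature
echo "widget" > widget.txt && git add . && git commit -q -m "Add widget"
git switch -q master
echo "fix" > fix.txt && git add . && git commit -q -m "Mainline fix"   # master moves on

git switch -q new-feature
before=$(git rev-parse HEAD)
git rebase -q master                  # replay "Add widget" on top of "Mainline fix"
after=$(git rev-parse HEAD)           # a different hash: the commit was rewritten

git switch -q master
git merge -q --ff-only new-feature    # works because history is now linear
merge_commits=$(git rev-list --merges --count HEAD)
git log --graph --oneline
```

The changed hash is the reason for the usual caution: anyone who already has the old commit will see the rebased one as unrelated work.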

Git Submodules

Within repos, Git submodules allow embedded links to external repositories:

git submodule add https://github.com/libgit2/libgit2 mylib

This clones the linked project into the mylib/ folder, records its URL in .gitmodules, and makes the parent repo track one specific commit of it. Useful for dependencies!

Submodules enable modular components within a codebase.
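To try submodules without touching the network, the sketch below uses a second local repository as the "external" dependency. The lib and app directories are hypothetical, and the protocol.file.allow=always override is only there because recent Git versions refuse local-path submodules by default.

```shell
set -eu
scratch=$(mktemp -d)

# A local repository standing in for an external dependency.
git -c init.defaultBranch=master init -q "$scratch/lib"
cd "$scratch/lib"
git config user.name "John Doe"
git config user.email "johndoe@example.com"
echo "lib code" > lib.txt && git add . && git commit -q -m "Library v1"
lib_head=$(git rev-parse HEAD)

# The parent project that embeds it.
git -c init.defaultBranch=master init -q "$scratch/app"
cd "$scratch/app"
git config user.name "John Doe"
git config user.email "johndoe@example.com"
echo "app code" > app.txt && git add . && git commit -q -m "App skeleton"

git -c protocol.file.allow=always submodule --quiet add "$scratch/lib" mylib
git commit -q -m "Add mylib submodule"

cat .gitmodules                          # path and URL of the submodule
git submodule status                     # the commit the parent has pinned
pinned=$(git rev-parse HEAD:mylib)       # commit ID stored in the parent's tree
entry_mode=$(git ls-files -s mylib | cut -c1-6)
```

The parent stores only a pointer to a commit of the dependency, so updating the dependency is an explicit, reviewable change in the parent repo.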

Now that we understand standard branching techniques, we'll dive into some internals.

Under the Hood: Git Objects and DAG

Fundamentally, Git manages filesystem snapshots and history through a Merkle DAG (directed acyclic graph). This allows storing files & trees through content-addressable identifiers for versioning.

There are four core object types:

  • Commits – Pointers to trees representing project state
  • Trees – Directory listings that map names to blobs and subtrees
  • Blobs – File contents
  • Tags – Annotated tag objects that name another object, usually a commit

A commit object includes metadata like parent commits, author, and message, plus the top-level tree that represents the contents. Trees track filenames, modes, and the SHA-1 hashes of the blobs (file contents) or subtrees (subdirectories) they contain.

Each commit stores a full snapshot, but because objects are content-addressed, an unchanged file or directory is stored once and reused across snapshots. Packing objects (git gc) further reduces disk usage by delta-compressing similar objects into packfiles.

Git's DAG allows sophisticated branch merging, rollback to any historical version, integrity checking, and distribution between repos. Commands manipulate the object database.
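Content addressing is easy to verify by hand. The sketch below assumes the default SHA-1 object format and that sha1sum is available (it is on Ubuntu). It recomputes a blob ID manually, then checks that an unchanged file keeps the same blob across two commits. File names and contents are invented for the demo.

```shell
set -eu
repo=$(mktemp -d)
git -c init.defaultBranch=master init -q "$repo"
cd "$repo"
git config user.name "John Doe"
git config user.email "johndoe@example.com"

# A blob ID is the SHA-1 of "blob <size>", a NUL byte, then the content.
blob_id=$(printf 'test content\n' | git hash-object --stdin)
manual_id=$(printf 'blob 13\0test content\n' | sha1sum | cut -d' ' -f1)
echo "git: $blob_id"
echo "manual: $manual_id"

# Two snapshots: one file changes, the other does not.
echo "stable" > stable.txt
echo "v1" > changing.txt
git add . && git commit -q -m "Snapshot 1"
echo "v2" > changing.txt
git add . && git commit -q -m "Snapshot 2"

stable_v1=$(git rev-parse HEAD~1:stable.txt)
stable_v2=$(git rev-parse HEAD:stable.txt)        # same blob, stored once
changing_v1=$(git rev-parse HEAD~1:changing.txt)
changing_v2=$(git rev-parse HEAD:changing.txt)    # new content, new blob
git cat-file -p 'HEAD^{tree}'                     # names, modes and IDs in the root tree
```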

While diving deeper into engine internals is outside our current scope, understanding these concepts builds mental models for expert usage.

Up next – remote repository collaboration.

Remote Repositories, Cloning and Collaboration

So far we've worked locally, but sharing code requires remote repositories. These live on servers accessible over SSH or HTTPS. GitHub provides free hosting for unlimited public and private repositories.

Alternatives like GitLab or Bitbucket offer similar repository services. Once code exists remotely, teams can coordinate more easily.

Adding Remotes

A bare repository has no working tree; it holds only what would normally live inside .git. Initialize one on a remote host like this:

git init --bare new-project.git

The .git suffix is a naming convention for bare repositories, whose sole purpose is sharing revision history. Clients add such a server under a remote alias:

git remote add origin git@remote-host:new-project.git

Verify with git remote -v. Multiple remotes can be tracked!
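The whole round trip can be rehearsed on one machine. In this sketch a bare repository in a temporary directory stands in for the server, so the remote URL is a local path rather than git@remote-host:new-project.git. The work directory name is an assumption.

```shell
set -eu
scratch=$(mktemp -d)

# "Server" side: a bare repository with no working tree.
git -c init.defaultBranch=master init -q --bare "$scratch/new-project.git"

# "Client" side: a normal repository with one commit.
git -c init.defaultBranch=master init -q "$scratch/work"
cd "$scratch/work"
git config user.name "John Doe"
git config user.email "johndoe@example.com"
echo "hello" > README.md && git add . && git commit -q -m "Initial commit"

git remote add origin "$scratch/new-project.git"
git remote -v
git push -q -u origin master          # -u records origin/master as the upstream

local_head=$(git rev-parse HEAD)
remote_head=$(git --git-dir="$scratch/new-project.git" rev-parse refs/heads/master)
upstream=$(git rev-parse --abbrev-ref 'master@{upstream}')
echo "upstream: $upstream"
```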

Cloning Repositories

Retrieving a project via git clone sets up the origin remote automatically:

git clone https://github.com/libgit2/libgit2 mylib 
cd mylib

Cloning downloads the entire history and configures the local default branch to track its counterpart on origin. With that upstream in place, a plain git pull fetches and integrates remote commits, and git push publishes local ones.

Fetching and Pull Requests

Updating from remotes uses fetch followed by merge or rebase:

git fetch origin 

# In integration branch
git merge origin/master

# Or rebase on top of fetched branches
git rebase origin/master 
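To watch fetch and merge work between two collaborators, the sketch below simulates both on one machine. The alice and bob directories and the shared.git bare repository are hypothetical stand-ins for two developers and a hosting server.

```shell
set -eu
scratch=$(mktemp -d)
git -c init.defaultBranch=master init -q --bare "$scratch/shared.git"

# Alice seeds the shared repository.
git -c init.defaultBranch=master init -q "$scratch/alice"
cd "$scratch/alice"
git config user.name "Alice Example"
git config user.email "alice@example.com"
echo "v1" > app.txt && git add . && git commit -q -m "First version"
git remote add origin "$scratch/shared.git"
git push -q -u origin master

# Bob clones it; origin and upstream tracking are configured for him.
git clone -q "$scratch/shared.git" "$scratch/bob"

# Alice publishes another commit.
echo "v2" > app.txt && git commit -q -am "Second version"
git push -q origin master

# Bob fetches, checks what is new, then integrates it.
cd "$scratch/bob"
git fetch -q origin
behind=$(git rev-list --count HEAD..origin/master)   # commits Bob has not merged yet
git merge -q --ff-only origin/master
bob_content=$(cat app.txt)
echo "Bob was $behind commit(s) behind; app.txt now contains $bob_content"
```

Fetching first and inspecting HEAD..origin/master before merging gives you a chance to review incoming work, which git pull skips.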

Pull requests on GitHub let developers openly review code before merging features.

Overall, Git's distributed model serves huge open source projects and enterprises alike: every clone holds the full history, so most work happens without involving a central server.

Finally, we'll integrate Git with GitHub for enhanced collaboration.

Integrations with GitHub

In recent years GitHub has built powerful integrations: GitHub Actions for continuous integration pipelines, issue boards, code review workflows, team permissions and more.

These tools build on Git's version control foundations by triggering automation on repository events, enforcing code quality checks, streamlining reviews, and giving teams visibility into changes. Much of the open source world lives in public GitHub repositories.

Here is a standard CI pipeline from .github/workflows/main.yml that runs automatically whenever commits are pushed to master:

name: CI

on: 
  push:
    branches: [ "master" ]

jobs:

  build:

    runs-on: ubuntu-latest

    steps:
    - uses: actions/checkout@v3
    - name: Run tests
      run: |
        ./configure
        make 
        make test

Workflows suited to web backends might set up repositories and databases, deploy to cloud platforms like AWS, and notify channels on failures.

Integrations like these extend raw VCS functionality with team-specific automation, which speeds up delivery and makes developers happier!

This leaves engineering teams free to focus their efforts on the product. GitHub paved the way for bringing collaborative methodologies into the mainstream.

Wrapping Up

We've covered a wide range of Git techniques, from fundamental commits and branches through to rebasing, hooks, submodules, and the directed acyclic graph structures that power the tooling. Integrations with services like GitHub add automation and review on top.

There's always more depth to absorb from the Git documentation on specific commands, arguments, and configuration formats. But this guide covers the key theoretical and practical concepts needed to move toward expert-level Git skills.

Git skills translate directly into business impact: version control underpins the resilience and pace of modern development. I encourage experimenting with test repos to build muscle memory for each workflow discussed.

What aspect of our Git journey resonated most? Which integrations seem worth exploring further? Let me know in the comments!
