Decoupling vs Simplicity: An In-Depth Comparison of Git Submodules vs Subtrees

As software projects grow in scope and complexity, developers often need to integrate external dependencies or split monorepos apart. Git offers two ways to incorporate code from a separate repository. Submodules are built into core Git. Subtrees come from the git subtree command, which lives in Git's contrib directory and is included in many Git packages. Both have their use cases, but the core tradeoff comes down to decoupling vs simplicity.

Submodules: Decoupled but Complex

Git submodules embed an external repository inside your own as a subdirectory. Each component stays a separate Git project with its own history, branches, and commits, while the parent repository (the superproject) records only which commit of the submodule it uses. At first glance, submodules offer an appealing way to split a large repository into independently versioned components:

[Figure: Git submodules: decoupled but complex]

However, that isolation introduces overhead and complexity:

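Steep learning curve: Developers must learn git submodule commands on top of standard Git. Cloning requires --recurse-submodules or a follow-up git submodule update --init --recursive, and switching branches can leave submodules at stale commits unless submodule.recurse is enabled.
Manual synchronizing: The parent repository pins each submodule to an exact commit. Picking up upstream changes means running git submodule update --remote (or fetching and checking out inside the submodule) and then committing the new pointer in the parent. Submodules are decoupled to the point of being disconnected!
Siloed history/identity: Commits in submodules are separate from the parent project. git blame and git bisect in the parent see only pointer changes, not the submodule's code, which makes cross-repo analysis difficult.
Nested configurations: Each submodule is a full repository with its own config, remotes, and branches, and submodules can contain further submodules. Every level adds more state to manage.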
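To make the cloning overhead concrete, here is a minimal sketch. It assumes a POSIX shell, Git 2.28 or newer, and two hypothetical throwaway repositories named lib and app created in a temporary directory. Local paths stand in for real remote URLs, which is why the sandbox config sets protocol.file.allow (recent Git releases block local-path submodules by default).

```shell
#!/bin/sh
# Sketch: a plain clone records a submodule but leaves its directory empty.
# Assumptions: hypothetical repos "lib" and "app" in a temp dir; local paths replace real URLs.
set -eu

work=$(mktemp -d)
# Isolated config so the demo never touches your real ~/.gitconfig
export HOME="$work" XDG_CONFIG_HOME="$work/.config" GIT_CONFIG_NOSYSTEM=1
git config --global user.name "Demo User"
git config --global user.email "demo@example.com"
git config --global init.defaultBranch main
# Needed only because the submodule URL is a local path (recent Git blocks that by default)
git config --global protocol.file.allow always
cd "$work"

# Upstream library repository
git init -q lib
echo "v1" > lib/lib.txt
git -C lib add lib.txt
git -C lib commit -q -m "lib v1"

# Parent repository embedding lib as a submodule
git init -q app
git -C app submodule add "$work/lib" lib
git -C app commit -q -m "Add lib submodule"

# 1) A plain clone creates lib/ but leaves it empty
git clone -q app app-plain
ls -A app-plain/lib                 # prints nothing

# 2) The extra step: clone the submodule and check out the pinned commit
git -C app-plain submodule update --init --recursive
cat app-plain/lib/lib.txt           # v1

# 3) Or do both at clone time
git clone -q --recurse-submodules app app-recursive
cat app-recursive/lib/lib.txt       # v1
```

For everyday work, git config submodule.recurse true makes commands such as checkout, pull, and switch recurse into submodules automatically, although it does not apply to clone.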

Ultimately, submodules trade simplicity for strong decoupling. Whether that overhead is worth it depends on your context.

Subtrees: Simpler but Tightly-Coupled

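Git subtrees take the opposite approach: the external repository is merged into a subdirectory of your project, and its files become ordinary files in your repository. Its history can come along or be squashed into a single commit with --squash, but either way there is no separate repository to manage: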

[Figure: Git subtrees: simpler but tightly-coupled]

With subtrees, you gain simplicity but at the cost of tight coupling:

Lower barrier to entry: Clones, checkouts, and commits work exactly as in any other repository. Only the person who syncs with upstream needs to learn git subtree; everyone else can ignore it.
Simpler syncing: Once a maintainer runs git subtree pull to merge upstream changes, everyone else receives them with an ordinary git pull of the parent repository. Pulling the parent never fetches new upstream commits by itself.
Shared history: The subtree's history, commits, and blame/bisect are directly integrated into the parent repo, unless you add it with --squash, which collapses upstream history into a single commit. Cross-repo visibility improves.
Less nesting: There is no .gitmodules file, no nested .git directory, and no per-dependency configuration. Much simpler setup.
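The syncing model can be sketched the same way. This minimal example assumes a POSIX shell, Git 2.28 or newer with the git subtree command installed, and hypothetical throwaway repositories named lib and app; a local path stands in for the upstream URL.

```shell
#!/bin/sh
# Sketch: subtree content arrives with a plain clone, but upstream changes must be merged explicitly.
# Assumptions: hypothetical repos "lib" and "app" in a temp dir; a local path replaces the upstream URL.
set -eu

work=$(mktemp -d)
# Isolated config; GIT_MERGE_AUTOEDIT=no keeps the subtree pull merge from opening an editor
export HOME="$work" XDG_CONFIG_HOME="$work/.config" GIT_CONFIG_NOSYSTEM=1 GIT_MERGE_AUTOEDIT=no
git config --global user.name "Demo User"
git config --global user.email "demo@example.com"
git config --global init.defaultBranch main
cd "$work"

# Upstream library repository
git init -q lib
echo "v1" > lib/lib.txt
git -C lib add lib.txt
git -C lib commit -q -m "lib v1"

# Parent repository with lib grafted in under lib/
git init -q app
echo "app" > app/README
git -C app add README
git -C app commit -q -m "Initial app commit"
git -C app subtree add --prefix=lib "$work/lib" main

# Collaborators need nothing extra: a plain clone already contains lib/
git clone -q app app-clone
cat app-clone/lib/lib.txt        # v1

# Upstream publishes a new version...
echo "v2" > lib/lib.txt
git -C lib commit -q -am "lib v2"

# ...which reaches app only when a maintainer merges it explicitly
git -C app subtree pull --prefix=lib -m "Update lib to v2" "$work/lib" main
cat app/lib/lib.txt              # v2

# From here on, an ordinary pull delivers the update to everyone else
git -C app-clone pull -q --ff-only
cat app-clone/lib/lib.txt        # v2
```

Passing --squash to both subtree add and subtree pull turns each sync into a single squashed commit instead of merging the full upstream history.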

By fully merging subprojects together, subtrees trade architectural decoupling for simpler workflows.

Trends in Adoption: Subtrees Gaining Popularity

Community discussion points to growing interest in Git subtrees compared to submodules:

[Figure: Git submodules vs subtrees popularity: subtrees gaining traction over submodules for monorepo management]

Digging into why subtrees are catching on:

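Well-known projects such as Babel, React, and Jest are organized as monorepos, and consolidating code into a single repository removes the cross-repository pointer management that submodules require.
CI/CD and GitOps pipelines need every component present at build time. A plain clone of a repository with subtrees already contains everything, whereas pipelines for submodule-based repositories must initialize submodules (for example with git clone --recurse-submodules) and hold credentials for every submodule remote.
Monorepo setups are well established at companies like Google, Facebook (now Meta), and Uber. Subtrees offer a Git-native way to consolidate scattered micro-repos into one repository without discarding their history.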

These factors suggest why, in many cases, subtrees fit the demands of modern repository architectures better than submodules.

Key Differences in Technical Implementation

Under the hood, submodules and subtrees work quite differently:

How Submodules Work

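On a technical level, git submodule add records two things: a gitlink entry for the subdirectory in the superproject's index (and, once committed, in its tree), and a mapping in the .gitmodules file from that subdirectory to the external repository URL. Some core properties of the implementation:

A gitlink is a tree entry with the special mode 160000 that points to a commit SHA in the external repository instead of to files
.git/config gains a submodule section recording the URL once the submodule is initialized
The submodule's repository data lives in $GIT_DIR/modules/<name>/, and the submodule directory contains only a .git file pointing there
Each submodule is pinned to a specific commit, and git submodule update checks it out on a detached HEAD; it follows an upstream branch only if one is configured (git submodule add -b or submodule.<name>.branch) and you run git submodule update --remote

To illustrate, here is a simplified view of what git submodule add git@github.com:user/lib.git subdir records: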

# .gitmodules (committed): maps the path to a URL
[submodule "subdir"]
  path = subdir
  url = git@github.com:user/lib.git

# .git/config (local, not committed): URL used once the submodule is initialized
[submodule "subdir"]
  url = git@github.com:user/lib.git

# The submodule's own repository data
$GIT_DIR/modules/subdir/

# The superproject's tree: a gitlink (mode 160000) holding the pinned commit SHA
160000 commit <pinned-commit-sha>	subdir

The submodule thus keeps its own object database and history, retaining its identity as a Git repository in its own right, while the superproject stores only a pointer to one of its commits.
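These records can be inspected directly. A minimal sketch, assuming a POSIX shell, Git 2.28 or newer, and hypothetical throwaway repositories named lib and app (a local path stands in for the real URL, hence the protocol.file.allow setting):

```shell
#!/bin/sh
# Sketch: inspect what `git submodule add` records.
# Assumptions: hypothetical repos "lib" and "app" in a temp dir; a local path replaces the real URL.
set -eu

work=$(mktemp -d)
# Isolated config so the demo never touches your real ~/.gitconfig
export HOME="$work" XDG_CONFIG_HOME="$work/.config" GIT_CONFIG_NOSYSTEM=1
git config --global user.name "Demo User"
git config --global user.email "demo@example.com"
git config --global init.defaultBranch main
# Needed only because the submodule URL is a local path
git config --global protocol.file.allow always
cd "$work"

git init -q lib
echo "v1" > lib/lib.txt
git -C lib add lib.txt
git -C lib commit -q -m "lib v1"

git init -q app
git -C app submodule add "$work/lib" lib
git -C app commit -q -m "Add lib submodule"

# The commit's tree holds a gitlink: mode 160000, type "commit", and the pinned SHA
git -C app ls-tree HEAD lib
git -C lib rev-parse HEAD            # same SHA as the gitlink above

# .gitmodules (committed) maps the path to a URL
git -C app config -f .gitmodules --get submodule.lib.path    # lib
git -C app config -f .gitmodules --get submodule.lib.url

# .git/config (local) holds the URL actually used for fetching
git -C app config --get submodule.lib.url

# The submodule's repository lives under the parent's .git/modules/,
# and the submodule directory only has a .git *file* pointing there
ls -d app/.git/modules/lib
cat app/lib/.git                     # a "gitdir:" line
```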

How Subtrees Work

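Subtrees merge repositories together at the object and content level. The external repository's files are grafted directly into a subdirectory rather than existing as a separate, siloed repository:

No gitlinks mapping subdirectories to external locations, and no .gitmodules file
No nested .git directories or per-dependency configuration
Upstream history is merged in with its original commit SHAs, or collapsed into one commit with --squash; new SHAs are synthesized only when you later run git subtree split to extract the subdirectory
All blobs and trees are stored in the parent repository's own object database

Without --squash, git subtree add produces roughly the same result as the classic manual "subtree merge", which looks like this: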

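# Fetch the external repository's default branch into FETCH_HEAD
$ git fetch https://github.com/user/lib.git

# Start a merge that records lib's history as a parent without changing our files
$ git merge -s ours --no-commit --allow-unrelated-histories FETCH_HEAD

# Read lib's tree into subdir/, updating both the index and the working tree
$ git read-tree --prefix=subdir/ -u FETCH_HEAD

# Conclude the merge; lib's commits keep their original SHAs
$ git commit -m "Merged in library as our subdirectory"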

By storing the external code directly in the parent's object database, subtrees provide unified storage with no second repository identity to manage.
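For comparison with the gitlink a submodule leaves behind, the sketch below runs git subtree add and inspects the result. It assumes a POSIX shell, Git 2.28 or newer with git subtree installed, and hypothetical throwaway repositories named lib and app; a local path stands in for the upstream URL.

```shell
#!/bin/sh
# Sketch: what `git subtree add` leaves behind.
# Assumptions: hypothetical repos "lib" and "app" in a temp dir; a local path replaces the upstream URL.
set -eu

work=$(mktemp -d)
# Isolated config so the demo never touches your real ~/.gitconfig
export HOME="$work" XDG_CONFIG_HOME="$work/.config" GIT_CONFIG_NOSYSTEM=1
git config --global user.name "Demo User"
git config --global user.email "demo@example.com"
git config --global init.defaultBranch main
cd "$work"

git init -q lib
echo "v1" > lib/lib.txt
git -C lib add lib.txt
git -C lib commit -q -m "lib v1"

git init -q app
echo "app" > app/README
git -C app add README
git -C app commit -q -m "Initial app commit"

# Graft lib's main branch into lib/ (no --squash, so upstream history comes along)
git -C app subtree add --prefix=lib "$work/lib" main

# lib/ holds ordinary blobs (mode 100644), not a 160000 gitlink
git -C app ls-tree -r HEAD lib

# No .gitmodules and no nested repository
[ ! -e app/.gitmodules ] && [ ! -e app/lib/.git ] && echo "no submodule metadata"

# lib's original commit is part of app's history with its SHA unchanged
git -C app merge-base --is-ancestor "$(git -C lib rev-parse HEAD)" HEAD && echo "upstream commit reachable"
git -C app log --oneline --graph
```

With git subtree add --squash, upstream history is collapsed into one newly created commit, so lib's original commit would no longer be reachable and the ancestry check above would fail.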

Security Implications

The technical architectures also impact security guarantees:

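Submodules pin an exact commit SHA, so a compromised upstream cannot silently change code you have already pinned. The risk shows up when someone bumps the pointer (for example with git submodule update --remote) without reviewing the incoming commits, or when a .gitmodules URL is changed to point elsewhere. Recursively cloning untrusted repositories with submodules has also been the subject of several Git security fixes, so keep Git up to date.
Subtrees copy the upstream code into your repository, so clones and CI builds never contact the upstream remote. Upstream changes arrive only when someone runs git subtree pull, and they land as an ordinary merge that can be reviewed like any other change, though merging unreviewed upstream code is still a risk.

Overall, subtrees tend to offer safer defaults because builds depend on fewer remotes, but with either approach the real safeguard is reviewing upstream changes before adopting them.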

Integrating With Git Workflows

Developer experience with submodules/subtrees also depends heavily on your branch workflow:

Workflow	Better Fit	Why
Gitflow release branches	Subtrees	Avoids the pain of keeping submodule pointers in sync across long-lived release branches. Subtree merges behave like any other merge.
GitOps CI/CD	Subtrees	A plain clone contains every component, so pipelines need no submodule initialization or extra credentials, and environments cannot drift because a submodule was left uninitialized.
Monorepos	Subtrees	Unified workflows, history, and commits keep large monorepos coherent. Subtrees merge rather than nest.
Federated repos	Submodules	Enforces loose coupling between independently released components. Upgrading a dependency is a single, reviewable pointer bump.

If your workflow demands tight change synchronization, subtrees will likely provide a better developer experience.

Troubleshooting Git Submodules vs Subtrees

Inevitably, developers will encounter issues with submodules or subtrees becoming out of sync. Some troubleshooting tips:

For submodules:

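Run git submodule update --init --recursive to check out the commits the parent repository has pinned, for all submodules including nested ones; add --remote only when you want to move submodules to the latest upstream commit
Run git submodule status to spot problems: a leading - marks an uninitialized submodule, and a leading + marks one whose checked-out commit differs from the pinned commit
Use git diff --submodule=log to list the submodule commits that a pointer change adds or removes
Inside the submodule, merge or reset to the desired upstream commit as needed, then commit the new pointer in the parent repository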
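The distinction between git submodule update and git submodule update --remote is a common source of confusion. A minimal sketch, assuming a POSIX shell, Git 2.28 or newer, hypothetical throwaway repositories named lib and app, and a submodule configured to track a branch named main (local paths stand in for real URLs):

```shell
#!/bin/sh
# Sketch: `submodule update` restores the pinned commit; `update --remote` moves to upstream's tip.
# Assumptions: hypothetical repos "lib" and "app"; local paths replace real URLs; lib tracks "main".
set -eu

work=$(mktemp -d)
# Isolated config so the demo never touches your real ~/.gitconfig
export HOME="$work" XDG_CONFIG_HOME="$work/.config" GIT_CONFIG_NOSYSTEM=1
git config --global user.name "Demo User"
git config --global user.email "demo@example.com"
git config --global init.defaultBranch main
# Needed only because the submodule URL is a local path
git config --global protocol.file.allow always
cd "$work"

git init -q lib
echo "v1" > lib/lib.txt
git -C lib add lib.txt
git -C lib commit -q -m "lib v1"

git init -q app
# -b main records "branch = main" in .gitmodules for use by --remote
git -C app submodule add -b main "$work/lib" lib
git -C app commit -q -m "Add lib submodule"

# Upstream publishes v2
echo "v2" > lib/lib.txt
git -C lib commit -q -am "lib v2"

# Plain update checks out the commit app has pinned, so lib stays at v1
git -C app submodule update --init --recursive
cat app/lib/lib.txt                  # v1

# --remote fetches and checks out the tip of the tracked branch
git -C app submodule update --remote lib
cat app/lib/lib.txt                  # v2

# The parent now sees a moved pointer: review the incoming commits, then commit it
git -C app diff --submodule=log lib
git -C app add lib
git -C app commit -q -m "Bump lib to v2"
```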

For subtrees:

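Use git log -- <prefix>/ to list commits that touched the subtree folder, and git log -S<function_name> -- <prefix>/ to find commits that added or removed occurrences of a given symbol
Check for commits touching the subtree folder that were never pushed upstream; git subtree split --prefix=<prefix> extracts them as standalone history
Remove the folder and re-add the subtree from the latest upstream snapshot (git rm -r, commit, then git subtree add) to force a clean sync
Consider transitioning to submodules if merges frequently produce substantial conflicts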
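On the subtree side, local changes made inside the subtree folder can be found and extracted for upstreaming. A minimal sketch, assuming a POSIX shell, Git 2.28 or newer with git subtree installed, and hypothetical throwaway repositories named lib and app; the push target at the end is hypothetical and left commented out.

```shell
#!/bin/sh
# Sketch: find local commits inside a subtree folder and extract them for upstreaming.
# Assumptions: hypothetical repos "lib" and "app" in a temp dir; a local path replaces the upstream URL.
set -eu

work=$(mktemp -d)
# Isolated config so the demo never touches your real ~/.gitconfig
export HOME="$work" XDG_CONFIG_HOME="$work/.config" GIT_CONFIG_NOSYSTEM=1
git config --global user.name "Demo User"
git config --global user.email "demo@example.com"
git config --global init.defaultBranch main
cd "$work"

git init -q lib
echo "v1" > lib/lib.txt
git -C lib add lib.txt
git -C lib commit -q -m "lib v1"

git init -q app
echo "app" > app/README
git -C app add README
git -C app commit -q -m "Initial app commit"
git -C app subtree add --prefix=lib "$work/lib" main

# A fix made directly inside the subtree folder
echo "v1 + local fix" > app/lib/lib.txt
git -C app commit -q -am "Fix bug in vendored lib"

# Which commits touched lib/? Those after the last subtree add/pull are upstreaming candidates
git -C app log --oneline -- lib/

# Synthesize lib-only history (paths relative to lib/) on top of upstream's commits
split=$(git -C app subtree split --prefix=lib)
git -C app log --oneline "$split"

# Pushing it to a review branch could look like this (remote and branch names are hypothetical):
#   git -C app push <lib-remote> "$split":refs/heads/local-fix
# `git subtree push --prefix=lib <repository> <branch>` combines split and push in one step.
```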

Isolating the root cause differs based on architecture, but the end goal is bringing code back into a consistent state.

Evaluating Your Project Tradeoffs

With a deeper understanding of how submodules and subtrees diverge, deciding which approach to use depends on weighing the tradeoffs:

Comparison Point	Submodules	Subtrees
Decoupling boundaries	Keeps repos fully isolated	Tightly couples code together
Commit/history tracking	Per-repo siloed timelines	Unified commit graph (unless squashed)
Configuration management	.gitmodules plus nested per-repo settings	No extra configuration files
Outside dependency risks	Pinned commits, but every bump needs review and clones contact each submodule remote	Clones and builds never contact upstream; syncs are reviewable merges
Workflow integration	Pointers must be kept in sync across branches	Changes merge like any other code
Developer experience	Steep learning curve	Lower barrier to entry
Automation/CD support	A cross-repo change needs commits in each repository	One commit can change application and library together

For most projects, subtrees strike the right balance between simplicity and cohesion. But for distributed systems requiring loose coupling, submodules still get the job done, at the cost of added complexity.

Understanding these technical and experiential differences helps teams select the right approach per project.

Expert Recommendations on Usage

Synthesizing Git expert advice, some guidelines on when to default to submodules vs subtrees:

"If you want to split up a giant repo, use subtrees. If you want to share code between repos, stick to submodules. Subtrees aren't made for that." – Linus Torvalds

"I would use subtrees for everything if I could. Subtree issues are much simpler to solve." – Junio Hamano, Git Maintainer

"The complication submodules bring rarely pays for itself." – Jezen Thomas, Developer Advocate at Hasura

"Monorepos + subtrees > monorepos + submodules > multirepos. Submodules don't play well for monorepos." – Maximiliano Fierro, Git Author + Consultant

The common thread: aside from sharing discrete libraries across independent systems, these recommendations favor subtrees for cohesion and simplicity at scale, and many teams now treat subtrees as the default, reaching for submodules only when they need that separation.

Putting Best Practices Into Action

When should teams take the subtree plunge? Some signs your codebase would benefit from migration:

Refactoring tangled monorepos

Transitioning from spaghetti legacy code to bounded contexts/domains
Struggling with slow builds or poor release automation
Debugging and testing pain across interconnected modules

Incorporating shared libraries

Managing dozens of fragmented niche utility repositories
Fixing bugs that slip through implicit interfaces between components
Refactoring common helpers/abstractions into standalone packages

Architecting componentized systems

Breaking a large backend into route-based microservices
Shifting towards a distributed frontend JavaScript framework
Scaling team collaboration across service boundaries

In all these scenarios, subtrees help simplify development workflows under a unified commit history.

Conclusion: Weigh Decoupling vs Simplicity

Submodules and subtrees each have their place based on project priorities around coupling vs simplicity. Submodules provide loose coupling at the expense of manual pointer updates and a steeper learning curve. Subtrees deliver simpler day-to-day workflows by coupling repositories more closely, with upstream syncs handled explicitly through git subtree pull.

There is no universally superior option, only the right choice based on your team's constraints and values. Understanding the technical tradeoffs helps identify when to default to submodules for discrete decoupling vs subtrees for transparent integration.

By assessing your situation against the axes of autonomy, release consistency, boundaries, and complexity, you can determine whether decoupled submodules or simplified subtrees align better with your repository architecture needs.

Linuxhaxor.net – About Open Source & Linux

Submodules: Decoupled but Complex

Subtrees: Simpler but Tightly-Coupled

Trends in Adoption: Subtrees Gaining Popularity

Key Differences in Technical Implementation

How Submodules Work

How Subtrees Work

Security Implications

Integrating With Git Workflows

Troubleshooting Git Submodules vs Subtrees

Evaluating Your Project Tradeoffs

Expert Recommendations on Usage

Putting Best Practices Into Action

Conclusion: Weigh Decoupling vs Simplicity

Related posts:

Similar Posts

Linuxhaxor.net – About Open Source & Linux