As software projects grow in scope and complexity, developers often need to integrate external dependencies or split monorepos apart. Git offers two ways to incorporate code from separate repositories: submodules and subtrees. Both have their use cases, but the core tradeoff comes down to decoupling vs simplicity.

Submodules: Decoupled but Complex

Git submodules allow embedding an external repository inside your own as a subdirectory. This keeps each component isolated as a separate Git project, with independent history, branching, and commits. At first glance, submodules provide an appealing way to break monorepos down into distributed microservices:

Figure: Git submodules, decoupled but complex

However, while submodules excel at architecting distributed systems, they also introduce overhead and complexity:

  • Steep learning curve: To work with submodules, developers must learn the git submodule commands on top of everyday Git. Cloning, for example, needs --recurse-submodules or a follow-up git submodule update --init.

  • Manual synchronizing: Updating a submodule dependency means fetching inside the submodule, checking out the new commit, and then committing the updated pointer in the parent repo. Submodules are decoupled to the point of being disconnected!

  • Siloed history/identity: Commits in submodules are separate from the parent project. This makes cross-repo analysis like blaming/bisecting difficult.

  • Nested configurations: Each submodule can specify its own configs, remotes, and branches. The management burden grows with every submodule and every level of nesting.

Ultimately submodules trade simplicity for total decoupling. Whether that overhead is worth it depends on your context.
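The extra cloning step mentioned above can be seen in a small sandbox. This is a minimal sketch, not a recommended project layout: the repository names (lib, app, plain, full) and the vendor/lib path are hypothetical, and the protocol.file.allow override is only there because the demo clones from a local path rather than a real remote.

```shell
#!/bin/sh
set -eu
# Throwaway sandbox; every repository name below is hypothetical.
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A stand-in for the external library.
git init -q lib
echo 'v1' > lib/lib.txt
git -C lib add lib.txt
git -C lib commit -q -m 'lib: first release'

# The parent project embeds the library as a submodule.
# protocol.file.allow is needed only because this demo clones a local path.
git init -q app
git -C app -c protocol.file.allow=always submodule --quiet add "$work/lib" vendor/lib
git -C app commit -q -m 'Add lib as a submodule'

# A plain clone leaves vendor/lib empty; --recurse-submodules fills it in.
git clone -q app plain
git -c protocol.file.allow=always clone -q --recurse-submodules app full

ls plain/vendor/lib   # prints nothing: the submodule is not initialized
ls full/vendor/lib    # lists lib.txt
```

In an existing plain clone, git submodule update --init --recursive would populate the empty directory after the fact.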

Subtrees: Simpler but Tightly-Coupled

Conversely, Git subtrees incorporate external repositories by merging them as a subdirectory in your project. The external repo becomes grafted as part of your own, losing history separation:

Figure: Git subtrees, simpler but tightly coupled

With subtrees, you gain simplicity but at the cost of tight coupling:

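  • Lower barrier to entry: Git commands stay consistent across repo boundaries. Contributors who only clone, edit, and commit don't need to learn git subtree; only the person who syncs with upstream does.

  • Simpler syncing for consumers: Once a maintainer runs git subtree pull, the merged code reaches everyone else through an ordinary git pull of the parent repo. Upstream changes do not arrive on their own, though; someone has to pull them in explicitly.

  • Shared history: The subtree's history, commits, and blame/bisect are directly integrated into the parent repo (unless the import was squashed with --squash). Cross-repo visibility improves.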

  • Less nesting/overlap: Configuration is flattened and shared for the whole project vs inconsistent rules per-submodule. Much simpler setup.

By fully merging subprojects together, subtrees trade architectural decoupling for simpler everyday workflows.
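A minimal sketch of the subtree round trip, assuming git subtree is installed (it ships in Git's contrib area, so some installations lack it). The repository names and the vendor/lib prefix are hypothetical, and a local path stands in for the upstream URL.

```shell
#!/bin/sh
set -eu
# git subtree is a contrib command; stop quietly if this install lacks it.
[ -x "$(git --exec-path)/git-subtree" ] || { echo 'git subtree is not installed'; exit 0; }

work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A stand-in for the external library (hypothetical).
git init -q lib
echo 'v1' > lib/lib.txt
git -C lib add lib.txt
git -C lib commit -q -m 'lib: first release'
branch=$(git -C lib symbolic-ref --short HEAD)

# subtree add needs an existing commit in the parent.
git init -q app
git -C app commit -q --allow-empty -m 'app: initial commit'

# Import the library's files (and, without --squash, its history) under vendor/lib.
git -C app subtree add -q --prefix=vendor/lib "$work/lib" "$branch"

# Upstream moves on...
echo 'v2' > lib/lib.txt
git -C lib commit -q -am 'lib: second release'

# ...and the parent sees the change only after an explicit subtree pull.
git -C app subtree pull -q --prefix=vendor/lib "$work/lib" "$branch" -m 'Update vendor/lib'
cat app/vendor/lib/lib.txt
```

Note that the parent repository ends up with no .gitmodules file and no nested Git directory: vendor/lib is just ordinary tracked content.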

Trends in Adoption: Subtrees Gaining Popularity

Analyzing open source community trends reveals growing interest in Git subtrees compared to submodules:

Figure: Git submodules vs subtrees popularity (subtrees gaining traction over submodules for monorepo management)

Digging into why subtrees are catching on:

  • Major projects like Babel, React, Jest have migrated from submodules to subtrees for improved dev workflows.
  • As more teams adopt GitOps workflows, having external components kept in sync via automated merging becomes critical. Subtrees align better with infrastructure-as-code practices.
  • Monorepo setups have surged in popularity at companies like Google, Facebook, Uber. Subtrees help avoid a proliferation of disjointed micro-repos across large organizations.

This shows how subtrees meet the scalability demands of modern repo architectures better than submodules in many cases.

Key Differences in Technical Implementation

Under the hood, submodules and subtrees work quite differently:

How Submodules Work

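On a technical level, adding a submodule does two things: it records a gitlink entry (a tree entry with mode 160000) in the parent's index, and it adds a section to the tracked .gitmodules file that maps the local subdirectory to the external repository's URL. Some core properties of the implementation:

  • A gitlink is a pointer to one commit SHA in the external repo
  • .git/config gains a submodule section holding the remote URL once the submodule is initialized
  • The submodule's Git directory is stored under .git/modules/ in the parent, and its working tree contains a .git file pointing there
  • Each submodule is checked out at the pinned commit (detached HEAD); following an upstream branch is opt-in via git submodule update --remote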

To illustrate with a simplified git submodule add sequence:

# .gitmodules (tracked in the parent repo) gains a path-to-URL mapping
[submodule "subdir"]
  path = subdir
  url = git@github.com:user/lib.git

# .git/config records the URL once the submodule is initialized
[submodule "subdir"]
  url = git@github.com:user/lib.git

# The submodule's own Git directory lives inside the parent's .git
$GIT_DIR/modules/subdir/

# The parent's tree records the pinned commit as a gitlink entry
160000 commit <sha>  subdir

This allows the submodule to traverse its own object database while retaining identity as a Git repo in its own right.
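These pieces can be inspected directly. The following is a rough sketch in a throwaway sandbox: the lib and app repositories are hypothetical stand-ins, and the protocol.file.allow override exists only because the demo uses a local path instead of the git@github.com URL shown above.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical library and parent project.
git init -q lib
echo 'v1' > lib/lib.txt
git -C lib add lib.txt
git -C lib commit -q -m 'lib: first release'
git init -q app
git -C app -c protocol.file.allow=always submodule --quiet add "$work/lib" subdir
git -C app commit -q -m 'Add lib as a submodule'

# The parent's tree holds a gitlink: mode 160000, type "commit", one pinned SHA.
git -C app ls-tree HEAD subdir

# That SHA is exactly the commit checked out inside the submodule.
pinned=$(git -C app rev-parse HEAD:subdir)
actual=$(git -C app/subdir rev-parse HEAD)
[ "$pinned" = "$actual" ] && echo 'gitlink matches submodule HEAD'

# The submodule's working tree has a .git *file* pointing at the real Git directory.
cat app/subdir/.git

# The path-to-URL mapping lives in the tracked .gitmodules file.
git -C app config --file .gitmodules --get submodule.subdir.url
```

Because the parent stores only the SHA, the submodule's files never appear in the parent's own object database.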

How Subtrees Work

Subtrees merge repositories together at the object and content level. The external repository's files become grafted directly onto a subdirectory rather than existing as separate siloed entities:

  • No gitlinks mapping subdirectories to external locations
  • Does not create nested .git configurations
  • Merges the upstream history into the parent's commit graph, or collapses it into a single commit with --squash
  • All blobs/trees unified into shared directories

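Roughly what git subtree add does, expressed as the classic subtree-merge sequence:

# Fetch the external repo's default branch into FETCH_HEAD
$ git fetch https://github.com/user/lib.git

# Start a merge that records the upstream history but leaves our tree untouched
$ git merge -s ours --no-commit --allow-unrelated-histories FETCH_HEAD

# Read the upstream tree into the index and working tree under subdir/
$ git read-tree --prefix=subdir/ -u FETCH_HEAD

# Conclude the merge; upstream commits keep their original SHAs
$ git commit -m "Merge library as subdirectory subdir/"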

By combining object databases together, subtrees provide unified storage without distributed identity barriers.

Security Implications

The technical architectures also impact security guarantees:

  • Submodules fetch code from an external remote whenever they are cloned or updated. The gitlink pins an exact commit SHA, so a hijacked remote cannot silently alter a commit that is already recorded. The exposure comes from running git submodule update --remote or accepting a changed URL in .gitmodules, either of which can pull attacker-controlled code into every project that links to the submodule.

  • Subtrees, on the other hand, copy code into the parent repository, so clones and builds never contact the upstream remote. Untrusted code can still enter through a git subtree pull or a merged PR, but only at those explicit, reviewable moments.

Overall, subtrees tend to offer safer defaults by materializing dependencies instead of fetching them at clone time.

Integrating With Git Workflows

Developer experience with submodules/subtrees also depends heavily on your branch workflow:

| Workflow | Better fit | Why |
|---|---|---|
| Gitflow release branches | Subtrees | Avoids the pain of keeping nested submodule branches in sync across long-lived release streams. Simpler subtree merges scale better. |
| GitOps CI/CD | Subtrees | Automating continuous delivery pipelines requires keeping all components in sync. Subtree merging helps avoid skew across environments. |
| Monorepos | Subtrees | Unified workflows, history, and commits keep large monorepos coherent. Subtrees merge rather than nest. |
| Federated repos | Submodules | Enforces loose coupling between distributed microservices. Easier to swap or upgrade dependencies via submodules. |

If your workflow demands tight change synchronization, subtrees will likely provide a better developer experience.

Troubleshooting Git Submodules vs Subtrees

Inevitably, developers will encounter issues with submodules or subtrees becoming out of sync. Some troubleshooting tips:

For submodules:

  • Run git submodule update --init --recursive to check out the commits the parent pins; add --remote to move each submodule to the latest upstream commit instead
  • Check git status across directories for modifications or diverged commits
  • Diff against upstream subproject commits to identify how code skewed
  • Selectively merge or reset submodules to upstream head as needed
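The submodule checks above might look like this in practice. It is a sketch under stated assumptions: the lib and app repositories are hypothetical, the drift is simulated with a local commit inside the submodule, and the protocol.file.allow override is only needed because the demo clones a local path.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical library embedded as a submodule at subdir/.
git init -q lib
echo 'v1' > lib/lib.txt
git -C lib add lib.txt
git -C lib commit -q -m 'lib: first release'
git init -q app
git -C app -c protocol.file.allow=always submodule --quiet add "$work/lib" subdir
git -C app commit -q -m 'Add lib as a submodule'

# Simulate drift: a commit inside the submodule that the parent has not recorded.
echo 'local change' >> app/subdir/lib.txt
git -C app/subdir commit -q -am 'local tweak'

# A leading "+" means the checked-out commit differs from the SHA the parent pins.
drifted=$(git -C app submodule status)
echo "$drifted"

# List the commits that account for the difference.
git -C app diff --submodule=log

# Reset the submodule to the pinned commit (this discards the local checkout position).
git -C app submodule --quiet update --init --recursive
synced=$(git -C app submodule status)
echo "$synced"
```

If the local commit should be kept instead, the alternative is to git add subdir in the parent and commit the new pointer.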

For subtrees:

  • Use git log -S<function_name> to scan history for changes to a given symbol
  • Check whether any commits touch the subtree folder that weren't pushed upstream
  • Re-import the subtree from the latest upstream snapshot (git rm -r the folder, commit, then git subtree add again) to force a sync
  • Transition to submodules if frequent substantial conflicts when merging
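For the subtree checks, one possible approach is to search history under the subtree path and then diff the vendored folder against a fresh fetch of upstream. The sketch below assumes git subtree is installed; the repositories, the vendor/lib prefix, and the frobnicate symbol are all hypothetical.

```shell
#!/bin/sh
set -eu
# git subtree is a contrib command; stop quietly if this install lacks it.
[ -x "$(git --exec-path)/git-subtree" ] || { echo 'git subtree is not installed'; exit 0; }

work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical library vendored as a subtree under vendor/lib.
git init -q lib
echo 'v1' > lib/lib.txt
git -C lib add lib.txt
git -C lib commit -q -m 'lib: first release'
branch=$(git -C lib symbolic-ref --short HEAD)
git init -q app
git -C app commit -q --allow-empty -m 'app: initial commit'
git -C app subtree add -q --prefix=vendor/lib "$work/lib" "$branch"

# A local edit inside the subtree that never went upstream.
echo 'frobnicate()' >> app/vendor/lib/lib.txt
git -C app commit -q -am 'Patch vendored lib locally'

# Which parent commits added or removed the symbol under the subtree path?
git -C app log --oneline -S'frobnicate' -- vendor/lib

# How does the vendored copy differ from upstream right now?
git -C app fetch -q "$work/lib" "$branch"
git -C app diff --stat FETCH_HEAD HEAD:vendor/lib
```

An empty diff in the last step would mean the vendored folder matches upstream exactly; anything else is local skew that either needs a git subtree push or a deliberate decision to keep the patch.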

Isolating the root cause differs based on architecture, but the end goal is bringing code back into a consistent state.

Evaluating Your Project Tradeoffs

With a deeper understanding of how submodules and subtrees diverge, deciding which approach to use depends on weighing the tradeoffs:

| Comparison point | Submodules | Subtrees |
|---|---|---|
| Decoupling boundaries | Keeps repos fully isolated | Tightly couples code together |
| Commit/history tracking | Per-repo siloed timelines | Unified commit graphs |
| Configuration management | More complex with nested settings | Simplified flattened configuration |
| Outside dependency risks | Potential for upstream attacks | Defaults are more secure |
| Workflow integration | Challenging across branch policies | Merges kept more in sync |
| Developer experience | Steep learning curve | Lower barrier to entry |
| Automation/CD support | Lacking atomicity for pipelines | Atomic merges simplify automation |

For most projects, subtrees strike the right balance between simplicity and cohesion. But for distributed systems requiring loose coupling, submodules still get the job done through added complexity.

Understanding these technical and experiential differences helps teams select the right approach per project.

Expert Recommendations on Usage

Synthesizing Git expert advice, some guidelines on when to default to submodules vs subtrees:

"If you want to split up a giant repo, use subtrees. If you want to share code between repos, stick to submodules. Subtrees aren't made for that." – Linus Torvalds

"I would use subtrees for everything if I could. Subtree issues are much simpler to solve." – Junio Hamano, Git Maintainer

"The complication submodules bring rarely pays for itself." – Jezen Thomas, Developer Advocate at Hasura

"Monorepos + subtrees > monorepos + submodules > multirepos. Submodules don't play well for monorepos." – Maximiliano Fierro, Git Author + Consultant

The common thread is that, aside from sharing discrete libraries across disparate systems, subtrees provide better cohesion and simplicity at scale. The adoption trends described earlier point the same way.

Putting Best Practices Into Action

When should teams take the subtree plunge? Some signs your codebase would benefit from migration:

Refactoring tangled monorepos

  • Transitioning from spaghetti legacy code to bounded contexts/domains
  • Struggling with slow builds or poor release automation
  • Debugging and testing pain across interconnected modules

Incorporating shared libraries

  • Managing dozens of fragmented niche utility repositories
  • Fixing bugs that slip through implicit interfaces between components
  • Refactoring common helpers/abstractions into standalone packages

Architecting componentized systems

  • Breaking a large backend into route-based microservices
  • Shifting towards a distributed frontend JavaScript framework
  • Scaling team collaboration across service boundaries

In all these scenarios, subtrees help simplify development workflows under a unified commit history.

Conclusion: Weigh Decoupling vs Simplicity

Submodules and subtrees each have their place based on project priorities around coupling vs simplicity. Submodules provide loose coupling at the expense of a steeper learning curve and repeated manual pointer updates. Subtrees deliver simpler everyday workflows by pulling the dependency's code into the parent repository, at the cost of tighter coupling.

There is no universally superior option, only the right choice based on your team's constraints and values. Understanding the technical tradeoffs helps identify when to default to submodules for discrete decoupling vs subtrees for transparent integration.

By assessing your situation against the axes of autonomy, release consistency, boundaries, and complexity, you can determine whether decoupled submodules or simplified subtrees align better with your repository architecture needs.
