Git is unsuited for applications

June 7, 2022

Note: 2025 Update

 

I wrote this in June 2022, three months after "Does a git-based architecture make sense?". This was originally an investor email. Publishing for historical reasons.

 

Like the first post, this frames it as "files vs databases", but that's not the right angle. A database can be a file (SQLite). The real insight is interoperability: files can be accessed without friction or gatekeepers.

1. The two states problem

Current localization solutions store translations in databases. That is problematic because they have to maintain and synchronize two sources of truth with fundamentally different data structures (files vs. SQL tables). On top of that, the developer's source of truth (the source code) makes heavy use of git workflows like branching. As a result, the database schema has to support at least version history and branching just to keep the source code and the database in sync. Those two features are not trivial to implement and require tremendous engineering effort.
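To make the mismatch concrete, here is a minimal sketch of the two states. Every name in it (`TranslationFile`, `TranslationRow`, `fileToRows`, `outOfSync`) is hypothetical and made up for illustration; it does not describe any platform's actual schema.

```typescript
// Hypothetical shapes for illustration only.

// State 1: a translation file as it lives in the repository, e.g. en.json.
type TranslationFile = Record<string, string>;

// State 2: the same data as table rows. To mirror git, every row also has
// to record which branch and which commit it belongs to.
interface TranslationRow {
  branch: string;
  commit: string;
  locale: string;
  key: string;
  value: string;
}

// Flatten one file at one commit on one branch into rows.
function fileToRows(
  file: TranslationFile,
  locale: string,
  branch: string,
  commit: string
): TranslationRow[] {
  return Object.keys(file).map((key) => ({
    branch,
    commit,
    locale,
    key,
    value: file[key],
  }));
}

// Keys whose value in the file differs from what the table holds
// for the given branch and locale (missing rows count as different).
function outOfSync(
  file: TranslationFile,
  rows: TranslationRow[],
  locale: string,
  branch: string
): string[] {
  const stored = new Map<string, string>();
  for (const row of rows) {
    if (row.locale === locale && row.branch === branch) {
      stored.set(row.key, row.value);
    }
  }
  return Object.keys(file).filter((key) => stored.get(key) !== file[key]);
}
```

Even this toy version has to carry `branch` and `commit` on every row. As soon as a developer creates a branch, the table knows nothing about it until something copies the rows over.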

Smartling, a localization platform, does not support branches at all. Lokalise, another localization platform, needed two and a half years of engineering effort to support them. Despite that effort, these platforms still can't synchronize state without workarounds and heuristics that eventually fall apart.

We ran into the two-states problem as well. That raises the obvious question: why not use a git repository as the single source of truth instead of trying to synchronize two states? We would get:

  1. A single source of truth.
  2. No synchronization issues.
  3. Integration with existing workflows and CI/CD pipelines.
  4. All features of git like version history, branching, and an awesome review system "for free".

The astonishing realization was that it works and makes a ton of sense. Besides avoiding massive engineering efforts to synchronize two states and data structures, the approach substantially lowers the friction to adopt the solution. There is no setup required. Give inlang a link to a git repo (data) and be done. Furthermore, this approach hooks directly into existing workflows instead of forcing an organization to change their workflows.

2. Git's limitations

A git-based architecture is the right one to localize software. It avoids friction and synchronization issues, and required features like branching, version history, and a review system are essentially "free". There is one problem though: We are using a bunch of workarounds to get an application running on top of git, in the browser. Problems that are already emerging:

2.1. No lazy-loading

We currently have to clone the whole repository just to edit translation files. That is problematic for big repositories. The repository for posthog.com, for example, is ~680 MB in size. Even though we only need the translation files, which would be at most 1 MB in size, we have to clone the whole repository. That is also one of the reasons why git is not used at companies like Facebook and Google, whose repositories are gigabytes in size.

2.2. No real-time collaboration

Our platform (translation management) needs real-time collaboration. That's the wave every new piece of software is riding. Databases give real-time collaboration out of the box. Git, however, is built for purely asynchronous collaboration. That is fine for thousands of volunteer developers contributing to the Linux kernel, but not ideal for an organization or team collaborating. Software like Figma, Notion, and Google Docs thrives because of real-time collaboration. We have to build a real-time layer on top of git. We are not the only ones. The next wave of IDEs is gearing up for git + real-time collaboration, see JetBrains Fleet.

2.3. CLI-first, not API-first

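Git was built as a CLI, not as an API to build applications on top of. The following is the output of isomorphic-git's statusMatrix, its closest equivalent to git status. Each row holds a file path followed by three numbers (HEAD, working directory, and stage status). The combination of those numbers indicates the status of a file, instead of a query API returning "unstaged" etc.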

// example StatusMatrix
[
  ["a.txt", 0, 2, 0], // new, untracked
  ["b.txt", 0, 2, 2], // added, staged
  ["c.txt", 0, 2, 3], // added, staged, with unstaged changes
  ["d.txt", 1, 1, 1], // unmodified
  ["e.txt", 1, 2, 1], // modified, unstaged
  ["f.txt", 1, 2, 2], // modified, staged
  ["g.txt", 1, 2, 3], // modified, staged, with unstaged changes
  ["h.txt", 1, 0, 1], // deleted, unstaged
  ["i.txt", 1, 0, 0], // deleted, staged
];
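An application ends up writing its own translation layer on top of rows like these. A minimal sketch of such a layer follows. The names `describeStatus` and `describeMatrix` are hypothetical and not part of isomorphic-git, and the labels are copied from the comments in the example above.

```typescript
// Row shape of the matrix above:
// [filepath, HEAD status, workdir status, stage status]
type StatusRow = [string, number, number, number];

// Hypothetical lookup table; labels are taken from the comments
// in the example matrix, keyed by the three digits joined together.
const LABELS: Record<string, string> = {
  "020": "new, untracked",
  "022": "added, staged",
  "023": "added, staged, with unstaged changes",
  "111": "unmodified",
  "121": "modified, unstaged",
  "122": "modified, staged",
  "123": "modified, staged, with unstaged changes",
  "101": "deleted, unstaged",
  "100": "deleted, staged",
};

// Translate one row into a human-readable status.
function describeStatus(row: StatusRow): string {
  const [, head, workdir, stage] = row;
  const label = LABELS[`${head}${workdir}${stage}`];
  return label !== undefined ? label : "unknown";
}

// Translate a whole matrix into a filepath -> status map.
function describeMatrix(matrix: StatusRow[]): Record<string, string> {
  const result: Record<string, string> = {};
  for (const row of matrix) {
    result[row[0]] = describeStatus(row);
  }
  return result;
}
```

With this in place, `describeStatus(["e.txt", 1, 2, 1])` returns `"modified, unstaged"`. The point is that every application built on git has to maintain a table like this itself.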

We have to overcome the issues above, and others that will arise. That will inevitably lead to a git that is suited to having applications built on top of it, such as our localization platform.

3. The next git

Overcoming these limitations means building something new: a git that is suited for applications. That means files combined with collaboration infrastructure, with a user interface on top.
