The .gitattributes configuration file allows customizing path, file and repository-specific policies in Git for how files are handled during operations like checkout, filters, diffs and merges.

In this advanced 3000+ word guide, we will take an exhaustive tour of .gitattributes.

Understanding The What and Why

The .gitattributes file allows configuring path, file and repository-specific handling policies in Git repositories – like file conversions, filters to apply on commit, custom diffing and merge strategies.

Without .gitattributes, Git handles all files according to internal generic logic and heuristics. But sometimes you need to override that behavior for certain file types and paths.

For example, you may need to:

  • Normalize line endings to LF on commit, but checkout with CRLF on Windows
  • Configure custom text formatting or encryption filters to apply when committing certain file types
  • Use a different merging strategy for specific paths to avoid conflicts
  • Exclude documentation files from code exports and archives

.gitattributes gives you a way to customize all this easily without having to resort to hooks and scripts.

Some concrete examples of things you can configure:

  • Handle line ending conversions
  • Mark binary vs text files
  • Setup filtering on add/checkout
  • Define merge strategies
  • Control export behavior
  • Label paths with custom attributes for use in pathspecs

Git defines a modest set of built-in attributes (text, eol, filter, diff, merge, export-ignore and a few others), and you can define your own custom attributes on top of them.

Specifying Text vs Binary Files

One basic configuration is specifying whether files should be considered text or binary, as this determines whether Git converts line endings, shows textual diffs, and attempts content merges.

You specify the text or binary attributes. For example, to mark JPEG images as binary and disable any conversions:

# Mark JPEGs as binary - disable conversions  
*.jpg binary
*.jpeg binary 

# Mark text files as text 
*.txt text
*.text text

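Some differences in handling are:

Text Files

  • Line endings are normalized according to the text and eol attributes
  • Diffs are shown line by line
  • The built-in three-way text merge driver is used

Binary Files

  • No line ending conversion
  • No textual diff; Git only reports that the binary files differ
  • No content merge; on conflict Git keeps your version in the working tree and marks the path as conflicted

So text files get content-aware handling, while binary (a macro for -text -diff -merge) switches off conversion, textual diffs and content merges for formats like images where they don't add value.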
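To see how the text and binary rules resolve for a given path, git check-attr can be run against a throwaway repository. The following is a minimal sketch; the file names photo.jpg and notes.txt are made up for illustration and do not need to exist.

```shell
set -eu
repo=$(mktemp -d)            # throwaway repository for the demo
git init -q "$repo"

# JPEGs are binary, .txt files are text
printf '*.jpg binary\n*.txt text\n' > "$repo/.gitattributes"

# check-attr works on path names; the files do not need to exist
jpg_text=$(git -C "$repo" check-attr text -- photo.jpg)
jpg_diff=$(git -C "$repo" check-attr diff -- photo.jpg)
txt_text=$(git -C "$repo" check-attr text -- notes.txt)

echo "$jpg_text"   # photo.jpg: text: unset
echo "$jpg_diff"   # photo.jpg: diff: unset
echo "$txt_text"   # notes.txt: text: set
rm -rf "$repo"
```

The unset values for photo.jpg come from the binary macro expanding to -text -diff -merge.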

Configuring Line Ending Conversions

One of the most common issues .gitattributes helps solve is dealing with cross-platform line ending differences between Linux/Mac vs Windows in the same repository.

By default, with no attributes or config set, Git stores and checks out files with whatever line endings they already have.

But differing line endings can cause issues:

  • Messes up patch/diff statistics
  • Creates spurious textual merge conflicts
  • Causes tools to misparse files
  • Produces different checksums for files that differ only in line endings

So Git provides flexibility to handle this using .gitattributes.

Challenges With Inconsistent Line Endings

Let's look at the typical problems that differing line endings cause:

  • A file whose endings flip between LF and CRLF shows every line as changed, which buries the real edits in the diff
  • Merges between branches with different endings can conflict on lines that nobody edited
  • Developers lose time resolving conflicts and reviewing diffs that have nothing to do with the code

These issues add constant low-level friction for repositories spanning Mac, Linux and Windows systems.

Standard End-of-line Normalization Approaches

To tackle this, .gitattributes provides a standard solution – commit normalized line endings, but then checkout files using platform EOL conventions.

This means:

On Commit to Git Repo

  • Normalize to LF across platforms

On Checkout from Repo

  • Apply platform default line endings
    • CRLF checkout on Windows
    • LF on Mac and Linux

So all developers share a common format in the repository, keeping its history clean, while their local checked-out files transparently use platform defaults.

For example:

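# Core config: detect text files and store them with LF in the repository
* text=auto

# Windows scripts: always check out with CRLF
*.bat text eol=crlf
*.ps1 text eol=crlf

Now text files are committed with LF endings and checked out with the platform default, while batch and PowerShell scripts get CRLF endings in every working tree, including on Mac and Linux.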
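The effect of normalization can be observed directly with git ls-files --eol. Here is a small sketch in a throwaway repository, assuming a single rule of * text=auto; the file name notes.txt is made up for the demo.

```shell
set -eu
repo=$(mktemp -d)
git init -q "$repo"

printf '* text=auto\n' > "$repo/.gitattributes"
# File written with Windows-style CRLF endings
printf 'first line\r\nsecond line\r\n' > "$repo/notes.txt"

# core.safecrlf=false silences the CRLF warning for this demo
git -C "$repo" -c core.safecrlf=false add notes.txt

# i/ = endings stored in the index, w/ = endings in the working tree
eol_info=$(git -C "$repo" ls-files --eol notes.txt)
echo "$eol_info"
rm -rf "$repo"
```

The output should report i/lf for the index and w/crlf for the working tree, showing that the stored copy was normalized while the local file was left alone.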

Mixed End-of-line Strategies For Specific File Types

For more complex projects, you can also choose different normalize/checkout strategies per file type.

For example:

Force consistent LF normalization for cross-platform code:

# Standardize Shell & Python 
*.sh text eol=lf  
*.py text eol=lf

Use native EOLs in the working tree for edited prose files:

# Markdown is text; leaving eol unset checks it out with the platform's native endings
*.md text

Let Git detect whether build manifests are text:

# Auto-detect text content (the eol attribute itself accepts only lf or crlf)
*.xml text=auto

Mix and match approaches based on which formats face issues with differing line endings.

Debugging Conversion Issues

You can verify if line ending conversion is applied using:

git ls-files --eol

And check if attributes take effect via:

git check-attr eol <filename>

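If conversions don't work as expected, check that the pattern lives in a .gitattributes file at or above the affected paths (the repository root covers everything). Also remember that files committed before the rule was added keep their old endings until they are renormalized.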
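Files that entered the index before a rule existed can be re-converted with git add --renormalize, which is available in Git 2.16 and later. The sketch below stages a CRLF file first and adds the rule afterwards; the file name legacy.txt is an assumption for the demo.

```shell
set -eu
repo=$(mktemp -d)
git init -q "$repo"

# Stage a CRLF file before any attributes exist, so CRLF lands in the index
printf 'one\r\ntwo\r\n' > "$repo/legacy.txt"
git -C "$repo" -c core.autocrlf=false add legacy.txt
before=$(git -C "$repo" ls-files --eol legacy.txt)

# Add the rule afterwards, then force Git to re-run the conversion on tracked files
printf '*.txt text\n' > "$repo/.gitattributes"
git -C "$repo" -c core.safecrlf=false add --renormalize .
after=$(git -C "$repo" ls-files --eol legacy.txt)

echo "before: $before"   # index side shows i/crlf
echo "after:  $after"    # index side shows i/lf
rm -rf "$repo"
```

In a real repository you would follow the renormalize step with a commit so that the corrected endings are recorded in history.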

Enabling Partial Commits With Pathspecs

Git has no built-in pathspec attribute. Partial commits are driven by pathspecs given on the command line, and .gitattributes can support them by labeling paths with a custom attribute of your own.

For example:

api/** module=api
docs/** module=docs

Now you can commit just API changes or docs changes in isolation:

# Commit just API module changes
git commit -- api/

# Commit just documentation changes
git commit -- docs/

# List the paths labeled module=api (the attr pathspec magic is only honored by some commands, such as git ls-files)
git ls-files ':(attr:module=api)'

Pathspecs streamline commits by dividing the codebase into parts that can be committed independently.
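Pathspecs are given on the command line rather than in .gitattributes. As a sketch of committing one part of a tree in isolation, the script below changes two directories and commits only one of them. The directory and file names, and the throwaway identity in the helper g, are assumptions for the demo.

```shell
set -eu
repo=$(mktemp -d)
# Helper: run git in the demo repo with a throwaway identity
g() { git -C "$repo" -c user.name=Demo -c user.email=demo@example.com -c commit.gpgsign=false "$@"; }

g init -q
mkdir -p "$repo/api" "$repo/docs"
echo 'v1' > "$repo/api/module.js"
echo 'v1' > "$repo/docs/index.html"
g add .
g commit -q -m 'Initial commit'

# Change both areas, then commit only what lives under api/
echo 'version 2 of the API' > "$repo/api/module.js"
echo 'version 2 of the docs' > "$repo/docs/index.html"
g commit -q -m 'API change' -- api/

committed=$(g diff-tree --no-commit-id --name-only -r HEAD)
pending=$(g status --porcelain)
echo "committed: $committed"   # api/module.js
echo "pending:  $pending"      # docs/index.html is still modified
rm -rf "$repo"
```

The docs change stays in the working tree and can be committed separately with its own message.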

Applying Custom Filters on Commit or Checkout

You can configure custom clean and smudge script filters to process files on git add and git checkout via the filter attribute.

For example:

On git add to staging:

  • Cleanup formatting
  • Reorder content
  • Strip sections

On git checkout:

  • Reapply formatting
  • Restore stripped content

This allows version controlling a transformed representation of files instead of absolute file snapshots.

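For illustration, here is a filter that strips copyright headers before committing C++ files:

# .gitattributes - route C++ files through one named filter
*.cpp filter=cpp_header

# Git config - the clean command runs on git add, the smudge command on checkout
git config filter.cpp_header.clean cpp_strip_header
git config filter.cpp_header.smudge cpp_readd_header

Where cpp_strip_header and cpp_readd_header are scripts on your PATH that read the file content on standard input and write the result to standard output.

You could integrate linters, formatters, compression scripts, etc. this way. Because the filter commands live in Git config, which is not cloned, each contributor has to set them up locally.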
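A filter is defined in two places: the attribute names it, and Git config supplies the clean and smudge commands. The sketch below uses a made-up filter name, demo_header, with sed standing in for a real header-stripping script and cat as a pass-through smudge, so the header is not restored on checkout here.

```shell
set -eu
repo=$(mktemp -d)
git init -q "$repo"

# Filter commands live in Git config; "demo_header" is a hypothetical name
git -C "$repo" config filter.demo_header.clean "sed -e '/Copyright/d'"
git -C "$repo" config filter.demo_header.smudge cat
printf '*.cpp filter=demo_header\n' > "$repo/.gitattributes"

printf '// Copyright Example Corp\nint main() { return 0; }\n' > "$repo/main.cpp"
git -C "$repo" add main.cpp

staged=$(git -C "$repo" cat-file -p :main.cpp)   # content after the clean filter
worktree=$(cat "$repo/main.cpp")                 # working copy is left untouched
echo "staged:   $staged"
echo "worktree: $worktree"
rm -rf "$repo"
```

The staged blob contains only the main function, while the working copy still carries the header line.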

Configuring File Export and Archive Behavior

When exporting a Git tree as an archive or source tarball using git archive, you can mark files and paths to leave out:

# Exclude temp files from export archives
scratch export-ignore

# No attribute is needed to keep a file: everything not marked export-ignore, such as CHANGELOG, is included

Now CHANGELOG is retained when generating archives while the scratch directory gets stripped out. There is no export-include attribute; the other archive-related attribute is export-subst, which expands $Format:...$ placeholders in the exported file.

This controls what files consumers receive in distributed tar/zip packages.
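The export-ignore attribute can be checked by listing what git archive would ship. This is a minimal sketch with made-up file names; note that git archive reads the attribute from the committed tree, so .gitattributes has to be committed first.

```shell
set -eu
repo=$(mktemp -d)
# Helper: run git in the demo repo with a throwaway identity
g() { git -C "$repo" -c user.name=Demo -c user.email=demo@example.com -c commit.gpgsign=false "$@"; }

g init -q
mkdir "$repo/scratch"
echo 'release notes' > "$repo/CHANGELOG"
echo 'temporary data' > "$repo/scratch/tmp.txt"
printf 'scratch export-ignore\n' > "$repo/.gitattributes"
g add .
g commit -q -m 'Add files'

# List the archive contents without writing a tarball to disk
listing=$(g archive HEAD | tar -tf -)
echo "$listing"
rm -rf "$repo"
```

The listing should include CHANGELOG and .gitattributes but nothing under scratch.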

Setting Up Different Merge Strategies

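To avoid tricky merge conflicts, .gitattributes allows overriding how certain files are merged through the merge attribute.

The attribute takes one of these forms:

  • merge (set): use the built-in three-way text driver
  • -merge (unset): treat the file as binary and keep your version on conflict
  • merge=text, merge=binary or merge=union: pick a built-in driver by name
  • merge=<name>: run a custom driver defined in Git config as merge.<name>.driver

Some examples:

# Markdown - Combine both versions
*.md merge=union

# Configs - Prefer incoming changes (custom driver named theirs, defined in Git config)
docker-compose.yml merge=theirs

# Logs - Keep lines from both sides
logs/*.log merge=union

Now Markdown docs and log files keep the lines added on both sides, and configuration changes defer to the incoming branch once a theirs driver has been defined. Only text, binary and union are built in; names such as ours or theirs fall back to the regular text merge until you define a driver for them.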
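The built-in union driver can be tried out in a throwaway repository. In this sketch two branches append different lines at the same position of notes.md, which would normally conflict. The branch name feature, the file contents and the helper g are assumptions for the demo.

```shell
set -eu
repo=$(mktemp -d)
# Helper: run git in the demo repo with a throwaway identity
g() { git -C "$repo" -c user.name=Demo -c user.email=demo@example.com -c commit.gpgsign=false "$@"; }

g init -q
printf '*.md merge=union\n' > "$repo/.gitattributes"
printf 'base entry\n' > "$repo/notes.md"
g add .
g commit -q -m 'Base'
main_branch=$(g symbolic-ref --short HEAD)   # master or main, depending on config

g checkout -q -b feature
printf 'base entry\nfeature entry\n' > "$repo/notes.md"
g commit -q -am 'Feature entry'

g checkout -q "$main_branch"
printf 'base entry\nmain entry\n' > "$repo/notes.md"
g commit -q -am 'Main entry'

# Both sides appended at the same spot; union keeps both lines instead of conflicting
g merge -q -m 'Merge feature' feature
merged=$(cat "$repo/notes.md")
echo "$merged"
rm -rf "$repo"
```

Union never reports a conflict, so it is best kept for files where duplicated or oddly ordered lines are harmless.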

Choosing the Right Merge Strategy

With the default text merge, Git cannot auto-resolve changes when both branches edit the same lines. Developers have to edit the files by hand to pick a version, which adds up to a lot of extra work.

Overriding the merge driver reduces this problem by:

  • Favoring one branch's changes (custom ours or theirs driver)
  • Keeping the lines from both sides (union)
  • Merging structured or sorted content with a format-aware tool (custom driver)

So based on context, picking an alternate driver avoids frequent, avoidable conflicts.

For example, favoring one branch makes sense for generated output, while union works for append-only data such as logs.

Strategies to Reduce Merge Conflicts

Some standard strategies to consider:

1. Preferring One Side

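Define a custom ours or theirs driver in Git config, then use merge=ours or merge=theirs to favor one branch's version:

# Favor generated output from the current branch
src/main/java/gen_* merge=ours

# Dependencies updated from the central library
lib/** merge=theirs

Helps avoid regenerating files or downgrading dependencies on each merge. Note that a merge driver only runs when both branches changed the file.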
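Git does not ship an ours merge driver, so one has to be defined before merge=ours has any effect. A common approach is to use the true command as the driver, which exits successfully without touching the file. The sketch below assumes a made-up file settings.conf and a branch named feature.

```shell
set -eu
repo=$(mktemp -d)
# Helper: run git in the demo repo with a throwaway identity
g() { git -C "$repo" -c user.name=Demo -c user.email=demo@example.com -c commit.gpgsign=false "$@"; }

g init -q
# "true" exits 0 and leaves the file alone, so the current branch's version wins
g config merge.ours.driver true
printf 'settings.conf merge=ours\n' > "$repo/.gitattributes"
printf 'mode=base\n' > "$repo/settings.conf"
g add .
g commit -q -m 'Base'
main_branch=$(g symbolic-ref --short HEAD)

g checkout -q -b feature
printf 'mode=from-feature-branch\n' > "$repo/settings.conf"
g commit -q -am 'Feature setting'

g checkout -q "$main_branch"
printf 'mode=from-main\n' > "$repo/settings.conf"
g commit -q -am 'Main setting'

# Both sides changed the same line; the custom driver resolves it silently
g merge -q -m 'Merge feature' feature
kept=$(cat "$repo/settings.conf")
echo "$kept"   # mode=from-main
rm -rf "$repo"
```

Because the driver is stored in Git config, every clone that should behave this way needs the same merge.ours.driver setting.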

2. Combining Histories

Concatenate data histories like logs via merge=union:

dev/testlogs/*.txt merge=union

Ensures logs accumulate data from both sides.

3. Interleaving Ordered Entries

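Union does not preserve ordering, so multi-line sorted data needs a custom driver. Here sorted_csv is a hypothetical driver name that you would define yourself as merge.sorted_csv.driver:

highscores.csv merge=sorted_csv

With a suitable driver script, score entries from both sides are interleaved in the correct order.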

These and other strategies reduce noise from inconsequential merges.

Cascading Settings With .git/info/attributes

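Individual users can override attributes specified in .gitattributes using .git/info/attributes. This file lives inside the local repository, is not committed, and is not shared through clone or push.

For example, having:

# Root level policy
* text=auto eol=lf

You could override it locally with:

# Allow native EOLs for prose
*.md text !eol

Values here take precedence over every .gitattributes file, which makes the file useful for personal overrides rather than centralized policies. For defaults shared across all repositories on a machine, use the file named by core.attributesFile instead, which has the lowest precedence.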
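The precedence of .git/info/attributes over .gitattributes can be confirmed with git check-attr. This sketch assumes a local override of *.md text !eol, where !eol resets the eol attribute to unspecified; the path names are made up and need not exist.

```shell
set -eu
repo=$(mktemp -d)
git init -q "$repo"

# Committed policy for everyone
printf '* text=auto eol=lf\n' > "$repo/.gitattributes"

# Local, uncommitted override for Markdown only
mkdir -p "$repo/.git/info"
printf '*.md text !eol\n' > "$repo/.git/info/attributes"

md_eol=$(git -C "$repo" check-attr eol -- README.md)
md_text=$(git -C "$repo" check-attr text -- README.md)
c_eol=$(git -C "$repo" check-attr eol -- main.c)

echo "$md_eol"    # README.md: eol: unspecified
echo "$md_text"   # README.md: text: set
echo "$c_eol"     # main.c: eol: lf
rm -rf "$repo"
```

Paths that the override does not mention, such as main.c, keep the values from .gitattributes.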

Juggling Multiple .gitattributes Files

Git has no include mechanism for attribute files, so files with other names (such as eol.attributes) are never read. What you can do in large repositories is place a .gitattributes file in any directory:

repo/
 |- .gitattributes # repository-wide rules
 |- project/
     |- .gitattributes # overrides for the project subtree

A .gitattributes file only affects paths in its own directory and below, and a file closer to the path takes precedence over one higher up. Within the root file, group rules by theme using comments:

# Line endings
* text=auto

# Archive exports
scratch export-ignore

Rules from higher levels still apply in subdirectories unless a deeper .gitattributes overrides them.
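How nested .gitattributes files interact can be checked with git check-attr. The sketch below assumes a root rule and a conflicting rule in a project subdirectory; the directory and file names are made up for the demo.

```shell
set -eu
repo=$(mktemp -d)
git init -q "$repo"
mkdir "$repo/project"

# Repository-wide rule
printf '*.txt eol=lf\n' > "$repo/.gitattributes"
# Subtree rule: applies only to paths under project/
printf '*.txt eol=crlf\n' > "$repo/project/.gitattributes"

root_eol=$(git -C "$repo" check-attr eol -- notes.txt)
nested_eol=$(git -C "$repo" check-attr eol -- project/notes.txt)

echo "$root_eol"     # notes.txt: eol: lf
echo "$nested_eol"   # project/notes.txt: eol: crlf
rm -rf "$repo"
```

The file closest to the path wins, so the subtree rule applies under project/ while the root rule still covers everything else.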

Getting Help From IDE Integrations

Editors such as VS Code and Sublime Text can help with authoring .gitattributes, usually through extensions. Depending on the editor and extension, this may include:

  • Syntax help
  • Documentation lookups
  • Autocomplete suggestions
  • Error flagging

These make it easier to write valid .gitattributes configurations.

Takeaways and Recommendations

The .gitattributes file enables configuring repository, file and path-specific handling in Git repositories by:

  • Streamlining end-of-line normalization
  • Marking files as binary or text
  • Applying input/output filters on add/checkout
  • Defining custom merge strategies
  • Controlling file exports and archive behaviors

Key recommendations are:

  • Use liberal comments explaining why attributes are applied
  • Group rules by theme (EOL, filtering, exports, etc.) and use per-directory .gitattributes files for subtree overrides
  • Debug with git check-attr to validate settings
  • Use .git/info/attributes for local, uncommitted overrides
  • Leverage IDE assistance for authoring attributes

We recommend normalizing to LF in the repository almost always. Override with eol=crlf just for Windows script formats like .bat and .cmd, which require CRLF endings. Other text files can use LF everywhere or the platform default on checkout.

With those practices, .gitattributes can greatly enhance managing complex repository histories spanning platforms and technologies.
