How to find the difference between two files in Python?

In many applications, especially in data processing, software development, and testing, we need to compare two files to detect changes, validate outputs, or find discrepancies. Python offers several ways to compare files, ranging from basic line-by-line comparisons to more advanced diff utilities. The key methods used to compare two files in Python are:

  • Line-by-line comparison: A straightforward approach for finding textual differences.
  • difflib module: Produces human-readable diffs similar to the output of Unix's diff command.
  • filecmp module: Quickly checks whether two files are identical, using either a shallow (metadata) or a content comparison.

All the above methods can be applied to various file types such as .txt, .csv, .json, or code files.

Line-by-Line Comparison (Basic Method)

Line-by-line comparison is the most basic method: it reads both files and compares the lines at each position in order. It is useful for detecting small changes in text files such as source code, logs, or configuration files. It also handles files of different lengths by reporting the lines that exist in only one file.

Example

The following example compares two text files and reports the differences found at specific line numbers:

def compare_files(file1, file2):
    # Read both files into lists of lines
    with open(file1, 'r') as f1, open(file2, 'r') as f2:
        f1_lines = f1.readlines()
        f2_lines = f2.readlines()

    # Iterate up to the length of the longer file so extra lines are reported
    max_lines = max(len(f1_lines), len(f2_lines))
    for i in range(max_lines):
        # Remove only the line ending so that whitespace changes are still detected
        line1 = f1_lines[i].rstrip('\n') if i < len(f1_lines) else "<no line>"
        line2 = f2_lines[i].rstrip('\n') if i < len(f2_lines) else "<no line>"

        if line1 != line2:
            print(f"Difference at line {i + 1}:")
            print(f"File1: {line1}")
            print(f"File2: {line2}")
            print("-" * 40)

# Create sample files for demonstration (kept outside the function
# so that compare_files() never overwrites the files it is given)
with open('file1.txt', 'w') as f:
    f.write("Hello, welcome to Tutorialspoint.\nHave a happy learning.\n")

with open('file2.txt', 'w') as f:
    f.write("Hello welcome to Tutorialspoint.\nhave a happy learning.\n")

# Example usage
compare_files('file1.txt', 'file2.txt')

The output after comparing both files is:

Difference at line 1:
File1: Hello, welcome to Tutorialspoint.
File2: Hello welcome to Tutorialspoint.
----------------------------------------
Difference at line 2:
File1: Have a happy learning.
File2: have a happy learning.
----------------------------------------
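
The example above loads both files fully into memory with readlines(). For large files, the same idea can be sketched as a generator that reads the two files in step. This is only a minimal sketch: the function name iter_line_differences, the temporary file names, and the sample content are assumptions made for illustration, and a missing line is reported as None rather than as a placeholder string.

```python
import os
import tempfile
from itertools import zip_longest

def iter_line_differences(path1, path2):
    """Yield (line_number, line1, line2) for each pair of lines that differ.

    A line that is missing because one file is shorter is reported as None.
    """
    with open(path1, "r", encoding="utf-8") as f1, \
         open(path2, "r", encoding="utf-8") as f2:
        # zip_longest pads the shorter file with None instead of stopping early
        for number, (line1, line2) in enumerate(zip_longest(f1, f2), start=1):
            if line1 is not None:
                line1 = line1.rstrip("\n")
            if line2 is not None:
                line2 = line2.rstrip("\n")
            if line1 != line2:
                yield number, line1, line2

# Demonstration with two temporary files (hypothetical sample content)
with tempfile.TemporaryDirectory() as tmp:
    first = os.path.join(tmp, "first.txt")
    second = os.path.join(tmp, "second.txt")
    with open(first, "w", encoding="utf-8") as f:
        f.write("alpha\nbeta\ngamma\n")
    with open(second, "w", encoding="utf-8") as f:
        f.write("alpha\nBETA\n")
    differences = list(iter_line_differences(first, second))

for number, line1, line2 in differences:
    print(f"Line {number}: {line1!r} != {line2!r}")
```

Because file objects are iterated lazily, only one line from each file is held in memory at a time.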

Using difflib for Detailed Diffs

The difflib module provides various tools for computing and working with differences between sequences. It's a more advanced method for file comparison which produces detailed and human-readable difference output similar to Unix's diff command. This method is useful when we want to highlight exactly what changed in each line.

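Here are the line prefixes used in difflib output and their meanings. The first three appear in unified diffs (as single characters), while the ? prefix is produced only by difflib.ndiff() and the Differ class:

  • - line unique to sequence 1
  • + line unique to sequence 2
  • ' ' line common to both sequences
  • ? line not present in either input sequence (a hint that marks which characters changed within a line)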
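
To see all four prefixes, including the ? hint line, we can use difflib.ndiff(). The following is a small sketch with in-memory sample lines (the variable names old_lines, new_lines, and delta are chosen here for illustration); only the first line differs, so the second line is reported as common to both sequences.

```python
import difflib

# Sample input: the first line differs by one comma, the second is identical
old_lines = ["Hello, welcome to Tutorialspoint.\n", "Have a happy learning.\n"]
new_lines = ["Hello welcome to Tutorialspoint.\n", "Have a happy learning.\n"]

# ndiff() yields one string per output line, each starting with a
# two-character prefix: "- ", "+ ", "  " or "? "
delta = list(difflib.ndiff(old_lines, new_lines))
print("".join(delta), end="")
```

The ? line sits directly under the removed line and marks the position of the deleted comma.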

Example

The following example compares two text files using the difflib.unified_diff() function and prints a unified diff showing the changes:

import difflib

def diff_files(file1, file2):
    # Read both files into lists of lines (line endings are kept)
    with open(file1, 'r') as f1, open(file2, 'r') as f2:
        f1_lines = f1.readlines()
        f2_lines = f2.readlines()

    # Label the diff headers with the actual file names
    diff = difflib.unified_diff(f1_lines, f2_lines, fromfile=file1, tofile=file2)
    print(''.join(diff))

# Create sample files for demonstration (kept outside the function
# so that diff_files() never overwrites the files it is given)
with open('file1.txt', 'w') as f:
    f.write("Hello, welcome to Tutorialspoint.\nHave a happy learning.\n")

with open('file2.txt', 'w') as f:
    f.write("Hello welcome to Tutorialspoint.\nhave a happy learning.\n")

# Example usage
diff_files('file1.txt', 'file2.txt')

The output after comparing both files using the difflib.unified_diff() function is:

--- file1.txt
+++ file2.txt
@@ -1,2 +1,2 @@
-Hello, welcome to Tutorialspoint.
-Have a happy learning.
+Hello welcome to Tutorialspoint.
+have a happy learning.
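
When the differences need to be processed by a program rather than read by a person, difflib.SequenceMatcher can report them as index ranges instead of text. The following is a minimal sketch, assuming the lines are already in memory; the helper name changed_ranges and the sample lists are hypothetical.

```python
import difflib

def changed_ranges(old_lines, new_lines):
    """Return the non-equal opcodes between two lists of lines.

    Each entry is (tag, i1, i2, j1, j2), meaning old_lines[i1:i2]
    corresponds to new_lines[j1:j2]; tag is 'replace', 'delete' or 'insert'.
    """
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines)
    return [op for op in matcher.get_opcodes() if op[0] != "equal"]

# Hypothetical sample data: one changed line and one appended line
old = ["a\n", "b\n", "c\n"]
new = ["a\n", "B\n", "c\n", "d\n"]

changes = changed_ranges(old, new)
for tag, i1, i2, j1, j2 in changes:
    print(tag, old[i1:i2], "->", new[j1:j2])
```

The indices are zero-based and half-open, like ordinary Python slices.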

Using filecmp for File Comparison

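The filecmp module provides a fast way to check whether two files are identical. By default, filecmp.cmp() performs a shallow comparison: if the os.stat() signatures of both files (file type, size, and modification time) match, the files are treated as equal without being read. Passing shallow=False makes it compare the file contents instead. This is useful for binary files or when we only need to know whether two files are exactly the same, without caring about the specific differences.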

Example

The following example uses filecmp.cmp() to compare two files and prints whether they are identical or different:

import filecmp

def compare_binary(file1, file2):
    # shallow=False forces a comparison of the actual file contents
    result = filecmp.cmp(file1, file2, shallow=False)
    if result:
        print("Files are the same.")
    else:
        print("Files differ.")

# Create sample files for demonstration (kept outside the function
# so that compare_binary() never overwrites the files it is given)
with open('file1.txt', 'w') as f:
    f.write("Hello, welcome to Tutorialspoint.\nHave a happy learning.\n")

with open('file2.txt', 'w') as f:
    f.write("Hello welcome to Tutorialspoint.\nhave a happy learning.\n")

# Example usage
compare_binary('file1.txt', 'file2.txt')

The output after comparing both files is:

Files differ.
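
The choice of the shallow argument matters when two files have the same size and modification time but different contents. The following sketch builds such a pair on purpose; the file names, byte contents, and the fixed timestamp are assumptions made for the demonstration.

```python
import filecmp
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path_a = os.path.join(tmp, "a.bin")
    path_b = os.path.join(tmp, "b.bin")
    with open(path_a, "wb") as f:
        f.write(b"abc")
    with open(path_b, "wb") as f:
        f.write(b"abd")  # same size, different content

    # Force identical access/modification times so the os.stat() signatures match
    same_time = (1_000_000_000, 1_000_000_000)
    os.utime(path_a, same_time)
    os.utime(path_b, same_time)

    # Shallow: equal signatures are accepted without reading the files
    shallow_result = filecmp.cmp(path_a, path_b, shallow=True)
    # Deep: the contents are read and compared
    deep_result = filecmp.cmp(path_a, path_b, shallow=False)

print("shallow:", shallow_result)
print("deep:", deep_result)
```

Here the shallow comparison reports the files as equal while the content comparison reports that they differ, so shallow=False is the safer choice whenever correctness matters more than speed.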

Conclusion

Choose line-by-line comparison for simple text differences, difflib for detailed human-readable diffs, or filecmp to quickly check whether two files are identical. Each method serves a different use case, so the right choice depends on whether we need to know what changed or only whether anything changed.

Updated on: 2026-03-24T18:21:16+05:30
