Python - Create a Pipeline in Pandas

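To create a pipeline in Pandas, use the pipe() method. It lets you chain custom functions, each of which takes a DataFrame and returns a transformed one.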

What is the pipe() Method?

The pipe() method passes the DataFrame to a function as its first argument and returns whatever that function returns. It is designed to make method chaining more readable by letting custom functions sit in the chain alongside built-in DataFrame methods.
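
To make the readability point concrete, here is a minimal sketch that compares nested function calls with the equivalent pipe() chain. The helper names drop_missing and add_total and the sample data are made up for illustration.

```python
import pandas as pd

# Hypothetical helper functions, used only for illustration
def drop_missing(df):
    # Remove rows that contain missing values
    return df.dropna()

def add_total(df):
    # Add a column holding price * qty
    return df.assign(total=df["price"] * df["qty"])

sales = pd.DataFrame({
    "price": [10.0, None, 4.0],
    "qty": [2, 5, 3],
})

# Nested calls read inside-out: the first step is the innermost call
nested = add_total(drop_missing(sales))

# pipe() lists the steps in the order they run
piped = sales.pipe(drop_missing).pipe(add_total)

# Both approaches produce the same DataFrame
print(piped.equals(nested))
```

Both forms give the same result; the pipe() version simply lists the steps in execution order.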

Basic Syntax

DataFrame.pipe(func, *args, **kwargs)
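
As a rough sketch of how the arguments are forwarded: df.pipe(func, *args, **kwargs) calls func(df, *args, **kwargs). When the function expects the DataFrame under a different parameter, pipe() also accepts a (callable, data_keyword) tuple. The functions scale and scale_into and the parameter name frame are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Hypothetical function: the DataFrame is the first parameter
def scale(data, factor, offset=0):
    return data * factor + offset

# These two calls are equivalent
via_pipe = df.pipe(scale, 10, offset=1)
direct = scale(df, 10, offset=1)
print(via_pipe.equals(direct))

# Hypothetical function: the DataFrame is NOT the first parameter
def scale_into(factor, frame):
    return frame * factor

# Tuple form: (callable, name of the keyword that receives the DataFrame)
via_tuple = df.pipe((scale_into, "frame"), 10)
print(via_tuple.equals(df * 10))
```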

Creating a Simple Pipeline

Let's start by creating a DataFrame and a custom function that converts column names to uppercase:

import pandas as pd

# Function to convert column names to uppercase
def upperFunc(dataframe):
    # rename() returns a new DataFrame, so the original is left unchanged
    return dataframe.rename(columns=str.upper)

# Create DataFrame
dataFrame = pd.DataFrame({
    "Car": ['BMW', 'Lexus', 'Audi', 'Mustang', 'Bentley', 'Jaguar'],
    "Units": [100, 150, 110, 80, 110, 90]
})

print("Original DataFrame:")
print(dataFrame)

# Creating pipeline using pipe()
pipeline = dataFrame.pipe(upperFunc)

print("\nAfter applying pipeline (uppercase columns):")
print(pipeline)

Output:

Original DataFrame:
       Car  Units
0      BMW    100
1    Lexus    150
2     Audi    110
3  Mustang     80
4  Bentley    110
5   Jaguar     90

After applying pipeline (uppercase columns):
       CAR  UNITS
0      BMW    100
1    Lexus    150
2     Audi    110
3  Mustang     80
4  Bentley    110
5   Jaguar     90

Multiple Operations in Pipeline

You can chain multiple pipe() operations together for more complex transformations:

import pandas as pd

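def uppercase_columns(df):
    # rename() returns a new DataFrame, so the caller's data is not modified
    return df.rename(columns=str.upper)

def filter_high_units(df):
    # Keep only rows with more than 100 units
    return df[df['UNITS'] > 100]

def add_category(df):
    # assign() returns a new DataFrame instead of writing into a filtered slice
    return df.assign(CATEGORY='Premium')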

# Create DataFrame
df = pd.DataFrame({
    "Car": ['BMW', 'Lexus', 'Audi', 'Mustang', 'Bentley', 'Jaguar'],
    "Units": [100, 150, 110, 80, 110, 90]
})

# Chain multiple operations
result = (df.pipe(uppercase_columns)
           .pipe(filter_high_units)
           .pipe(add_category))

print("Final result after pipeline:")
print(result)

Output:

Final result after pipeline:
       CAR  UNITS  CATEGORY
1    Lexus    150   Premium
2     Audi    110   Premium
4  Bentley    110   Premium
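
When the steps are only known at runtime, the same chain can be driven from a list. The sketch below uses functools.reduce from the standard library; this is one possible pattern, not a pandas feature. The helper run_pipeline is hypothetical, and the step functions mirror the ones above.

```python
from functools import reduce

import pandas as pd

def uppercase_columns(df):
    return df.rename(columns=str.upper)

def filter_high_units(df):
    return df[df["UNITS"] > 100]

def add_category(df):
    return df.assign(CATEGORY="Premium")

def run_pipeline(df, steps):
    # Hypothetical helper: feed the DataFrame through each step in order
    return reduce(lambda frame, step: frame.pipe(step), steps, df)

df = pd.DataFrame({
    "Car": ["BMW", "Lexus", "Audi", "Mustang", "Bentley", "Jaguar"],
    "Units": [100, 150, 110, 80, 110, 90],
})

# The order of the list is the order of execution
steps = [uppercase_columns, filter_high_units, add_category]
result = run_pipeline(df, steps)
print(result)
```

Keeping the steps in a list makes it straightforward to add, remove, or reorder transformations without rewriting the chain.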

Pipeline with Parameters

You can pass additional arguments to functions in the pipeline:

import pandas as pd

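def filter_by_units(df, min_units):
    # Keep rows with at least min_units units
    return df[df['Units'] >= min_units]

def multiply_units(df, factor):
    # assign() returns a new DataFrame instead of writing into a filtered slice
    return df.assign(Units=df['Units'] * factor)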

# Create DataFrame
df = pd.DataFrame({
    "Car": ['BMW', 'Lexus', 'Audi', 'Mustang', 'Bentley', 'Jaguar'],
    "Units": [100, 150, 110, 80, 110, 90]
})

# Pipeline with parameters
result = (df.pipe(filter_by_units, min_units=100)
           .pipe(multiply_units, factor=2))

print("Pipeline with parameters:")
print(result)

Output:

Pipeline with parameters:
       Car  Units
0      BMW    200
1    Lexus    300
2     Audi    220
4  Bentley    220

Key Benefits

Pipelines offer several advantages:

  • Readability: Steps read in the order they run, rather than inside-out as with nested function calls
  • Modularity: Each function handles a specific transformation
  • Reusability: Functions can be reused in different pipelines
  • Method chaining: Integrates well with other pandas methods
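
The last two points can be illustrated with a short sketch that mixes pipe() with built-in methods and reuses one function on two different DataFrames. The helper add_share and the bikes data are hypothetical.

```python
import pandas as pd

def add_share(df, column):
    # Hypothetical helper: each row's share of the column total, in percent
    share = (df[column] / df[column].sum() * 100).round(1)
    return df.assign(Share=share)

cars = pd.DataFrame({
    "Car": ["BMW", "Lexus", "Audi", "Mustang", "Bentley", "Jaguar"],
    "Units": [100, 150, 110, 80, 110, 90],
})

result = (
    cars.query("Units >= 100")                  # built-in filtering
        .pipe(add_share, column="Units")        # custom step
        .sort_values("Share", ascending=False)  # built-in sorting
        .reset_index(drop=True)
)
print(result)

# The same function can be reused on a different DataFrame
bikes = pd.DataFrame({"Bike": ["A", "B"], "Sold": [30, 10]})
bike_shares = bikes.pipe(add_share, column="Sold")
print(bike_shares)
```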

Conclusion

The pipe() method in Pandas enables clean, readable data transformation pipelines. It allows you to chain custom functions together, making complex data processing workflows more maintainable and easier to understand.

Updated on: 2026-03-26T02:01:30+05:30
