Python - Create a Pipeline in Pandas
To create a pipeline in Pandas, we use the pipe() method. This article walks through a simple one-step pipeline, chaining multiple pipe() calls, and passing additional arguments to the functions in a pipeline.
What is the pipe() Method?
The pipe() method applies a function to the DataFrame and returns the result. It is designed to make method chaining more readable by allowing custom functions to be integrated into the chain.
Basic Syntax
DataFrame.pipe(func, *args, **kwargs)
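To make the syntax concrete, the short sketch below shows that calling a function through pipe() gives the same result as calling it directly with the DataFrame as the first argument. The helper add_to_column and its parameters are hypothetical names used only for illustration.

```python
import pandas as pd

# Hypothetical helper for illustration: adds a constant to one column
# and returns a new DataFrame (the input is not modified).
def add_to_column(df, column, amount=0):
    return df.assign(**{column: df[column] + amount})

df = pd.DataFrame({"Units": [100, 150, 110]})

# Calling through pipe(): extra positional and keyword arguments
# are forwarded to the function after the DataFrame.
piped = df.pipe(add_to_column, "Units", amount=10)

# Calling the function directly with the same arguments.
direct = add_to_column(df, "Units", amount=10)

print(piped.equals(direct))
```

Both calls produce the same DataFrame; pipe() only changes how the call is written, which matters once several steps are chained.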
Creating a Simple Pipeline
Let's start by creating a DataFrame and a custom function to convert column names to uppercase:

import pandas as pd

# Function to convert column names to uppercase
def upperFunc(dataframe):
    # rename() returns a new DataFrame, so the original is left unchanged
    return dataframe.rename(columns=str.upper)

# Create DataFrame
dataFrame = pd.DataFrame({
    "Car": ['BMW', 'Lexus', 'Audi', 'Mustang', 'Bentley', 'Jaguar'],
    "Units": [100, 150, 110, 80, 110, 90]
})

print("Original DataFrame:")
print(dataFrame)

# Creating pipeline using pipe()
pipeline = dataFrame.pipe(upperFunc)

print("\nAfter applying pipeline (uppercase columns):")
print(pipeline)

This produces the following output:

Original DataFrame:
       Car  Units
0      BMW    100
1    Lexus    150
2     Audi    110
3  Mustang     80
4  Bentley    110
5   Jaguar     90

After applying pipeline (uppercase columns):
       CAR  UNITS
0      BMW    100
1    Lexus    150
2     Audi    110
3  Mustang     80
4  Bentley    110
5   Jaguar     90

Multiple Operations in Pipeline
You can chain multiple pipe() operations together for more complex transformations:

import pandas as pd

def uppercase_columns(df):
    # Return a renamed copy rather than modifying df in place
    return df.rename(columns=str.upper)

def filter_high_units(df):
    # Keep only rows with more than 100 units
    return df[df['UNITS'] > 100]

def add_category(df):
    # assign() returns a new DataFrame, which avoids a
    # SettingWithCopyWarning when df is a filtered subset
    return df.assign(CATEGORY='Premium')

# Create DataFrame
df = pd.DataFrame({
    "Car": ['BMW', 'Lexus', 'Audi', 'Mustang', 'Bentley', 'Jaguar'],
    "Units": [100, 150, 110, 80, 110, 90]
})

# Chain multiple operations
result = (df.pipe(uppercase_columns)
            .pipe(filter_high_units)
            .pipe(add_category))

print("Final result after pipeline:")
print(result)

This produces the following output:

Final result after pipeline:
       CAR  UNITS  CATEGORY
1    Lexus    150   Premium
2     Audi    110   Premium
4  Bentley    110   Premium

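For comparison, a chain of pipe() calls is equivalent to nesting the function calls inside one another. The sketch below uses two small functions, uppercase_columns and keep_above (the latter is a hypothetical helper introduced here), to show both forms side by side.

```python
import pandas as pd

def uppercase_columns(df):
    # Return a copy with uppercase column names
    return df.rename(columns=str.upper)

# Hypothetical helper: keep rows where `column` exceeds `threshold`.
def keep_above(df, column, threshold):
    return df[df[column] > threshold]

cars = pd.DataFrame({
    "Car": ["BMW", "Lexus", "Audi"],
    "Units": [100, 150, 110],
})

# Nested calls read inside-out: the first step is the innermost call.
nested = keep_above(uppercase_columns(cars), "UNITS", 100)

# The pipe() chain reads top to bottom in the order the steps run.
chained = (cars.pipe(uppercase_columns)
               .pipe(keep_above, "UNITS", 100))

print(nested.equals(chained))
```

The results are identical; the chained form simply keeps the steps in execution order, which tends to be easier to follow as the number of steps grows.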
Pipeline with Parameters
You can pass additional arguments to functions in the pipeline:

import pandas as pd

def filter_by_units(df, min_units):
    # Keep rows with at least min_units units
    return df[df['Units'] >= min_units]

def multiply_units(df, factor):
    # assign() returns a new DataFrame, which avoids a
    # SettingWithCopyWarning when df is a filtered subset
    return df.assign(Units=df['Units'] * factor)

# Create DataFrame
df = pd.DataFrame({
    "Car": ['BMW', 'Lexus', 'Audi', 'Mustang', 'Bentley', 'Jaguar'],
    "Units": [100, 150, 110, 80, 110, 90]
})

# Pipeline with parameters
result = (df.pipe(filter_by_units, min_units=100)
            .pipe(multiply_units, factor=2))

print("Pipeline with parameters:")
print(result)

This produces the following output:

Pipeline with parameters:
       Car  Units
0      BMW    200
1    Lexus    300
2     Audi    220
4  Bentley    220

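When a function does not take the DataFrame as its first argument, pipe() also accepts a tuple of (callable, keyword name) that tells it which keyword should receive the DataFrame. The sketch below assumes a hypothetical function scale_units whose DataFrame parameter is named data.

```python
import pandas as pd

# Hypothetical function whose DataFrame parameter comes second.
def scale_units(factor, data):
    return data.assign(Units=data["Units"] * factor)

df = pd.DataFrame({
    "Car": ["BMW", "Lexus"],
    "Units": [100, 150],
})

# The tuple (scale_units, "data") makes pipe() pass the DataFrame
# as data=..., so this call is scale_units(2, data=df).
result = df.pipe((scale_units, "data"), 2)

print(result)
```

This form is handy for reusing existing functions in a pipeline without writing a wrapper just to reorder their arguments.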
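Key Benefits
Pipelines offer several advantages: each step is a small, named function; the steps read in the order they run; custom functions fit into the same chain as built-in DataFrame methods; and extra arguments can be passed to any step.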
Conclusion
The pipe() method in Pandas enables clean, readable data transformation pipelines. It allows you to chain custom functions together, making complex data processing workflows more maintainable and easier to understand.
