Article Categories
- All Categories
-
Data Structure
-
Networking
-
RDBMS
-
Operating System
-
Java
-
MS Excel
-
iOS
-
HTML
-
CSS
-
Android
-
Python
-
C Programming
-
C++
-
C#
-
MongoDB
-
MySQL
-
Javascript
-
PHP
-
Economics & Finance
Selected Reading
Python Pandas - Merge DataFrame with indicator value
To merge Pandas DataFrame with indicator information, use the merge() function with the indicator parameter set to True. This adds a special _merge column showing the source of each row.
What is the Indicator Parameter?
The indicator parameter creates a column that tracks whether each row comes from the left DataFrame, right DataFrame, or both ?
import pandas as pd
# Create DataFrame1
dataFrame1 = pd.DataFrame({
"Car": ['BMW', 'Lexus', 'Audi', 'Mustang', 'Bentley', 'Jaguar'],
"Units": [100, 150, 110, 80, 110, 90]
})
print("DataFrame1:")
print(dataFrame1)
DataFrame1:
Car Units
0 BMW 100
1 Lexus 150
2 Audi 110
3 Mustang 80
4 Bentley 110
5 Jaguar 90
Creating the Second DataFrame
import pandas as pd
# Create DataFrame2
dataFrame2 = pd.DataFrame({
"Car": ['BMW', 'Lexus', 'Tesla', 'Mustang', 'Mercedes', 'Jaguar'],
"Reg_Price": [7000, 1500, 5000, 8000, 9000, 6000]
})
print("DataFrame2:")
print(dataFrame2)
DataFrame2:
Car Reg_Price
0 BMW 7000
1 Lexus 1500
2 Tesla 5000
3 Mustang 8000
4 Mercedes 9000
5 Jaguar 6000
Merging with Indicator
Now merge the DataFrames using indicator=True to track row sources ?
import pandas as pd
# Create both DataFrames
dataFrame1 = pd.DataFrame({
"Car": ['BMW', 'Lexus', 'Audi', 'Mustang', 'Bentley', 'Jaguar'],
"Units": [100, 150, 110, 80, 110, 90]
})
dataFrame2 = pd.DataFrame({
"Car": ['BMW', 'Lexus', 'Tesla', 'Mustang', 'Mercedes', 'Jaguar'],
"Reg_Price": [7000, 1500, 5000, 8000, 9000, 6000]
})
# Merge DataFrames with indicator
mergedRes = pd.merge(dataFrame1, dataFrame2, how="left", indicator=True)
print("Merged dataframe with indicator:")
print(mergedRes)
Merged dataframe with indicator:
Car Units Reg_Price _merge
0 BMW 100 7000.0 both
1 Lexus 150 1500.0 both
2 Audi 110 NaN left_only
3 Mustang 80 8000.0 both
4 Bentley 110 NaN left_only
5 Jaguar 90 6000.0 both
Understanding the _merge Column
The _merge column contains three possible values ?
| Value | Meaning | Description |
|---|---|---|
both |
Present in both DataFrames | Row exists in left and right |
left_only |
Only in left DataFrame | Row exists only in first DataFrame |
right_only |
Only in right DataFrame | Row exists only in second DataFrame |
Custom Indicator Column Name
You can specify a custom name for the indicator column ?
import pandas as pd
dataFrame1 = pd.DataFrame({
"Car": ['BMW', 'Lexus', 'Audi'],
"Units": [100, 150, 110]
})
dataFrame2 = pd.DataFrame({
"Car": ['BMW', 'Lexus', 'Tesla'],
"Reg_Price": [7000, 1500, 5000]
})
# Custom indicator column name
mergedRes = pd.merge(dataFrame1, dataFrame2, how="outer", indicator="source")
print(mergedRes)
Car Units Reg_Price source
0 BMW 100.0 7000.0 both
1 Lexus 150.0 1500.0 both
2 Audi 110.0 NaN left_only
3 Tesla NaN 5000.0 right_only
Conclusion
The indicator parameter in merge() adds a tracking column showing row sources. Use indicator=True for default _merge column, or specify a custom name for better clarity in your analysis.
Advertisements
