Analyzing Census Data in Python
Census data is information collected by governments about population characteristics such as age, gender, education, and housing. Governments use it to understand current conditions and plan for the future.
In this article, we will learn how to analyze census data in Python using the pandas and matplotlib libraries.
Sample Census Dataset
We'll use sample census data with the following structure:
| age | gender | education | worktype | income |
|---|---|---|---|---|
| 21 | Male | Bachelors | Private | 60000 |
| 24 | Female | Masters | Government | 72000 |
| 28 | Male | High-School | Self-employed | 35000 |
| 34 | Female | Bachelors | Private | 48000 |
| 39 | Male | Doctorate | Government | 90000 |
| 35 | Female | High-School | Self-employed | 32000 |
Loading Census Data
First, let's build the dataset as a pandas DataFrame from a Python dictionary:
```python
import pandas as pd

# Create sample census data
data = {
    'age': [21, 24, 28, 34, 39, 35],
    'gender': ['Male', 'Female', 'Male', 'Female', 'Male', 'Female'],
    'education': ['Bachelors', 'Masters', 'High-School', 'Bachelors', 'Doctorate', 'High-School'],
    'worktype': ['Private', 'Government', 'Self-employed', 'Private', 'Government', 'Self-employed'],
    'income': [60000, 72000, 35000, 48000, 90000, 32000]
}

census_data = pd.DataFrame(data)
print(census_data)
```

```
   age  gender    education       worktype  income
0   21    Male    Bachelors        Private   60000
1   24  Female      Masters     Government   72000
2   28    Male  High-School  Self-employed   35000
3   34  Female    Bachelors        Private   48000
4   39    Male    Doctorate     Government   90000
5   35  Female  High-School  Self-employed   32000
```
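Real census extracts usually arrive as files rather than being typed in by hand. A minimal sketch, assuming the data is a CSV file with the same five columns: here the file contents are held in a string and wrapped in `io.StringIO` so the example runs without a real file. For actual data, pass a file path (for example a hypothetical `census.csv`) to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Hypothetical CSV text with the same columns as the sample table;
# with a real file, call pd.read_csv("path/to/file.csv") instead.
csv_text = """age,gender,education,worktype,income
21,Male,Bachelors,Private,60000
24,Female,Masters,Government,72000
28,Male,High-School,Self-employed,35000
34,Female,Bachelors,Private,48000
39,Male,Doctorate,Government,90000
35,Female,High-School,Self-employed,32000
"""

census_data = pd.read_csv(io.StringIO(csv_text))

# Check how many rows/columns were read and which dtypes pandas inferred
print(census_data.shape)
print(census_data.dtypes)
```

Checking `shape` and `dtypes` right after loading is a cheap way to catch columns that were parsed as text when they should be numeric.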
Filtering Data by Age
Let's find all individuals aged above 30:
```python
import pandas as pd

# Create sample census data
data = {
    'age': [21, 24, 28, 34, 39, 35],
    'gender': ['Male', 'Female', 'Male', 'Female', 'Male', 'Female'],
    'education': ['Bachelors', 'Masters', 'High-School', 'Bachelors', 'Doctorate', 'High-School'],
    'worktype': ['Private', 'Government', 'Self-employed', 'Private', 'Government', 'Self-employed'],
    'income': [60000, 72000, 35000, 48000, 90000, 32000]
}

census_data = pd.DataFrame(data)

# Keep only the rows where the boolean condition is True
adults_above_30 = census_data[census_data["age"] > 30]
print(adults_above_30)
```

```
   age  gender    education       worktype  income
3   34  Female    Bachelors        Private   48000
4   39    Male    Doctorate     Government   90000
5   35  Female  High-School  Self-employed   32000
```
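Several conditions can be combined in one filter. The sketch below uses the same sample data and an illustrative income threshold of 40000 to keep people over 30 who earn more than that amount. It first uses boolean masks joined with `&` and then the equivalent `DataFrame.query()` call:

```python
import pandas as pd

data = {
    'age': [21, 24, 28, 34, 39, 35],
    'gender': ['Male', 'Female', 'Male', 'Female', 'Male', 'Female'],
    'education': ['Bachelors', 'Masters', 'High-School', 'Bachelors', 'Doctorate', 'High-School'],
    'worktype': ['Private', 'Government', 'Self-employed', 'Private', 'Government', 'Self-employed'],
    'income': [60000, 72000, 35000, 48000, 90000, 32000]
}
census_data = pd.DataFrame(data)

# Combine boolean masks with & (and); wrap each condition in parentheses
over_30_high_income = census_data[(census_data["age"] > 30) & (census_data["income"] > 40000)]
print(over_30_high_income)

# The same filter expressed as a query string
same_result = census_data.query("age > 30 and income > 40000")
print(same_result.equals(over_30_high_income))
```

Each condition needs its own parentheses because `&` binds more tightly than comparison operators such as `>` in Python. Use `|` instead of `&` to keep rows that match either condition.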
Income Analysis by Education Level
Let's calculate the average income grouped by education level using the groupby() method:
```python
import pandas as pd

# Create sample census data
data = {
    'age': [21, 24, 28, 34, 39, 35],
    'gender': ['Male', 'Female', 'Male', 'Female', 'Male', 'Female'],
    'education': ['Bachelors', 'Masters', 'High-School', 'Bachelors', 'Doctorate', 'High-School'],
    'worktype': ['Private', 'Government', 'Self-employed', 'Private', 'Government', 'Self-employed'],
    'income': [60000, 72000, 35000, 48000, 90000, 32000]
}

census_data = pd.DataFrame(data)

# Split rows by education level, then average the income within each group
avg_income_by_education = census_data.groupby("education")["income"].mean()
print(avg_income_by_education)
```

```
education
Bachelors      54000.0
Doctorate      90000.0
High-School    33500.0
Masters        72000.0
Name: income, dtype: float64
```
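`groupby()` is not limited to one statistic or one key. A short sketch using the same sample data: `agg()` with named aggregation computes several statistics per education level in a single call, and passing a list of columns groups by every (gender, worktype) combination. The output column names `people`, `mean_income` and `max_income` are arbitrary choices for this example.

```python
import pandas as pd

data = {
    'age': [21, 24, 28, 34, 39, 35],
    'gender': ['Male', 'Female', 'Male', 'Female', 'Male', 'Female'],
    'education': ['Bachelors', 'Masters', 'High-School', 'Bachelors', 'Doctorate', 'High-School'],
    'worktype': ['Private', 'Government', 'Self-employed', 'Private', 'Government', 'Self-employed'],
    'income': [60000, 72000, 35000, 48000, 90000, 32000]
}
census_data = pd.DataFrame(data)

# Named aggregation: new_column=(source_column, function)
income_summary = census_data.groupby("education").agg(
    people=("income", "count"),
    mean_income=("income", "mean"),
    max_income=("income", "max"),
)
print(income_summary)

# Grouping by two columns yields one value per (gender, worktype) pair
by_gender_work = census_data.groupby(["gender", "worktype"])["income"].mean()
print(by_gender_work)
```

With only six records, each (gender, worktype) pair here contains a single person, so the "average" equals that person's income; on a full census extract the same code would average over many rows.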
Gender Distribution Visualization
Let's create a bar chart showing the distribution of males and females in our dataset:
```python
import pandas as pd
import matplotlib.pyplot as plt

# Create sample census data
data = {
    'age': [21, 24, 28, 34, 39, 35],
    'gender': ['Male', 'Female', 'Male', 'Female', 'Male', 'Female'],
    'education': ['Bachelors', 'Masters', 'High-School', 'Bachelors', 'Doctorate', 'High-School'],
    'worktype': ['Private', 'Government', 'Self-employed', 'Private', 'Government', 'Self-employed'],
    'income': [60000, 72000, 35000, 48000, 90000, 32000]
}

census_data = pd.DataFrame(data)

# Count people per gender; sort_index() gives a fixed order because the counts are tied
gender_counts = census_data["gender"].value_counts().sort_index()

# Create bar chart
plt.figure(figsize=(8, 6))
gender_counts.plot(kind="bar", title="Population by Gender", color=['skyblue', 'lightcoral'])
plt.xlabel("Gender")
plt.ylabel("Count")
plt.xticks(rotation=0)
plt.tight_layout()
plt.show()

print("Gender distribution:")
print(gender_counts)
```

```
Gender distribution:
gender
Female    3
Male      3
Name: count, dtype: int64
```
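The same plotting approach works for any grouped result. A sketch, assuming you want to chart the average income per education level (the `groupby()` result from the previous section) and save it to a PNG file instead of opening a window; the file name `income_by_education.png` and the bar color are just example choices. It uses matplotlib's object-oriented `fig, ax` interface, which makes it explicit which axes pandas draws on.

```python
import matplotlib.pyplot as plt
import pandas as pd

data = {
    'age': [21, 24, 28, 34, 39, 35],
    'gender': ['Male', 'Female', 'Male', 'Female', 'Male', 'Female'],
    'education': ['Bachelors', 'Masters', 'High-School', 'Bachelors', 'Doctorate', 'High-School'],
    'worktype': ['Private', 'Government', 'Self-employed', 'Private', 'Government', 'Self-employed'],
    'income': [60000, 72000, 35000, 48000, 90000, 32000]
}
census_data = pd.DataFrame(data)

# Average income per education level, sorted so the bars rise from left to right
avg_income = census_data.groupby("education")["income"].mean().sort_values()

fig, ax = plt.subplots(figsize=(8, 6))
avg_income.plot(kind="bar", ax=ax, color="seagreen", title="Average Income by Education")
ax.set_xlabel("Education")
ax.set_ylabel("Average income")
ax.tick_params(axis="x", labelrotation=0)  # keep category labels horizontal
fig.tight_layout()
fig.savefig("income_by_education.png")  # example output path
plt.close(fig)  # free the figure once it has been saved
```

Sorting the values before plotting is optional, but it makes the ranking of education levels easier to read than the default alphabetical order.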
Summary Statistics
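Let's get basic statistics about our census data. By default, describe() summarizes only the numeric columns (age and income), so value_counts() is used to count the categorical ones: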
```python
import pandas as pd

# Create sample census data
data = {
    'age': [21, 24, 28, 34, 39, 35],
    'gender': ['Male', 'Female', 'Male', 'Female', 'Male', 'Female'],
    'education': ['Bachelors', 'Masters', 'High-School', 'Bachelors', 'Doctorate', 'High-School'],
    'worktype': ['Private', 'Government', 'Self-employed', 'Private', 'Government', 'Self-employed'],
    'income': [60000, 72000, 35000, 48000, 90000, 32000]
}

census_data = pd.DataFrame(data)

# Count, mean, standard deviation, min, quartiles and max of the numeric columns
print("Basic Statistics:")
print(census_data.describe())

# Frequency of each category, sorted alphabetically by label
print("\nEducation levels:")
print(census_data['education'].value_counts().sort_index())
print("\nWork types:")
print(census_data['worktype'].value_counts().sort_index())
```

```
Basic Statistics:
             age        income
count   6.000000      6.000000
mean   30.166667  56166.666667
std     6.968979  22400.148809
min    21.000000  32000.000000
25%    25.000000  38250.000000
50%    31.000000  54000.000000
75%    34.750000  69000.000000
max    39.000000  90000.000000

Education levels:
education
Bachelors      2
Doctorate      1
High-School    2
Masters        1
Name: count, dtype: int64

Work types:
worktype
Government       2
Private          2
Self-employed    2
Name: count, dtype: int64
```
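Numeric columns can also be grouped into bands before counting. A sketch, assuming age bands of 20-29 and 30-39 (the bin edges and labels are illustrative choices for this small sample, not official census categories). It uses `pd.cut` to assign each person to a band, `pd.crosstab` to count people per band and gender, and `value_counts(normalize=True)` to express education levels as shares of all records:

```python
import pandas as pd

data = {
    'age': [21, 24, 28, 34, 39, 35],
    'gender': ['Male', 'Female', 'Male', 'Female', 'Male', 'Female'],
    'education': ['Bachelors', 'Masters', 'High-School', 'Bachelors', 'Doctorate', 'High-School'],
    'worktype': ['Private', 'Government', 'Self-employed', 'Private', 'Government', 'Self-employed'],
    'income': [60000, 72000, 35000, 48000, 90000, 32000]
}
census_data = pd.DataFrame(data)

# right=False makes the intervals [20, 30) and [30, 40), matching the labels
census_data["age_group"] = pd.cut(
    census_data["age"], bins=[20, 30, 40], right=False, labels=["20-29", "30-39"]
)

# Rows are age bands, columns are genders, cells are head counts
age_gender_table = pd.crosstab(census_data["age_group"], census_data["gender"])
print(age_gender_table)

# Share of each education level as a fraction of all records
education_share = census_data["education"].value_counts(normalize=True)
print(education_share)
```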
Conclusion
Python covers the main steps of census data analysis: pandas loads, filters, and groups the data, its describe() and value_counts() methods give quick summary statistics, and matplotlib turns the results into charts. Together, these tools make it straightforward to explore population characteristics and support informed decision-making.
