Analyzing Census Data in Python

Census data is information that a government collects about its population, covering characteristics such as age, gender, education, and housing. Governments use this data to assess current conditions and plan for the future.

In this article, we will learn how to analyze census data in Python using pandas for data manipulation and matplotlib for visualization.

Sample Census Dataset

We'll use sample census data with the following structure:

age  gender  education    worktype       income
21   Male    Bachelors    Private        60000
24   Female  Masters      Government     72000
28   Male    High-School  Self-employed  35000
34   Female  Bachelors    Private        48000
39   Male    Doctorate    Government     90000
35   Female  High-School  Self-employed  32000

Loading Census Data

First, let's build the dataset as a pandas DataFrame:

import pandas as pd

# Create sample census data
data = {
    'age': [21, 24, 28, 34, 39, 35],
    'gender': ['Male', 'Female', 'Male', 'Female', 'Male', 'Female'],
    'education': ['Bachelors', 'Masters', 'High-School', 'Bachelors', 'Doctorate', 'High-School'],
    'worktype': ['Private', 'Government', 'Self-employed', 'Private', 'Government', 'Self-employed'],
    'income': [60000, 72000, 35000, 48000, 90000, 32000]
}

census_data = pd.DataFrame(data)
print(census_data)

Output:

   age  gender    education       worktype  income
0   21    Male    Bachelors        Private   60000
1   24  Female      Masters     Government   72000
2   28    Male  High-School  Self-employed   35000
3   34  Female    Bachelors        Private   48000
4   39    Male    Doctorate     Government   90000
5   35  Female  High-School  Self-employed   32000
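
Real census extracts usually arrive as files rather than hand-built dictionaries. As a minimal sketch, assuming a CSV layout that matches the sample table, the same records can be read with pd.read_csv. The csv_text string and the census_from_csv name are illustrative stand-ins (not part of the example above); in practice you would pass a file path instead of an in-memory buffer.

```python
import io

import pandas as pd

# Hypothetical CSV content mirroring the sample table. In practice this
# would be a file on disk whose path is passed to pd.read_csv.
csv_text = """age,gender,education,worktype,income
21,Male,Bachelors,Private,60000
24,Female,Masters,Government,72000
28,Male,High-School,Self-employed,35000
34,Female,Bachelors,Private,48000
39,Male,Doctorate,Government,90000
35,Female,High-School,Self-employed,32000
"""

census_from_csv = pd.read_csv(io.StringIO(csv_text))

# Quick sanity checks: row/column counts, column types, missing values
print(census_from_csv.shape)
print(census_from_csv.dtypes)
print(census_from_csv.isna().sum())
```

Checking the shape, dtypes, and missing-value counts right after loading is a cheap way to catch columns that were parsed as text instead of numbers.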

Filtering Data by Age

Let's find all individuals aged above 30:

import pandas as pd

# Create sample census data
data = {
    'age': [21, 24, 28, 34, 39, 35],
    'gender': ['Male', 'Female', 'Male', 'Female', 'Male', 'Female'],
    'education': ['Bachelors', 'Masters', 'High-School', 'Bachelors', 'Doctorate', 'High-School'],
    'worktype': ['Private', 'Government', 'Self-employed', 'Private', 'Government', 'Self-employed'],
    'income': [60000, 72000, 35000, 48000, 90000, 32000]
}

census_data = pd.DataFrame(data)
adults_above_30 = census_data[census_data["age"] > 30]
print(adults_above_30)

Output:

   age  gender    education       worktype  income
3   34  Female    Bachelors        Private   48000
4   39    Male    Doctorate     Government   90000
5   35  Female  High-School  Self-employed   32000
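
Filters can also combine several conditions. The sketch below, which uses the same sample data, selects women older than 30 in two equivalent ways: a boolean mask joined with & and a query() string. The variable names women_above_30 and women_above_30_q are illustrative.

```python
import pandas as pd

census_data = pd.DataFrame({
    'age': [21, 24, 28, 34, 39, 35],
    'gender': ['Male', 'Female', 'Male', 'Female', 'Male', 'Female'],
    'education': ['Bachelors', 'Masters', 'High-School', 'Bachelors', 'Doctorate', 'High-School'],
    'worktype': ['Private', 'Government', 'Self-employed', 'Private', 'Government', 'Self-employed'],
    'income': [60000, 72000, 35000, 48000, 90000, 32000]
})

# Wrap each condition in parentheses and join them with & (and) or | (or)
women_above_30 = census_data[
    (census_data["age"] > 30) & (census_data["gender"] == "Female")
]

# The same filter expressed as a query string
women_above_30_q = census_data.query("age > 30 and gender == 'Female'")

print(women_above_30)
```

The parentheses matter: & binds more tightly than comparison operators in Python, so leaving them out raises an error or gives the wrong mask.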

Income Analysis by Education Level

Let's calculate the average income grouped by education level using the groupby() method:

import pandas as pd

# Create sample census data
data = {
    'age': [21, 24, 28, 34, 39, 35],
    'gender': ['Male', 'Female', 'Male', 'Female', 'Male', 'Female'],
    'education': ['Bachelors', 'Masters', 'High-School', 'Bachelors', 'Doctorate', 'High-School'],
    'worktype': ['Private', 'Government', 'Self-employed', 'Private', 'Government', 'Self-employed'],
    'income': [60000, 72000, 35000, 48000, 90000, 32000]
}

census_data = pd.DataFrame(data)
avg_income_by_education = census_data.groupby("education")["income"].mean()
print(avg_income_by_education)

Output:

education
Bachelors      54000.0
Doctorate      90000.0
High-School    33500.0
Masters        72000.0
Name: income, dtype: float64
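
A mean on its own can hide how many people are behind it. As a possible extension, agg() computes several statistics per group in one pass; the sketch below adds the median and the group size next to the mean. The income_summary name is illustrative.

```python
import pandas as pd

census_data = pd.DataFrame({
    'age': [21, 24, 28, 34, 39, 35],
    'gender': ['Male', 'Female', 'Male', 'Female', 'Male', 'Female'],
    'education': ['Bachelors', 'Masters', 'High-School', 'Bachelors', 'Doctorate', 'High-School'],
    'worktype': ['Private', 'Government', 'Self-employed', 'Private', 'Government', 'Self-employed'],
    'income': [60000, 72000, 35000, 48000, 90000, 32000]
})

# One row per education level, one column per statistic
income_summary = census_data.groupby("education")["income"].agg(
    ["mean", "median", "count"]
)

print(income_summary)
```

With only one or two people per group in this sample, the count column makes it clear that these averages should not be over-interpreted.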

Gender Distribution Visualization

Let's create a bar chart showing the distribution of males and females in our dataset:

import pandas as pd
import matplotlib.pyplot as plt

# Create sample census data
data = {
    'age': [21, 24, 28, 34, 39, 35],
    'gender': ['Male', 'Female', 'Male', 'Female', 'Male', 'Female'],
    'education': ['Bachelors', 'Masters', 'High-School', 'Bachelors', 'Doctorate', 'High-School'],
    'worktype': ['Private', 'Government', 'Self-employed', 'Private', 'Government', 'Self-employed'],
    'income': [60000, 72000, 35000, 48000, 90000, 32000]
}

census_data = pd.DataFrame(data)
gender_counts = census_data["gender"].value_counts()

# Create bar chart
plt.figure(figsize=(8, 6))
gender_counts.plot(kind="bar", title="Population by Gender", color=['skyblue', 'lightcoral'])
plt.xlabel("Gender")
plt.ylabel("Count")
plt.xticks(rotation=0)
plt.tight_layout()
plt.show()

print("Gender distribution:")
print(gender_counts)

Output:

Gender distribution:
gender
Male      3
Female    3
Name: count, dtype: int64
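
Raw counts are hard to compare across datasets of different sizes, so shares are often more useful. A minimal sketch, using the same sample data: passing normalize=True to value_counts() returns proportions, which are scaled to percentages here. The gender_share name is illustrative.

```python
import pandas as pd

census_data = pd.DataFrame({
    'age': [21, 24, 28, 34, 39, 35],
    'gender': ['Male', 'Female', 'Male', 'Female', 'Male', 'Female'],
    'education': ['Bachelors', 'Masters', 'High-School', 'Bachelors', 'Doctorate', 'High-School'],
    'worktype': ['Private', 'Government', 'Self-employed', 'Private', 'Government', 'Self-employed'],
    'income': [60000, 72000, 35000, 48000, 90000, 32000]
})

# normalize=True returns fractions that sum to 1; multiply by 100 for percent
gender_share = census_data["gender"].value_counts(normalize=True) * 100

print(gender_share)
```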

Summary Statistics

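Let's get basic statistics about our census data. describe() summarizes the numeric columns (age and income), while value_counts() counts the rows in each category: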

import pandas as pd

# Create sample census data
data = {
    'age': [21, 24, 28, 34, 39, 35],
    'gender': ['Male', 'Female', 'Male', 'Female', 'Male', 'Female'],
    'education': ['Bachelors', 'Masters', 'High-School', 'Bachelors', 'Doctorate', 'High-School'],
    'worktype': ['Private', 'Government', 'Self-employed', 'Private', 'Government', 'Self-employed'],
    'income': [60000, 72000, 35000, 48000, 90000, 32000]
}

census_data = pd.DataFrame(data)

print("Basic Statistics:")
print(census_data.describe())

print("\nEducation levels:")
print(census_data['education'].value_counts())

print("\nWork types:")
print(census_data['worktype'].value_counts())

Output:

Basic Statistics:
             age         income
count   6.000000       6.000000
mean   30.166667   56166.666667
std     6.968979   22400.148809
min    21.000000   32000.000000
25%    25.000000   38250.000000
50%    31.000000   54000.000000
75%    34.750000   69000.000000
max    39.000000   90000.000000

Education levels:
education
Bachelors      2
High-School    2
Masters        1
Doctorate      1
Name: count, dtype: int64

Work types:
worktype
Private          2
Government       2
Self-employed    2
Name: count, dtype: int64
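
Census summaries are often reported by age band rather than by exact age. As a sketch under assumed band edges (the bins and labels below are illustrative choices, not part of the dataset), pd.cut() assigns each person to a band, and the result can be grouped like any other column.

```python
import pandas as pd

census_data = pd.DataFrame({
    'age': [21, 24, 28, 34, 39, 35],
    'gender': ['Male', 'Female', 'Male', 'Female', 'Male', 'Female'],
    'education': ['Bachelors', 'Masters', 'High-School', 'Bachelors', 'Doctorate', 'High-School'],
    'worktype': ['Private', 'Government', 'Self-employed', 'Private', 'Government', 'Self-employed'],
    'income': [60000, 72000, 35000, 48000, 90000, 32000]
})

# Assumed band edges: (20, 30] and (30, 40]; the right edge is inclusive
bins = [20, 30, 40]
labels = ["21-30", "31-40"]
census_data["age_group"] = pd.cut(census_data["age"], bins=bins, labels=labels)

# Average income per age band; observed=True skips bands with no rows
income_by_age_group = census_data.groupby("age_group", observed=True)["income"].mean()

print(census_data[["age", "age_group"]])
print(income_by_age_group)
```

Ages that fall outside the bin edges become NaN in age_group, so the edges should cover the full age range of the data.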

Conclusion

With pandas for data manipulation and matplotlib for visualization, Python covers the core steps of census data analysis shown here: filtering records, grouping and aggregating by category, computing summary statistics, and charting distributions. Together, these steps turn raw population records into figures that can support planning decisions.

Updated on: 2026-03-25T07:59:43+05:30
