Creating a DataFrame from CSV Files

A DataFrame is the two-dimensional, labeled data structure at the core of Python's pandas library, and CSV is one of the most common file formats for storing tabular data. This article demonstrates how to create DataFrames from CSV files and perform essential data operations on them.

What are DataFrames and CSV Files?

A DataFrame is a two-dimensional, size-mutable, tabular data structure whose columns can hold different types. It is similar to a spreadsheet or SQL table and is commonly used for data analysis in Python.

A CSV (Comma-Separated Values) file stores tabular data as plain text: each line is one record, and the fields within a record are separated by commas. The first line usually holds the column names. CSV files are widely supported and easy to work with across different applications.
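
Because commas separate the fields, a value that itself contains a comma has to be wrapped in double quotes. The following is a minimal sketch, using a made-up two-row sample rather than data from this article, of how read_csv() keeps such a quoted value in a single field:

```python
import io

import pandas as pd

# Hypothetical sample: the second title contains a comma, so it is quoted.
quoted_csv = '''Title,Year
Se7en,1995
"The Good, the Bad and the Ugly",1966'''

quoted_df = pd.read_csv(io.StringIO(quoted_csv))

# The quoted title stays in one field, so there are still two columns.
print(quoted_df.shape)
print(quoted_df.loc[1, "Title"])
```

Without the quotes, the second data row would contain three fields instead of two and would no longer line up with the header.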

Reading CSV Files into DataFrames

Use pandas' read_csv() function to load CSV data into a DataFrame:

import io
import pandas as pd

# Create sample CSV data
csv_data = """Title,Year,Genre,Runtime
The Shawshank Redemption,1994,Drama,142
The Godfather,1972,Crime,175
The Dark Knight,2008,Action,152
12 Angry Men,1957,Drama,96"""

# Read CSV from string (simulating file read)
df = pd.read_csv(io.StringIO(csv_data))
print(df)

Output:

                      Title  Year   Genre  Runtime
0  The Shawshank Redemption  1994   Drama      142
1             The Godfather  1972   Crime      175
2           The Dark Knight  2008  Action      152
3              12 Angry Men  1957   Drama       96

Syntax

import pandas as pd
df = pd.read_csv('filename.csv')

The read_csv() function accepts many optional parameters, such as sep (or its alias delimiter), encoding, and header, that control how the file is parsed.
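
As an illustration of these parameters, the sketch below parses a small semicolon-separated sample that has no header row. The sample data and the column names passed to names are assumptions made up for this example. The encoding parameter is left out because the data is an in-memory string; it matters when reading a file from disk.

```python
import io

import pandas as pd

# Hypothetical sample: semicolon-separated values with no header row.
raw = """The Godfather;1972;175
12 Angry Men;1957;96"""

movies = pd.read_csv(
    io.StringIO(raw),
    sep=";",        # field separator (delimiter is an alias for sep)
    header=None,    # the data has no header line
    names=["Title", "Year", "Runtime"],  # supply the column names ourselves
)
print(movies)
```

When reading a real file, the same call would take a path as its first argument, optionally with something like encoding="utf-8".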

Exploring DataFrames

Basic DataFrame Information

import pandas as pd
import io

csv_data = """Title,Year,Genre,Runtime
The Shawshank Redemption,1994,Drama,142
The Godfather,1972,Crime,175
The Dark Knight,2008,Action,152
12 Angry Men,1957,Drama,96
Pulp Fiction,1994,Crime,154"""

df = pd.read_csv(io.StringIO(csv_data))

# View first few rows
print("First 3 rows:")
print(df.head(3))

print("\nDataFrame shape:")
print(df.shape)

print("\nSummary statistics:")
print(df.describe())

Output:

First 3 rows:
                      Title  Year   Genre  Runtime
0  The Shawshank Redemption  1994   Drama      142
1             The Godfather  1972   Crime      175
2           The Dark Knight  2008  Action      152

DataFrame shape:
(5, 4)

Summary statistics:
              Year     Runtime
count     5.000000    5.000000
mean   1985.000000  143.800000
std      20.273135   29.295051
min    1957.000000   96.000000
25%    1972.000000  142.000000
50%    1994.000000  152.000000
75%    1994.000000  154.000000
max    2008.000000  175.000000
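
By default, describe() summarizes only the numeric columns, which is why Title and Genre do not appear in the statistics. To see how pandas interpreted each column, you can inspect the columns and dtypes attributes. This is a minimal sketch on a shortened copy of the sample data; the variable name df_types is just for illustration:

```python
import io

import pandas as pd

csv_data = """Title,Year,Genre,Runtime
The Shawshank Redemption,1994,Drama,142
The Godfather,1972,Crime,175"""

df_types = pd.read_csv(io.StringIO(csv_data))

# Column names, in file order
print(list(df_types.columns))

# One inferred dtype per column: Year and Runtime are parsed as integers,
# while Title and Genre are kept as text.
print(df_types.dtypes)
```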

Selecting Columns

# Select specific columns
subset = df[['Title', 'Genre']]
print(subset)

Output:

                      Title   Genre
0  The Shawshank Redemption   Drama
1             The Godfather   Crime
2           The Dark Knight  Action
3              12 Angry Men   Drama
4              Pulp Fiction   Crime
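
Note the double brackets in the selection: passing a list of column names returns a DataFrame, whereas a single column name returns a Series. A short sketch of the difference, on a reduced copy of the sample data:

```python
import io

import pandas as pd

csv_data = """Title,Year,Genre,Runtime
The Godfather,1972,Crime,175
The Dark Knight,2008,Action,152"""

movies = pd.read_csv(io.StringIO(csv_data))

titles = movies["Title"]         # single name -> Series
title_frame = movies[["Title"]]  # list of names -> DataFrame

print(type(titles).__name__)       # Series
print(type(title_frame).__name__)  # DataFrame
```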

Manipulating DataFrames

Sorting Data

# Sort by Year in descending order
sorted_df = df.sort_values('Year', ascending=False)
print(sorted_df)

Output:

                      Title  Year   Genre  Runtime
2           The Dark Knight  2008  Action      152
0  The Shawshank Redemption  1994   Drama      142
4              Pulp Fiction  1994   Crime      154
1             The Godfather  1972   Crime      175
3              12 Angry Men  1957   Drama       96
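
Two movies in the sample share the year 1994. To control how such ties are ordered, sort_values() also accepts a list of columns, with one ascending flag per column. A minimal sketch, assuming you want the longer movie first within the same year:

```python
import io

import pandas as pd

csv_data = """Title,Year,Genre,Runtime
The Shawshank Redemption,1994,Drama,142
The Dark Knight,2008,Action,152
Pulp Fiction,1994,Crime,154"""

movies = pd.read_csv(io.StringIO(csv_data))

# Newest first; within the same year, longest runtime first.
by_year_runtime = movies.sort_values(
    ["Year", "Runtime"], ascending=[False, False]
)
print(by_year_runtime["Title"].tolist())
```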

Filtering Data

# Filter movies by genre
crime_movies = df[df['Genre'] == 'Crime']
print(crime_movies)

Output:

           Title  Year  Genre  Runtime
1  The Godfather  1972  Crime      175
4   Pulp Fiction  1994  Crime      154
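
Conditions can also be combined. The sketch below uses & (and) with each condition wrapped in parentheses, and isin() to match any value from a list. The threshold of 160 minutes and the variable names are arbitrary choices for illustration:

```python
import io

import pandas as pd

csv_data = """Title,Year,Genre,Runtime
The Shawshank Redemption,1994,Drama,142
The Godfather,1972,Crime,175
The Dark Knight,2008,Action,152
12 Angry Men,1957,Drama,96
Pulp Fiction,1994,Crime,154"""

movies = pd.read_csv(io.StringIO(csv_data))

# Both conditions must hold; the parentheses are required because
# & binds more tightly than == and >.
long_crime = movies[(movies["Genre"] == "Crime") & (movies["Runtime"] > 160)]

# Keep rows whose Genre is any of the listed values.
drama_or_action = movies[movies["Genre"].isin(["Drama", "Action"])]

print(long_crime["Title"].tolist())
print(drama_or_action["Title"].tolist())
```

Use | in place of & when either condition is enough.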

Grouping Data

# Group by Genre and calculate mean runtime
genre_stats = df.groupby('Genre')['Runtime'].mean()
print(genre_stats)

Output:

Genre
Action    152.0
Crime     164.5
Drama     119.0
Name: Runtime, dtype: float64
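
If you need more than one statistic per group, agg() accepts a list of aggregation names and returns one column per statistic. A minimal sketch on a four-row copy of the sample data; the choice of count, mean, and max is only an example:

```python
import io

import pandas as pd

csv_data = """Title,Year,Genre,Runtime
The Shawshank Redemption,1994,Drama,142
The Godfather,1972,Crime,175
12 Angry Men,1957,Drama,96
Pulp Fiction,1994,Crime,154"""

movies = pd.read_csv(io.StringIO(csv_data))

# One row per genre, one column per requested statistic.
genre_summary = movies.groupby("Genre")["Runtime"].agg(["count", "mean", "max"])
print(genre_summary)
```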

Writing DataFrames to CSV Files

Save your processed DataFrame back to a CSV file using to_csv():

# Create a modified DataFrame
df_modified = df[df['Runtime'] > 140]

# Convert to CSV string (simulating file write)
csv_output = df_modified.to_csv(index=False)
print("CSV Output:")
print(csv_output)

Output:

CSV Output:
Title,Year,Genre,Runtime
The Shawshank Redemption,1994,Drama,142
The Godfather,1972,Crime,175
The Dark Knight,2008,Action,152
Pulp Fiction,1994,Crime,154

Syntax

# Write to CSV file
df.to_csv('output.csv', index=False)

# Write with custom separator
df.to_csv('output.csv', sep=';', index=False)
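
To see why index=False is used, the following sketch writes a small DataFrame to a temporary directory twice, once without and once with the index, and reads both files back. The file names and the two-row DataFrame are made up for this example:

```python
import os
import tempfile

import pandas as pd

movies = pd.DataFrame(
    {"Title": ["The Godfather", "12 Angry Men"], "Year": [1972, 1957]}
)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file names inside a throwaway directory
    clean_path = os.path.join(tmp_dir, "movies.csv")
    indexed_path = os.path.join(tmp_dir, "movies_with_index.csv")

    movies.to_csv(clean_path, index=False)  # data columns only
    movies.to_csv(indexed_path)             # default: row index is written too

    restored = pd.read_csv(clean_path)
    with_index = pd.read_csv(indexed_path)

print(list(restored.columns))    # ['Title', 'Year']
print(list(with_index.columns))  # ['Unnamed: 0', 'Title', 'Year']
```

The extra "Unnamed: 0" column is the old row index, which read_csv() treats as ordinary data unless you pass index_col=0.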

Common Operations Summary

| Operation | Syntax | Purpose |
|---|---|---|
| Read CSV | pd.read_csv() | Load CSV into DataFrame |
| View Data | df.head() | Show first few rows |
| Get Info | df.shape | Get dimensions |
| Sort | df.sort_values() | Sort by column(s) |
| Filter | df[condition] | Filter rows |
| Save CSV | df.to_csv() | Export DataFrame |

Conclusion

DataFrames provide a powerful way to work with CSV data in Python. Use pd.read_csv() to load data, explore it with head() and describe(), and manipulate it with sorting, filtering, and grouping operations. Save your results with to_csv() for future use.

Updated on: 2026-03-27T06:00:13+05:30
