Often you may want to print a summary of a pandas DataFrame.
One of the most common ways to do so is by using the info() method, which uses the following syntax:
DataFrame.info(verbose=None, buf=None, max_cols=None, memory_usage=None, show_counts=None)
where:
- verbose: Whether to print the full summary, with one line per column
- buf: Where to send the output (a writable buffer; defaults to sys.stdout)
- max_cols: The number of columns above which the output switches from verbose to truncated
- memory_usage: Whether total memory usage of the DataFrame elements (including the index) should be displayed
- show_counts: Whether to show non-null counts
By using this single info() method, we are able to gain a good understanding of each column in a pandas DataFrame.
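One detail worth knowing is that info() prints its summary and returns None, so the text cannot be assigned to a variable directly. The sketch below shows one way to capture it with the buf argument and an in-memory buffer. The small DataFrame and the variable names are made up purely for illustration.

```python
import io

import pandas as pd

# small illustrative DataFrame (hypothetical data, not from this tutorial)
df = pd.DataFrame({'team': ['A', 'B', 'C'],
                   'points': [12, 14, 18]})

# info() returns None and writes to sys.stdout by default;
# passing a writable buffer via buf redirects the text there instead
buffer = io.StringIO()
result = df.info(buf=buffer)

# the summary is now an ordinary string that can be logged or saved
summary_text = buffer.getvalue()
print(summary_text)
```

Because the summary ends up as a plain string, it can be written to a log file or searched like any other text.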
The following example shows how to use the info() method in practice with a pandas DataFrame.
Example: How to Use the info() Method in Pandas
Suppose we create the following pandas DataFrame that contains information about various basketball players:
import pandas as pd
import numpy as np

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'B', 'B', 'C', 'C', 'C', 'D'],
                   'points': [12, 14, 18, 13, np.nan, np.nan, 20, 29],
                   'assists': [10, 22, 24, 20, 14, 18, 10, 12]})

#view DataFrame
print(df)

  team  points  assists
0    A    12.0       10
1    A    14.0       22
2    B    18.0       24
3    B    13.0       20
4    C     NaN       14
5    C     NaN       18
6    C    20.0       10
7    D    29.0       12
Suppose that we would like to generate a summary of each column in this particular DataFrame.
We can use the info() method to do so:
#print summary of DataFrame
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8 entries, 0 to 7
Data columns (total 3 columns):
 #   Column   Non-Null Count  Dtype
---  ------   --------------  -----
 0   team     8 non-null      object
 1   points   6 non-null      float64
 2   assists  8 non-null      int64
dtypes: float64(1), int64(1), object(1)
memory usage: 324.0+ bytes
The output displays a variety of information that summarizes the DataFrame.
Here is how to interpret each line in the output:
The first line shows the class of the object, which is a pandas DataFrame.
The second line describes the index, which we can see is a RangeIndex with 8 total entries that range from 0 to 7.
The next portion of the output shows the position, column name, non-null element count and dtype of each column.
For example, we can see:
- The team column is an object, i.e. a string column.
- The points column is a floating point number column.
- The assists column is an integer column.
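The second-to-last line shows how many columns there are of each dtype.
The last line in the output displays the total memory usage of the DataFrame elements. The + sign indicates that the true figure may be larger, since by default pandas does not inspect the contents of object columns when estimating memory.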
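Since info() only prints its summary, it can be handy to retrieve the same pieces of information as regular pandas objects. The following is a minimal sketch, using the same basketball DataFrame, of how the non-null counts, dtypes and memory usage can each be obtained separately. The variable names are only illustrative.

```python
import numpy as np
import pandas as pd

# same basketball DataFrame as in the example above
df = pd.DataFrame({'team': ['A', 'A', 'B', 'B', 'C', 'C', 'C', 'D'],
                   'points': [12, 14, 18, 13, np.nan, np.nan, 20, 29],
                   'assists': [10, 22, 24, 20, 14, 18, 10, 12]})

# number of non-null values in each column (returned as a Series)
non_null_counts = df.count()

# dtype of each column (returned as a Series)
column_dtypes = df.dtypes

# shallow memory estimate in bytes, with the index included
total_bytes = df.memory_usage(index=True).sum()

print(non_null_counts)
print(column_dtypes)
print(total_bytes)
```

These values can then be used in further code, for example to flag columns whose non-null count is lower than the number of rows.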
Note that we can set the show_counts and memory_usage arguments to False if we would like to omit the non-null counts for each column and the total memory usage of the DataFrame elements, which aren’t always of interest:
#print summary of DataFrame with less info
df.info(show_counts=False, memory_usage=False)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8 entries, 0 to 7
Data columns (total 3 columns):
 #   Column   Dtype
---  ------   -----
 0   team     object
 1   points   float64
 2   assists  int64
dtypes: float64(1), int64(1), object(1)
Notice that the output no longer shows the non-null count of elements in each column in the DataFrame and it no longer displays the total memory usage.
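The other arguments can be adjusted in a similar way. As a rough sketch, the code below uses verbose=False to drop the per-column table and memory_usage='deep' to request a more thorough memory calculation that inspects the contents of object columns. The info_text() helper is hypothetical and exists only to capture the printed text for this illustration.

```python
import io

import numpy as np
import pandas as pd

# same basketball DataFrame as in the example above
df = pd.DataFrame({'team': ['A', 'A', 'B', 'B', 'C', 'C', 'C', 'D'],
                   'points': [12, 14, 18, 13, np.nan, np.nan, 20, 29],
                   'assists': [10, 22, 24, 20, 14, 18, 10, 12]})


def info_text(frame, **kwargs):
    """Return the text that frame.info(**kwargs) would print (hypothetical helper)."""
    buffer = io.StringIO()
    frame.info(buf=buffer, **kwargs)
    return buffer.getvalue()


# truncated summary: one line describing all columns instead of a table
short_summary = info_text(df, verbose=False)

# full summary with a deep memory calculation for object columns
deep_summary = info_text(df, memory_usage='deep')

print(short_summary)
print(deep_summary)
```

The deep calculation can be slower on large DataFrames, so it is probably best reserved for cases where an accurate memory figure matters.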
Note: You can find the complete documentation for the info() method in the official pandas documentation.
Additional Resources
The following tutorials explain how to perform other common tasks in pandas:
How to Use qcut() in Pandas
How to Use pct_change() in Pandas
How to Use the map() Function in Pandas