The Data USA API allows users to explore the entire database using carefully constructed query strings, returning data as JSON results. All of the visualizations on the page have a "show data" button on their top-right that displays the API call(s) used to generate that visualization.
The following sections describe the main components of the API:
Below is a list of the core endpoints available:
/cubes
Returns a list of all available data cubes in the Tesseract server.
https://api.datausa.io/tesseract/cubes
/cubes/{name}
Returns the full schema of a specific cube, including its measures, dimensions, and levels.
https://api.datausa.io/tesseract/cubes/acs_yg_total_population_5
/members?cube=<name>&level=<level>
Returns all members (distinct categorization values) for a selected level of a given cube. This is useful for populating filters or understanding the scope of available data in a dimension level.
https://api.datausa.io/tesseract/members?cube=acs_yg_total_population_5&level=State
/data.{format}
This is the main endpoint used to fetch the data itself. You can query specific dimensions, measures, and filters to retrieve the dataset you're interested in.
The data.{format} endpoint allows you to choose the desired format of the response. Supported formats are:
csv Comma-separated valuestsv Tab-separated valuesparquet Apache Parquet columnar binary format (for big data processing)xlsx Microsoft Excel spreadsheetjsonarrays JSON array of arrays (compact)jsonrecords JSON array of objects (more readable)To learn how the data endpoint works, let's break down the anatomy of an example API request. In this case we analyze the population by state in 2023:
https://api.datausa.io/tesseract/data.jsonrecords
This is the root URL for the API. It will be the same for all requests. This will return a JSON object with a list of records. If you prefer a different format, you can change the extension to.csv, .xlsx, or .parquet.
?cube=acs_yg_total_population_5
This is the dataset you want to query. You can find the list of all available datasets by querying the cubes endpoint.
&drilldowns=State,Year
This is the drilldown you want to add to your query. Drilldowns represent the columns (or dimensions) that you want to see in the response. You can find the list of all available drilldowns for this specific cube (or dataset) by querying the acs_yg_total_population_5 cube endpoint.
&measures=Population
This is the measure you want to add to your query. Measures represent the columns (or metrics) that you want to see in the response. You can find the list of all available measures for this specific cube (or dataset) by querying the acs_yg_total_population_5 cube endpoint. It is important to note that measures will automatically be aggregated by the cube's default aggregation function.
&include=Year:2023
These are the filters you want to add to your query. Any column (or dimensions) availble to the specified cube can be used to filter the response. Multiple filters can be added by separating them with a semicolon. Filters take 2 parameters: the column name and the value you want to filter by. To find the available values for a specific column, you can query the members endpoint for that cube along with the level parameter. For example, to find the available values for the year column, you can query the 'Year' members endpoint for the acs_yg_total_population_5 cube.
&limit=100,0
This is the limit you want to add to your query where the first number is the limit and the second number is the offset. It is used to limit the number of records returned in the response.
To understand how the data is returned, let's break down the anatomy of an example response. In this case we're we'll stick with our previous example of the population by state in 2023:
The response will be a JSON object with the following structure:
{
"annotations":
{
"subtopic": "Demographics",
"dataset_link": "http://www.census.gov/programs-surveys/acs/",
"table_id": "B01003",
"dataset_name": "ACS 5-year Estimate",
"topic": "Diversity",
"source_name": "Census Bureau",
"source_description": "The American Community Survey (ACS) is conducted by the US Census and sent to a portion of the population every year."
},
"page":
{
"limit": 0,
"offset": 0,
"total": 52
},
"columns":
[
"State ID",
"State",
"Year",
"Population"
],
"data": [
{
"State ID": "04000US01",
"State": "Alabama",
"Year": 2023,
"Population": 5054253
},
{
"State ID": "04000US02",
"State": "Alaska",
"Year": 2023,
"Population": 733971
},
{
"State ID": "04000US04",
"State": "Arizona",
"Year": 2023,
"Population": 7268175
},
...
]
}The annotations object contains metadata about the dataset such as the source name, source description, topic, subtopic, and dataset name.
The page object contains metadata about the pagination of the data response, such as the limit, offset, and total number of records available for that set of parameters.
The columns object contains the list of columns in the dataset. Please note that even though we didn't specify to include the ID columns in the request, they are still returned.
The data object contains the list of records in the dataset. The keys are the column names (including both drilldown and measure columns) and the values are the values for that column for the given record.
The API supports a number of advanced parameters that allow you to filter and sort the data.
The include parameter allows you to specify the unique members of a dimension that should be considered when calculating measures. Users can specify one or more members to include, separated by commas (,). The format is as follows:
&include=<level_name>:<member_key>,<member_key>,<member_key>
You can specify one or more include parameters in the request, either in different keys or in the same key separated by semicolons (;).
&include=<level1>:<level1_member1>&include=<level2>:<level2_member1>,<level2_member2>
&include=<level1>:<level1_member1>;<level2>:<level2_member1>,<level2_member2>For example, the following API call will include data only for the 2023 and for Alabama:
In the same way that include specifies the unique members that should be considered, the exclude parameter specifies that all available members except those mentioned should be considered in the aggregation. The format is the same:
&exclude=<level1>:<level1_member1>;<level2>:<level2_member1>,<level2_member2>
&exclude=<level1>:<level1_member1>&include=<level2>:<level2_member1>,<level2_member2>For example, the following API call will return data for all available states except for Alabama:
The parents parameter allows you to include parent members in hierarchical dimensions. You can either specify specific levels to include parents for, or use a boolean value to include all parents. The format is as follows:
&parents=<level1>,<level2>
&parents=trueFor example, the following API call will include parent members for the County level:
The filters parameter allows you to filter results based on measure values using various comparison operations. The basic format is:
&filters=<measure>.<operation>
Null Operations
You can filter for null or non-null values using:
&filters=Population.isnull
&filters=Population.isnotnullComparison Operations
The following comparison operations are supported:
gt Greater Than (>)gte Greater Than or Equal (>=)lt Less Than (<)lte Less Than or Equal (<=)eq Equal (==)neq Not Equal (!=)For example, to filter for Population greater than 30 million:
&filters=Population.gt.30000000
You can also combine multiple conditions using .and./.or.. For example, to filter for population between 250,000 and 750,000:
&filters=Population.gt.250000.and.lt.750000
The API provides two parameters for ordering results: ranking and sort.
Ranking
The ranking parameter allows you to rank results based on a measure. You can specify the measure name with an optional minus sign for descending order, or use a boolean value. The format is:
&ranking=<measure>
&ranking=-<measure>
&ranking=<measure1>,-<measure2>
&ranking=trueSort
The sort parameter allows you to sort results by either a measure or a level (dimension). You can specify ascending or descending order. The format is:
&sort=<measure or level>.asc
&sort=<measure or level>.descFor example, to sort by Population in ascending order:
Or, to fetch the top state above a threshold of 30 million sorted by year:
The TopK parameter allows you to get the top N records for each unique combination of specified levels, sorted by a measure or level. The format is:
&top=<amount>.<level1>[,<level2>].<measure or level>.<order>
For example, to get the year of the highest population for each state:
Or, to get the 2 years with the highest population for each state:
Note: All columns mentioned in the parameter must be present in the request, either in drilldowns or measures.
The time parameter provides special filtering capabilities for time-based dimensions. It supports two main formats:
Latest/Oldest Format
This format allows you to fetch the most recent or oldest data points available in the dataset:
&time=<dimension>.latest[.amount]
&time=<dimension>.oldest[.amount]Trailing/Leading Format
This format allows you to set a time frame relative to the most/least recent record available in the dataset. Works similarly to the latest/oldest format, but is strictly time-based:
&time=<dimension>.trailing[.amount]
&time=<dimension>.leading[.amount]