Introduction

The Data USA API allows users to explore the entire database using carefully constructed query strings, returning data as JSON results. All of the visualizations on the page have a "show data" button on their top-right that displays the API call(s) used to generate that visualization.

The following sections describe the main components of the API:

Endpoints

Below is a list of the core endpoints available:

/cubes

Returns a list of all available data cubes in the Tesseract server.

https://api.datausa.io/tesseract/cubes

/cubes/{name}

Returns the full schema of a specific cube, including its measures, dimensions, and levels.

https://api.datausa.io/tesseract/cubes/acs_yg_total_population_5

/members?cube=<name>&level=<level>

Returns all members (distinct categorization values) for a selected level of a given cube. This is useful for populating filters or understanding the scope of available data in a dimension level.

https://api.datausa.io/tesseract/members?cube=acs_yg_total_population_5&level=State

/data.{format}

This is the main endpoint used to fetch the data itself. You can query specific dimensions, measures, and filters to retrieve the dataset you're interested in.

The data.{format} endpoint allows you to choose the desired format of the response. Supported formats are:

  • csv Comma-separated values
  • tsv Tab-separated values
  • parquet Apache Parquet columnar binary format (for big data processing)
  • xlsx Microsoft Excel spreadsheet
  • jsonarrays JSON array of arrays (compact)
  • jsonrecords JSON array of objects (more readable)

Anatomy of a Request

To learn how the data endpoint works, let's break down the anatomy of an example API request. In this case we analyze the population by state in 2023:

https://api.datausa.io/tesseract/data.jsonrecords?cube=acs_yg_total_population_5&drilldowns=State,Year&measures=Population&include=Year:2023&limit=100,0

1. Root URL

https://api.datausa.io/tesseract/data.jsonrecords

This is the root URL for the API. It will be the same for all requests. This will return a JSON object with a list of records. If you prefer a different format, you can change the extension to.csv, .xlsx, or .parquet.

2. Cube (or dataset)

?cube=acs_yg_total_population_5

This is the dataset you want to query. You can find the list of all available datasets by querying the cubes endpoint.

3. Drilldowns

&drilldowns=State,Year

This is the drilldown you want to add to your query. Drilldowns represent the columns (or dimensions) that you want to see in the response. You can find the list of all available drilldowns for this specific cube (or dataset) by querying the acs_yg_total_population_5 cube endpoint.

4. Measures

&measures=Population

This is the measure you want to add to your query. Measures represent the columns (or metrics) that you want to see in the response. You can find the list of all available measures for this specific cube (or dataset) by querying the acs_yg_total_population_5 cube endpoint. It is important to note that measures will automatically be aggregated by the cube's default aggregation function.

5. Include (or filter)

&include=Year:2023

These are the filters you want to add to your query. Any column (or dimensions) availble to the specified cube can be used to filter the response. Multiple filters can be added by separating them with a semicolon. Filters take 2 parameters: the column name and the value you want to filter by. To find the available values for a specific column, you can query the members endpoint for that cube along with the level parameter. For example, to find the available values for the year column, you can query the 'Year' members endpoint for the acs_yg_total_population_5 cube.

6. Limit and Offset

&limit=100,0

This is the limit you want to add to your query where the first number is the limit and the second number is the offset. It is used to limit the number of records returned in the response.

Anatomy of a Response

To understand how the data is returned, let's break down the anatomy of an example response. In this case we're we'll stick with our previous example of the population by state in 2023:

https://api.datausa.io/tesseract/data.jsonrecords?cube=acs_yg_total_population_5&drilldowns=State,Year&measures=Population&include=Year:2023&limit=100,0

Response Format

The response will be a JSON object with the following structure:

{
  "annotations":
    {
      "subtopic": "Demographics",
      "dataset_link": "http://www.census.gov/programs-surveys/acs/",
      "table_id": "B01003",
      "dataset_name": "ACS 5-year Estimate",
      "topic": "Diversity",
      "source_name": "Census Bureau",
      "source_description": "The American Community Survey (ACS) is conducted by the US Census and sent to a portion of the population every year."
    },
  "page":
    {
      "limit": 0,
      "offset": 0,
      "total": 52
    },
  "columns":
    [
      "State ID",
      "State",
      "Year",
      "Population"
    ],
  "data": [
    {
      "State ID": "04000US01",
      "State": "Alabama",
      "Year": 2023,
      "Population": 5054253
    },
    {
      "State ID": "04000US02",
      "State": "Alaska",
      "Year": 2023,
      "Population": 733971
    },
    {
      "State ID": "04000US04",
      "State": "Arizona",
      "Year": 2023,
      "Population": 7268175
    },
    ...
  ]
}

1. Annotations

The annotations object contains metadata about the dataset such as the source name, source description, topic, subtopic, and dataset name.

2. Pagination

The page object contains metadata about the pagination of the data response, such as the limit, offset, and total number of records available for that set of parameters.

3. Columns

The columns object contains the list of columns in the dataset. Please note that even though we didn't specify to include the ID columns in the request, they are still returned.

4. Data

The data object contains the list of records in the dataset. The keys are the column names (including both drilldown and measure columns) and the values are the values for that column for the given record.

Advanced Parameters

The API supports a number of advanced parameters that allow you to filter and sort the data.

1. Include Members

The include parameter allows you to specify the unique members of a dimension that should be considered when calculating measures. Users can specify one or more members to include, separated by commas (,). The format is as follows:

&include=<level_name>:<member_key>,<member_key>,<member_key>

You can specify one or more include parameters in the request, either in different keys or in the same key separated by semicolons (;).

&include=<level1>:<level1_member1>&include=<level2>:<level2_member1>,<level2_member2>
&include=<level1>:<level1_member1>;<level2>:<level2_member1>,<level2_member2>

For example, the following API call will include data only for the 2023 and for Alabama:

https://api.datausa.io/tesseract/data.jsonrecords?cube=acs_yg_total_population_5&measures=Population&include=Year:2023;State:04000US01&drilldowns=State,Year

2. Exclude Members

In the same way that include specifies the unique members that should be considered, the exclude parameter specifies that all available members except those mentioned should be considered in the aggregation. The format is the same:

&exclude=<level1>:<level1_member1>;<level2>:<level2_member1>,<level2_member2>
&exclude=<level1>:<level1_member1>&include=<level2>:<level2_member1>,<level2_member2>

For example, the following API call will return data for all available states except for Alabama:

https://api.datausa.io/tesseract/data.jsonrecords?cube=acs_yg_total_population_5&measures=Population&exclude:State:04000US01&drilldowns=State

3. Parents

The parents parameter allows you to include parent members in hierarchical dimensions. You can either specify specific levels to include parents for, or use a boolean value to include all parents. The format is as follows:

&parents=<level1>,<level2>
&parents=true

For example, the following API call will include parent members for the County level:

https://api.datausa.io/tesseract/data.jsonrecords?cube=acs_yg_total_population_5&measures=Population&include=Year:2023&drilldowns=County,Year &parents=True

4. Filters

The filters parameter allows you to filter results based on measure values using various comparison operations. The basic format is:

&filters=<measure>.<operation>

Null Operations

You can filter for null or non-null values using:

&filters=Population.isnull
&filters=Population.isnotnull

Comparison Operations

The following comparison operations are supported:

  • gt Greater Than (>)
  • gte Greater Than or Equal (>=)
  • lt Less Than (<)
  • lte Less Than or Equal (<=)
  • eq Equal (==)
  • neq Not Equal (!=)

For example, to filter for Population greater than 30 million:

&filters=Population.gt.30000000

You can also combine multiple conditions using .and./.or.. For example, to filter for population between 250,000 and 750,000:

&filters=Population.gt.250000.and.lt.750000

5. Ranking and Sort

The API provides two parameters for ordering results: ranking and sort.

Ranking

The ranking parameter allows you to rank results based on a measure. You can specify the measure name with an optional minus sign for descending order, or use a boolean value. The format is:

&ranking=<measure>
&ranking=-<measure>
&ranking=<measure1>,-<measure2>
&ranking=true

Sort

The sort parameter allows you to sort results by either a measure or a level (dimension). You can specify ascending or descending order. The format is:

&sort=<measure or level>.asc
&sort=<measure or level>.desc

For example, to sort by Population in ascending order:

https://api.datausa.io/tesseract/data.jsonrecords?cube=acs_yg_total_population_5&measures=Population&include=Year:2023&drilldowns=State,Year &sort=Population.asc

Or, to fetch the top state above a threshold of 30 million sorted by year:

https://api.datausa.io/tesseract/data.jsonrecords?cube=acs_yg_total_population_5&measures=Population&drilldowns=State,Year&filters=Population.gt.30000000 &sort=Year.desc

6. TopK

The TopK parameter allows you to get the top N records for each unique combination of specified levels, sorted by a measure or level. The format is:

&top=<amount>.<level1>[,<level2>].<measure or level>.<order>

For example, to get the year of the highest population for each state:

https://api.datausa.io/tesseract/data.jsonrecords?cube=acs_yg_total_population_5&measures=Population&drilldowns=State,Year&top=1.State.Population.desc

Or, to get the 2 years with the highest population for each state:

https://api.datausa.io/tesseract/data.jsonrecords?cube=acs_yg_total_population_5&measures=Population&drilldowns=State,Year&top=2.State.Population.desc

Note: All columns mentioned in the parameter must be present in the request, either in drilldowns or measures.

7. Time

The time parameter provides special filtering capabilities for time-based dimensions. It supports two main formats:

Latest/Oldest Format

This format allows you to fetch the most recent or oldest data points available in the dataset:

&time=<dimension>.latest[.amount]
&time=<dimension>.oldest[.amount]

Trailing/Leading Format

This format allows you to set a time frame relative to the most/least recent record available in the dataset. Works similarly to the latest/oldest format, but is strictly time-based:

&time=<dimension>.trailing[.amount]
&time=<dimension>.leading[.amount]