Frequently Asked Questions (FAQ)

What is GovQuery?

We built an AI-powered search engine to make key government sources easier to explore and access. It provides a single place to search across a range of sources, as well as the ability to search inside tens of thousands of documents, with deep linking of results to the exact page where text can be found.

We started by prioritizing sources that offer insights into program integrity and accountability. But as we built the search engine, we saw the broader value these sources provide in helping people understand how government operates. That perspective continues to guide our approach to expanding our data coverage, which currently includes:

Executive Orders

Executive Orders for the last 6 presidents, as provided by the Federal Register.

Department of Justice

All 200 thousand press releases, as extracted from the DOJ API.

Congressional Research Service

All 22 thousand reports, as found on EveryCRSReport.com.

Government Accountability Office

10.5 thousand reports published since 2010, and 5.2 thousand open oversight recommendations.

Offices of Inspectors General

30 thousand federal reports from Oversight.gov published since 2000, and 12.5 thousand open oversight recommendations.

Because we collect recommendations that predate 2010, we also selectively include the pre-2010 reports that contain them. The source websites only provide open recommendations, so PIA (the Program Integrity Alliance) infers that a recommendation has been closed when it disappears from the open datasets.
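
To make that inference concrete, here is a minimal sketch, assuming two weekly snapshots of open-recommendation IDs are diffed and anything that disappears is marked closed. The function and field names are illustrative, not PIA's actual pipeline.

```python
from datetime import date

def infer_closures(previous_open: set, current_open: set) -> dict:
    """Mark recommendations as closed when they vanish from the open dataset.

    previous_open / current_open are sets of recommendation IDs taken from
    last week's and this week's scrapes of the open-recommendations listings.
    """
    closed_ids = previous_open - current_open  # present last week, gone this week
    return {
        rec_id: {"status": "Closed (inferred)", "inferred_on": date.today().isoformat()}
        for rec_id in closed_ids
    }

# Example: rec-002 no longer appears in the open dataset, so it is inferred closed.
updates = infer_closures({"rec-001", "rec-002"}, {"rec-001", "rec-003"})
```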

Note: the totals above will change each week as new data is ingested.

How far back do the data and reports in PIA’s search engine go? 

We capture all open recommendations from Oversight.gov and GAO’s public recommendations database. We have also collected all GAO reports published since 2010 and OIG reports published since 2000, along with any earlier reports that are cited in oversight recommendations. The only exception: approximately 15% of reports listed on Oversight.gov do not have working links to the actual report files, typically because the report is hosted on another website.

How does PIA’s search engine differ from the search features on Oversight.gov and gao.gov?

Oversight.gov and GAO offer powerful search engines for finding reports based on their descriptions and other attributes, and we are not aiming to replace them. Where we add value is in enabling users to search within reports; our results link to the exact page in a report where the text appears. We have also incorporated AI features that support searching for images by their descriptions and filtering by PIA’s AI-generated tags. Finally, by offering multiple data sources in one search interface, we enable global searches across all of them.
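
To illustrate the page-level linking, here is a minimal sketch in which each report is indexed one page at a time, so a search hit carries its page number and can deep-link via the standard `#page=` PDF viewer fragment. The index structure and names are illustrative, not PIA's actual implementation.

```python
def index_report_pages(report_id: str, pdf_url: str, pages: list) -> list:
    """Create one search-index entry per PDF page so results can deep-link."""
    return [
        {
            "doc_id": f"{report_id}-p{page_num}",
            "report_id": report_id,
            "page": page_num,
            "text": text,
            # Standard PDF fragment: opens the viewer at the matching page.
            "deep_link": f"{pdf_url}#page={page_num}",
        }
        for page_num, text in enumerate(pages, start=1)
    ]

entries = index_report_pages("GAO-24-123", "https://example.gov/GAO-24-123.pdf",
                             ["page one text...", "page two text..."])
```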

What does PIA’s search engine offer that other web search engines like Google do not?

Public search engines like Google do an excellent job of indexing public government data sources, including PDF reports from GAO and Oversight.gov. However, these search engines typically surface results at a high level compared to our search engine, often returning only a single link per report rather than linking directly to relevant individual pages. They also do not allow users to efficiently conduct targeted searches of open recommendations across oversight bodies, as PIA’s search engine does.  

How does the agency filter work in GovQuery?

This filter lets you narrow your search by the agencies associated with reports. Where possible, we use the agencies assigned to documents in the source system; when this information isn’t available, we use AI to tag documents automatically. The agencies on documents are defined as follows:
 
  • GAO – The same agencies found in the ‘by Agency’ filter on GAO’s reports and testimonies search page.
  • Oversight.gov – The ‘Agencies Reviewed/Investigated’ field on the report pages (see here for an example).
  • CRS Reports and Federal Register Executive Orders – Since the source systems don’t associate documents with agencies, we use AI to predict which agencies each document belongs to by looking for agencies explicitly mentioned in its text.
In all cases, we include the parent agency whenever a sub-tier agency is identified, and we map agencies to USASpending.gov agency codes, as sketched below.
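
As a rough illustration of that mapping step, the sketch below resolves a raw agency name against a small USASpending.gov-style lookup and attaches the parent top-tier agency whenever a sub-tier agency matches. The lookup entries, codes, and field names are invented for illustration.

```python
# Illustrative lookup: agency name -> (agency code, parent top-tier name or None)
AGENCY_LOOKUP = {
    "Department of Health and Human Services": ("075", None),
    "Centers for Medicare & Medicaid Services": ("7530", "Department of Health and Human Services"),
}

def map_agencies(raw_name: str) -> list:
    """Return the matched agency plus its parent top-tier agency, if any."""
    match = AGENCY_LOOKUP.get(raw_name.strip())
    if match is None:
        return []  # unmapped: the name didn't align with the reference data
    code, parent = match
    agencies = [{"name": raw_name.strip(), "code": code}]
    if parent is not None:  # sub-tier match, so include the parent agency too
        parent_code, _ = AGENCY_LOOKUP[parent]
        agencies.append({"name": parent, "code": parent_code})
    return agencies
```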

 

Recommendation Spotlight FAQ

What is the Recommendation Spotlight?

The Recommendation Spotlight is a curated, searchable tool that makes it easier to access and use federal oversight recommendations. It draws from reports and recommendation databases published by the U.S. Government Accountability Office (GAO) and federal Offices of Inspector General (OIGs), primarily via GAO’s website and Oversight.gov, a website that consolidates all public reports of OIGs that are members of the Council of the Inspectors General on Integrity and Efficiency (CIGIE).

The Spotlight is built on PIA’s custom data lakehouse, which brings together documents and data from multiple public sources in one place. It allows us to clean, organize, and analyze large volumes of information and power the AI tools behind the Spotlight. Recommendation text appears exactly as published in the original reports. PIA does not rewrite or reinterpret any content. The image below provides an overview of our process for creating the Spotlight.

[Diagram: recommendations data flow into PIA's lakehouse]

What is the purpose of the Recommendation Spotlight?

We created the Spotlight to make federal oversight recommendations easier to find and act on, helping policymakers, civil servants, and researchers quickly identify evidence-based findings to improve government and safeguard integrity.

While oversight bodies publish recommendations on their websites and Oversight.gov, they are often difficult to compare across agencies or to analyze by theme or topic at scale. The Spotlight addresses these challenges by organizing oversight data in a way that supports targeted searches, thematic analysis, and comparative insights.

What makes the Spotlight different from other search tools or databases with oversight recommendations?

Unlike generic search engines or agencies’ websites, the Spotlight combines recommendations across multiple oversight platforms and standardizes them into a single, searchable interface. We enrich the information with structured metadata, AI-assisted classifications, and filters that support deeper analysis and practical use.

How should I cite data or visualizations from this site?

We strongly encourage citing both the original source agencies and the Program Integrity Alliance (PIA) when referencing data, visualizations, or analysis from this site.

Most of the information presented is drawn directly from public sources such as the U.S. Government Accountability Office (GAO) and Oversight.gov (the Office of Inspector General community). We always link back to these sources, and we ask that you include them in your citations.

At the same time, the Program Integrity Alliance adds value by:

  • Extracting and parsing data from unstructured formats (e.g. web pages, PDFs),
  • Cleaning and normalizing data across agencies and formats,
  • Enriching records with additional metadata and classifications (e.g. agency mapping, policy domain, integrity tags),
  • Providing curated summaries, search tools, and visualizations to support exploration and research.

Suggested citation
U.S. Government Accountability Office; various Offices of Inspector General. Data retrieved via Program Integrity Alliance (PIA). Spotlight on Oversight Recommendations. programintegrity.org

Where feasible, please also include:

  • A direct link to the original recommendation or report, and
  • The name of the issuing source agency (e.g., GAO, HHS OIG, DoD OIG).

Citing both PIA and the original data sources helps promote transparency, accountability, and the continued improvement of public-interest tools like this one.

 

Data Sources & Content

What are the specific data sources used to build the Spotlight?

The Spotlight consolidates recommendations from GAO and federal OIGs, which generally produce them during their audits, evaluations, and investigations of government programs.

Does the Spotlight include all recommendations from GAO and federal OIGs?

We have close to 100% of all recommendations made available by GAO and Oversight.gov. However, some recommendations may not be publicly released, and others may be missing key information—such as agency affiliation—that affects whether they appear in aggregated data or filtered views.

As a nonprofit working with open data, PIA does not have direct access to internal GAO or OIG systems. Our coverage depends on publicly available sources, and while we strive to ingest and process all accessible recommendations, a small fraction may be excluded or only partially represented due to limitations in the source data itself (see below for further details). We are continuously improving our processes to better identify and include these cases.

If you have concerns about completeness or want to cross-check recommendations, we encourage you to also browse the websites linked above or individual OIG websites directly.

How often is the Recommendation Spotlight updated?

Updates occur once a week on weekends to avoid peak hours. The timing of when new recommendations appear in Spotlight depends on the release of new reports and the processing timeline for incorporating them.

How far back do the data and reports in PIA’s search engine go?

We capture all recommendations from Oversight.gov and GAO’s public recommendations database. We have also collected all GAO reports published since 2010 and OIG reports published since 2000, as well as any earlier reports that are referenced in oversight recommendations.

What fields are available in the recommendations data?

The fields in the Spotlight include metadata and information made publicly available on GAO’s website and Oversight.gov and in collected reports. There are also several fields that PIA created using rules-based filters or predicted using AI based on the original information. These fields are indicated below in parentheses (i.e., “PIA” and “PIA AI”).

The fields available in the Spotlight are defined as follows:

  • Source: Indicates whether the recommendation comes from GAO or a federal OIG via the Oversight.gov website.
  • Entity: The entity reviewed in the recommendation (as indicated in the source systems, GAO or Oversight.gov). This can be a top-tier or sub-tier agency. Note: GAO sets the agency field to “Congress” for any recommendations considered a matter for Congress.
  • Top-Tier Agency: The top-tier federal agency for the recommendation, as mapped using agency codes from USAspending.gov. We were not able to map some agencies (< 0.5%) in the source data because the agency name on the recommendation didn’t align with standard names in the agency data.
  • Recommendation: The specific recommendation made in the report.
  • Report: The title of the report where the recommendation appears.
  • Report hosting page: The web page on GAO or Oversight.gov where the report is published. This page may include background and additional context.
  • Report URL: The direct link to the downloadable PDF version of the report.
  • Status: Indicates whether the recommendation is still open or has been resolved. Most are marked “Open,” as the Spotlight focuses on unresolved recommendations.
  • Age: Length of time the recommendation has remained open, grouped by years.
  • Integrity-related (PIA AI): Indicates whether the recommendation is related to program integrity, as predicted using a large language model and the recommendation text. If there is any uncertainty due to insufficient context, we categorize the recommendation as “No” for this field. See “How does PIA use AI or Large Language Models (LLMs) to build and maintain the Spotlight?” below for more details on our AI process.
  • Priority flag: Marks whether the recommendation is considered priority. For GAO this is set in the “Priority” column of the recommendations CSV downloaded from their recommendations search. For Oversight.gov, this is determined by the column “Significant Recommendation” in the recommendation table found on “View Report” web pages.
  • GAO Audit Topics: The subject areas GAO assigned to the audit.
  • Fraud Risk Management Theme (PIA AI): The Fraud Risk Management Theme associated with the recommendation, where the possible values are: “Governance & Capacity”, “Risk Assessment”, “Design & Implementation of Controls”, “Detection”, “Response & Evaluation”, “Data & Technology”, “Other”. This field is set using AI to predict the category based on the recommendation text. As a foundation for defining the categories, we referenced the Government Accountability Office’s Framework for Managing Fraud Risks in Federal Programs, as well as the Chief Financial Officers Council and the Bureau of the Fiscal Service’s Program Integrity: the Antifraud Playbook. See also “How does PIA use AI or Large Language Models (LLMs) to build and maintain the Spotlight?” below for more details on our AI process.
  • Matter for Congress (PIA): Indicates whether the recommendation was intended for Congressional action, as marked on GAO recommendations; for these, the agency is set to “Congress.”
  • Report Type: The type of report the recommendation comes from. For Oversight.gov, this is based on the report type listed in the recommendation search results. For GAO, the type defaults to “Report.”
  • Recommendation URL: The direct link to the recommendation on the Oversight.gov website, where known. Recommendation-level links are not available on GAO’s website, so for GAO this field links to the report’s hosting page instead.
  • Issue date: The date when the recommendation was officially released.
  • Federal Fiscal Year: The federal fiscal year corresponding to the recommendation’s issue date; each fiscal year starts on October 1 and ends on September 30 of the following calendar year (see the sketch after this list).
  • Covid Year: Whether the recommendation was issued during the primary years of the COVID-19 pandemic in the U.S. (i.e., 2020, 2021, or 2022).  
  • Agency Comments: Any comments or responses from the audited agency about the recommendation. Available for GAO reports; not yet included for Oversight.gov. For GAO this is provided in the “Comment” column of the recommendations CSV downloaded from the recommendations database.
  • Report Author: The author of the report the recommendation is from. For GAO this is provided in the “Author” column of the recommendations CSV downloaded from the recommendations search. For Oversight.gov, this field is set to the Auditing OIG as provided in the recommendations search results.
  • Report Issue Date: The date the full report (not just the recommendation) was published.
  • Report ID: The unique ID assigned to the report in the GAO or Oversight.gov system.
  • Data Updated (PIA): The most recent date this information was added or updated in PIA’s database.
  • PIA Id: An internal tracking number used by PIA to reference this recommendation in its database.
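
As promised above, here is a worked example of the derived date fields (Federal Fiscal Year, Covid Year, and the Age grouping). It is a minimal sketch: the helper names are illustrative, and it assumes Covid Year is based on the calendar year of the issue date.

```python
from datetime import date

def federal_fiscal_year(issued: date) -> int:
    """FY runs Oct 1 - Sep 30; dates in Oct-Dec belong to the *next* FY."""
    return issued.year + 1 if issued.month >= 10 else issued.year

def covid_year(issued: date) -> bool:
    """Issued during the primary U.S. pandemic years (calendar years assumed)."""
    return issued.year in (2020, 2021, 2022)

def age_in_years(issued: date, today: date) -> int:
    """Whole years the recommendation has remained open, for the Age grouping."""
    return (today - issued).days // 365

assert federal_fiscal_year(date(2021, 10, 15)) == 2022  # Oct 2021 falls in FY 2022
assert covid_year(date(2021, 3, 1)) is True
```
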
Do you edit or adapt the recommendations at all? 

No. We use the recommendations exactly as they appear from the original sources, without any edits. We do add extra AI-assisted fields to make data analysis easier (as indicated using “PIA AI”). These fields rely on the original recommendation text as a reference.

Does the Spotlight include recommendations by state and local oversight bodies as well?

Currently, the Spotlight only includes recommendations issued by GAO and federal OIGs reported on Oversight.gov, as described above. These recommendations generally target federal spending, which can involve different levels of government.

Are there any limitations or known issues with the data used in the Spotlight? 

Oversight.gov Recommendations 

Though PIA’s recommendation totals have consistently matched the totals published on the GAO website, minor discrepancies exist between PIA and Oversight.gov. Oversight.gov’s public data is only accessible via its website, which requires complex parsing that depends on the internal consistency of the site’s content. Additionally, we have identified the following:

  • The total number of recommendations on Oversight.gov is approximately 1.5% (around 280 recommendations) higher than what we were able to extract. We believe this discrepancy may relate to the issues described below; however, without access to the internal system, we cannot determine the exact cause. 
  • As of August 7, 2025, the priority filter in the Oversight.gov search engine returns only 66 priority recommendations. In contrast, the individual recommendation pages indicate that approximately 3,500 recommendations have “Significant recommendation” marked as “Yes.” 
  • Approximately 15% of recommendations only have data presented in recommendations search results pages, and do not have a corresponding “View Report” or PDF report link. These recommendations are therefore missing fields in the source system, such as “Significant Recommendation.” 
  • Around 2% of recommendations have blank text for the recommendation itself. 
  • Another 2% have the recommendation text redacted due to sensitivity or security concerns. 
  • Seven recommendations are not associated with a report or agency. 
  • Fewer than 0.5% of recommendations are duplicated in the source system, where the same recommendation text appears twice within the same report. 

GAO Recommendations

  • Recommendations flagged as “matters for Congress” are assigned to the agency “Congress,” even when they relate to specific agencies. 
  • Sensitive reports have been redacted, so their associated recommendations are likely excluded.

Finally, we mapped the top-tier agency for each recommendation to agency data from USAspending.gov, but were unable to map 0.3% of recommendations because the agency on the recommendation didn’t match any known top-tier or sub-tier agencies in the USAspending data.  

Can I bookmark searches, table settings, and trace diagrams in Spotlight?

Yes! The web address can be bookmarked or shared so that you can return to exactly where you left off.


Technology and Use of Artificial Intelligence

How does PIA use AI or Large Language Models (LLMs) to build and maintain the Spotlight?

LLMs are used to extract, summarize, and classify recommendations from lengthy oversight reports. The models help identify key themes and assign relevant tags, improving searchability and organization. They are especially useful where there isn’t enough labeled data to train a traditional machine learning model.

AI is also used as part of the software development process to help develop data processing pipelines.  

How do you ensure the accuracy and reliability of AI-generated outputs?

All AI-generated outputs are grounded in GAO and Oversight.gov data and information; the models never generate outputs based on their raw web training data. Every LLM output is automatically evaluated and then closely reviewed by subject-matter experts before inclusion. We combine automation with human oversight to maintain quality and trustworthiness.

How do you develop the LLM process and test its accuracy?

For the Recommendations Spotlight, where we apply AI (specifically LLMs) to predict tags based on recommendation text, we developed a process that minimizes the burden on human experts while still relying on their input to define and evaluate the AI.  

We begin by writing a prompt based on expert-defined rules that guide how to assign tags. We then use this to generate an initial draft set of predicted tags across the whole dataset, which helps reveal the distribution of tag values despite some expected inaccuracies. We then sample a balanced evaluation set of 200 records and ask human reviewers to verify and correct the predicted tags. Next, we re-run the LLM on this evaluation set and compare its predictions to the human-reviewed tags, calculating metrics such as accuracy and recall. We analyze mismatches to determine whether the LLM or the human reviewers made an error, and we refine the prompt or dataset accordingly.  

We repeat this cycle of evaluation and adjustment until we stabilize both the prompt and the evaluation set. Finally, we apply advanced techniques—such as Chain of Thought prompting or experimenting with different LLMs and settings—to further improve accuracy, logging our evaluation metrics to track progress. 

The final evaluation is run automatically as part of our data processing pipelines so that we can track any drift in accuracy. 
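
To make the evaluation step concrete, here is a minimal sketch of scoring LLM-predicted tags against the human-reviewed evaluation set, computing overall accuracy and per-tag recall. The code is illustrative, not PIA's actual pipeline.

```python
from collections import Counter

def evaluate(predicted: list, reviewed: list) -> dict:
    """Compare LLM tags to human-reviewed tags; return accuracy and per-tag recall."""
    assert len(predicted) == len(reviewed)
    correct = sum(p == r for p, r in zip(predicted, reviewed))
    support = Counter(reviewed)  # number of human-reviewed examples per tag
    hits = Counter(r for p, r in zip(predicted, reviewed) if p == r)
    recall = {tag: hits[tag] / n for tag, n in support.items()}
    return {"accuracy": correct / len(reviewed), "recall": recall}

metrics = evaluate(
    predicted=["Detection", "Risk Assessment", "Detection"],
    reviewed=["Detection", "Risk Assessment", "Response & Evaluation"],
)
# accuracy 0.67; recall: Detection 1.0, Risk Assessment 1.0, Response & Evaluation 0.0
```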

What LLMs or AI tools are used in the Spotlight’s backend?

All data used in Spotlight is in the public domain, but to follow best data management practice we only use LLMs hosted within PIA’s cloud infrastructure in Microsoft Azure, which ensures data is never shared beyond the PIA organization.

The Azure-hosted models used in Spotlight are GPT-4.1 Mini and GPT-4.1, applied in the following areas:

  • Tagging of recommendations to enable richer analysis 
  • As part of AI search of recommendations and reports when indexing documents and summarizing search results 
  • As part of optimizing and accelerating the software development process to create PIA’s products 

All models have real-time, granular logging in place to ensure observability, and they have been configured to use content safety filters.
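
For illustration, a tagging call to an Azure-hosted model might look like the sketch below, using the `AzureOpenAI` client from the `openai` Python SDK. The endpoint, deployment name, API version, and prompt are placeholders, not PIA's production configuration.

```python
import os
from openai import AzureOpenAI  # openai >= 1.x SDK

# Placeholders: the endpoint and deployment names depend on the Azure setup.
client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-06-01",
)

THEMES = ["Governance & Capacity", "Risk Assessment",
          "Design & Implementation of Controls", "Detection",
          "Response & Evaluation", "Data & Technology", "Other"]

def tag_theme(recommendation_text: str) -> str:
    """Ask a GPT-4.1 Mini deployment to pick exactly one fraud risk theme."""
    response = client.chat.completions.create(
        model="gpt-4.1-mini",  # Azure deployment name (placeholder)
        messages=[
            {"role": "system",
             "content": ("Classify the recommendation into exactly one of: "
                         + "; ".join(THEMES) + ". Answer with the theme name only.")},
            {"role": "user", "content": recommendation_text},
        ],
        temperature=0,  # deterministic tagging
    )
    return response.choices[0].message.content.strip()
```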

How do you extract and classify recommendations from oversight reports?

We use a combination of natural language processing, prompt engineering, and rules-based filtering to extract recommendation text and assign categories based on defined taxonomies. Human review ensures consistency.
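
As a toy example of the rules-based side, the sketch below uses simple cue-phrase patterns to pull candidate recommendation sentences out of report text before any LLM classification. The patterns are invented for illustration and far simpler than what production parsing requires.

```python
import re

# Illustrative cue phrases; real oversight reports use fairly consistent wording.
RECOMMENDATION_CUES = re.compile(
    r"\b(we recommend|should (?:develop|implement|establish|ensure)|"
    r"the (?:secretary|administrator|director) should)\b",
    re.IGNORECASE,
)

def candidate_recommendations(report_text: str) -> list:
    """Return sentences that look like recommendations, for downstream review."""
    sentences = re.split(r"(?<=[.!?])\s+", report_text)
    return [s.strip() for s in sentences if RECOMMENDATION_CUES.search(s)]

sample = ("The program lacks controls. We recommend that the agency implement "
          "data matching. Background follows.")
print(candidate_recommendations(sample))
# -> ['We recommend that the agency implement data matching.']
```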

Is the AI summarizing recommendations or just helping retrieve them?

We present tabular recommendations data as-is, without summarization. AI is, however, used in Search to provide a concise summary of search results with supporting citations.

How do you prevent bias or errors introduced by AI?

By grounding AI responses in GAO and Oversight.gov data only, we mitigate the bias risk associated with wider content extracted from the web. We also perform human review to assess and score AI outputs and ensure they remain unbiased.

How are you using AI for software development? 

AI has become an integral part of accelerating the development of our data pipelines, user interfaces, and website assets. The main tools we use are the Cursor AI code editor and GitHub Copilot.

Verification & Trustworthiness

How do you verify the recommendations are accurate and up to date?

All recommendations are directly linked to primary source documents and are reviewed for accuracy. We make updates when oversight bodies publish progress or changes concerning the implementation of recommendations.

Are recommendations linked to the original oversight reports?

Yes. Each entry includes a link to the source document, typically hosted on Oversight.gov, gao.gov, or agency websites.

Feedback & Future Development

How can I provide feedback or request new features?

You can send feedback through this form. We welcome suggestions on functionality, usability, and additional features.

How do you plan to improve the Spotlight in the future?

We will continue to improve the Spotlight based on user feedback and the needs of our communities. This includes feedback on AI-assisted fields that help auditors and researchers more quickly identify recommendations relevant to their work. We will also continue to improve our processes for ingesting and categorizing recommendations as oversight bodies issue new ones.