Improve & better document cirrus debug & explainability APIs
Open, Needs Triage · Public · 5 Estimated Story Points

Description

In T410602 a contributor relied on some of the cirrus debug APIs to troubleshoot an issue with the search index. This was particularly useful: it let them give rapid feedback to other contributors about what was likely happening, and also write a detailed Phabricator bug report that greatly sped up troubleshooting by the cirrus maintainers.

We should better document these APIs so that it becomes easier to understand & explain search behaviors.

Missing debug APIs:

  • dump the completion index document (i.e. action=cirrusSuggestDump)
  • possibly allow specifying the cluster with action=cirrusDump

Document existing APIs:

  • indexed documents: action=cirrusDump (and future action=cirrusSuggestDump)
  • search explainability: cirrusDumpQuery, cirrusDumpResult, cirrusExplain
  • document building: cirrusbuilddoc & cirruscompsuggestbuilddoc
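
As a rough illustration of the indexed-document dump listed above, the sketch below builds a URL that requests action=cirrusDump for a page. It assumes the action is served through a wiki's index.php with a title parameter; the helper name cirrus_dump_url and the example page are hypothetical, and the shape of the response is not covered here.

```python
from urllib.parse import urlencode


def cirrus_dump_url(index_php: str, title: str) -> str:
    """Build a URL asking for the indexed document of one page.

    Assumption: action=cirrusDump is a page action reachable through
    index.php with a `title` parameter. The helper itself is hypothetical.
    """
    params = urlencode({"title": title, "action": "cirrusDump"})
    return f"{index_php}?{params}"


if __name__ == "__main__":
    # Example page chosen for illustration only.
    print(cirrus_dump_url("https://www.mediawiki.org/w/index.php", "Help:CirrusSearch"))
```

Opening the printed URL in a browser should be enough to inspect the document; no client library is needed.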

AC:

  • missing APIs are implemented
  • all APIs are documented in mw.org

Event Timeline

Restricted Application added a subscriber: Aklapper.
dcausse renamed this task from Improve & better document cirrus debug & exaplainability APIs to Improve & better document cirrus debug & explainability APIs. Nov 27 2025, 10:56 AM

As the person who filed T410602, it would have been nice to have an API taking in a search query and returning a set of results, as well as explaining why these results were chosen (e.g. title, related, displaytitle, defaultsort, etc.) and their ranking (pageviews, etc.). Possibly unrelated, but a Wikitech/MediaWiki page containing a step-by-step guide on how the Search system works (as well as the function names/files involved in each step) would have been helpful as well, since I found the codebase to be labyrinthine.

pfischer set the point value for this task to 5. Dec 1 2025, 4:52 PM

This does exist, but it's probably not too meaningful to people outside the search team:

https://www.mediawiki.org/w/index.php?search=opensearch&title=Special%3ASearch&profile=advanced&fulltext=1&cirrusDumpResult&cirrusExplain=pretty
https://www.mediawiki.org/w/index.php?search=opensearch&title=Special%3ASearch&profile=advanced&fulltext=1&cirrusDumpResult&cirrusExplain=verbose
https://www.mediawiki.org/w/index.php?search=opensearch&title=Special%3ASearch&profile=advanced&fulltext=1&cirrusDumpResult&cirrusExplain=raw

They are even less understandable on wikis that use ML for ranking:
https://en.wikipedia.org/w/index.php?search=opensearch&title=Special%3ASearch&profile=advanced&fulltext=1&cirrusDumpResult&cirrusExplain=pretty
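
The debug URLs above all share one shape, so they can be generated rather than typed by hand. The following is a minimal sketch that reproduces them; the parameter names and the three cirrusExplain formats are taken from those URLs, while the helper name explain_url is hypothetical.

```python
from urllib.parse import urlencode

# Formats seen in the example URLs above.
EXPLAIN_FORMATS = ("pretty", "verbose", "raw")


def explain_url(index_php: str, query: str, explain: str = "pretty") -> str:
    """Build a Special:Search URL that dumps results with score explanations."""
    if explain not in EXPLAIN_FORMATS:
        raise ValueError(f"unknown cirrusExplain format: {explain!r}")
    params = urlencode({
        "search": query,
        "title": "Special:Search",
        "profile": "advanced",
        "fulltext": "1",
    })
    # cirrusDumpResult appears as a bare flag (no value) in the URLs above.
    return f"{index_php}?{params}&cirrusDumpResult&cirrusExplain={explain}"


if __name__ == "__main__":
    for fmt in EXPLAIN_FORMATS:
        print(explain_url("https://www.mediawiki.org/w/index.php", "opensearch", fmt))
```

Swapping the index.php base for another wiki's gives the equivalent URL there, as in the en.wikipedia.org example.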

Working on the documentation at P86859. This is still very preliminary, but once it's pinned down better it will end up somewhere on mediawiki.org.

Change #1225644 had a related patch set uploaded (by Ebernhardson; author: Ebernhardson):

[mediawiki/extensions/CirrusSearch@master] Add unified cirrus-schema-dump API endpoint

https://gerrit.wikimedia.org/r/1225644

Change #1225661 had a related patch set uploaded (by Ebernhardson; author: Ebernhardson):

[mediawiki/extensions/CirrusSearch@master] Add action=cirrussuggestdump

https://gerrit.wikimedia.org/r/1225661

Documentation has been placed: https://www.mediawiki.org/wiki/Help:CirrusSearch/Debug

This is not yet linked anywhere. It will probably go somewhere in Help:CirrusSearch, but I haven't decided where yet. A couple of things documented there are still waiting on patch merge and deploy.

Change #1225644 merged by jenkins-bot:

[mediawiki/extensions/CirrusSearch@master] Add unified cirrus-schema-dump API endpoint

https://gerrit.wikimedia.org/r/1225644

Change #1225661 merged by jenkins-bot:

[mediawiki/extensions/CirrusSearch@master] Add action=cirrussuggestdump

https://gerrit.wikimedia.org/r/1225661

Not entirely sure, but this might be reasonable to include in tech news.

@EBernhardson how should it be worded?

@STei-WMF

I think the existing lead section should be short enough, something like:

New documentation is available for debugging on-site search. It is aimed at wiki editors who want to troubleshoot search issues, understand indexing behavior, or debug search relevancy. If a page isn't appearing in search results, results are ranked unexpectedly, or you want to see what content is actually being indexed, these tools can help. [https://www.mediawiki.org/wiki/Help:CirrusSearch/Debug Learn more].

@EBernhardson thank you. Should it go out in Monday 16th's Tech News? (The wording suggests so, but I want to double-check.)

Yup, this is ready to go. Thanks!