Skip to content
This repository was archived by the owner on Sep 30, 2024. It is now read-only.

scip-ctags: better error propagation#61712

Merged
jtibshirani merged 2 commits into
mainfrom
jtibs/ctags-panic
Apr 10, 2024
Merged

scip-ctags: better error propagation#61712
jtibshirani merged 2 commits into
mainfrom
jtibs/ctags-panic

Conversation

@jtibshirani

@jtibshirani jtibshirani commented Apr 9, 2024

Copy link
Copy Markdown
Contributor

This change corrects the error handling in scip-ctags, making it more
consistent with universal-ctags. We now split errors into two classes:

  • Fatal: user command is invalid or incorrectly formatted, I/O errors. This
    indicates a big issue in how we're using ctags and we should fail hard.
  • Non-fatal: any file parsing issue (invalid UTF-8, unsupported language, or all symbols
    can't be parsed).

Our go-ctags wrapper only reports fatal errors, and just logs non-fatal errors.
We fail indexing for an entire Zoekt repo on go-ctags errors, which helps
surface serious issues with ctags, while allowing parse failures for individual files.

Fixes https://github.com/sourcegraph/sourcegraph/issues/61785
Fixes ENG-23595

Test plan

Added a new snapshot test for an error case, and made sure the output looked
good. Manual testing, including indexing the github-linguist/linguist repo
locally and checking it now succeeds.

@cla-bot cla-bot Bot added the cla-signed label Apr 9, 2024
@github-actions github-actions Bot added team/product-platform team/search-platform Issues owned by the search platform team labels Apr 9, 2024
Ok(vals) => vals,
Err(err) => {
// TODO: Not sure I want to keep this or not
#[cfg(debug_assertions)]

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed this since I think we should always surface errors instead of silently squashing them.

@jtibshirani

jtibshirani commented Apr 9, 2024

Copy link
Copy Markdown
Contributor Author

Highlighting that I chose to make "unsupported language" a non-fatal error. In terms of user impact, this means we won't fail indexing an entire Zoekt repo if we can't parse a single file due to its language.

Background: this happens sometimes because we can misroute files. Zoekt performs precise language detection, so it classifies iamjs.pl as Javascript and passes it to scip-ctags. However, scip-ctags thinks this is Perl, which it can't handle. (We should fix this too, but it's been a problem for a while, both with universal-ctags and scip-ctags!)

Note: as a follow-up, we should improve Zoekt error alerting to clearly surface when we fail to parse a file, so we can iteratively fix these issues.

@jtibshirani jtibshirani marked this pull request as ready for review April 9, 2024 17:58
@mmanela mmanela requested a review from varungandhi-src April 9, 2024 18:27

@varungandhi-src varungandhi-src left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. Looks directionally OK, but it seems a bit odd that we would start treating malformed requests as fatal errors -- is that for matching the behavior of universal-ctags?

Comment thread docker-images/syntax-highlighter/crates/syntax-analysis/testdata/example.pl Outdated
@jtibshirani jtibshirani merged commit 47ab5f5 into main Apr 10, 2024
@jtibshirani jtibshirani deleted the jtibs/ctags-panic branch April 10, 2024 16:13
sourcegraph-release-bot pushed a commit that referenced this pull request Apr 12, 2024
This change corrects the error handling in scip-ctags, making it more
consistent with universal-ctags. We now split errors into two classes:
* Fatal: user command is invalid or incorrectly formatted, I/O errors. This
indicates a big issue in how we're using ctags and we should fail hard.
* Non-fatal: any file parsing issue (invalid UTF-8, unsupported language, or all symbols
can't be parsed).

Our go-ctags wrapper only reports fatal errors, and just logs non-fatal errors.
We fail indexing for an entire Zoekt repo on go-ctags errors, which helps
surface serious issues with ctags, while allowing parse failures for individual files.

(cherry picked from commit 47ab5f5)
BolajiOlajide pushed a commit that referenced this pull request Apr 15, 2024
scip-ctags: better error propagation (#61712)

This change corrects the error handling in scip-ctags, making it more
consistent with universal-ctags. We now split errors into two classes:
* Fatal: user command is invalid or incorrectly formatted, I/O errors. This
indicates a big issue in how we're using ctags and we should fail hard.
* Non-fatal: any file parsing issue (invalid UTF-8, unsupported language, or all symbols
can't be parsed).

Our go-ctags wrapper only reports fatal errors, and just logs non-fatal errors.
We fail indexing for an entire Zoekt repo on go-ctags errors, which helps
surface serious issues with ctags, while allowing parse failures for individual files.

(cherry picked from commit 47ab5f5)

Co-authored-by: Julie Tibshirani <julietibs@apache.org>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Incorrect error propagation in scip-ctags

2 participants