Contributor

@turian turian commented Sep 11, 2022

Summary

This is a kludgy workaround that implements the following suggestion: "what we can do for now is just add an exception catcher that skips trying to index those files if they throw encoding errors + display a warning like (> Warning: Skipped adding some files to full-text index as they are not in UTF-8 format)."
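
For readers who want the general shape of the workaround, here is a minimal sketch of the "catch the encoding error, skip the file, warn once" pattern. It is not the code from this PR: the function name `read_utf8_texts` and its return shape are hypothetical, and it assumes the files to index are read from disk as UTF-8 text.

```python
import sys
from pathlib import Path
from typing import Iterable, List, Tuple


def read_utf8_texts(paths: Iterable[Path]) -> Tuple[List[str], List[Path]]:
    """Return (texts, skipped): decoded contents and the paths that failed to decode.

    Hypothetical helper for illustration only; not part of ArchiveBox.
    """
    texts: List[str] = []
    skipped: List[Path] = []
    for path in paths:
        try:
            texts.append(Path(path).read_text(encoding="utf-8"))
        except UnicodeDecodeError:
            # Not valid UTF-8: leave it out of the index instead of aborting.
            skipped.append(Path(path))
    if skipped:
        # Warn once, after the loop, rather than once per file.
        print(
            "> Warning: Skipped adding some files to full-text index "
            "as they are not in UTF-8 format",
            file=sys.stderr,
        )
    return texts, skipped
```

Returning the skipped paths alongside the decoded texts keeps the failure visible to the caller, so a later run could retry just those files.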

Related issues

#984

#1014

Adapted from #984 (comment), since this bug is a showstopper for me as well as @jgoerzen.

Changes these areas

  • Bugfixes
  • Feature behavior
  • Command line interface
  • Configuration options
  • Internal architecture
  • Snapshot data layout on disk

Notes

Ideally, we would have a config option that enables or disables a hard stop on UTF-8 errors.
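
A rough sketch of what such a toggle could look like, purely as an illustration. The setting name `FULLTEXT_INDEX_STRICT_UTF8` and both functions are invented for this example and are not existing ArchiveBox options or APIs.

```python
import os
from typing import Mapping, Optional

# Hypothetical setting name, invented for this sketch; not an existing ArchiveBox option.
STRICT_ENV_VAR = "FULLTEXT_INDEX_STRICT_UTF8"


def strict_utf8_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Treat 'true', '1', or 'yes' (case-insensitive) as enabling the hard stop."""
    return env.get(STRICT_ENV_VAR, "false").strip().lower() in {"true", "1", "yes"}


def decode_for_index(data: bytes, strict: bool) -> Optional[str]:
    """Decode bytes as UTF-8; on failure either re-raise (strict) or return None (skip)."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        if strict:
            # Hard stop: let the error propagate to the caller.
            raise
        # Lenient mode: signal "skip this file" to the caller.
        return None
```

Defaulting to the lenient mode matches the behavior of this workaround; anyone who prefers the old hard stop would opt in explicitly.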

I don't understand ArchiveBox well enough to know whether, if my workaround gets halfway through and we later upgrade to an archivebox > 0.6.3 that fixes this bug, it will complete the rest of the pipeline.

@turian turian closed this Sep 11, 2022
lgtm-com bot commented Sep 11, 2022

This pull request introduces 2 alerts when merging 2b58cce into 03eb7e5 - view on LGTM.com

new alerts:

  • 2 for Unused local variable

@pirate pirate merged commit 9b65639 into ArchiveBox:dev Nov 2, 2022