Bug: Exception in archive_methods.save_readability due to bytes string being passed to hint #706

@Valporaena

I'm encountering the same problem that @jrruethe described some time ago. It seemed to have been solved, but it reoccurred on my setup after I installed the latest update and ran the archivebox setup command.

  1. Ran archivebox update (several times; it reproduces every time)
  2. On a specific link it crashes with the following output:
[√] [2021-04-15 10:56:49] "The Long War on Objectivity       | The New Republic"
    https://newrepublic.com/article/158497/long-war-objectivity
    √ ./archive/1617309812.979884
      > readability
    ! Failed to archive link: Exception: Exception in archive_methods.save_readability(Link(url=https://newrepublic.com/article/158497/long-war-objectivity))

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/archivebox/extractors/__init__.py", line 114, in archive_link
    log_archive_method_finished(result)
  File "/usr/lib/python3/dist-packages/archivebox/logging_util.py", line 435, in log_archive_method_finished
    hints = hints if isinstance(hints, (list, tuple)) else hints.split('\n')
TypeError: a bytes-like object is required, not 'str'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/bin/archivebox", line 11, in <module>
    load_entry_point('archivebox==0.6.2', 'console_scripts', 'archivebox')()
  File "/usr/lib/python3/dist-packages/archivebox/cli/__init__.py", line 140, in main
    run_subcommand(
  File "/usr/lib/python3/dist-packages/archivebox/cli/__init__.py", line 80, in run_subcommand
    module.main(args=subcommand_args, stdin=stdin, pwd=pwd)    # type: ignore
  File "/usr/lib/python3/dist-packages/archivebox/cli/archivebox_update.py", line 119, in main
    update(
  File "/usr/lib/python3/dist-packages/archivebox/util.py", line 114, in typechecked_function
    return func(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/archivebox/main.py", line 783, in update
    archive_links(to_archive, overwrite=overwrite, **archive_kwargs)
  File "/usr/lib/python3/dist-packages/archivebox/util.py", line 114, in typechecked_function
    return func(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/archivebox/extractors/__init__.py", line 181, in archive_links
    archive_link(to_archive, overwrite=overwrite, methods=methods, out_dir=Path(link.link_dir))
  File "/usr/lib/python3/dist-packages/archivebox/util.py", line 114, in typechecked_function
    return func(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/archivebox/extractors/__init__.py", line 130, in archive_link
    raise Exception('Exception in archive_methods.save_{}(Link(url={}))'.format(
Exception: Exception in archive_methods.save_readability(Link(url=https://newrepublic.com/article/158497/long-war-objectivity))
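
The first traceback points at `hints.split('\n')` in `log_archive_method_finished`: calling `.split` with a `str` separator on a `bytes` value raises exactly this `TypeError`. Below is a minimal sketch of one way to guard against it, assuming the hints can arrive as `bytes`, `str`, or a list/tuple of either. The helper name `normalize_hints` is hypothetical and is not part of ArchiveBox; this is an illustration of the failure mode, not the project's actual fix.

```python
def normalize_hints(hints):
    """Return hints as a list of text lines, whatever type was passed in.

    Hypothetical helper for illustration only; not ArchiveBox code.
    """
    # bytes (e.g. captured subprocess output) must be decoded before
    # they can be split on a str separator
    if isinstance(hints, (bytes, bytearray)):
        hints = bytes(hints).decode("utf-8", errors="replace")
    if isinstance(hints, str):
        return hints.split("\n")
    # list or tuple: decode any bytes entries, stringify the rest
    return [
        h.decode("utf-8", errors="replace") if isinstance(h, bytes) else str(h)
        for h in hints
    ]


def reproduces_type_error():
    """Show that splitting bytes on a str separator fails as in the traceback."""
    try:
        b"line one\nline two".split("\n")
    except TypeError:
        return True
    return False
```

With a helper along these lines, the logging code could call `normalize_hints(hints)` instead of branching on `isinstance(hints, (list, tuple))`, so a bytes value would no longer reach `.split('\n')` undecoded.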
ArchiveBox v0.6.2
Cpython Linux Linux-5.4.0-71-generic-x86_64-with-glibc2.29 x86_64
IN_DOCKER=False DEBUG=False IS_TTY=True TZ=UTC SEARCH_BACKEND_ENGINE=ripgrep

[i] Dependency versions:
 √  ARCHIVEBOX_BINARY     v0.6.2          valid     /usr/bin/archivebox                                                         
 √  PYTHON_BINARY         v3.8.5          valid     /usr/bin/python3.8                                                          
 √  DJANGO_BINARY         v2.2.12         valid     /usr/lib/python3/dist-packages/django/bin/django-admin.py                   
 √  CURL_BINARY           v7.68.0         valid     /usr/bin/curl                                                               
 √  WGET_BINARY           v1.20.3         valid     /usr/bin/wget                                                               
 √  NODE_BINARY           v10.19.0        valid     /usr/bin/node                                                               
 √  SINGLEFILE_BINARY     v0.3.16         valid     ./node_modules/single-file/cli/single-file                                  
 √  READABILITY_BINARY    v0.0.2          valid     ./node_modules/readability-extractor/readability-extractor                  
 √  MERCURY_BINARY        v1.0.0          valid     ./node_modules/@postlight/mercury-parser/cli.js                             
 √  GIT_BINARY            v2.25.1         valid     /usr/bin/git                                                                
 -  YOUTUBEDL_BINARY      -               disabled  /usr/bin/youtube-dl                                                         
 √  CHROME_BINARY         v89.0.4389.114  valid     /usr/bin/chromium-browser                                                   
 √  RIPGREP_BINARY        v11.0.2         valid     /usr/bin/rg                                                                 

[i] Source-code locations:
 √  PACKAGE_DIR           23 files        valid     /usr/lib/python3/dist-packages/archivebox                                   
 √  TEMPLATES_DIR         3 files         valid     /usr/lib/python3/dist-packages/archivebox/templates                         
 -  CUSTOM_TEMPLATES_DIR  -               disabled                                                                              

[i] Secrets locations:
 -  CHROME_USER_DATA_DIR  -               disabled                                                                              
 -  COOKIES_FILE          -               disabled                                                                              

[i] Data locations:
 √  OUTPUT_DIR            14 files        valid     /home/.../archivebox                                                     
 √  SOURCES_DIR           27 files        valid     ./sources                                                                   
 √  LOGS_DIR              1 files         valid     ./logs                                                                      
 √  ARCHIVE_DIR           9024 files      valid     ./archive                                                                   
 √  CONFIG_FILE           291.0 Bytes     valid     ./ArchiveBox.conf                                                           
 √  SQL_INDEX             105.7 MB        valid     ./index.sqlite3             

Labels: size: easy, status: wip (work is in progress / has already been partially completed)