Closed
Description
Describe the bug
I can't get ArchiveBox to add any URLs from a plain text file containing a newline-separated list of URLs.
Based on the error message, it fails to parse the file. I may be doing something wrong.
Steps to reproduce
- Create a text file (here /tmp/urls.txt) with some URLs, one per line, e.g.
https://www.example.com/
https://example.com/
- Run
archivebox add /tmp/urls.txt
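The steps above can be sketched as a short shell session. This is only a sketch: the data directory /tmp/archivebox is assumed from the log output in this report, and the `command -v` guard and `|| true` are additions for illustration so that the snippet does nothing harmful on a machine without ArchiveBox or without that directory.

```shell
# Write the two example URLs to /tmp/urls.txt, one per line.
printf '%s\n' 'https://www.example.com/' 'https://example.com/' > /tmp/urls.txt

# Show the file so the input format is unambiguous.
cat /tmp/urls.txt

# Run the import from inside the data directory (assumed: /tmp/archivebox).
# The guard is illustrative only; it skips the step if ArchiveBox or the
# directory is missing.
if command -v archivebox >/dev/null 2>&1 && [ -d /tmp/archivebox ]; then
    (cd /tmp/archivebox && archivebox add /tmp/urls.txt) || true
fi
```

Creating the file with `printf` rather than an editor rules out stray whitespace or carriage returns as the cause of the parse failure.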
Screenshots or log output
Here's the output I get:
ross@xx> archivebox add /tmp/urls.txt /tmp/archivebox
[i] [2022-04-20 16:05:12] ArchiveBox v0.6.2: archivebox add /tmp/urls.txt
> /tmp/archivebox
[!] Warning: Missing 3 recommended dependencies
! SINGLEFILE_BINARY: single-file (unable to detect version)
Hint: To install all packages automatically run: archivebox setup
or to disable it and silence this warning: archivebox config --set SAVE_SINGLEFILE=False
! READABILITY_BINARY: readability-extractor (unable to detect version)
Hint: To install all packages automatically run: archivebox setup
or to disable it and silence this warning: archivebox config --set SAVE_READABILITY=False
! MERCURY_BINARY: mercury-parser (unable to detect version)
Hint: To install all packages automatically run: archivebox setup
or to disable it and silence this warning: archivebox config --set SAVE_MERCURY=False
[+] [2022-04-20 16:05:13] Adding 1 links to index (crawl depth=0)...
> Saved verbatim input to sources/1650470713-import.txt
0.0% (0/240sec)[X] Error while loading link! [1650470713.151664] /tmp/urls.txt "None"
> Parsed 0 URLs from input (Failed to parse)
> Found 0 new URLs not already in index
[*] [2022-04-20 16:05:13] Writing 0 links to main index...
√ ./index.sqlite3
ArchiveBox version
ArchiveBox v0.6.2
Cpython Linux Linux-5.17.1-arch1-1-x86_64-with-glibc2.35 x86_64
IN_DOCKER=False DEBUG=False IS_TTY=True TZ=UTC SEARCH_BACKEND_ENGINE=ripgrep
[i] Dependency versions:
√ ARCHIVEBOX_BINARY v0.6.2 valid /home/ross/.local/bin/archivebox
√ PYTHON_BINARY v3.10.4 valid /usr/bin/python3.10
√ DJANGO_BINARY v3.1.14 valid /home/ross/.local/lib/python3.10/site-packages/django/bin/django-admin.py
√ CURL_BINARY v7.82.0 valid /usr/bin/curl
√ WGET_BINARY v1.21.3 valid /usr/bin/wget
√ NODE_BINARY v17.9.0 valid /usr/bin/node
X SINGLEFILE_BINARY ? invalid single-file
X READABILITY_BINARY ? invalid readability-extractor
X MERCURY_BINARY ? invalid mercury-parser
√ GIT_BINARY v2.35.2 valid /usr/bin/git
√ YOUTUBEDL_BINARY v2021.12.17 valid /home/ross/.local/bin/youtube-dl
√ CHROME_BINARY v100.0.4896.88 valid /usr/bin/chromium
√ RIPGREP_BINARY v13.0.0 valid /usr/bin/rg
[i] Source-code locations:
√ PACKAGE_DIR 23 files valid /home/ross/.local/lib/python3.10/site-packages/archivebox
√ TEMPLATES_DIR 3 files valid /home/ross/.local/lib/python3.10/site-packages/archivebox/templates
- CUSTOM_TEMPLATES_DIR - disabled
[i] Secrets locations:
- CHROME_USER_DATA_DIR - disabled
- COOKIES_FILE - disabled
[i] Data locations:
√ OUTPUT_DIR 5 files valid /tmp/archivebox
√ SOURCES_DIR 3 files valid ./sources
√ LOGS_DIR 1 files valid ./logs
√ ARCHIVE_DIR 0 files valid ./archive
√ CONFIG_FILE 81.0 Bytes valid ./ArchiveBox.conf
√ SQL_INDEX 204.0 KB valid ./index.sqlite3
[!] Warning: Missing 3 recommended dependencies
! SINGLEFILE_BINARY: single-file (unable to detect version)
Hint: To install all packages automatically run: archivebox setup
or to disable it and silence this warning: archivebox config --set SAVE_SINGLEFILE=False
! READABILITY_BINARY: readability-extractor (unable to detect version)
Hint: To install all packages automatically run: archivebox setup
or to disable it and silence this warning: archivebox config --set SAVE_READABILITY=False
! MERCURY_BINARY: mercury-parser (unable to detect version)
Hint: To install all packages automatically run: archivebox setup
or to disable it and silence this warning: archivebox config --set SAVE_MERCURY=False