Closed
Labels
- expected: unlikely unless contributed (This change is unlikely to be made unless someone contributes a PR for review.)
- good first ticket
- help wanted
- size: easy
- status: backlog (Work is planned someday but is not the highest priority at the moment.)
- why: functionality (Intended to improve ArchiveBox functionality or features.)
Description
Describe the bug
I saved a webpage that is terribly coded by hand and has an empty title tag (literally `<title></title>`). The resulting snapshot is named "</title". Easy to change but odd.
TBH I'm not sure if you should care, since it may not matter if horribly invalid documents create errors. But on the off chance that it's easy to check for and change this in the code, I am filing a bug report. (Perhaps such snapshots could be named "No document title found".)
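A minimal sketch of the fallback suggested above, using only Python's standard-library `html.parser`. The names `TitleParser`, `extract_title`, and `FALLBACK_TITLE` are hypothetical and chosen for illustration; this is not ArchiveBox's actual title extractor, just one way an empty `<title>` could be mapped to a placeholder.

```python
from html.parser import HTMLParser

# Placeholder wording taken from the suggestion in this report (an assumption,
# not an existing ArchiveBox setting).
FALLBACK_TITLE = "No document title found"


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Only the first <title> counts; later ones are ignored.
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True


def extract_title(html: str) -> str:
    """Return the page title, or FALLBACK_TITLE if it is missing or blank."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace and trim the ends.
    title = " ".join("".join(parser.parts).split())
    return title or FALLBACK_TITLE
```

With this approach, `extract_title("<title></title>")` returns the placeholder instead of leaking markup into the snapshot name, and a page with no `<title>` element at all gets the same treatment.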
Steps to reproduce
- Saved this page to ArchiveBox: http://wildwestcycle.com/f_oiltempdegradation.html
- Snapshot title is `</title`
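One plausible way to end up with `</title` as a title is a regex that assumes at least one character of text follows the opening tag. The sketch below is a guess at the mechanism, not a confirmed diagnosis: `NAIVE_TITLE_REGEX` is a hypothetical pattern written for illustration and is not claimed to be ArchiveBox's code. `STRICT_TITLE_REGEX` is an equally hypothetical alternative that requires the closing tag and allows an empty title.

```python
import re

# Hypothetical pattern (assumption): captures one arbitrary character plus
# everything up to the next angle bracket, so it cannot match an empty title.
NAIVE_TITLE_REGEX = re.compile(r"<title.*?>(.[^<>]+)", re.IGNORECASE | re.DOTALL)

# Hypothetical alternative: the captured text may be empty, and the closing
# tag must be present.
STRICT_TITLE_REGEX = re.compile(r"<title[^>]*>([^<]*)</title>", re.IGNORECASE)


def naive_title(html):
    match = NAIVE_TITLE_REGEX.search(html)
    return match.group(1) if match else None


def strict_title(html):
    match = STRICT_TITLE_REGEX.search(html)
    return match.group(1).strip() if match else None


empty_page = "<html><head><title></title></head></html>"
print(repr(naive_title(empty_page)))   # '</title'
print(repr(strict_title(empty_page)))  # ''
```

In the naive pattern, the leading `.` consumes the `<` of the closing tag and `[^<>]+` then consumes `/title`, which reproduces the snapshot name seen here. The strict variant returns an empty string instead, which a caller could then replace with a placeholder.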
Screenshots or log output
ArchiveBox version
ArchiveBox v0.6.2
Cpython Linux Linux-4.4.302+-x86_64-with-glibc2.28 x86_64
IN_DOCKER=True DEBUG=False IS_TTY=True TZ=UTC SEARCH_BACKEND_ENGINE=ripgrep
[i] Dependency versions:
√ ARCHIVEBOX_BINARY v0.6.2 valid /usr/local/bin/archivebox
√ PYTHON_BINARY v3.9.5 valid /usr/local/bin/python3.9
√ DJANGO_BINARY v3.1.10 valid /usr/local/lib/python3.9/site-packages/django/bin/django-admin.py
√ CURL_BINARY v7.64.0 valid /usr/bin/curl
√ WGET_BINARY v1.20.1 valid /usr/bin/wget
√ NODE_BINARY v15.14.0 valid /usr/bin/node
√ SINGLEFILE_BINARY v0.3.16 valid /node/node_modules/single-file/cli/single-file
√ READABILITY_BINARY v0.0.2 valid /node/node_modules/readability-extractor/readability-extractor
√ MERCURY_BINARY v1.0.0 valid /node/node_modules/@postlight/mercury-parser/cli.js
- GIT_BINARY - disabled /usr/bin/git
- YOUTUBEDL_BINARY - disabled /usr/local/bin/youtube-dl
√ CHROME_BINARY v90.0.4430.93 valid /usr/bin/chromium
√ RIPGREP_BINARY v0.10.0 valid /usr/bin/rg
[i] Source-code locations:
√ PACKAGE_DIR 22 files valid /app/archivebox
√ TEMPLATES_DIR 3 files valid /app/archivebox/templates
- CUSTOM_TEMPLATES_DIR - disabled
[i] Secrets locations:
- CHROME_USER_DATA_DIR - disabled
- COOKIES_FILE - disabled
[i] Data locations:
√ OUTPUT_DIR 5 files valid /data
√ SOURCES_DIR 136 files valid ./sources
√ LOGS_DIR 1 files valid ./logs
√ ARCHIVE_DIR 141 files valid ./archive
√ CONFIG_FILE 81.0 Bytes valid ./ArchiveBox.conf
√ SQL_INDEX 1.1 MB valid ./index.sqlite3
