@pellaeon
Contributor

Summary

The title extractor builds its user agent string from bin_version, which runs an external executable with --version and parses the output into a version string.

Executables may print their --version output in a localized language. User agent strings, however, can only contain latin-1 characters, so a localized version string causes this error:

        Extractor failed:
            UnicodeEncodeError 'latin-1' codec can't encode characters in position 201-202: ordinal not in range(256)
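
The failure can be reproduced in isolation. The sketch below is not ArchiveBox code: fits_latin1 is a hypothetical helper, and the sample string is the first line of the localized wget output shown further down. It only demonstrates that such text cannot be encoded as latin-1.

```python
def fits_latin1(value: str) -> bool:
    """Return True if value can be encoded as latin-1 (hypothetical helper)."""
    try:
        value.encode("latin-1")
    except UnicodeEncodeError:
        return False
    return True


# First line of `wget --version` under LANG="zh_TW.UTF-8" (from this PR).
localized = "GNU Wget 1.21,於 linux-gnu 上編譯。"
# A plain ASCII string for comparison (illustrative, not real wget output).
plain = "GNU Wget 1.21"

print(fits_latin1(localized))
print(fits_latin1(plain))
```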

This PR fixes the problem by running the executables with the environment variable LANG=C, so that their --version output is not localized.
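
A minimal sketch of that approach, assuming the version check goes through Python's subprocess module. version_output is a hypothetical name used for illustration, not the actual bin_version implementation.

```python
import os
import subprocess
import sys


def version_output(binary: str) -> str:
    """Run `binary --version` with LANG=C and return its output (hypothetical helper)."""
    # Copy the parent environment and override only LANG, as the PR describes.
    env = {**os.environ, "LANG": "C"}
    result = subprocess.run(
        [binary, "--version"],
        capture_output=True,
        text=True,
        env=env,
        check=False,
    )
    # Some tools print their version to stderr rather than stdout.
    return (result.stdout or result.stderr).strip()


if __name__ == "__main__":
    # The running Python interpreter is used here only because it is always available.
    print(version_output(sys.executable))
```

If LC_ALL or LC_MESSAGES is set in the parent environment, it takes precedence over LANG, so a stricter variant might also set LC_ALL=C. That goes beyond what this PR describes.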

Example of running wget --version when LANG="zh_TW.UTF-8":

GNU Wget 1.21,於 linux-gnu 上編譯。

-cares +digest -gpgme +https +ipv6 +iri +large-file -metalink +nls 
+ntlm +opie +psl +ssl/openssl 

Wgetrc: 
    /etc/wgetrc (系統)
語系: 
    /usr/share/locale 
編譯: 
    gcc -DHAVE_CONFIG_H -DSYSTEM_WGETRC="/etc/wgetrc" 
    -DLOCALEDIR="/usr/share/locale" -I. -I../../src -I../lib 
    -I../../lib -Wdate-time -D_FORTIFY_SOURCE=2 -DHAVE_LIBSSL -DNDEBUG 
    -g -O2 -ffile-prefix-map=/build/wget-OM48Vs/wget-1.21=. 
    -fstack-protector-strong -Wformat -Werror=format-security 
    -DNO_SSLv2 -D_FILE_OFFSET_BITS=64 -g -Wall 
連結: 
    gcc -DHAVE_LIBSSL -DNDEBUG -g -O2 
    -ffile-prefix-map=/build/wget-OM48Vs/wget-1.21=. 
    -fstack-protector-strong -Wformat -Werror=format-security 
    -DNO_SSLv2 -D_FILE_OFFSET_BITS=64 -g -Wall -Wl,-Bsymbolic-functions 
    -Wl,-z,relro -Wl,-z,now -lpcre2-8 -luuid -lidn2 -lssl -lcrypto -lz 
    -lpsl ftp-opie.o openssl.o http-ntlm.o ../lib/libgnu.a 

版權所有 (C) 2015 自由軟體基金會
GPLv3+ 授權:GNU GPL 第三版或更新版本
<http://www.gnu.org/licenses/gpl.html>。
此為自由軟體:您能自由修改與重散布它。
在法律允許的範圍內沒有任何擔保。

最初由 Hrvoje Niksic <hniksic@xemacs.org> 編寫。
請將漏洞報告和問題寄到 <bug-wget@gnu.org>。

Related issues

Changes these areas

  • Bugfixes
  • Feature behavior
  • Command line interface
  • Configuration options
  • Internal architecture
  • Snapshot data layout on disk

@pirate
Member

pirate commented Mar 5, 2022

woah nice debugging, this must have taken you a while to track down. thanks for the fix! A+ PR

@pirate pirate merged commit feafe9a into ArchiveBox:dev Mar 5, 2022