Cannot load tokenizer with from_pretrained through http_proxy since 0.14.0 #1373

@jtsai-quid

Hi hf,

Since version 0.14.0 I can no longer load a tokenizer with from_pretrained through an HTTP proxy (http_proxy); the same code works in version 0.13.3.
This breaks fast tokenizer initialization in TGI 1.1.0:
huggingface/text-generation-inference#1108

Here is the code snippet I use to reproduce it.

//# tokenizers = { version = "0.14.0", features = ["http"] }

use tokenizers::tokenizer::{Result, Tokenizer};
use tokenizers::FromPretrainedParameters;

fn main() -> Result<()> {
    // Optional Hub token, read from the environment.
    let authorization_token = std::env::var("HUGGING_FACE_HUB_TOKEN").ok();
    let params = FromPretrainedParameters {
        revision: "main".to_string(),
        auth_token: authorization_token,
        ..Default::default()
    };

    // Fetches tokenizer.json from the Hub; this is the call that fails behind the proxy.
    let tokenizer = Tokenizer::from_pretrained("TheBloke/Llama-2-13B-chat-GPTQ", Some(params))?;

    let encoding = tokenizer.encode("Hey there!", false)?;
    println!("{:?}", encoding.get_tokens());
    Ok(())
}

Error output

> http_proxy=http://squid:3128 https_proxy=http://squid:3128 cargo play run.rs
   Compiling p4u7iybabtwyzvxf2zdtkustjgod2 v0.1.0 (/tmp/cargo-play.4U7iybABTwyZVxF2ZDTKUstjgod2)
    Finished dev [unoptimized + debuginfo] target(s) in 3.14s
     Running `/tmp/cargo-play.4U7iybABTwyZVxF2ZDTKUstjgod2/target/debug/p4u7iybabtwyzvxf2zdtkustjgod2`
Error: RequestError(Transport(Transport { kind: Io, message: None, url: Some(Url { scheme: "https", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("huggingface.co")), port: None, path: "/TheBloke/Llama-2-13B-chat-GPTQ/resolve/main/tokenizer.json", query: None, fragment: None }), source: Some(Custom { kind: TimedOut, error: "timed out reading response" }) }))
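
To rule out an environment problem, it can help to confirm that the proxy variables are actually visible to the process. The following is a minimal std-only sketch: `proxy_for_scheme` is a hypothetical helper (not part of tokenizers), and the lookup order (lowercase name first, then uppercase) is an assumption for illustration, not necessarily what any particular HTTP client does.

```rust
use std::env;

/// Hypothetical helper: returns the first non-empty proxy URL found for
/// `scheme`, checking `<scheme>_proxy` and then its uppercase form.
/// The variable lookup is injected so the logic can be checked without
/// touching the real environment.
fn proxy_for_scheme<F>(scheme: &str, lookup: F) -> Option<String>
where
    F: Fn(&str) -> Option<String>,
{
    let lower = format!("{}_proxy", scheme.to_lowercase());
    let upper = lower.to_uppercase();
    [lower, upper]
        .iter()
        .filter_map(|name| lookup(name.as_str()))
        .find(|value| !value.is_empty())
}

fn main() {
    // Report what this process sees for each scheme.
    for scheme in ["http", "https"] {
        match proxy_for_scheme(scheme, |name| env::var(name).ok()) {
            Some(proxy) => println!("{}: proxy = {}", scheme, proxy),
            None => println!("{}: no proxy variable set", scheme),
        }
    }
}
```

Running it with the same `http_proxy=... https_proxy=...` prefix as the failing command shows whether the variables reach the binary; if they do, the problem is more likely in how the HTTP client handles them.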

I suspect that this is related to the client refactoring in here

Thanks, I appreciate any help!
