This would be genuinely useful for running inference on models like `qwen-1.5b`, which need a tokenizer.
Right now `apr` can import and inspect a model like `qwen-1.5b`, but it cannot run any inference on it. `realizar` has the same problem, because it doesn't have tokenizer support either.
```
apr run qwen-1.5b.apr
=== APR Run ===
Source: qwen-1.5b.apr
Using mmap for 2944MB model
Loaded SafeTensors model with 338 tensors in 527.00ms
Output:
Model: qwen-1.5b.apr
Input: none
Tensors: 338
Load time: 527.00ms
Tensor names (first 10):
1. layers.27.self_attn.v_proj.weight
2. layers.8.self_attn.k_proj.bias
3. layers.12.self_attn.o_proj.weight
4. layers.12.self_attn.k_proj.weight
5. layers.19.input_layernorm.weight
6. layers.4.post_attention_layernorm.weight
7. layers.22.self_attn.v_proj.weight
8. layers.0.self_attn.k_proj.bias
9. layers.9.self_attn.q_proj.bias
10. layers.27.input_layernorm.weight
... and 328 more
Completed in 0.85s (cached)
```
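To make the gap concrete: a model like `qwen-1.5b` consumes and produces token IDs, not text, so `apr run` (and `realizar`) would need an encode step before the forward pass and a decode step after it. The following is only a hypothetical, stdlib-only Rust sketch of those two stages. `ToyTokenizer` and its vocabulary are made up for illustration, and it uses greedy longest-match instead of the byte-level BPE that Qwen models actually use (whose vocabulary and merges are typically shipped alongside the weights, for example as a `tokenizer.json` file).

```rust
use std::collections::HashMap;

/// Hypothetical toy tokenizer: greedy longest-match over a fixed vocabulary.
/// This is NOT Qwen's byte-level BPE; it only illustrates the encode/decode
/// stages that sit on either side of a model's forward pass.
struct ToyTokenizer {
    token_to_id: HashMap<String, u32>,
    id_to_token: Vec<String>,
    max_token_len: usize, // longest vocabulary entry, in chars
}

impl ToyTokenizer {
    fn new(vocab: &[&str]) -> Self {
        let id_to_token: Vec<String> = vocab.iter().map(|s| s.to_string()).collect();
        // A token's ID is its index in the vocabulary list.
        let token_to_id: HashMap<String, u32> = id_to_token
            .iter()
            .enumerate()
            .map(|(i, t)| (t.clone(), i as u32))
            .collect();
        let max_token_len = id_to_token.iter().map(|t| t.chars().count()).max().unwrap_or(0);
        ToyTokenizer { token_to_id, id_to_token, max_token_len }
    }

    /// Text -> token IDs (what the model consumes). None if some input has no match.
    fn encode(&self, text: &str) -> Option<Vec<u32>> {
        let chars: Vec<char> = text.chars().collect();
        let mut ids = Vec::new();
        let mut pos = 0;
        while pos < chars.len() {
            // Try the longest possible piece first, then shrink it.
            let max_len = self.max_token_len.min(chars.len() - pos);
            let mut matched = false;
            for len in (1..=max_len).rev() {
                let piece: String = chars[pos..pos + len].iter().collect();
                if let Some(&id) = self.token_to_id.get(&piece) {
                    ids.push(id);
                    pos += len;
                    matched = true;
                    break;
                }
            }
            if !matched {
                return None;
            }
        }
        Some(ids)
    }

    /// Token IDs (what the model produces) -> text. None for an unknown ID.
    fn decode(&self, ids: &[u32]) -> Option<String> {
        let mut out = String::new();
        for &id in ids {
            out.push_str(self.id_to_token.get(id as usize)?);
        }
        Some(out)
    }
}

fn main() {
    let tok = ToyTokenizer::new(&["Hello", ",", " world", " ", "w", "o", "r", "l", "d", "!"]);
    let ids = tok.encode("Hello, world!").expect("encodable");
    println!("{:?}", ids); // prints [0, 1, 2, 9]
    // In a real pipeline, the model's generated IDs would go here instead.
    let text = tok.decode(&ids).expect("decodable");
    println!("{}", text); // prints Hello, world!
}
```

A real implementation would load the model's own tokenizer data rather than a hard-coded list; with that in place, the `Input: none` line in the output could become an actual prompt.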