…rnal into falcon-h1-clean
Signed-off-by: dhia.rhaiem <dhia.rhaiem@tii.ae>
Signed-off-by: dhia.rhaiem <dhia.rhaiem@tii.ae>
Head branch was pushed to by a user without write access
PTAL at the failing models test.
The rest should be unrelated.
Signed-off-by: dhia.rhaiem <dhia.rhaiem@tii.ae>
Head branch was pushed to by a user without write access
hey @DarkLight1337,
tlrmchlsmth
left a comment
Looks good to me. Congrats on the launch!
The error is
What is the version of transformers required for this model?
If it's not in v4.51 then you need to set
@DarkLight1337, I think that is expected, because the FalconH1 PR to HF is not merged yet!
As per huggingface/transformers#38249 (comment), you can set
Also, please merge from main to fix some CI failures.
Signed-off-by: dhia.rhaiem <dhia.rhaiem@tii.ae>
Head branch was pushed to by a user without write access
Signed-off-by: dhia.rhaiem <dhia.rhaiem@tii.ae> Co-authored-by: younesbelkada <younesbelkada@gmail.com> Co-authored-by: Ilyas Chahed <ilyas.chahed@tii.ae> Co-authored-by: Jingwei Zuo <jingwei.zuo@tii.ae> Signed-off-by: Yuqi Zhang <yuqizhang@google.com>
* Add files via uploadAdd fused MoE kernel tuning configs (fp8_w8a8) for DeepSeek V3/R1 on a single-node 8x NVIDIA H20 96GB setup (vllm-project#18337) * [Misc] Fix typo (vllm-project#18330) * Neuron up mistral (vllm-project#18222) Signed-off-by: Satyajith Chilappagari <satchill@amazon.com> * fix CUDA_check redefinition in vllm-project#17918 (vllm-project#18287) Signed-off-by: Lucia Fang <fanglu@fb.com> Co-authored-by: Lucia (Lu) Fang <fanglu@meta.com> * [neuron] fix authorization issue (vllm-project#18364) Signed-off-by: Liangfu Chen <liangfc@amazon.com> * [Misc] Allow `AutoWeightsLoader` to skip loading weights with specific substr in name (vllm-project#18358) Signed-off-by: Isotr0py <2037008807@qq.com> * [Core] [Bugfix]: tensor parallel with prompt embeds (vllm-project#18171) Signed-off-by: Nan2018 <nan@protopia.ai> Co-authored-by: Andrew Sansom <andrew@protopia.ai> * [release] Change dockerhub username for TPU release (vllm-project#18389) * [Bugfix] fix adding bias twice in ipex GPTQ quantization (vllm-project#18363) Signed-off-by: rand-fly <randfly@outlook.com> * [doc] update env variable export (vllm-project#18391) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com> * [Misc] Add LoRA code owner (vllm-project#18387) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> * Update cpu.txt (vllm-project#18398) Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com> * [CI] Add mteb testing to test the accuracy of the embedding model (vllm-project#17175) * [Bugfix] Fix MRoPE Errors in the Qwen-VL Model When Processing Pure Text (vllm-project#18407) Co-authored-by: 松灵 <wpf272043@alibaba-inc.com> * [Misc] refactor prompt embedding examples (vllm-project#18405) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com> * [Minor] Rename quantization nvfp4 to modelopt_fp4 (vllm-project#18356) Signed-off-by: mgoin <mgoin64@gmail.com> * [Model] use AutoWeightsLoader for bloom (vllm-project#18300) 
Signed-off-by: calvin chen <120380290@qq.com> * [Kernel] update comment for KV shape in unified triton attn (vllm-project#18099) Signed-off-by: haochengxia <xhc_1007@163.com> * fix:Build torch wheel inline rather than picking from nightly (vllm-project#18351) Signed-off-by: Dilip Gowda Bhagavan <dilip.bhagavan@ibm.com> * [TPU] Re-enable the Pallas MoE kernel (vllm-project#18025) Signed-off-by: Michael Goin <mgoin64@gmail.com> * [Bugfix] config.head_dim is now explicitly set to None (vllm-project#18432) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com> * [Bug] Fix moe_sum signature (vllm-project#18440) Signed-off-by: Bill Nell <bnell@redhat.com> * Revert "[Bugfix] Fix MRoPE Errors in the Qwen-VL Model When Processing Pure Text (vllm-project#18407)" (vllm-project#18456) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [Bugfix][Failing Test] Fix nixl connector test when promt size < block size (vllm-project#18429) Signed-off-by: wwl2755 <wangwenlong2755@gmail.com> * [Misc] MultiConnector._connectors type (vllm-project#18423) Signed-off-by: nicklucche <nlucches@redhat.com> * [Frontend] deprecate `--device` arg (vllm-project#18399) Signed-off-by: Kebe <mail@kebe7jun.com> * [V1] Fix general plugins not loaded in engine for multiproc (vllm-project#18326) Signed-off-by: Yong Hoon Shin <yhshin@meta.com> * [Misc] refactor disaggregated-prefill-v1 example (vllm-project#18474) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com> * [Bugfix][Failing Test] Fix test_events.py (vllm-project#18460) Signed-off-by: rabi <ramishra@redhat.com> * [MODEL] FalconH1 (vllm-project#18406) Signed-off-by: dhia.rhaiem <dhia.rhaiem@tii.ae> Co-authored-by: younesbelkada <younesbelkada@gmail.com> Co-authored-by: Ilyas Chahed <ilyas.chahed@tii.ae> Co-authored-by: Jingwei Zuo <jingwei.zuo@tii.ae> * [Doc] fix arg docstring in linear layers (vllm-project#18410) Signed-off-by: giantcroc <1204449533@qq.com> * [Bugfix] Reduce moe_sum test 
size to avoid OOM (vllm-project#18484) Signed-off-by: Bill Nell <bnell@redhat.com> * [Build] fix Dockerfile shell (vllm-project#18402) * [Misc] Update deprecation message for `--enable-reasoning` (vllm-project#18404) * [ROCm][Kernel][V1] Enable AMD Radeon GPU Custom Paged Attention on v1 (vllm-project#17004) Signed-off-by: Hosang Yoon <hosang.yoon@amd.com> * Remove incorrect env value * Revert "[v1] Support multiple KV cache groups in GPU model runner (vllm-project#17945) (vllm-project#18459) Signed-off-by: Mark McLoughlin <markmc@redhat.com> * [FEAT][ROCm] Upgrade AITER MLA v1 backend (vllm-project#18338) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> * [Bugfix] Consistent ascii handling in tool parsers (vllm-project#17704) Signed-off-by: Sebastian Schönnenbeck <sebastian.schoennenbeck@comma-soft.com> * [FalconH1] Fix output dtype in RMSNorm fallback path for Falcon-H1 (e.g. 0.5B) (vllm-project#18500) Signed-off-by: dhia.rhaiem <dhia.rhaiem@tii.ae> Co-authored-by: younesbelkada <younesbelkada@gmail.com> Co-authored-by: Ilyas Chahed <ilyas.chahed@tii.ae> Co-authored-by: Jingwei Zuo <jingwei.zuo@tii.ae> * [MISC] update project urls in pyproject.toml (vllm-project#18519) Signed-off-by: Andy Xie <andy.xning@gmail.com> * [CI] Fix race condition with StatelessProcessGroup.barrier (vllm-project#18506) Signed-off-by: Russell Bryant <rbryant@redhat.com> * Intialize io_thread_pool attribute in the beginning. 
(vllm-project#18331) Signed-off-by: rabi <ramishra@redhat.com> * [Bugfix] Inconsistent token calculation compared to HF in llava family (vllm-project#18479) Signed-off-by: jaycha <jaycha@ncsoft.com> * [BugFix][DP] Send DP wave completion only from `dp_rank==0` (vllm-project#18502) Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: kourosh hakhamaneshi <kourosh@anyscale.com> * [Bugfix][Model] Make Olmo2Model weight loading return loaded weights (vllm-project#18504) Signed-off-by: Shane A <shanea@allenai.org> * [Bugfix] Fix LoRA test (vllm-project#18518) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> * [Doc] Fix invalid JSON in example args (vllm-project#18527) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [Neuron] Update Dockerfile.neuron to use latest neuron release (2.23) (vllm-project#18512) Signed-off-by: Satyajith Chilappagari <satchill@amazon.com> * Update default neuron config for speculation (vllm-project#18274) Signed-off-by: Elaine Zhao <elaineyz@amazon.com> Co-authored-by: Shashwat Srijan <sssrijan@amazon.com> Co-authored-by: Aakash Shetty <sheaak@amazon.com> * Order sequence ids + config update to support specifying custom quantization layers (vllm-project#18279) Signed-off-by: Elaine Zhao <elaineyz@amazon.com> Co-authored-by: Tailin Pan <tailinpa@amazon.com> Co-authored-by: Rishabh Rajesh <rishyraj@amazon.com> Co-authored-by: Yishan McNabb <yishanm@amazon.com> Co-authored-by: Patrick Lange <patlange@amazon.com> Co-authored-by: Maxwell Goldberg <mgld@amazon.com> Co-authored-by: Aakash Shetty <sheaak@amazon.com> * [Bugfix] Fix MRoPE Errors in the Qwen-VL Model When Processing Pure Text (vllm-project#18526) Co-authored-by: 松灵 <wpf272043@alibaba-inc.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk> * [Bugfix] Add kwargs to RequestOutput __init__ to be forward compatible (vllm-project#18513) Signed-off-by: Linkun <github@lkchen.net> * [CI/Build] Update bamba test 
model location (vllm-project#18544) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> * [Doc] Support --stream arg in openai_completion_client.py script (vllm-project#18388) Signed-off-by: googs1025 <googs1025@gmail.com> * [Bugfix] Use random hidden states in dummy sampler run (vllm-project#18543) Signed-off-by: Bowen Wang <abmfy@icloud.com> * [Doc] Add stream flag for chat completion example (vllm-project#18524) Signed-off-by: calvin chen <120380290@qq.com> * [BugFix][CPU] Fix x86 SHM distributed module initialization (vllm-project#18536) Signed-off-by: jiang.li <jiang1.li@intel.com> * [Misc] improve Automatic Prefix Caching example (vllm-project#18554) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com> * [Misc] Call `ndarray.tobytes()` directly instead of `ndarray.data.tobytes()` (vllm-project#18347) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com> * [Bugfix] make `test_openai_schema.py` pass (vllm-project#18224) Signed-off-by: David Xia <david@davidxia.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> * [Platform] Move platform check to right place (vllm-project#18470) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> * [Compile][Platform] Make PiecewiseBackend pluggable and extendable (vllm-project#18076) Signed-off-by: Mengqing Cao <cmq0113@163.com> Co-authored-by: youkaichao <youkaichao@gmail.com> * [Build/CI] Fix CUDA 11.8 build (vllm-project#17679) Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> Signed-off-by: Tyler Michael Smith <tysmith@redhat.com> Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> * [Tool] Add NIXL installation script (vllm-project#18172) Signed-off-by: Linkun <github@lkchen.net> * [V1][Spec Decode][Bugfix] Load quantize weights for EAGLE (vllm-project#18290) * [Frontend][Bug Fix] Update llama4 pythonic jinja template and llama4_pythonic parser 
(vllm-project#17917) Signed-off-by: Kai Wu <kaiwu@meta.com> * [Frontend] [Core] Add Tensorizer support for V1, LoRA adapter serialization and deserialization (vllm-project#17926) Signed-off-by: Sanger Steel <sangersteel@gmail.com> * [AMD] [P/D] Compute num gpus for ROCm correctly in run_accuracy_test.sh (vllm-project#18568) Signed-off-by: Randall Smith <Randall.Smith@amd.com> * Re-submit: Fix: Proper RGBA -> RGB conversion for PIL images. (vllm-project#18569) Signed-off-by: Chenheli Hua <huachenheli@outlook.com> * [V1][Spec Decoding] Use model_loader.get_model() to load models (vllm-project#18273) Signed-off-by: Mark McLoughlin <markmc@redhat.com> * Enable hybrid attention models for Transformers backend (vllm-project#18494) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> * [Misc] refactor: simplify input validation and num_requests handling in _convert_v1_inputs (vllm-project#18482) Signed-off-by: googs1025 <googs1025@gmail.com> * [BugFix] Increase TP execute_model timeout (vllm-project#18558) Signed-off-by: Nick Hill <nhill@redhat.com> * [Bugfix] Set `KVTransferConfig.engine_id` in post_init (vllm-project#18576) Signed-off-by: Linkun Chen <github@lkchen.net> * [Spec Decode] Make EAGLE3 draft token ID mapping optional (vllm-project#18488) Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> * [Neuron] Remove bypass on EAGLEConfig and add a test (vllm-project#18514) Signed-off-by: Elaine Zhao <elaineyz@amazon.com> * [Bugfix][Benchmarks] Fix a benchmark of deepspeed-mii backend to use api_key (vllm-project#17291) Signed-off-by: Teruaki Ishizaki <teruaki.ishizaki@ntt.com> * [Misc] Replace `cuda` hard code with `current_platform` (vllm-project#16983) Signed-off-by: shen-shanshan <467638484@qq.com> * [Hardware] correct method signatures for HPU,ROCm,XPU (vllm-project#18551) Signed-off-by: Andy Xie <andy.xning@gmail.com> * [V1] [Bugfix] eagle bugfix and enable correct lm_head for 
multimodal (vllm-project#18034) Signed-off-by: Ronald Xu <ronaldxu@amazon.com> * [Feature]Add async tensor parallelism using compilation pass (vllm-project#17882) Signed-off-by: cascade812 <cascade812@outlook.com> * [Doc] Update quickstart and install for cu128 using `--torch-backend=auto` (vllm-project#18505) Signed-off-by: mgoin <mgoin64@gmail.com> * [Feature][V1]: suupports cached_tokens in response usage (vllm-project#18149) Co-authored-by: simon-mo <xmo@berkeley.edu> * [Bugfix] Add half type support in reshape_and_cache_cpu_impl on x86 cpu platform (vllm-project#18430) Signed-off-by: Yuqi Zhang <yuqizhang@google.com> Co-authored-by: Yuqi Zhang <yuqizhang@google.com> * Migrate docs from Sphinx to MkDocs (vllm-project#18145) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> * Revert "[V1] [Bugfix] eagle bugfix and enable correct lm_head for multimodal (vllm-project#18034)" (vllm-project#18600) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [Bugfix][Model] Fix baichuan model loader for tp (vllm-project#18597) Signed-off-by: Mengqing Cao <cmq0113@163.com> * [V0][Bugfix] Fix parallel sampling performance regression when guided decoding is enabled (vllm-project#17731) Signed-off-by: Madeesh Kannan <shadeMe@users.noreply.github.com> Co-authored-by: Russell Bryant <rbryant@redhat.com> * Add myself as docs code owner (vllm-project#18605) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> * [Hardware][CPU] Update intel_extension_for_pytorch 2.7.0 and move to `requirements/cpu.txt` (vllm-project#18542) Signed-off-by: Kay Yan <kay.yan@daocloud.io> * [CI] fix kv_cache_type argument (vllm-project#18594) Signed-off-by: Andy Xie <andy.xning@gmail.com> * [Doc] Fix indent of contributing to vllm (vllm-project#18611) Signed-off-by: Zerohertz <ohg3417@gmail.com> * Replace `{func}` with mkdocs style links (vllm-project#18610) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> * [CI/Build] Fix V1 flag being 
set in entrypoints tests (vllm-project#18598) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * Fix examples with code blocks in docs (vllm-project#18609) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> * [Bugfix] Fix transformers model impl ignored for mixtral quant (vllm-project#18602) Signed-off-by: Tristan Leclercq <tristanleclercq@gmail.com> * Include private attributes in API documentation (vllm-project#18614) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> * [Misc] add Haystack integration (vllm-project#18601) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com> * [Bugfix][Build/CI] Fixup CUDA compiler version check for CUDA_SUPPORTED_ARCHS (vllm-project#18579) * [Doc] Fix markdown list indentation for MkDocs rendering (vllm-project#18620) Signed-off-by: Zerohertz <ohg3417@gmail.com> * [Doc] Use a different color for the announcement (vllm-project#18616) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * Refactor pplx init logic to make it modular (prepare for deepep) (vllm-project#18200) Signed-off-by: youkaichao <youkaichao@gmail.com> * Fix figures in design doc (vllm-project#18612) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> * [Docs] Change mkdocs to not use directory urls (vllm-project#18622) Signed-off-by: mgoin <mgoin64@gmail.com> * [v1] Redo "Support multiple KV cache groups in GPU model runner (vllm-project#17945)" (vllm-project#18593) Signed-off-by: Chen Zhang <zhangch99@outlook.com> * [Doc] fix list formatting (vllm-project#18624) Signed-off-by: David Xia <david@davidxia.com> * [Doc] Fix top-level API links/docs (vllm-project#18621) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [Doc] Avoid documenting dynamic / internal modules (vllm-project#18626) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [Doc] Fix broken links and unlinked docs, add shortcuts to home sidebar (vllm-project#18627) 
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [V1] Support Deepseek MTP (vllm-project#18435) Signed-off-by: Rui Qiao <ruisearch42@gmail.com> Signed-off-by: YaoJiayi <120040070@link.cuhk.edu.cn> Co-authored-by: Rui Qiao <ruisearch42@gmail.com> * Use prebuilt FlashInfer x86_64 PyTorch 2.7 CUDA 12.8 wheel for CI (vllm-project#18537) Signed-off-by: Huy Do <huydhn@gmail.com> * [CI] Enable test_initialization to run on V1 (vllm-project#16736) Signed-off-by: mgoin <mgoin64@gmail.com> * [Doc] Update references to doc files (vllm-project#18637) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [ModelOpt] Introduce VLLM_MAX_TOKENS_PER_EXPERT_FP4_MOE env var to control blockscale tensor allocation (vllm-project#18160) Signed-off-by: Pavani Majety <pmajety@nvidia.com> * [Bugfix] Migrate to REGEX Library to prevent catastrophic backtracking (vllm-project#18454) Signed-off-by: Crucifixion-Fxl <xmufxl@gmail.com> Co-authored-by: Crucifixion-Fxl <xmufxl@gmail.com> * [Bugfix][Nixl] Fix Preemption Bug (vllm-project#18631) Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * config.py: Clarify that only local GGUF checkpoints are supported. 
(vllm-project#18623) Signed-off-by: Mathieu Bordere <mathieu@letmetweakit.com> * FIX MOE issue in AutoRound format (vllm-project#18586) Signed-off-by: wenhuach21 <wenhua.cheng@intel.com> * [V1][Spec Decode] Small refactors to improve eagle bookkeeping performance (vllm-project#18424) Signed-off-by: qizixi <qizixi@meta.com> * [Frontend] improve vllm serve --help display (vllm-project#18643) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com> * [Model] Add support for Qwen2.5-Omni-7B-AWQ (Qwen2_5OmniForConditionalGeneration) (vllm-project#18647) * [V1][Spec Decode] Support multi-layer eagle draft model (vllm-project#18030) Signed-off-by: qizixi <qizixi@meta.com> * [Doc] Update README links, mark external links (vllm-project#18635) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [MISC][pre-commit] Add pre-commit check for triton import (vllm-project#17716) Signed-off-by: Mengqing Cao <cmq0113@163.com> * [Doc] Fix indentation problems in V0 Paged Attention docs (vllm-project#18659) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [Doc] Add community links (vllm-project#18657) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [Model] use AutoWeightsLoader for gpt2 (vllm-project#18625) Signed-off-by: zt2370 <ztang2370@gmail.com> * [Doc] Reorganize user guide (vllm-project#18661) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [CI/Build] `chmod +x` to `cleanup_pr_body.sh` (vllm-project#18650) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [MISC] typo fix and clean import (vllm-project#18664) Signed-off-by: Andy Xie <andy.xning@gmail.com> * [BugFix] Fix import error for fused_moe (vllm-project#18642) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> * [CI] enforce import regex instead of re (vllm-project#18665) Signed-off-by: Aaron Pham <contact@aarnphm.xyz> * fix(regression): clone from reference items (vllm-project#18662) Signed-off-by: Aaron Pham <contact@aarnphm.xyz> * 
[CI/Build] fix permission denied issue (vllm-project#18645) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com> * [BugFix][Spec Decode] Improve Prefix Caching Logic in Speculative Decoding (vllm-project#18668) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> * [V1] Fix _pickle.PicklingError: Can't pickle <class 'transformers_modules.deepseek-ai.DeepSeek-V2-Lite... (vllm-project#18640) Signed-off-by: Seiji Eicher <seiji@anyscale.com> * [MISC] correct signature for LoaderFunction (vllm-project#18670) Signed-off-by: Andy Xie <andy.xning@gmail.com> * [Misc]Replace `cuda` hard code with `current_platform` in Ray (vllm-project#14668) Signed-off-by: noemotiovon <757486878@qq.com> * [Misc][ModelScope] Change to use runtime VLLM_USE_MODELSCOPE (vllm-project#18655) Signed-off-by: Mengqing Cao <cmq0113@163.com> Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Isotr0py <2037008807@qq.com> * [VLM] Initialize video input support for InternVL models (vllm-project#18499) Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> * Speed up the `kernels/quantization/` tests (vllm-project#18669) Signed-off-by: mgoin <mgoin64@gmail.com> * [BUGFIX] catch subclass first for try...except (vllm-project#18672) Signed-off-by: Andy Xie <andy.xning@gmail.com> * [Misc] Reduce logs on startup (vllm-project#18649) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [doc] fix broken links (vllm-project#18671) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com> * [doc] improve readability (vllm-project#18675) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com> * [Bugfix] Fix cpu usage and cache hit stats reporting on cpu environment (vllm-project#18674) Signed-off-by: zzzyq <zhangyuqi94@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> * [CI/build] fix no regex (vllm-project#18676) 
Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com> * [Misc] small improve (vllm-project#18680) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com> * [Bugfix] Fix profiling dummy data for Pixtral (vllm-project#18677) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [Core][Multimodal] Convert PIL Image to array without data copy when hashing (vllm-project#18682) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com> * [CI/Build][Doc] Update `gte-Qwen2-1.5B-instruct` usage (vllm-project#18683) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Isotr0py <2037008807@qq.com> * [Misc] Fixed the abnormally high TTFT issue in the PD disaggregation example (vllm-project#18644) Signed-off-by: zhaohaidao <zhaohaidao2008@hotmail.com> Signed-off-by: zhaohaiyuan <zhaohaiyuan@xiaohongshu.com> Co-authored-by: zhaohaiyuan <zhaohaiyuan@xiaohongshu.com> * refactor: simplify request handler, use positive condition check for handler assignment (vllm-project#18690) Signed-off-by: googs1025 <googs1025@gmail.com> * [Bugfix] Fix the lm_head in gpt_bigcode in lora mode (vllm-project#6357) Signed-off-by: Max de Bayser <mbayser@br.ibm.com> Signed-off-by: Max de Bayser <maxdebayser@gmail.com> * [CI] add missing argument (vllm-project#18694) Signed-off-by: Andy Xie <andy.xning@gmail.com> * [GH] Add issue template for reporting CI failures (vllm-project#18696) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [Doc] Fix issue template format (vllm-project#18699) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [Bugfix] Fix Mistral-format models with sliding window (vllm-project#18693) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [CI/Build] Replace `math.isclose` with `pytest.approx` (vllm-project#18703) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [CI] fix dump_input for str type 
(vllm-project#18697) Signed-off-by: Andy Xie <andy.xning@gmail.com> * [Model] Add support for YARN in NemotronNAS models (vllm-project#18427) Signed-off-by: Nave Assaf <nassaf@nvidia.com> * [CI/Build] Split pooling and generation extended language models tests in CI (vllm-project#18705) Signed-off-by: Isotr0py <2037008807@qq.com> * [Hardware][Intel-Gaudi] [CI/Build] Add tensor parallel size = 2 test to HPU CI (vllm-project#18709) Signed-off-by: Lukasz Durejko <ldurejko@habana.ai> * [Misc] add AutoGen integration (vllm-project#18712) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> * [Bugfix]: handle hf-xet CAS error when loading Qwen3 weights in vLLM (vllm-project#18701) * [Doc] Improve API docs (vllm-project#18713) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [Doc] Move examples and further reorganize user guide (vllm-project#18666) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [Bugfix] Fix Llama GGUF initialization (vllm-project#18717) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [V1][Sampler] Improve performance of FlashInfer sampling by sampling logits instead of probs (vllm-project#18608) * Convert `examples` to `ruff-format` (vllm-project#18400) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> * [Model][Gemma3] Simplify image input validation (vllm-project#18710) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com> * [Misc] improve web section group title display (vllm-project#18684) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com> * [V1][Quantization] Add CUDA graph compatible v1 GGUF support (vllm-project#18646) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: Isotr0py <2037008807@qq.com> * [Model][Gemma3] Cast image pixel values already on CPU (vllm-project#18732) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com> * [FEAT] 
[ROCm] Upgrade AITER Fused MoE kernels. (vllm-project#18271) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> * [Doc] Update OOT model docs (vllm-project#18742) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [Doc] Update reproducibility doc and example (vllm-project#18741) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [Misc] improve docs (vllm-project#18734) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com> * feat(rocm-support): support mamba2 on rocm (vllm-project#18565) Signed-off-by: Islam Almersawi <islam.almersawi@openinnovation.ai> Co-authored-by: Islam Almersawi <islam.almersawi@openinnovation.ai> * [Hardware][Intel-Gaudi] [CI/Build] Fix multiple containers using the same name in run-hpu-test.sh (vllm-project#18752) Signed-off-by: Lukasz Durejko <ldurejko@habana.ai> * [Doc] cleanup deprecated flag for doc (vllm-project#18715) Signed-off-by: calvin chen <120380290@qq.com> * Minor fix about MooncakeStoreConnector (vllm-project#18721) Signed-off-by: baoloongmao <baoloongmao@tencent.com> * [Build] fix cpu build missing libtbbmalloc.so (vllm-project#18744) Signed-off-by: Kebe <mail@kebe7jun.com> * [BUG FIX] minicpm (vllm-project#18739) Signed-off-by: huangyuxiang03 <huangyx0321@gmail.com> Co-authored-by: huangyuxiang03 <huangyx0321@gmail.com> * [Doc] Convert Sphinx directives ( `{class}`, `{meth}`, `{attr}`, ...) 
to MkDocs format for better documentation linking (vllm-project#18663) Signed-off-by: Zerohertz <ohg3417@gmail.com> * [CI/Build] Remove imports of built-in `re` (vllm-project#18750) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [V1][Metrics] Add API for accessing in-memory Prometheus metrics (vllm-project#17010) Signed-off-by: Mark McLoughlin <markmc@redhat.com> * Disable prefix cache by default for benchmark (vllm-project#18639) Signed-off-by: cascade812 <cascade812@outlook.com> * optimize get_kv_cache_torch_dtype (vllm-project#18531) Signed-off-by: idellzheng <idellzheng@tencent.com> * [Core] Automatically cast multi-modal input dtype (vllm-project#18756) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [Bugfix] Mistral tool calling when content is list (vllm-project#18729) Signed-off-by: mgoin <mgoin64@gmail.com> --------- Signed-off-by: Satyajith Chilappagari <satchill@amazon.com> Signed-off-by: Lucia Fang <fanglu@fb.com> Signed-off-by: Liangfu Chen <liangfc@amazon.com> Signed-off-by: Isotr0py <2037008807@qq.com> Signed-off-by: Nan2018 <nan@protopia.ai> Signed-off-by: rand-fly <randfly@outlook.com> Signed-off-by: reidliu41 <reid201711@gmail.com> Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com> Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: calvin chen <120380290@qq.com> Signed-off-by: haochengxia <xhc_1007@163.com> Signed-off-by: Dilip Gowda Bhagavan <dilip.bhagavan@ibm.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com> Signed-off-by: Bill Nell <bnell@redhat.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: wwl2755 <wangwenlong2755@gmail.com> Signed-off-by: nicklucche <nlucches@redhat.com> Signed-off-by: Kebe <mail@kebe7jun.com> Signed-off-by: Yong Hoon Shin <yhshin@meta.com> Signed-off-by: rabi <ramishra@redhat.com> Signed-off-by: dhia.rhaiem <dhia.rhaiem@tii.ae> Signed-off-by: giantcroc 
<1204449533@qq.com> Signed-off-by: Hosang Yoon <hosang.yoon@amd.com> Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> Signed-off-by: Sebastian Schönnenbeck <sebastian.schoennenbeck@comma-soft.com> Signed-off-by: Andy Xie <andy.xning@gmail.com> Signed-off-by: Russell Bryant <rbryant@redhat.com> Signed-off-by: jaycha <jaycha@ncsoft.com> Signed-off-by: Nick Hill <nhill@redhat.com> Signed-off-by: Shane A <shanea@allenai.org> Signed-off-by: Elaine Zhao <elaineyz@amazon.com> Signed-off-by: Linkun <github@lkchen.net> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: googs1025 <googs1025@gmail.com> Signed-off-by: Bowen Wang <abmfy@icloud.com> Signed-off-by: jiang.li <jiang1.li@intel.com> Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com> Signed-off-by: David Xia <david@davidxia.com> Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> Signed-off-by: Mengqing Cao <cmq0113@163.com> Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> Signed-off-by: Tyler Michael Smith <tysmith@redhat.com> Signed-off-by: Kai Wu <kaiwu@meta.com> Signed-off-by: Sanger Steel <sangersteel@gmail.com> Signed-off-by: Randall Smith <Randall.Smith@amd.com> Signed-off-by: Chenheli Hua <huachenheli@outlook.com> Signed-off-by: Linkun Chen <github@lkchen.net> Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai> Signed-off-by: Teruaki Ishizaki <teruaki.ishizaki@ntt.com> Signed-off-by: shen-shanshan <467638484@qq.com> Signed-off-by: Ronald Xu <ronaldxu@amazon.com> Signed-off-by: cascade812 <cascade812@outlook.com> Signed-off-by: Yuqi Zhang <yuqizhang@google.com> Signed-off-by: Madeesh Kannan <shadeMe@users.noreply.github.com> Signed-off-by: Kay Yan <kay.yan@daocloud.io> Signed-off-by: Zerohertz <ohg3417@gmail.com> Signed-off-by: Tristan Leclercq <tristanleclercq@gmail.com> Signed-off-by: youkaichao <youkaichao@gmail.com> Signed-off-by: 
Chen Zhang <zhangch99@outlook.com> Signed-off-by: Rui Qiao <ruisearch42@gmail.com> Signed-off-by: YaoJiayi <120040070@link.cuhk.edu.cn> Signed-off-by: Huy Do <huydhn@gmail.com> Signed-off-by: Pavani Majety <pmajety@nvidia.com> Signed-off-by: Crucifixion-Fxl <xmufxl@gmail.com> Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> Signed-off-by: Mathieu Bordere <mathieu@letmetweakit.com> Signed-off-by: wenhuach21 <wenhua.cheng@intel.com> Signed-off-by: qizixi <qizixi@meta.com> Signed-off-by: zt2370 <ztang2370@gmail.com> Signed-off-by: Aaron Pham <contact@aarnphm.xyz> Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Signed-off-by: Seiji Eicher <seiji@anyscale.com> Signed-off-by: noemotiovon <757486878@qq.com> Signed-off-by: zzzyq <zhangyuqi94@gmail.com> Signed-off-by: zhaohaidao <zhaohaidao2008@hotmail.com> Signed-off-by: zhaohaiyuan <zhaohaiyuan@xiaohongshu.com> Signed-off-by: Max de Bayser <mbayser@br.ibm.com> Signed-off-by: Max de Bayser <maxdebayser@gmail.com> Signed-off-by: Nave Assaf <nassaf@nvidia.com> Signed-off-by: Lukasz Durejko <ldurejko@habana.ai> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: Islam Almersawi <islam.almersawi@openinnovation.ai> Signed-off-by: baoloongmao <baoloongmao@tencent.com> Signed-off-by: huangyuxiang03 <huangyx0321@gmail.com> Signed-off-by: idellzheng <idellzheng@tencent.com> Co-authored-by: sunyicode0012 <116338547+sunyicode0012@users.noreply.github.com> Co-authored-by: Gong Shufan <2624542821@qq.com> Co-authored-by: Satyajith Chilappagari <satchill@amazon.com> Co-authored-by: Lucia Fang <116399278+luccafong@users.noreply.github.com> Co-authored-by: Lucia (Lu) Fang <fanglu@meta.com> Co-authored-by: Liangfu Chen <liangfc@amazon.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Nan Qin <nan@protopia.ai> Co-authored-by: Andrew Sansom <andrew@protopia.ai> Co-authored-by: Kevin H. 
Luu <kevin@anyscale.com> Co-authored-by: Random Fly <renfei8@live.cn> Co-authored-by: Reid <61492567+reidliu41@users.noreply.github.com> Co-authored-by: reidliu41 <reid201711@gmail.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: 汪志鹏 <wangzhipeng628@gmail.com> Co-authored-by: wang.yuqi <noooop@126.com> Co-authored-by: 燃 <wulipc@163.com> Co-authored-by: 松灵 <wpf272043@alibaba-inc.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Calvin Chen <45745657+calvin0327@users.noreply.github.com> Co-authored-by: Percy <xhc_1007@163.com> Co-authored-by: Dilip Gowda Bhagavan <110233170+dilipgb@users.noreply.github.com> Co-authored-by: bnellnm <49004751+bnellnm@users.noreply.github.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk> Co-authored-by: wwl2755 <wangwenlong2755@gmail.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com> Co-authored-by: Kebe <mail@kebe7jun.com> Co-authored-by: Yong Hoon Shin <48474650+sarckk@users.noreply.github.com> Co-authored-by: Rabi Mishra <ramishra@redhat.com> Co-authored-by: Dhia Eddine Rhaiem <163106757+dhiaEddineRhaiem@users.noreply.github.com> Co-authored-by: younesbelkada <younesbelkada@gmail.com> Co-authored-by: Ilyas Chahed <ilyas.chahed@tii.ae> Co-authored-by: Jingwei Zuo <jingwei.zuo@tii.ae> Co-authored-by: GiantCroc <1204449533@qq.com> Co-authored-by: Hyogeun Oh (오효근) <ohg3417@gmail.com> Co-authored-by: Hosang <156028780+hyoon1@users.noreply.github.com> Co-authored-by: Mark McLoughlin <markmc@redhat.com> Co-authored-by: vllmellm <vllm.ellm@embeddedllm.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Sebastian Schoennenbeck <sebastian.schoennenbeck@comma-soft.com> Co-authored-by: Ning Xie <andy.xning@gmail.com> Co-authored-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: youngrok cha <line0930@gmail.com> Co-authored-by: Nick Hill <nhill@redhat.com> Co-authored-by: kourosh hakhamaneshi <kourosh@anyscale.com> Co-authored-by: Shane A 
<shanea@allenai.org> Co-authored-by: aws-elaineyz <elaineyz@amazon.com> Co-authored-by: Shashwat Srijan <sssrijan@amazon.com> Co-authored-by: Aakash Shetty <sheaak@amazon.com> Co-authored-by: Tailin Pan <tailinpa@amazon.com> Co-authored-by: Rishabh Rajesh <rishyraj@amazon.com> Co-authored-by: Yishan McNabb <yishanm@amazon.com> Co-authored-by: Patrick Lange <patlange@amazon.com> Co-authored-by: Maxwell Goldberg <mgld@amazon.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: lkchen <github@lkchen.net> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: CYJiang <86391540+googs1025@users.noreply.github.com> Co-authored-by: Bowen Wang <abmfy@icloud.com> Co-authored-by: Li, Jiang <jiang1.li@intel.com> Co-authored-by: Lukas Geiger <lukas.geiger94@gmail.com> Co-authored-by: David Xia <david@davidxia.com> Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com> Co-authored-by: Mengqing Cao <cmq0113@163.com> Co-authored-by: youkaichao <youkaichao@gmail.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> Co-authored-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com> Co-authored-by: Kai Wu <kaiwu@meta.com> Co-authored-by: Sanger Steel <sangersteel@gmail.com> Co-authored-by: rasmith <Randall.Smith@amd.com> Co-authored-by: Chenheli Hua <huachenheli@outlook.com> Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Co-authored-by: Teruaki Ishizaki <tell.ishi@gmail.com> Co-authored-by: Shanshan Shen <467638484@qq.com> Co-authored-by: RonaldBXu <72748153+RonaldBXu@users.noreply.github.com> Co-authored-by: cascade <cascade812@outlook.com> Co-authored-by: Chauncey <chaunceyjiang@gmail.com> Co-authored-by: simon-mo <xmo@berkeley.edu> Co-authored-by: Yuqi Zhang <zhangyuqi94@gmail.com> Co-authored-by: Yuqi Zhang <yuqizhang@google.com> Co-authored-by: Madeesh Kannan 
<shadeMe@users.noreply.github.com> Co-authored-by: Kay Yan <kay.yan@daocloud.io> Co-authored-by: Tristan Leclercq <49700633+tristanleclercq@users.noreply.github.com> Co-authored-by: Simon Mo <simon.mo@hey.com> Co-authored-by: Chen Zhang <zhangch99@outlook.com> Co-authored-by: Jiayi Yao <82156730+YaoJiayi@users.noreply.github.com> Co-authored-by: Rui Qiao <ruisearch42@gmail.com> Co-authored-by: Huy Do <huydhn@gmail.com> Co-authored-by: Pavani Majety <pmajety@nvidia.com> Co-authored-by: Feng XiaoLong <79261065+Crucifixion-Fxl@users.noreply.github.com> Co-authored-by: Crucifixion-Fxl <xmufxl@gmail.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Mathieu Borderé <mathieu@bordere.org> Co-authored-by: Wenhua Cheng <wenhua.cheng@intel.com> Co-authored-by: qizixi <22851944+zixi-qi@users.noreply.github.com> Co-authored-by: Yuanhao WU <Nalkey@users.noreply.github.com> Co-authored-by: ztang2370 <ztang2370@gmail.com> Co-authored-by: Aaron Pham <contact@aarnphm.xyz> Co-authored-by: Seiji Eicher <58963096+eicherseiji@users.noreply.github.com> Co-authored-by: Chenguang Li <757486878@qq.com> Co-authored-by: Isotr0py <2037008807@qq.com> Co-authored-by: AlexZhao <zhaohaidao2008@hotmail.com> Co-authored-by: zhaohaiyuan <zhaohaiyuan@xiaohongshu.com> Co-authored-by: Maximilien de Bayser <mbayser@br.ibm.com> Co-authored-by: Naveassaf <55059536+Naveassaf@users.noreply.github.com> Co-authored-by: Łukasz Durejko <lukasz.durejko@intel.com> Co-authored-by: dylan <xuhao296@qq.com> Co-authored-by: almersawi <43927639+almersawi@users.noreply.github.com> Co-authored-by: Islam Almersawi <islam.almersawi@openinnovation.ai> Co-authored-by: Łukasz Durejko <ldurejko@habana.ai> Co-authored-by: maobaolong <baoloongmao@tencent.com> Co-authored-by: Shawn Huang <57223022+huangyuxiang03@users.noreply.github.com> Co-authored-by: huangyuxiang03 <huangyx0321@gmail.com> Co-authored-by: chunxiaozheng <55471457+chunxiaozheng@users.noreply.github.com>
This PR adds support for FalconH1, the new hybrid Falcon series of models developed at the Technology Innovation Institute. More documentation and details are coming soon. @DarkLight1337 @tlrmchlsmth