[Frontend][1/N] Improve all pooling task | Support FP16 Embedding Base64 (Still uses fp32 by default).#26414
Conversation
Signed-off-by: wang.yuqi <noooop@126.com>

Documentation preview: https://vllm--26414.org.readthedocs.build/en/26414/
examples/online_serving/pooling/openai_embedding_embed_dtype_client.py — are you OK with this API?

Yes, this PR can even use fp8. The small-scale test results are quite good; a more detailed test will be provided tomorrow.
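The API under discussion returns base64-encoded embeddings in a dtype other than fp32. Below is a minimal client-side sketch of decoding such a payload; the `decode_embedding` helper and the plain dtype-string convention are illustrative assumptions, not the PR's actual API surface:

```python
import base64
import numpy as np

def decode_embedding(b64_data: str, dtype: str = "float32") -> np.ndarray:
    """Decode a base64-encoded embedding payload into a NumPy vector.

    The client must know which dtype the server encoded with; fp32
    remains the default, while fp16 roughly halves the payload size.
    """
    raw = base64.b64decode(b64_data)
    return np.frombuffer(raw, dtype=np.dtype(dtype))

# Round-trip illustration with a synthetic fp16 embedding:
vec = np.random.rand(8).astype(np.float16)
encoded = base64.b64encode(vec.tobytes()).decode("utf-8")
decoded = decode_embedding(encoded, dtype="float16")
assert np.array_equal(vec, decoded)
```

The key point is that the wire format is just the raw array bytes, so the requested embed dtype must travel out-of-band (in the request) for the client to interpret the buffer correctly.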
@noooop Yes. There are also optional enhancements: binary protocols, such as Postgres's, always expect big-endian numbers; this is the de facto standard for almost all binary network protocols. Models, however, typically operate in little-endian format, so byte-order conversion is always necessary. Adding an endian parameter would also be useful.
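The byte-order point above can be shown in a few lines. This is a sketch of what a hypothetical `endian` parameter would control server-side (the parameter itself is only a suggestion in this thread, not something the PR implements):

```python
import numpy as np

# Models produce little-endian bytes on most hardware; binary protocols
# such as Postgres's expect big-endian ("network order") instead.
vec_le = np.arange(4, dtype="<f2")   # little-endian float16
vec_be = vec_le.astype(">f2")        # same values, big-endian byte layout

assert np.array_equal(vec_le, vec_be)        # numeric values are unchanged
assert vec_le.tobytes() != vec_be.tobytes()  # raw byte order differs
```

Because the numeric values are identical either way, the conversion only affects the serialized buffer, which is exactly why it matters for base64 payloads consumed by big-endian protocols.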
cc @DarkLight1337 @maxdebayser Ready for review
maxdebayser
left a comment
Awesome. I've left a few comments but this looks good to me.
Is there anything else that needs to be modified in this PR?
DarkLight1337
left a comment
Nope, this LGTM now. Thanks!
…e64 (Still uses fp32 by default). (vllm-project#26414) Signed-off-by: wang.yuqi <noooop@126.com> Co-authored-by: Maximilien de Bayser <maxdebayser@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Signed-off-by: 1994 <1994@users.noreply.github.com>
Improve all pooling task
These PRs mostly conflict with one another, so combining them into a series gives reviewers a better picture of what has changed and what still needs to be done afterward.
Purpose
FIX #26248
For the MTEB test, PTAL at #17175.
https://github.com/noooop/snippet/blob/main/benchmarks/test_mteb/test_embed_dtype.py
float32 ≈ float16 > bfloat16 > fp8_e4m3 >> fp8_e5m2
Even with fp8_e5m2, the gap is smaller than expected.
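The ranking above (float32 ≈ float16 > bfloat16) follows from mantissa width, which a quick round-trip experiment can illustrate. NumPy has no native bfloat16 or fp8, so this sketch emulates bfloat16 by truncating the low 16 bits of float32 (a rough round-to-zero stand-in, not the MTEB methodology used for the benchmark numbers):

```python
import numpy as np

rng = np.random.default_rng(0)
ref = rng.standard_normal(1024).astype(np.float32)

def rel_error(approx: np.ndarray) -> float:
    """Mean relative round-trip error against the float32 reference."""
    return float(np.mean(np.abs(approx.astype(np.float32) - ref) / np.abs(ref)))

fp16 = ref.astype(np.float16)
# Emulate bfloat16 by zeroing the low 16 mantissa bits of each float32.
bf16 = (ref.view(np.uint32) & np.uint32(0xFFFF0000)).view(np.float32)

# float16 keeps 10 mantissa bits vs. bfloat16's 7, so for unit-scale
# embedding values it tracks float32 more closely.
assert rel_error(fp16) < rel_error(bf16)
```

The same mantissa-width argument predicts the further drop for fp8_e4m3 (3 mantissa bits) and fp8_e5m2 (2 mantissa bits), consistent with the benchmark ordering quoted above.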
Test Plan
tests/entrypoints/pooling/openai/test_embedding.py
tests/entrypoints/pooling/openai/test_pooling.py
Test Result
All tests pass.
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.