Basic attention kernel that supports cached KV + (multi-)prompts #24

Merged
suquark merged 25 commits into main from mixed_attn_kernel
Apr 5, 2023
Conversation

@suquark
Contributor

@suquark suquark commented Apr 3, 2023

This PR implements a basic, not yet highly optimized kernel that supports cached KV with multiple input prompts.
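For readers new to the idea, here is a minimal reference sketch in plain Python of what "attention over cached KV with multiple prompts" computes. It is not the kernel from this PR: the function names (`attend_one_prompt`, `attend_prompts`), the single-head layout, and the list-of-vectors data format are all assumptions made for illustration. Each new prompt token attends to every token already in that prompt's KV cache plus the new tokens up to and including itself.

```python
import math


def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_one_prompt(q, k_new, v_new, k_cache, v_cache):
    """Causal single-head attention for one prompt (hypothetical helper).

    q, k_new, v_new: per-token vectors for the new prompt tokens.
    k_cache, v_cache: per-token vectors already stored in the KV cache.
    """
    keys = k_cache + k_new
    values = v_cache + v_new
    scale = 1.0 / math.sqrt(len(q[0]))
    n_cached = len(k_cache)
    dim = len(values[0])
    out = []
    for i, qi in enumerate(q):
        # Token i sees all cached tokens plus new tokens 0..i (causal mask).
        visible = n_cached + i + 1
        scores = [
            scale * sum(a * b for a, b in zip(qi, keys[j]))
            for j in range(visible)
        ]
        weights = softmax(scores)
        out.append(
            [sum(w * values[j][d] for j, w in enumerate(weights))
             for d in range(dim)]
        )
    return out


def attend_prompts(prompts):
    # Each prompt is a dict of keyword arguments for attend_one_prompt;
    # prompts are independent and may have different lengths and cache sizes.
    return [attend_one_prompt(**p) for p in prompts]
```

A real GPU kernel would process all prompts in one launch over packed tensors rather than looping in Python; the loop here only pins down the expected result per prompt.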

@suquark suquark requested a review from WoosukKwon April 3, 2023 12:28
Collaborator

@WoosukKwon WoosukKwon left a comment

Thanks a lot for your effort. Left minor comments.

@suquark suquark requested a review from WoosukKwon April 3, 2023 22:37
@suquark
Contributor Author

suquark commented Apr 4, 2023

@WoosukKwon any comments?

@WoosukKwon
Collaborator

Thanks @suquark for this effort. Let's merge this!

@suquark suquark merged commit 21b3671 into main Apr 5, 2023
@suquark suquark deleted the mixed_attn_kernel branch April 5, 2023 03:34
slyalin pushed a commit to slyalin/vllm that referenced this pull request Apr 4, 2024
z103cb referenced this pull request in z103cb/opendatahub_vllm May 9, 2024
`format.sh` now has mypy checks after pulling in upstream changes. This
PR makes the mypy suggested modifications to our code.

---------

Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
z103cb referenced this pull request in opendatahub-io/vllm May 9, 2024
`format.sh` now has mypy checks after pulling in upstream changes. This
PR makes the mypy suggested modifications to our code.

---------

Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
dtrifiro pushed a commit to dtrifiro/vllm that referenced this pull request May 21, 2024
@alixiaodi alixiaodi mentioned this pull request Aug 2, 2024
njhill pushed a commit to njhill/vllm that referenced this pull request Nov 6, 2024
heheda12345 added a commit to heheda12345/vllm that referenced this pull request Sep 29, 2025
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
yma11 added a commit to yma11/vllm that referenced this pull request Nov 14, 2025
* fix vision attn for whisper/llama4

Signed-off-by: Yan Ma <yan.ma@intel.com>

* address comments

Signed-off-by: Yan Ma <yan.ma@intel.com>

---------

Signed-off-by: Yan Ma <yan.ma@intel.com>
dik654 pushed a commit to dik654/vllm-for-study that referenced this pull request Nov 18, 2025
New Industry Use Cases (vllm-project#21-30):
- vllm-project#21 Game Development: AI game testing + balance tuning
- vllm-project#22 Construction: Vision AI safety inspection
- vllm-project#23 Agriculture/Smart Farm: Crop monitoring + pest detection
- vllm-project#24 Government/Public: Document automation + citizen services
- vllm-project#25 Energy/Utilities: Grid monitoring + anomaly detection
- vllm-project#26 Environment/Sustainability: Carbon tracking + ESG reporting
- vllm-project#27 Fashion/Apparel: Trend analysis + inventory optimization
- vllm-project#28 Sports/Fitness: Performance analytics + tactical analysis
- vllm-project#29 Automotive/Mobility: Autonomous driving simulation
- vllm-project#30 Space/Aerospace: Satellite image analysis

Advanced Architecture Patterns:
1. Event-Driven Pattern: Webhook → Event Bus → Agent triggers
2. Streaming Pattern: Large dataset processing with chunking
3. Batch Processing Pattern: Celery-based parallel processing
4. Circuit Breaker Pattern: Fault tolerance + auto recovery
5. CQRS + Event Sourcing: Command/Query separation
6. Saga Pattern: Distributed transaction management

Guide now covers:
- 30+ industry-specific MCP implementations
- 6 production-ready architecture patterns
- Real-world scalability solutions
- Enterprise integration strategies
- Total: 8,672 lines (from 7,249)
2 participants