Fix: decode KV cache layout #6

Merged
ZYHowell merged 1 commit into main from pr-fix-decode-kv-layout
Aug 12, 2024

@ivanium commented Aug 11, 2024

This PR fixes `_get_decode_local_lens` and uses it to initialize the flashinfer decode kernel correctly.
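For readers unfamiliar with the idea, here is a minimal sketch of what a "local lens" computation could look like under sequence parallelism (SP). It is not the code from this PR: the helper name `sketch_decode_local_lens` is hypothetical, and the round-robin layout (token `i` of a sequence stored on SP rank `i % sp_size`) is an assumption made purely for illustration. The actual KV cache layout fixed here may differ.

```python
from typing import List


def sketch_decode_local_lens(seq_lens: List[int], sp_size: int) -> List[List[int]]:
    """Hypothetical helper, NOT the PR's _get_decode_local_lens.

    ASSUMPTION: token i of every sequence is stored on SP rank i % sp_size.
    Returns local_lens[rank][request]: how many KV entries of each request
    live on each rank.
    """
    if sp_size <= 0:
        raise ValueError("sp_size must be positive")
    return [
        # Rank r holds tokens r, r + sp_size, r + 2 * sp_size, ...
        # so it stores ceil((length - r) / sp_size) of them (never below 0).
        [(length - rank + sp_size - 1) // sp_size for length in seq_lens]
        for rank in range(sp_size)
    ]


local_lens = sketch_decode_local_lens([5, 8, 1], sp_size=2)
print(local_lens)  # [[3, 4, 1], [2, 4, 0]]
```

Whatever the real layout is, the per-rank lengths of a request should sum to its global length; if they do not, a decode kernel initialized from them would read the wrong number of KV entries.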

Below are the correctness test results (output-len=40). The first several tokens match, but the outputs slowly diverge afterwards. This is probably due to numerical errors, and the same behavior is also observed in pure TP settings.

Single GPU:

<|begin_of_text|>The capital of France is a city of many faces. It is a city of history, a city of culture, a city of art, a city of fashion, a city of gastronomy, a city of architecture, a city
<|begin_of_text|>The capital of the United Kindom is London. It is the largest city in the UK and the largest city in the European Union. London is the most visited city in the world. It is the most visited city in the world. It is
<|begin_of_text|>Today is a sunny day and I like to go out for a walk. I am going to the park. I am going to play with my friends. I am going to play with my friends. I am going to play with my friends.

TP2SP2:

<|begin_of_text|>The capital of France is a city of many faces. It is a city of history, a city of culture, a city of art, a city of fashion, a city of love, a city of food, a city of
<|begin_of_text|>The capital of the United Kindom is London. It is the largest city in the United Kingdom and the largest metropolitan area in the European Union. It is also the most populous city in the European Union. London is a leading global city and one
<|begin_of_text|>Today is a sunny day and I like to go out for a walk. I am going to the park. I am going to play with my friends. I am going to the park. I am going to play with my friends. I am
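To put a number on where two outputs like the ones above start to differ, a small comparison can help. The sketch below counts matching whitespace-separated words from the start of two strings. The helper `common_prefix_words` is hypothetical, and words are only a rough stand-in for model tokens, so treat the count as approximate.

```python
def common_prefix_words(a: str, b: str) -> int:
    """Count leading whitespace-separated words shared by a and b."""
    count = 0
    for word_a, word_b in zip(a.split(), b.split()):
        if word_a != word_b:
            break
        count += 1
    return count


# Tail fragments of the "capital of France" outputs above.
single_gpu = "a city of fashion, a city of gastronomy"
tp2sp2 = "a city of fashion, a city of love"
print(common_prefix_words(single_gpu, tp2sp2))  # 7
```

Comparing token IDs from the tokenizer instead of words would give the exact divergence position.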

@ZYHowell merged commit 639e716 into main on Aug 12, 2024
@ivanium deleted the pr-fix-decode-kv-layout branch on September 6, 2024 23:54