server : simplify state machine for slot#9283
Conversation
|
Actually I got timeout error locally, due to the time it takes to download the model from internet (we use Phi-2 model for passkey test) |
|
Benchmark with master (with PR: |
|
@ggerganov FYI, this PR also fixed the point about
As you see on the benchmark above, |
|
Is it normal for the |
No, it should not. I re-ran the test and I updated the result, so (I was having some network errors during the initial run, so that's why it had less tokens) |
slaren
left a comment
There was a problem hiding this comment.
Looks good to me, but I don't really know the server code well enough to review this.
|
According to 040fdde there was an address sanitizer issue. AFAICT the commit should not make a difference, except avoiding the copy of the task. Is the address sanitizer problem still present and how to reproduce? |
|
@ggerganov Thanks for pointing that out. You're right, indeed, there was a missing lock in Beside, I changed |
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* server : simplify state machine for slot * add SLOT_STATE_DONE_PROMPT * pop_deferred_task * add missing notify_one * fix passkey test * metrics : add n_busy_slots_per_decode * fix test step * add test * maybe fix AddressSanitizer? * fix deque ? * missing lock * pop_deferred_task: also notify * Update examples/server/server.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* server : simplify state machine for slot * add SLOT_STATE_DONE_PROMPT * pop_deferred_task * add missing notify_one * fix passkey test * metrics : add n_busy_slots_per_decode * fix test step * add test * maybe fix AddressSanitizer? * fix deque ? * missing lock * pop_deferred_task: also notify * Update examples/server/server.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* server : simplify state machine for slot * add SLOT_STATE_DONE_PROMPT * pop_deferred_task * add missing notify_one * fix passkey test * metrics : add n_busy_slots_per_decode * fix test step * add test * maybe fix AddressSanitizer? * fix deque ? * missing lock * pop_deferred_task: also notify * Update examples/server/server.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* server : simplify state machine for slot * add SLOT_STATE_DONE_PROMPT * pop_deferred_task * add missing notify_one * fix passkey test * metrics : add n_busy_slots_per_decode * fix test step * add test * maybe fix AddressSanitizer? * fix deque ? * missing lock * pop_deferred_task: also notify * Update examples/server/server.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

Currently,
stateof a slot is actually controlled by 2 variables:slot.state(2 possible values) andslot.command(3 possible values). This makes the total number of states become 2x3=6, but some of them are in fact invalid states.This PR aims to simplify state machine of slot by unifying them into 4 states.
The state machine can be represent using the graph below:
graph TD; SLOT_STATE_IDLE-- new task -->SLOT_STATE_PROCESSING_PROMPT; SLOT_STATE_PROCESSING_PROMPT-- decode prompt -->SLOT_STATE_PROCESSING_PROMPT; SLOT_STATE_PROCESSING_PROMPT-- done processing prompt -->SLOT_STATE_DONE_PROMPT; SLOT_STATE_DONE_PROMPT-- is embedding -->SLOT_STATE_IDLE; SLOT_STATE_DONE_PROMPT-- is next-token prediction -->SLOT_STATE_GENERATING; SLOT_STATE_GENERATING-- decode next token -->SLOT_STATE_GENERATING; SLOT_STATE_GENERATING-- stop condition -->SLOT_STATE_IDLE;