Incorrect token usage with jump forward #173

@comaniac

Description

Since jump forward breaks a decoding process into multiple ones, the numbers of prompt_tokens and completion_tokens are incorrect. Here is an example:

Request:

import requests

url = "http://localhost:30000"  # address of the running server (placeholder)

regex = (r"""\{\n"""
    + r"""  "name": "[\w]{1,8}",\n"""
    + r"""  "description": "[\w\d\s]{1,64}"\n"""
    + r"""\}"""
)

response = requests.post(
    url + "/generate",
    json={
        "text": "Here is the info of France's capital: ",
        "sampling_params": {
            "temperature": 0,
            "max_new_tokens": 128,
            "regex": regex,
        },
        "stream": True,
    },
    stream=True,
)

Streaming response by chunk:

Chunk (prompt 10, decode 1): {

Chunk (prompt 15, decode 1): {
  "name": "Paris

Chunk (prompt 22, decode 1): {
  "name": "Paris",
  "description": "Capital

Chunk (prompt 22, decode 2): {
  "name": "Paris",
  "description": "Capital city

Chunk (prompt 22, decode 10): {
  "name": "Paris",
  "description": "Capital city of France and one of the most beautiful

Chunk (prompt 37, decode 1): {
  "name": "Paris",
  "description": "Capital city of France and one of the most beautiful cities in"
}

Non-streaming response:

{'prompt_tokens': 37, 'completion_tokens': 1, 'id': '44f7ddf966de459da6954d0de1e4434d'}
{
  "name": "Paris",
  "description": "Capital city of France and one of the most beautiful cities in"
}

Note that the correct number of prompt tokens is 10 and the correct number of completion tokens is 28. We could fix the prompt-token count by taking the value from the first chunk, but to fix the completion-token count we probably need to look up the length of the final decoding output directly.
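A minimal sketch of the aggregation, assuming (as the chunks above suggest) that each jump-forward restart re-feeds the original prompt plus everything generated so far as the new "prompt". Under that assumption, the true prompt length is the first chunk's prompt count, and the true completion length is the final sequence length minus the original prompt. The chunk values are taken from the example above:

```python
# (prompt_tokens, decode_tokens) reported per chunk across jump-forward
# restarts, copied from the streaming example above.
chunks = [(10, 1), (15, 1), (22, 1), (22, 2), (22, 10), (37, 1)]

# The real prompt length is whatever was reported before any restart.
prompt_tokens = chunks[0][0]

# Assuming each restart's prompt = original prompt + all tokens generated
# so far, the total completion length is the final sequence length
# (last prompt + last decode) minus the original prompt.
completion_tokens = chunks[-1][0] + chunks[-1][1] - chunks[0][0]

print(prompt_tokens, completion_tokens)  # 10 28
```

This reproduces the corrected counts (10 and 28) without re-tokenizing the output, though tracking the decoded token ids directly would be more robust.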
