KeyError when loading a multi-turn conversation dataset #2095

@Browallia

[rank0]:   File "/root/LLM/ms-swift/swift/llm/utils/preprocess.py", line 45, in new_preprocess
[rank0]:     row = preprocess(self, row)
[rank0]:   File "/root/LLM/ms-swift/swift/llm/utils/preprocess.py", line 232, in preprocess
[rank0]:     conversations = d[self.conversations_key]
[rank0]:   File "/root/miniconda3/envs/swift/lib/python3.9/site-packages/datasets/formatting/formatting.py", line 271, in __getitem__
[rank0]:     value = self.data[key]
[rank0]: KeyError: 'messages'

The dataset format is:
{"messages": [{"role": "user", "content": "aaaaa"}, {"role": "assistant", "content": "bbbbb"}, {"role": "user", "content": "ccccc"}, {"role": "assistant", "content": "ddddd"}]}
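To rule out the data file itself, a quick check like the following can confirm that a row really carries the `messages` key with alternating roles. This is only a sketch using the standard library; `check_row` is a hypothetical helper and not part of ms-swift.

```python
import json

# The sample row from this issue, as one JSON line.
line = (
    '{"messages": [{"role": "user", "content": "aaaaa"}, '
    '{"role": "assistant", "content": "bbbbb"}, '
    '{"role": "user", "content": "ccccc"}, '
    '{"role": "assistant", "content": "ddddd"}]}'
)


def check_row(raw: str) -> list:
    """Parse one JSON line and return its roles (hypothetical helper)."""
    row = json.loads(raw)
    messages = row["messages"]  # KeyError here would mean the file lacks the key
    roles = [m["role"] for m in messages]
    # Expect strictly alternating user/assistant turns.
    expected = ["user", "assistant"] * (len(messages) // 2)
    if roles != expected:
        raise ValueError(f"unexpected role order: {roles}")
    return roles


roles = check_row(line)
print(roles)
```

If this passes for every line of the file, the raw data is fine and the missing key most likely comes from a later processing step.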

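The error seems to come from the statement below. The dataset returned by the previous preprocess step no longer has this key, so why is it preprocessed a second time here?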

dataset_list.append(preprocess_func(dataset))
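The suspected failure mode can be reproduced with plain dicts. The sketch below assumes a preprocess step that consumes `messages` and emits `query`, `response`, and `history` columns; `to_query_response` and `to_query_response_once` are hypothetical stand-ins, not the actual ms-swift code. Running the conversion twice on the same row raises the same `KeyError: 'messages'`.

```python
def to_query_response(row: dict, conversations_key: str = "messages") -> dict:
    """Hypothetical preprocess: consumes `messages`, emits query/response/history."""
    conversations = row[conversations_key]  # fails if the key was already consumed
    # Pair each user turn with the assistant turn that follows it.
    turns = [
        [conversations[i]["content"], conversations[i + 1]["content"]]
        for i in range(0, len(conversations) - 1, 2)
    ]
    query, response = turns[-1]
    return {"history": turns[:-1], "query": query, "response": response}


row = {
    "messages": [
        {"role": "user", "content": "aaaaa"},
        {"role": "assistant", "content": "bbbbb"},
        {"role": "user", "content": "ccccc"},
        {"role": "assistant", "content": "ddddd"},
    ]
}

once = to_query_response(row)  # first pass succeeds

try:
    to_query_response(once)  # second pass: `messages` is gone
    second_pass_error = None
except KeyError as exc:
    second_pass_error = exc.args[0]


def to_query_response_once(row: dict, conversations_key: str = "messages") -> dict:
    """Idempotent variant: rows that were already converted pass through unchanged."""
    if conversations_key not in row:
        return row
    return to_query_response(row, conversations_key)


print(once)
print(second_pass_error)
```

The guard in `to_query_response_once` is only one possible workaround; whether the second call should be skipped or made idempotent depends on why the loader invokes the preprocess function again.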
