Transcript JSONL encoding corrupted on Windows (GBK/UTF-8 mix) #56106

@lujun2508

Description

Bug Report: Transcript JSONL encoding corrupted on Windows

Version

OpenClaw 2026.3.23 (ccfeecb)

Environment

  • OS: Windows_NT 10.0.26200
  • Node.js: v24.14.0
  • Channel: QQBot

Problem

Chinese characters in chat transcripts (.jsonl files under agents/main/sessions/) are stored in corrupted form: characters appear as replacement characters (U+FFFD) or garbled mojibake.

Root Cause

The transcript writer appears to encode Chinese text as GBK bytes first, then interpret those bytes as UTF-8, causing permanent corruption. The original text cannot be recovered.

Impact

  • Historical chat search returns no results for Chinese queries
  • Memory retrieval unreliable
  • All past conversations with Chinese content permanently unreadable in transcript files
  • memory_search finds matches but context is garbled

Example

User message containing "有奖原创" is stored as:

"content":[{"type":"text","text":"[QQBot] to=qqbot:c2c:6B1D6D9CF1C53871FC42C880DFD44DB5\n\n???? ?????? ????????????????"}]

Expected Behavior

Chinese text should be stored correctly as UTF-8 JSON.

Workaround

None — corrupted data cannot be recovered.

Suggested Fix

Ensure all transcript writes use UTF-8 encoding explicitly. On Windows, the default system encoding (GBK/CP936) must not be used. Force JSON.stringify() output to be written as UTF-8 bytes, or convert to a lossless ASCII-safe form (e.g. escape non-ASCII characters as \uXXXX sequences) before writing.

Files Affected

  • agents/main/sessions/*.jsonl — all session transcripts
  • workspace/memory/*.md — daily memory files may also be affected
