fix: handle non-UTF-8 files in OpenClaw migration script #8898

Open
chancelu wants to merge 1 commit into NousResearch:main from chancelu:fix/migration-unicode-decode-error

@chancelu

Problem

On Windows systems with non-UTF-8 default encodings (e.g. GBK on Chinese Windows), hermes claw migrate crashes with UnicodeDecodeError when the source OpenClaw directory contains:

  • Binary files (.jpg, .ogg, .sqlite) in media/, memory/, or workspace/
  • Text files with mixed or non-UTF-8 encodings (e.g. log files written by Windows processes)

The error occurs during the preview phase, so the migration cannot even show what would be imported:

✗ Migration preview failed: 'utf-8' codec can't decode byte 0xb3 in position 23114: invalid start byte

Root Cause

Several file-reading functions in openclaw_to_hermes.py call path.read_text(encoding="utf-8") without error handling. Decoding is strict by default, so a single file in the OpenClaw directory tree that contains bytes that are not valid UTF-8 raises UnicodeDecodeError and aborts the entire migration.
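
The failure mode can be reproduced in isolation. The snippet below is a minimal sketch, not code from openclaw_to_hermes.py: the function name `reproduce`, the file name `session.log`, and the byte sequence are all illustrative assumptions. The bytes are chosen only because 0xb3 cannot start a UTF-8 sequence, which matches the kind of error shown in the preview output.

```python
from pathlib import Path


def reproduce(directory: Path) -> str:
    """Write a file with non-UTF-8 bytes and return the resulting decode error text.

    Hypothetical helper for illustration only; returns "" if no error occurs.
    """
    path = directory / "session.log"  # assumed file name
    # 0xb3 is a UTF-8 continuation byte, so it is invalid as the first byte
    # of a character. The sample bytes are illustrative, not from the PR.
    path.write_bytes(b"ok \xb3\xc9")
    try:
        path.read_text(encoding="utf-8")  # strict decoding is the default
    except UnicodeDecodeError as exc:
        return str(exc)
    return ""
```

Calling `reproduce()` on an empty temporary directory returns a message of the same "invalid start byte" form as the one reported by the migration preview.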

Fix

  • read_text(): add errors="replace" to gracefully substitute invalid bytes with U+FFFD instead of crashing
  • load_yaml_file(): catch UnicodeDecodeError alongside yaml.YAMLError
  • parse_env_file(): catch UnicodeDecodeError when reading .env files
  • load_openclaw_config(): catch UnicodeDecodeError alongside json.JSONDecodeError
  • migrate_daily_memory(): add per-file try/except to skip unreadable files instead of aborting the entire loop
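
The first and fourth items in the list above can be sketched roughly as follows. This is an approximation under stated assumptions rather than the actual diff: `read_text_lenient` and `load_json_config` are hypothetical names, and returning an empty dict on failure is an assumed fallback, not necessarily what the migration script does.

```python
import json
from pathlib import Path
from typing import Any


def read_text_lenient(path: Path) -> str:
    """Read a file as UTF-8, replacing undecodable bytes with U+FFFD."""
    return path.read_text(encoding="utf-8", errors="replace")


def load_json_config(path: Path) -> dict[str, Any]:
    """Load a JSON config, treating bad encoding like malformed JSON.

    Hypothetical loader: falls back to {} (an assumption) when the file
    cannot be decoded or parsed.
    """
    try:
        # Strict decoding here, so corrupted bytes are reported, not hidden.
        data = json.loads(path.read_text(encoding="utf-8"))
    except (json.JSONDecodeError, UnicodeDecodeError):
        return {}
    return data if isinstance(data, dict) else {}
```

One design note: errors="replace" suits free-form text such as notes or logs, where a few U+FFFD characters are tolerable. For structured files, catching UnicodeDecodeError keeps a corrupted config from being silently parsed with substituted characters.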

Testing

Tested on Windows 11 Pro (Chinese locale, GBK default encoding) with an OpenClaw directory containing .jpg, .ogg, .sqlite, and mixed-encoding .md files. Before the fix, hermes claw migrate --dry-run crashed immediately. After the fix, it completes successfully and correctly identifies 44 items for migration.
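
A scaled-down version of this setup can be built in a temporary directory on any platform. The sketch below is an assumption-laden illustration: `build_fixture` and `preview_items` are hypothetical helpers, the file names are invented, and the loop only demonstrates the per-file try/except pattern (skip an unreadable file, keep going), not the real migrate_daily_memory() logic.

```python
from pathlib import Path


def build_fixture(root: Path) -> None:
    """Create a tiny directory tree mixing binary, UTF-8, and non-UTF-8 files."""
    (root / "media").mkdir()
    (root / "media" / "photo.jpg").write_bytes(b"\xff\xd8\xff\xe0")  # JPEG magic bytes
    (root / "memory").mkdir()
    (root / "memory" / "notes.md").write_text("plain UTF-8 note", encoding="utf-8")
    (root / "memory" / "legacy.md").write_bytes(b"mixed \xb3\xc9 bytes")  # not valid UTF-8


def preview_items(source: Path) -> tuple[list[str], list[str]]:
    """Return (readable, skipped) file names, handling errors one file at a time."""
    readable: list[str] = []
    skipped: list[str] = []
    for path in sorted(p for p in source.rglob("*") if p.is_file()):
        try:
            path.read_text(encoding="utf-8")  # strict decode, may raise
        except (OSError, UnicodeDecodeError):
            skipped.append(path.name)  # record and continue instead of aborting
            continue
        readable.append(path.name)
    return readable, skipped
```

Running `preview_items()` on the fixture finishes without raising: the UTF-8 note is listed as readable, while the image and the non-UTF-8 file are reported as skipped.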

On Windows systems with non-UTF-8 default encodings (e.g. GBK on Chinese
Windows), the migration script crashes with UnicodeDecodeError when the
OpenClaw directory contains binary files (images, SQLite databases) or
text files with mixed encodings.

Changes:
- read_text(): use errors="replace" to gracefully handle non-UTF-8 bytes
- load_yaml_file(): catch UnicodeDecodeError alongside YAMLError
- parse_env_file(): catch UnicodeDecodeError when reading .env files
- load_openclaw_config(): catch UnicodeDecodeError alongside JSONDecodeError
- migrate_daily_memory(): add per-file error handling to skip unreadable files

Tested on Windows 11 (Chinese locale, GBK default) with an OpenClaw
directory containing .jpg, .ogg, .sqlite, and mixed-encoding .md files.
@alt-glitch

Collaborator

Related to #9145: both PRs fix the same UnicodeDecodeError in the OpenClaw migration (#8901). #9145 has already been triaged as a duplicate of this PR.

Labels

  • comp/cli: CLI entry point, hermes_cli/, setup wizard
  • P2: Medium, degraded but workaround exists
  • type/bug: Something isn't working
