Fix/windows hook stdio utf8 #1280
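- Add mempalace.yaml to .gitignore
- Add entities.json to .gitignore
- Add explanatory comments for the MemPalace project files
- Resolve the problem raised in issue MemPalace#185

- Implemented read tools: status query, wing/room listing, taxonomy retrieval, semantic search, duplicate check
- Implemented write tools: drawer add/delete/update, knowledge graph operations, agent diary
- Integrated the ChromaDB backend and knowledge graph storage
- Added a write-ahead log (WAL) for audit and rollback tracking
- Implemented vector search capacity detection and a disable mechanism to prevent crashes
- Added stdin/stdout protection to avoid corrupting the JSON-RPC protocol
- Implemented caching and filesystem change detection to keep data consistent
- Added the AAAK memory dialect specification and the palace protocol definition

- Implemented a complete ChromaCollection adapter class that provides a standardized database operation interface
- Added HNSW index tuning configuration to prevent link_lists.bin from bloating during large-scale inserts
- Implemented HNSW segment health checks and quarantine, which automatically detect and isolate corrupted index segments
- Added safe pickle deserialization to prevent malicious files from executing arbitrary code
- Implemented a migration fix from BLOB sequence IDs to integers, resolving the ChromaDB 0.6.x to 1.5.x upgrade problem
- Added a vector search capacity status check to guard against inconsistency between the HNSW index and the SQLite database
- Implemented client caching and filesystem freshness checks so that a new database state is detected after a rebuild
- Added a thread-safe HNSW configuration fix to prevent race conditions during concurrent writes
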
Thanks @yangshare, this is a strict superset of the Windows UTF-8 stdio fixes, and we'd like to make it the canonical fix for mcp_server + hooks_cli (closing #1259 as a duplicate pointing here). Two requests before merge into v3.3.5:

1. Please split out the unrelated changes into a separate PR:

   These are legitimate but don't belong in a stdio-encoding fix. A clean diff makes review and revert much safer.
2. After the split, rebase against current

Note: The CLI / fact_checker reconfigure side is being landed via #1282 (no overlap with this PR; they're complementary). Together they close out #1241/#1242/#1122/#1296.

Once split + rebased, I'll authorize CI on the fork and we can merge.

The `python -m mempalace.fact_checker --stdin` entry point reads non-ASCII text through the system ANSI codepage (cp1252/cp1251/cp950) on Windows, which mojibakes characters before claim-extraction sees them. Reconfigure stdin/stdout/stderr to UTF-8 with `errors="strict"`, wrapped in try/except so a replaced stream (Jupyter, test harness) logs a warning rather than crashing the CLI. Mirrors the same fix shipped for `mcp_server.py:main()` (MemPalace#400) and `hooks_cli.py:run_hook()` (MemPalace#1280) -- this is the third and last stdin-reading entry point in the package.
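
The reconfigure-with-fallback pattern described in that commit message can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: the helper name `reconfigure_stdio_utf8`, its `streams` parameter, and the log wording are assumptions made for the example. Only `TextIOWrapper.reconfigure()` (Python 3.7+) is a real standard-library call.

```python
import io
import logging
import sys

logger = logging.getLogger(__name__)


def reconfigure_stdio_utf8(streams=None):
    """Switch text streams to strict UTF-8 (hypothetical helper).

    Returns the names of streams that could not be reconfigured.
    """
    if streams is None:
        streams = {"stdin": sys.stdin, "stdout": sys.stdout, "stderr": sys.stderr}
    failed = []
    for name, stream in streams.items():
        try:
            # Only io.TextIOWrapper offers reconfigure(); replaced streams
            # (Jupyter, test harnesses, io.StringIO) usually do not.
            stream.reconfigure(encoding="utf-8", errors="strict")
        except (AttributeError, ValueError, OSError) as exc:
            # Warn and carry on instead of crashing the CLI.
            logger.warning("could not reconfigure %s to UTF-8: %s", name, exc)
            failed.append(name)
    return failed


# Demo: a cp1252-configured wrapper over UTF-8 bytes stands in for Windows stdin.
fake_stdin = io.TextIOWrapper(io.BytesIO("café".encode("utf-8")), encoding="cp1252")
failed = reconfigure_stdio_utf8({"stdin": fake_stdin})
text = fake_stdin.read()  # decoded as UTF-8 after the reconfigure
```

Passing the streams in explicitly keeps the sketch testable without touching the real `sys.stdin`. An entry point would call the helper with no arguments at the top of `main()`, before the first read.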
This fix extends the Windows UTF-8 stdin handling from #363/#400 to the Claude/Codex hook entry points, so that a non-ASCII hook payload is no longer mis-decoded by the system ANSI codepage before it reaches json.load(sys.stdin).
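
To make the failure mode concrete, here is a small sketch that reproduces it without Windows: UTF-8 bytes are wrapped in a text stream using cp1252 (standing in for the ANSI codepage) and handed to `json.load`. The `load_hook_payload` helper and the sample payload are made up for illustration and are not taken from `hooks_cli.py`.

```python
import io
import json


def load_hook_payload(raw_bytes, encoding):
    """Decode raw stdin bytes with `encoding`, then parse JSON (hypothetical helper)."""
    stream = io.TextIOWrapper(io.BytesIO(raw_bytes), encoding=encoding)
    return json.load(stream)


# A hook payload as it might arrive on stdin: UTF-8 encoded JSON.
raw = json.dumps({"prompt": "café"}, ensure_ascii=False).encode("utf-8")

# Decoding through cp1252 parses without error but silently corrupts the text.
broken = load_hook_payload(raw, "cp1252")  # {"prompt": "cafÃ©"}

# Decoding as UTF-8 round-trips the payload.
fixed = load_hook_payload(raw, "utf-8")  # {"prompt": "café"}
```

The cp1252 case does not raise here, so the corruption would only surface later in whatever consumes the payload, which is why the stream needs to be reconfigured before the first read.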