feat: add Korean language support for memory search query expansion by ruypang · Pull Request #18899 · openclaw/openclaw

ruypang · 2026-02-17T05:23:00Z

Summary

Korean is the 3rd most common CJK language but had no stop-word support in the query expansion module. This directly affects memory search quality for Korean-speaking users.

Changes

Korean stop words (STOP_WORDS_KO): ~70 common Korean particles (조사), pronouns (대명사), auxiliary verbs, conjunctions, adverbs, vague time references, and question words
Hangul-aware tokenization: Detects Hangul syllables (\uAC00-\uD7AF) and jamo (\u3131-\u3163), splits on spaces, and strips common trailing particles (e.g. 서버에서 → 서버) to improve keyword extraction
Korean stop word filtering in extractKeywords() alongside existing EN and ZH checks
Tests for Korean queries: keyword extraction, particle stripping, stop word filtering, and mixed Korean/English queries
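
The detection and filtering steps listed above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `SAMPLE_STOP_WORDS_KO`, `containsHangul`, and `extractKoreanKeywordsSketch` are hypothetical names, and the four stop words are a small sample standing in for the ~70-entry `STOP_WORDS_KO` set. Only the Unicode ranges are taken from the description.

```typescript
// Hypothetical, tiny sample; the PR's STOP_WORDS_KO is described as having ~70 entries.
const SAMPLE_STOP_WORDS_KO = new Set<string>(["그리고", "나는", "어제", "무엇"]);

// Ranges named in the PR description: Hangul syllables (\uAC00-\uD7AF)
// and jamo (\u3131-\u3163). No global flag, so test() keeps no state.
const HANGUL_RE = /[\uAC00-\uD7AF\u3131-\u3163]/;

function containsHangul(token: string): boolean {
  return HANGUL_RE.test(token);
}

// Split on whitespace, then drop Korean stop words. Non-Korean tokens
// pass through untouched; a real implementation would also apply the
// EN and ZH checks at this point.
function extractKoreanKeywordsSketch(query: string): string[] {
  const keywords: string[] = [];
  for (const token of query.split(/\s+/)) {
    if (token.length === 0) {
      continue;
    }
    if (containsHangul(token) && SAMPLE_STOP_WORDS_KO.has(token)) {
      continue;
    }
    keywords.push(token);
  }
  return keywords;
}

console.log(extractKoreanKeywordsSketch("어제 서버 그리고 API 배포"));
// → ["서버", "API", "배포"]
```

Gating the stop-word lookup behind the Hangul check keeps the Korean branch from affecting English or Chinese tokens, which matches the "additive only" intent of the change.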

All existing tests continue to pass.

Greptile Summary

This PR adds Korean language support to the FTS (full-text search) query expansion module, extending its CJK coverage alongside the existing English and Chinese support. The implementation includes ~70 Korean stop words, Hangul-aware tokenization that detects syllables (\uAC00-\uD7AF) and jamo (\u3131-\u3163), and particle-stripping logic that removes common Korean grammatical particles (e.g., 서버에서 → 서버) to improve keyword extraction quality.

Korean stop words cover particles, pronouns, auxiliary verbs, conjunctions, adverbs, vague time references, and question words — matching the structure of the existing English and Chinese sets
Trailing particle stripping uses .toSorted() for robust longest-match-first ordering, with an isUsefulKoreanStem guard that prevents bogus single-syllable stems (e.g., 논의 is not incorrectly reduced to 논)
Both the original token and the stripped stem are emitted for FTS, maximizing match potential
8 new test cases cover keyword extraction, particle stripping, stop word filtering (including inflected forms), and mixed Korean/English queries
No issues found — the implementation is clean, follows existing patterns, and stays within repository coding guidelines
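
The particle-stripping behaviour summarized above might look something like the sketch below. It is an illustration under assumptions rather than the merged code: the particle list is a small hypothetical sample, `isUsefulStemSketch` stands in for the PR's `isUsefulKoreanStem` with an assumed "at least two syllables" rule, and a copied array with `sort()` is used in place of `.toSorted()` so the sketch does not depend on the ES2023 library typings.

```typescript
// Hypothetical sample of trailing particles; the PR's real list is longer.
const SAMPLE_PARTICLES: string[] = ["에서", "에", "의", "을", "를", "은", "는", "으로"];

// Longest-match-first: try "에서" before "에". The PR uses .toSorted();
// copying and then sorting gives the same non-mutating result.
const PARTICLES_LONGEST_FIRST = [...SAMPLE_PARTICLES].sort((a, b) => b.length - a.length);

// Assumed guard: reject single-syllable stems so that 논의 is not reduced to 논.
function isUsefulStemSketch(stem: string): boolean {
  return stem.length >= 2;
}

// Returns the stem when a trailing particle can be removed safely, otherwise null.
function stripTrailingParticle(token: string): string | null {
  for (const particle of PARTICLES_LONGEST_FIRST) {
    if (token.length > particle.length && token.endsWith(particle)) {
      const stem = token.slice(0, -particle.length);
      return isUsefulStemSketch(stem) ? stem : null;
    }
  }
  return null;
}

// Emit both the original token and the stripped stem, as the summary describes.
function expandToken(token: string): string[] {
  const stem = stripTrailingParticle(token);
  return stem === null ? [token] : [token, stem];
}

console.log(expandToken("서버에서")); // → ["서버에서", "서버"]
console.log(expandToken("논의")); // → ["논의"]
```

Keeping the original token alongside the stem means a wrong strip costs only an extra search term rather than a lost match.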

Confidence Score: 5/5

This PR is safe to merge — it adds additive functionality with no changes to existing behavior and comprehensive test coverage.
The changes are purely additive: new stop word sets, new tokenizer branch, and new tests. No existing code paths are modified in a way that could break current English or Chinese functionality. The Korean tokenizer branch is gated behind a Hangul regex check, so it only activates for Korean text. The .toSorted() usage is safe on the Node 22+ baseline. All edge cases (single-char stems, inflected stop words, mixed-language tokens) are handled with appropriate guards and tested.
No files require special attention

Last reviewed commit: b11191e

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 170524c567

src/memory/query-expansion.ts

greptile-apps

2 files reviewed, 2 comments


src/memory/query-expansion.ts

ruypang · 2026-02-19T00:35:03Z

Friendly reminder — this PR is ready for review. All CI checks are passing. Would love to get a maintainer's eyes on this when you get a chance! 🙏

ruypang · 2026-02-20T02:11:44Z

Hi @gumadeiras @vincentkoc 👋 Just rebased onto latest main — all CI should be green. Would really appreciate a review when you get a chance. Happy to address any feedback. Thanks! 🙏

vincentkoc · 2026-02-22T01:20:33Z

Addressed the open review feedback and pushed follow-up fixes in 135fda872bb5dc5b14ede15aecce27d6653f03b2.

@greptileai review

vincentkoc · 2026-02-22T01:22:09Z

Changelog credit added for this PR in b11191e5df41fed6402b5f23888d161c7bd0e10a.

@greptileai review

…18899) * feat: add Korean stop words and tokenization for memory search * fix: address review comments on Korean query expansion * fix: lint errors - curly brace and toSorted * fix(memory): improve Korean stop words and deduplicate * Memory: tighten Korean query expansion filtering * Docs/Changelog: credit Korean memory query expansion --------- Co-authored-by: Vincent Koc <vincentkoc@ieee.org>

openclaw-barnacle bot added the size: S label Feb 17, 2026

chatgpt-codex-connector bot reviewed Feb 17, 2026

src/memory/query-expansion.ts (outdated, resolved)

src/memory/query-expansion.ts (outdated, resolved)

greptile-apps bot reviewed Feb 17, 2026

src/memory/query-expansion.ts (outdated, resolved)

src/memory/query-expansion.ts (resolved)

openclaw-barnacle bot added size: M and removed size: S labels Feb 17, 2026

ruypang added 3 commits February 20, 2026 11:10

feat: add Korean stop words and tokenization for memory search

2e4a267

fix: address review comments on Korean query expansion

8712273

fix: lint errors - curly brace and toSorted

ba569f7

ruypang force-pushed the feature/korean-query-expansion branch from fba41f4 to ba569f7 Compare February 20, 2026 02:11

ruypang and others added 3 commits February 22, 2026 00:08

fix(memory): improve Korean stop words and deduplicate

c801af0

Merge branch 'main' into feature/korean-query-expansion

b5f478a

Memory: tighten Korean query expansion filtering

135fda8

Docs/Changelog: credit Korean memory query expansion

b11191e

vincentkoc merged commit 853ae62 into openclaw:main Feb 22, 2026
24 of 25 checks passed

vincentkoc mentioned this pull request Feb 22, 2026

feat(memory): add Japanese query expansion support for FTS #23156

Merged

18 tasks

github-actions bot mentioned this pull request Feb 22, 2026

📡 Upstream Digest — 2026-02-22 04:09 UTC curtismercier/openclaw-mods#94

Open

This was referenced Feb 22, 2026

feat(memory): add Spanish and Portuguese FTS query expansion filtering #23710

Merged

feat(memory): add Arabic FTS query expansion filtering #23717

Merged

gemini-code-assist bot mentioned this pull request Feb 23, 2026

Sync/upstream 2026 02 23 QVerisAI/QVerisBot#70

Merged

gemini-code-assist bot mentioned this pull request Feb 23, 2026

Sync with remote: 2/23/2026 ArchitectVS7/OpenClaw#18

Merged

18 tasks

vincentkoc merged 7 commits into openclaw:main from ruypang:feature/korean-query-expansion