Document nori default XPN stoptag behavior for Korean prefixes #151157

Merged: john-wagster merged 1 commit into elastic:main from Incheonkirin:docs-nori-xpn-prefix-warning on Jun 15, 2026
Conversation

@Incheonkirin

Contributor

Adds a warning to the nori_part_of_speech token filter docs: the default stoptags include XPN, so a meaning-carrying Korean prefix emitted as XPN is removed by the default nori analyzer. For example, 비급여 (non-covered) is analyzed as 급여 (covered), so the index cannot distinguish the two.
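The effect described above can be modeled in a few lines. This is a toy Python sketch, not the actual nori or Lucene implementation: the `(surface, tag)` pairs are written by hand to mirror the behavior this PR describes (prefix 비 tagged XPN; the NNG tag on 급여 is an assumption), and `DEFAULT_STOPTAGS_SUBSET` is an abbreviated, illustrative subset of the default list. The function name `pos_filter` is hypothetical.

```python
# Toy model of a part-of-speech stop filter; NOT the real nori implementation.
# Illustrative subset of the default stoptags. This PR only confirms that XPN is included.
DEFAULT_STOPTAGS_SUBSET = {"XPN", "J", "E"}


def pos_filter(tokens, stoptags):
    """Drop every (surface, tag) pair whose tag appears in stoptags."""
    return [surface for surface, tag in tokens if tag not in stoptags]


# Hand-written tokenizations with assumed tags: 비급여 splits into prefix + noun.
non_covered = [("비", "XPN"), ("급여", "NNG")]
covered = [("급여", "NNG")]

# With XPN in the stoptags, both inputs reduce to the same single token.
print(pos_filter(non_covered, DEFAULT_STOPTAGS_SUBSET))
print(pos_filter(covered, DEFAULT_STOPTAGS_SUBSET))

# With XPN removed from the stoptags, the prefix token survives.
print(pos_filter(non_covered, DEFAULT_STOPTAGS_SUBSET - {"XPN"}))
```

Because both terms collapse to the same token list under the default tags, a query for one matches documents containing the other.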

The warning shows the _analyze output and two remedies: registering such terms via user_dictionary_rules (preserves the full term as a single noun token), or defining a custom stoptags list that omits XPN (preserves prefix tokens, with possible prefix noise).
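The two remedies could look roughly like the index settings below, built here as plain Python dictionaries and printed as JSON. This is a sketch under stated assumptions, not the text added by the PR: the names `nori_user_dict_tokenizer`, `pos_default`, `pos_keep_xpn`, `korean_user_dict`, and `korean_keep_prefix` are hypothetical, and the stoptags list is assumed to match the documented defaults minus XPN, so verify it against the nori docs for your Elasticsearch version.

```python
import json

# Assumed default stoptags with XPN omitted; check the list for your version.
STOPTAGS_WITHOUT_XPN = [
    "E", "IC", "J", "MAG", "MAJ", "MM", "SP", "SSC", "SSO",
    "SC", "SE", "XSA", "XSN", "XSV", "UNA", "NA", "VSV",
]

settings = {
    "analysis": {
        "tokenizer": {
            # Remedy 1: register the full term so it is kept as one noun token.
            "nori_user_dict_tokenizer": {
                "type": "nori_tokenizer",
                "user_dictionary_rules": ["비급여"],
            }
        },
        "filter": {
            # No stoptags given, so the filter falls back to its defaults.
            "pos_default": {"type": "nori_part_of_speech"},
            # Remedy 2: custom stoptags without XPN, so prefix tokens survive.
            "pos_keep_xpn": {
                "type": "nori_part_of_speech",
                "stoptags": STOPTAGS_WITHOUT_XPN,
            },
        },
        "analyzer": {
            "korean_user_dict": {
                "type": "custom",
                "tokenizer": "nori_user_dict_tokenizer",
                "filter": ["pos_default", "nori_readingform", "lowercase"],
            },
            "korean_keep_prefix": {
                "type": "custom",
                "tokenizer": "nori_tokenizer",
                "filter": ["pos_keep_xpn", "nori_readingform", "lowercase"],
            },
        },
    }
}

# Print the body that would go under "settings" in a create-index request.
print(json.dumps(settings, ensure_ascii=False, indent=2))
```

The first analyzer only helps for terms that are listed explicitly, while the second keeps every XPN prefix, including ones that add noise, which is the trade-off the description mentions.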

Closes #151094.

@Incheonkirin requested a review from a team as a code owner on June 14, 2026 00:02
@elasticsearchmachine added the v9.5.0, needs:triage, and external-contributor labels on Jun 14, 2026
@john-wagster self-assigned this on Jun 15, 2026
@john-wagster added the :Search Relevance/Analysis and >docs labels on Jun 15, 2026
@elasticsearchmachine added the Team:Search Relevance label and removed the needs:triage label on Jun 15, 2026
@elasticsearchmachine

Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@john-wagster left a comment

Contributor

lgtm

@john-wagster

Contributor

@Incheonkirin this looks good to me; thank you very much for the docs update. I'll wait a bit in case the docs folks want to comment, but otherwise I will merge this shortly.

@shainaraskas commented Jun 15, 2026

Member

@john-wagster @Incheonkirin docs here: this is OK to merge. This is not the best spot for a big warning, but the page would need to be restructured a bit to make a better spot available, and that is not worth the effort compared with getting this info in.

@john-wagster

Contributor

@elasticmachine test this

@john-wagster merged commit d666a6f into elastic:main on Jun 15, 2026
13 checks passed
@Incheonkirin

Contributor Author

Thanks @john-wagster and @shainaraskas for the quick review and merge. Agreed that getting the caveat documented matters more than its placement here.

valeriy42 pushed a commit to valeriy42/elasticsearch that referenced this pull request Jun 18, 2026

Labels

>docs (General docs changes)
external-contributor (Pull request authored by a developer outside the Elasticsearch team)
:Search Relevance/Analysis (How text is split into tokens)
Team:Search Relevance (Meta label for the Search Relevance team in Elasticsearch)
v9.5.0

Development

Successfully merging this pull request may close these issues.

Document nori default XPN stop tag behavior for Korean prefixes