refactor(hwpx): whitelist strikeout shapes instead of blacklist#154
Merged
edwardkim merged 1 commit intoApr 16, 2026
Merged
Conversation
Problem
-------
parser/hwpx/header.rs:293 treats any shape != "NONE" | "3D" as a real
strikethrough:
cs.strikethrough = !matches!(val.as_str(), "NONE" | "3D");
This is a blacklist of known placeholders. It is correct for the 3D
case Hancom emits today, but fails open: any future placeholder value
Hancom adds (e.g. "4D", "Ghost", internal experiments) will be
misinterpreted as a real strike line, and entire body-text paragraphs
will render with strikethrough.
Fix
---
Flip the predicate to a whitelist of OWPML LineSym2 values that shape.rs
already recognises (SOLID, DASH, DOT, DASH_DOT, DASH_DOT_DOT, LONG_DASH,
CIRCLE, DOUBLE_SLIM, SLIM_THICK, THICK_SLIM, SLIM_THICK_SLIM, WAVE,
DOUBLE_WAVE — 13 entries, same set as the enum mapping immediately
below). Unknown values — including NONE, 3D, and any future Hancom
placeholder — are fail-closed to no-strike.
Extracted into a `pub(crate) fn is_real_strike_shape(&str) -> bool`
so the predicate is independently testable and can be reused if other
parsers (e.g. underline, which has the same blacklist risk) adopt the
same approach later.
Evidence
--------
MDM (https://github.com/seunghan91/markdown-media) hit the exact same
bug earlier this month. Hancom's HWPX exporter writes <hh:strikeout
shape="3D"/> as a placeholder default on body-text charPr definitions.
Treating "anything but NONE" as strike caused press-release bodies
(e.g. "251113 벤처투자 보도자료") to render wrapped in ~~...~~.
Our fix (MDM commit ae15dd8) moved to the whitelist approach and
verified across 21 MOIS HWPX fixtures: zero false strikes remained.
The same logic fits rhwp one-for-one, so we're upstreaming it.
Tests
-----
Five new unit tests in parser::hwpx::header::tests:
- all 13 valid OWPML LineSym2 shapes → true
- "NONE" → false
- "3D" → false (the original bug)
- fail-closed cases: "4D", "Ghost", "", "solid" (case-sensitive)
Full suite: 789 passed, 0 failed.
No behaviour change for files Hancom currently emits — the set of
shapes that produce strike=true is identical to the old blacklist
path for every value shape.rs knows about. The only difference is
forward-compatibility: unknown future placeholders no longer get
mis-rendered as strike.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
edwardkim
approved these changes
Apr 16, 2026
edwardkim
left a comment
Owner
There was a problem hiding this comment.
로컬 검증 완료 ✅
cargo test789개 전체 통과 (strikeout 테스트 4개 포함)- 블랙리스트 → 화이트리스트 전환으로 fail-closed 보장
- OWPML LineSym2 13종만 취소선으로 인정, 미지의 placeholder는 안전하게 무시
- 문서화와 테스트 모두 충실합니다
좋은 리팩토링입니다! 👍
This was referenced Apr 16, 2026
edwardkim
added a commit
that referenced
this pull request
Apr 19, 2026
본 v0.5.0 → v0.7.3 (라이브러리) / 0.2.0 (확장) 배포 주기에 머지된 외부 기여자 6명을 README 3개 변경 이력에 추가: - @ahnbu — Ctrl+S file handle (PR #189, 기명시) - @bapdodi — 회전 도형 리사이즈 + Flip (PR #192) - @dreamworker0 — Windows CFB 경로 (PR #152) - @marsimon — HWP 그림 효과 SVG (PR #149) - @postmelee — 썸네일 + options CSP (PR #168) - @seunghan91 — HWPX Serializer + 다수 (PR #170, #161, #163, #153, #154) 각 README 끝에 "기여자 감사" 섹션 추가 — 6명 일괄 인정. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced Apr 19, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
요약
parser/hwpx/header.rs:293의 strikeout 판정 로직을 블랙리스트 → 화이트리스트로 전환합니다.문제
현재 코드는 "shape가 NONE/3D 외이면 모두 취소선"으로 가정합니다. 지금 한컴이 내보내는 placeholder(
shape="3D")는 처리하지만, fails open입니다:4D,Ghost, 내부 실험 값) 를 추가하면 다시 본문 전체가 취소선으로 렌더링됩니다.수정
is_real_strike_shape(&str) -> bool을pub(crate)헬퍼로 추출. OWPMLLineSym2열거 —shape.rs의strike_shape매핑에 이미 있는 13종(SOLID,DASH,DOT,DASH_DOT,DASH_DOT_DOT,LONG_DASH,CIRCLE,DOUBLE_SLIM,SLIM_THICK,THICK_SLIM,SLIM_THICK_SLIM,WAVE,DOUBLE_WAVE) — 만true.NONE/3D/미지 값은 모두 fail-closed로false(no-strike).기존 파일에 대한 동작 변화는 없습니다.
shape.rs가 알고 있는 값에 대해서는 블랙리스트와 화이트리스트가 동일한 결과를 냅니다. 달라지는 건 오직 "알 수 없는 값 처리": 이전엔 strike=true, 이제는 strike=false.근거
저희 쪽 MDM (markdown-media) 파서도 같은 버그를 먼저 겪었습니다. 한컴 HWPX 익스포터는 본문 charPr 기본값으로
<hh:strikeout shape=\"3D\"/>를 뿌리는데, "NONE 아니면 전부 취소선" 로직 때문에 보도자료 본문 전체가~~...~~로 감싸졌습니다.MDM commit
ae15dd8에서 동일한 화이트리스트 방식으로 고쳤고, 21개 MOIS HWPX 픽스처에서 false strike가 0건임을 확인했습니다. 로직이 작고 rhwp 구조에 그대로 맞아 역기여합니다.테스트
parser::hwpx::header::tests에 5개 유닛 테스트 추가:test_is_real_strike_shape_valid_shapes— 13종 OWPML 값 모두truetest_is_real_strike_shape_placeholder_none—\"NONE\"→falsetest_is_real_strike_shape_placeholder_3d—\"3D\"→false(원 버그)test_is_real_strike_shape_unknown_fail_closed—\"4D\",\"Ghost\",\"\",\"solid\"(대소문자) → 모두false전체:
cargo test --lib→ 789 passed, 0 failed (기존 784 + 신규 5).참고
underline블록(바로 위, header.rs:264-282)도 같은 구조의 코드지만 블랙리스트가 없어 이번 PR 범위에서 제외했습니다. 필요하시면 후속 PR로 헬퍼 이름/위치를 맞출 수 있습니다.🙇 감사합니다!