
fix(jtk): prevent wiki markup conversion from mangling hyphens and tildes#178

Merged
zzwong merged 14 commits into main from fix/jtk-wiki-markup-false-positives on Mar 13, 2026

@zzwong commented Mar 13, 2026

Summary

  • Fix false positive wiki markup detection that caused markdown descriptions with ## Heading to be routed through wiki-to-markdown conversion
  • Fix overly aggressive strikethrough/subscript/underline/superscript regex patterns that matched inside compound words like signal-webapp-frontend and three~tier
  • Extract shared replaceWikiFormatting helper and hoist regex compilation to package level

Problem

When creating issues with markdown descriptions containing compound words:

jtk issues create --description "## Overview\n\nDeploy signal-webapp-frontend"

The output was mangled:

  • signal-webapp-frontend → signal~~webapp~~frontend → rendered as signal<s>webapp</s>frontend
  • three~tier → three<sub>tier</sub>
  • 2026-03-12-design.md → 2026<sub>03</sub><sub>12</sub>~design.md

Root cause 1: looksLikeWikiNumberedList counted markdown ## Heading lines as wiki # list-item lines, triggering wiki-to-markdown conversion on pure markdown input.

Root cause 2: Wiki formatting patterns (-text- for strikethrough, ~text~ for subscript) matched inside compound words because they checked only that the content between the delimiters was non-whitespace, never what surrounded the delimiters.
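To make root cause 2 concrete, here is a minimal Go sketch. The pattern is a hypothetical stand-in for a pre-fix strikethrough rule (the PR does not quote the old regex): it accepts any non-whitespace, non-hyphen run between two hyphens and never looks at the characters outside them.

```go
package main

import (
	"fmt"
	"regexp"
)

// naiveStrike is a hypothetical stand-in for the pre-fix pattern: the
// content between the hyphens must be non-whitespace, but nothing
// constrains the characters before the opening or after the closing hyphen.
var naiveStrike = regexp.MustCompile(`-([^\s-]+)-`)

// naiveConvert rewrites wiki -text- as markdown ~~text~~.
func naiveConvert(s string) string {
	return naiveStrike.ReplaceAllString(s, "~~${1}~~")
}

func main() {
	// The hyphens inside the compound word are mistaken for delimiters.
	fmt.Println(naiveConvert("Deploy signal-webapp-frontend"))
	// Output: Deploy signal~~webapp~~frontend
}
```

This reproduces the first mangled example above: the middle segment of the compound word ends up wrapped in ~~ and is later rendered as strikethrough.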

Fix

  1. looksLikeWikiNumberedList now returns false immediately when any ## heading is present, and it counts only single-# lines as potential wiki numbered-list items.
  2. All four inline formatting patterns (strike, underline, sub, sup) now require whitespace or string boundaries around delimiters.
  3. Extracted replaceWikiFormatting helper to eliminate code duplication and moved regex compilation to package-level var block.
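A minimal sketch of fixes 2 and 3, assuming Go's RE2-based regexp package, which has no lookbehind, so the boundaries have to be captured and written back. The regexes, the replaceWikiFormatting signature, and convertTextFormatting are illustrative assumptions, not the jtk source; only strikethrough and underline are shown.

```go
package main

import (
	"fmt"
	"regexp"
)

// Compiled once at package level instead of inside per-match closures.
// Groups 1 and 3 capture the boundary (start of text, whitespace, or end of
// text) so it can be written back unchanged. Group 2 is the content, which
// may not start or end with whitespace or with the delimiter itself.
var (
	wikiStrike    = regexp.MustCompile(`(^|\s)-([^\s-](?:[^-]*[^\s-])?)-(\s|$)`)
	wikiUnderline = regexp.MustCompile(`(^|\s)\+([^\s+](?:[^+]*[^\s+])?)\+(\s|$)`)
)

// replaceWikiFormatting (hypothetical signature) wraps each match's content
// in the markdown delimiter md and preserves the captured boundaries.
func replaceWikiFormatting(s string, re *regexp.Regexp, md string) string {
	return re.ReplaceAllString(s, "${1}"+md+"${2}"+md+"${3}")
}

// convertTextFormatting is a hypothetical caller that applies each rule.
func convertTextFormatting(s string) string {
	s = replaceWikiFormatting(s, wikiStrike, "~~")
	s = replaceWikiFormatting(s, wikiUnderline, "++")
	return s
}

func main() {
	fmt.Println(convertTextFormatting("Deploy signal-webapp-frontend")) // unchanged
	fmt.Println(convertTextFormatting("this is -deleted- and +important+"))
	// this is ~~deleted~~ and ++important++
}
```

Because the boundary is consumed as part of each match, two spans separated by a single space compete for that space; this is the adjacent-span issue raised later in review.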

Test plan

  • New TestIsWikiMarkup_MarkdownHeadings — verifies ## Heading not detected as wiki
  • New TestWikiToMarkdownPreservesMarkdown cases — hyphenated words, tildes, file paths
  • New TestConvertWikiTextFormatting_EdgeCases — 9 cases: start-of-line, end-of-string, consecutive, compound words, file paths
  • All existing tests pass (zero regressions)
  • Full go test ./... passes across all packages

zzwong added 7 commits March 13, 2026 16:24
…ldes

Markdown descriptions containing compound words like `signal-webapp-frontend`
or `three~tier` were being corrupted because:

1. `looksLikeWikiNumberedList` treated `## Heading` as wiki numbered lists,
   causing markdown input to be routed through wiki-to-markdown conversion.
2. Wiki text formatting patterns for strikethrough (`-text-`) and subscript
   (`~text~`) matched inside compound words and file paths.

Fixes:
- `looksLikeWikiNumberedList` now returns false when `## ` headings are present
  (these are always markdown, never wiki). Only counts single-`#` lines.
- Strikethrough, subscript, underline, and superscript patterns now require
  whitespace or string boundaries around delimiters.
- Extract `replaceWikiFormatting` helper and hoist regex compilation to package
  level (was recompiling inside closures on every match).
- looksLikeWikiNumberedList now checks for ANY multi-hash heading (##, ###,
  etc.) and requires consecutive # lines (no blank lines between them) to
  distinguish wiki numbered lists from markdown headings
- Add edge case tests: multiple h1 headings, h3 headings, tilde with numbers,
  punctuation-adjacent formatting, tab/newline whitespace
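The heuristic described in this commit can be sketched as follows. This is a hypothetical reconstruction, not the jtk implementation. In particular, the threshold of two consecutive "# " lines is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// looksLikeWikiNumberedList is a hypothetical sketch of the heuristic the
// commit describes. Assumption: a run of at least two consecutive "# "
// lines is required before the input is treated as a wiki numbered list.
func looksLikeWikiNumberedList(text string) bool {
	run, longest := 0, 0
	for _, line := range strings.Split(text, "\n") {
		trimmed := strings.TrimSpace(line)
		// Any line starting with "##" ("##", "###", ...) is taken as a
		// markdown heading, so the whole input is treated as markdown.
		// Checking len >= 2 (not >= 3) also catches a bare "##".
		if len(trimmed) >= 2 && trimmed[0] == '#' && trimmed[1] == '#' {
			return false
		}
		if strings.HasPrefix(trimmed, "# ") {
			run++
			if run > longest {
				longest = run
			}
		} else {
			// A blank line or any other content ends the current run.
			run = 0
		}
	}
	return longest >= 2
}

func main() {
	fmt.Println(looksLikeWikiNumberedList("## Overview\n\nDeploy signal-webapp-frontend")) // false
	fmt.Println(looksLikeWikiNumberedList("# first item\n# second item"))                  // true
	fmt.Println(looksLikeWikiNumberedList("# Title\n\n# Another title"))                   // false
}
```

The early return costs something: a nested wiki list such as "# item" followed by "## sub-item" is classified as markdown. The PR later documents this false negative and locks it in with a test.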
Replace goldmark extension.Strikethrough with hugo-goldmark-extensions/extras
which natively parses ~sub~, ^sup^, ~~del~~, and ++ins++ syntax. The ADF
converter now emits spec-compliant marks:

- ~text~ -> {"type": "subsup", "attrs": {"type": "sub"}}
- ^text^ -> {"type": "subsup", "attrs": {"type": "sup"}}
- ~~text~~ -> {"type": "strike"}
- ++text++ -> {"type": "underline"}

Previously, subscript/superscript/underline were converted to HTML tags
(<sub>, <sup>, <u>) by the wiki converter, which goldmark parsed as RawHTML
nodes that the ADF converter silently dropped.

Changes:
- shared/adf/convert.go: swap extension.Strikethrough for extras extension,
  add extrasKindToMark() mapping extras AST kinds to ADF marks
- wiki.go: remove sub/sup HTML conversion (goldmark handles natively),
  convert wiki +text+ to ++text++ for goldmark extras Insert extension
- Remove unused wikiSub/wikiSup regex patterns
- Document WikiToMarkdown as intended for MarkdownToADF/goldmark-extras
  consumption, not standalone markdown renderers (finding 2)
- Document punctuation-adjacent formatting limitation in
  replaceWikiFormatting comment (finding 3)
- Add test: adjacent h1 without blank line is intentionally treated as
  wiki numbered list (findings 1, 4)
- Add tests: WikiToMarkdown sub/sup passthrough and underline conversion
  for goldmark extras (finding 2)
@monit-reviewer left a comment

Automated PR Review

Reviewed commit: 4badf05

Summary: security:code-auditor reported 2 findings.

⚠️ Should Fix - tools/jtk/api/wiki.go:14

Adjacent formatted spans separated by a single space will only partially convert. The regex (?:^|\s)-...-(?:\s|$) consumes surrounding whitespace, so -one- -two- only converts the first span. The existing 'consecutive strikethrough' test passes only because it separates its spans with the word "and" rather than a single space. This is a real behavioral bug for single-space-separated adjacent spans.

💡 Suggestion - tools/jtk/api/wiki.go:88

The ## guard checks len(trimmed) >= 3 but a bare ## token (length 2) would slip past it. The correct bound is len(trimmed) >= 2.

1 info-level observation excluded. Run with --verbose to include.


Completed in 2m 29s | $0.32 | sonnet
Model: sonnet
Reviewers: hybrid-synthesis, security:code-auditor
Reviewed by: pr-review-daemon · monit-pr-reviewer
Duration: 2m 29s (Reviewers: 2m 17s · Synthesis: 18s)
Cost: $0.32
Tokens: 62.2k in / 9.1k out
Turns: 2

- Fix adjacent single-space-separated spans (-one- -two-) where the first
  match consumed the shared whitespace boundary. Run replacement twice to
  catch spans whose leading whitespace was consumed by the prior match.
- Fix ## guard: len >= 2 not >= 3, so bare "##" token is caught.
- Add test for adjacent strikethrough with single space.
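The double-pass fix can be sketched in Go under the same assumed boundary-capturing pattern as earlier. This is not the jtk regex, and strikeOnce and strikeTwice are hypothetical names.

```go
package main

import (
	"fmt"
	"regexp"
)

// Boundary-capturing strikethrough pattern (an assumption, not the jtk regex).
var wikiStrike = regexp.MustCompile(`(^|\s)-([^\s-](?:[^-]*[^\s-])?)-(\s|$)`)

func strikeOnce(s string) string {
	return wikiStrike.ReplaceAllString(s, "${1}~~${2}~~${3}")
}

// strikeTwice runs the replacement a second time. In "-one- -two-" the first
// match consumes the single space as its trailing boundary, so "-two-" has
// no leading boundary left during that pass; the second pass sees it again.
func strikeTwice(s string) string {
	return strikeOnce(strikeOnce(s))
}

func main() {
	fmt.Println(strikeOnce("-one- -two-"))  // ~~one~~ -two-
	fmt.Println(strikeTwice("-one- -two-")) // ~~one~~ ~~two~~
}
```

Two passes are enough for any chain of single-space-separated spans. The first pass converts every other span, so each remaining span has free boundaries on both sides during the second pass. The conversion adds no hyphens, so the second pass cannot create new false positives.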
@zzwong dismissed monit-reviewer’s stale review on March 13, 2026 21:17

Both findings addressed in fddc6f7: adjacent spans fixed with double-pass replacement, ## length guard fixed.

zzwong added 6 commits March 13, 2026 17:19
- Widen formatting boundary from whitespace-only to whitespace + common
  punctuation (parens, brackets, quotes). Patterns like (-deleted-) now
  convert correctly while compound words are still protected.
- Add inline comment explaining consecutive h1 tradeoff in
  looksLikeWikiNumberedList.
- Add end-to-end TestMarkdownToADF_CompoundWordsEndToEnd covering the
  original bug: markdown with signal-webapp-frontend, file paths, and
  three-tier through the full MarkdownToADF pipeline.
- Expand comment on ## heuristic explaining Jira nested numbered list tradeoff
- Document intentional before/after boundary asymmetry (opening vs closing punctuation)
- Add test for superscript in compound word (x^2^y) to match existing tilde test
- Add test locking period-before-delimiter boundary behavior
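The asymmetric boundary idea can be sketched by composing the strike pattern from separate "before" and "after" classes. The exact character sets are assumptions, and the sketch already includes the ] that a later commit adds to wikiBoundaryAfter.

```go
package main

import (
	"fmt"
	"regexp"
)

// Hypothetical boundary classes; the exact jtk sets may differ. The opening
// side accepts whitespace and opening punctuation. The closing side accepts
// whitespace, closing punctuation, and sentence punctuation.
const (
	wikiBoundaryBefore = `(^|[\s(\["'])`
	wikiBoundaryAfter  = `([\s)\]"'.,;:!?]|$)`
)

var wikiStrike = regexp.MustCompile(
	wikiBoundaryBefore + `-([^\s-](?:[^-]*[^\s-])?)-` + wikiBoundaryAfter)

func strike(s string) string {
	return wikiStrike.ReplaceAllString(s, "${1}~~${2}~~${3}")
}

func main() {
	fmt.Println(strike("(-deleted-)"))            // (~~deleted~~)
	fmt.Println(strike("[-deleted-]"))            // [~~deleted~~]
	fmt.Println(strike("see -this-."))            // see ~~this~~.
	fmt.Println(strike("signal-webapp-frontend")) // signal-webapp-frontend
}
```

Letters and digits never appear in either class. That keeps compound words such as signal-webapp-frontend untouched while punctuation-wrapped spans convert.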
The standard parser (for plain markdown) now uses extras.Delete only
(double-tilde ~~text~~ strikethrough). Subscript, superscript, and
insert are only enabled in the wiki parser, used when input is
detected as Jira wiki markup.

This prevents even-tilde compound words like "signal~webapp~frontend"
from being mangled by goldmark subscript processing in non-wiki input.
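Boundary rules do not help with even-tilde words once a subscript rule is active, because "~webapp~" is a well-formed span. The toy below illustrates why the rule is gated on wiki detection. The regex is a simplified stand-in for a ~text~ subscript rule, not the goldmark-extras parser. The render function is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// subscript is a simplified stand-in for a ~text~ subscript rule; it is not
// the goldmark-extras implementation.
var subscript = regexp.MustCompile(`~([^~\s]+)~`)

// render is a toy: the subscript rule runs only when the input was detected
// as wiki markup, mirroring the split between the standard and wiki parsers.
func render(s string, wiki bool) string {
	if !wiki {
		return s
	}
	return subscript.ReplaceAllString(s, "<sub>${1}</sub>")
}

func main() {
	fmt.Println(render("signal~webapp~frontend", true))  // signal<sub>webapp</sub>frontend
	fmt.Println(render("signal~webapp~frontend", false)) // signal~webapp~frontend
	fmt.Println(render("H~2~O", true))                   // H<sub>2</sub>O
}
```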

Also addresses review findings:
- Fix variable shadowing: rename loop var t -> s in end-to-end test
- Add ASCII delimiter assumption comment on replaceWikiFormatting
- Clarify ^ anchor scope in boundary pattern comments
- Add even-tilde and even-caret compound word tests
Coverage gaps filled:
- Shared-package parser split tests: ToDocument vs ToDocumentWiki vs ToJSON
  proving standard parser omits subsup/underline, wiki parser produces them
- Inline-only wiki formatting (H~2~O, x^2^, +important+) through
  MarkdownToADF: proves auto-detection correctly falls through to safe parser
- Nested wiki ## under # false negative test lock
- Square bracket boundary test ([-deleted-])

Fixes:
- Add ] to wikiBoundaryAfter so [-deleted-] converts correctly
- Strengthen WikiToMarkdown doc comment: explicitly notes pipeline coupling
  to adf.ToDocumentWiki and warns callers MUST use wiki parser
Adds contract note: auto-detection prioritizes not corrupting plain markdown
over detecting wiki edge cases. Callers that know the input format
should bypass heuristics via WikiToMarkdown + adf.ToDocumentWiki.
Renaming WikiToMarkdown to WikiToADFMarkdown makes the pipeline coupling explicit: the output is not
general-purpose markdown but a dialect tuned for adf.ToDocumentWiki.

Strengthen CompoundWordsEndToEnd test to verify compound words appear
within single ADF text nodes rather than just checking concatenated text.
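A check of this kind can be sketched in Go by walking the ADF JSON and testing each text node on its own. The adfNode struct keeps only the fields the check needs. The helper names and the split document are constructed for illustration, not taken from the jtk test.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// adfNode keeps only the fields this check needs; marks and attrs are ignored.
type adfNode struct {
	Type    string    `json:"type"`
	Text    string    `json:"text,omitempty"`
	Content []adfNode `json:"content,omitempty"`
}

// collectText appends the text of every "text" node, depth first.
func collectText(n adfNode, out *[]string) {
	if n.Type == "text" {
		*out = append(*out, n.Text)
	}
	for _, c := range n.Content {
		collectText(c, out)
	}
}

// inSingleTextNode reports whether word appears whole inside one text node.
func inSingleTextNode(docJSON, word string) (bool, error) {
	var doc adfNode
	if err := json.Unmarshal([]byte(docJSON), &doc); err != nil {
		return false, err
	}
	var texts []string
	collectText(doc, &texts)
	for _, t := range texts {
		if strings.Contains(t, word) {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	intact := `{"type":"doc","content":[{"type":"paragraph","content":[
		{"type":"text","text":"Deploy signal-webapp-frontend"}]}]}`
	// Concatenated, this still contains the word, but it is split across
	// three nodes and the middle one carries a strike mark.
	split := `{"type":"doc","content":[{"type":"paragraph","content":[
		{"type":"text","text":"Deploy signal-"},
		{"type":"text","text":"webapp","marks":[{"type":"strike"}]},
		{"type":"text","text":"-frontend"}]}]}`
	for _, doc := range []string{intact, split} {
		ok, err := inSingleTextNode(doc, "signal-webapp-frontend")
		fmt.Println(ok, err)
	}
}
```

The split document would pass a concatenated-text assertion even though it renders part of the word struck through. That is the gap a per-node check closes.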

Also document ToJSON parser choice to match ToDocument/ToDocumentWiki.
@monit-reviewer left a comment

Automated PR Review

Reviewed commit: 2c674e6

Summary: security:code-auditor reported 3 findings.

⚠️ Should Fix - tools/jtk/api/wiki.go:165

WikiToADFMarkdown is exported but its output is only safe to parse with adf.ToDocumentWiki. If any caller uses adf.ToDocument on the output, ~text~ and ^text^ won't produce ADF marks and will appear as raw characters. There is no compile-time enforcement of this contract. Consider either unexporting the function or returning a typed wrapper that forces the correct parser.
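The typed-wrapper option could look like the following sketch. ADFMarkdown, wikiToADFMarkdown, and toDocumentWiki are hypothetical stand-ins. The conversion and parsing bodies are placeholders, and only the type discipline is the point.

```go
package main

import (
	"fmt"
	"strings"
)

// ADFMarkdown is a hypothetical wrapper. Its field is unexported, so code in
// other packages cannot build one from an arbitrary string or pass it where
// a plain string is expected.
type ADFMarkdown struct {
	text string
}

// wikiToADFMarkdown stands in for the real converter. The actual wiki
// conversion is elided; this placeholder only trims whitespace.
func wikiToADFMarkdown(wiki string) ADFMarkdown {
	return ADFMarkdown{text: strings.TrimSpace(wiki)}
}

// toDocumentWiki stands in for the wiki-aware parser. It accepts only
// ADFMarkdown, so handing the converter's output to a parser that takes a
// plain string becomes a compile-time error instead of silent mark loss.
func toDocumentWiki(m ADFMarkdown) string {
	return "parsed with wiki parser: " + m.text
}

func main() {
	md := wikiToADFMarkdown("  H~2~O and ++important++  ")
	fmt.Println(toDocumentWiki(md))
	// A hypothetical toDocument(s string) would reject md at compile time.
}
```

Inside the defining package the field is still accessible, so the guarantee holds only for callers in other packages.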

💡 Suggestion - tools/jtk/api/wiki.go:47

wikiStrikeInner uses [^-]+ which is more permissive than the outer pattern's [^\s-][^-]*[^\s-]. If the outer regex ever matched a span the inner doesn't expect, FindStringSubmatch could return an unexpected capture. The two patterns should be aligned or the inner derived from the outer to avoid silent mismatches.
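One way to act on this suggestion is to build both regexes from one shared body constant. This sketch assumes the inner pattern is applied to each outer match inside a ReplaceAllStringFunc closure. The names strikeBody and convertStrike are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// strikeBody is the single source of truth for what may sit between the
// hyphens: no leading/trailing whitespace and no hyphen anywhere inside.
const strikeBody = `[^\s-](?:[^-]*[^\s-])?`

var (
	// Outer pattern: finds spans and captures their boundaries.
	wikiStrike = regexp.MustCompile(`(^|\s)-(` + strikeBody + `)-(\s|$)`)
	// Inner pattern: same body, anchored to a whole outer match, so it
	// accepts exactly the strings the outer pattern can produce.
	wikiStrikeInner = regexp.MustCompile(`^(\s?)-(` + strikeBody + `)-(\s?)$`)
)

func convertStrike(s string) string {
	return wikiStrike.ReplaceAllStringFunc(s, func(m string) string {
		sub := wikiStrikeInner.FindStringSubmatch(m)
		if sub == nil {
			return m // defensive: unreachable while both share strikeBody
		}
		return sub[1] + "~~" + sub[2] + "~~" + sub[3]
	})
}

func main() {
	fmt.Println(convertStrike("a -two words- b"))
	fmt.Println(convertStrike("signal-webapp-frontend"))
}
```

An alternative is to drop the inner regex and use ReplaceAllString with ${1}/${2}/${3} templates, which leaves only one pattern to keep in sync.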

💡 Suggestion - tools/jtk/api/wiki_test.go:315

TestConvertWikiTextFormatting_EdgeCases does not cover strikethrough applied to text that is itself a valid markdown construct (e.g., "-code-" or "-bold-"). These exist in real Jira wiki content and the boundary regex may or may not handle them correctly.


Completed in 3m 12s | $0.30 | sonnet
Model: sonnet
Reviewers: hybrid-synthesis, security:code-auditor
Reviewed by: pr-review-daemon · monit-pr-reviewer
Duration: 3m 12s (Reviewers: 3m 02s · Synthesis: 16s)
Cost: $0.30
Tokens: 67.0k in / 11.7k out
Turns: 2

@zzwong merged commit 478b028 into main on Mar 13, 2026
7 checks passed
@zzwong deleted the fix/jtk-wiki-markup-false-positives branch on March 13, 2026 22:41
rianjs added a commit that referenced this pull request Mar 28, 2026
Catches up documentation for recent features and fixes:
- README: document --fields flag, auto-pagination, fields command group,
  users get subcommand, --assignee none, multi-value --field, and
  escape sequences in comment --body
- CHANGELOG: add entries for PRs #178, #180, #182, #186-189
- integration-tests: add test cases for auto-pagination, --fields,
  users get, --assignee none, multi-value --field, and escape sequences

Closes #183
rianjs added a commit that referenced this pull request Mar 28, 2026 (#190)