Commit 7f724a6

Copilot and quantizor authored
Fix HTML block parsing for sibling elements like dt/dd without blank lines (#783)
* Initial plan

* Fix HTML block parsing for sibling elements like dt/dd without blank lines

When parsing Type 6 HTML blocks (like dl, dt, dd), the parser now correctly finds the matching closing tag by tracking nesting depth. This ensures that sibling elements like <dt></dt><dd></dd> are parsed correctly when there are no blank lines between them.

The fix ensures:
1. Nested elements with the same tag name are handled correctly
2. Sibling elements at the same level are parsed as separate elements
3. Content after the matching closing tag that starts with another HTML tag triggers the block to end at the closing tag
4. CommonMark spec compliance is maintained for HTML blocks that should continue until a blank line

Adds regression tests for description list parsing.

Co-authored-by: quantizor <570070+quantizor@users.noreply.github.com>

* Optimize nested while loop for opening tag validation

Improved the code based on code review feedback by simplifying the validation logic and adding clearer comments.

Co-authored-by: quantizor <570070+quantizor@users.noreply.github.com>

* Add changeset for HTML sibling elements fix

Added multilingual changeset (English, Chinese, Hindi) documenting the fix for HTML block parsing of sibling elements without blank lines.

Co-authored-by: quantizor <570070+quantizor@users.noreply.github.com>

* Rebuild website and fix vite build configuration

- Add mkdirSync import and ensure docs directory exists before writing
- Add resolve alias for #entities import in vite config
- Rebuild site with updated library code

Co-authored-by: quantizor <570070+quantizor@users.noreply.github.com>

* Fix self-closing tags incorrectly incrementing nesting depth

Self-closing tags like <div/> were incorrectly being counted as opening tags because the validation accepted '/' as a valid character after the tag name. Now self-closing tags are properly skipped when tracking nesting depth.

Added test to verify self-closing tags don't break nesting detection.

Co-authored-by: quantizor <570070+quantizor@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: quantizor <570070+quantizor@users.noreply.github.com>
1 parent 58010ce commit 7f724a6

8 files changed

Lines changed: 634 additions & 384 deletions

Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
+---
+"markdown-to-jsx": patch
+---
+
+Fix HTML block parsing for sibling elements like `<dt>`/`<dd>` without blank lines between them.
+
+Type 6 HTML blocks (such as `<dl>`, `<dt>`, `<dd>`, `<table>`, `<tr>`, `<td>`) were incorrectly parsed when sibling elements appeared without blank lines between them—the first element would consume all subsequent siblings as its content instead of treating them as separate elements.
+
+This fix adds nesting-aware closing tag detection that properly handles:
+
+- Nested elements with the same tag name (e.g., `<div><div></div></div>`)
+- Sibling elements at the same level (e.g., `<dt></dt><dd></dd>`)
+- CommonMark compliance for HTML blocks that should extend to blank lines
+
+---
+
+修复了没有空行分隔的兄弟 HTML 元素(如 `<dt>`/`<dd>`)的块解析问题。
+
+类型 6 HTML 块(如 `<dl>`、`<dt>`、`<dd>`、`<table>`、`<tr>`、`<td>`)在兄弟元素之间没有空行时解析错误——第一个元素会将所有后续兄弟元素作为其内容,而不是将它们视为单独的元素。
+
+此修复添加了具有嵌套感知的关闭标签检测,正确处理:
+
+- 同名标签的嵌套元素(例如 `<div><div></div></div>`)
+- 同级的兄弟元素(例如 `<dt></dt><dd></dd>`)
+- 应延续到空行的 HTML 块的 CommonMark 合规性
+
+---
+
+रिक्त पंक्तियों के बिना भाई HTML तत्वों (जैसे `<dt>`/`<dd>`) के लिए HTML ब्लॉक पार्सिंग को ठीक किया।
+
+टाइप 6 HTML ब्लॉक (जैसे `<dl>`, `<dt>`, `<dd>`, `<table>`, `<tr>`, `<td>`) गलत तरीके से पार्स हो रहे थे जब भाई तत्व बिना रिक्त पंक्तियों के दिखाई देते थे—पहला तत्व सभी अनुवर्ती भाई तत्वों को अपनी सामग्री के रूप में शामिल कर लेता था, उन्हें अलग तत्वों के रूप में मानने के बजाय।
+
+यह सुधार नेस्टिंग-जागरूक क्लोजिंग टैग पहचान जोड़ता है जो सही ढंग से संभालता है:
+
+- समान टैग नाम वाले नेस्टेड तत्व (उदाहरण: `<div><div></div></div>`)
+- समान स्तर पर भाई तत्व (उदाहरण: `<dt></dt><dd></dd>`)
+- HTML ब्लॉक के लिए CommonMark अनुपालन जो रिक्त पंक्तियों तक विस्तारित होने चाहिए
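
The nesting-aware detection described in this changeset can be illustrated with a small standalone sketch. The function name `findMatchingClose` and its signature are hypothetical and exist only for this example; the real logic lives inline in `parseHTML` and also bounds the search by the block end, which is left out here.

```typescript
// Hypothetical helper for illustration; not part of markdown-to-jsx's API.
// Returns the index of the closing tag that matches an opening tag which
// has already been consumed (ending before `from`), or -1 if none is found.
function findMatchingClose(source: string, tagName: string, from: number): number {
  const open = '<' + tagName
  const close = '</' + tagName
  let depth = 1 // the opening tag that precedes `from`
  let pos = from

  while (pos < source.length) {
    const nextClose = source.indexOf(close, pos)
    if (nextClose === -1) return -1

    // Next same-name opening tag, accepted only when followed by
    // whitespace or '>' (so '<div/>' and '<divider>' are skipped).
    let nextOpen = source.indexOf(open, pos)
    while (nextOpen !== -1 && !/[\s>]/.test(source.charAt(nextOpen + open.length))) {
      nextOpen = source.indexOf(open, nextOpen + open.length)
    }

    if (nextOpen !== -1 && nextOpen < nextClose) {
      depth++ // a nested element of the same name opens first
      pos = nextOpen + open.length
    } else {
      depth-- // a closing tag comes first
      if (depth === 0) return nextClose
      pos = nextClose + close.length
    }
  }
  return -1
}
```

With siblings such as `<dt>a</dt>` followed by `<dd>b</dd>`, the search for `dt` stops at the first `</dt`, whereas with `<div><div>x</div></div>` it skips the inner closing tag and returns the outer one.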

docs/assets/index-DD0j3d9W.js

Lines changed: 0 additions & 370 deletions
This file was deleted.

docs/assets/index-nU-AZOKo.js

Lines changed: 373 additions & 0 deletions
Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-import{r as c,j as de}from"./index-DD0j3d9W.js";function me(a){for(var n=[`fn map(pos: vec3f) -> f32 {
+import{r as c,j as de}from"./index-nU-AZOKo.js";function me(a){for(var n=[`fn map(pos: vec3f) -> f32 {
 let k = u.elasticity;
 let p0 = particles[0u];
 let delta0 = pos - p0.position.xyz;

docs/index.html

Lines changed: 1 addition & 1 deletion
@@ -43,7 +43,7 @@
 
       gtag('config', 'G-T8TWRSBM1V')
     </script>
-    <script type="module" crossorigin src="/assets/index-DD0j3d9W.js"></script>
+    <script type="module" crossorigin src="/assets/index-nU-AZOKo.js"></script>
     <link rel="stylesheet" crossorigin href="/assets/index-D6QVjcGy.css">
   </head>

src/parse.spec.ts

Lines changed: 96 additions & 0 deletions
@@ -1514,6 +1514,102 @@ describe('HTML tags interrupting lists', () => {
   })
 })
 
+describe('description list parsing', () => {
+  it('should parse dt/dd siblings correctly without blank lines between them', () => {
+    // Regression test for GitHub issue - dt/dd pairs should be parsed as siblings
+    const md = `<dl data-variant='horizontalTable'>
+<dt>title 1</dt>
+<dd>description 1</dd>
+<dt>title 2</dt>
+<dd>description 2</dd>
+<dt>title 3</dt>
+<dd>description 3</dd>
+</dl>`
+    const result = p.parser(md)
+
+    // Should have a single dl element
+    expect(result.length).toBe(1)
+    expect(result[0].type).toBe(RuleType.htmlBlock)
+
+    const dl = result[0] as MarkdownToJSX.HTMLNode
+    expect(dl.tag).toBe('dl')
+    expect(dl.attrs['data-variant']).toBe('horizontalTable')
+
+    // Should have 6 children (3 dt + 3 dd)
+    expect(dl.children?.length).toBe(6)
+
+    // Verify each child is correctly parsed
+    const children = dl.children!
+    expect((children[0] as MarkdownToJSX.HTMLNode).tag).toBe('dt')
+    expect((children[1] as MarkdownToJSX.HTMLNode).tag).toBe('dd')
+    expect((children[2] as MarkdownToJSX.HTMLNode).tag).toBe('dt')
+    expect((children[3] as MarkdownToJSX.HTMLNode).tag).toBe('dd')
+    expect((children[4] as MarkdownToJSX.HTMLNode).tag).toBe('dt')
+    expect((children[5] as MarkdownToJSX.HTMLNode).tag).toBe('dd')
+
+    // Verify content is correctly extracted
+    const dt1 = children[0] as MarkdownToJSX.HTMLNode
+    expect(dt1.children?.length).toBe(1)
+    expect((dt1.children![0] as MarkdownToJSX.TextNode).text).toBe('title 1')
+
+    const dd1 = children[1] as MarkdownToJSX.HTMLNode
+    expect(dd1.children?.length).toBe(1)
+    expect((dd1.children![0] as MarkdownToJSX.TextNode).text).toBe(
+      'description 1'
+    )
+  })
+
+  it('should parse single dt tag correctly', () => {
+    const md = '<dt>title 1</dt>'
+    const result = p.parser(md)
+
+    expect(result.length).toBe(1)
+    expect(result[0].type).toBe(RuleType.htmlBlock)
+
+    const dt = result[0] as MarkdownToJSX.HTMLNode
+    expect(dt.tag).toBe('dt')
+    expect(dt.children?.length).toBe(1)
+    expect((dt.children![0] as MarkdownToJSX.TextNode).text).toBe('title 1')
+  })
+
+  it('should parse dt followed by dd on next line correctly', () => {
+    const md = `<dt>title 1</dt>
+<dd>description 1</dd>`
+    const result = p.parser(md)
+
+    // Should have 2 separate elements
+    expect(result.length).toBe(2)
+
+    const dt = result[0] as MarkdownToJSX.HTMLNode
+    expect(dt.tag).toBe('dt')
+    expect(dt.children?.length).toBe(1)
+    expect((dt.children![0] as MarkdownToJSX.TextNode).text).toBe('title 1')
+
+    const dd = result[1] as MarkdownToJSX.HTMLNode
+    expect(dd.tag).toBe('dd')
+    expect(dd.children?.length).toBe(1)
+    expect((dd.children![0] as MarkdownToJSX.TextNode).text).toBe(
+      'description 1'
+    )
+  })
+
+  it('should handle self-closing tags without incorrectly incrementing nesting depth', () => {
+    // Self-closing tags like <div/> should not increment nesting depth
+    const md = `<div>
+<div/>
+<span>content</span>
+</div>`
+    const result = p.parser(md)
+
+    // Should parse as a single div with proper nesting
+    expect(result.length).toBe(1)
+    expect(result[0].type).toBe(RuleType.htmlBlock)
+
+    const div = result[0] as MarkdownToJSX.HTMLNode
+    expect(div.tag).toBe('div')
+  })
+})
+
 describe('tables in lists', () => {
   it('should parse tables within list items (regression test for issue #1)', () => {
     const md = `- **Browser Stats**:
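
The self-closing test above depends on how a candidate opening tag is validated. As a rough sketch only, the one-character check from `src/parse.ts` can be written as a standalone predicate; `looksLikeOpeningTag` is a hypothetical name used for this illustration and does not exist in the library.

```typescript
// Hypothetical predicate mirroring the one-character validation in the diff.
// `idx` must point at a '<'; `tagName` is the tag whose depth is being tracked.
function looksLikeOpeningTag(source: string, idx: number, tagName: string): boolean {
  const pattern = '<' + tagName
  if (source.slice(idx, idx + pattern.length) !== pattern) return false
  const next = source.charAt(idx + pattern.length)
  // '/' is deliberately not accepted: '<div/>' is self-closing and must not
  // increase the nesting depth. Any other character means a longer tag name.
  return next === ' ' || next === '\t' || next === '\n' || next === '\r' || next === '>'
}

const samples = ['<div>', '<div class="a">', '<div/>', '<divider>']
const results = samples.map(s => looksLikeOpeningTag(s, 0, 'div'))
// results is [true, true, false, false]
```

Read literally, a check of this shape inspects only the character directly after the tag name, so a form like `<div />` (with a space before the slash) would presumably still count as an opening tag. Whether that matters in practice depends on parser code not shown in this diff.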

src/parse.ts

Lines changed: 122 additions & 11 deletions
@@ -7682,18 +7682,73 @@ function parseHTML(
       sourceLen
     )
 
-    // For type 6 blocks, check if there's a closing tag (even beyond the blank line)
-    // If there is AND there's markdown syntax, extend to include the closing tag
-    // Exception: for JSX components, always extend to closing tag (proper parent-child nesting)
+    // For type 6 blocks, check if there's a closing tag before the blank line
+    // If found AND the next content is another HTML tag, stop at the closing tag
+    // This ensures proper nesting of sibling elements (e.g., <dt></dt><dd></dd>)
     if (blockType === 'type6' && !tagResult.isClosing) {
       // For JSX components, preserve case; for HTML, use lowercase
       const tagNameForClosing = isJSXComponent
         ? tagResult.tagName
         : tagResult.tagLower || tagResult.tagName.toLowerCase()
       var closingTagPattern = '</' + tagNameForClosing
-      var closingIdx = source.indexOf(closingTagPattern, tagEnd)
-      if (closingIdx !== -1) {
-        // Found a closing tag
+      var openingTagPattern = '<' + tagNameForClosing
+
+      // Find the matching closing tag by tracking nesting depth
+      var searchPos = tagEnd
+      var depth = 1 // We already have one opening tag (depth starts at 1)
+      var closingIdx = -1
+      while (searchPos < blockEnd && depth > 0) {
+        var nextOpenIdx = source.indexOf(openingTagPattern, searchPos)
+        var nextCloseIdx = source.indexOf(closingTagPattern, searchPos)
+
+        // Validate and find next valid opening tag (followed by whitespace or >)
+        // Note: We don't accept / because that indicates a self-closing tag
+        while (nextOpenIdx !== -1 && nextOpenIdx < blockEnd) {
+          var afterOpenPos = nextOpenIdx + openingTagPattern.length
+          if (afterOpenPos >= sourceLen) {
+            nextOpenIdx = -1
+            break
+          }
+          var charAfterOpen = source[afterOpenPos]
+          if (
+            charAfterOpen === ' ' ||
+            charAfterOpen === '\t' ||
+            charAfterOpen === '\n' ||
+            charAfterOpen === '\r' ||
+            charAfterOpen === '>'
+          ) {
+            break // Valid opening tag found
+          }
+          // Not valid (could be self-closing like <div/> or partial match), search for next
+          nextOpenIdx = source.indexOf(openingTagPattern, afterOpenPos)
+        }
+
+        if (nextOpenIdx === -1 || nextOpenIdx >= blockEnd) {
+          nextOpenIdx = blockEnd
+        }
+        if (nextCloseIdx === -1 || nextCloseIdx >= blockEnd) {
+          nextCloseIdx = blockEnd
+        }
+
+        if (nextOpenIdx < nextCloseIdx) {
+          // Found an opening tag first - increase depth
+          depth++
+          searchPos = nextOpenIdx + openingTagPattern.length
+        } else if (nextCloseIdx < blockEnd) {
+          // Found a closing tag first - decrease depth
+          depth--
+          if (depth === 0) {
+            closingIdx = nextCloseIdx
+            break
+          }
+          searchPos = nextCloseIdx + closingTagPattern.length
+        } else {
+          break // No more tags found
+        }
+      }
+
+      if (closingIdx !== -1 && closingIdx < blockEnd) {
+        // Found the matching closing tag before the blank line
         // Check if it's valid
         var afterClosingTag = closingIdx + closingTagPattern.length
         while (
@@ -7704,16 +7759,72 @@ function parseHTML(
           afterClosingTag++
         }
         if (afterClosingTag < sourceLen && source[afterClosingTag] === '>') {
-          // Valid closing tag found
+          // Valid closing tag found before blank line
+          // Check if the content immediately after the closing tag (after newline) starts with another HTML tag
+          var closingTagEndPos = afterClosingTag + 1
+          var nextContentPos = closingTagEndPos
+          // Skip to next line
+          while (
+            nextContentPos < sourceLen &&
+            source[nextContentPos] !== '\n'
+          ) {
+            nextContentPos++
+          }
+          if (nextContentPos < sourceLen) {
+            nextContentPos++ // Skip the newline
+          }
+          // Skip leading whitespace on next line
+          while (
+            nextContentPos < sourceLen &&
+            (source[nextContentPos] === ' ' ||
+              source[nextContentPos] === '\t')
+          ) {
+            nextContentPos++
+          }
+          // Check if next content is another HTML tag (that is NOT a closing tag for our current tag)
+          if (
+            nextContentPos < sourceLen &&
+            source[nextContentPos] === '<' &&
+            !util.startsWith(source.slice(nextContentPos), closingTagPattern)
+          ) {
+            var nextTag = parseHTMLTag(source, nextContentPos)
+            if (nextTag) {
+              // Next content is a different HTML tag - stop at our closing tag
+              blockEnd = closingTagEndPos
+            }
+          }
+          // Otherwise, continue to blank line as per CommonMark
+        }
+      } else {
+        // No matching closing tag found before blank line
+        // Check if there's a closing tag after the blank line
+        closingIdx = source.indexOf(closingTagPattern, tagEnd)
+        if (closingIdx !== -1) {
+          // Closing tag found but after blank line
+          // Check if there's block content that would warrant extending to the closing tag
           var extendedContent = source.slice(tagEnd, closingIdx)
-          // For JSX components, always extend to closing tag for proper nesting
-          // For HTML elements, only extend if there's block content
           var shouldExtend =
             isJSXComponent || hasBlockContent(extendedContent)
           if (shouldExtend) {
             // Extend block to include closing tag
-            var closingLineEnd = util.findLineEnd(source, afterClosingTag + 1)
-            blockEnd = closingLineEnd
+            var afterClosingTag2 = closingIdx + closingTagPattern.length
+            while (
+              afterClosingTag2 < sourceLen &&
+              (source[afterClosingTag2] === ' ' ||
+                source[afterClosingTag2] === '\t')
+            ) {
+              afterClosingTag2++
+            }
+            if (
+              afterClosingTag2 < sourceLen &&
+              source[afterClosingTag2] === '>'
+            ) {
+              var closingLineEnd = util.findLineEnd(
+                source,
+                afterClosingTag2 + 1
+              )
+              blockEnd = closingLineEnd
+            }
           }
         }
       }
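
The second hunk decides whether the block should end at the closing tag by peeking at the next line. A simplified sketch of that lookahead follows; `nextLineStartsWithOtherTag` is a hypothetical name, and unlike the diff it only checks for a leading `<` rather than confirming a full tag with `parseHTMLTag`.

```typescript
// Hypothetical helper for illustration. `afterClose` is the index just past
// the '>' of the closing tag; `tagName` is the element that was just closed.
function nextLineStartsWithOtherTag(source: string, afterClose: number, tagName: string): boolean {
  let pos = source.indexOf('\n', afterClose)
  if (pos === -1) return false // nothing follows on a later line
  pos++ // step past the newline

  // Skip indentation on the next line.
  while (pos < source.length && (source[pos] === ' ' || source[pos] === '\t')) {
    pos++
  }
  if (source[pos] !== '<') return false

  // A repeated closing tag for the same element does not count as a sibling.
  const close = '</' + tagName
  return source.slice(pos, pos + close.length) !== close
}

const first = '<dt>title 1</dt>'
const siblingCase = nextLineStartsWithOtherTag(first + '\n<dd>description 1</dd>', first.length, 'dt')
const textCase = nextLineStartsWithOtherTag('<div>a</div>\nplain text', 12, 'div')
```

In this sketch `siblingCase` is `true`, so the `dt` block would stop at its closing tag, while `textCase` is `false`, which corresponds to the branch where the block continues to the blank line as CommonMark describes.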

vite.config.ts

Lines changed: 7 additions & 1 deletion
@@ -5,7 +5,7 @@ import * as recast from 'recast'
 import { createRequire } from 'module'
 import tailwindcss from '@tailwindcss/vite'
 import packageJson from './package.json'
-import { readFileSync, writeFileSync } from 'fs'
+import { readFileSync, writeFileSync, mkdirSync } from 'fs'
 import { join } from 'path'
 
 const require = createRequire(import.meta.url)
@@ -142,6 +142,7 @@ function copyLlmsTxtPlugin(): Plugin {
       var readmePath = join(process.cwd(), 'README.md')
       var outputPath = join(process.cwd(), 'docs', 'llms.txt')
       var readmeContent = readFileSync(readmePath, 'utf-8')
+      mkdirSync(join(process.cwd(), 'docs'), { recursive: true })
       writeFileSync(outputPath, readmeContent, 'utf-8')
     },
   }
@@ -191,6 +192,11 @@ export default defineConfig({
   define: {
     VERSION: JSON.stringify(packageJson.version.split('.')[0]),
   },
+  resolve: {
+    alias: {
+      '#entities': resolve(__dirname, 'src/entities.generated.ts'),
+    },
+  },
   root: '.',
   publicDir: 'public',
   build: {