Major performance issue when parsing a long list of reference links #996

@RomanHotsiy

We noticed a major performance problem when parsing a long list of references similar to this benchmark: https://github.com/markdown-it/markdown-it/blob/master/benchmark/samples/block-ref-list.md

In our case we have a list of 1000+ references.

The root cause seems to be this termination logic:

const terminatorRules = state.md.block.ruler.getRules('reference')
const oldParentType = state.parentType
state.parentType = 'reference'

for (; nextLine < endLine && !state.isEmpty(nextLine); nextLine++) {
  // this would be a code block normally, but after paragraph
  // it's considered a lazy continuation regardless of what's there
  if (state.sCount[nextLine] - state.blkIndent > 3) { continue }

  // quirk for blockquotes, this line should already be checked by that rule
  if (state.sCount[nextLine] < 0) { continue }

  // Some tags can terminate paragraph without empty line.
  let terminate = false
  for (let i = 0, l = terminatorRules.length; i < l; i++) {
    if (terminatorRules[i](state, nextLine, endLine, true)) {
      terminate = true
      break
    }
  }
  if (terminate) { break }
}

Removing this logic doesn't break any tests and makes parsing our long list 30x faster 🙀
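For anyone who wants to reproduce the measurement, here is a minimal sketch of a benchmark helper. It assumes a document shaped like the linked sample (consecutive reference definitions with no blank lines between them); the `buildReferenceList` and `timeRender` names and the `example.com` URLs are hypothetical and only for illustration.

```javascript
// Build a synthetic document of `count` consecutive reference definitions
// with no blank lines between them. Labels and URLs are made up.
function buildReferenceList(count) {
  const lines = []
  for (let i = 0; i < count; i++) {
    lines.push(`[ref${i}]: https://example.com/${i}`)
  }
  return lines.join('\n') + '\n'
}

// Time a single call of `render(src)` and return the elapsed milliseconds.
function timeRender(render, src) {
  const start = process.hrtime.bigint()
  render(src)
  return Number(process.hrtime.bigint() - start) / 1e6
}

// Possible usage with markdown-it installed (not run here):
//   const md = require('markdown-it')()
//   console.log(timeRender(s => md.render(s), buildReferenceList(1000)))
```

Comparing the timing for a few values of `count`, with and without the termination loop, should show whether the cost grows faster than linearly on your machine.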

I tried to find some similar problems and found this thread: #54

I believe this table is incorrect but I'm not sure:

// First 2 params - rule name & source. Secondary array - list of rules,
// which can be terminated by this one.
[ 'table', require('./rules_block/table'), [ 'paragraph', 'reference' ] ],
[ 'code', require('./rules_block/code') ],
[ 'fence', require('./rules_block/fence'), [ 'paragraph', 'reference', 'blockquote', 'list' ] ],
[ 'blockquote', require('./rules_block/blockquote'), [ 'paragraph', 'reference', 'blockquote', 'list' ] ],
[ 'hr', require('./rules_block/hr'), [ 'paragraph', 'reference', 'blockquote', 'list' ] ],
[ 'list', require('./rules_block/list'), [ 'paragraph', 'reference', 'blockquote' ] ],
[ 'reference', require('./rules_block/reference') ],
[ 'html_block', require('./rules_block/html_block'), [ 'paragraph', 'reference', 'blockquote' ] ],
[ 'heading', require('./rules_block/heading'), [ 'paragraph', 'reference', 'blockquote' ] ],
[ 'lheading', require('./rules_block/lheading') ],
[ 'paragraph', require('./rules_block/paragraph') ]

From the CommonMark spec, I can't see that a reference can be terminated by other rules. It actually seems to be the other way around: the reference can terminate some of the rules. Am I correct?

I tried modifying the code above to the variant below; all the tests pass and performance is still fast:

const _rules = [
  // First 2 params - rule name & source. Secondary array - list of rules,
  // which can be terminated by this one.
  ['table',      r_table,      ['paragraph']],
  ['code',       r_code],
  ['fence',      r_fence,      ['paragraph', 'blockquote', 'list']],
  ['blockquote', r_blockquote, ['paragraph', 'blockquote', 'list']],
  ['hr',         r_hr,         ['paragraph', 'blockquote', 'list']],
  ['list',       r_list,       ['paragraph', 'blockquote']],
  ['reference',  r_reference,  ['table', 'fence', 'blockquote', 'hr', 'list', 'html_block', 'heading']],
  ['html_block', r_html_block, ['paragraph', 'blockquote']],
  ['heading',    r_heading,    ['paragraph', 'blockquote']],
  ['lheading',   r_lheading],
  ['paragraph',  r_paragraph]
]
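To make the effect of the change concrete, here is a simplified model of the two tables. It is a sketch, not the actual ruler implementation: it assumes, as the table comment states, that the third entry lists the chains a rule can terminate, and that `getRules('reference')` returns the rules whose list contains `'reference'`. The `terminatorsFor` helper is hypothetical, and the rule functions are left out.

```javascript
// Each entry is [ruleName, chainsThisRuleCanTerminate].
const originalTable = [
  ['table',      ['paragraph', 'reference']],
  ['code',       []],
  ['fence',      ['paragraph', 'reference', 'blockquote', 'list']],
  ['blockquote', ['paragraph', 'reference', 'blockquote', 'list']],
  ['hr',         ['paragraph', 'reference', 'blockquote', 'list']],
  ['list',       ['paragraph', 'reference', 'blockquote']],
  ['reference',  []],
  ['html_block', ['paragraph', 'reference', 'blockquote']],
  ['heading',    ['paragraph', 'reference', 'blockquote']],
  ['lheading',   []],
  ['paragraph',  []]
]

const proposedTable = [
  ['table',      ['paragraph']],
  ['code',       []],
  ['fence',      ['paragraph', 'blockquote', 'list']],
  ['blockquote', ['paragraph', 'blockquote', 'list']],
  ['hr',         ['paragraph', 'blockquote', 'list']],
  ['list',       ['paragraph', 'blockquote']],
  ['reference',  ['table', 'fence', 'blockquote', 'hr', 'list', 'html_block', 'heading']],
  ['html_block', ['paragraph', 'blockquote']],
  ['heading',    ['paragraph', 'blockquote']],
  ['lheading',   []],
  ['paragraph',  []]
]

// Names of the rules that would be tried as terminators for `chain`.
function terminatorsFor(table, chain) {
  return table.filter(([, alt]) => alt.includes(chain)).map(([name]) => name)
}

console.log(terminatorsFor(originalTable, 'reference'))
// [ 'table', 'fence', 'blockquote', 'hr', 'list', 'html_block', 'heading' ]
console.log(terminatorsFor(proposedTable, 'reference'))
// []
```

Under this model, the original table runs up to seven terminator rules for every line scanned by the loop quoted earlier, while the proposed table runs none. The `'paragraph'` chain is the same in both tables.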

Could someone check if my understanding is correct? I would be happy to open a PR.
