Need to escape special characters in HTML when reading alt text for PDF documents #18133

@NSoiffer

Summary of Issue

In #17276, I added support for alt text on a formula (a fallback when MathML is not detected) by wrapping the text in MathML's mtext element. That text might contain < and & characters, which should have been escaped but were not. Someone recently reported the bug to me.

The fix is trivial: < -> &lt; and & -> &amp;.

Note: if the alt text itself is HTML, you would still want these replacements so that the text recovered after parsing is the original HTML. E.g., <p>a</p> -> &lt;p>a&lt;/p> -> (after parsing the enclosing MathML) <p>a</p>.
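A minimal sketch of the escaping described above, written in Python. The helper names `escape_alt_text` and `wrap_alt_text` are hypothetical and are not taken from the NVDA or MathCAT source; the standard-library XML parser stands in for whatever parser consumes the MathML. It illustrates that the LaTeX alt text from this report, and the HTML case in the note, both survive a round trip once & and < are escaped.

```python
import xml.etree.ElementTree as ET


def escape_alt_text(alt_text: str) -> str:
    """Escape the characters that would break XML parsing inside <mtext>."""
    # '&' must be replaced first; otherwise the '&' in '&lt;' would be escaped again.
    return alt_text.replace("&", "&amp;").replace("<", "&lt;")


def wrap_alt_text(alt_text: str) -> str:
    """Hypothetical helper: wrap escaped alt text in a minimal MathML fallback."""
    return f"<math><mtext>{escape_alt_text(alt_text)}</mtext></math>"


# The LaTeX alt text from this report (raw string, so backslashes are literal).
latex_alt = r"\begin {align} a&=b\\ 11&=22 \end {align}"
mathml = wrap_alt_text(latex_alt)

# Parsing the escaped MathML recovers the original alt text unchanged.
assert ET.fromstring(mathml).find("mtext").text == latex_alt

# The HTML case from the note: escaped on the way in, restored after parsing.
html_alt = "<p>a</p>"
assert escape_alt_text(html_alt) == "&lt;p>a&lt;/p>"
assert ET.fromstring(wrap_alt_text(html_alt)).find("mtext").text == html_alt
```

The order of the two replacements matters: escaping < before & would turn the freshly inserted &lt; into &amp;lt;.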

Steps to reproduce:

  1. Open alt.pdf in Adobe Reader
  2. Open the speech panel
  3. Arrow to the first equation.

Actual behavior:

After arrowing, NVDA reads the equation but logs the following error:

OSError: Invalid MathML input:
<math><mtext>LaTeX formula starts \begin {align} a&=b\\ 11&=22 \end {align} LaTeX formula ends </mtext></math>

Expected behavior:

NVDA should speak the alt text without generating an error.

A bigger, different question is whether MathCAT should force punctuation characters to be spoken. In the example, the alt text is LaTeX, which includes \, &, and braces. These are not spoken with the default NVDA punctuation settings, so the LaTeX isn't really intelligible. That's not good. On the other hand, alt text in other situations (such as on an image) follows the punctuation settings in NVDA, and I'm inclined to believe it should do the same here unless some code is written that can detect the start and end of LaTeX in alt text. In this case, there is text around the LaTeX, so any detection code would need to find the start and end, and can't just assume that if LaTeX is present, it is the entire alt text.
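To make the detection idea concrete, here is a rough Python sketch that splits alt text into LaTeX and non-LaTeX segments. It rests on a strong assumption: that LaTeX appears only as a \begin{env} ... \end{env} block (optionally with whitespace before the brace, as in the example above). Inline math such as $...$ and nested environments of the same name are not handled, and the function name `split_latex` is hypothetical. It is not a proposal for how MathCAT or NVDA should do this, only an illustration that the surrounding text can be kept separate.

```python
import re

# Assumption: LaTeX in alt text is delimited by \begin{env} ... \end{env},
# where the same environment name closes the block.
LATEX_ENV = re.compile(r"\\begin\s*\{([A-Za-z]+\*?)\}.*?\\end\s*\{\1\}", re.DOTALL)


def split_latex(alt_text: str) -> list[tuple[bool, str]]:
    """Split alt text into (is_latex, segment) pairs, preserving order."""
    segments = []
    pos = 0
    for match in LATEX_ENV.finditer(alt_text):
        if match.start() > pos:
            # Plain text that precedes this LaTeX block.
            segments.append((False, alt_text[pos:match.start()]))
        segments.append((True, match.group(0)))
        pos = match.end()
    if pos < len(alt_text):
        # Plain text after the last LaTeX block (or the whole string if none).
        segments.append((False, alt_text[pos:]))
    return segments


alt = r"LaTeX formula starts \begin {align} a&=b\\ 11&=22 \end {align} LaTeX formula ends "
for is_latex, segment in split_latex(alt):
    print("LaTeX" if is_latex else "text ", repr(segment))
```

With segments like these, only the LaTeX portions would need forced punctuation, while the surrounding text could keep following the user's NVDA punctuation settings.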

NVDA logs, crash dumps and other attachments:

None

System configuration

NVDA installed/portable/running from source:

Installed, but should affect all versions

NVDA version:

2025.1 beta 5

Windows version:

Windows 10

Name and version of other software in use when reproducing the issue:

  • MathCAT addon
  • Adobe Reader

Other information about your system:

Other questions

Does the issue still occur after restarting your computer?

Yes

Have you tried any other versions of NVDA? If so, please report their behaviors.

N/A (PDF support for math added to beta)

If NVDA add-ons are disabled, is your problem still occurring?

N/A

Does the issue still occur after you run the COM Registration Fixing Tool in NVDA's tools menu?

N/A
