DOCX reader skips tables missing <w:tblGrid> #11380

@vvalchev

Explain the problem.
I have a Word document whose tables are not read completely.
After carefully analyzing the file, I found that the cause is a <w:tbl> element that has no <w:tblGrid> child.
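
To check whether a given file is affected, the following is a minimal sketch that counts tables lacking a <w:tblGrid> child. It is not part of pandoc; the function name `count_tables_without_grid` is hypothetical, and it assumes the main document part lives at the usual `word/document.xml` path inside the archive.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (the "w:" prefix in document.xml).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def count_tables_without_grid(docx_file):
    """Return (total_tables, tables_missing_tblGrid).

    docx_file may be a path or a binary file object.
    Assumes the main part is stored at word/document.xml.
    """
    with zipfile.ZipFile(docx_file) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    # iter() walks the whole tree, so nested tables are counted too.
    tables = list(root.iter(f"{{{W_NS}}}tbl"))
    # find() with a bare tag only looks at direct children of the table.
    missing = [t for t in tables if t.find(f"{{{W_NS}}}tblGrid") is None]
    return len(tables), len(missing)
```

Calling `count_tables_without_grid("tblGrid.docx")` on the attached file should report at least one table in the second position of the tuple if the diagnosis above is right.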

If I manually add the <w:tblGrid> tag, pandoc can read the document correctly.

I also tracked the regression down to pandoc 2.14; pandoc 2.13 still reads the table:

$ ~/Downloads/pandoc-2.14/bin/pandoc -t gfm -f docx ~/Downloads/tblGrid.docx
||
||
||
||

$ ~/Downloads/pandoc-2.13/bin/pandoc -t gfm -f docx ~/Downloads/tblGrid.docx
|                    |            |              |
|--------------------|------------|--------------|
| **Question**       | **Answer** | **Comments** |
| Company legal name |            |              |

Note 1:
Opening the file in OpenOffice, LibreOffice, or Apple Pages and saving it again allowed pandoc to read the document. I analyzed the differences between the two files and found that the cause was the missing <w:tblGrid> tag.

Note 2:
It is possible that the original file was machine-generated. I saw that document.xml contains this comment:

Modified by docx4j 11.5.2 (Apache licensed) using REFERENCE JAXB in Amazon.com Inc. Java 17.0.13 on Linux

That could explain why the original document is missing the tag. Either way, the tag only sets the column widths, so its absence shouldn't prevent pandoc from reading the table content.

Pandoc version?
Pandoc Version: 3.8.3
OS: macOS 26.1

Test document
tblGrid.docx
