Description
Explain the problem.
I have a Word document whose tables are not read completely.
After carefully analyzing the file, I found that the cause is a <w:tbl> element with no <w:tblGrid> child.
If I manually add the <w:tblGrid> element, pandoc reads the document correctly.
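The manual fix can be scripted as a workaround. The following is only a sketch under several assumptions: the names add_missing_tblgrid and patch_docx are hypothetical, the column count is guessed from the widest row (honoring <w:gridSpan>), and every column gets an arbitrary placeholder width of 2000 twips. ElementTree drops comments and unused namespace declarations when re-serializing, so the output is intended as input for pandoc, not as a file that Word is guaranteed to accept.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def q(tag):
    """Return the namespace-qualified name of a WordprocessingML tag."""
    return f"{{{W}}}{tag}"


def row_width(tr):
    """Count the grid columns covered by one row, honoring w:gridSpan."""
    total = 0
    for tc in tr.findall(q("tc")):
        span = tc.find(f"{q('tcPr')}/{q('gridSpan')}")
        total += int(span.get(q("val"), "1")) if span is not None else 1
    return total


def add_missing_tblgrid(root, col_width="2000"):
    """Add a w:tblGrid to every w:tbl that lacks one; return how many were patched.

    col_width is an arbitrary placeholder (assumption), not a value taken
    from the document.
    """
    patched = 0
    for tbl in list(root.iter(q("tbl"))):
        if tbl.find(q("tblGrid")) is not None:
            continue
        cols = max((row_width(tr) for tr in tbl.findall(q("tr"))), default=0)
        if cols == 0:
            continue
        grid = ET.Element(q("tblGrid"))
        for _ in range(cols):
            ET.SubElement(grid, q("gridCol"), {q("w"): col_width})
        # Place the grid right after w:tblPr when present, else first.
        children = list(tbl)
        idx = next((i + 1 for i, c in enumerate(children) if c.tag == q("tblPr")), 0)
        tbl.insert(idx, grid)
        patched += 1
    return patched


def patch_docx(src, dst):
    """Copy src to dst, patching word/document.xml on the way."""
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == "word/document.xml":
                # Keep the original prefixes (w, r, ...) instead of ns0, ns1, ...
                for _, (prefix, uri) in ET.iterparse(io.BytesIO(data), events=("start-ns",)):
                    ET.register_namespace(prefix, uri)
                root = ET.fromstring(data)
                add_missing_tblgrid(root)
                data = ET.tostring(root, encoding="utf-8", xml_declaration=True)
            zout.writestr(item, data)
```

Calling patch_docx("tblGrid.docx", "tblGrid-fixed.docx") and passing the second file to pandoc would be the intended use; the output file name is just an example.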
I also tracked the regression down to pandoc 2.14; version 2.13 still reads the tables:
$ ~/Downloads/pandoc-2.14/bin/pandoc -t gfm -f docx ~/Downloads/tblGrid.docx
||
||
||
||
$ ~/Downloads/pandoc-2.13/bin/pandoc -t gfm -f docx ~/Downloads/tblGrid.docx
| | | |
|--------------------|------------|--------------|
| **Question** | **Answer** | **Comments** |
| Company legal name | | |
Note 1:
Opening the file in OpenOffice, LibreOffice, or Apple Pages and saving it again allowed pandoc to read the document. I compared the two versions and found that the difference was the missing <w:tblGrid> element.
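To check whether another document is affected in the same way, something like the following could be used. It is a minimal sketch: the function names are hypothetical, and it assumes the main body lives in word/document.xml, as it does in the test file.

```python
import xml.etree.ElementTree as ET
import zipfile

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def count_tables_without_grid(document_xml):
    """Return (tables lacking w:tblGrid, total tables) for a document.xml payload."""
    root = ET.fromstring(document_xml)
    tables = list(root.iter(f"{{{W_NS}}}tbl"))
    missing = [t for t in tables if t.find(f"{{{W_NS}}}tblGrid") is None]
    return len(missing), len(tables)


def check_docx(path):
    """Run the check on the main document part of a .docx file."""
    with zipfile.ZipFile(path) as z:
        return count_tables_without_grid(z.read("word/document.xml"))
```

A non-zero first value from check_docx("tblGrid.docx") would indicate at least one table with no grid definition.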
Note 2:
The original file may have been machine-generated. Its document.xml contains this comment:
Modified by docx4j 11.5.2 (Apache licensed) using REFERENCE JAXB in Amazon.com Inc. Java 17.0.13 on Linux
That would explain why the original document lacks the tag. Either way, the tag only sets the column widths, so its absence shouldn't prevent pandoc from reading the content.
Pandoc version?
Pandoc Version: 3.8.3
OS: macOS 26.1
Test document
tblGrid.docx