Trivial docx file fails to be parsed with 'couldn't parse docx file' error #5277

@danielrbrowne

Pandoc fails to parse the attached trivial docx file, which was generated with Microsoft Word Online. Microsoft Word for Mac v16.21, Word Online (part of Office 365), and Pages for Mac all open the file without reporting any errors.

Given how trivial the file's contents are, I would expect Pandoc to parse this file without a problem.

trivial.docx

Pandoc version: 2.6 (also reproducible with 1.15.0.6, which is an old version that is used by a codebase I work on)

pandoc 2.6
Compiled with pandoc-types 1.17.5.4, texmath 0.11.2, skylighting 0.7.5
Default user data directory: /Users/danbrowne/.pandoc
Copyright (C) 2006-2019 John MacFarlane
Web:  http://pandoc.org
This is free software; see the source for copying conditions.
There is no warranty, not even for merchantability or fitness
for a particular purpose.

Command used with v2.6:

pandoc --extract-media=/Users/admin/Documents --from=docx --to=html --email-obfuscation=none --standalone +RTS -K128m -RTS --wrap=none ~/Documents/trivial.docx

Command-line output (v2.6):

couldn't parse docx file

Command used with v1.15.0.6:

pandoc --extract-media=/Users/admin/Documents --from=docx --to=html --email-obfuscation=none --standalone +RTS -K128m -RTS --no-wrap ~/Documents/trivial.docx

Command-line output (v1.15.0.6):

pandoc: couldn't parse docx file
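
Since a .docx file is a zip archive, one way to narrow this down is to check which part the package declares as its main document. The following is a minimal Python sketch, not part of pandoc: the function name `main_document_part` is hypothetical, and it assumes the transitional OOXML relationship type URI (a strict-mode file would use a different one).

```python
import sys
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by the package-level relationships part (_rels/.rels).
RELS_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"
# Relationship type that marks the main document part (transitional OOXML;
# assumed here, strict-mode packages use a different URI).
OFFICE_DOC = (
    "http://schemas.openxmlformats.org/officeDocument/2006/"
    "relationships/officeDocument"
)


def main_document_part(docx_path):
    """Return (target, names).

    target is the main document part named in _rels/.rels, or None if no
    such relationship is found; names lists every entry in the archive.
    """
    with zipfile.ZipFile(docx_path) as zf:
        names = zf.namelist()
        root = ET.fromstring(zf.read("_rels/.rels"))
    for rel in root.iter(RELS_NS + "Relationship"):
        if rel.get("Type") == OFFICE_DOC:
            # Targets may be written with a leading slash; normalise it away.
            return rel.get("Target").lstrip("/"), names
    return None, names


if __name__ == "__main__" and len(sys.argv) > 1:
    target, names = main_document_part(sys.argv[1])
    print("main document part:", target)
    print("word/document.xml present:", "word/document.xml" in names)
```

Running it as `python check_docx.py ~/Documents/trivial.docx` (the script name is arbitrary) prints the declared main part and whether the conventional `word/document.xml` entry exists. Whether a mismatch between the two explains the error above is an assumption, not something confirmed here.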
