
Incomplete MIME type support #369

@apyrgio

When Dangerzone first encounters a file, it needs to detect its MIME type so that it can choose the proper converter. The supported MIME types and their associated converters are the following:

# .pdf
"application/pdf": {"type": None},
# .docx
"application/vnd.openxmlformats-officedocument.wordprocessingml.document": {
"type": "libreoffice",
"libreoffice_output_filter": "writer_pdf_Export",
},
# .doc
"application/msword": {
"type": "libreoffice",
"libreoffice_output_filter": "writer_pdf_Export",
},
# .docm
"application/vnd.ms-word.document.macroEnabled.12": {
"type": "libreoffice",
"libreoffice_output_filter": "writer_pdf_Export",
},
# .xlsx
"application/vnd.openxmlformats-officedocument.spreadsheetml.sheet": {
"type": "libreoffice",
"libreoffice_output_filter": "calc_pdf_Export",
},
# .xls
"application/vnd.ms-excel": {
"type": "libreoffice",
"libreoffice_output_filter": "calc_pdf_Export",
},
# .pptx
"application/vnd.openxmlformats-officedocument.presentationml.presentation": {
"type": "libreoffice",
"libreoffice_output_filter": "impress_pdf_Export",
},
# .ppt
"application/vnd.ms-powerpoint": {
"type": "libreoffice",
"libreoffice_output_filter": "impress_pdf_Export",
},
# .odt
"application/vnd.oasis.opendocument.text": {
"type": "libreoffice",
"libreoffice_output_filter": "writer_pdf_Export",
},
# .odg
"application/vnd.oasis.opendocument.graphics": {
"type": "libreoffice",
"libreoffice_output_filter": "impress_pdf_Export",
},
# .odp
"application/vnd.oasis.opendocument.presentation": {
"type": "libreoffice",
"libreoffice_output_filter": "impress_pdf_Export",
},
# .ods
"application/vnd.oasis.opendocument.spreadsheet": {
"type": "libreoffice",
"libreoffice_output_filter": "calc_pdf_Export",
},
# .jpg
"image/jpeg": {"type": "convert"},
# .gif
"image/gif": {"type": "convert"},
# .png
"image/png": {"type": "convert"},
# .tif
"image/tiff": {"type": "convert"},
"image/x-tiff": {"type": "convert"},

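As a rough illustration of how such a table can drive converter selection, here is a minimal sketch. The names `CONVERSIONS` and `pick_converter` are hypothetical and only a few entries from the table are repeated; this is not necessarily how Dangerzone structures its lookup.

```python
# Abbreviated copy of the table above (assumption: the full table is a dict
# keyed by MIME type; the name CONVERSIONS is hypothetical).
CONVERSIONS = {
    "application/pdf": {"type": None},
    "application/vnd.oasis.opendocument.text": {
        "type": "libreoffice",
        "libreoffice_output_filter": "writer_pdf_Export",
    },
    "image/png": {"type": "convert"},
}


def pick_converter(mime_type: str) -> dict:
    """Return the converter entry for a MIME type, or raise if unsupported."""
    try:
        return CONVERSIONS[mime_type]
    except KeyError:
        # Any MIME type missing from the table is rejected outright.
        raise ValueError(f"Unsupported MIME type: {mime_type}") from None
```

With a lookup like this, any file detected as a type that is not a key in the table is rejected before conversion starts, regardless of its extension.
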
Using Dangerzone on a large set of files, we discovered two MIME types that are very common but not supported:

application/zip
application/octet-stream

For instance, this file currently fails on Dangerzone: https://github.com/freedomofpress/dangerzone-test-set/blob/4cbf14ac31ac986ced60e83867aac8a6d2d4a81b/all_documents/HTMLImage.odt. The following list, taken from a set of 200 documents, shows which file extensions these two MIME types were detected for:

02_doc_macros_signed_by_attacker_manipulated.odt: application/zip                                                      
02_doc_signed_by_attacker_manipulated2.odt: application/zip                                                            
02_doc_signed_by_attacker_manipulated.odt: application/zip 
02_doc_signed_by_attacker_manipulated_triple.odt: application/zip                                                      
02_doc_signed_by_trusted_person_manipulated.odt: application/zip                                                       
1_page.docx: application/octet-stream                      
82fff64a-0a21-4b09-bbdc-2914a5a150f0.odt: application/zip                                                              
BackgroundImageTest.odt: application/zip                                                                               
CUSTOM.odt: application/zip                                                                                                                                                                                                                    
CVE-2003-0820-1.doc: application/octet-stream                                                                          
CVE-2005-0941-1.doc: application/octet-stream                                                                          
CVE-2006-2389-1.doc: application/octet-stream                                                                          
CVE-2006-3059-1.xls: application/octet-stream                                                                          
CVE-2006-3086-1.xls: application/octet-stream                                                                          
CVE-2006-3493-1.doc: application/octet-stream                                                                          
CVE-2006-3655-1.ppt: application/octet-stream                                                                          
CVE-2006-3656-1.ppt: application/octet-stream                                                                          
CVE-2006-3660-1.ppt: application/octet-stream                                                                          
CVE-2006-5296-1.ppt: application/octet-stream                                                                          
CVE-2006-6561-1.doc: application/octet-stream                                                                                                                                                                                                  
CVE-2006-6628-1.doc: application/octet-stream                                                                          
CVE-2007-0031-1.xls: application/octet-stream                                                                          
CVE-2007-1347-1.doc: application/octet-stream                                                                          
CVE-2007-3490-1.xls: application/octet-stream                                                                          
CVE-2008-2752-1.doc: application/octet-stream                                                                                                                                                                                                  
CVE-2008-2752-2.doc: application/octet-stream                                                                          
CVE-2008-2752-3.doc: application/octet-stream                                                                          
CVE-2008-2752-4.doc: application/octet-stream                                                                          
CVE-2008-4841-1.doc: application/octet-stream                                                                          
CVE-2009-0200-1.doc: application/octet-stream                                                                          
CVE-2009-0201-1.doc: application/octet-stream              
CVE-2009-0259-1.doc: application/octet-stream                                                                          
CVE-2009-3129-1.xls: application/octet-stream                                                                          
CVE-2009-3301-1.doc: application/octet-stream              
CVE-2009-3302-1.doc: application/octet-stream              
CVE-2009-3302-2.doc: application/octet-stream              
CVE-2010-0033-1.ppt: application/octet-stream              
CVE-2010-1245-1.xls: application/octet-stream              
CVE-2010-1246-1.xls: application/octet-stream                                                                          
CVE-2010-1248-1.xls: application/octet-stream                                                                          
CVE-2010-3200-1.doc: application/octet-stream                                                                          
CVE-2011-0105-1.xls: application/octet-stream              
CVE-2011-0978-1.xls: application/octet-stream                                                                          
CVE-2012-4233-1.odt: application/octet-stream                                                                          
CVE-2012-4233-2.odg: application/octet-stream              
CVE-2014-6356-1.doc: application/octet-stream              
CVE-2014-6361.xls: application/octet-stream                
EDB-18952-1.doc: application/octet-stream                                                                                                                                                                                                      
HTMLImage.odt: application/zip  

From this list, it's evident that application/octet-stream can refer to many file types. In this sample, application/zip appears only for .odt files, but we can't be sure that this holds in general. Ideally then, if a file does not have a known MIME type and is instead detected as one of those two, we should also check the file extension.
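
The extension check proposed above could look roughly like the following sketch. The names `AMBIGUOUS_MIME_TYPES`, `EXTENSION_TO_MIME` and `resolve_mime_type` are hypothetical, and the extension mapping is an assumption derived from the comments in the table of supported types; only the extensions that appear in the list above are included.

```python
import os

# The two generic MIME types reported above (hypothetical constant name).
AMBIGUOUS_MIME_TYPES = {"application/zip", "application/octet-stream"}

# Assumed extension -> MIME type mapping, taken from the comments in the
# table of supported types. Extend it to cover the remaining extensions.
EXTENSION_TO_MIME = {
    ".doc": "application/msword",
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ".xls": "application/vnd.ms-excel",
    ".ppt": "application/vnd.ms-powerpoint",
    ".odt": "application/vnd.oasis.opendocument.text",
    ".odg": "application/vnd.oasis.opendocument.graphics",
}


def resolve_mime_type(filename: str, detected_mime: str) -> str:
    """Refine an ambiguous detected MIME type using the file extension."""
    if detected_mime not in AMBIGUOUS_MIME_TYPES:
        # A specific detection result is trusted as-is.
        return detected_mime
    extension = os.path.splitext(filename)[1].lower()
    # Unknown extensions keep the ambiguous type, so they remain unsupported.
    return EXTENSION_TO_MIME.get(extension, detected_mime)
```

For example, `resolve_mime_type("HTMLImage.odt", "application/zip")` would return the OpenDocument text type, while a file with an unrecognized extension keeps its generic type and is still rejected by the existing lookup.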
