When Dangerzone first encounters a file, it needs to detect the file's MIME type so that it can choose the proper converter. The supported MIME types and their associated converters are the following:

    # .pdf
    "application/pdf": {"type": None},
    # .docx
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document": {
        "type": "libreoffice",
        "libreoffice_output_filter": "writer_pdf_Export",
    },
    # .doc
    "application/msword": {
        "type": "libreoffice",
        "libreoffice_output_filter": "writer_pdf_Export",
    },
    # .docm
    "application/vnd.ms-word.document.macroEnabled.12": {
        "type": "libreoffice",
        "libreoffice_output_filter": "writer_pdf_Export",
    },
    # .xlsx
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet": {
        "type": "libreoffice",
        "libreoffice_output_filter": "calc_pdf_Export",
    },
    # .xls
    "application/vnd.ms-excel": {
        "type": "libreoffice",
        "libreoffice_output_filter": "calc_pdf_Export",
    },
    # .pptx
    "application/vnd.openxmlformats-officedocument.presentationml.presentation": {
        "type": "libreoffice",
        "libreoffice_output_filter": "impress_pdf_Export",
    },
    # .ppt
    "application/vnd.ms-powerpoint": {
        "type": "libreoffice",
        "libreoffice_output_filter": "impress_pdf_Export",
    },
    # .odt
    "application/vnd.oasis.opendocument.text": {
        "type": "libreoffice",
        "libreoffice_output_filter": "writer_pdf_Export",
    },
    # .odg
    "application/vnd.oasis.opendocument.graphics": {
        "type": "libreoffice",
        "libreoffice_output_filter": "impress_pdf_Export",
    },
    # .odp
    "application/vnd.oasis.opendocument.presentation": {
        "type": "libreoffice",
        "libreoffice_output_filter": "impress_pdf_Export",
    },
    # .ods
    "application/vnd.oasis.opendocument.spreadsheet": {
        "type": "libreoffice",
        "libreoffice_output_filter": "calc_pdf_Export",
    },
    # .jpg
    "image/jpeg": {"type": "convert"},
    # .gif
    "image/gif": {"type": "convert"},
    # .png
    "image/png": {"type": "convert"},
    # .tif
    "image/tiff": {"type": "convert"},
    "image/x-tiff": {"type": "convert"},

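
To make the lookup concrete, here is a minimal sketch of how a table like the one above can drive converter selection. The names `CONVERSIONS` and `pick_conversion` and the error message are assumptions made for illustration, not Dangerzone's actual code, and the table is cut down to three entries.

```python
# Hypothetical excerpt of the table above; the names are illustrative only.
CONVERSIONS = {
    "application/pdf": {"type": None},
    "application/vnd.oasis.opendocument.text": {
        "type": "libreoffice",
        "libreoffice_output_filter": "writer_pdf_Export",
    },
    "image/png": {"type": "convert"},
}


def pick_conversion(mime_type: str) -> dict:
    """Return the conversion entry for a detected MIME type.

    Raises ValueError when the MIME type is not in the table.
    """
    try:
        return CONVERSIONS[mime_type]
    except KeyError:
        raise ValueError(f"Unsupported MIME type: {mime_type}") from None
```

With a lookup of this shape, a file detected as `application/zip` or `application/octet-stream` is rejected no matter what its extension says, because neither type is a key in the table.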
While running Dangerzone on a large set of files, we discovered two MIME types that are very common but are not supported:
application/zip
application/octet-stream
For instance, this file currently fails in Dangerzone: https://github.com/freedomofpress/dangerzone-test-set/blob/4cbf14ac31ac986ced60e83867aac8a6d2d4a81b/all_documents/HTMLImage.odt. The following listing, taken from a set of 200 documents, shows which files were detected as one of these two MIME types:
02_doc_macros_signed_by_attacker_manipulated.odt: application/zip
02_doc_signed_by_attacker_manipulated2.odt: application/zip
02_doc_signed_by_attacker_manipulated.odt: application/zip
02_doc_signed_by_attacker_manipulated_triple.odt: application/zip
02_doc_signed_by_trusted_person_manipulated.odt: application/zip
1_page.docx: application/octet-stream
82fff64a-0a21-4b09-bbdc-2914a5a150f0.odt: application/zip
BackgroundImageTest.odt: application/zip
CUSTOM.odt: application/zip
CVE-2003-0820-1.doc: application/octet-stream
CVE-2005-0941-1.doc: application/octet-stream
CVE-2006-2389-1.doc: application/octet-stream
CVE-2006-3059-1.xls: application/octet-stream
CVE-2006-3086-1.xls: application/octet-stream
CVE-2006-3493-1.doc: application/octet-stream
CVE-2006-3655-1.ppt: application/octet-stream
CVE-2006-3656-1.ppt: application/octet-stream
CVE-2006-3660-1.ppt: application/octet-stream
CVE-2006-5296-1.ppt: application/octet-stream
CVE-2006-6561-1.doc: application/octet-stream
CVE-2006-6628-1.doc: application/octet-stream
CVE-2007-0031-1.xls: application/octet-stream
CVE-2007-1347-1.doc: application/octet-stream
CVE-2007-3490-1.xls: application/octet-stream
CVE-2008-2752-1.doc: application/octet-stream
CVE-2008-2752-2.doc: application/octet-stream
CVE-2008-2752-3.doc: application/octet-stream
CVE-2008-2752-4.doc: application/octet-stream
CVE-2008-4841-1.doc: application/octet-stream
CVE-2009-0200-1.doc: application/octet-stream
CVE-2009-0201-1.doc: application/octet-stream
CVE-2009-0259-1.doc: application/octet-stream
CVE-2009-3129-1.xls: application/octet-stream
CVE-2009-3301-1.doc: application/octet-stream
CVE-2009-3302-1.doc: application/octet-stream
CVE-2009-3302-2.doc: application/octet-stream
CVE-2010-0033-1.ppt: application/octet-stream
CVE-2010-1245-1.xls: application/octet-stream
CVE-2010-1246-1.xls: application/octet-stream
CVE-2010-1248-1.xls: application/octet-stream
CVE-2010-3200-1.doc: application/octet-stream
CVE-2011-0105-1.xls: application/octet-stream
CVE-2011-0978-1.xls: application/octet-stream
CVE-2012-4233-1.odt: application/octet-stream
CVE-2012-4233-2.odg: application/octet-stream
CVE-2014-6356-1.doc: application/octet-stream
CVE-2014-6361.xls: application/octet-stream
EDB-18952-1.doc: application/octet-stream
HTMLImage.odt: application/zip
From this list, it's evident that application/octet-stream can refer to many file types (.doc, .docx, .xls, .ppt, .odt and .odg in this sample). application/zip appears only for .odt files here, but we can't be sure that this holds in general. Ideally, then, if a file's detected MIME type is not a supported one and is instead one of these two, we should also check the file extension.
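
The extension check proposed above could look roughly like the following. This is a sketch only: `AMBIGUOUS_MIME_TYPES`, `EXTENSION_TO_MIME` and `resolve_mime_type` are hypothetical names, and the extension map covers only the extensions that appear in the listing above.

```python
from pathlib import Path

# The two generic MIME types that the detection step reported for supported formats.
AMBIGUOUS_MIME_TYPES = {"application/zip", "application/octet-stream"}

# Hypothetical extension-to-MIME-type map, derived from the supported list.
EXTENSION_TO_MIME = {
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ".doc": "application/msword",
    ".xls": "application/vnd.ms-excel",
    ".ppt": "application/vnd.ms-powerpoint",
    ".odt": "application/vnd.oasis.opendocument.text",
    ".odg": "application/vnd.oasis.opendocument.graphics",
}


def resolve_mime_type(detected: str, filename: str) -> str:
    """Fall back to the file extension only when the detected type is ambiguous."""
    if detected not in AMBIGUOUS_MIME_TYPES:
        # A specific detected type always wins over the extension.
        return detected
    suffix = Path(filename).suffix.lower()
    # Unknown extensions keep the detected type, so they stay unsupported.
    return EXTENSION_TO_MIME.get(suffix, detected)
```

For example, `resolve_mime_type("application/zip", "HTMLImage.odt")` yields the OpenDocument text type, while a file with an unrecognized extension keeps its generic type and would still be rejected as unsupported.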
The list of supported MIME types above is taken from dangerzone/container/dangerzone.py, lines 138 to 203 at commit a33dcfb.