There is an http.DetectContentType([]byte) function in the net/http package, but it supports only a limited number of types. How can I add support for docx, doc, xls, xlsx, ppt, pps, odt, ods and odp files, detected not by extension but by content?
As far as I know, this is problematic, because docx/xlsx/pptx/odp/odt files have the same signature as a zip file (50 4B 03 04).
- golang.org/pkg/mime – Salvador Dali, Apr 24, 2015 at 3:27
- @SalvadorDali The mime package is useful, but the question specifically asks about detection based on content, not extension. – captncraig, Apr 24, 2015 at 4:09
Disclaimer: I'm the author of mimetype.
For anyone having the same problem 3 years later, these are the packages currently available for content-based MIME type detection:
- filetype
  - pure Go, no C bindings
  - can be extended to detect new MIME types
  - has issues with files that pass as more than one MIME type (e.g. xlsx and docx passing as zip), because it stores matching functions in a map and therefore does not guarantee the order of traversal
  - limited number of detected MIME types
- a cgo binding to libmagic
  - needs libmagic-dev installed
  - of the 3, it has the highest number of detected MIME types
  - can be extended, albeit with more effort (see man magic)
  - libmagic is not thread safe
- mimetype
  - pure Go, no C bindings
  - higher number of detected MIME types than filetype
  - is thread safe
  - can be extended
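The ordering problem can be shown with a minimal sketch. Go deliberately randomizes map iteration order, so if a generic zip matcher and a more specific docx matcher were stored in a map, either one could be tried first. Keeping the matchers in a slice, most specific first, makes the result deterministic. The matcher type and functions are hypothetical. The "zip that mentions word/document.xml" test is a crude heuristic for illustration, not how any of the packages above actually detect docx.

```go
package main

import (
	"bytes"
	"fmt"
)

// matcher pairs a MIME type with a content test (hypothetical design).
type matcher struct {
	mime  string
	match func([]byte) bool
}

var zipMagic = []byte("PK\x03\x04")

func isZip(b []byte) bool { return bytes.HasPrefix(b, zipMagic) }

// isDocx is a crude heuristic: zip local file headers store entry names
// uncompressed, so the main part's name is visible in the raw bytes.
func isDocx(b []byte) bool {
	return isZip(b) && bytes.Contains(b, []byte("word/document.xml"))
}

// Every docx also passes isZip, so the specific test must run first.
// A slice guarantees that order; ranging over a map would not.
var matchers = []matcher{
	{"application/vnd.openxmlformats-officedocument.wordprocessingml.document", isDocx},
	{"application/zip", isZip},
}

func detect(b []byte) string {
	for _, m := range matchers {
		if m.match(b) {
			return m.mime
		}
	}
	return "application/octet-stream"
}

func main() {
	fmt.Println(detect([]byte("PK\x03\x04...word/document.xml..."))) // docx type
	fmt.Println(detect([]byte("PK\x03\x04...other.txt...")))         // application/zip
	fmt.Println(detect([]byte("hello")))                             // application/octet-stream
}
```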
Files ending in x (docx, xlsx, pptx) are relatively easy to detect. Just unzip the file and read _rels/.rels. It contains the path to the main part of the document, identified by the relationship type http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument. Then check the target's file name: it's document.xml for docx, workbook.xml for xlsx and presentation.xml for pptx.
More info can be found in ECMA-376.
Binary formats are harder to detect. Basically, you need to read the MS-CFB (Compound File Binary) file system and check for these entries:
- WordDocument for doc
- Workbook or Book for xls
- PowerPoint Document for ppt
- EncryptedPackage means the file is encrypted
There's currently no way to extend http.DetectContentType as it uses a fixed, unexported slice of "sniffers": https://golang.org/src/net/http/sniff.go (sniffSignatures on line 49 at the time of writing).
Also, I looked quickly through godoc.org in search of a better package but didn't find any that is extensible and content-oriented as you require.
My advice would be: build your own package, guided by Go's content sniffer implementation (which follows https://mimesniff.spec.whatwg.org/).
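Since http.DetectContentType cannot be extended, one way to follow this advice is to wrap it: run your own sniffers first, in a fixed order, and fall back to the standard library. The sketch below shows one assumed way such a package could look. Sniffer, Register, Sniff, sniffODF and buildODF are all hypothetical names. As an example, it adds ODF detection (odt, ods, odp). This relies on the OpenDocument packaging rule that the first zip entry is an uncompressed file named mimetype containing the document's MIME type. That rule comes from the ODF specification, not from the answers here.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
	"net/http"
)

// Sniffer reports a MIME type for data, or ok == false if it does not apply.
type Sniffer func(data []byte) (mime string, ok bool)

// custom holds registered sniffers, tried in registration order.
var custom []Sniffer

// Register adds a sniffer that runs before the standard library.
func Register(s Sniffer) { custom = append(custom, s) }

// Sniff tries the custom sniffers first and falls back to
// http.DetectContentType when none of them matches.
func Sniff(data []byte) string {
	for _, s := range custom {
		if m, ok := s(data); ok {
			return m
		}
	}
	return http.DetectContentType(data)
}

// sniffODF recognises OpenDocument packages: the first zip entry must be
// a file named "mimetype" whose content is the document's MIME type.
func sniffODF(data []byte) (string, bool) {
	if !bytes.HasPrefix(data, []byte("PK\x03\x04")) {
		return "", false
	}
	zr, err := zip.NewReader(bytes.NewReader(data), int64(len(data)))
	if err != nil || len(zr.File) == 0 || zr.File[0].Name != "mimetype" {
		return "", false
	}
	rc, err := zr.File[0].Open()
	if err != nil {
		return "", false
	}
	defer rc.Close()
	m, err := io.ReadAll(io.LimitReader(rc, 128))
	if err != nil || !bytes.HasPrefix(m, []byte("application/vnd.oasis.opendocument.")) {
		return "", false
	}
	return string(m), true
}

// buildODF is a hypothetical helper that writes only the "mimetype" entry.
func buildODF(mime string) []byte {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	w, err := zw.CreateHeader(&zip.FileHeader{Name: "mimetype", Method: zip.Store})
	if err != nil {
		panic(err)
	}
	if _, err := io.WriteString(w, mime); err != nil {
		panic(err)
	}
	if err := zw.Close(); err != nil {
		panic(err)
	}
	return buf.Bytes()
}

func main() {
	Register(sniffODF)
	fmt.Println(Sniff(buildODF("application/vnd.oasis.opendocument.text")))
	fmt.Println(Sniff([]byte("plain text"))) // text/plain; charset=utf-8
}
```

Because registered sniffers run in registration order, more specific checks should be registered before generic ones. Like the zip-based checks, sniffODF needs the whole file rather than just the first 512 bytes.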
Edit: If you're willing to use cgo and you're on *nix, you could use libmagic bindings, for example https://github.com/jteeuwen/magic.