
Binary pixel stream is detected as text #10

@pombredanne


This was originally reported in aboutcode-org/scancode-toolkit#50.
In the archive https://rubygems.org/downloads/chunky_png-1.2.8.gem, the file:
chunky_png-1.2.8.gem-extract/data.tar.gz-extract/spec/resources/pixelstream.rgb
is detected as text, even though it is clearly a binary data stream.

libmagic (as used by the file command) correctly detects it as application/octet-stream, described as "data". You can check this, for instance, with this libmagic ctypes binding: https://github.com/nexB/scancode-toolkit/blob/master/src/typecode/magic2.py

The root cause is that bytes above 127 are treated as text, and the first 1024 bytes of this test file are all 0xFF, so the entire sampled block looks like text. The original Perl binary detection counted only bytes below 127 as text characters and applied a few extra checks.
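To illustrate the difference, here is a minimal Python sketch of both approaches applied to a 1024-byte block of 0xFF. The function names, the set of accepted control characters, and the 30% "odd bytes" threshold are assumptions loosely modeled on Perl's -T/-B file tests. This is not the code of this project or of Perl itself.

```python
# Hypothetical sketch: compare a "high bytes are text" check with a
# Perl-style check where only bytes below 127 can count as text.
# BLOCK_SIZE, TEXT_BYTES, ODD_THRESHOLD and the function names are
# illustrative assumptions, not an existing API.

BLOCK_SIZE = 1024     # only the first block of the file is inspected
ODD_THRESHOLD = 0.30  # assumed ratio of "odd" bytes that marks a file as binary

# Bytes accepted as text: printable ASCII plus common whitespace/control chars.
TEXT_BYTES = frozenset(range(0x20, 0x7F)) | frozenset(b"\t\n\r\f\b\x1b")


def is_binary_naive(block: bytes) -> bool:
    """Flawed check: every byte >= 0x80 is treated as text, so only NULs
    and low control characters can make a block look binary."""
    if b"\x00" in block:
        return True
    odd = sum(1 for b in block if b < 0x80 and b not in TEXT_BYTES)
    return odd > len(block) * ODD_THRESHOLD


def is_binary_perl_style(block: bytes) -> bool:
    """Perl-style check: any byte outside TEXT_BYTES (including all bytes
    >= 0x80) counts as odd."""
    if not block:
        return False  # treat an empty file as text
    if b"\x00" in block:
        return True
    odd = sum(1 for b in block if b not in TEXT_BYTES)
    return odd > len(block) * ODD_THRESHOLD


def sniff_file(path: str) -> bool:
    """Read only the first block of a file and classify it."""
    with open(path, "rb") as f:
        return is_binary_perl_style(f.read(BLOCK_SIZE))


if __name__ == "__main__":
    pixels = b"\xff" * BLOCK_SIZE  # like the first 1024 bytes of pixelstream.rgb
    print(is_binary_naive(pixels))       # False: misdetected as text
    print(is_binary_perl_style(pixels))  # True: detected as binary
```

This simplified version does not try to recognize UTF-8, so non-ASCII text made mostly of multibyte characters would be flagged as binary. A real fix would probably need to handle that case separately rather than treating all high bytes as text.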
