Determination of archive format #68

@eggyal

I see that cached-path currently determines how to extract an archive according to its filename extension:

    if resource.ends_with(".tar.gz") {
        Ok(Self::TarGz)
    } else if resource.ends_with(".zip") {
        Ok(Self::Zip)
    } else {
        Err(Error::ExtractionError("unsupported archive format".into()))
    }

The problem that I have is that some archives do not use the expected extension format (in my case, gzipped tarballs are using .tgz rather than .tar.gz). While this could be addressed by expanding/customising the extension list used by cached-path, perhaps it's also an opportunity to consider some alternative approaches:

  • HTTP headers (namely Content-Type and Content-Encoding);
  • detection "magic" as per (or via) the file(1) utility (there are also the magic and bindet crates: the former is a wrapper around the libmagic C library and the latter is not widely used, but both are possibly useful here); or
  • a user-provided format specifier?

Personally I feel that HTTP headers would be best (if available: obviously not the case for local resources), perhaps falling-back to magic and/or file extensions if no other option is available.
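
To make the fallback order concrete, here is a rough sketch of what I have in mind. It is not cached-path's actual API: ArchiveFormat, detect_format and the three helper functions are hypothetical names, and the magic check is hand-rolled rather than going through libmagic. It also assumes that anything gzipped is a tarball, which neither the Content-Type nor the gzip magic bytes can actually confirm.

```rust
/// Hypothetical stand-in for the two formats in the snippet above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ArchiveFormat {
    TarGz,
    Zip,
}

/// Map a Content-Type header value to a format, if it is one we recognise.
fn from_content_type(value: &str) -> Option<ArchiveFormat> {
    // Drop parameters such as "; charset=..." and compare case-insensitively.
    let mime = value.split(';').next().unwrap_or("").trim().to_ascii_lowercase();
    match mime.as_str() {
        "application/zip" => Some(ArchiveFormat::Zip),
        // Assumption: a gzip payload is a gzipped tarball.
        "application/gzip" | "application/x-gzip" => Some(ArchiveFormat::TarGz),
        _ => None,
    }
}

/// Inspect the first bytes of the resource for well-known signatures.
fn from_magic(head: &[u8]) -> Option<ArchiveFormat> {
    if head.starts_with(&[0x1f, 0x8b]) {
        // gzip member header (same tarball assumption as above).
        Some(ArchiveFormat::TarGz)
    } else if head.starts_with(b"PK\x03\x04") {
        // zip local file header.
        Some(ArchiveFormat::Zip)
    } else {
        None
    }
}

/// Last resort: the filename extension, now including ".tgz".
fn from_extension(resource: &str) -> Option<ArchiveFormat> {
    if resource.ends_with(".tar.gz") || resource.ends_with(".tgz") {
        Some(ArchiveFormat::TarGz)
    } else if resource.ends_with(".zip") {
        Some(ArchiveFormat::Zip)
    } else {
        None
    }
}

/// Try the HTTP header first, then magic bytes, then the extension.
pub fn detect_format(
    content_type: Option<&str>,
    head: &[u8],
    resource: &str,
) -> Option<ArchiveFormat> {
    content_type
        .and_then(from_content_type)
        .or_else(|| from_magic(head))
        .or_else(|| from_extension(resource))
}
```

For a local resource the caller would pass None for the header, so detection would rest on the leading bytes and then the filename. A user-provided format specifier could simply bypass detect_format altogether.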

Happy to submit a PR with whatever approach you feel is most suitable for this library, even if that only means adding .tgz to the existing extension list.
