Implement parsing for iTunes metadata atoms#182
Conversation
also write a helper for bool
| /// is parsed. | ||
| #[derive(Debug, Default, Clone)] | ||
| pub struct UserdataBox { | ||
| pub meta: Option<MetadataBox> |
There was a problem hiding this comment.
While calling the udta box userdata in the MediaContext is probably a reasonable decision, I'm not sure if the field for the nested meta box should be named metadata, since "metadata" can be ambiguous between track metadata, and metadata tags relating to the media the file or stream contains. Since the name of the box itself is meta, and it's a relatively self-explanatory FourCC, keeping it as meta makes sense, but naming it metadata would also be more consistent.
If you have an opinion either way, would be happy to change it.
|
@kinetiknz could I get a review please? |
kinetiknz
left a comment
There was a problem hiding this comment.
Thanks for the PR! This generally looks good; a few nits and one important issue/query to resolve.
mp4parse/src/lib.rs
Outdated
| Ok(()) | ||
| } | ||
|
|
||
| fn read_bool_data<T: Read>(src: &mut BMFFBox<T>) -> Result<Option<bool>> { |
There was a problem hiding this comment.
Rename these new read_* additions to read_ilst_* or move them inside read_ilst to make it clear they're ILST-specific.
mp4parse/src/lib.rs
Outdated
| fn read_data<T: Read>(src: &mut BMFFBox<T>) -> Result<Vec<u8>> { | ||
| // Skip past the padding bytes | ||
| skip(&mut src.content, src.head.offset as usize)?; | ||
| let size = src.content.limit() as usize; |
There was a problem hiding this comment.
Remove extra space after =.
mp4parse/src/lib.rs
Outdated
| ) | ||
| } | ||
|
|
||
|
|
There was a problem hiding this comment.
Drop double newlines here and below.
mp4parse/src/lib.rs
Outdated
| } | ||
|
|
||
|
|
||
| fn read_multiple_u8_data<T: Read>(src: &mut BMFFBox<T>) -> Result<Vec<Vec<u8>>> { |
There was a problem hiding this comment.
It'd be nice if we could make most of read_multiple_u8_data and read_u8_data common.
mp4parse/src/lib.rs
Outdated
| vec_push(&mut context.psshs, pssh)?; | ||
| } | ||
| BoxType::UserdataBox => { | ||
| let udta = read_udta(&mut b)?; |
There was a problem hiding this comment.
My primary concern with these additions is that the main user of mp4parse-rust (Gecko) is exposed to new forms of parse failure if invalid udta (or children) are encountered. We don't want to reject MP4s we previously accepted due to content Gecko isn't interested in.
I think a simple solution to this is to make MediaContext::userdata a Result<Option<Userdata>> and avoid bubbling an error result up from here instead. That does make the MediaContext API slightly odd, though.
There was a problem hiding this comment.
Instead of Result<Option<UserdataBox>>, I elected to make it Option<Result<UserdataBox>> instead, since Result<Option<_>> doesn't implement Default.
I considered just ignoring the error and returning None if udta parsing fails, so as to not introduce any weirdness into the MediaContext API, but that seems to me that it would conflate the lack of an udta atom with a corrupt udta atom.
Also make use of read_ilst_multiple_u8_data in read_ilst_u8_data
47f8d95 to
275b1e0
Compare
|
Thanks, looks good! |
With PR mozilla#182, mp4parse will now attempt to read metadata, including potential large items like cover art. With read_buf's previous limit of 1MB, this would be fairly trivial to hit with a large cover art image. This addresses a parsing failure reported in BMO 1609573. In the event that 10MB is still too restrictive, it probably makes sense to audit and split read_buf uses into small and large size classes.
With PR #182, mp4parse will now attempt to read metadata, including potential large items like cover art. With read_buf's previous limit of 1MB, this would be fairly trivial to hit with a large cover art image. This addresses a parsing failure reported in BMO 1609573. In the event that 10MB is still too restrictive, it probably makes sense to audit and split read_buf uses into small and large size classes.
Fixes #150.
Metadata in .MP4 files can be stored in either iTunes-style, or 3GPP-style formats. The most commonly found today is the iTunes-style format, which this Pull Request implements parsing for. This is useful for mp4parse to serve as a Rust library that can read these tags off MP4 audio and video files, and can be used as a foundation for a pure-Rust TagLib-like library, along with other metadata-parsing libraries available on Cargo.
iTunes metadata is stored within a Metadata
metaatom within a root-level User dataudtaatom. While thismetaatom may contain other atoms such askeys, we are mostly concerned with theilstList atom, which holds a list of various named atoms that represent metadata.Each metadata box or entry within
ilsthas a unique key and is parsed as a full box, within which adataatom that contains the actual data, in either a string or binaryu8format, is contained. Each metadata entry may contain only onedataatom, with the exception of the cover art albumcovr, which may have multipledataatoms that each contain JPEG or PNG data. With the exception of©lyrandldes, string metadata tags are limited to 256 characters in length, while binary metadata often have idiosyncratic formats that must be special-cased.Parsing for all if not most common tags have been implemented, with the exception of the iTunes specific iTMS
xxIDentries, due to lack of documentation on these tags.The
udtaandmetaboxes are not exposed in the C API, since Firefox does not have a current need to use these atoms. They are currently only exposed in the Rust API.This public domain picture of a black hole is embedded within
metadata.mp4andmetadata_gnre.mp4for testing thecovratom. The test files themselves are just copies ofminimal.mp4with a bunch of tags added to them.A huge thanks to the AtomicParsley project for documentation on the format of individual tag boxes.