Improve PDF / AI (Adobe Illustrator) recognition #396
What I was going to do (but didn't get to, due to school and time constraints) was peek at roughly 5 MB of data at a time, check whether the `AIPrivateData` sequence is found, and if so, return. Of course, this means we could end up peeking through the entire file, which is not optimal either, but thanks to Adobe being Adobe, we have to do it.
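A minimal TypeScript sketch of that chunked scan. The chunk source is an assumption (any iterable of `Uint8Array` chunks, e.g. ~5 MB reads), and `containsMarker` / `indexOfBytes` are hypothetical helpers, not an existing API. It carries the last `marker.length - 1` bytes over into the next chunk so a marker split across a chunk boundary is still found, and it stops at the first hit.

```typescript
// Hypothetical helpers for illustration only.
const AI_MARKER = new TextEncoder().encode("AIPrivateData");

// Naive byte search: index of `needle` in `haystack` at or after `from`, or -1.
function indexOfBytes(haystack: Uint8Array, needle: Uint8Array, from = 0): number {
  outer: for (let i = from; i <= haystack.length - needle.length; i++) {
    for (let j = 0; j < needle.length; j++) {
      if (haystack[i + j] !== needle[j]) continue outer;
    }
    return i;
  }
  return -1;
}

// Scan chunk by chunk (e.g. ~5 MB reads) and return as soon as the marker is seen.
function containsMarker(chunks: Iterable<Uint8Array>, marker: Uint8Array = AI_MARKER): boolean {
  let carry: Uint8Array = new Uint8Array(0);
  for (const chunk of chunks) {
    // Prepend the tail of the previous chunk so boundary-straddling markers match.
    const buf = new Uint8Array(carry.length + chunk.length);
    buf.set(carry, 0);
    buf.set(chunk, carry.length);
    if (indexOfBytes(buf, marker) !== -1) return true;
    // Keep only the bytes that could still start a match in the next chunk.
    const keep = Math.min(marker.length - 1, buf.length);
    carry = buf.slice(buf.length - keep);
  }
  return false;
}
```

In the worst case (no marker) this still reads the whole file, which is the trade-off mentioned above.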
Sounds good @vladfrangu.
Yeah, the embedded metadata looks amazingly bad. Why bother putting it in if it is always the same?
I'm aware. Also, pro tip: you can skip the entire first metadata part if you parse the header; it includes a byte length you can skip to get past the whole XML-in-PDF ordeal.
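If "the header" here means the stream dictionary in front of the XMP metadata (an assumption), skipping could look roughly like this. `skipStream` is a hypothetical helper; it only handles a direct integer `/Length`, not an indirect one such as `/Length 12 0 R`, which would require resolving the referenced object first.

```typescript
// Hypothetical sketch: `bytes` starts (at `start`) with a PDF stream dictionary.
// Returns the offset just past the stream data, or undefined if it can't tell.
function skipStream(bytes: Uint8Array, start = 0): number | undefined {
  // One char per byte, so string indices equal byte offsets.
  const text = Array.from(bytes.subarray(start), (b) => String.fromCharCode(b)).join("");
  const keyword = text.indexOf("stream");
  if (keyword === -1) return undefined;
  // Direct integer length only; reject "/Length n g R" (indirect reference).
  const match = /\/Length\s+(\d+)(?!\d|\s+\d+\s+R)/.exec(text.slice(0, keyword));
  if (!match) return undefined;
  let dataStart = keyword + "stream".length;
  // The "stream" keyword is followed by CRLF or LF before the data begins.
  if (text[dataStart] === "\r") dataStart++;
  if (text[dataStart] === "\n") dataStart++;
  return start + dataStart + Number(match[1]);
}
```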
Exactly.
Bingo, and then use
Processing the "PDF blocks" is easier said than done. The COS ("Carousel" Object Structure) format, which PDF is based on, requires semi-text, line-oriented processing, which is complex and tends to reach beyond the scope of binary format detection.
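To illustrate the line-oriented part: PDF allows CR, LF, or CR LF as end-of-line markers, so even splitting into lines needs some care. A rough sketch (`pdfLines` is a hypothetical helper):

```typescript
// Split raw PDF bytes into lines, treating CR, LF and CRLF as end-of-line.
// Each byte becomes one char, so non-ASCII bytes are not decoded as UTF-8.
function pdfLines(bytes: Uint8Array): string[] {
  const lines: string[] = [];
  let current = "";
  for (let i = 0; i < bytes.length; i++) {
    const b = bytes[i];
    if (b === 0x0d || b === 0x0a) {
      lines.push(current);
      current = "";
      // CR immediately followed by LF counts as a single EOL.
      if (b === 0x0d && bytes[i + 1] === 0x0a) i++;
    } else {
      current += String.fromCharCode(b);
    }
  }
  if (current.length > 0) lines.push(current);
  return lines;
}
```

Run naively over a whole file, this would also chop binary stream data into "lines", which is where it stops being a simple binary signature check.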
Can you fix the merge conflict?
Is there any progress on this? Any help needed?
Updated the branch from 8020a8d to ea97c62.
Done.
In line with what @vladfrangu suggested, it searches for `AIPrivateData` to detect the `.ai` (Adobe Illustrator) format.

Not a perfect solution:
- `AIPrivateData` appearing in the content.

I removed the `fixture.ai` because I suspect it is truncated. I don't own Adobe Illustrator myself, so I could not test it with that. But it probably does the job for most cases.
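Roughly, the detection described above boils down to something like this sketch. `detectPdfOrAi` is a hypothetical standalone function, not the code from this PR: it checks the `%PDF-` signature (modern `.ai` files are PDF-based) and then searches the buffer for `AIPrivateData`.

```typescript
const PDF_MAGIC = new TextEncoder().encode("%PDF-");
const AI_PRIVATE = new TextEncoder().encode("AIPrivateData");

function startsWithBytes(bytes: Uint8Array, prefix: Uint8Array): boolean {
  if (bytes.length < prefix.length) return false;
  return prefix.every((b, i) => bytes[i] === b);
}

function findBytes(haystack: Uint8Array, needle: Uint8Array): number {
  outer: for (let i = 0; i <= haystack.length - needle.length; i++) {
    for (let j = 0; j < needle.length; j++) {
      if (haystack[i + j] !== needle[j]) continue outer;
    }
    return i;
  }
  return -1;
}

// Hypothetical: classify a buffer as "ai", "pdf", or unknown (undefined).
function detectPdfOrAi(bytes: Uint8Array): "ai" | "pdf" | undefined {
  if (!startsWithBytes(bytes, PDF_MAGIC)) return undefined;
  // Illustrator adds its private data on top of a regular PDF structure.
  return findBytes(bytes, AI_PRIVATE) !== -1 ? "ai" : "pdf";
}
```

A regular PDF that happens to contain the literal bytes `AIPrivateData` would be reported as `ai`, which is the false positive listed above.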
Fix #360