[bug-57342] Excel compatible Zip64 implementation #154
rzymek wants to merge 2 commits into apache:trunk
For more information see https://github.com/rzymek/opczip
|
Can one of the admins verify this patch? |
src/ooxml/java/org/apache/poi/xssf/streaming/OpcOutputStream.java
|
thanks - merged with https://svn.apache.org/repos/asf/poi/trunk@1861196 |
…f Rzymkowski. This closes #154 git-svn-id: https://svn.apache.org/repos/asf/poi/trunk@1861196 13f79535-47bb-0310-9956-ffa450edef68
|
@rzymek Could you give me an example? I am trying to generate a large Excel file with 37000 rows and 2500 columns, and the file is still corrupted using Apache POI 4.1.2 |
|
@rzymek my current code failed `import java.io.File; import org.apache.commons.compress.archivers.zip.Zip64Mode; public class TestExcel { }` |
|
Zip64Mode.AsNeeded is more correct - Always may mean you use zip64 mode when you don't need it |
|
Are you getting "corrupted file" error from Excel or OpenOffice or something else? OpenOffice Calc has a limit of 1024 columns (Excel's limit is 16k columns). Other than that, the code looks ok. Zip64Mode needs to be Always in this case to enable ZIP64 handling compatible with Excel. |
|
@rzymek thanks for clarifying - do you know what effect setting Zip64Mode.Always has if you create a small spreadsheet - will this file cause problems for Excel? |
|
As far as I checked, Zip64Mode.Always does not cause problems with Excel even in small files. When it comes to Excel and big files (XML over 4GB), ZIP64 must be declared in the zip entry header before the actual zip entry contents. |
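To make "declared in the zip entry header" concrete, here is a minimal sketch of a Local File Header that announces version 4.5 before any entry data is written. It is an illustration only, not the code from this pull request: the class and method names are hypothetical, the entry name is assumed to be ASCII, and a complete writer would also have to emit matching ZIP64 data descriptor and central directory records.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of POI or opczip.
final class Zip64LocalHeaderSketch {
    static final int LFH_SIGNATURE = 0x04034b50;      // "PK\3\4" when written little-endian
    static final int VERSION_45 = 45;                 // zip spec version 4.5 (ZIP64)
    static final int FLAG_DATA_DESCRIPTOR = 1 << 3;   // sizes and CRC follow the entry data
    static final int METHOD_DEFLATED = 8;
    static final int FIXED_HEADER_SIZE = 30;

    // Builds the header bytes that precede the compressed data of one streamed entry.
    static byte[] localFileHeader(String entryName) {
        byte[] name = entryName.getBytes(StandardCharsets.US_ASCII);
        ByteBuffer buf = ByteBuffer.allocate(FIXED_HEADER_SIZE + name.length)
                .order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(LFH_SIGNATURE);
        buf.putShort((short) VERSION_45);             // version needed to extract
        buf.putShort((short) FLAG_DATA_DESCRIPTOR);   // general purpose bit flag
        buf.putShort((short) METHOD_DEFLATED);        // compression method
        buf.putShort((short) 0);                      // last modification time (left empty here)
        buf.putShort((short) 0);                      // last modification date (left empty here)
        buf.putInt(0);                                // CRC-32, unknown while streaming
        buf.putInt(0);                                // compressed size, unknown while streaming
        buf.putInt(0);                                // uncompressed size, unknown while streaming
        buf.putShort((short) name.length);            // file name length
        buf.putShort((short) 0);                      // extra field length (none in this sketch)
        buf.put(name);
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] header = localFileHeader("xl/worksheets/sheet1.xml");
        System.out.println("version needed to extract: " + (header[4] & 0xFF));
    }
}
```

The point of the sketch is the ordering: the version field sits in the first bytes of the entry, so it has to be chosen before the writer knows how large the entry will become.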
|
Thanks @rzymek - we might want to make Zip64Mode.Always the default - needs some experimentation before we'd make that change though |
|
Exactly. I think that the custom zip64 implementation should sit as an option for a few versions (it's only enabled when Zip64Mode is Always). |
|
@rzymek I tested with LibreOffice. Now I have tested with MS Excel and the problem was solved. Is it a limitation of LibreOffice, as you told us? Thanks a lot! |
|
@rzymek We recently had a user start a thread about the Zip64 support in POI.
Commons-Compress has added support for Zip64Mode.AlwaysWithCompatibility since this was added to POI. You may no longer have an interest in this topic, but if you do, would you be able to help evaluate ZipArchiveOutputStream with Zip64Mode.AlwaysWithCompatibility? |
We found out that it does not solve it, since at least Excel 2016 can't open those files (although LibreOffice will open them, with all the new checks passing).
This particular class seems to write "holes/unallocated blocks", as shown here: https://bugs.documentfoundation.org/show_bug.cgi?id=163384#c5 |
https://bz.apache.org/bugzilla/show_bug.cgi?id=57342
I did an in-depth analysis of this issue. It turns out the problem is not with the OOXML data generated by POI. The problem lies in the ZIP format, specifically the ZIP64 extension. That's why everything is OK until sheet1.xml exceeds 4GB (uncompressed).
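The 4GB boundary comes from the 4-byte size fields of the classic ZIP headers. A small sketch of that check, with hypothetical names, assuming the usual convention that the value 0xFFFFFFFF is reserved as the marker for "see the ZIP64 record":

```java
// Hypothetical helper illustrating the 32-bit size limit; not taken from POI.
final class Zip32Limit {
    // Largest value a 4-byte size field can hold; also used as the ZIP64 marker value.
    static final long MAX_32BIT_FIELD = 0xFFFFFFFFL;

    // True when the size cannot be stored in the classic 32-bit field.
    static boolean needsZip64(long uncompressedSize) {
        return uncompressedSize >= MAX_32BIT_FIELD;
    }

    public static void main(String[] args) {
        long fourGiB = 4L * 1024 * 1024 * 1024;
        System.out.println(needsZip64(fourGiB - 2)); // false
        System.out.println(needsZip64(fourGiB));     // true
    }
}
```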
I have all the details written up in a blog post: https://rzymek.github.io/post/excel-zip64/
Short story: Excel will want to repair the file if the uncompressed size of a zip entry exceeds 4GB and the ZIP Local File Header (LFH) does not specify zip spec version 4.5.
This pull request uses a custom (Excel-compatible) Zip64 implementation when Zip64Mode is set to Always.
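To check which version an existing file declares, the "version needed to extract" field can be read straight from the first Local File Header. The following is a rough sketch using only the JDK. The class and method names are made up for illustration, and it only inspects the first entry of the archive.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical diagnostic helper; not part of POI or Commons Compress.
final class LfhVersionCheck {
    private static final int LFH_SIGNATURE = 0x04034b50; // "PK\3\4"

    // Returns the "version needed to extract" of the first Local File Header (45 means 4.5).
    static int versionNeededToExtract(byte[] zipBytes) {
        if (zipBytes.length < 6) {
            throw new IllegalArgumentException("too short to contain a Local File Header");
        }
        ByteBuffer buf = ByteBuffer.wrap(zipBytes).order(ByteOrder.LITTLE_ENDIAN);
        if (buf.getInt(0) != LFH_SIGNATURE) {
            throw new IllegalArgumentException("does not start with a Local File Header");
        }
        return buf.getShort(4) & 0xFFFF;
    }

    // True when the first entry declares zip spec version 4.5 or later up front.
    static boolean declaresZip64(byte[] zipBytes) {
        return versionNeededToExtract(zipBytes) >= 45;
    }

    public static void main(String[] args) throws IOException {
        // Write a tiny archive with the JDK's own ZipOutputStream and inspect its header.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("xl/worksheets/sheet1.xml"));
            zip.write("<worksheet/>".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        byte[] archive = bytes.toByteArray();
        System.out.println("version needed: " + versionNeededToExtract(archive));
        System.out.println("declares ZIP64: " + declaresZip64(archive));
    }
}
```

For a real .xlsx file it is enough to read the first few bytes of the file and pass them to `versionNeededToExtract`.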