Skip to content

[bug-57342] Excel compatible Zip64 implementation#154

Closed
rzymek wants to merge 2 commits intoapache:trunkfrom
rzymek:trunk
Closed

[bug-57342] Excel compatible Zip64 implementation#154
rzymek wants to merge 2 commits intoapache:trunkfrom
rzymek:trunk

Conversation

@rzymek
Copy link

@rzymek rzymek commented Jun 12, 2019

https://bz.apache.org/bugzilla/show_bug.cgi?id=57342

I did an in depth analysis of this issue. Turns out the problem is not with the OOXML data generated by POI. The problem has to do with the ZIP format. Specifically with ZIP64 extension. That's why it's all OK up until sheet1.xml reaches over 4GB (uncompressed).
I have all the details written up in a blog post: https://rzymek.github.io/post/excel-zip64/
Short story: Excel will want to repair the file if uncompressed size of a zip entry exceeds 4GB and ZIP's Local File Header (LFH) does not specify zip spec version 4.5

This pull request uses custom (Excel compatible) Zip64 implementation when Zip64Mode is set to Always.

@asfgit
Copy link

asfgit commented Jun 12, 2019

Can one of the admins verify this patch?

@asfgit asfgit closed this in cb64b73 Jun 12, 2019
@pjfanning
Copy link
Member

pjfanning commented Jun 12, 2019

thanks - merged with https://svn.apache.org/repos/asf/poi/trunk@1861196

asfgit pushed a commit that referenced this pull request Oct 6, 2019
Alain-Bearez pushed a commit to cuali/poi that referenced this pull request Dec 12, 2019
@bosofelipe
Copy link

@rzymek Could you give me an example because i try generate a large excel with 37000 rows and 2500 coluns and file still corrupted using apache 4.1.2

@bosofelipe
Copy link

bosofelipe commented May 29, 2020

@rzymek my current code failed

`import java.io.File;
import java.io.FileOutputStream;

import org.apache.commons.compress.archivers.zip.Zip64Mode;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;

public class TestExcel {

public static void main(String[] args) throws Exception {
	SXSSFWorkbook workbook = null;
	workbook = new SXSSFWorkbook(null, 10000, true, false);
	workbook.setZip64Mode(Zip64Mode.Always);
	int ROWS_COUNT = 37000;
	int COLS_COUNT = 2500;

	org.apache.poi.ss.usermodel.Sheet sheet = workbook.createSheet("1");
	for (int i = 1; i <= ROWS_COUNT; i++) {
		Row row = sheet.createRow(i);
		for (int j = 1; j <= COLS_COUNT; j++) {
			row.createCell(j).setCellValue("1");
			System.out.println(i + " " + j);
		}
	}

	FileOutputStream out = new FileOutputStream("~/Downloads/Excel.xlsx");
	workbook.write(out);
	out.close();
	workbook.close();
	workbook.dispose();

	File file = new File("~/Downloads/Excel.xlsx");
	System.out.println(file.length());

}

}`

@pjfanning
Copy link
Member

Zip64Mode.AsNeeded is more correct - always may mean you use zip64 mode when you don't need it

@rzymek
Copy link
Author

rzymek commented May 31, 2020

Are you getting "corrupted file" error from Excel or OpenOffice or something else? OpenOffice Calc has a limit of 1024 columns (Excel's limit is 16k columns). Other than that, the code looks ok.

Zip64Mode needs to be Always in this case to enable ZIP64 handling compatible with Excel.

@pjfanning
Copy link
Member

@rzymek thanks for clarifying - do you know what effect setting Zip64Mode.Always has if you create a small spreadsheet - will this file cause problems for Excel?

@rzymek
Copy link
Author

rzymek commented May 31, 2020

As far as I checked, Zip64Mode.Always does not cause problem with Excel even in small files. When it comes to Excel and big files (XML over 4Gb), then ZIP64 must be declared in the zip entry header before the actual zip entry contents.

@pjfanning
Copy link
Member

Thanks @rzymek - we might want to make Zip64Mode.Always the default - needs some experimentation before we'd make that change though

@rzymek
Copy link
Author

rzymek commented May 31, 2020

Exactly. I think that custom zip64 implementation should sit as an option for a few versions (it's only enabled when Zip64Mode.Always).

@bosofelipe
Copy link

@rzymek I tested with Libre Office... now i tested with MS Excel and the problem was solved, Is a limitation of Libre office, with you told us?

Tks a lot!

@pjfanning
Copy link
Member

@rzymek We recently had a user start a thread about the Zip64 support in POI

Commons-Compress added support for Zip64Mode.AlwaysWithCompatibility since this was added to POI.
At least for the reporter on the lists.apache.org thread, using Zip64Mode.AlwaysWithCompatibility with the standard ZipArchiveOutputStream sorted their problem. This bypasses the OpcZipArchiveOutputStream class that you wrote.

You may no longer have an interest in this topic but I'm wondering if you still have interest, would you be able to help in evaluating ZipArchiveOutputStream with Zip64Mode.AlwaysWithCompatibility ?

@manticore-projects
Copy link

Commons-Compress added support for Zip64Mode.AlwaysWithCompatibility since this was added to POI. At least for the reporter on the lists.apache.org thread, using Zip64Mode.AlwaysWithCompatibility with the standard ZipArchiveOutputStream sorted their problem.

We found out that it does not solve it since at least EXCEL-2016 can't open those files (although LibeOffice will open those, with all the new checks passing.)

This bypasses the OpcZipArchiveOutputStream class that you wrote.

This particular class seem to write "holes/unallocated blocks" as shown here https://bugs.documentfoundation.org/show_bug.cgi?id=163384#c5
If this problem could be solved we would have the perfect solution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants