Describe the bug
I'm looking at the size of the installer, both to save users disk space and to save ourselves time in CI (the size of the files included in the installer lengthens CI time for building the installer and for testing the installer images).
The example_data files are a substantial portion of the disk footprint (20% of the sas directory):
- some of these files are in sasview.git
- some of these files are from sasdata.git
- even if the files remain in git, they might not need to be in the binary distributions (wheel, installer)
Specific files
example_data/coordinate_data/mag_cylinder.sld is the output of an example (src/sas/qtgui/Calculators/media/gsc_ex_magnetic_cylinder.rst). Does it need to be shipped at all? If it is needed, can it be gzipped? (numpy.savetxt and numpy.loadtxt handle .gz files transparently)
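A minimal sketch of that transparent gzip round trip, assuming the .sld data is a plain numeric text table that numpy.loadtxt can parse (the real file layout, and whether sasview/sasdata actually read it with numpy, may differ). The random array is a hypothetical stand-in, not the contents of mag_cylinder.sld.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for a coordinate/SLD table: rows of four numeric
# columns. The real mag_cylinder.sld layout may differ.
rng = np.random.default_rng(0)
table = rng.normal(size=(2000, 4))

with tempfile.TemporaryDirectory() as tmp:
    plain_path = os.path.join(tmp, "example.txt")
    gz_path = os.path.join(tmp, "example.txt.gz")

    # savetxt writes gzip-compressed output when the name ends in ".gz"
    np.savetxt(plain_path, table)
    np.savetxt(gz_path, table)

    # loadtxt decompresses ".gz" files transparently, no extra code needed
    plain = np.loadtxt(plain_path)
    restored = np.loadtxt(gz_path)

    plain_size = os.path.getsize(plain_path)
    gz_size = os.path.getsize(gz_path)

print(f"plain: {plain_size} bytes, gzipped: {gz_size} bytes")
```

The default savetxt format keeps enough digits for float64 values to round-trip exactly, so the only change for readers would be the file name.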
example_data/coordinate_data/*.vtk are the outputs of an example (src/sas/qtgui/Calculators/media/gsc_ex_magnetic_spheres.rst). Do they need to be shipped at all? Moreover, there are also copies of these files inside the documentation (meaning multiple copies are shipped in the binaries), so would it be OK to keep only the versions in the documentation? Also, can they be compressed?
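One way to check how many identical copies end up in the binaries is to group files by content hash. This is a minimal sketch: the `find_duplicates` helper and the `docs/media` directory in the demo are hypothetical and do not reflect the real repository layout.

```python
import hashlib
import os
import tempfile
from collections import defaultdict


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Group files under root by content; keep only groups of 2+ paths."""
    by_hash = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            by_hash[sha256_of(path)].append(os.path.relpath(path, root))
    return {h: sorted(paths) for h, paths in by_hash.items() if len(paths) > 1}


# Demo on a throwaway tree with one file duplicated between two locations.
with tempfile.TemporaryDirectory() as root:
    for rel, data in [
        ("example_data/coordinate_data/demo.vtk", b"same bytes"),
        ("docs/media/demo.vtk", b"same bytes"),
        ("example_data/other.txt", b"different"),
    ]:
        full = os.path.join(root, rel)
        os.makedirs(os.path.dirname(full), exist_ok=True)
        with open(full, "wb") as fh:
            fh.write(data)
    dupes = find_duplicates(root)

for paths in dupes.values():
    print(paths)
```

Running it over an unpacked wheel or installer image, rather than the source tree, would show what is actually shipped twice.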
example_data/2d_data/BAM_2D.h5: this file isn't referenced anywhere and there's no description of what it is. It looks like it might be a nice example for demonstrating batch fitting, but with no information, can a user make use of it? Does it need to be in the installer?
example_data/1d_data/VTMA.h5: this file isn't referenced anywhere and there's no description of what it is. With no information, can a user make use of it? Does it need to be in the installer?
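A rough way to confirm that a data file isn't referenced is a plain substring search over the text files in a checkout. In this sketch the `find_references` helper and the list of file suffixes are assumptions, and the demo uses made-up file names; a substring search will also miss references built dynamically (for example by globbing a directory).

```python
import os
import tempfile

# Assumed set of text file types worth searching; adjust for the real repo.
TEXT_SUFFIXES = (".py", ".rst", ".txt", ".md", ".toml", ".cfg", ".spec", ".json")


def find_references(root, filename, suffixes=TEXT_SUFFIXES):
    """Return relative paths of text files under root that mention filename."""
    hits = []
    for dirpath, _dirnames, names in os.walk(root):
        for name in names:
            if not name.endswith(suffixes):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="ignore") as fh:
                if filename in fh.read():
                    hits.append(os.path.relpath(path, root))
    return sorted(hits)


# Demo on a throwaway tree: one hypothetical file is mentioned, one is not.
with tempfile.TemporaryDirectory() as root:
    files = {
        os.path.join("docs", "notes.rst"): "Load referenced.h5 to try this.",
        os.path.join("src", "loader.py"): "import os\n",
    }
    for rel, text in files.items():
        full = os.path.join(root, rel)
        os.makedirs(os.path.dirname(full), exist_ok=True)
        with open(full, "w", encoding="utf-8") as fh:
            fh.write(text)
    used_refs = find_references(root, "referenced.h5")
    orphan_refs = find_references(root, "orphan.h5")

print("referenced.h5 found in:", used_refs)
print("orphan.h5 found in:", orphan_refs)
```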
It's worth noting that compressing the data files won't make the installer smaller, since the installer is already compressed; it would only reduce the on-disk footprint after installation. It might make creating the installer slightly faster for the IO-bound steps.
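The point about the installer size can be illustrated with Python's zipfile (wheels are zip archives). This sketch uses DEFLATE as a stand-in for whatever compression the installer actually uses, and a synthetic numeric text file rather than any real example_data file: pre-gzipping shrinks the file on disk, but the archive comes out about the same size either way.

```python
import gzip
import io
import random
import zipfile

random.seed(0)
# Hypothetical numeric text payload standing in for a data file.
lines = [" ".join(f"{random.gauss(0, 1):.18e}" for _ in range(4)) for _ in range(5000)]
raw = ("\n".join(lines) + "\n").encode("ascii")


def zipped_size(name, payload):
    """Size of an in-memory .zip holding one member compressed with DEFLATE."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(name, payload)
    return len(buf.getvalue())


pre_gzipped = gzip.compress(raw)
archive_of_raw = zipped_size("data.txt", raw)
archive_of_gz = zipped_size("data.txt.gz", pre_gzipped)

print(f"raw file on disk:        {len(raw)} bytes")
print(f"gzipped file on disk:    {len(pre_gzipped)} bytes")
print(f"archive of raw file:     {archive_of_raw} bytes")
print(f"archive of gzipped file: {archive_of_gz} bytes")
```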
SasView version:
- Version: release-6.1.0 branch of sasview; sasdata_0.10.0 branch of sasdata
Additional context
These are just the 7 biggest files shipped in example_data. I'm focusing on them because they are by far the biggest (together they make up 78% of that directory), so dealing with them makes a substantial difference to the total size of the files shipped.
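The "N biggest files and their share of the directory" figure can be reproduced with a short walk over the tree. `largest_files_share` is a hypothetical helper, and the demo uses throwaway files rather than the real example_data directory.

```python
import os
import tempfile


def largest_files_share(root, n):
    """Return (top-n list of (relpath, size), fraction of total bytes they use)."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            sizes.append((os.path.relpath(path, root), os.path.getsize(path)))
    sizes.sort(key=lambda item: item[1], reverse=True)
    total = sum(size for _, size in sizes)
    top = sizes[:n]
    share = sum(size for _, size in top) / total if total else 0.0
    return top, share


# Demo on a throwaway tree with files of known sizes.
with tempfile.TemporaryDirectory() as root:
    for rel, size in [("big.h5", 700), ("sub/medium.vtk", 200), ("sub/small.txt", 100)]:
        full = os.path.join(root, rel)
        os.makedirs(os.path.dirname(full), exist_ok=True)
        with open(full, "wb") as fh:
            fh.write(b"\0" * size)
    top, share = largest_files_share(root, n=1)

print(top, f"{share:.0%}")
```

Pointing it at a checkout's example_data directory with n=7 should give the kind of figure quoted above, though exact sizes will vary between branches.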