Disk footprint of example_data files is excessively large - a quick survey #3383

@llimeht

Describe the bug
I have been looking at the size of the installer, both to save users disk space and to save ourselves time in CI (the size of the files included in the installer pushes out the CI time for building the installer and testing the installer images).

The example_data files are a substantial portion of the disk footprint (20% of the sas directory):

  • some of these files are in sasview.git
  • some of these files are from sasdata.git
  • even if the files remain in git, they might not need to be in the binary distributions (wheel, installer)

Specific files

  • example_data/coordinate_data/mag_cylinder.sld is the output of an example (src/sas/qtgui/Calculators/media/gsc_ex_magnetic_cylinder.rst). Does it need to be shipped at all? If it is needed, can it be gzipped? (numpy.savetxt and numpy.loadtxt support that transparently)
  • example_data/coordinate_data/*.vtk are the outputs of an example (src/sas/qtgui/Calculators/media/gsc_ex_magnetic_spheres.rst). Do they need to be shipped at all? Moreover, there are also copies of these files inside the documentation (meaning multiple copies are shipped in the binaries), so is it ok to keep just the versions in the documentation? Also, can they be compressed?
  • example_data/2d_data/BAM_2D.h5: this file isn't referenced anywhere and there's no description of what it is. It looks like it might be a nice example for demonstrating batch fitting? But with no info, can a user use this? Does it need to be in the installer?
  • example_data/1d_data/VTMA.h5: this file isn't referenced anywhere and there's no description of what it is. With no info, can a user use this? Does it need to be in the installer?
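
To illustrate the gzip question for text-based files such as mag_cylinder.sld, here is a minimal sketch of the transparent compression in numpy.savetxt and numpy.loadtxt. It uses a synthetic array and a hypothetical file name (example_grid.txt), not the real file, and it assumes the data would be read back through numpy.loadtxt; whether the SasView readers actually accept a .gz file has not been checked here.

```python
import os
import tempfile

import numpy as np


def compare_plain_and_gzip(data, directory):
    """Save `data` as plain text and as gzipped text.

    Returns (plain_size, gzip_size, reloaded_array).
    """
    # Hypothetical file name, used for illustration only.
    plain_path = os.path.join(directory, "example_grid.txt")
    gzip_path = plain_path + ".gz"

    np.savetxt(plain_path, data)
    # A file name ending in .gz makes numpy write through gzip.
    np.savetxt(gzip_path, data)
    # The same suffix makes numpy decompress on read.
    reloaded = np.loadtxt(gzip_path)

    return os.path.getsize(plain_path), os.path.getsize(gzip_path), reloaded


if __name__ == "__main__":
    # Synthetic stand-in for coordinate data, not the shipped file contents.
    grid = np.linspace(0.0, 1.0, 4000).reshape(1000, 4)
    with tempfile.TemporaryDirectory() as tmp:
        plain_size, gzip_size, reloaded = compare_plain_and_gzip(grid, tmp)
    print(f"plain: {plain_size} bytes, gzip: {gzip_size} bytes")
    print("round trip ok:", np.allclose(grid, reloaded))
```

The actual saving for the shipped files would depend on their contents, so the numbers from this synthetic array should not be read as an estimate.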

It's worth noting that compressing the data files won't make the installer smaller; it will only reduce the on-disk footprint. It might make creating the installer slightly faster for the IO-bound steps.

SasView version (please complete the following information):

  • Version: release-6.1.0 branch of sasview; sasdata_0.10.0 branch of sasdata

Additional context
These are just the 7 biggest files shipped in example_data - I'm looking at them here because they are by far the biggest (they are 78% of that directory on their own), hence dealing with them makes a substantial impact on the total file sizes shipped.
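
For anyone wanting to repeat this kind of survey, a rough sketch along the following lines lists the biggest files under a directory and their share of the total. The function name largest_files and the default path example_data are assumptions for illustration; this is not the script used to produce the numbers above.

```python
import os
import sys


def largest_files(root, count=7):
    """Return (total_bytes, [(size, relative_path), ...]) for the biggest files."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            sizes.append((os.path.getsize(path), os.path.relpath(path, root)))
    total = sum(size for size, _ in sizes)
    # Largest first; ties fall back to comparing the relative path.
    sizes.sort(reverse=True)
    return total, sizes[:count]


if __name__ == "__main__":
    # Assumed default location; pass another directory as the first argument.
    root = sys.argv[1] if len(sys.argv) > 1 else "example_data"
    total, top = largest_files(root)
    for size, rel_path in top:
        print(f"{size:>12d}  {rel_path}")
    if total:
        share = 100.0 * sum(size for size, _ in top) / total
        print(f"top {len(top)} files: {share:.0f}% of {total} bytes")
```

Running it against both the git checkout and an unpacked wheel or installer image would show whether the same files dominate in each.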
