Hi,
I noticed that the pandas.SparseDataFrame returned by Table.to_dataframe is not actually sparse. For instance, for the American Gut data:
In [15]: bm = load_table("deblur_125nt_no_blooms.biom")
In [16]: bm
Out[16]: 32954 x 9511 <class 'biom.table.Table'> with 1829490 nonzero entries (0% dense)
In [17]: tab = bm.to_dataframe()
In [19]: type(tab)
Out[19]: pandas.core.sparse.frame.SparseDataFrame
In [20]: tab.density
Out[20]: 1.0
In [21]: tab.info()
<class 'pandas.core.sparse.frame.SparseDataFrame'>
Index: 32954 entries, AACGTAGGGTGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGCAGGCGGAAGGCTAAGTCTGATGTGAAAGCCCGGGGCTCAACCCCGGTACTGCATTGGAAACTGGTCATCTAGAGTG to TACGGGGGATGCGAGCGTTATCCGGATTCATTGGGTTTAAAGGGTGCGCAGGCCGAGGTTCAAGTCAGCGGTGAAACCCCCGCGCTCAACGCGGGGCATGCCGTTGATACTGTATCTCTGGAGTA
Columns: 9511 entries, 10317.000012326 to 10317.000038478
dtypes: Sparse[float64, nan](9511)
memory usage: 2.3+ GB
This is basically the memory use of the full dense table including zeros: 32954 x 9511 float64 values at 8 bytes each comes to about 2.3 GiB. The densities of the original table and the SparseDataFrame also disagree completely (~0% vs 100%).
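Judging from the dtype line above (Sparse[float64, nan]), my guess is that the frame uses NaN as its fill value, so every zero gets stored explicitly. I have not checked the to_dataframe source, so treat that as an assumption. Here is a minimal sketch of the effect with made-up toy values. It uses pd.arrays.SparseArray from current pandas rather than SparseDataFrame, which was removed in pandas 1.0:

```python
import numpy as np
import pandas as pd

# Toy count vector (made-up values): mostly zeros, no NaN anywhere.
values = np.array([0.0, 0.0, 3.0, 0.0, 0.0, 0.0, 1.0, 0.0])

# With NaN as the fill value, only NaN entries are left out of storage.
# The data has none, so all 8 values (zeros included) are stored.
nan_filled = pd.arrays.SparseArray(values, fill_value=np.nan)

# With 0 as the fill value, only the 2 nonzero entries are stored.
zero_filled = pd.arrays.SparseArray(values, fill_value=0.0)

print(nan_filled.density)   # 1.0
print(zero_filled.density)  # 0.25
```

If that is what is happening here, building the frame with a fill value of 0 should bring the density back in line with the biom table.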