MAINT: Optimize loadtxt usecols. #19618

Merged
charris merged 2 commits into numpy:main from anntzer:loadtxtusecols
Aug 23, 2021

Contributor

anntzer commented Aug 5, 2021

7-10% speedup in usecols benchmarks; it appears that even in the
single-usecol case, avoiding the iteration over `usecols` more than
compensates for the cost of the extra function call to `usecols_getter`.

       before           after         ratio
     [cc7f1504]       [649b0461]
     <main>           <loadtxtusecols>
-     6.96±0.03ms      6.46±0.03ms     0.93  bench_io.LoadtxtUseColsCSV.time_loadtxt_usecols_csv(2)
-      11.5±0.1ms      10.4±0.04ms     0.90  bench_io.LoadtxtUseColsCSV.time_loadtxt_usecols_csv([1, 3, 5, 7])
-      9.39±0.1ms      8.47±0.05ms     0.90  bench_io.LoadtxtUseColsCSV.time_loadtxt_usecols_csv([1, 3])
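
For readers who want to see the shape of the change, here is a minimal sketch of the two column-selection strategies being compared. It assumes `usecols_getter` is built on `operator.itemgetter`; the helper names `select_with_comprehension` and `make_usecols_getter` are hypothetical and are not taken from the NumPy source.

```python
from operator import itemgetter


def select_with_comprehension(vals, usecols):
    # Old approach: iterate over usecols in Python for every row.
    return [vals[j] for j in usecols]


def make_usecols_getter(usecols):
    # Hypothetical helper: build the getter once, before the row loop.
    if len(usecols) == 1:
        # itemgetter with a single index returns a bare item rather than
        # a tuple, so wrap that case to keep a sequence as the result.
        col = usecols[0]
        return lambda vals: [vals[col]]
    # With several indices, itemgetter returns a tuple of the selected items.
    return itemgetter(*usecols)


row = "0,1,2,3,4,5,6,7,8,9".split(",")
usecols = [1, 3, 5, 7]
usecols_getter = make_usecols_getter(usecols)

# Both strategies select the same columns from a row.
assert list(usecols_getter(row)) == select_with_comprehension(row, usecols)
```

The getter is created once and reused for every row, so the per-row cost is one call instead of a Python-level loop over `usecols`.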

Comment on lines +1008 to +1010
     if usecols:
    -    vals = [vals[j] for j in usecols]
    -if len(vals) != ncols:
    +    vals = usecols_getter(vals)
    +elif len(vals) != ncols:
Member

Can you explain this change to an elif?

Contributor Author

If usecols is not None, then we already know that usecols_getter will (by construction) return a list with the right number of elements. The performance gain is negligible, but it still seems worthwhile to skip the unneeded check.
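
A simplified sketch of the control flow under discussion may help. It is not the actual `loadtxt` loop: `parse_rows`, its parameters, and the error message are hypothetical, and the getter is again assumed to be based on `operator.itemgetter`.

```python
from operator import itemgetter


def parse_rows(lines, ncols, usecols=None, delimiter=","):
    # Hypothetical stand-in for the row loop; not the NumPy implementation.
    if usecols:
        if len(usecols) == 1:
            col = usecols[0]
            usecols_getter = lambda vals: [vals[col]]
        else:
            usecols_getter = itemgetter(*usecols)

    rows = []
    for lineno, line in enumerate(lines, start=1):
        vals = line.strip().split(delimiter)
        if usecols:
            # The getter always yields len(usecols) items,
            # so the length check below would be redundant here.
            vals = usecols_getter(vals)
        elif len(vals) != ncols:
            raise ValueError(f"Wrong number of columns at line {lineno}")
        rows.append(list(vals))
    return rows
```

In this sketch, a row that is too short for the requested columns raises `IndexError` from the getter itself, so the `elif` branch only has to guard the case where no `usecols` is given.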

Member

charris commented Aug 11, 2021

@anntzer Needs rebase.

Contributor Author

anntzer commented Aug 16, 2021

rebased

Member

charris commented Aug 22, 2021

#19693 is in, everything needs a rebase :)

charris merged commit 6d8eacd into numpy:main Aug 23, 2021
Member

charris commented Aug 23, 2021

Thanks @anntzer.

charris changed the title from "PERF: Optimize loadtxt usecols." to "MAINT: Optimize loadtxt usecols." Aug 23, 2021
anntzer deleted the loadtxtusecols branch August 23, 2021 04:00