@pitrou pitrou commented Apr 23, 2018

Also refactor the type inference visitor and remove the superfluous separate SeqVisitor; improve inference visitor performance by 30%; and add a struct type inference benchmark.

Also refactor the type inference visitor, improve visitor performance by ~30%,
and add a benchmark for struct type inference.
@pitrou pitrou force-pushed the ARROW-2074-infer-dict-lists branch from bfc1f3b to 3baa2ea Compare April 23, 2018 15:32
pitrou commented Apr 23, 2018

Benchmark numbers here:

  • before:
[100.00%] ··· Running convert_builtins.InferPyListToArray.time_infer                                                                                     ok
[100.00%] ···· 
               ============ =============
                   type                  
               ------------ -------------
                  int64       11.0±0.1ms 
                 float64     10.3±0.07ms 
                   bool      9.37±0.04ms 
                 decimal      297±0.9ms  
                  binary      14.9±0.2ms 
                  ascii       17.3±0.3ms 
                 unicode      29.7±0.8ms 
                int64 list    96.8±0.6ms 
               ============ =============
  • after:
[100.00%] ··· Running convert_builtins.InferPyListToArray.time_infer                                                                                     ok
[100.00%] ···· 
               ============ =============
                   type                  
               ------------ -------------
                  int64       7.41±0.2ms 
                 float64     6.68±0.04ms 
                   bool      5.75±0.01ms 
                 decimal      292±0.8ms  
                  binary      11.4±0.2ms 
                  ascii       14.1±0.3ms 
                 unicode      26.3±0.7ms 
                int64 list    74.8±0.6ms 
                  struct       70.7±4ms  
               ============ =============
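For readers who want the relative change per type, here is a minimal sketch that computes the reduction in mean time from the two tables above. The numbers are copied from the tables; the dictionary and function names are illustrative only and are not part of the benchmark suite. The ± spread is ignored, and `struct` is left out because it has no "before" measurement.

```python
# Mean timings in milliseconds, copied from the "before" and "after" tables.
# The reported ± spread is ignored in this rough comparison.
before = {
    "int64": 11.0,
    "float64": 10.3,
    "bool": 9.37,
    "decimal": 297.0,
    "binary": 14.9,
    "ascii": 17.3,
    "unicode": 29.7,
    "int64 list": 96.8,
}
after = {
    "int64": 7.41,
    "float64": 6.68,
    "bool": 5.75,
    "decimal": 292.0,
    "binary": 11.4,
    "ascii": 14.1,
    "unicode": 26.3,
    "int64 list": 74.8,
    "struct": 70.7,  # new benchmark, no "before" value to compare against
}


def percent_reduction(old_ms, new_ms):
    """Return how much less time the new run takes, as a percentage of the old."""
    return (old_ms - new_ms) / old_ms * 100.0


# Only types present in both runs can be compared.
reduction = {
    name: percent_reduction(before[name], after[name])
    for name in before
    if name in after
}

for name, pct in reduction.items():
    print(f"{name:<12}{before[name]:>8.2f} ms -> {after[name]:>8.2f} ms  ({pct:5.1f}% less time)")
```

By this measure the scalar numeric types (int64, float64, bool) drop by roughly a third, the string-like and list types by somewhat less, and decimal barely changes.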

@xhochy xhochy left a comment

+1, LGTM

@xhochy xhochy closed this in 3d7a5a6 Apr 25, 2018
@pitrou pitrou deleted the ARROW-2074-infer-dict-lists branch April 25, 2018 23:02
pitrou added a commit that referenced this pull request May 1, 2018
Speeds up list to Arrow conversions by up to 15%. Also fixes a bug where creating a list array would not check that all input items are sequences.

Based on PR #1935.

Author: Antoine Pitrou <antoine@python.org>

Closes #1940 from pitrou/ARROW-2499-python-iteration-refactor and squashes the following commits:

ac31c6c <Antoine Pitrou> Fix Ndarray1DIndexer::is_strided (unused)
91c5af1 <Antoine Pitrou> Add TODO for performance issue
00cab9a <Antoine Pitrou> ARROW-2499:  Refactor Python iteration