Conversation

@edponce
Contributor

@edponce edponce commented Dec 30, 2021

Improves the time to find a chunk in a ChunkedArray by using binary search, giving O(log n) random access.
Chunks are resolved when invoking GetScalar() (C++) and the index operator (Python).
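The underlying technique can be sketched as follows (a minimal standalone sketch, not the actual Arrow code; names like `BuildOffsets` and `Resolve` are illustrative): precompute cumulative chunk start offsets once, then binary-search them to map a logical index to a chunk and an in-chunk position.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct ChunkLocation {
  int64_t chunk_index;
  int64_t index_in_chunk;
};

// Build cumulative start offsets from chunk lengths: {0, n0, n0+n1, ...}.
std::vector<int64_t> BuildOffsets(const std::vector<int64_t>& chunk_lengths) {
  std::vector<int64_t> offsets{0};
  for (int64_t n : chunk_lengths) offsets.push_back(offsets.back() + n);
  return offsets;
}

// Resolve a logical index to (chunk, in-chunk offset) in O(log n).
ChunkLocation Resolve(const std::vector<int64_t>& offsets, int64_t index) {
  // upper_bound finds the first offset strictly greater than index;
  // the containing chunk is the one just before it.
  auto it = std::upper_bound(offsets.begin(), offsets.end(), index);
  int64_t chunk = static_cast<int64_t>(it - offsets.begin()) - 1;
  return {chunk, index - offsets[chunk]};
}
```

A linear scan over the chunks would be O(n) per access; the one-time O(n) offsets build amortizes away as soon as more than a handful of lookups are made.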

@edponce
Contributor Author

edponce commented Dec 30, 2021

Here are some preliminary measurements showing an improvement for a particular example (1024 chunks of 1024 values each, accessed incrementally).

The following are still missing in this PR:

  • Update the internal data structure for tracking chunk indices during slicing operations
  • Gather measurements for examples with other behaviors (few chunks, random access)

@pitrou
Member

pitrou commented Jan 4, 2022

I think we should simply factor out and reuse the chunk resolver from arrow/compute/kernels/chunked_internal.h

@edponce
Contributor Author

edponce commented Jan 4, 2022

@pitrou Thanks! I had no idea of the existence of ChunkResolver.

@pitrou
Member

pitrou commented Jan 4, 2022

Sorry. I should have mentioned it on the JIRA.

@pitrou
Member

pitrou commented Jan 27, 2022

@edponce Are you willing to push this forward?

@eelxpeng

@edponce @pitrou Are you able to complete this PR soon? This issue is affecting us significantly.

@edponce
Contributor Author

edponce commented Feb 25, 2022

Hi @eelxpeng, I will work on this today. Apologies for the delay.

@edponce edponce force-pushed the ARROW-11989-Improve-ChunkedArrays-complexity-for-the branch from 7eec701 to afd4022 Compare February 27, 2022 03:49
@edponce edponce marked this pull request as ready for review February 28, 2022 08:50
@edponce edponce force-pushed the ARROW-11989-Improve-ChunkedArrays-complexity-for-the branch 3 times, most recently from 45167f3 to 95a3d30 Compare March 3, 2022 05:20
@eelxpeng

eelxpeng commented Mar 6, 2022

Verified the speed improvement on the example provided above. Thanks for the good work! @edponce

I have one question: does the change revise the Arrow file and save the necessary information in the file? Or does it leave the Arrow file structure untouched and just resolve the index with a more efficient search? I ask because I experienced longer access times for an existing large multi-chunk Arrow file with the change.

Member

@pitrou pitrou left a comment


Thanks for the update @edponce, here are some comments.

@pitrou
Member

pitrou commented Mar 10, 2022

@eelxpeng

Verified the speed improvement on the example provided above. Thanks for the good work!

Can you post the speed numbers you got?

I have one question: does the change revise the Arrow file and save the necessary information in the file?

No, it doesn't. It's all in memory.

@pitrou
Member

pitrou commented Mar 10, 2022

@ursabot please benchmark lang=C++

@ursabot

ursabot commented Mar 10, 2022

Benchmark runs are scheduled for baseline = 93b192c and contender = 95a3d30e5c78cea6ea40748e268ceb270b3092a8. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Only ['Python'] langs are supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Finished ⬇️0.0% ⬆️0.0% ⚠️ Contender and baseline run contexts do not match] test-mac-arm
[Skipped ⚠️ Only ['JavaScript', 'Python', 'R'] langs are supported on ursa-i9-9960x] ursa-i9-9960x
[Finished ⬇️3.74% ⬆️0.09%] ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@edponce edponce force-pushed the ARROW-11989-Improve-ChunkedArrays-complexity-for-the branch from 95a3d30 to adaf18b Compare March 12, 2022 05:40
@edponce edponce force-pushed the ARROW-11989-Improve-ChunkedArrays-complexity-for-the branch from b137a83 to a711336 Compare March 12, 2022 07:44
@edponce
Contributor Author

edponce commented Mar 12, 2022

@ursabot please benchmark lang=C++

@ursabot

ursabot commented Mar 12, 2022

Benchmark runs are scheduled for baseline = 5592cf7 and contender = a711336db6a3095a4be320adceaf147826c2bdaf. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Only ['Python'] langs are supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Finished ⬇️0.8% ⬆️0.54%] test-mac-arm
[Skipped ⚠️ Only ['JavaScript', 'Python', 'R'] langs are supported on ursa-i9-9960x] ursa-i9-9960x
[Finished ⬇️0.51% ⬆️0.17%] ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@edponce edponce force-pushed the ARROW-11989-Improve-ChunkedArrays-complexity-for-the branch from a711336 to 7ac8aef Compare March 13, 2022 05:25
@edponce edponce force-pushed the ARROW-11989-Improve-ChunkedArrays-complexity-for-the branch from 7ac8aef to dda5d95 Compare March 14, 2022 06:57
@edponce
Contributor Author

edponce commented Mar 14, 2022

There are two main options for the type of chunk_resolver_ in ChunkedArray:

  1. Use a std::unique_ptr for lazy initialization (a valid approach because ChunkedArray is not copyable/assignable). Drawback:
    • indirect access in the critical path
  2. Use a ChunkResolver instance directly. Drawbacks:
    • requires a default constructor, which allows creating empty (and not very usable) ChunkResolver instances
    • requires defining a move assignment operator
    • requires defining a default copy constructor
    • makes offsets_ non-const, because assignment is needed for lazy evaluation
    • needs a boolean ChunkedArray member variable to track lazy initialization

Based on this, I favor the std::unique_ptr solution.
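The first option can be sketched as follows (a hypothetical sketch of the lazy-initialization pattern under discussion, not the merged code; all names are illustrative):

```cpp
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for the real ChunkResolver.
struct ChunkResolver {
  std::vector<int64_t> offsets_;
  explicit ChunkResolver(std::vector<int64_t> offsets)
      : offsets_(std::move(offsets)) {}
};

class ChunkedArray {
 public:
  explicit ChunkedArray(std::vector<int64_t> chunk_lengths)
      : chunk_lengths_(std::move(chunk_lengths)) {}

  // Lazily build the resolver on first use. (A real implementation would
  // need to make this thread-safe.)
  const ChunkResolver& resolver() {
    if (!chunk_resolver_) {
      std::vector<int64_t> offsets{0};
      for (int64_t n : chunk_lengths_) offsets.push_back(offsets.back() + n);
      chunk_resolver_ = std::make_unique<ChunkResolver>(std::move(offsets));
    }
    return *chunk_resolver_;  // indirect access: the drawback noted above
  }

 private:
  std::vector<int64_t> chunk_lengths_;
  std::unique_ptr<ChunkResolver> chunk_resolver_;  // null until first access
};
```

Note that the unique_ptr doubles as the "already initialized" flag, which avoids the extra boolean and the const/assignment issues of option 2, at the cost of one pointer indirection per access.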

@edponce edponce force-pushed the ARROW-11989-Improve-ChunkedArrays-complexity-for-the branch from dda5d95 to 5f709cc Compare March 14, 2022 08:20
@edponce edponce force-pushed the ARROW-11989-Improve-ChunkedArrays-complexity-for-the branch from b68ea2a to 6515bdc Compare April 13, 2022 18:20
@edponce
Contributor Author

edponce commented Apr 13, 2022

@pitrou This PR is ready for final reviews. Thanks for all the great reviews.

@edponce edponce force-pushed the ARROW-11989-Improve-ChunkedArrays-complexity-for-the branch from beeafb7 to 9f7d8a6 Compare April 13, 2022 22:08
@edponce edponce force-pushed the ARROW-11989-Improve-ChunkedArrays-complexity-for-the branch from 9f7d8a6 to 76b25ff Compare April 13, 2022 23:09
@pitrou
Member

pitrou commented Apr 14, 2022

@ursabot please benchmark lang=C++

@ursabot

ursabot commented Apr 14, 2022

Benchmark runs are scheduled for baseline = 5d5cceb and contender = 76b25ff. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Only ['Python'] langs are supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Finished ⬇️3.01% ⬆️0.46%] test-mac-arm
[Skipped ⚠️ Only ['JavaScript', 'Python', 'R'] langs are supported on ursa-i9-9960x] ursa-i9-9960x
[Finished ⬇️4.04% ⬆️0.43%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/491| 76b25ffa test-mac-arm>
[Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/501| 76b25ffa ursa-thinkcentre-m75q>
[Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/504| 5d5ccebe ec2-t3-xlarge-us-east-2>
[Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/490| 5d5ccebe test-mac-arm>
[Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/488| 5d5ccebe ursa-i9-9960x>
[Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/500| 5d5ccebe ursa-thinkcentre-m75q>
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Member

@pitrou pitrou left a comment


+1, will just wait for benchmark results

@pitrou pitrou closed this in 4e49aa8 Apr 14, 2022
@ursabot

ursabot commented Apr 15, 2022

Benchmark runs are scheduled for baseline = 7dc8683 and contender = 4e49aa8. 4e49aa8 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Finished ⬇️2.93% ⬆️0.08%] test-mac-arm
[Failed ⬇️1.43% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️5.44% ⬆️0.6%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/511| 4e49aa8e ec2-t3-xlarge-us-east-2>
[Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/498| 4e49aa8e test-mac-arm>
[Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/497| 4e49aa8e ursa-i9-9960x>
[Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/508| 4e49aa8e ursa-thinkcentre-m75q>
[Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/510| 7dc86837 ec2-t3-xlarge-us-east-2>
[Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/497| 7dc86837 test-mac-arm>
[Failed] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/496| 7dc86837 ursa-i9-9960x>
[Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/507| 7dc86837 ursa-thinkcentre-m75q>
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@eelxpeng

eelxpeng commented May 1, 2022

Not sure why. I have a large Arrow file with 115545630 rows and 488 chunks. Even with the fix in this PR, the time to randomly access 1000 rows is 2102 seconds, whereas if I re-process the file into 115545630 rows and 1 chunk, the time is 72 seconds. The difference is huge.

@edponce
Contributor Author

edponce commented May 1, 2022

Hi @eelxpeng, the implementation in this PR caches the previously used chunk, so it is expected to be faster with 1 chunk because that results in perfect caching. Randomly accessed chunks are located with a binary search, and the chunk-caching check adds only a very small penalty on top of it.

How do your results compare to Arrow before this patch got merged?
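The caching behavior described above can be sketched like this (an illustrative sketch with hypothetical names, not the merged code): check whether the requested index still falls in the most recently resolved chunk before falling back to binary search.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

class CachingResolver {
 public:
  // offsets holds cumulative chunk start offsets, e.g. {0, 3, 5, 9}.
  explicit CachingResolver(std::vector<int64_t> offsets)
      : offsets_(std::move(offsets)) {}

  // Returns the chunk containing `index`. Sequential access hits the
  // cached chunk in O(1); random access falls back to O(log n) search.
  int64_t Resolve(int64_t index) {
    if (offsets_[cached_chunk_] <= index &&
        index < offsets_[cached_chunk_ + 1]) {
      return cached_chunk_;  // cache hit: previous chunk still matches
    }
    auto it = std::upper_bound(offsets_.begin(), offsets_.end(), index);
    cached_chunk_ = static_cast<int64_t>(it - offsets_.begin()) - 1;
    return cached_chunk_;
  }

 private:
  std::vector<int64_t> offsets_;
  int64_t cached_chunk_ = 0;  // last chunk resolved
};
```

With a single chunk, every access after the first is a cache hit; with random access across many chunks, the extra range check costs one comparison before the binary search.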

@edponce
Contributor Author

edponce commented May 1, 2022

I can set up benchmarks this week to investigate further performance with varying chunk sizes.

@pitrou
Member

pitrou commented May 1, 2022

Not sure why. I have a large Arrow file with 115545630 rows and 488 chunks. Even with the fix in this PR, the time to randomly access 1000 rows is 2102 seconds

This sounds really huge and unrelated to this PR. Are you accessing the file over local storage or a remote filesystem?

@eelxpeng

eelxpeng commented May 1, 2022

@edponce with previous Arrow version (7.0.0) before this patch, the time is 1510 seconds for the 488 chunks file.

@eelxpeng

eelxpeng commented May 1, 2022

@pitrou How is it not related to this PR? In this case, 1 chunk is significantly faster than 488 chunks. I'm accessing the file via EFS (https://aws.amazon.com/efs/).

@pitrou
Member

pitrou commented May 1, 2022

It's not related because the runtimes are incompatible with the CPU time needed for calculating chunk indices. That said, it's worth opening an issue for this.
