ARROW-1572: [C++] Implement "value counts" kernels for tabulating value frequencies #1970

andrioni · 2018-05-01T01:49:42Z

From the tests it seems to work, although I'm still learning how everything works. The implementation is based on how the other hash kernels work.

…subprocess This enables this test to pass in an in-place build without running `setup.py develop` Author: Wes McKinney <wes.mckinney@twosigma.com> Closes apache#1640 from wesm/ARROW-2131 and squashes the following commits: 460ded1 [Wes McKinney] Use more idiomatic option for getting module parent directory faa247a [Wes McKinney] Prepend development module path when spawning subprocess in test_serialization.py

Author: Wes McKinney <wes.mckinney@twosigma.com> Closes apache#1638 from wesm/ARROW-2180 and squashes the following commits: a13931c [Wes McKinney] Remove deprecated C++ APIs from 0.8.0 cycle

Author: Wes McKinney <wes.mckinney@twosigma.com> Closes apache#1636 from wesm/ARROW-2132 and squashes the following commits: dbbef5d [Wes McKinney] Add link to Plasma in main README, tweak language

Author: Wes McKinney <wes.mckinney@twosigma.com> Closes apache#1641 from wesm/ARROW-2069 and squashes the following commits: e66d1b6 [Wes McKinney] Add note that Plasma is not supported on Windows

Author: Wes McKinney <wes.mckinney@twosigma.com> Closes apache#1639 from wesm/ARROW-2185 and squashes the following commits: adf038d Fix regex 75c3b17 Do not eat closing braces baa7e2c Strip CI directives from commit messages when preparing squashed commit message

…re brackets Change-Id: Iad77d14c9116c30249e3e7f13ed40ed82e579832

Author: Wes McKinney <wes.mckinney@twosigma.com> Closes apache#1637 from wesm/ARROW-2093 and squashes the following commits: e3040a9 <Wes McKinney> Do not install PyTorch in Travis CI

Author: Brian Hulette <brian.hulette@ccri.com> Closes apache#1647 from TheNeuralBit/js-doc-site and squashes the following commits: e5ca4df [Brian Hulette] Add JS docs to website

Original question: apache#1510 Improvement story: https://issues.apache.org/jira/browse/ARROW-2066 Author: rrussell <rrussell@adobe.com> Closes apache#1544 from rjrussell77/arrow-2066-docs-azure-parquet and squashes the following commits: 0d3972c <rrussell> Add missing byte_stream declaration/assignment a5addb0 <rrussell> use more common 'df' instead of 'pd' for pandas dataframe variable, remove head() call and instead use comment to indicate generic fill-in code, add comment re: stream closure in finally block f056888 <rrussell> Clean up white space 1fe9866 <rrussell> Add try/except/finally blocks to ensure closure of the byte stream 36f7378 <rrussell> Replace usage of tempfile buffer with BytesIO stream 654a6f9 <rrussell> Add back original Notes bullets 5d450fc <rrussell> fix 4770de1 <rrussell> fix 4c75824 <rrussell> Try moving the bullet to remove italics 803cbca <rrussell> Use asterisks for list 051b91d <rrussell> Fix formatting a015deb <rrussell> Fix formatting 1815816 <rrussell> fix formatting 599e04f <rrussell> Fix formatting 34c5a16 <rrussell> Fix formatting 6fd9f70 <rrussell> remove inline edits 83a38c4 <rrussell> Try to fix italics f130e04 <rrussell> Change wording a bit 718bd94 <rrussell> Fix unintended italics 7bab640 <rrussell> Refine indented bullet and fix title underline 26a53e4 <rrussell> Fix formatting 5fbea89 <rrussell> Add a note about keys and add polish 5365a9c <rrussell> Add helpful notes about Azure properties 6841116 <rrussell> Polish the formatting eb643e4 <rrussell> ARROW-2066 Add documentation for Arrow/Azure/Parquet solution

Author: Antoine Pitrou <antoine@python.org> Closes apache#1644 from pitrou/ARROW-2197-doc-undefined-symbol and squashes the following commits: 99db6e0 <Antoine Pitrou> ARROW-2197: Document C++ ABI issue and workaround

…ng shared_ptr to OutputStream Add constructors to return pointers to the base interface Author: Panchen Xue <pan.panchen.xue@gmail.com> Author: Wes McKinney <wes.mckinney@twosigma.com> Closes apache#1642 from xuepanchen/ARROW-2184 and squashes the following commits: 43dc483 <Wes McKinney> Update Python bindings to use new FileOutputStream API d093062 <Panchen Xue> Add test cases f4e3b6a <Panchen Xue> Add static constructor for FileOutputStream returning pointer to base OutputStream

Author: Uwe L. Korn <uwelk@xhochy.com> Closes apache#1633 from xhochy/ARROW-2191 and squashes the following commits: 5f67465 <Uwe L. Korn> ARROW-2191: Only use specific version of jemalloc

Author: Uwe L. Korn <uwelk@xhochy.com> Closes apache#1655 from xhochy/ARROW-2204 and squashes the following commits: 1a21f17 [Uwe L. Korn] ARROW-2204: Fix TLS errors in manylinux1 build

… to its indices' nullBitmap Resolves https://issues.apache.org/jira/browse/ARROW-2214 Author: Paul Taylor <paul.e.taylor@me.com> Closes apache#1659 from trxcllnt/ARROW-2214 and squashes the following commits: 8c4ba46 <Paul Taylor> add nullBitmap getter to DictionaryData that proxies to its indices' nullBitmap

Saves ~6min build time. Author: Uwe L. Korn <uwelk@xhochy.com> Closes apache#1662 from xhochy/ARROW-2212 and squashes the following commits: acf2319 <Uwe L. Korn> ARROW-2212: Build Protobuf in base manylinux1 docker image

…g toolchain C++-only libprotobuf is now being provided by conda-forge Author: Wes McKinney <wes.mckinney@twosigma.com> Closes apache#1664 from wesm/ARROW-2094 and squashes the following commits: cec6c88 <Wes McKinney> Install libprotobuf and set PROTOBUF_HOME when using toolchain

This should resolve https://issues.apache.org/jira/browse/ARROW-2213 Author: Paul Taylor <paul.e.taylor@me.com> Closes apache#1658 from trxcllnt/js-fix-npm-release and squashes the following commits: 71273fa <Paul Taylor> specify unpkg key in apache-arrow module's package.json fd4d5be <Paul Taylor> npm publish via lerna exec

@TheNeuralBit

@TheNeuralBit I think this is my bad as "indicies" isn't a real word. what do you think, is this PR worth it? should we name this "keys" instead? Author: Paul Taylor <paul.e.taylor@me.com> Closes apache#1666 from trxcllnt/js-fix-indicies-typo and squashes the following commits: 711571a <Paul Taylor> fix typo -- rename indicies to indices

See https://issues.apache.org/jira/browse/ARROW-2206 + https://github.com/deepankarsharma Author: lmeyerov <lmeyerov@gmail.com> Closes apache#1652 from lmeyerov/patch-2 and squashes the following commits: 904e172 <lmeyerov> fix(reduce perspective description) 91291a0 <lmeyerov> fix(Perspective description) b8ef13e <lmeyerov> fix(missing colon) 80a4630 <lmeyerov> Document Perspective project (main site) cc938c1 <lmeyerov> ARROW-2206 - Document Perspective project

… enable ASAN builds, add more dev docs More to do here, see discussion in https://issues.apache.org/jira/browse/ARROW-1589 Author: Wes McKinney <wes.mckinney@twosigma.com> Closes apache#1503 from wesm/ARROW-2023 and squashes the following commits: e3a590e <Wes McKinney> Do not invoke memcpy unless number of bytes to read greater than 0. Return more informative error message 90787ec <Wes McKinney> Add ASAN instructions, llvm-symbolizer instructions for fuzzers. Remove Apache Kudu cruft from run-test.sh 53d2c92 <Wes McKinney> Add missing lsan-suppressions.txt to enable ASAN to run with unittests 69cb5b7 <Wes McKinney> Start C++ stream reader tests for malformed inputs

Author: Antoine Pitrou <antoine@python.org> Closes apache#1665 from pitrou/ARROW-1035-streaming-benchmark and squashes the following commits: 32b1956 <Antoine Pitrou> ARROW-1035: Add streaming dataframe reconstruction benchmark

Output stream that just writes to stderr. Adapted from StdoutStream, but use `cerr` instead of `cout`. Author: rvernica <rvernica@gmail.com> Closes apache#1657 from rvernica/patch-1 and squashes the following commits: 535dd25 <rvernica> Fix lint issue - trim whitespace 4aa45b0 <rvernica> ARROW-2203: StderrStream class

Author: Antoine Pitrou <antoine@python.org> Closes apache#1663 from pitrou/ARROW-1937-nested-array-init-doc and squashes the following commits: b85d1b2 <Antoine Pitrou> ARROW-1937: Document nested array initialization

Author: Uwe L. Korn <uwelk@xhochy.com> Closes apache#1672 from xhochy/ARROW-2210 and squashes the following commits: 6e5febe <Uwe L. Korn> ARROW-2210: Reset ptr on failed memory allocation

This ensures the `bin` files are uniformly es5/commonjs for all target modules. Now this works: ```sh npm i @apache-arrow/esnext-esm npx arrow2csv -f some-file.arrow ``` Resolves https://issues.apache.org/jira/browse/ARROW-2223 Author: Paul Taylor <paul.e.taylor@me.com> Closes apache#1669 from trxcllnt/js-add-bin-to-umd-targets and squashes the following commits: 9c6b787 <Paul Taylor> compile src/bin as es5-cjs to all output targets

Author: Korn, Uwe <Uwe.Korn@blue-yonder.com> Closes apache#1676 from xhochy/ARROW-2230 and squashes the following commits: b46c1a2 <Korn, Uwe> ARROW-2230: Strip catch-all tag matching from git-describe

Author: Antoine Pitrou <antoine@python.org> Closes apache#1674 from pitrou/ARROW-2218-infer-python-file-mode and squashes the following commits: a5d704e <Antoine Pitrou> ARROW-2218: PythonFile should infer mode when not given

Resolves https://issues.apache.org/jira/browse/ARROW-2226 Author: Paul Taylor <paul.e.taylor@me.com> Author: Brian Hulette <brian.hulette@ccri.com> Closes apache#1671 from trxcllnt/js-fix-dictionary-data and squashes the following commits: ccecf55 <Paul Taylor> Merge pull request apache#5 from TheNeuralBit/dictionary-vector-tests 3fb9a26 <Brian Hulette> Fix bug in DictionaryVector with nullable indices 2888657 <Brian Hulette> Add dictionary vector unit tests b0a0c08 <Paul Taylor> use indicies.offset in DictionaryData constructor

@TheNeuralBit

@TheNeuralBit somewhere along the way we lost this. Resolves https://issues.apache.org/jira/browse/ARROW-2225 Author: Paul Taylor <paul.e.taylor@me.com> Closes apache#1670 from trxcllnt/js-fix-multiple-buffers and squashes the following commits: 869cd74 <Paul Taylor> make the reader work for tables split across buffers again

This makes GetFileSize thread-safe and also reduces its cost. Author: Antoine Pitrou <antoine@python.org> Closes apache#1934 from pitrou/ARROW-2470-getfilesize and squashes the following commits: 5239207 <Antoine Pitrou> ARROW-2470: Avoid seeking in GetFileSize

When compiled with GPU support, the PlasmaClient ABI would differ, leading to a crash in the Python bindings to Plasma. Author: Antoine Pitrou <antoine@python.org> Closes apache#1933 from pitrou/ARROW-2489-plasma-client-abi and squashes the following commits: 7d28354 <Antoine Pitrou> ARROW-2489: Fix PlasmaClient ABI variation

This PR restores the windows compatibility for the current master. The memory pool abstraction is not used anywhere else within the code base and so is not included in windows builds. Next I plan to add [CI](https://issues.apache.org/jira/browse/ARROW-2436?filter=12343557&jql=project%20%3D%20ARROW%20AND%20component%20%3D%20Rust%20AND%20status%20%3D%20Open) for windows. Followed by [ARROW-2474](https://issues.apache.org/jira/browse/ARROW-2474?filter=12343557&jql=project%20%3D%20ARROW%20AND%20component%20%3D%20Rust%20AND%20status%20%3D%20Open), which I will add windows compatibility to the memory pool abstraction, allowing it to be used throughout the code base. Author: Paddy <paddyhoran@hotmail.com> Closes apache#1938 from paddyhoran/ARROW-2502 and squashes the following commits: 3e00f13 <Paddy> Fixes ARROW-2502

Author: Philipp Moritz <pcmoritz@gmail.com> Author: Antoine Pitrou <pitrou@free.fr> Closes apache#1944 from pcmoritz/fix-pytest-msg and squashes the following commits: 00b2cd4 <Antoine Pitrou> Use `match` argument as intended by the test 6d6cc68 <Philipp Moritz> fix pytest.raises msg to message

Also refactor the type inference visitor and remove the superfluous separate SeqVisitor; improve inference visitor performance by 30%; and add a struct type inference benchmark. Author: Antoine Pitrou <antoine@python.org> Closes apache#1935 from pitrou/ARROW-2074-infer-dict-lists and squashes the following commits: 13ed6c3 <Antoine Pitrou> Fix tests on 2.7 3baa2ea <Antoine Pitrou> ARROW-2074: Infer lists of dicts as struct arrays

This is a followup to apache#1933 which does reference counting of the PlasmaClient held by PlasmaBuffers to avoid the segfault in ARROW-2448. Author: Philipp Moritz <pcmoritz@gmail.com> Closes apache#1939 from pcmoritz/autoget-sharedptr and squashes the following commits: f1e6e8b <Philipp Moritz> fix test 2da395f <Philipp Moritz> update 13b1204 <Philipp Moritz> fixes b68b15e <Philipp Moritz> fix ObjectStatus 6d560db <Philipp Moritz> remove headers 94cdfd7 <Philipp Moritz> add test 6798ed0 <Philipp Moritz> Give shared_ptr of PlasmaClient::Impl to PlasmaBuffer

Author: Krisztián Szűcs <szucs.krisztian@gmail.com> Closes apache#1943 from kszucs/ARROW-2286 and squashes the following commits: bbd496c <Krisztián Szűcs> fix review issues f848858 <Krisztián Szűcs> implement StructValue.__getitem__ 708e78f <Krisztián Szűcs> cpp unittests f4a9bab <Krisztián Szűcs> GetChildByName and GetChildIndex for StructType

Author: Andy Grove <andy.grove@rms.com> Closes apache#1936 from agrove-rms/jdk8 and squashes the following commits: d5dca81 <Andy Grove> remove jdk7 from CI matrix ef01df4 <Andy Grove> use java 1.8 instead of java 1.7

…tead of JDK 7 Author: Andy Grove <andy.grove@rms.com> Closes apache#1956 from agrove-rms/restore_java_tests and squashes the following commits: 69309d0 <Andy Grove> re-instate JDK tests in matrix, but with JDK 8 instead of JDK 7

The build itself fails too: ```bash [info] Compiling 103 Scala sources and 6 Java sources to /apache-arrow/spark/streaming/target/scala-2.11/classes... [error] Compile failed at Apr 13, 2018 9:50:50 AM [1:05.284s] [INFO] ------------------------------------------------------------------------ [INFO] Reactor Summary: [INFO] [INFO] Spark Project Parent POM ........................... SUCCESS [ 17.349 s] [INFO] Spark Project Tags ................................. SUCCESS [ 20.587 s] [INFO] Spark Project Sketch ............................... SUCCESS [ 12.047 s] [INFO] Spark Project Local DB ............................. SUCCESS [ 8.086 s] [INFO] Spark Project Networking ........................... SUCCESS [ 18.759 s] [INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [ 10.423 s] [INFO] Spark Project Unsafe ............................... SUCCESS [ 19.453 s] [INFO] Spark Project Launcher ............................. SUCCESS [ 17.220 s] [INFO] Spark Project Core ................................. SUCCESS [12:40 min] [INFO] Spark Project ML Local Library ..................... SUCCESS [ 32.734 s] [INFO] Spark Project GraphX ............................... SUCCESS [01:02 min] [INFO] Spark Project Streaming ............................ FAILURE [01:09 min] [INFO] Spark Project Catalyst ............................. SKIPPED [INFO] Spark Project SQL .................................. SKIPPED [INFO] Spark Project ML Library ........................... SKIPPED [INFO] Spark Project Tools ................................ SKIPPED [INFO] Spark Project Hive ................................. SKIPPED [INFO] Spark Project REPL ................................. SKIPPED [INFO] Spark Project Assembly ............................. SKIPPED [INFO] Spark Integration for Kafka 0.10 ................... SKIPPED [INFO] Kafka 0.10 Source for Structured Streaming ......... SKIPPED [INFO] Spark Project Examples ............................. SKIPPED [INFO] Spark Integration for Kafka 0.10 Assembly .......... SKIPPED [INFO] ------------------------------------------------------------------------ [INFO] BUILD FAILURE [INFO] ------------------------------------------------------------------------ [INFO] Total time: 17:29 min [INFO] Finished at: 2018-04-13T09:50:50Z [INFO] Final Memory: 59M/741M [INFO] ------------------------------------------------------------------------ [ERROR] Failed to execute goal net.alchim31.maven:scala-maven-plugin:3.2.2:compile (scala-compile-first) on project spark-streaming_2.11: Execution scala-compile-first of goal net.alchim31.maven:scala-maven-plugin:3.2.2:compile failed.: C ompileFailed -> [Help 1] [ERROR] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. [ERROR] Re-run Maven using the -X switch to enable full debug logging. [ERROR] [ERROR] For more information about the errors and possible solutions, please read the following articles: [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/PluginExecutionException [ERROR] [ERROR] After correcting the problems, you can resume the build with the command [ERROR] mvn <goals> -rf :spark-streaming_2.11 ``` Should I create a JIRA ticket? Author: Krisztián Szűcs <szucs.krisztian@gmail.com> Closes apache#1890 from kszucs/ARROW-2452 and squashes the following commits: 224e9dd <Krisztián Szűcs> forward all arguments to docker-compose 1d34704 <Krisztián Szűcs> fix order of arguments 2641080 <Krisztián Szűcs> don't pass group- and userid to docker-compose

…d dictionaries This introduces a scalar value class DictionaryValue, which fixes a couple bugs involving dictionaries nested inside of ListArrays or inside of other DictionaryArrays. This also includes a new test, which failed previous to this commit but now passes. See https://issues.apache.org/jira/browse/ARROW-2515. This is my first time contributing, so feedback would be most welcome. Author: Brent Kerby <blkerby@gmail.com> Closes apache#1954 from blkerby/DictionaryValue and squashes the following commits: 1e06963 <Brent Kerby> ARROW-2515: Add DictionaryValue class, fixing bugs with nested dictionaries

…and dictionary array Implement missing Python properties for DictionaryType Author: Marco Neumann <marco.neumann@blue-yonder.com> Closes apache#1951 from crepererum/ARROW-2513 and squashes the following commits: 788179c <Marco Neumann> implement DictionaryType.dictionary b703ecf <Marco Neumann> implement DictionaryType.index_type

wesm · 2018-05-01T04:01:39Z

cpp/src/arrow/compute/kernels/hash.cc

+    COUNT_VALUES_CASE(BinaryType);
+    COUNT_VALUES_CASE(StringType);
+    COUNT_VALUES_CASE(FixedSizeBinaryType);
+    COUNT_VALUES_CASE(Decimal128Type);


Is there a way to promote some code reuse with the other unary (single-argument) hash kernels?

I think so, although I am not sure how to do it. Maybe moving everything to a macro? I am willing to try if somebody could give me some pointers on what's the best way to do it.

The pattern described here might be useful: https://mortoray.com/2014/09/10/using-macros-to-simplify-type-visitors-and-enums/

emkornfield

I guess it is redundant information since Null count is already in the array, but it might be good to either:

Clarify in documentation that null elements are not included in counts
Have an option to include them
Always include them.
(1 seems like a reasonable thing for now with a follow-up JIRA)

emkornfield · 2019-01-24T11:34:44Z

cpp/src/arrow/compute/kernels/hash.h

+/// \note API not yet finalized
+ARROW_EXPORT
+Status CountValues(FunctionContext* context, const Datum& value,
+                   std::shared_ptr<Array>* out_uniques,


it seems more natural to me to have the output type be a struct (but maybe there was discussion on this previously, I guess the existing API had this)?

emkornfield · 2019-01-24T12:07:21Z

Hi @andrioni I'm trying to triage some old bugs and do first pass code reviews. Thanks for the contribution, I think with a rebase it looks ok to me. @xhochy it looks like this JIRA is assigned to you, was there any particular reason for that?

xhochy · 2019-01-24T12:31:38Z

@emkornfield I once wanted to implement this but never got to it. :(

wesm · 2019-01-24T16:56:40Z

This PR may need quite a bit of work since @pitrou has revamped the hash kernels since this was written

emkornfield · 2019-01-30T08:12:03Z

@andrioni do you think you would be able to update the pull request for this release? If not would you mind if it take over?

emkornfield · 2019-02-06T06:14:35Z

@andrioni I'm going to try to pickup were you left off.

emkornfield · 2019-02-07T09:51:28Z

I picked up this up in #3579

andrioni · 2019-02-07T12:38:26Z

Sorry for the delay in answering, and thanks for picking it up!

…ue frequencies Picked up from: #1970 I mostly reused the unit tests as is and modified the rest based on feedback on that PR and adapting to new code. Some other changes I mae: 1. Remove the HashKernel and getter from the public API. I think we can add these back once we have a better idea on how we are doing stateful kernels (i.e. #3407) 2. Add operator[] to NumericBuilder to modify previously added values (this might not be the right place, and I had a little bit of trouble figuring out how to integrate this into the existing TypedTest so the testing is a little bit weak). This seemed better then using a vector to collect values. If this approach looks OK for now. I'm also going to open up a JIRA to try refactor the unittest for Hash kernels (and maybe the headers) since I think there might be a clearer more granular way of laying these out. other things to discuss: 1. Handling null values in Count/Unique (it looks like they are dropped today, should this configurable/turned on). 2. Hashing edge cases for floating point numbers (just opened a JIRA on this). Author: Micah Kornfield <emkornfield@gmail.com> Author: Alessandro Andrioni <alessandroandrioni@gmail.com> Closes #3579 from emkornfield/count_kernel and squashes the following commits: 9c55f7b <Micah Kornfield> make 64 bit dd0d8a1 <Micah Kornfield> fix link and warning 72095eb <Micah Kornfield> Templatize whether to use return status 54afb2b <Micah Kornfield> change from std::copy to memcopy 2973fcc <Micah Kornfield> Address code review comments e7e624b <Micah Kornfield> Add constants, per code review 4770d99 <Micah Kornfield> fix warning c6f6ad7 <Micah Kornfield> address feedback d99e52f <Micah Kornfield> add guard to CopyValue in cases where vector is empty 8c26b01 <Micah Kornfield> fix format b7d5492 <Micah Kornfield> add null test f964bd6 <Micah Kornfield> Rebase e8e58a5 <Micah Kornfield> Address output type code review feedback defb4f1 <Micah Kornfield> remove export from .cc 0152f2f <Micah Kornfield> plumb through status on hash visitors afeb1ad <Micah Kornfield> add real jira 96858bd <Micah Kornfield> Use macro inversion to reduce boiler plate 0dd0077 <Micah Kornfield> minimal test 57349f7 <Micah Kornfield> unit tests passing 34834f7 <Alessandro Andrioni> First try at implementing a CountValues kernel

…ue frequencies Picked up from: apache/arrow#1970 I mostly reused the unit tests as is and modified the rest based on feedback on that PR and adapting to new code. Some other changes I mae: 1. Remove the HashKernel and getter from the public API. I think we can add these back once we have a better idea on how we are doing stateful kernels (i.e. apache/arrow#3407) 2. Add operator[] to NumericBuilder to modify previously added values (this might not be the right place, and I had a little bit of trouble figuring out how to integrate this into the existing TypedTest so the testing is a little bit weak). This seemed better then using a vector to collect values. If this approach looks OK for now. I'm also going to open up a JIRA to try refactor the unittest for Hash kernels (and maybe the headers) since I think there might be a clearer more granular way of laying these out. other things to discuss: 1. Handling null values in Count/Unique (it looks like they are dropped today, should this configurable/turned on). 2. Hashing edge cases for floating point numbers (just opened a JIRA on this). Author: Micah Kornfield <emkornfield@gmail.com> Author: Alessandro Andrioni <alessandroandrioni@gmail.com> Closes #3579 from emkornfield/count_kernel and squashes the following commits: 9c55f7ba6 <Micah Kornfield> make 64 bit dd0d8a155 <Micah Kornfield> fix link and warning 72095ebc4 <Micah Kornfield> Templatize whether to use return status 54afb2bac <Micah Kornfield> change from std::copy to memcopy 2973fccbe <Micah Kornfield> Address code review comments e7e624b1f <Micah Kornfield> Add constants, per code review 4770d9924 <Micah Kornfield> fix warning c6f6ad72f <Micah Kornfield> address feedback d99e52fb7 <Micah Kornfield> add guard to CopyValue in cases where vector is empty 8c26b0154 <Micah Kornfield> fix format b7d54929a <Micah Kornfield> add null test f964bd6da <Micah Kornfield> Rebase e8e58a5b9 <Micah Kornfield> Address output type code review feedback defb4f1a1 <Micah Kornfield> remove export from .cc 0152f2fa5 <Micah Kornfield> plumb through status on hash visitors afeb1ad04 <Micah Kornfield> add real jira 96858bd52 <Micah Kornfield> Use macro inversion to reduce boiler plate 0dd007718 <Micah Kornfield> minimal test 57349f7ea <Micah Kornfield> unit tests passing 34834f711 <Alessandro Andrioni> First try at implementing a CountValues kernel

wesm and others added 30 commits February 22, 2018 12:09

[JS] Fix typo in npm target for esNext/CommonJS. (apache#1645)

bcbcf02

ARROW-2180: [C++] Remove deprecated APIs from 0.8.0 cycle

5f10067

Author: Wes McKinney <wes.mckinney@twosigma.com> Closes apache#1638 from wesm/ARROW-2180 and squashes the following commits: a13931c [Wes McKinney] Remove deprecated C++ APIs from 0.8.0 cycle

ARROW-2132: Add link to Plasma in main README

cdc347c

Author: Wes McKinney <wes.mckinney@twosigma.com> Closes apache#1636 from wesm/ARROW-2132 and squashes the following commits: dbbef5d [Wes McKinney] Add link to Plasma in main README, tweak language

ARROW-2069: [Python] Add note that Plasma is not supported on Windows

27f7eba

Author: Wes McKinney <wes.mckinney@twosigma.com> Closes apache#1641 from wesm/ARROW-2069 and squashes the following commits: e66d1b6 [Wes McKinney] Add note that Plasma is not supported on Windows

[Dev] Follow-up, use angle brackets for commit author instead of squa…

d52f2ff

…re brackets Change-Id: Iad77d14c9116c30249e3e7f13ed40ed82e579832

ARROW-2093: [Python] Do not install PyTorch in Travis CI

c2865d0

Author: Wes McKinney <wes.mckinney@twosigma.com> Closes apache#1637 from wesm/ARROW-2093 and squashes the following commits: e3040a9 <Wes McKinney> Do not install PyTorch in Travis CI

ARROW-2201: [Website] Publish JS API Docs

2f01658

Author: Brian Hulette <brian.hulette@ccri.com> Closes apache#1647 from TheNeuralBit/js-doc-site and squashes the following commits: e5ca4df [Brian Hulette] Add JS docs to website

ARROW-2197: Document C++ ABI issue and workaround

cca4a74

Author: Antoine Pitrou <antoine@python.org> Closes apache#1644 from pitrou/ARROW-2197-doc-undefined-symbol and squashes the following commits: 99db6e0 <Antoine Pitrou> ARROW-2197: Document C++ ABI issue and workaround

ARROW-2191: [C++] Only use specific version of jemalloc

c0b0e33

Author: Uwe L. Korn <uwelk@xhochy.com> Closes apache#1633 from xhochy/ARROW-2191 and squashes the following commits: 5f67465 <Uwe L. Korn> ARROW-2191: Only use specific version of jemalloc

ARROW-2204: Fix TLS errors in manylinux1 build

2fd8f0a

Author: Uwe L. Korn <uwelk@xhochy.com> Closes apache#1655 from xhochy/ARROW-2204 and squashes the following commits: 1a21f17 [Uwe L. Korn] ARROW-2204: Fix TLS errors in manylinux1 build

ARROW-1937: [Python] Document nested array initialization

887e893

Author: Antoine Pitrou <antoine@python.org> Closes apache#1663 from pitrou/ARROW-1937-nested-array-init-doc and squashes the following commits: b85d1b2 <Antoine Pitrou> ARROW-1937: Document nested array initialization

ARROW-2210: [C++] Reset ptr on failed memory allocation

482fc58

Author: Uwe L. Korn <uwelk@xhochy.com> Closes apache#1672 from xhochy/ARROW-2210 and squashes the following commits: 6e5febe <Uwe L. Korn> ARROW-2210: Reset ptr on failed memory allocation

ARROW-2230: [Python] Strip catch-all tag matching from git-describe

d3fabe0

Author: Korn, Uwe <Uwe.Korn@blue-yonder.com> Closes apache#1676 from xhochy/ARROW-2230 and squashes the following commits: b46c1a2 <Korn, Uwe> ARROW-2230: Strip catch-all tag matching from git-describe

pitrou and others added 15 commits April 23, 2018 22:56

ARROW-2498: [Java] Use java 1.8 instead of java 1.7

c574006

Author: Andy Grove <andy.grove@rms.com> Closes apache#1936 from agrove-rms/jdk8 and squashes the following commits: d5dca81 <Andy Grove> remove jdk7 from CI matrix ef01df4 <Andy Grove> use java 1.8 instead of java 1.7

[GLib] Fix a typo

3f5819a

First try at implementing a CountValues kernel

f93a635

Remove commented out prototype

2d99fc6

wesm reviewed May 1, 2018

View reviewed changes

Fix formatting issues

95da385

emkornfield reviewed Jan 24, 2019

View reviewed changes

emkornfield mentioned this pull request Feb 7, 2019

ARROW-1572: [C++] Implement "value counts" kernels for tabulating value frequencies #3579

Closed

andrioni closed this Feb 7, 2019

asfimport mentioned this pull request Mar 14, 2019

[C++] Implement "value counts" kernels for tabulating value frequencies #17585

Closed

ARROW-1572: [C++] Implement "value counts" kernels for tabulating value frequencies #1970

ARROW-1572: [C++] Implement "value counts" kernels for tabulating value frequencies #1970

Uh oh!

Conversation

andrioni commented May 1, 2018

Uh oh!

wesm May 1, 2018

Choose a reason for hiding this comment

Uh oh!

andrioni May 1, 2018

Choose a reason for hiding this comment

Uh oh!

emkornfield Jan 24, 2019

Choose a reason for hiding this comment

Uh oh!

emkornfield left a comment

Choose a reason for hiding this comment

Uh oh!

emkornfield Jan 24, 2019

Choose a reason for hiding this comment

Uh oh!

emkornfield commented Jan 24, 2019

Uh oh!

xhochy commented Jan 24, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

wesm commented Jan 24, 2019

Uh oh!

emkornfield commented Jan 30, 2019

Uh oh!

emkornfield commented Feb 6, 2019

Uh oh!

emkornfield commented Feb 7, 2019

Uh oh!

andrioni commented Feb 7, 2019

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

20 participants

xhochy commented Jan 24, 2019 •

edited

Loading