ARROW-971: [C++/Python] Implement Array.IsValid/IsNull #1378
Conversation
Author: Kouhei Sutou <kou@clear-code.com> Closes apache#1092 from kou/glib-travis-macos and squashes the following commits: 291808b [Kouhei Sutou] [GLib] Use Xcode 8.3 on Travis CI
If you use `@rpath` for install_name (the default), you can use the DYLD_LIBRARY_PATH environment variable to find libarrow.dylib. But the DYLD_LIBRARY_PATH environment variable isn't inherited by subprocesses under System Integrity Protection (SIP), which makes libarrow.dylib difficult to use. You can use a full-path install_name with the -DARROW_INSTALL_NAME_RPATH=OFF CMake option; with it, libarrow.dylib can be found without the DYLD_LIBRARY_PATH environment variable. Author: Kouhei Sutou <kou@clear-code.com> Closes apache#1100 from kou/cpp-macos-support-install-name and squashes the following commits: 8207ace [Kouhei Sutou] [C++] Support building with full path install_name on macOS
Install packages in temporary directory in MSVC build verification script. I found that the script did not work due to the remnants of the last time I ran it. Author: Wes McKinney <wes.mckinney@twosigma.com> Closes apache#1101 from wesm/ARROW-1542 and squashes the following commits: 0718370 [Wes McKinney] Install packages in temporary directory in MSVC build verification script
Since we're accumulating a bunch of components, I started this script, which we can refine to make verifying releases easier for others. I bootstrapped some pieces off https://github.com/apache/parquet-cpp/blob/master/dev/release/verify-release-candidate, very helpful! This script:

* Checks GPG signature, checksums
* Sets up a temporary Python install for the duration of these tests
* Builds/installs C++ and runs tests (with Python and Plasma)
* Builds parquet-cpp against the Arrow RC
* Python (with Parquet and Plasma extensions)
* C GLib (requires Ruby in PATH and the gems indicated in README)
* Integration tests
* JavaScript (requires NodeJS >= 6.0.0)

There are some potentially snowflake-y aspects to my environment:

* BOOST_ROOT is set to a Boost install location containing libraries built with `-fPIC`. I'm not sure what to do about this one; one maybe better option is to use system-level Boost and shared libraries
* Maven 3.3.9 is in PATH
* NodeJS 6.11.3 is in PATH

There are probably some other things that Linux users will run into as they run this script. I had to compile GLib libraries in this since the ones at system level (Ubuntu 14.04) are too old. cc @kou @xhochy Author: Wes McKinney <wes.mckinney@twosigma.com> Closes apache#1102 from wesm/ARROW-559 and squashes the following commits: 8fd6530 [Wes McKinney] Use Boost shared libraries 3531927 [Wes McKinney] Add note to dev/README.md 079b5e4 [Wes McKinney] Fix comments 17f7ac0 [Wes McKinney] More fixes, finally works adb3146 [Wes McKinney] More work on release verification script 86ef171 [Wes McKinney] Start Linux release verification script
Closes apache#1107
Ubuntu 14.04 ships GLib 2.40. Author: Kouhei Sutou <kou@clear-code.com> Closes apache#1106 from kou/glib-support-glib-2.40-again and squashes the following commits: cbcdf9a [Kouhei Sutou] [GLib] Support GLib 2.40 again
Resolves https://issues.apache.org/jira/browse/ARROW-1544 Author: Paul Taylor <paul.e.taylor@me.com> Closes apache#1103 from trxcllnt/js-export-vector-typedefs and squashes the following commits: 91a0625 [Paul Taylor] use gulp 4 from github. thought 4-alpha was on npm already. e5a1034 [Paul Taylor] fix jest test coverage script c6b09ee [Paul Taylor] export Vector types on root Arrow export 032ad27 [Paul Taylor] add compileOnSave (now required by TS 2.5?) eb96552 [Paul Taylor] update dependencies
Add "Common build problems" section in the README.md of c_glib. Add some detailed explanation of common build problems, especially on macOS, because it requires some tweaks. Author: Wataru Shimizu <waruzilla@gmail.com> Closes apache#1104 from wagavulin/build-troubleshooting and squashes the following commits: 9b65542 [Wataru Shimizu] Improve format and the explanation of installing/linking autoconf archive on macOS. b6c5274 [Wataru Shimizu] Add "Common build problems" section in the README.md of c_glib
`append_values()` is for appending values in bulk. `append_nulls()` is for appending nulls in bulk. Author: Kouhei Sutou <kou@clear-code.com> Closes apache#1110 from kou/glib-support-bulk-append-in-builder and squashes the following commits: 4926031 [Kouhei Sutou] [GLib] Support bulk append in builder
Explicitly close owned file handles in ParquetWriter.close to avoid Windows flakiness. I can reproduce this failure locally, but I'm unsure why this just now started happening. The 0.7.0 release build passed (https://ci.appveyor.com/project/ApacheSoftwareFoundation/arrow/build/1.0.3357/job/477b1iicmwuy51l8) and there haven't been related code changes since then. Either way, it's better to close the sink explicitly. Author: Wes McKinney <wes.mckinney@twosigma.com> Closes apache#1114 from wesm/ARROW-1550 and squashes the following commits: 863827c [Wes McKinney] Check status 7248c79 [Wes McKinney] Explicitly close owned file handles in ParquetWriter.close to avoid flakiness on Windows
I drafted a post to publish tomorrow. If anyone would like to make some changes or additions please post a link to a git commit here for me to cherry pick cc @kou @trxcllnt @pcmoritz I think we should write a whole blog post about the object serialization functions. The perf wins over pickle when working with large datasets are a pretty big deal Author: Wes McKinney <wes.mckinney@twosigma.com> Closes apache#1111 from wesm/ARROW-1551 and squashes the following commits: 3e05047 [Wes McKinney] Update publication date to 19 September a9f8770 [Wes McKinney] More edits, links 8c877d9 [Wes McKinney] Draft 0.7.0 release post
realloc should consider the existing buffer capacity for computing target memory requirement. cc @jacques-n, this is the same as apache#1097; the latter was closed as I had to rename the branch correctly and use the correct JIRA number. Author: siddharth <siddharth@dremio.com> Closes apache#1112 from siddharthteotia/ARROW-1533 and squashes the following commits: 4c97be4 [siddharth] ARROW-1533: realloc should consider the existing buffer capacity for computing target memory requirement
Problem: Typically there are three ways of specifying the amount of memory needed for vectors.

* CASE (1) `allocateNew()` – the application doesn't really specify the size of memory or value count. Each vector type has a default value count (4096), and therefore a default size (in bytes) is used in such cases. For example, for a 4-byte fixed-width vector, we will allocate 32KB of memory for a call to `allocateNew()`.
* CASE (2) `setInitialCapacity(count)` followed by `allocateNew()` – in this case too, the application doesn't specify the value count or size in `allocateNew()`. However, the call to `setInitialCapacity()` dictates the amount of memory the subsequent call to `allocateNew()` will allocate. For example, we can do `setInitialCapacity(1024)` and the call to `allocateNew()` will allocate 4KB of memory for the 4-byte fixed-width vector.
* CASE (3) `allocateNew(count)` – the application is specific about its requirements.

For nullable vectors, the above calls also allocate the memory for the validity vector. The problem is that BitVector uses a default memory size of 4096 bytes; in other words, we allocate a vector for a 4096*8 value count. In the default case (as explained above), the vector types have a value count of 4096, so we need only 4096 bits (512 bytes) in the bit vector, not 4096 bytes. This happens in CASE (1), where the application depends on the default memory allocation; in such cases the buffer for the bit vector is 8x larger than actually needed, as the arithmetic check below confirms. Author: siddharth <siddharth@dremio.com> Closes apache#1109 from siddharthteotia/ARROW-1547 and squashes the following commits: c92164a [siddharth] addressed review comments f3d1234 [siddharth] ARROW-1547: Fix 8x memory over-allocation in BitVector
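A quick arithmetic check of the 8x figure, in plain Python rather than the Java vector API:

```python
# For the default value count, one validity bit is needed per value.
value_count = 4096
bytes_needed = value_count // 8     # 512 bytes of validity bits
bytes_allocated = value_count       # the bug: the count was treated as a byte size
print(bytes_allocated // bytes_needed)  # -> 8, the over-allocation factor
```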
Author: Deepak Majeti <deepak.majeti@hpe.com> Closes apache#1105 from majetideepak/ARROW-1536 and squashes the following commits: 9f4ed61 [Deepak Majeti] Review comments d49e1aa [Deepak Majeti] Fix failure 055dc30 [Deepak Majeti] ARROW-1536:[C++] Do not transitively depend on libboost_system
VC14 runtime may need to be installed on Windows. Close apache#819 (tidying) Author: Wes McKinney <wes.mckinney@twosigma.com> Closes apache#1115 from wesm/ARROW-1554 and squashes the following commits: a7c3e27 [Wes McKinney] Update Sphinx install page to note that VC14 runtime may need to be installed separately when using pip on Windows
Implement setInitialCapacity for MapWriter and pass on this capacity during lazy creation of child vectors cc @jacques-n , @StevenMPhillips Author: siddharth <siddharth@dremio.com> Closes apache#1113 from siddharthteotia/ARROW-1553 and squashes the following commits: 5a759be [siddharth] ARROW-1553: Implement setInitialCapacity for MapWriter and pass on this capacity during lazy creation of child vectors
We now raise a ValueError when the length of the names doesn't match the length of the arrays.

```python
In [1]: import pyarrow as pa

In [2]: pa.Table.from_arrays([pa.array([1, 2]), pa.array([3, 4])], names=['a', 'b', 'c'])
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-2-cda803f3f774> in <module>()
----> 1 pa.Table.from_arrays([pa.array([1, 2]), pa.array([3, 4])], names=['a', 'b', 'c'])
table.pxi in pyarrow.lib.Table.from_arrays()
table.pxi in pyarrow.lib._schema_from_arrays()
ValueError: Length of names (3) does not match length of arrays (2)
```

This affected `RecordBatch.from_arrays` and `Table.from_arrays`. Author: Tom Augspurger <tom.w.augspurger@gmail.com> Author: Wes McKinney <wes.mckinney@twosigma.com> Closes apache#1117 from TomAugspurger/validate-names and squashes the following commits: 4df6f59 [Tom Augspurger] REF: avoid redundant len calculation 965a560 [Wes McKinney] Fix test failure exposed in test_parquet.py ed74d52 [Tom Augspurger] ARROW-1557 [Python] Validate names length in Table.from_arrays
Author: Li Jin <ice.xelloss@gmail.com> Closes apache#1067 from icexelloss/json-reader-ARROW-1497 and squashes the following commits: 6d4e1df [Li Jin] Fix JsonReader to read union vectors correctly
Do not ignore return value from truncate in MemoryMappedFile::Create. Author: Amir Malekpour <a.malekpour@gmail.com> Closes apache#1116 from amirma/arrow-1500 and squashes the following commits: 689aaa9 [Amir Malekpour] ARROW-1500: [C++] Do not ignore return value from truncate in MemoryMappedFile::Create
…_script stage to fail faster Author: Wes McKinney <wes.mckinney@twosigma.com> Closes apache#1118 from wesm/ARROW-1578 and squashes the following commits: 0bb5202 [Wes McKinney] System python not available on xcode 6.4 machines d1cf679 [Wes McKinney] Set language: python when linting on macOS 910f684 [Wes McKinney] Fixes for linting. Do not cache .conda_packages ed9e23a [Wes McKinney] Move linting to separate shell script b7db083 [Wes McKinney] Only run lint checks when not running in --only-library mode 7e50fad [Wes McKinney] Revert cpplint failure 28fc3fb [Wes McKinney] Typo 329f017 [Wes McKinney] Run lint checks before compiling anything. Make cpplint warning
Author: Uwe L. Korn <uwelk@xhochy.com> Closes apache#1121 from xhochy/ARROW-1591 and squashes the following commits: 0b3a11a [Uwe L. Korn] ARROW-1591: C++: Xcode 9 is not correctly detected
This makes the child fields of ListVector have consistent names of `ListVector.DATA_VECTOR_NAME`. Previously, an empty ListVector would have a child name of `ZeroVector.name` which is "[DEFAULT]". Author: Bryan Cutler <cutlerb@gmail.com> Author: Steven Phillips <steven@dremio.com> Closes apache#1119 from BryanCutler/java-ListVector-child-name-ARROW-1347 and squashes the following commits: c240378 [Bryan Cutler] changed to use instanceof and added test 2923a45 [Steven Phillips] ARROW-1347: [JAVA] return consistent child field name for List vectors
…broken builds One of the dependencies installed in the docs requirements is causing NumPy to get downgraded by the SAT solver, and this is then causing an ABI conflict with the pyarrow build (which was built with a different version of NumPy). This installs everything in one `conda install` call Author: Wes McKinney <wes.mckinney@twosigma.com> Closes apache#1123 from wesm/ARROW-1595 and squashes the following commits: 60b05ad [Wes McKinney] Install conda dependencies all at once, pin NumPy version
This PR fixes the Table generics to infer the types from the call site. @wesm this PR also includes the fixes to the prepublish script I mentioned yesterday. Author: Paul Taylor <paul.e.taylor@me.com> Closes apache#1120 from trxcllnt/fix-ts-typings and squashes the following commits: 73d8eee [Paul Taylor] make package the default gulp task 1d269fe [Paul Taylor] flow table method generics dd1e819 [Paul Taylor] more defensively typed reader internal values ac6a778 [Paul Taylor] add comments explaining ARROW-1363 reader workaround e37f885 [Paul Taylor] fix gulp and prepublish scripts 58fa201 [Paul Taylor] enforce exact dependency package versions
Author: Kouhei Sutou <kou@clear-code.com> Closes apache#1122 from kou/glib-add-uint-array-builder and squashes the following commits: 24bb9a7 [Kouhei Sutou] [GLib] Add missing "unsigned" fd23f24 [Kouhei Sutou] [GLib] Fix build error on macOS 5b59775 [Kouhei Sutou] [GLib] Add UIntArrayBuilder
Even though a fixed object ID is used in the implementation, the comment says a random object ID is created. Author: Kentaro Hayashi <hayashi@clear-code.com> Closes apache#1124 from kenhys/arrow-1598 and squashes the following commits: dc5934e [Kentaro Hayashi] ARROW-1598: [C++] Fix diverged code comment in plasma tutorial
Use internal::BitmapReader in lieu of macros. @xhochy since this is causing the crash reported in ARROW-1601, we may want to do a patch release 0.7.1 and parquet-cpp 1.3.1. Author: Wes McKinney <wes.mckinney@twosigma.com> Closes apache#1126 from wesm/ARROW-1601 and squashes the following commits: 6cec81c [Wes McKinney] Fix RleDecoder logic with BitmapReader ba58b8a [Wes McKinney] Fix test name fa47865 [Wes McKinney] Add BitmapReader class to replace the bitset macros
* updated file path to brew install for repos dir
* added information about bundled wheel build
Author: Kouhei Sutou <kou@clear-code.com> Closes apache#1361 from kou/glib-dictionary-data-type and squashes the following commits: 6ccce1f [Kouhei Sutou] [GLib] Add GArrowDictionaryDataType
This closes [ARROW-1758](https://issues.apache.org/jira/browse/ARROW-1758). Author: Licht-T <licht-t@outlook.jp> Author: Wes McKinney <wes.mckinney@twosigma.com> Closes apache#1347 from Licht-T/clean-pickle-option-for-object-serialization and squashes the following commits: 927f154 [Wes McKinney] Use cloudpickle for lambda serialization if available ba998dd [Licht-T] CLN: Remove pickle=True option for object serialization
…der, Table.to_batches method This also fixes ARROW-504 by adding a chunksize option when writing tables to a RecordBatch stream in Python Author: Wes McKinney <wes.mckinney@twosigma.com> Closes apache#1364 from wesm/ARROW-1178 and squashes the following commits: a31e258 [Wes McKinney] Add chunksize argument to RecordBatchWriter.write_table dc6023a [Wes McKinney] Implement Table.to_batches, add tests
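A short usage sketch of the two additions; the method names come from the commit messages above, while the exact keyword-argument name is an assumption:

```python
import pyarrow as pa

table = pa.Table.from_arrays([pa.array(list(range(10)))], names=['x'])

# Split the table into RecordBatch objects of at most 4 rows each.
batches = table.to_batches(chunksize=4)

# ARROW-504: write a table to a RecordBatch stream in chunks of a given size.
sink = pa.BufferOutputStream()
writer = pa.RecordBatchStreamWriter(sink, table.schema)
writer.write_table(table, chunksize=4)
writer.close()
```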
This makes for a more convenient / less rigid API without the need for as many usages of `reinterpret_cast<const uint8_t*>`. This does not impact downstream projects (e.g. parquet-cpp is unaffected) unless they provide implementations of these virtual interfaces. Author: Wes McKinney <wes.mckinney@twosigma.com> Closes apache#1363 from wesm/ARROW-1850 and squashes the following commits: af5a348 [Wes McKinney] Update glib, arrow-gpu for API changes 5d5cf2d [Wes McKinney] Use void* / const void* for buffers in file APIs
…erialized Python object with minimal allocation
For systems (like Dask) that prefer to handle their own framed buffer transport, this provides a list of memoryview-compatible objects with minimal copying / allocation from the input data structure, which can similarly be zero-copy reconstructed to the original object.
To motivate the use case, consider a dict of ndarrays:
```
data = {i: np.random.randn(1000, 1000) for i in range(50)}
```
Here, we have:
```
>>> %timeit serialized = pa.serialize(data)
52.7 µs ± 1.01 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```
This is about 400MB of data. Some systems may not want to double memory by assembling this into a single large buffer, like with the `to_buffer` method:
```
>>> written = serialized.to_buffer()
>>> written.size
400015456
```
We provide a `to_components` method which returns a dict with a `'data'` field containing a list of `pyarrow.Buffer` objects. This can be converted back to the original Python object using `pyarrow.deserialize_components`:
```
>>> %timeit components = serialized.to_components()
73.8 µs ± 812 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>> list(components.keys())
['num_buffers', 'data', 'num_tensors']
>>> len(components['data'])
101
>>> type(components['data'][0])
pyarrow.lib.Buffer
```
and
```
>>> %timeit recons = pa.deserialize_components(components)
93.6 µs ± 260 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```
The reason there are 101 data components (1 + 2 * 50) is that:
* 1 buffer for the serialized Union stream representing the object
* 2 buffers for each of the tensors: 1 for the metadata and 1 for the tensor body. The body is separate so that this is zero-copy from the input
Next step after this is ARROW-1784 which is to transport a pandas.DataFrame using this mechanism
cc @pitrou @jcrist @mrocklin
Author: Wes McKinney <wes.mckinney@twosigma.com>
Closes apache#1362 from wesm/ARROW-1783 and squashes the following commits:
4ec5a89 [Wes McKinney] Add missing decref on error
e8c76d4 [Wes McKinney] Acquire GIL in GetSerializedFromComponents
1d2e0e2 [Wes McKinney] Fix function documentation
fffc7bb [Wes McKinney] Typos, add deserialize_components to API
50d2fee [Wes McKinney] Finish componentwise serialization roundtrip
58174dd [Wes McKinney] More progress, stubs for reconstruction
b1e31a3 [Wes McKinney] Draft GetTensorMessage
337e1d2 [Wes McKinney] Draft SerializedPyObject::GetComponents
598ef33 [Wes McKinney] Tweak
This removes non-nullable vectors that are no longer part of the vector class hierarchy and renames Nullable*Vector classes to remove the Nullable prefix. Author: Bryan Cutler <cutlerb@gmail.com> Closes apache#1341 from BryanCutler/java-nullable-vector-rename-ARROW-1710 and squashes the following commits: 7d930dc [Bryan Cutler] fixed realloc test ff2120d [Bryan Cutler] clean up test 374dfcc [Bryan Cutler] properly rename BitVector file 6b7a85e [Bryan Cutler] remove old BitVector.java before rebase 089f7fc [Bryan Cutler] some minor cleanup 4e580d9 [Bryan Cutler] removed legacy BitVector 74f771f [Bryan Cutler] fixed remaining tests 8c5dfef [Bryan Cutler] fix naming in support classes 6e498e5 [Bryan Cutler] removed nullable prefix dfed444 [Bryan Cutler] removed non-nullable vectors
Test CastKernel writing into output array with non-zero offset. This uncovered some bugs. I inspected the other kernels that are untested, and while they look fine, at some point we may want to add more extensive unit tests for this. Author: Wes McKinney <wes.mckinney@twosigma.com> Closes apache#1369 from wesm/ARROW-1735 and squashes the following commits: de41d92 [Wes McKinney] Test CastKernel writing into output array with non-zero offset
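The non-zero-offset case arises whenever a kernel operates on a sliced array; a minimal Python sketch of the situation being tested (assuming the `Array.cast` binding for the cast kernels):

```python
import pyarrow as pa

arr = pa.array([1, 2, 3, 4], type=pa.int64())
sliced = arr.slice(1)  # logical offset into the shared buffer is now 1

# The cast kernel must honor the slice offset when reading input and
# writing output; the bug class tested here is getting that wrong.
print(sliced.cast(pa.float64()))  # -> [2, 3, 4] as float64
```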
**Just posting this for discussion.** See the preceding discussion on https://issues.apache.org/jira/browse/ARROW-1854. I think the ideal way to solve this would actually be to improve our handling of lists, which should be possible given that pickle seems to outperform us by 6x according to the benchmarks in https://issues.apache.org/jira/browse/ARROW-1854. Note that the implementation in this PR will not handle numpy arrays of user-defined classes because it will not fall back to cloudpickle when needed. cc @pcmoritz @wesm Author: Wes McKinney <wes.mckinney@twosigma.com> Author: Robert Nishihara <robertnishihara@gmail.com> Closes apache#1360 from robertnishihara/numpyobject and squashes the following commits: c37a0a0 [Wes McKinney] Fix flake 5191503 [Wes McKinney] Fix post rebase 43f2c80 [Wes McKinney] Add SerializationContext.clone method. Add pandas_serialization_context member that uses pickle for NumPy arrays with unsupported tensor types c944023 [Wes McKinney] Use pickle.HIGHEST_PROTOCOL, convert to Buffer then memoryview for more memory-efficient transport cf719c3 [Robert Nishihara] Use pickle to serialize numpy arrays of objects.
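For illustration, a minimal sketch of routing a type through pickle with the pyarrow serialization API of this era; the `register_type` call and `context=` keyword reflect that API, but treat the exact signatures as assumptions:

```python
import pickle
import numpy as np
import pyarrow as pa

context = pa.SerializationContext()
# Route all ndarrays through pickle for this sketch; the actual change
# only falls back to pickle for object-dtype arrays, keeping the fast
# zero-copy tensor path for numeric dtypes.
context.register_type(
    np.ndarray, 'np.ndarray',
    custom_serializer=lambda a: pickle.dumps(a, protocol=pickle.HIGHEST_PROTOCOL),
    custom_deserializer=pickle.loads)

arr = np.array(['a', {'b': 1}, None], dtype=object)
buf = pa.serialize(arr, context=context).to_buffer()
roundtripped = pa.deserialize(buf, context=context)
```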
Support selecting nested Parquet fields by any path prefix. Author: Wes McKinney <wes.mckinney@twosigma.com> Closes apache#1366 from wesm/ARROW-1684 and squashes the following commits: e63e42a [Wes McKinney] Support selecting nested Parquet fields by any path prefix
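As an illustration of the behavior (the file and column names here are hypothetical):

```python
import pyarrow.parquet as pq

# Suppose example.parquet has a nested column 'metadata' with child
# fields 'metadata.created' and 'metadata.author'. Passing the path
# prefix selects every field underneath it.
table = pq.read_table('example.parquet', columns=['metadata'])
```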
…ength strings. I also fixed a bug this surfaced in the hash table resize (unit test coverage was not adequate). Now we have:

```
$ ./release/compute-benchmark
Run on (8 X 4200.16 MHz CPU s)
2017-11-28 18:33:53
Benchmark                                                          Time        CPU  Iterations
-------------------------------------------------------------------------------------------------
BM_BuildDictionary/min_time:1.000                               1352 us    1352 us        1038  2.88639GB/s
BM_BuildStringDictionary/min_time:1.000                         3994 us    3994 us         351  75.5809MB/s
BM_UniqueInt64NoNulls/16M/50/min_time:1.000/real_time          35814 us   35816 us          39  3.49023GB/s
BM_UniqueInt64NoNulls/16M/1024/min_time:1.000/real_time       119656 us  119660 us          12  1069.73MB/s
BM_UniqueInt64NoNulls/16M/10k/min_time:1.000/real_time        174924 us  174930 us           8  731.747MB/s
BM_UniqueInt64NoNulls/16M/1024k/min_time:1.000/real_time      448425 us  448440 us           3  285.443MB/s
BM_UniqueInt64WithNulls/16M/50/min_time:1.000/real_time        49511 us   49513 us          29  2.52468GB/s
BM_UniqueInt64WithNulls/16M/1024/min_time:1.000/real_time     134519 us  134523 us          10  951.541MB/s
BM_UniqueInt64WithNulls/16M/10k/min_time:1.000/real_time      191331 us  191336 us           7  668.999MB/s
BM_UniqueInt64WithNulls/16M/1024k/min_time:1.000/real_time    533597 us  533613 us           3  239.882MB/s
BM_UniqueString10bytes/16M/50/min_time:1.000/real_time        150731 us  150736 us           9  1061.5MB/s
BM_UniqueString10bytes/16M/1024/min_time:1.000/real_time      256929 us  256938 us           5  622.739MB/s
BM_UniqueString10bytes/16M/10k/min_time:1.000/real_time       414412 us  414426 us           3  386.09MB/s
BM_UniqueString10bytes/16M/1024k/min_time:1.000/real_time    1744253 us 1744308 us           1  91.7298MB/s
BM_UniqueString100bytes/16M/50/min_time:1.000/real_time       563890 us  563909 us           2  2.77093GB/s
BM_UniqueString100bytes/16M/1024/min_time:1.000/real_time     704695 us  704720 us           2  2.21727GB/s
BM_UniqueString100bytes/16M/10k/min_time:1.000/real_time      995685 us  995721 us           2  1.56927GB/s
BM_UniqueString100bytes/16M/1024k/min_time:1.000/real_time   3584108 us 3584230 us           1  446.415MB/s
```

We can also refactor the hash table implementations without worrying too much about whether we're making things slower. Author: Wes McKinney <wes.mckinney@twosigma.com> Closes apache#1370 from wesm/ARROW-1844 and squashes the following commits: 638f1a1 [Wes McKinney] Decrease resize load factor to 0.5 2885c64 [Wes McKinney] Multiply bytes processed by state.iterations() f7b3619 [Wes McKinney] Add initial Unique benchmarks for int64, strings
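The kernel exercised by these benchmarks is exposed in Python as `Array.unique`; a minimal sketch of that usage (assuming the binding wraps this hash-kernel path):

```python
import pyarrow as pa

arr = pa.array(['foo', 'bar', 'foo', None, 'baz', 'bar'])
# unique() hashes variable-length strings much like the BM_UniqueString*
# benchmarks above; nulls correspond to the WithNulls variants.
print(arr.unique())  # distinct values, e.g. ['foo', 'bar', 'baz']
```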
JIRA: https://issues.apache.org/jira/browse/ARROW-1869 This PR fixes a spelling error in the class name for `LowCostIdentityHashMap`. Follow-up for apache#1150. Author: Ivan Sadikov <ivan.sadikov@team.telstra.com> Closes apache#1372 from sadikovi/fix-low-cost-identity-hash-map and squashes the following commits: e3529f6 [Ivan Sadikov] fix low cost identity hash map name
Author: Kouhei Sutou <kou@clear-code.com> Closes apache#1365 from kou/glib-dictionary-array and squashes the following commits: 83bfa13 [Kouhei Sutou] [GLib] Add GArrowDictionaryArray
CI on Windows failed because of:

```
c:\projects\arrow\cpp\src\arrow\util\logging.h(138): error C2220: warning treated as error - no 'object' file generated [C:\projects\arrow\cpp\build\src\arrow\arrow_static.vcxproj]
"C:\projects\arrow\cpp\build\INSTALL.vcxproj" (default target) (1) ->
"C:\projects\arrow\cpp\build\ALL_BUILD.vcxproj" (default target) (3) ->
"C:\projects\arrow\cpp\build\src\arrow\python\arrow_python_shared.vcxproj" (default target) (17) ->
"C:\projects\arrow\cpp\build\src\arrow\arrow_shared.vcxproj" (default target) (18) ->
c:\projects\arrow\cpp\src\arrow\util\logging.h(138): error C2220: warning treated as error - no 'object' file generated [C:\projects\arrow\cpp\build\src\arrow\arrow_shared.vcxproj]

    79 Warning(s)
    2 Error(s)

Time Elapsed 00:13:59.08
```
Author: Kouhei Sutou <kou@clear-code.com> Closes apache#1377 from kou/glib-unique and squashes the following commits: 4385e22 [Kouhei Sutou] Add garrow_array_unique()
…lues. The last upgrade of the Jackson JSON library changed behavior to no longer allow reading of "NaN" values by default. This change configures the JSON generator and parser to allow NaN values (unquoted) alongside standard floating-point numbers. A test was added for JSON writing/reading, and the test for the Arrow file and stream was modified. Author: Bryan Cutler <cutlerb@gmail.com> Closes apache#1375 from BryanCutler/java-JsonReader-all_non_numeric-ARROW-1817 and squashes the following commits: 4c4682a [Bryan Cutler] configure JsonWriter to write NaN not as strings, add test for read and write of float with NaN 1fa24f4 [Bryan Cutler] added conf for JacksonParser to allow NaN tokens
I need to think a bit about this one. The Python API side is OK, but on the C++ side we might want to make this kernel-like.

@wesm So you mean this should be implemented as something reusable by other components?

Right, this computation might be a unit of work in some more general computational pipeline, so it would be useful for this to be implemented with a similar API to other array kernel functions.

@Licht-T What's the status of this PR? Do you plan to move this to a kernel?

This is marked for the 0.12 release. We should rewrite it as a kernel.

Closing as stale; let us revisit in an upcoming release cycle.
This closes ARROW-971.
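For reference, the functionality discussed in this PR is exposed in later pyarrow releases as array methods; a minimal sketch of that eventual usage:

```python
import pyarrow as pa

arr = pa.array([1, None, 3])
# Element-wise validity checks, each returning a boolean array.
print(arr.is_null())   # -> [false, true, false]
print(arr.is_valid())  # -> [true, false, true]
```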