
Commit 8c877d9

Draft 0.7.0 release post
Change-Id: Ie718ee1b03cf983036e5391f4334a57fb21ec953
1 parent e1d9c7f commit 8c877d9


4 files changed: 225 additions & 24 deletions

Lines changed: 188 additions & 0 deletions
@@ -0,0 +1,188 @@
---
layout: post
title: "Apache Arrow 0.7.0 Release"
date: "2017-09-18 00:00:00 -0400"
author: wesm
categories: [release]
---
<!--
{% comment %}
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to you under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
{% endcomment %}
-->

The Apache Arrow team is pleased to announce the 0.7.0 release. It includes
[**133 resolved JIRAs**][1], comprising many new features and bug fixes across
the various language implementations. The Arrow memory format has remained
stable since the 0.3.x release.

See the [Install Page][2] to learn how to get the libraries for your
platform. The [complete changelog][3] is also available.

We include some highlights from the release in this post.

## New PMC Member: Kouhei Sutou

Since the last release, we have added [Kou][4] to the Arrow Project Management
Committee. He is also a PMC member of Apache Subversion and a major contributor
to many other open source projects.

As an active member of the Ruby community in Japan, Kou has been developing the
GLib-based C bindings for Arrow, with associated Ruby wrappers, so that Ruby
users can benefit from the work happening in Apache Arrow.

We are excited to be collaborating with the Ruby community on shared
infrastructure for in-memory analytics and data science.

## Expanded JavaScript (TypeScript) Implementation

[Paul Taylor][5] from the [Falcor][7] and [ReactiveX][6] projects has worked to
expand the JavaScript implementation (which is written in TypeScript), using
the latest modern JavaScript build and packaging technology. We are looking
forward to building out the JS implementation and bringing it to feature parity
with the C++ and Java implementations.

We are looking for more JavaScript developers to join the project and work
together to make Arrow for JS work well with many kinds of front-end use cases,
like real-time data visualization.

## Type Casting for C++ and Python

As part of longer-term efforts to build an Arrow-native in-memory analytics
library, we implemented a variety of type conversion functions. These functions
are essential in ETL tasks when conforming one table schema to another. They
are similar to the `astype` method of NumPy arrays.

```python
In [17]: import pyarrow as pa

In [18]: arr = pa.array([True, False, None, True])

In [19]: arr
Out[19]:
<pyarrow.lib.BooleanArray object at 0x7ff6fb069b88>
[
True,
False,
NA,
True
]

In [20]: arr.cast(pa.int32())
Out[20]:
<pyarrow.lib.Int32Array object at 0x7ff6fb0383b8>
[
1,
0,
NA,
1
]
```

Over time, these functions will expand to support as many input and output type
combinations as possible, with optimized conversions.
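
As a rough illustration of organizing conversions by input and output type, the sketch below models a column as a Python list with `None` for nulls and looks up a per-value kernel in a table keyed by `(input type, output type)`. It is plain Python, not how pyarrow implements casts; the type names, `CAST_KERNELS`, `cast`, and `conform` are hypothetical. Nulls pass through unchanged, mirroring the `NA` entry in the `cast` example above.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical registry for illustration: (input type, output type) -> per-value kernel.
CAST_KERNELS: Dict[Tuple[str, str], Callable[[object], object]] = {
    ("bool", "int32"): lambda v: 1 if v else 0,
    ("int32", "bool"): lambda v: v != 0,
    ("int32", "float64"): float,
}

Column = List[Optional[object]]


def cast(values: Column, from_type: str, to_type: str) -> Column:
    """Cast one column; nulls (None) pass through unchanged."""
    if from_type == to_type:
        return list(values)
    kernel = CAST_KERNELS.get((from_type, to_type))
    if kernel is None:
        raise NotImplementedError(f"no cast from {from_type} to {to_type}")
    return [None if v is None else kernel(v) for v in values]


def conform(
    table: Dict[str, Tuple[str, Column]], target: Dict[str, str]
) -> Dict[str, Tuple[str, Column]]:
    """Cast each column named in `target` to the type the target schema asks for."""
    return {
        name: (to_type, cast(table[name][1], table[name][0], to_type))
        for name, to_type in target.items()
    }


if __name__ == "__main__":
    print(cast([True, False, None, True], "bool", "int32"))  # [1, 0, None, 1]
    table = {"flag": ("bool", [True, None]), "n": ("int32", [3, 4])}
    print(conform(table, {"flag": "int32", "n": "float64"}))
    # {'flag': ('int32', [1, None]), 'n': ('float64', [3.0, 4.0])}
```

Keying kernels by type pair lets coverage grow one combination at a time, and `conform` shows how the same table-driven casts could map one schema onto another in an ETL step.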

## New Arrow GPU (CUDA) Extension Library for C++

To help with GPU-related projects using Arrow, like the [GPU Open Analytics
Initiative][8], we have started a C++ add-on library to simplify Arrow memory
management on CUDA-enabled graphics cards. We would like to expand this to
include a library of reusable CUDA kernel functions for GPU analytics on Arrow
columnar memory.

For example, we could write a record batch from CPU memory to GPU device memory
like so (some error checking omitted):

```c++
#include <arrow/api.h>
#include <arrow/gpu/cuda_api.h>

using namespace arrow;

gpu::CudaDeviceManager* manager;
std::shared_ptr<gpu::CudaContext> context;

// kGpuNumber is a placeholder for the index of the target CUDA device
gpu::CudaDeviceManager::GetInstance(&manager);
manager->GetContext(kGpuNumber, &context);

// GetCpuData() is a placeholder for however the host-side batch is produced
std::shared_ptr<RecordBatch> batch = GetCpuData();

// Serialize the batch into a single buffer allocated in device memory
std::shared_ptr<gpu::CudaBuffer> device_serialized;
gpu::SerializeRecordBatch(*batch, context.get(), &device_serialized);
```

We can then "read" the GPU record batch, but the returned `arrow::RecordBatch`
will internally contain GPU device pointers that you can use for CUDA kernel
calls:

```c++
std::shared_ptr<RecordBatch> device_batch;
gpu::ReadRecordBatch(batch->schema(), device_serialized,
                     default_memory_pool(), &device_batch);

// Now run some CUDA kernels on device_batch
```
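
The CUDA API above is C++, but the general pattern (serialize a batch into one contiguous buffer, then read it back as views into that buffer rather than copies) can be sketched in plain Python. This is only an analogy under simplifying assumptions: a host `bytearray` stands in for device memory, `memoryview` stands in for device pointers, the header layout is invented here, and `serialize_batch`/`read_batch` are hypothetical helpers, not Arrow APIs.

```python
import struct
from typing import Dict, List

# Invented layout (native byte order, 4-byte ints), for illustration only:
#   header:  num_columns, then per column: name_len, name bytes, num_values
#   padding: zero bytes up to an 8-byte boundary
#   body:    each column's int32 values, back to back


def serialize_batch(columns: Dict[str, List[int]]) -> bytearray:
    """Pack named int32 columns into one contiguous buffer."""
    header = bytearray(struct.pack("=i", len(columns)))
    body = bytearray()
    for name, values in columns.items():
        encoded = name.encode("utf-8")
        header += struct.pack("=i", len(encoded)) + encoded
        header += struct.pack("=i", len(values))
        body += struct.pack(f"={len(values)}i", *values)
    header += b"\x00" * (-len(header) % 8)  # pad so the data starts 8-byte aligned
    return header + body


def read_batch(buffer: bytearray) -> Dict[str, memoryview]:
    """Return one int32 view per column; each view points into `buffer` (no copy)."""
    view = memoryview(buffer)
    (num_columns,) = struct.unpack_from("=i", buffer, 0)
    offset = 4
    layout = []
    for _ in range(num_columns):
        (name_len,) = struct.unpack_from("=i", buffer, offset)
        offset += 4
        name = bytes(view[offset:offset + name_len]).decode("utf-8")
        offset += name_len
        (num_values,) = struct.unpack_from("=i", buffer, offset)
        offset += 4
        layout.append((name, num_values))
    offset += -offset % 8  # skip the same padding the writer added
    columns = {}
    for name, num_values in layout:
        nbytes = 4 * num_values
        columns[name] = view[offset:offset + nbytes].cast("i")
        offset += nbytes
    return columns


if __name__ == "__main__":
    buf = serialize_batch({"a": [1, 2, 3], "b": [-5, 7]})
    cols = read_batch(buf)
    print(cols["a"].tolist(), cols["b"].tolist())  # [1, 2, 3] [-5, 7]
    cols["a"][0] = 42  # writes through to `buf`
    print(read_batch(buf)["a"].tolist())  # [42, 2, 3]
```

Writing through `cols["a"]` changes `buf` itself, which is the property that lets a reader hand out views (or, in the CUDA case, device pointers) without copying the serialized data.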

## Decimal Integration Tests

[Phillip Cloud][9] has been working on decimal support in C++ to enable reading
and writing Parquet decimal data in C++ and Python, as well as end-to-end
testing against the Arrow Java libraries.
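
Decimal interoperability between implementations hinges on agreeing on the byte layout. As a hedged sketch, assuming the layout described in the current Arrow columnar format specification (an unscaled 128-bit two's-complement integer in 16 bytes, here taken as little-endian, the common case; the schema records the actual byte order, with precision and scale kept in the type metadata), a value can be encoded and decoded with Python's standard `decimal` module. `encode_decimal128` and `decode_decimal128` are hypothetical helper names.

```python
from decimal import Decimal, localcontext

MAX_PRECISION = 38  # the most decimal digits a 128-bit decimal type can hold


def encode_decimal128(value: Decimal, scale: int) -> bytes:
    """Store `value` as an unscaled 128-bit two's-complement integer, little-endian."""
    with localcontext() as ctx:
        ctx.prec = MAX_PRECISION
        unscaled = value.scaleb(scale)  # e.g. 123.45 at scale 2 -> 12345
    if unscaled != unscaled.to_integral_value():
        raise ValueError(f"{value} has more than {scale} fractional digits")
    return int(unscaled).to_bytes(16, byteorder="little", signed=True)


def decode_decimal128(data: bytes, scale: int) -> Decimal:
    """Inverse of encode_decimal128: rebuild the value from 16 bytes plus the scale."""
    unscaled = int.from_bytes(data, byteorder="little", signed=True)
    with localcontext() as ctx:
        ctx.prec = MAX_PRECISION
        return Decimal(unscaled).scaleb(-scale)


if __name__ == "__main__":
    raw = encode_decimal128(Decimal("123.45"), scale=2)
    print(raw.hex())  # 39300000000000000000000000000000
    print(decode_decimal128(raw, scale=2))  # 123.45
```

Because the scale is not stored in the 16 bytes, a round trip only works if both sides read the same scale from the type metadata, which is exactly what cross-language integration tests check.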

In the upcoming releases, we hope to complete the remaining data types that
need end-to-end testing between Java and C++:

* Fixed-size lists (variable-size lists are already implemented)
* Fixed-size binary
* Unions
* Maps
* Time intervals
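
Part of what distinguishes the fixed-size types in this list is their physical layout. As a rough pure-Python illustration (helper names are made up, validity bitmaps are left out, and the Arrow columnar format specification is the authoritative reference), a variable-size binary column needs an offsets buffer to locate each value, whereas a fixed-size binary column of width `w` finds value `i` at byte `i * w` with no offsets at all.

```python
from typing import List, Tuple


def build_variable_binary(values: List[bytes]) -> Tuple[List[int], bytes]:
    """Variable-size layout: n + 1 offsets plus one concatenated data buffer."""
    offsets = [0]
    for v in values:
        offsets.append(offsets[-1] + len(v))
    return offsets, b"".join(values)


def get_variable(offsets: List[int], data: bytes, i: int) -> bytes:
    """Value i spans data[offsets[i]:offsets[i + 1]]."""
    return data[offsets[i]:offsets[i + 1]]


def build_fixed_binary(values: List[bytes], width: int) -> bytes:
    """Fixed-size layout: values packed back to back; the width lives in the type."""
    if any(len(v) != width for v in values):
        raise ValueError(f"every value must be exactly {width} bytes")
    return b"".join(values)


def get_fixed(data: bytes, width: int, i: int) -> bytes:
    """Value i is found purely by arithmetic on its index."""
    return data[i * width:(i + 1) * width]


if __name__ == "__main__":
    offsets, data = build_variable_binary([b"ab", b"", b"cde"])
    print(offsets, get_variable(offsets, data, 2))  # [0, 2, 2, 5] b'cde'
    fixed = build_fixed_binary([b"ab", b"cd", b"ef"], width=2)
    print(get_fixed(fixed, 2, 1))  # b'cd'
```

Fixed-size lists follow the same idea one level up: with a list size of `n`, the values for slot `i` start at child index `i * n`, so no offsets buffer is needed.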

## Other Notable Python Changes

Some highlights of Python development outside of bug fixes and general API
improvements include:

* Simplified `put` and `get` of arbitrary Python objects in the Plasma object store
* Object serialization functions: LINK TO DOCS
* New `flavor='spark'` option to `pyarrow.parquet.write_table` for easily
  writing Parquet files with maximum Spark compatibility
* `parquet.write_to_dataset` function with support for partitioning
* Improved support for Dask filesystems
* Improved usability for IPC (schema, record batch read/write)
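
Partitioned writing splits a table into files under one directory per combination of partition values. As a hedged, pure-Python sketch of that directory scheme (assuming the Hive-style `column=value` naming; `partition_paths` is a hypothetical helper, not part of pyarrow), grouping rows by the partition columns could look like this:

```python
from collections import defaultdict
from typing import Dict, List, Sequence

Row = Dict[str, object]


def partition_paths(rows: List[Row], partition_cols: Sequence[str]) -> Dict[str, List[Row]]:
    """Group rows into Hive-style `col=value/...` subdirectories.

    The partition columns are dropped from the grouped rows, since their
    values are encoded in the directory path instead.
    """
    groups: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        subdir = "/".join(f"{col}={row[col]}" for col in partition_cols)
        rest = {k: v for k, v in row.items() if k not in partition_cols}
        groups[subdir].append(rest)
    return dict(groups)


if __name__ == "__main__":
    rows = [
        {"year": 2017, "month": 8, "value": 1.0},
        {"year": 2017, "month": 9, "value": 2.0},
        {"year": 2017, "month": 9, "value": 3.0},
    ]
    for subdir, part in partition_paths(rows, ["year", "month"]).items():
        print(subdir, part)
    # year=2017/month=8 [{'value': 1.0}]
    # year=2017/month=9 [{'value': 2.0}, {'value': 3.0}]
```

Each key would become a subdirectory holding one or more Parquet files, which lets readers skip whole directories when filtering on a partition column.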

## The Road Ahead

Upcoming Arrow releases will continue to expand the project to cover more use
cases. In addition to completing end-to-end testing for all the major data
types, some of us will be shifting attention to building Arrow-native in-memory
analytics libraries.

We are looking for more JavaScript, R, and other programming language
developers to join the project and expand the available implementations and
bindings to more languages.

[1]: https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20(Resolved%2C%20Closed)%20AND%20fixVersion%20%3D%200.7.0
[2]: http://arrow.apache.org/install
[3]: http://arrow.apache.org/release/0.7.0.html
[4]: https://github.com/kou
[5]: https://github.com/trxcllnt
[6]: http://reactivex.io
[7]: https://github.com/netflix/falcor
[8]: http://gpuopenanalytics.com/
[9]: http://github.com/cpcloud

site/_release/index.md

Lines changed: 2 additions & 0 deletions
@@ -26,6 +26,7 @@ limitations under the License.
 
 Navigate to the release page for downloads and the changelog.
 
+* [0.7.0 (17 September 2017)][8]
 * [0.6.0 (14 August 2017)][7]
 * [0.5.0 (23 July 2017)][6]
 * [0.4.1 (9 June 2017)][5]
@@ -41,3 +42,4 @@ Navigate to the release page for downloads and the changelog.
 [5]: {{ site.baseurl }}/release/0.4.1.html
 [6]: {{ site.baseurl }}/release/0.5.0.html
 [7]: {{ site.baseurl }}/release/0.6.0.html
+[8]: {{ site.baseurl }}/release/0.7.0.html

site/index.html

Lines changed: 19 additions & 8 deletions
@@ -7,26 +7,37 @@ <h1>Apache Arrow</h1>
 <p class="lead">Powering Columnar In-Memory Analytics</p>
 <p>
 <a class="btn btn-lg btn-success" href="mailto:dev-subscribe@arrow.apache.org" role="button">Join Mailing List</a>
-<a class="btn btn-lg btn-primary" href="{{ site.baseurl }}/install/" role="button">Install (0.6.0 Release - August 14, 2017)</a>
+<a class="btn btn-lg btn-primary" href="{{ site.baseurl }}/install/" role="button">Install (0.7.0 Release - September 17, 2017)</a>
 </p>
 </div>
-<h4><strong>Latest News</strong>: <a href="{{ site.baseurl }}/blog/">Apache Arrow 0.6.0 release</a></h4>
+<h4><strong>Latest News</strong>: <a href="{{ site.baseurl }}/blog/">Apache Arrow 0.7.0 release</a></h4>
 <div class="row">
 <div class="col-lg-4">
 <h2>Fast</h2>
-<p>Apache Arrow&#8482; enables execution engines to take advantage of the latest SIM
-D (Single input multiple data) operations included in modern processors, for native vectorized optimization of analytical data processing. Columnar layout of data also allows for a better use of CPU caches by placing all data relevant to a column operation in as compact of a format
-as possible.</p>
+<p>Apache Arrow&#8482; enables execution engines to take advantage of
+the latest SIMD (Single Instruction, Multiple Data) operations included in modern
+processors, for native vectorized optimization of analytical data
+processing. Columnar layout is optimized for data locality for better
+performance on modern hardware like CPUs and GPUs.</p>
+
 <p>The Arrow memory format supports <strong>zero-copy reads</strong>
 for lightning-fast data access without serialization overhead.</p>
+
 </div>
 <div class="col-lg-4">
 <h2>Flexible</h2>
-<p>Arrow acts as a new high-performance interface between various systems. It is also focused on supporting a wide variety of industry-standard programming languages. Java, C, C++, Python, Ruby, and JavaScript implementations are in progress and more languages are welcome.</p>
+<p>Arrow acts as a new high-performance interface between various
+systems. It is also focused on supporting a wide variety of
+industry-standard programming languages. Java, C, C++, Python, Ruby,
+and JavaScript implementations are in progress and more languages are
+welcome.</p>
 </div>
 <div class="col-lg-4">
 <h2>Standard</h2>
-<p>Apache Arrow is backed by key developers of 13 major open source projects, including Calcite, Cassandra, Drill, Hadoop, HBase, Ibis, Impala, Kudu, Pandas, Parquet, Phoenix, Spark, and Storm making it the de-facto standard for columnar in-memory analytics.</p>
+<p>Apache Arrow is backed by key developers of 13 major open source
+projects, including Calcite, Cassandra, Drill, Hadoop, HBase, Ibis,
+Impala, Kudu, Pandas, Parquet, Phoenix, Spark, and Storm, making it
+the de facto standard for columnar in-memory analytics.</p>
 </div>
 </div> <!-- close "row" div -->
 
@@ -41,7 +52,7 @@ <h2>Advantages of a Common Data Layer</h2>
 <img src="img/copy2.png" alt="common data layer" style="width:100%" />
 <ul>
 <li>Each system has its own internal memory format</li>
-<li>70-80% CPU wasted on serialization and deserialization</li>
+<li>70-80% computation wasted on serialization and deserialization</li>
 <li>Similar functionality implemented in multiple projects</li>
 </ul>
 </div>

site/install.md

Lines changed: 16 additions & 16 deletions
@@ -20,17 +20,17 @@ limitations under the License.
 {% endcomment %}
 -->
 
-## Current Version: 0.6.0
+## Current Version: 0.7.0
 
-### Released: 14 August 2017
+### Released: 17 September 2017
 
 See the [release notes][10] for more about what's new.
 
 ### Source release
 
-* **Source Release**: [apache-arrow-0.6.0.tar.gz][6]
-* **Verification**: [md5][3], [asc][7]
-* [Git tag b173334][2]
+* **Source Release**: [apache-arrow-0.7.0.tar.gz][6]
+* **Verification**: [sha512][3], [asc][7]
+* [Git tag 97f9029][2]
 
 ### Java Packages
 
@@ -52,19 +52,19 @@ Install them with:
 
 
 ```shell
-conda install arrow-cpp=0.6.* -c conda-forge
-conda install pyarrow==0.6.* -c conda-forge
+conda install arrow-cpp=0.7.* -c conda-forge
+conda install pyarrow==0.7.* -c conda-forge
 ```
 
 ### Python Wheels on PyPI (Unofficial)
 
 We have provided binary wheels on PyPI for Linux, macOS, and Windows:
 
 ```shell
-pip install pyarrow==0.6.*
+pip install pyarrow==0.7.*
 ```
 
-We recommend pinning `0.6.*` in `requirements.txt` to install the latest patch
+We recommend pinning `0.7.*` in `requirements.txt` to install the latest patch
 release.
 
 These include the Apache Arrow and Apache Parquet C++ binary libraries bundled
@@ -149,13 +149,13 @@ conda install arrow-cpp -c twosigma
 conda install pyarrow -c twosigma
 ```
 
-[1]: https://www.apache.org/dyn/closer.cgi/arrow/arrow-0.6.0/
-[2]: https://github.com/apache/arrow/releases/tag/apache-arrow-0.6.0
-[3]: https://www.apache.org/dyn/closer.cgi/arrow/arrow-0.6.0/apache-arrow-0.6.0.tar.gz.md5
-[4]: http://search.maven.org/#search%7Cga%7C1%7Cg%3A%22org.apache.arrow%22%20AND%20v%3A%220.6.0%22
+[1]: https://www.apache.org/dyn/closer.cgi/arrow/arrow-0.7.0/
+[2]: https://github.com/apache/arrow/releases/tag/apache-arrow-0.7.0
+[3]: https://www.apache.org/dyn/closer.cgi/arrow/arrow-0.7.0/apache-arrow-0.7.0.tar.gz.sha512
+[4]: http://search.maven.org/#search%7Cga%7C1%7Cg%3A%22org.apache.arrow%22%20AND%20v%3A%220.7.0%22
 [5]: http://conda-forge.github.io
-[6]: https://www.apache.org/dyn/closer.cgi/arrow/arrow-0.6.0/apache-arrow-0.6.0.tar.gz
-[7]: https://www.apache.org/dyn/closer.cgi/arrow/arrow-0.6.0/apache-arrow-0.6.0.tar.gz.asc
+[6]: https://www.apache.org/dyn/closer.cgi/arrow/arrow-0.7.0/apache-arrow-0.7.0.tar.gz
+[7]: https://www.apache.org/dyn/closer.cgi/arrow/arrow-0.7.0/apache-arrow-0.7.0.tar.gz.asc
 [8]: https://github.com/red-data-tools/parquet-glib
 [9]: https://github.com/red-data-tools/arrow-packages
-[10]: http://arrow.apache.org/release/0.6.0.html
+[10]: http://arrow.apache.org/release/0.7.0.html
