@sgilmore10 sgilmore10 commented Sep 7, 2023

Rationale for this change

Currently, the only way to convert an arrow.array.ChunkedArray into a corresponding MATLAB array is to (1) iterate over the chunks manually, (2) call toMATLAB on each chunk, and (3) concatenate the converted chunks into one contiguous MATLAB array.

It would be helpful to add a toMATLAB method to arrow.array.ChunkedArray that abstracts away all of these steps.
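
For reference, the manual workaround described above might look like the following sketch. It assumes the ChunkedArray exposes a NumChunks property and a chunk(idx) method for retrieving an individual chunk; those names are not stated in this PR and should be checked against the Arrow MATLAB interface in use.

```matlab
% Build a two-chunk ChunkedArray (same data as the example in this PR).
a = arrow.array([1 2 NaN 4 5]);
b = arrow.array([6 7 8 9 NaN 11]);
c = arrow.array.ChunkedArray.fromArrays(a, b);

% (1) Iterate chunk by chunk and (2) convert each chunk.
% NumChunks and chunk() are assumed names, see the note above.
converted = cell(c.NumChunks, 1);
for ii = 1:c.NumChunks
    converted{ii} = toMATLAB(chunk(c, ii));
end

% (3) Concatenate the converted chunks into one column vector.
data = vertcat(converted{:});
```

The proposed toMATLAB method on ChunkedArray is intended to replace this loop with a single call.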

What changes are included in this PR?

  1. Added a toMATLAB method to the arrow.array.ChunkedArray class.
  2. Added an abstract preallocateMATLABArray method to the arrow.type.Type class. The ChunkedArray toMATLAB method uses it to pre-allocate a MATLAB array of the expected class and shape. This is necessary to ensure toMATLAB returns the correct MATLAB array when the ChunkedArray has zero chunks. Had toMATLAB instead stored the result of converting each chunk in a cell array and then concatenated the values, it would return a 0x0 double array for zero-chunk arrays, regardless of the Arrow type. The pre-allocation approach avoids this issue.
  3. Implemented preallocateMATLABArray on all arrow.type.Type classes.
  4. Added an abstract class, arrow.type.NumericType, from which all classes representing numeric data types inherit. NumericType implements preallocateMATLABArray for its subclasses.
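
The zero-chunk problem from item 2 can be reproduced with plain MATLAB, without any Arrow objects. The sketch below uses an empty cell array as a stand-in for the per-chunk toMATLAB results of a hypothetical int32 ChunkedArray with zero chunks; the variable names and the copy loop are illustrative and are not the code merged in this PR.

```matlab
% Stand-in for the per-chunk results of a zero-chunk int32 ChunkedArray
% (hypothetical data; no Arrow objects are involved).
converted = cell(0, 1);

% Approach 1: concatenate the per-chunk results.
% With no chunks, vertcat receives no inputs and returns [],
% so the int32 type information is lost.
naive = vertcat(converted{:});
fprintf('%s %s\n', class(naive), mat2str(size(naive)));

% Approach 2: pre-allocate from the known type and total length,
% then copy each chunk into its slice of the output.
numElements = sum(cellfun(@numel, converted));
data = zeros(numElements, 1, 'int32');
startIdx = 1;
for ii = 1:numel(converted)
    endIdx = startIdx + numel(converted{ii}) - 1;
    data(startIdx:endIdx) = converted{ii};
    startIdx = endIdx + 1;
end
fprintf('%s %s\n', class(data), mat2str(size(data)));
```

The first approach yields a 0x0 double, while the pre-allocated result keeps the int32 class and a column shape even though the loop never runs.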

Are these changes tested?

Yes. Added unit tests to tChunkedArray.m.

Are there any user-facing changes?

Yes. Users can now call toMATLAB on ChunkedArrays.

Example

>> a = arrow.array([1 2 NaN 4 5]);
>> b = arrow.array([6 7 8 9 NaN 11]);
>> c = arrow.array.ChunkedArray.fromArrays(a, b);
>> data = toMATLAB(c)

data =

     1
     2
   NaN
     4
     5
     6
     7
     8
     9
   NaN
    11

@kevingurney kevingurney left a comment

Looks good! Thank you!

@github-actions github-actions bot added the awaiting changes and awaiting change review labels and removed the awaiting review and awaiting changes labels Sep 7, 2023
@github-actions github-actions bot added the awaiting merge label and removed the awaiting change review label Sep 7, 2023
@kevingurney

+1

@kevingurney kevingurney merged commit 65e2f22 into apache:main Sep 7, 2023
@kevingurney kevingurney deleted the GH-37597 branch September 7, 2023 17:46
@kevingurney kevingurney removed the awaiting merge label Sep 7, 2023
@conbench-apache-arrow

After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit 65e2f22.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about possible false positives for unstable benchmarks that are known to sometimes produce them.

loicalleyne pushed a commit to loicalleyne/arrow that referenced this pull request Nov 13, 2023
…dArray` class (apache#37613)

* Closes: apache#37597

Authored-by: Sarah Gilmore <sgilmore@mathworks.com>
Signed-off-by: Kevin Gurney <kgurney@mathworks.com>
dgreiss pushed a commit to dgreiss/arrow that referenced this pull request Feb 19, 2024
…dArray` class (apache#37613)

Development

Successfully merging this pull request may close these issues.

[MATLAB] Add toMATLAB method to arrow.array.ChunkedArray class
