Skip to content

Commit 788c8f2

Browse files
authored
GH-43186: [Go] Use auto-aligned atomic int64 for pqarrow pathbuilders (#43206)
<!-- Thanks for opening a pull request! If this is your first pull request you can find detailed information on how to contribute here: * [New Contributor's Guide](https://arrow.apache.org/docs/dev/developers/guide/step_by_step/pr_lifecycle.html#reviews-and-merge-of-the-pull-request) * [Contributing Overview](https://arrow.apache.org/docs/dev/developers/overview.html) If this is not a [minor PR](https://github.com/apache/arrow/blob/main/CONTRIBUTING.md#Minor-Fixes). Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose Opening GitHub issues ahead of time contributes to the [Openness](http://theapacheway.com/open/#:~:text=Openness%20allows%20new%20users%20the,must%20happen%20in%20the%20open.) of the Apache Arrow project. Then could you also rename the pull request title in the following format? GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY} or MINOR: [${COMPONENT}] ${SUMMARY} In the case of PARQUET issues on JIRA the title also supports: PARQUET-${JIRA_ISSUE_ID}: [${COMPONENT}] ${SUMMARY} --> ### Rationale for this change Fixes: #43186 Improves 32-bit support for the pqarrow writer. We may want to push similar changes to other refcount implementations to more completely support running on 32-bit system. <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> ### What changes are included in this PR? Update `pathBuilder` and `multipathLevelBuilder` refCounts to use `atomic.Int64` which is automatically aligned on 32-bit systems. <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> ### Are these changes tested? The issue reproduces with existing tests when building for a 32-bit architecture, so no new tests were added. This PR adds a "test" step to the existing workflow for 386 architecture builds, currently limited to run the tests fixed in this PR. <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> ### Are there any user-facing changes? 32-bit systems should no longer panic when writing parquet files. <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please uncomment the line below and explain which changes are breaking. --> <!-- **This PR includes breaking changes to public APIs.** --> <!-- Please uncomment the line below (and provide explanation) if the changes fix either (a) a security vulnerability, (b) a bug that caused incorrect or invalid data to be produced, or (c) a bug that causes a crash (even when the API contract is upheld). We use this to highlight fixes to issues that may affect users without their knowledge. For this reason, fixing bugs that cause errors don't count, since those are usually obvious. --> <!-- **This PR contains a "Critical Fix".** --> * GitHub Issue: #43186
1 parent 2ae192b commit 788c8f2

4 files changed

Lines changed: 68 additions & 13 deletions

File tree

.github/workflows/go.yml

Lines changed: 8 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -168,8 +168,8 @@ jobs:
168168
python3 -m pip install benchadapt@git+https://github.com/conbench/conbench.git@main#subdirectory=benchadapt/python
169169
python3 ci/scripts/go_bench_adapt.py
170170
171-
build386:
172-
name: Go Cross-build for 386
171+
build_test_386:
172+
name: Go Cross-build and test for 386
173173
runs-on: ubuntu-latest
174174
if: ${{ !contains(github.event.pull_request.title, 'WIP') }}
175175
timeout-minutes: 20
@@ -188,9 +188,12 @@ jobs:
188188
cache: true
189189
cache-dependency-path: go/go.sum
190190
- name: Run build
191-
run: |
192-
cd go
193-
GOARCH=386 go build ./...
191+
run: GOARCH=386 go build ./...
192+
working-directory: ./go
193+
- name: Run test
194+
# WIP refactor, only tests in the specified dirs have been fixed
195+
run: GOARCH=386 go test ./parquet/file/...
196+
working-directory: ./go
194197

195198
docker_cgo:
196199
name: AMD64 Debian 12 Go ${{ matrix.go }} - CGO

go/internal/utils/ref_count.go

Lines changed: 26 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,26 @@
1+
// Licensed to the Apache Software Foundation (ASF) under one
2+
// or more contributor license agreements. See the NOTICE file
3+
// distributed with this work for additional information
4+
// regarding copyright ownership. The ASF licenses this file
5+
// to you under the Apache License, Version 2.0 (the
6+
// "License"); you may not use this file except in compliance
7+
// with the License. You may obtain a copy of the License at
8+
//
9+
// http://www.apache.org/licenses/LICENSE-2.0
10+
//
11+
// Unless required by applicable law or agreed to in writing, software
12+
// distributed under the License is distributed on an "AS IS" BASIS,
13+
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
14+
// See the License for the specific language governing permissions and
15+
// limitations under the License.
16+
17+
package utils
18+
19+
import "sync/atomic"
20+
21+
// NewRefCount creates a new atomic counter set to the specified initial value.
22+
func NewRefCount(initial int64) *atomic.Int64 {
23+
var val atomic.Int64
24+
val.Store(initial)
25+
return &val
26+
}
Lines changed: 25 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,25 @@
1+
// Licensed to the Apache Software Foundation (ASF) under one
2+
// or more contributor license agreements. See the NOTICE file
3+
// distributed with this work for additional information
4+
// regarding copyright ownership. The ASF licenses this file
5+
// to you under the Apache License, Version 2.0 (the
6+
// "License"); you may not use this file except in compliance
7+
// with the License. You may obtain a copy of the License at
8+
//
9+
// http://www.apache.org/licenses/LICENSE-2.0
10+
//
11+
// Unless required by applicable law or agreed to in writing, software
12+
// distributed under the License is distributed on an "AS IS" BASIS,
13+
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
14+
// See the License for the specific language governing permissions and
15+
// limitations under the License.
16+
17+
//go:build !noasm
18+
// +build !noasm
19+
20+
package bmi
21+
22+
func init() {
23+
funclist.extractBits = extractBitsGo
24+
funclist.gtbitmap = greaterThanBitmapGo
25+
}

go/parquet/pqarrow/path_builder.go

Lines changed: 9 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -25,6 +25,7 @@ import (
2525
"github.com/apache/arrow/go/v17/arrow/array"
2626
"github.com/apache/arrow/go/v17/arrow/memory"
2727
"github.com/apache/arrow/go/v17/internal/bitutils"
28+
"github.com/apache/arrow/go/v17/internal/utils"
2829
"github.com/apache/arrow/go/v17/parquet/internal/encoding"
2930
"golang.org/x/xerrors"
3031
)
@@ -301,15 +302,15 @@ type pathBuilder struct {
301302
paths []pathInfo
302303
nullableInParent bool
303304

304-
refCount int64
305+
refCount *atomic.Int64
305306
}
306307

307308
func (p *pathBuilder) Retain() {
308-
atomic.AddInt64(&p.refCount, 1)
309+
p.refCount.Add(1)
309310
}
310311

311312
func (p *pathBuilder) Release() {
312-
if atomic.AddInt64(&p.refCount, -1) == 0 {
313+
if p.refCount.Add(-1) == 0 {
313314
for idx := range p.paths {
314315
p.paths[idx].primitiveArr.Release()
315316
p.paths[idx].primitiveArr = nil
@@ -498,15 +499,15 @@ type multipathLevelBuilder struct {
498499
data arrow.ArrayData
499500
builder pathBuilder
500501

501-
refCount int64
502+
refCount *atomic.Int64
502503
}
503504

504505
func (m *multipathLevelBuilder) Retain() {
505-
atomic.AddInt64(&m.refCount, 1)
506+
m.refCount.Add(1)
506507
}
507508

508509
func (m *multipathLevelBuilder) Release() {
509-
if atomic.AddInt64(&m.refCount, -1) == 0 {
510+
if m.refCount.Add(-1) == 0 {
510511
m.data.Release()
511512
m.data = nil
512513
m.builder.Release()
@@ -516,10 +517,10 @@ func (m *multipathLevelBuilder) Release() {
516517

517518
func newMultipathLevelBuilder(arr arrow.Array, fieldNullable bool) (*multipathLevelBuilder, error) {
518519
ret := &multipathLevelBuilder{
519-
refCount: 1,
520+
refCount: utils.NewRefCount(1),
520521
rootRange: elemRange{int64(0), int64(arr.Data().Len())},
521522
data: arr.Data(),
522-
builder: pathBuilder{nullableInParent: fieldNullable, paths: make([]pathInfo, 0), refCount: 1},
523+
builder: pathBuilder{nullableInParent: fieldNullable, paths: make([]pathInfo, 0), refCount: utils.NewRefCount(1)},
523524
}
524525
if err := ret.builder.Visit(arr); err != nil {
525526
return nil, err

0 commit comments

Comments
 (0)