
fix: move from atomic.(Add|Load|Store) to atomic.Int64{}#326

Merged
zeroshade merged 4 commits into apache:main from sahib:fix/use-atomic-int64
Apr 4, 2025

Conversation

@sahib
Contributor

@sahib sahib commented Mar 22, 2025

Rationale for this change

This is a follow-up to #323.

On 32-bit ARM, 64-bit atomic operations must be 64-bit aligned. A rather large number of places in the code base did not guarantee this, and misaligned atomic accesses cause crashes at runtime.

What changes are included in this PR?

I replaced all uses of atomic.LoadInt64, atomic.StoreInt64 and atomic.AddInt64 with atomic.Int64{} and its methods. This type has built-in alignment, so it does not matter in which order struct fields appear, which makes it generally harder to screw up alignment during changes.
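As a hedged before/after sketch (hypothetical struct, not code from this PR): with `atomic.Int64` the 8-byte alignment guarantee travels with the type itself (it embeds an `align64` marker), so the field order in the containing struct no longer matters.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Hypothetical example of the migration this PR performs.
type refCounted struct {
	flag     bool
	refCount atomic.Int64 // before: `refCount int64` + atomic.AddInt64(&r.refCount, n)
}

func main() {
	var r refCounted
	r.refCount.Store(1) // replaces atomic.StoreInt64
	r.refCount.Add(1)   // replaces atomic.AddInt64
	fmt.Println(r.refCount.Load()) // replaces atomic.LoadInt64; prints 2
}
```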

Are these changes tested?

Briefly, on the 32-bit architecture with the same program mentioned in #323. The best way to move forward here would be to also add a 32-bit CI environment to make sure arrow builds and its tests pass there, but I guess that is out of scope for this PR.

I've noticed that go vet ./... produces many warnings across the project, most of which existed before this PR. The new ones are all of this kind:

arrow/array/dictionary.go:614:42: literal copies lock value from bldr: github.com/apache/arrow-go/v18/arrow/array.dictionaryBuilder contains github.com/apache/arrow-go/v18/arrow/array.builder contains sync/atomic.Int64 contains sync/atomic.noCopy

That is, false positives caused by initializing the value in one struct and then copying it in a constructor function.
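A minimal hedged reproduction of that warning shape (hypothetical names, not arrow's actual code): a base builder holds the counter, and a constructor copies it into a wrapping struct via a composite literal. The program compiles and runs fine, but `go vet` reports "literal copies lock value" because `atomic.Int64` embeds `sync/atomic.noCopy`.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Hypothetical stand-ins for arrow's builder/dictionaryBuilder pattern.
type baseBuilder struct {
	refCount atomic.Int64
}

type dictBuilder struct {
	base baseBuilder
}

func newDictBuilder() *dictBuilder {
	var b baseBuilder
	b.refCount.Store(1)
	// go vet (copylocks): literal copies lock value from b
	return &dictBuilder{base: b}
}

func main() {
	d := newDictBuilder()
	fmt.Println(d.base.refCount.Load())
}
```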

Are there any user-facing changes?

In the hope that everything was done right: no.

@sahib sahib requested a review from zeroshade as a code owner March 22, 2025 15:09
@zeroshade
Member

@sahib can you address the linting and other CI issues here?

@sahib
Contributor Author

sahib commented Mar 29, 2025

@sahib can you address the linting and other CI issues here?

I tried to fix the linting (I think you have to trigger the pipeline explicitly?).

Interestingly, those were not broken by my changes; the template was wrong. I still have at least one failing test locally. I think it fails rightfully so, since strconv.ParseInt() will not parse values like NaN or +Inf. Should it be strconv.ParseFloat(), or should this simply not work for ints?

@sahib
Contributor Author

sahib commented Apr 3, 2025

@zeroshade: I've gone with the more likely option of fixing the test cases.

OT question: how hard do you think it would be to enable the asm optimizations for arm(32) as well? I noticed they help a lot on arm64, and it would be great to have them there too.

@zeroshade
Member

@sahib I haven't tested generating the asm for ARM32 and I don't have an ARM32 system that I could test it on if I were to generate it.

In the ideal scenario, it should be really easy to generate the assembly via adding the entries to the Makefiles, but I'm sure it won't be that simple.

@sahib
Contributor Author

sahib commented Apr 3, 2025

@zeroshade I can give it a try once I get a chance. Maybe finally an excuse to learn Go Assembler for me...
From my understanding it should be mostly compatible and it's easy to test with a raspberry pi (+32bit OS).

I guess this PR can be merged?

@sahib
Contributor Author

sahib commented Apr 4, 2025

@zeroshade A minor thing I also noticed that might be worth sharing:

Most of the asm functions are selected through function variables at runtime (i.e. during init(), different implementations are assigned based on CPU features). Overall a nice approach, but it comes with a small performance penalty, as it prohibits inlining (and maybe some other optimizations?).

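As a hedged sketch of that dispatch pattern (hypothetical names, not arrow's actual code): a package-level function variable is assigned in init() based on detected CPU features, and every call goes through the variable, which the Go compiler cannot inline.

```go
package main

import "fmt"

// Function variable selected at init time; calls through it are
// indirect and therefore not inlinable.
var greaterThanBitmap func(levels []int16, rhs int16) uint64

// Pure-Go fallback: set bit i when levels[i] > rhs.
func greaterThanBitmapGo(levels []int16, rhs int16) uint64 {
	var mask uint64
	for i, v := range levels {
		if v > rhs {
			mask |= 1 << uint(i)
		}
	}
	return mask
}

func init() {
	// Real code would assign an assembly implementation here when the
	// CPU supports it (e.g. via golang.org/x/sys/cpu feature flags).
	greaterThanBitmap = greaterThanBitmapGo
}

func main() {
	// Indices 2 and 3 are greater than 1 -> bits 2 and 3 set.
	fmt.Printf("%b\n", greaterThanBitmap([]int16{0, 1, 2, 3}, 1)) // 1100
}
```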
Consider this mini benchmark:

```go
func BenchmarkTestGreaterThanBitmap(b *testing.B) {
	const N = 10
	levels := make([]int16, N)
	for idx := range levels {
		levels[idx] = int16(idx)
	}

	b.Run("func", func(b *testing.B) {
		for b.Loop() {
			GreaterThanBitmap(levels, int16(N/2))
		}
	})

	b.Run("no-func-go", func(b *testing.B) {
		for b.Loop() {
			greaterThanBitmapGo(levels, int16(N/2))
		}
	})
}
```

```shell
# noasm to make sure that we do not compare against an arch-specific function
$ go test -tags noasm -bench=. -run=xxx
BenchmarkTestGreaterThanBitmap/func-16         	165481227	        7.289 ns/op
BenchmarkTestGreaterThanBitmap/no-func-go-16   	243919600	        4.611 ns/op
```

That difference is of course negligible once the function's runtime grows. But overall it would probably be possible to squeeze out a few percent of benchmark speed by changing those function variables to something along these lines:

```go
func ExtractBits(...) {
    if cpu.X86.HasBMI2 {
        return extractBitsBMI(...)
    }

    return extractBitsGo(...)
}
```

I did push a dummy branch here: https://github.com/sahib/arrow-go/tree/bench/build-tag - the whole platform selection seems to get less convoluted with this approach as well. All in all minor stuff, but I wanted your opinion on it first.

@zeroshade
Member

Yup, I'll approve and merge this.

Most of the asm functions are selected through function variables at runtime (i.e. during init(), different implementations are assigned based on CPU features). Overall a nice approach, but it comes with a small performance penalty, as it prohibits inlining (and maybe some other optimizations?).

If I remember correctly, the Go compiler isn't able to inline assembly functions like this in the first place so we wouldn't be able to inline anyways.

I did push a dummy branch here: https://github.com/sahib/arrow-go/tree/bench/build-tag - seems like the whole platform selection gets less convoluted using this approach as well. All in all more minor stuff, but I wanted your opinion on this first.

I like the idea, as long as the benchmarks in your branch support it and don't cause any performance regressions.

@zeroshade zeroshade merged commit 2b72399 into apache:main Apr 4, 2025
23 checks passed
zeroshade added a commit that referenced this pull request Apr 30, 2025
### Rationale for this change
#326 moved to using `atomic.Int64{}` for the refcount; as a result we found many places where we were copying the value. For linting reasons we should remove those.

### What changes are included in this PR?
Utilizing generics to clean up the dictionary array code significantly,
along with removing most of the cases where we were copying the refcount
which contains a lock.

### Are these changes tested?
Yes, the existing unit tests cover it all.

### Are there any user-facing changes?
Nope.