fix: move from atomic.(Add|Load|Store) to atomic.Int64{}#326
fix: move from atomic.(Add|Load|Store) to atomic.Int64{}#326zeroshade merged 4 commits intoapache:mainfrom
Conversation
|
@sahib can you address the linting and other CI issues here? |
I tried to fix the linting (I think you have to trigger the pipeline explicitly?). Interestingly, those were not broken due to my changes, but the template was wrong. I still have (at least one) one failing test locally. I think this fails rightfully so, since |
|
@zeroshade: I've gone with the more likely option of fixing the test cases. OT question: How hard would you think would it be to enable the asm optimizations also for arm(32)? I noticed they do help a lot on arm64 and would be great to have them there as well. |
|
@sahib I haven't tested generating the asm for ARM32 and I don't have an ARM32 system that I could test it on if I were to generate it. In the ideal scenario, it should be really easy to generate the assembly via adding the entries to the Makefiles, but I'm sure it won't be that simple. |
|
@zeroshade I can give it a try once I get a chance. Maybe finally an excuse to learn Go Assembler for me... I guess this PR can be merged? |
|
@zeroshade More a minor thing I also noticed and might be worth sharing: Most of the asm functions are selected over function variables during runtime (i.e. during Considering this mini benchmark: func BenchmarkTestGreaterThanBitmap(b *testing.B) {
const N = 10
levels := make([]int16, N)
for idx := range levels {
levels[idx] = int16(idx)
}
b.Run("func", func(b *testing.B) {
for b.Loop() {
GreaterThanBitmap(levels, int16(N/2))
}
})
b.Run("no-func-go", func(b *testing.B) {
for b.Loop() {
greaterThanBitmapGo(levels, int16(N/2))
}
})
}# noasm to make sure that we do not compare against arch specific function
$ go test -tags noasm -bench=. -run=xxx
BenchmarkTestGreaterThanBitmap/func-16 165481227 7.289 ns/op
BenchmarkTestGreaterThanBitmap/no-func-go-16 243919600 4.611 ns/opThat difference of course is negligible if the function runtime increases. But overall it would be probably possible to squeeze out a few percent of benchmark speed when changing those function values to something along the lines: func ExtractBits(...) {
if cpu.X86.HasBMI2 {
return extractBitsGo(...)
}
return extractBitsBMI()
}I did push a dummy branch here: https://github.com/sahib/arrow-go/tree/bench/build-tag - seems like the whole platform selection gets less convoluted using this approach as well. All in all more minor stuff, but I wanted your opinion on this first. |
|
Yup, I'll approve and merge this.
If I remember correctly, the Go compiler isn't able to inline assembly functions like this in the first place so we wouldn't be able to inline anyways.
I like the idea, as long as the benchmarks in your branch support the idea and don't cause any performance regressions |
### Rationale for this change #326 moved to using `atomic.Int64{]` for the refcount, as a result we found many places that we were copying the value. For linting reasons we should remove those. ### What changes are included in this PR? Utilizing generics to clean up the dictionary array code significantly, along with removing most of the cases where we were copying the refcount which contains a lock. ### Are these changes tested? Yes, the existing unit tests cover it all. ### Are there any user-facing changes? Nope.
Rationale for this change
This is a follow-up to #323.
ARM 32-bit requires atomic operations to be 32-bit aligned. This has not been the case in a rather large amount of cases in the code base. Failing to align causes crashes during runtime.
What changes are included in this PR?
I replaced all uses of
atomic.LoadInt64,atomic.StoreInt64andatomic.AddInt64withatomic.Int64{}and its methods. This type has built-in alignment, so it does not matter in which order struct fields appear, which makes it generally harder to screw up alignment during changes.Are these changes tested?
Briefly on the 32-bit architecture with the same program mentioned in #323. The best way to move forward here would be to also add a 32bit CI environment to make sure
arrowbuilds and tests there. But I guess this is out of scope for this PR.I've noticed that
go vet ./...produces many warnings over the project, most of them already there before this PR. The new ones are all of the kind:I.e. false positives due to initialization in another struct and then copying it in a constructor function.
Are there any user-facing changes?
In the hope that everything was done right: no.