GH-50043: [C++][Python] Fix hash_any/hash_all on sliced boolean arrays #50094
I wonder if we have the same bug elsewhere in aggregate kernels (e.g. for different types). It's out of scope here, but do you have any input, @fenfeng9?
I don't see the same kind of offset issue in nearby aggregate kernels for other data types. I'll take a closer look separately, and if I find another case, I'll file a new issue.
Perfect, thank you! This looks good to me now. Will merge if green.
Thank you for your quick and patient review.
Thanks again @fenfeng9!
After merging your PR, Conbench analyzed the 3 benchmarking runs that have been run so far on merge-commit 95cb5d0. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 25 possible false positives for unstable benchmarks that are known to sometimes produce them.
Rationale for this change
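hash_any and hash_all could return incorrect results for sliced nullable boolean arrays.
The kernels applied the slice offset when reading the validity bitmap, but not when reading the boolean values bitmap, so values were read from the start of the underlying buffer instead of from the start of the slice.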
What changes are included in this PR?
Apply the slice offset when reading boolean values in hash_any/hash_all.
Add C++ and Python regression tests.
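
To illustrate the offset mismatch described above, here is a minimal pure-Python sketch. It is not the Arrow implementation: `pack_bits`, `get_bit`, and `grouped_any_all` are hypothetical helpers that model LSB-first bitmaps and a grouped any/all loop that skips nulls, and the `apply_offset_to_values` flag is an assumption used only to switch between the old and the fixed read.

```python
def pack_bits(bits):
    """Pack booleans into an LSB-first bitmap (the bit order Arrow bitmaps use)."""
    out = bytearray((len(bits) + 7) // 8)
    for i, bit in enumerate(bits):
        if bit:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


def get_bit(bitmap, i):
    """Read bit i of an LSB-first bitmap."""
    return bool((bitmap[i // 8] >> (i % 8)) & 1)


def grouped_any_all(validity, values, offset, length, group_ids, num_groups,
                    apply_offset_to_values):
    """Hypothetical model of a grouped any/all over a sliced boolean array.

    Nulls are skipped. With apply_offset_to_values=False the values bitmap is
    read from bit 0 of the parent buffer, which models the reported bug.
    """
    any_result = [False] * num_groups
    all_result = [True] * num_groups
    for i in range(length):
        if not get_bit(validity, offset + i):
            continue  # null row: skipped
        value_index = offset + i if apply_offset_to_values else i
        value = get_bit(values, value_index)
        group = group_ids[i]
        any_result[group] = any_result[group] or value
        all_result[group] = all_result[group] and value
    return any_result, all_result


# Parent array: [True, True, None, False, False]
validity = pack_bits([True, True, False, True, True])
values = pack_bits([True, True, False, False, False])

# Slice [2:5] is logically [None, False, False], with group ids [0, 0, 1].
offset, length = 2, 3
group_ids = [0, 0, 1]

buggy = grouped_any_all(validity, values, offset, length, group_ids, 2,
                        apply_offset_to_values=False)
fixed = grouped_any_all(validity, values, offset, length, group_ids, 2,
                        apply_offset_to_values=True)

print("buggy:", buggy)  # group 0 wrongly picks up a True from before the slice
print("fixed:", fixed)  # every non-null value in the slice is False
```

In this model, the unfixed read reports any=True and all=True for group 0, even though the only non-null value that group sees in the slice is False.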
Are these changes tested?
Yes.
Are there any user-facing changes?
No.