branch-4.1: [improvement](be) Optimize count on nullable column #64166 #64202

Open
github-actions[bot] wants to merge 1 commit into branch-4.1 from auto-pick-64166-branch-4.1

@github-actions github-actions Bot commented Jun 8, 2026

Cherry-picked from #64166

Count aggregation without GROUP BY reaches
AggFnEvaluator::execute_single_add(), which calls
add_batch_single_place(). AggregateFunctionCount and
AggregateFunctionCountNotNullUnary previously inherited the row-by-row
helper there, so count(*) and count(nullable_expr) paid per-row
add/is_null_at costs even when all rows were aggregated into one state.

This patch adds batch implementations: count(*) increments the state
once by batch_size, while unary count(nullable_expr) checks the nullable
null map once and fast-paths the no-NULL case to count += batch_size.
When NULLs exist it uses simd::count_zero_num() over the null map to
count non-NULL rows. The nullable class name is kept because SQL
count(expr) counts non-NULL values, not NULL values.
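
The batch paths described above can be sketched roughly as follows. This is a minimal, standalone illustration and not the Doris implementation: `CountState`, `count_star_batch`, `count_not_null_batch`, `count_not_null_per_row`, and `count_zero_bytes` are hypothetical names, the null map is assumed to be one byte per row with 0 meaning non-NULL, and a scalar `std::count` stands in for `simd::count_zero_num()`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the single aggregate state; not the Doris class.
struct CountState {
    uint64_t count = 0;
};

// Row-by-row baseline: one null check and one increment per row.
inline void count_not_null_per_row(CountState& state, const std::vector<uint8_t>& null_map) {
    for (uint8_t is_null : null_map) {
        if (is_null == 0) {
            ++state.count;
        }
    }
}

// Scalar stand-in for a zero-byte counter such as simd::count_zero_num().
inline size_t count_zero_bytes(const uint8_t* data, size_t size) {
    assert(data != nullptr || size == 0);
    return static_cast<size_t>(std::count(data, data + size, uint8_t{0}));
}

// count(*): every row counts, so the state is bumped once per batch.
inline void count_star_batch(CountState& state, size_t batch_size) {
    state.count += batch_size;
}

// count(nullable_expr): check the null map once, then pick a path.
inline void count_not_null_batch(CountState& state, const std::vector<uint8_t>& null_map) {
    const size_t batch_size = null_map.size();
    // Assumption: a nonzero byte marks a NULL row. A real engine would likely
    // use a vectorized check here; std::any_of is only for illustration.
    const bool has_null = std::any_of(null_map.begin(), null_map.end(),
                                      [](uint8_t v) { return v != 0; });
    if (!has_null) {
        state.count += batch_size;  // fast path: no NULLs in this batch
        return;
    }
    // Zero bytes in the null map are exactly the non-NULL rows.
    state.count += count_zero_bytes(null_map.data(), batch_size);
}
```

Both batch functions should produce the same count as the per-row helper; the difference is that the per-row work collapses into a single addition or a single pass over the null map.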

Performance, measured with the following SQL:
```sql
select count(nullable(number)) from numbers("number"="1000000000");

select count(nullable(if(number >= 0, null, number))) from numbers("number"="1000000000");

select count(nullable(if(number % 2 = 0, number, null))) from numbers("number"="1000000000");
```
Results:
```
 Scenario     before median / mean    after median / mean    median diff
━━━━━━━━━━━  ━━━━━━━━━━━━━━━━━━━━━━  ━━━━━━━━━━━━━━━━━━━━━  ━━━━━━━━━━━━━
 non NULL           645 / 648.6 ms         555 / 556.4 ms         -14.0%
───────────  ──────────────────────  ─────────────────────  ─────────────
 all NULL         1541 / 1539.6 ms       1448 / 1450.6 ms          -6.0%
───────────  ──────────────────────  ─────────────────────  ─────────────
 half NULL        4256 / 4261.2 ms       4192 / 4232.2 ms          -1.5%
```
@github-actions github-actions Bot requested a review from yiguolei as a code owner June 8, 2026 03:54
@hello-stephen

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@hello-stephen

run buildall
