[improvement](be) Optimize count on nullable column #64166
Count aggregation without GROUP BY reaches AggFnEvaluator::execute_single_add(), which calls add_batch_single_place(). AggregateFunctionCount and AggregateFunctionCountNotNullUnary previously inherited the row-by-row helper there, so count(*) and count(nullable_expr) paid per-row add/is_null_at costs even when all rows were aggregated into one state.
This patch adds batch implementations: count(*) increments the state once by batch_size, while unary count(nullable_expr) checks the nullable null map once and fast-paths the no-NULL case to count += batch_size. When NULLs exist it uses simd::count_zero_num() over the null map to count non-NULL rows. The nullable class name is kept because SQL count(expr) counts non-NULL values, not NULL values.
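The batch logic described above can be sketched roughly as follows. This is a minimal standalone illustration, not the Doris implementation: `CountState`, `count_star_batch`, and `count_not_null_batch` are hypothetical names, `count_zero_num` here is a scalar stand-in for `simd::count_zero_num()`, and the `std::any_of` scan stands in for whatever null check the nullable column exposes. It assumes the usual null-map convention where a byte of 0 means "not NULL" and 1 means "NULL".

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the aggregate state; not the Doris data layout.
struct CountState {
    uint64_t count = 0;
};

// Scalar stand-in for simd::count_zero_num(): counts bytes equal to 0.
// In a null map, a 0 byte marks a non-NULL row.
inline size_t count_zero_num(const uint8_t* data, size_t size) {
    return static_cast<size_t>(std::count(data, data + size, uint8_t{0}));
}

// count(*): every row counts, so one addition covers the whole batch.
inline void count_star_batch(CountState& state, size_t batch_size) {
    state.count += batch_size;
}

// count(nullable_expr): check the null map once, then either take the
// no-NULL fast path or count the zero bytes (non-NULL rows) in one pass.
inline void count_not_null_batch(CountState& state,
                                 const std::vector<uint8_t>& null_map,
                                 size_t batch_size) {
    // The batch must not read past the end of the null map.
    assert(batch_size <= null_map.size());
    const uint8_t* begin = null_map.data();
    // Assumption: a plain scan stands in for the column's own null check.
    const bool has_null = std::any_of(begin, begin + batch_size,
                                      [](uint8_t b) { return b != 0; });
    if (!has_null) {
        state.count += batch_size;  // fast path: no NULLs in this batch
        return;
    }
    state.count += count_zero_num(begin, batch_size);
}
```

With a null map of `{0, 1, 0, 1, 1}` and `batch_size` 5, `count_not_null_batch` adds 2 to the state. In this scalar sketch the null check itself scans the map, so it shows the control flow rather than the performance benefit.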
Performance validation used release BE binaries before/after the patch on the same temporary single FE/BE cluster, with operator_test-like session settings, 1e9-row numbers() inputs, warmups excluded, and EXPLAIN confirming partial_count(nullable(...)) plans.
SQL cases and median results:
1. select count(nullable(number)) from numbers("number"="1000000000"); before 645 ms, after 555 ms, -14.0%.
2. select count(nullable(if(number >= 0, null, number))) from numbers("number"="1000000000"); before 1541 ms, after 1448 ms, -6.0%.
3. select count(nullable(if(number % 2 = 0, number, null))) from numbers("number"="1000000000"); before 4256 ms, after 4192 ms, -1.5%.
A second after-binary retest produced medians of 548 / 1447 / 4166 ms for the same three cases, matching the measured improvement shape: largest gain for no-NULL nullable input, modest gain for all-NULL, and near-neutral for partial NULL where expression/null-map work dominates.
/review
run buildall
Review result: no blocking issues found.
Critical checkpoint conclusions:
- Goal and proof: The PR optimizes no-GROUP-BY count aggregation by overriding add_batch_single_place for count(*) and unary count(nullable_expr). The code accomplishes that goal for the single-place path; added BE unit coverage checks nullable inputs with and without NULLs.
- Scope and clarity: The change is small and focused on the aggregate count implementation plus targeted tests.
- Concurrency and lifecycle: No new shared state, threads, locks, static initialization, or non-obvious lifecycle management are introduced.
- Configuration and compatibility: No new config, protocol, storage format, or persisted metadata changes.
- Parallel paths: Grouped add_batch, selected add, streaming serialization, merge, and window paths remain unchanged. The optimized path is specifically AggFnEvaluator::execute_single_add, matching the PR intent.
- Conditional checks: The nullable fast path uses the existing ColumnNullable null-map API; batch_size is bounded by the null map with DCHECK and is consistent with execute_single_add passing the current block row count.
- Testing: Added tests cover ordinary count, nullable all-non-null, and nullable mixed-null cases through the aggregate test harness. An all-null explicit case would be additional coverage but is not required for correctness of the reviewed change.
- Observability: No new observability is needed for this local CPU optimization.
- Transactions/persistence/data writes: Not applicable.
- FE/BE variables: Not applicable.
- Performance: The implementation removes per-row virtual/add/is_null_at overhead in the intended hot path and uses existing SIMD null-map counting; no obvious new hot-path anti-pattern found.
User focus points: No additional user-provided review focus was present.
Verification: I attempted ./run-be-ut.sh --run --filter=AggregateFunctionCountTest.*, but the runner environment failed before compiling BE UT because thirdparty/installed/bin/protoc is missing during gensrc generation. No code-level test failure was observed.
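The explicit all-NULL case mentioned under Testing could be checked along these lines. This is only a standalone sketch under stated assumptions: it does not use the Doris aggregate test harness, and `count_rowwise`, `count_batched`, and `counts_agree` are hypothetical helpers that compare a row-by-row reference count against a single-pass count over a null map (0 = not NULL, 1 = NULL).

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reference behaviour: visit each row and count the non-NULL ones.
inline uint64_t count_rowwise(const std::vector<uint8_t>& null_map) {
    uint64_t non_null = 0;
    for (uint8_t is_null : null_map) {
        if (is_null == 0) {
            ++non_null;
        }
    }
    return non_null;
}

// Batch behaviour: count zero bytes (non-NULL rows) in a single pass.
inline uint64_t count_batched(const std::vector<uint8_t>& null_map) {
    return static_cast<uint64_t>(
            std::count(null_map.begin(), null_map.end(), uint8_t{0}));
}

// Returns true when both strategies agree with the expected count.
inline bool counts_agree(const std::vector<uint8_t>& null_map,
                         uint64_t expected) {
    return count_rowwise(null_map) == expected &&
           count_batched(null_map) == expected;
}
```

For example, `counts_agree(std::vector<uint8_t>(8, 1), 0)` covers the all-NULL input, next to the all-non-NULL and mixed inputs the existing tests already exercise.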
TPC-H: Total hot run time: 29148 ms
TPC-DS: Total hot run time: 168921 ms
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
Performance:
Tested with the following SQL:
```sql
select count(nullable(number)) from numbers("number"="1000000000");
select count(nullable(if(number >= 0, null, number))) from numbers("number"="1000000000");
select count(nullable(if(number % 2 = 0, number, null))) from numbers("number"="1000000000");
```
Results:
```
Scenario     before median / mean     after median / mean     median diff
non NULL     645 / 648.6 ms           555 / 556.4 ms          -14.0%
all NULL     1541 / 1539.6 ms         1448 / 1450.6 ms        -6.0%
half NULL    4256 / 4261.2 ms         4192 / 4232.2 ms        -1.5%
```