Skip to content

Storages: Enhance storage logging details#10951

Open
JaySon-Huang wants to merge 4 commits into
pingcap:masterfrom
JaySon-Huang:enhance_storage_details
Open

Storages: Enhance storage logging details#10951
JaySon-Huang wants to merge 4 commits into
pingcap:masterfrom
JaySon-Huang:enhance_storage_details

Conversation

@JaySon-Huang

@JaySon-Huang JaySon-Huang commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

What problem does this PR solve?

Issue Number: close #10950

Problem Summary:

When diagnosing TiFlash storage and query issues in production, existing logs and execution summaries lack enough context to distinguish scan types and understand slow storage operations:

  • TableScan execution summary JSON does not indicate whether the scan is a regular TableScan or PartitionTableScan.
  • Split-based ingest logs are mostly at DEBUG level and do not include elapsed time, making slow ingest hard to spot.
  • Segment split/merge-delta finish logs do not include the trigger reason.
  • Segment info strings miss stable column count and on-disk DMFile size.
  • FileCache download success logs do not distinguish foreground vs background downloads.
  • gtest_kvstore_fast_add_peer is flaky in CI and FAP is not planned for short-term support.

What is changed and how it works?

Add observability improvements for TableScan statistics and storage layer logging.

- Expose `is_partition_table_scan` in TableScan execution summary JSON.
- Add elapsed-time logging for split ingest; promote slow paths (>10s) from DEBUG to INFO.
- Log split/merge-delta reason in segment operation finish logs.
- Enrich Segment::info() with stable column count and DMFile on-disk bytes.
- Log FileCache download type (Foreground/Background) on successful download.
- Disable flaky gtest_kvstore_fast_add_peer test.

TableScan statistics

  • Cache is_partition_table_scan in TableScanStatistics constructor via executor->has_partition_table_scan().
  • Output "is_partition_table_scan": true/false in appendExtraJson, so execution summary / slow query analysis can tell partition scans apart.

DeltaMerge storage logging

  • Introduce SplitIngestLogContext in ingestDTFilesUsingSplit:
    • Track elapsed time across split ingest phases.
    • Use LOG_IMPL to emit begin/finish logs at INFO when elapsed > 10s, otherwise DEBUG.
    • Add elapsed_seconds to per-file ingest attempt logs (INFO).
  • Log reason in segmentSplit and segmentMergeDelta finish logs via magic_enum::enum_name(reason).
  • Extend Segment::info() with stable_cols and dmf_disk_bytes.
  • Add DMFile::getNumColumns() and StableValueSpace::getDMFilesNumColumns().

S3 FileCache

  • Add FileCache::DownloadType (Foreground / Background).
  • Include download type in downloadImpl success log for easier disagg read path analysis.

Tests

  • Wrap gtest_kvstore_fast_add_peer.cpp with #if 0 and document why (flaky CI, FAP not planned for short-term support).

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

None

Summary by CodeRabbit

  • New Features
    • Enhanced table-scan status details with partition-scan indicator and added stable scan summary fields.
    • Added column-count reporting to segment/stable summaries.
    • S3 file-download logging now distinguishes foreground vs background transfers.
  • Bug Fixes
    • Improved split-ingest and completion logging with elapsed time and “reason” context for easier troubleshooting.
    • Disabled a flaky KV store test from building/running.
  • Refactor
    • Clarified the split-vs-column-file ingest branching for improved readability.

Signed-off-by: JaySon-Huang <tshent@qq.com>
Signed-off-by: JaySon-Huang <tshent@qq.com>
Signed-off-by: JaySon-Huang <tshent@qq.com>
@ti-chi-bot ti-chi-bot Bot added the release-note-none Denotes a PR that doesn't merit a release note. label Jul 2, 2026
@ti-chi-bot

ti-chi-bot Bot commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign fzhedu for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot Bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jul 2, 2026
@coderabbitai

coderabbitai Bot commented Jul 2, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 1e4b17ca-f1e0-49aa-a981-8c35a872b0ca

📥 Commits

Reviewing files that changed from the base of the PR and between f6104ca and 627efec.

📒 Files selected for processing (5)
  • tests/docker/next-gen-columnar-yaml/cluster.yaml
  • tests/docker/next-gen-utils/Makefile
  • tests/docker/next-gen-yaml/cluster.yaml
  • tests/fullstack-test-next-gen-columnar/_env.sh
  • tests/fullstack-test-next-gen/_env.sh
✅ Files skipped from review due to trivial changes (1)
  • tests/docker/next-gen-columnar-yaml/cluster.yaml

📝 Walkthrough

Walkthrough

This PR adds partition-table-scan metadata and richer DeltaMerge/FileCache logging, disables one flaky KVStore test, and renames several next-gen test image and branch defaults from -next-gen to -nextgen.

Changes

Logging and diagnostics enhancements

Layer / File(s) Summary
Partition table scan flag in statistics
dbms/src/Flash/Statistics/TableScanImpl.h, dbms/src/Flash/Statistics/TableScanImpl.cpp
Adds is_partition_table_scan to table-scan statistics, initializes it from executor metadata, and includes it in the JSON payload.
Timing-aware split ingest logging
dbms/src/Storages/DeltaMerge/DeltaMergeStore_Ingest.cpp
Adds a stopwatch-backed log context for split ingest, emits elapsed seconds in begin/per-file/finish logs, and rewrites the split-replace branch calls.
Segment split and merge reasons
dbms/src/Storages/DeltaMerge/DeltaMergeStore_InternalSegment.cpp
Adds reason to segment split and merge-delta completion logs and keeps nearby comments updated.
Stable file column counts in segment info
dbms/src/Storages/DeltaMerge/File/DMFile.h, dbms/src/Storages/DeltaMerge/StableValueSpace.{h,cpp}, dbms/src/Storages/DeltaMerge/Segment.cpp
Adds column-count accessors and extends Segment::info() with stable column and byte fields.
FileCache download type tracking
dbms/src/Storages/S3/FileCache.{h,cpp}
Adds a download-type enum, threads it through downloadImpl, logs it on success, and updates foreground/background call sites.
Disable flaky KVStore fast add peer test
dbms/src/Storages/KVStore/tests/gtest_kvstore_fast_add_peer.cpp
Disables the test file from compilation with a #if 0 guard.

Next-gen test image and branch tag updates

Layer / File(s) Summary
Docker image tag defaults
tests/docker/next-gen-utils/Makefile, tests/docker/next-gen-yaml/cluster.yaml, tests/docker/next-gen-columnar-yaml/cluster.yaml
Updates PD, TiKV, TiDB, TiFlash, and worker image tags from -next-gen to -nextgen.
Fullstack branch defaults
tests/fullstack-test-next-gen/_env.sh, tests/fullstack-test-next-gen-columnar/_env.sh
Updates exported branch defaults and normalization rules to use *-nextgen and cloud-engine-nextgen names.

Estimated code review effort: 3 (Moderate) | ~25 minutes

Possibly related PRs

  • pingcap/tiflash#10905: Also changes TableScanStatistics::appendExtraJson in dbms/src/Flash/Statistics/TableScanImpl.cpp.
  • Suggested reviewers: yongman, CalvinNeo

Poem

A rabbit hopped through logs and tags,
With branch names new and fewer flags.
It sniffed the scans, then bounded high,
While split ingest ticked by with a sigh.
Hop, hop — the build burrow shines tonight 🐇

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Out of Scope Changes check ⚠️ Warning The PR includes many unrelated changes beyond #10950, such as split-ingest timing logs, FileCache logging, and test/image tag updates. Split unrelated storage logging, test, and Docker image updates into separate PRs, leaving only the partition-table-scan logging change here.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title is related to the storage/logging changes, though it is broader than the main partition-scan observability update.
Description check ✅ Passed The PR description matches the template and includes the issue, problem summary, change summary, checklist, and release note.
Linked Issues check ✅ Passed The TableScan JSON adds is_partition_table_scan, which addresses #10950 by distinguishing partition table scans in logging output.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

Signed-off-by: JaySon-Huang <tshent@qq.com>
@ti-chi-bot

ti-chi-bot Bot commented Jul 3, 2026

Copy link
Copy Markdown
Contributor

@JaySon-Huang: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-sanitizer-tsan 627efec link false /test pull-sanitizer-tsan
pull-integration-next-gen-columnar 627efec link true /test pull-integration-next-gen-columnar
pull-integration-test 627efec link true /test pull-integration-test
pull-unit-test 627efec link true /test pull-unit-test

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Enhance the logging on partition table scan

1 participant