HBASE-30154: Fix hlink missing backreference causing dataloss#8233
Open
shrinidhi-arista wants to merge 2 commits into
Open
HBASE-30154: Fix hlink missing backreference causing dataloss#8233shrinidhi-arista wants to merge 2 commits into
shrinidhi-arista wants to merge 2 commits into
Conversation
…ing merged-region snapshot - Add TestCloneSnapshotFromClientAfterMergingRegion: integration test that reproduces the data-loss bug where HFileLinkCleaner deletes archived pre-merge HFiles because RestoreSnapshotHelper.restoreReferenceFile() creates HFileLink Reference files in the clone without writing back-references to the archive directory. - Add TestHFileLinkCleaner.testHFileLinkReferenceFileProtectsArchivedHFile: unit test verifying that HFileLinkCleaner treats an HFileLink Reference file as a live forward link and does not delete the archived HFile it points to.
…renced by clone When a snapshot is taken while a merged region still holds reference files, and that snapshot is then cloned, RestoreSnapshotHelper.restoreReferenceFile() creates HFileLink Reference files in the clone (e.g. srcTable=srcRegion-hfile.cloneRegion) but does not write back-references to the archive directory. As a result: 1. CatalogJanitor GCs the pre-merge regions, archiving their HFiles. 2. HFileLinkCleaner sees no back-references for those archived HFiles and deletes them. 3. Every subsequent region open on the cloned table fails with FileNotFoundException. Fix with two complementary changes: - RestoreSnapshotHelper: write a back-reference to the archive when creating an HFileLink Reference file, so HFileLinkCleaner knows the archived HFile is still live. - HFileLinkCleaner: when evaluating a back-reference, also check for HFileLink Reference files (glob <hfileLinkName>.*) in the link directory — these are created by restoreReferenceFile() and protect the archived HFile even when no zero-byte HFileLink file exists alongside them.
Apache9
reviewed
Jun 6, 2026
| // Also protect HFileLink Reference files created by | ||
| // RestoreSnapshotHelper.restoreReferenceFile(). These are named | ||
| // <hfileLinkName>.<encodedRegion> and live in the same directory as the | ||
| // zero-byte HFileLink. The zero-byte file does not exist (only the Reference |
Contributor
There was a problem hiding this comment.
'The zero-byte file does not exist', mind explaining more about this? I do not fully understand...
There was a problem hiding this comment.
Pull request overview
This PR fixes a data-loss scenario where archived HFiles referenced by HFileLink Reference files (created during snapshot clone/restore of split/merge references) could be incorrectly deleted because the expected back-reference/forward-link signals were missing or not recognized by the cleaner.
Changes:
RestoreSnapshotHelpernow writes an archive back-reference when creating an HFileLink-based Reference file so archived parent HFiles remain protected.HFileLinkCleanernow also treats.<suffix>HFileLink Reference files as valid forward references when evaluating back-references.- Adds regression tests covering both the cleaner behavior and a full clone-snapshot-after-merge workflow.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/RestoreSnapshotHelper.java | Writes an archive back-reference when restoring/cloning reference files so archived HFiles remain “live”. |
| hbase-server/src/main/java/org/apache/hadoop/hbase/master/cleaner/HFileLinkCleaner.java | Extends back-reference evaluation to also consider HFileLink Reference files (<link>.*) as forward links. |
| hbase-server/src/test/java/org/apache/hadoop/hbase/master/cleaner/TestHFileLinkCleaner.java | Adds a unit test simulating Reference-only clones; adjusts link naming to match actual HFileLink file names. |
| hbase-server/src/test/java/org/apache/hadoop/hbase/client/CloneSnapshotFromClientAfterMergingRegionTestBase.java | Adds an end-to-end regression test reproducing the merge + snapshot clone + GC + cleaner failure mode. |
| hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestCloneSnapshotFromClientAfterMergingRegion.java | Parameterized test harness that starts a mini cluster with HFileLinkCleaner enabled. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
41
to
47
| import org.apache.hadoop.hbase.TableName; | ||
| import org.apache.hadoop.hbase.backup.HFileArchiver; | ||
| import org.apache.hadoop.hbase.client.ColumnFamilyDescriptor; | ||
| import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder; | ||
| import org.apache.hadoop.hbase.util.HFileArchiveUtil; | ||
| import org.apache.hadoop.hbase.client.Connection; | ||
| import org.apache.hadoop.hbase.client.ConnectionFactory; |
Comment on lines
24
to
30
| import java.io.IOException; | ||
| import java.util.Collections; | ||
| import org.apache.hadoop.conf.Configuration; | ||
| import org.apache.hadoop.hbase.regionserver.StoreFileInfo; | ||
| import org.apache.hadoop.fs.FileStatus; | ||
| import org.apache.hadoop.fs.FileSystem; | ||
| import org.apache.hadoop.fs.Path; |
Comment on lines
+259
to
+262
| cleaner.chore(); | ||
| assertFalse(fs.exists(linkBackRef), | ||
| "Back-reference should be removed after HFileLink Reference file is gone"); | ||
| // Archived HFile cleanup (requires TTL to expire) is validated by @After. |
Comment on lines
+28
to
+39
| import org.apache.hadoop.hbase.TableName; | ||
| import org.apache.hadoop.hbase.master.RegionState; | ||
| import org.apache.hadoop.hbase.master.assignment.RegionStates; | ||
| import org.apache.hadoop.hbase.master.cleaner.HFileCleaner; | ||
| import org.apache.hadoop.hbase.master.cleaner.TimeToLiveHFileCleaner; | ||
| import org.apache.hadoop.hbase.master.snapshot.SnapshotHFileCleaner; | ||
| import org.apache.hadoop.hbase.regionserver.CompactedHFilesDischarger; | ||
| import org.apache.hadoop.hbase.regionserver.HRegion; | ||
| import org.apache.hadoop.hbase.io.HFileLink; | ||
| import org.apache.hadoop.hbase.regionserver.HStore; | ||
| import org.apache.hadoop.hbase.regionserver.HStoreFile; | ||
| import org.apache.hadoop.hbase.regionserver.StoreFileInfo; |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fix includes -
HFileLink Reference file, so HFileLinkCleaner knows the archived HFile is still live.
Added test TestCloneSnapshotFromClientAfterMergingRegion which fails with FileNotFound
exception while checking if all the hfiles referenced by cloned table are available. Disabling and
enabling the table also causes the test to hang because of master retrying indefinitely. The test
passes with the fix.
files (glob .*) in the link directory — these are created by
restoreReferenceFile() and protect the archived HFile even when no zero-byte HFileLink
file exists alongside them.
Added testHFileLinkReferenceFileProtectsArchivedHFile() which fails while asserting that Hfile
is not cleaned up but passes with the fix.