Skip to content

HBASE-30154: Fix hlink missing backreference causing dataloss#8233

Open
shrinidhi-arista wants to merge 2 commits into
apache:masterfrom
aristanetworks:HBASE-30154
Open

HBASE-30154: Fix hlink missing backreference causing dataloss#8233
shrinidhi-arista wants to merge 2 commits into
apache:masterfrom
aristanetworks:HBASE-30154

Conversation

@shrinidhi-arista
Copy link
Copy Markdown

Fix includes -

  • RestoreSnapshotHelper: write a back-reference to the archive when creating an
    HFileLink Reference file, so HFileLinkCleaner knows the archived HFile is still live.

Added test TestCloneSnapshotFromClientAfterMergingRegion which fails with FileNotFound
exception while checking if all the hfiles referenced by cloned table are available. Disabling and
enabling the table also causes the test to hang because of master retrying indefinitely. The test
passes with the fix.

  • HFileLinkCleaner: when evaluating a back-reference, also check for HFileLink Reference
    files (glob .*) in the link directory — these are created by
    restoreReferenceFile() and protect the archived HFile even when no zero-byte HFileLink
    file exists alongside them.

Added testHFileLinkReferenceFileProtectsArchivedHFile() which fails while asserting that Hfile
is not cleaned up but passes with the fix.

shrinidhi-arista and others added 2 commits May 13, 2026 15:11
…ing merged-region snapshot

- Add TestCloneSnapshotFromClientAfterMergingRegion: integration test that reproduces the
  data-loss bug where HFileLinkCleaner deletes archived pre-merge HFiles because
  RestoreSnapshotHelper.restoreReferenceFile() creates HFileLink Reference files in the
  clone without writing back-references to the archive directory.

- Add TestHFileLinkCleaner.testHFileLinkReferenceFileProtectsArchivedHFile: unit test
  verifying that HFileLinkCleaner treats an HFileLink Reference file as a live forward
  link and does not delete the archived HFile it points to.
…renced by clone

When a snapshot is taken while a merged region still holds reference files, and that
snapshot is then cloned, RestoreSnapshotHelper.restoreReferenceFile() creates HFileLink
Reference files in the clone (e.g. srcTable=srcRegion-hfile.cloneRegion) but does not
write back-references to the archive directory. As a result:

  1. CatalogJanitor GCs the pre-merge regions, archiving their HFiles.
  2. HFileLinkCleaner sees no back-references for those archived HFiles and deletes them.
  3. Every subsequent region open on the cloned table fails with FileNotFoundException.

Fix with two complementary changes:

- RestoreSnapshotHelper: write a back-reference to the archive when creating an
  HFileLink Reference file, so HFileLinkCleaner knows the archived HFile is still live.

- HFileLinkCleaner: when evaluating a back-reference, also check for HFileLink Reference
  files (glob <hfileLinkName>.*) in the link directory — these are created by
  restoreReferenceFile() and protect the archived HFile even when no zero-byte HFileLink
  file exists alongside them.
// Also protect HFileLink Reference files created by
// RestoreSnapshotHelper.restoreReferenceFile(). These are named
// <hfileLinkName>.<encodedRegion> and live in the same directory as the
// zero-byte HFileLink. The zero-byte file does not exist (only the Reference
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'The zero-byte file does not exist', mind explaining more about this? I do not fully understand...

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a data-loss scenario where archived HFiles referenced by HFileLink Reference files (created during snapshot clone/restore of split/merge references) could be incorrectly deleted because the expected back-reference/forward-link signals were missing or not recognized by the cleaner.

Changes:

  • RestoreSnapshotHelper now writes an archive back-reference when creating an HFileLink-based Reference file so archived parent HFiles remain protected.
  • HFileLinkCleaner now also treats .<suffix> HFileLink Reference files as valid forward references when evaluating back-references.
  • Adds regression tests covering both the cleaner behavior and a full clone-snapshot-after-merge workflow.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/RestoreSnapshotHelper.java Writes an archive back-reference when restoring/cloning reference files so archived HFiles remain “live”.
hbase-server/src/main/java/org/apache/hadoop/hbase/master/cleaner/HFileLinkCleaner.java Extends back-reference evaluation to also consider HFileLink Reference files (<link>.*) as forward links.
hbase-server/src/test/java/org/apache/hadoop/hbase/master/cleaner/TestHFileLinkCleaner.java Adds a unit test simulating Reference-only clones; adjusts link naming to match actual HFileLink file names.
hbase-server/src/test/java/org/apache/hadoop/hbase/client/CloneSnapshotFromClientAfterMergingRegionTestBase.java Adds an end-to-end regression test reproducing the merge + snapshot clone + GC + cleaner failure mode.
hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestCloneSnapshotFromClientAfterMergingRegion.java Parameterized test harness that starts a mini cluster with HFileLinkCleaner enabled.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 41 to 47
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.backup.HFileArchiver;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptor;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.util.HFileArchiveUtil;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
Comment on lines 24 to 30
import java.io.IOException;
import java.util.Collections;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.regionserver.StoreFileInfo;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
Comment on lines +259 to +262
cleaner.chore();
assertFalse(fs.exists(linkBackRef),
"Back-reference should be removed after HFileLink Reference file is gone");
// Archived HFile cleanup (requires TTL to expire) is validated by @After.
Comment on lines +28 to +39
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.master.RegionState;
import org.apache.hadoop.hbase.master.assignment.RegionStates;
import org.apache.hadoop.hbase.master.cleaner.HFileCleaner;
import org.apache.hadoop.hbase.master.cleaner.TimeToLiveHFileCleaner;
import org.apache.hadoop.hbase.master.snapshot.SnapshotHFileCleaner;
import org.apache.hadoop.hbase.regionserver.CompactedHFilesDischarger;
import org.apache.hadoop.hbase.regionserver.HRegion;
import org.apache.hadoop.hbase.io.HFileLink;
import org.apache.hadoop.hbase.regionserver.HStore;
import org.apache.hadoop.hbase.regionserver.HStoreFile;
import org.apache.hadoop.hbase.regionserver.StoreFileInfo;
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants