repository: add PackWriter and two-phase chunk index update by mr-raj12 · Pull Request #9723 · borgbackup/borg

mr-raj12 · 2026-06-05T18:43:16Z

Description

PackWriter buffers (chunk_id, cdata) pairs and flushes them as a pack file via borgstore once max_count chunks have accumulated. At N=1 (max_count=1), pack_id == chunk_id and pack files land at packs/{chunk_id_hex}, so get() and delete() need no changes. UNKNOWN_INT32 = 0xFFFFFFFF is the sentinel for pack location fields that have not been written yet. It is above MAX_DATA_SIZE (~20 MB), so it can never collide with a real obj_offset. chunks.add() writes the placeholder; update_pack_info() fills in the real values after flush().
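A minimal sketch of the two-phase update described above, using a dict-backed toy index rather than the real hashindex code. The `Entry` layout, the `ToyChunkIndex` class and the exact `MAX_DATA_SIZE` value are assumptions for illustration; only `UNKNOWN_INT32`, `add()` and `update_pack_info()` are names taken from this PR.

```python
from typing import NamedTuple, Optional

UNKNOWN_INT32 = 0xFFFFFFFF        # sentinel: pack location not written yet
MAX_DATA_SIZE = 20 * 1024 * 1024  # approximation of the ~20 MB limit (assumed value)


class Entry(NamedTuple):
    # Hypothetical entry layout, reduced to the pack location fields.
    pack_id: Optional[bytes]
    obj_offset: int
    obj_size: int


class ToyChunkIndex:
    """Dict-backed stand-in for the chunk index (not borg's ChunkIndex)."""

    def __init__(self):
        self._entries = {}

    def add(self, chunk_id):
        # Phase 1: register the chunk with placeholder location fields.
        self._entries.setdefault(chunk_id, Entry(None, UNKNOWN_INT32, UNKNOWN_INT32))

    def update_pack_info(self, chunk_id, pack_id, obj_offset, obj_size):
        # Phase 2: after flush(), replace the placeholders with real values.
        # Real values stay below the sentinel, so the two cannot be confused.
        assert obj_offset <= MAX_DATA_SIZE < UNKNOWN_INT32
        self._entries[chunk_id] = Entry(pack_id, obj_offset, obj_size)

    def __getitem__(self, chunk_id):
        return self._entries[chunk_id]


index = ToyChunkIndex()
chunk_id = bytes(32)
index.add(chunk_id)
placeholder = index[chunk_id]
index.update_pack_info(chunk_id, pack_id=chunk_id, obj_offset=0, obj_size=1234)
resolved = index[chunk_id]
```

Between the two phases, a reader of the index can tell from the sentinel that the location of a chunk is not known yet.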

flush() clears _pieces in a try/finally. Without this, a store failure would leave the chunk in the buffer, where it would be re-bundled with the next chunk, triggering the N>1 code path and writing under a hash-derived key instead of the chunk's own id.
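To illustrate why the try/finally matters, here is a small sketch with a toy writer and a store that fails once. `ToyPackWriter` and `FlakyStore` are hypothetical stand-ins, not the code from this PR; the store interface (`store(key, value)`), the result tuple layout and hashing the concatenated pack contents for N>1 are all assumptions.

```python
from hashlib import sha256


class ToyPackWriter:
    """Illustrative only; not the PackWriter added by this PR."""

    def __init__(self, store, max_count=1):
        self.store = store      # assumed interface: store.store(key, value)
        self.max_count = max_count
        self._pieces = []       # buffered (chunk_id, cdata) pairs

    def add(self, chunk_id, cdata):
        self._pieces.append((chunk_id, cdata))
        if len(self._pieces) >= self.max_count:
            return self.flush()
        return []

    def flush(self):
        if not self._pieces:
            return []
        try:
            pack = b"".join(cdata for _, cdata in self._pieces)
            if self.max_count == 1:
                pack_id = self._pieces[0][0]     # N=1: pack_id == chunk_id
            else:
                pack_id = sha256(pack).digest()  # assumed: hash of the pack contents
            self.store.store("packs/" + pack_id.hex(), pack)
            results, offset = [], 0
            for chunk_id, cdata in self._pieces:
                results.append((chunk_id, pack_id, offset, len(cdata)))
                offset += len(cdata)
            return results
        finally:
            # Always drop the buffer, even if the store raised.
            self._pieces = []


class FlakyStore:
    """Fails on the first store() call, succeeds afterwards."""

    def __init__(self):
        self.items = {}
        self.fail_next = True

    def store(self, key, value):
        if self.fail_next:
            self.fail_next = False
            raise OSError("simulated store failure")
        self.items[key] = value


store = FlakyStore()
writer = ToyPackWriter(store, max_count=1)
id_a, id_b = b"\x01" * 32, b"\x02" * 32
try:
    writer.add(id_a, b"aaaa")   # store fails; the buffer is still cleared
except OSError:
    pass
results = writer.add(id_b, b"bb")
```

In this sketch the second chunk is written alone under its own id, because the failed first chunk did not linger in the buffer.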

Changes:

repository.py: add PackWriter; put() delegates to _pack_writer.add() and returns pack results.
constants.py: add UNKNOWN_INT32 = 0xFFFFFFFF.
hashindex.pyx: add() uses UNKNOWN_INT32 placeholders; new update_pack_info().
hashindex.pyi: type stub for update_pack_info().
cache.py: add_chunk() calls update_pack_info() from pack results.
archive.py: add_reference() in rebuild_archives() does the same.

refs #8572

Checklist

PR is against master
New code has tests and docs where appropriate
Tests pass
Commit messages are clean and reference related issues

codecov · 2026-06-05T18:49:28Z

Codecov Report

❌ Patch coverage is 95.65217% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.78%. Comparing base (a1e8e53) to head (0395cc1).
⚠️ Report is 7 commits behind head on master.

Files with missing lines	Patch %	Lines
src/borg/repository.py	94.59%	1 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #9723      +/-   ##
==========================================
+ Coverage   84.72%   84.78%   +0.05%     
==========================================
  Files          92       92              
  Lines       15007    15047      +40     
  Branches     2243     2250       +7     
==========================================
+ Hits        12715    12757      +42     
+ Misses       1592     1590       -2     
  Partials      700      700

ThomasWaldmann

Looks good overall. Some small optimizations could be done.

Later, when introducing a size limit, it will get a bit more complicated if we want to obey that limit strictly rather than possibly exceed it by up to one maximum chunk size in the worst case.

Update: thinking about it, we could also just accept that it is not a strict limit. Simpler code. E.g. when setting 50MB as the limit, a pack could also end up at 70MB.
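A rough sketch of the non-strict variant, assuming the writer flushes as soon as the buffered size reaches the limit. The function name and the sizes are made up for illustration; nothing like this is part of the PR.

```python
MB = 1024 * 1024


def split_into_packs(chunk_sizes, soft_limit):
    """Group chunk sizes into packs, flushing once the soft limit is reached."""
    packs, current, current_size = [], [], 0
    for size in chunk_sizes:
        current.append(size)
        current_size += size
        if current_size >= soft_limit:
            # Flush after adding: the pack may overshoot by less than one chunk.
            packs.append(current)
            current, current_size = [], 0
    if current:
        packs.append(current)  # final partial pack
    return packs


sizes = [20 * MB, 20 * MB, 9 * MB, 20 * MB, 5 * MB]
packs = split_into_packs(sizes, soft_limit=50 * MB)
pack_sizes = [sum(p) for p in packs]
```

With these sizes the first pack ends up at 69MB for a 50MB limit, and the overshoot stays below the largest chunk size.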

ThomasWaldmann · 2026-06-05T21:03:16Z

range-load using the pack values from index within this PR or in next one?

mr-raj12 · 2026-06-05T21:16:35Z

range-load using the pack values from index within this PR or in next one?

in the next one

ThomasWaldmann · 2026-06-06T06:34:44Z

OK, so please finish this one.

For the next one:

Guess that will be a small and simple PR, just use the obj_offset and obj_size from the index to do a range-load. As that is always 0 and filesize right now, nothing should break.
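As a sketch of what such a range-load amounts to, here is a file-like read using an offset and a size. `range_load` is a hypothetical helper operating on an in-memory file, not a borgstore or borg function.

```python
import io


def range_load(pack_file, obj_offset, obj_size):
    """Read obj_size bytes starting at obj_offset from a file-like pack."""
    pack_file.seek(obj_offset)
    data = pack_file.read(obj_size)
    if len(data) != obj_size:
        raise ValueError("pack is shorter than the index claims")
    return data


pack = b"chunk-one" + b"chunk-two!"
# N=1 today: offset 0 and size == file size, i.e. the whole object.
whole = range_load(io.BytesIO(pack), 0, len(pack))
# Later, with several chunks per pack, only the requested slice is read.
second = range_load(io.BytesIO(pack), 9, 10)
```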

Idea for the next one after that:

Since you now update the index once the sha256 pack_id is known, add an env var (SHA256_PACK_ID=1 or so) to disable the pack_id == chunk_id hack and use the real sha256 pack_id from the index. Still stay at max_count=1.

Likely, that will show a lot of problems in the existing code, pointing to all the places that now need to use the index to get the pack_id / work based on packs. Make a priority list of what needs fixing.

The CI could get one informative but otherwise ignored job that runs the tests with that env var, so we can see fewer and fewer tests fail as you fix more and more stuff.
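A possible shape for such a switch, as a sketch only. The variable name is the tentative one from the comment above, `choose_pack_id` is a hypothetical helper, and hashing the pack contents is an assumption.

```python
import os
from hashlib import sha256


def choose_pack_id(chunk_id, pack_bytes, environ=os.environ):
    """Return the pack id, honoring the (tentative) SHA256_PACK_ID switch."""
    if environ.get("SHA256_PACK_ID") == "1":
        return sha256(pack_bytes).digest()  # real content-derived pack id
    return chunk_id                         # current hack: pack_id == chunk_id


chunk_id = b"\x05" * 32
default_id = choose_pack_id(chunk_id, b"payload", environ={})
sha_id = choose_pack_id(chunk_id, b"payload", environ={"SHA256_PACK_ID": "1"})
```

Passing the environment as a parameter keeps the helper easy to test; a CI job would simply export the variable before running the test suite.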

ThomasWaldmann

LGTM.

Guess the real test will come when actually using the indexed pack-related values.

…gbackup#8572 PackWriter buffers (chunk_id, cdata) pairs and flushes as pack files via borgstore. At N=1 pack_id == chunk_id; UNKNOWN_INT32 (0xFFFFFFFF) placeholders in the index are replaced by real pack location fields after flush() via update_pack_info(). Update test_chunkindex_add to expect UNKNOWN_INT32 sentinels from add().

…rom PackWriter, refs borgbackup#8572 Fix PackWriter.flush() to use max_count == 1 (not len == 1) for the pack_id hack, so final partial packs under max_count > 1 correctly use SHA256. Add covering test. Move sha256 import to module level in repository_test.
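The difference between the two conditions can be shown with a final partial pack that happens to hold a single chunk. Both functions below are hypothetical reductions of the pack_id decision, and hashing the concatenated chunk data is an assumption.

```python
from hashlib import sha256


def pack_id_by_len(pieces, max_count):
    # Old condition: decides by how many chunks happen to be buffered.
    if len(pieces) == 1:
        return pieces[0][0]
    return sha256(b"".join(cdata for _, cdata in pieces)).digest()


def pack_id_by_max_count(pieces, max_count):
    # Fixed condition: decides by the configured pack size.
    if max_count == 1:
        return pieces[0][0]
    return sha256(b"".join(cdata for _, cdata in pieces)).digest()


chunk_id = b"\x07" * 32
final_partial = [(chunk_id, b"tail")]  # one leftover chunk, max_count > 1
old = pack_id_by_len(final_partial, max_count=4)
new = pack_id_by_max_count(final_partial, max_count=4)
```

Here the old condition falls back to the chunk id even though the writer is configured for multi-chunk packs, while the fixed one yields a SHA256-derived id.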

mr-raj12 force-pushed the pack-files-step5-packwriter branch from 182e92a to 049fba2 Compare June 5, 2026 20:17

ThomasWaldmann requested changes Jun 5, 2026

Comment thread src/borg/hashindex.pyx Outdated

Comment thread src/borg/repository.py Outdated

Comment thread src/borg/repository.py Outdated

ThomasWaldmann reviewed Jun 5, 2026

Comment thread src/borg/archive.py Outdated

ThomasWaldmann reviewed Jun 5, 2026

Comment thread src/borg/hashindex.pyx Outdated

mr-raj12 force-pushed the pack-files-step5-packwriter branch from 8616791 to f60a3d1 Compare June 7, 2026 22:23

ThomasWaldmann reviewed Jun 7, 2026

Comment thread src/borg/repository.py

mr-raj12 force-pushed the pack-files-step5-packwriter branch from f60a3d1 to 60ca680 Compare June 7, 2026 23:25

ThomasWaldmann reviewed Jun 8, 2026

Comment thread src/borg/repository.py Outdated

mr-raj12 force-pushed the pack-files-step5-packwriter branch from 60ca680 to 0a9160c Compare June 8, 2026 05:12

ThomasWaldmann approved these changes Jun 8, 2026

mr-raj12 force-pushed the pack-files-step5-packwriter branch from 0a9160c to 2c60858 Compare June 9, 2026 02:24

mr-raj12 requested a review from ThomasWaldmann June 9, 2026 05:37

ThomasWaldmann reviewed Jun 9, 2026

Comment thread src/borg/repository.py Outdated

mr-raj12 added 2 commits June 9, 2026 14:49

mr-raj12 force-pushed the pack-files-step5-packwriter branch from 2c60858 to 0395cc1 Compare June 9, 2026 09:20

ThomasWaldmann merged commit 7cdaebf into borgbackup:master Jun 9, 2026
28 of 32 checks passed
