Skip to content

channeld: fix channel_ready retransmission after splice via funding_tx_index#9256

Open
nGoline wants to merge 3 commits into
ElementsProject:masterfrom
nGoline:fix-channel-ready-retransmit-after-splice
Open

channeld: fix channel_ready retransmission after splice via funding_tx_index#9256
nGoline wants to merge 3 commits into
ElementsProject:masterfrom
nGoline:fix-channel-ready-retransmit-after-splice

Conversation

@nGoline

@nGoline nGoline commented Jun 24, 2026

Copy link
Copy Markdown
Collaborator

Problem

On reconnect, channeld decides whether to retransmit channel_ready based on whether the peer's channel_reestablish references a splice (is_splice_active). It computed that by comparing the peer's my_current_funding_locked txid against peer->channel->funding.txid.

That comparison is wrong once a splice locks: channel_update_funding() overwrites funding.txid with the splice txid, so afterwards both sides agree on the same txid, the check falls through, and channel_ready is incorrectly retransmitted on reconnect, which can disrupt channels.

This is the problem @ddustin raised reviewing #9079 (comparing against the txid is incorrect, and we should track metadata on the inflight instead), suggesting a funding_tx_index, matching what eclair does.

Fix

Give every funding tx a stable funding_tx_index: 0 for the original funding (including RBF attempts), incrementing by 1 per splice. It is:

  • carried on the channeld inflight and on the channel,
  • threaded through the channeld_init and channeld_add_inflight wire,
  • assigned on splice creation (parent + 1) and copied onto the channel funding when a splice locks,
  • persisted in channel_funding_inflights and channels (two migrations, DEFAULT 0, so existing channels are unaffected).

The reestablish check now resolves the my_current_funding_locked txid to its funding_tx_index (current funding or a pending inflight) and treats > 0 as a splice.

Commits (bisectable)

  1. splice: add tests for funding_tx_index tracking, regression tests (xfail).
  2. splice: track funding_tx_index on inflights and channel funding, the plumbing.
  3. channeld: detect splice on reestablish via funding_tx_index, not txid, the fix.

Tests

  • test_splice_reconnect_after_lock_no_channel_ready: splice, restart, reestablish; asserts the index persists and channel_ready is not retransmitted. It fails on commit 2 (old txid compare) and passes on commit 3, so it genuinely guards the fix.
  • test_splice_funding_tx_index_increments: two sequential splices reach index 2.
  • test_splice_inflight_funding_tx_index: a pending splice inflight carries index 1 and survives a restart.
  • Existing test_reconnect_no_update confirms non-splice channels still retransmit channel_ready.

Risk and plan

The behavioural change is a single line in the reestablish check. The plan is to get the funding_tx_index semantics reviewed properly, and if we don't converge before the v26.09 freeze, revert that one line with a FIXME and keep the plumbing for the follow-up. So this carries low risk.

@ddustin, this implements the funding_tx_index approach you suggested; would appreciate your review of the assignment semantics and whether it matches what you and tbast had in mind.

nGoline added 3 commits June 24, 2026 17:54
Add regression tests that each funding transaction carries a
funding_tx_index (0 for the original funding, +1 per splice), that it is
persisted and reloaded across restart, and that a pending splice inflight
carries its index.

Changelog-None
Give every funding tx a stable index: 0 for the original funding
(including RBF attempts), incrementing by 1 per splice.  Threaded through
the channeld inflight, the channel, the channeld_init and
channeld_add_inflight wire messages, set on splice creation and carried
onto the channel funding when a splice locks, and persisted in the
channel_funding_inflights and channels tables.
is_splice_active compared the peer's my_current_funding_locked txid
against peer->channel->funding.txid, which is wrong: once a splice locks,
funding.txid becomes the splice txid, both sides agree on it, the check
falls through, and channel_ready is incorrectly retransmitted on reconnect.

Resolve the txid to its funding_tx_index (the current channel funding or a
pending splice inflight) and treat > 0 as a splice.

Changelog-Fixed: channeld: correctly detect splice transactions when deciding whether to retransmit `channel_ready` on reconnect.
@nGoline nGoline requested a review from ddustin June 24, 2026 21:50
@nGoline nGoline added the Status::Ready for Review The work has been completed and is now awaiting evaluation or approval. label Jun 24, 2026

@ddustin ddustin left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! 👏 Mostly style notes with only one major concern:

test_splice_reconnect_after_lock_no_channel_ready may not actually be testing for the lack of channel_ready because of the unneeded billboard: Channel ready for use. check.

# add extra sats to pay fee
funds_result = l1.rpc.fundpsbt("107527sat", 0, 0, excess_as_change=True)

result = l1.rpc.splice_init(chan_id, 100000, funds_result['psbt'])

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be done easier with the splicein command or the dev-splice command for more complex ones.

def do_splice(amount):
funds = l1.rpc.fundpsbt('{}sat'.format(amount + 7527), 0, 0,
excess_as_change=True)
result = l1.rpc.splice_init(chan_id, amount, funds['psbt'])

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here this can become a one liner with splicein or dev-splice (for scripts).


l1.daemon.wait_for_log(r'peer_in WIRE_CHANNEL_REESTABLISH')
l2.daemon.wait_for_log(r'peer_in WIRE_CHANNEL_REESTABLISH')
l1.daemon.wait_for_log(r'billboard: Channel ready for use.')

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to remove l1.daemon.wait_for_log(r'billboard: Channel ready for use.') check for the later on assert not l1.daemon.is_in_log(r'Retransmitting channel_ready') to have any effect as it could eat the "channel_ready" message.

# Restart l1: the inflight (and its index) must reload from the DB.
l1.restart()
l1.daemon.wait_for_log(r'peer_in WIRE_CHANNEL_REESTABLISH')
inflights = l1.db_query("SELECT funding_tx_index FROM"

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like you meant to query this from the RPC instead of db? Doesn't seem critical since test_splice_reconnect_after_lock_no_channel_ready should cover this already.

Comment thread channeld/channeld.c
&recv_tlvs->my_current_funding_locked->my_current_funding_locked_txid,
&peer->channel->funding.txid));
&& funding_tx_index_for_txid(peer,
&recv_tlvs->my_current_funding_locked->my_current_funding_locked_txid) > 0);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The correct style is to have the BOLT reference at the code where it is used and the code needs to match the spec language as closely as possible. This is for many reasons.

I understand moving the check into the is_splice_active variable may feel like a clean up, but being clean to the spec is more important here.

Let's drop this variable, put the check back where it was on the channel_ready check, and use a correct BOLT reference.

The bolt reference should read something like:

	/* BOLT #2:
	 *  - if next_commitment_number is 1 in both the channel_reestablish it sent and received, and none of those channel_reestablish messages contain my_current_funding_locked or next_funding for a splice transaction:
	 *   - MUST retransmit channel_ready.
	 * - otherwise:
	 *    - MUST NOT retransmit channel_ready, but MAY send channel_ready with a different short_channel_id alias field.
	 * /

Using the Bolt reference this way makes CI enforce the bolt reference is correct and serves as a reminder to easily scan the code directly beneath it to verify we are following the spec.

This ensures we're keeping up with the spec as it changes, makes audits significantly easier, and even helps us notice when the spec itself is wrong so we can update the spec.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, adding a comment here explaining something like "tx_index of 1 or more means it was from a splice." could be useful.

# the new funding_tx_index columns) and reconnects/reestablishes.
l1.restart()

# Force the reconnect rather than waiting on auto-reconnect backoff.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a big deal but probably should be in the first commit

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Status::Ready for Review The work has been completed and is now awaiting evaluation or approval.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants