Skip to content

Paella of sea bugs with jamon iberico, cinnamon, maple syrup and peanut butter.#1607

Open
Fredi-raspall wants to merge 19 commits into
mainfrom
pr/fredi/misc
Open

Paella of sea bugs with jamon iberico, cinnamon, maple syrup and peanut butter.#1607
Fredi-raspall wants to merge 19 commits into
mainfrom
pr/fredi/misc

Conversation

@Fredi-raspall

@Fredi-raspall Fredi-raspall commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

Fixes several small things experienced on deployments:

  • allows rebinding CLI socket (by sending SIGUSR1 to dataplane) in case the corresponding file system entry is accidentally removed.
  • Splits signal handler in two to avoid circular dependencies and adds listeners to disable default behavior that would stop dataplane (e.g. with the prior code, dataplane would stop upon receiving things like SIGUSR2, or SIGALRM or SIGPIPE).
  • add support to account packets that were correctly processed but dropped if the last step (serialization) prior xmit failed.
━━━━━━━━━━━━━━━━━━━━━━ Packet stats ━━━━━━━━━━━━━━━━
    Packet result                  count
   [ .. ]
    Eth: not for us                2633
    Locally delivered              2633
    Delivered                      0
    Deparse error                  0 <-- here
    No headroom                    0 <--- here

  • Fix dockerfile for quick builds
  • Add netlink listener to interface manager to be able to log / register interface operational status changes and register events so that they can be looked up from cli. This allows detecting physical issues that may cause route flapping on faulty interface/transceiver/cabling (or excluding them).
━━━━━━━━━━━━━━━━━━━━━ ROUTER_EVENTS ━━━━━━━━━━━━━━━━
 generated: 27 stored: 27 capacity: 1000

 0    2026-06-26T 15:24:13: Started!
 1    2026-06-26T 15:24:13: Connected to frr-agent
[ .. ]  
 11   2026-06-26T 15:24:43: eth2: oper state changed unknown -> down (carrier:false, carrier-up:1 carrier-down:2)
 12   2026-06-26T 15:24:49: eth2: oper state changed down -> up (carrier:true, carrier-up:2 carrier-down:2)
 13   2026-06-26T 15:25:58: eth2: oper state changed up -> down (carrier:false, carrier-up:2 carrier-down:3)
 14   2026-06-26T 15:26:02: eth2: oper state changed down -> up (carrier:true, carrier-up:3 carrier-down:3)
 15   2026-06-26T 15:26:14: eth2: admin state changed up -> down (carrier:false, carrier-up:3 carrier-down:4)
 16   2026-06-26T 15:26:14: eth2: oper state changed up -> down (carrier:false, carrier-up:3 carrier-down:4)
 17   2026-06-26T 15:26:31: eth2: admin state changed down -> up (carrier:true, carrier-up:4 carrier-down:4)
 18   2026-06-26T 15:26:31: eth2: oper state changed down -> up (carrier:true, carrier-up:4 carrier-down:4)
 19   2026-06-26T 15:26:40: eth2: admin state changed up -> down (carrier:false, carrier-up:4 carrier-down:5)
 20   2026-06-26T 15:26:40: eth2: oper state changed up -> down (carrier:false, carrier-up:4 carrier-down:5)
 21   2026-06-26T 15:26:48: eth2: admin state changed down -> up (carrier:true, carrier-up:5 carrier-down:5)
 22   2026-06-26T 15:26:48: eth2: oper state changed down -> up (carrier:true, carrier-up:5 carrier-down:5)
  • better integrate the router control channel into the router IO loop by adding a waker.

@Fredi-raspall Fredi-raspall requested a review from a team as a code owner June 26, 2026 16:06
@Fredi-raspall Fredi-raspall requested review from Copilot, qmonnet and sergeymatov and removed request for a team and sergeymatov June 26, 2026 16:06
@coderabbitai

coderabbitai Bot commented Jun 26, 2026

Copy link
Copy Markdown

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

The PR updates Unix signal handling, adds an rtnetlink-backed interface monitor, wires interface events through management and routing, adds packet serialization error variants, renames a masquerade helper, and parameterizes Dockerfile artifact paths by platform.

Changes

Interface Event Pipeline

Layer / File(s) Summary
Signal catcher handoff
lifecycle/src/lib.rs, dataplane/src/runtime.rs
lifecycle replaces spawn_signal_handler with DpSignal and spawn_signal_catcher returning an mpsc::Receiver; dataplane runtime consumes it in a custom signal task and adds interfaces to MgmtParams.
InterfaceMonitor and EthEvent
interface-manager/Cargo.toml, interface-manager/src/lib.rs, interface-manager/src/monitor/mod.rs
New monitor module exports EthEvent and InterfaceMonitor, which listens to rtnetlink NewLink multicast, converts tracked interfaces to events, and broadcasts them over Tokio broadcast channels.
Mgmt relay and ConfigProcessor wiring
mgmt/src/processor/launch.rs, mgmt/src/processor/proc.rs, mgmt/src/tests/mgmt.rs
run_mgmt starts InterfaceMonitor, relays EthEvent values to routing, constructs ConfigProcessor with an explicit runtime handle, and passes a shared ConfigClient into both startup paths.
RouterEvent variants and interface state setters
routing/src/router/revent.rs, routing/src/interfaces/interface.rs, routing/src/interfaces/iftable.rs, routing/src/interfaces/iftablerw.rs, routing/src/cli/display.rs
RouterEvent gains IfAdmChange and IfOperChange; interface state setters become unconditional assignments; iftable visibility narrows; IfTableWriter drops allow(unused); IfState::fmt uses padding.
RouterCtlSender waker and handle_ifevent
routing/src/router/ctl.rs
LockGuard and RouterCtlSender carry Arc<Waker>; send_and_wake centralizes send and wake behavior; handle_ctl_msg drains the control channel; handle_ifevent derives admin and oper states from EthEvent and emits RouterEvent changes.
Rio poll loop waker and CLI socket restore
routing/Cargo.toml, routing/src/router/rio.rs
Rio and RioHandle store Arc<Waker>; Rio::new registers CTL_CHANNEL and CLIPATH_WATCHER; the poll loop handles control and inotify events; cli_sock_restore reopens the CLI socket; ExitGuard replaces the inline termination guard.

Packet Serialization Errors

Layer / File(s) Summary
SerializeError and DoneReason variants
net/src/packet/meta.rs, net/src/packet/mod.rs, net/src/packet/stats_display.rs
DoneReason gains DeparseError and NoHeadRoom; SerializeError<E> is introduced; Packet::serialize returns SerializeError and records the matching DoneReason through done_force.
Kernel worker receive and logging
dataplane/src/drivers/kernel/worker.rs
packet_recv increases the raw receive buffer size, and tx_packet logs pkt.serialize() failures at warn!.

Masquerade Helper Rename

Layer / File(s) Summary
Helper rename and comment cleanup
nat/src/masquerade/nf.rs
refresh_masquerade_state documentation is clarified; the OneWay warning log is replaced with comments; the helper is renamed and its call sites updated.

Dockerfile Platform Build Arg

Layer / File(s) Summary
Platform build arg
Dockerfile
ARG PLATFORM=x86_64-unknown-linux-gnu is added, and the three COPY --link artifact paths are updated to include ${PLATFORM}.

Possibly related PRs

  • githedgehog/dataplane#1598: Touches the same nat/src/masquerade/nf.rs file with broad masquerade renames that precede the helper rename and comment cleanup in this PR.

Suggested reviewers

  • mvachhar
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Title check ⚠️ Warning The title is unrelated to the PR and does not describe the actual code changes. Replace it with a concise summary of the main change, such as improving signal handling, interface monitoring, and router control wakeups.
✅ Passed checks (4 passed)
Check name Status Explanation
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description check ✅ Passed The description clearly matches the changeset, covering signal handling, CLI socket rebinding, packet serialization errors, Dockerfile, interface monitoring, and router waker updates.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves dataplane operational robustness by (1) restructuring signal handling to avoid unintended process termination and enable CLI socket rebinding, (2) integrating the router control channel into the mio poll loop via a Waker, and (3) adding netlink-based interface event monitoring and router event logging. It also extends packet outcome accounting to include post-processing serialization failures and adjusts build/container ergonomics.

Changes:

  • Router I/O loop: integrate control-channel processing into mio::Poll via Waker, add CLI socket restore control message, and harden loop-exit detection.
  • Interface events: add an interface-manager netlink monitor, relay events into the router control path, and log/admin+oper state transitions as router events.
  • Packet stats: introduce SerializeError and new DoneReason values to account “no headroom” / “deparse error” outcomes; minor Dockerfile and kernel worker tweaks.

Reviewed changes

Copilot reviewed 22 out of 23 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
routing/src/router/rio.rs Adds mio::Waker integration for ctl channel; adds CLI socket restore; refactors exit guard.
routing/src/router/ctl.rs Adds waker-backed RouterCtlSender, new ctl messages (RebindCli, IfEvent), drains ctl channel on wake.
routing/src/router/revent.rs Adds router events for interface admin/oper transitions with formatted display.
routing/src/interfaces/interface.rs Removes per-set logging for admin/oper state updates (now logged via router events).
routing/src/interfaces/iftablerw.rs Enables writer methods for interface state updates (removes unused allow).
routing/src/interfaces/iftable.rs Narrows visibility of state setters to pub(super).
routing/src/cli/display.rs Updates IfState display to use formatter padding behavior.
routing/Cargo.toml Adds interface-manager dependency.
net/src/packet/mod.rs Adds SerializeError, splits serialize logic, sets DoneReason on serialization failure.
net/src/packet/meta.rs Adds DoneReason::DeparseError and DoneReason::NoHeadRoom.
net/src/packet/stats_display.rs Adds display strings for new DoneReason values.
nat/src/masquerade/nf.rs Minor doc/comment cleanup and internal helper rename.
mgmt/src/processor/proc.rs Changes ConfigProcessor::new to take a runtime handle; spawns rtnetlink connection.
mgmt/src/processor/launch.rs Starts interface monitor + relay task; passes router control sender; plumbs config client.
mgmt/src/tests/mgmt.rs Updates test to pass tokio runtime handle to ConfigProcessor::new.
lifecycle/src/lib.rs Splits signal handling into catcher (relays signals) for broader signal coverage.
interface-manager/src/monitor/mod.rs Introduces netlink-based interface monitor and broadcast distribution; adds (ignored) test.
interface-manager/src/lib.rs Exposes new monitor module.
interface-manager/Cargo.toml Adds dependencies needed for monitor + tests.
dataplane/src/runtime.rs Adds main signal-processing task (acts on SIGUSR1 to rebind CLI); uses new catcher API.
dataplane/src/drivers/kernel/worker.rs Increases RX buffer size; promotes serialize failure log to warn.
Dockerfile Copies binaries from target triple subdir (target/${PLATFORM}/${PROFILE}).
Cargo.lock Updates lockfile for new dependencies.

Comment thread mgmt/src/processor/launch.rs
Comment thread mgmt/src/processor/launch.rs
Comment thread lifecycle/src/lib.rs
Comment thread net/src/packet/mod.rs Outdated
Comment thread net/src/packet/mod.rs Outdated
Comment on lines +125 to +129
// create netlink socket and spawn background task
let Ok((connection, netlink, _)) = rtnetlink::new_connection() else {
panic!("failed to create connection");
};
spawn(connection);
handle.spawn(connection);
Comment on lines +210 to +214
#[wrap(with_caps([Capability::CAP_NET_ADMIN]))]
#[n_vm::in_vm]
#[cfg_attr(not(emulated), traced_test)]
#[ignore = "could not make this test to run"]
async fn test_interface_monitor() {

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

🧹 Nitpick comments (1)
net/src/packet/mod.rs (1)

65-73: 📐 Maintainability & Code Quality | 🔵 Trivial

Consider adding #[from] to DeparseError for automatic error conversion

Adding #[from] to the DeparseError variant allows the ? operator to automatically convert DeParseError<E> into SerializeError<E>, eliminating the need for explicit .map_err(SerializeError::DeparseError) calls in do_serialize.

The thiserror crate automatically implements Error::source() when #[from] is used. Ensure DeParseError<E> implements std::error::Error (a requirement for #[from]); if it is a standard thiserror enum, this likely holds true.

♻️ Proposed change
     #[error("De-parsing error: {0}")]
-    DeparseError(DeParseError<E>),
+    DeparseError(#[from] DeParseError<E>),

Then simplify the error handling in do_serialize:

-        self.headers.deparse(buf).map_err(SerializeError::DeparseError)?;
+        self.headers.deparse(buf)?;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@net/src/packet/mod.rs` around lines 65 - 73, The SerializeError enum
currently wraps DeParseError(E) without automatic conversion, forcing manual
map_err handling in do_serialize. Update the DeparseError variant in
SerializeError<E> to use #[from] and confirm DeParseError<E> implements
std::error::Error so thiserror can derive source() automatically. Then simplify
do_serialize to rely on the ? operator for converting DeParseError<E> into
SerializeError<E> instead of explicit mapping.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@Dockerfile`:
- Around line 11-15: The Docker build path now depends on PLATFORM, but the
build invocation only forwards PROFILE, so Docker defaults to the wrong target
and can miss the built artifacts. Update the docker build wiring in the justfile
to pass PLATFORM through alongside PROFILE, and ensure the Dockerfile COPY
commands continue to consume the same PLATFORM variable used by the workflow so
quick builds and CI resolve the same target.

In `@interface-manager/src/monitor/mod.rs`:
- Around line 95-114: The `netlink_to_event` path is incorrectly treating
`carrierup` and `carrierdown` as required, causing link-state events to be
dropped when those display-only counters are missing. Update the parsing in
`netlink_to_event` so missing `LinkAttribute::CarrierUpCount` and
`LinkAttribute::CarrierDownCount` do not short-circuit the function; instead,
make them optional or default them while still constructing the `EthEvent` from
`ifindex`, `ifup`, `iflowerup`, `ifrunning`, and `carrier`.

In `@mgmt/src/processor/launch.rs`:
- Around line 116-128: Startup link events can be dropped because
InterfaceMonitor begins emitting before the router knows the interface, and
interface_event_notify/router control currently ignores events for unknown
ifindex values. Fix this by ensuring the startup path either applies an initial
interface snapshot/replay before subscribing to deltas, or delays starting
InterfaceMonitor/interface_event_notify until after ConfigProcessor has
initialized routing state; use the existing InterfaceMonitor,
interface_event_notify, and ConfigProcessor::new/startup sequence to keep early
link events from being lost.
- Around line 89-100: The interface relay loop in interface_event_notify
currently ignores the broadcast receiver closing state, causing an endless retry
spin after rx.recv().await stops yielding events. Update the loop to explicitly
handle the closed result from rx.recv() and break out of the task when the
channel is closed, while keeping the existing send_ifevent relay behavior for
valid EthEvent messages.

In `@net/src/packet/mod.rs`:
- Around line 335-340: Update the doc comments on `Packet::deparse` and
`Packet::serialize` so the `# Errors` section refers to the actual return type,
`SerializeError`, instead of the non-existent `PacketError` variants. Keep the
wording aligned with the existing method names and error cases in
`net/src/packet/mod.rs`, replacing `PacketError::NoHeadRoom` and
`PacketError::DeparseError` with the correct `SerializeError` terminology
throughout both doc blocks.

In `@routing/src/cli/display.rs`:
- Around line 480-485: The IfState Display formatting now emits variable-length
text, which breaks alignment in the CLI tables that print it from the table
rendering code in display.rs. Fix this by either restoring fixed-width padding
in IfState::fmt so it always returns a consistently padded value, or by updating
every IfState table call site in the display formatting paths to apply an
explicit width/column constraint so the output remains aligned.

In `@routing/src/router/ctl.rs`:
- Around line 38-48: LockGuard::drop currently uses ambient tokio::task::spawn,
which can panic if the guard is dropped outside an active Tokio runtime. Update
LockGuard to store a tokio::runtime::Handle when it is created, then replace the
spawn call in drop with handle.spawn(...) so the unlock task is scheduled safely
regardless of runtime context. Ensure the handle is threaded through the
LockGuard constructor and any call sites that build it.

In `@routing/src/router/rio.rs`:
- Around line 215-228: The cli_sock_restore flow in rio::Router currently
deregisters and shuts down the active CLI socket before opening the replacement,
which can leave the router without a usable socket if open_cli_sock fails.
Change the sequence so the new socket is created first, then only call
deregister and shutdown on the old self.clisock after the new socket opens
successfully, and finally assign the new socket and register it through
register(CLISOCK, ...).
- Around line 510-516: The control-channel drain in rio’s poll loop is too
tightly coupled to the CTL_CHANNEL token, which can leave queued messages
unprocessed if a wake is missed after send_and_wake succeeds. Move the
handle_ctl_msg call in router::rio::Rio’s event loop so it runs on every poll
iteration, not only inside the CTL_CHANNEL match arm, while keeping its
try_recv-based draining behavior unchanged. Use the existing handle_ctl_msg and
send_and_wake flow as the main reference points when updating the loop.

---

Nitpick comments:
In `@net/src/packet/mod.rs`:
- Around line 65-73: The SerializeError enum currently wraps DeParseError(E)
without automatic conversion, forcing manual map_err handling in do_serialize.
Update the DeparseError variant in SerializeError<E> to use #[from] and confirm
DeParseError<E> implements std::error::Error so thiserror can derive source()
automatically. Then simplify do_serialize to rely on the ? operator for
converting DeParseError<E> into SerializeError<E> instead of explicit mapping.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 193b3e81-e417-4008-9911-61214a56e5c0

📥 Commits

Reviewing files that changed from the base of the PR and between 5f19b51 and eb45426.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (22)
  • Dockerfile
  • dataplane/src/drivers/kernel/worker.rs
  • dataplane/src/runtime.rs
  • interface-manager/Cargo.toml
  • interface-manager/src/lib.rs
  • interface-manager/src/monitor/mod.rs
  • lifecycle/src/lib.rs
  • mgmt/src/processor/launch.rs
  • mgmt/src/processor/proc.rs
  • mgmt/src/tests/mgmt.rs
  • nat/src/masquerade/nf.rs
  • net/src/packet/meta.rs
  • net/src/packet/mod.rs
  • net/src/packet/stats_display.rs
  • routing/Cargo.toml
  • routing/src/cli/display.rs
  • routing/src/interfaces/iftable.rs
  • routing/src/interfaces/iftablerw.rs
  • routing/src/interfaces/interface.rs
  • routing/src/router/ctl.rs
  • routing/src/router/revent.rs
  • routing/src/router/rio.rs
💤 Files with no reviewable changes (1)
  • routing/src/interfaces/iftablerw.rs

Comment thread Dockerfile
Comment thread interface-manager/src/monitor/mod.rs Outdated
Comment thread mgmt/src/processor/launch.rs
Comment thread mgmt/src/processor/launch.rs
Comment thread net/src/packet/mod.rs Outdated
Comment on lines +480 to +485
let s = match *self {
IfState::Unknown => "unknown",
IfState::Up => "up",
IfState::Down => "down",
};
f.pad(s)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify IfState display call sites use explicit width where table alignment matters.
rg -n 'admin_state|oper_state|IfState' routing/src/cli/display.rs routing/src -g '*.rs' -C 2

Repository: githedgehog/dataplane

Length of output: 20228


Refactor IfState Display to preserve fixed-width column alignment in CLI tables.

The Display implementation for IfState currently outputs variable-length strings ("unknown", "up", "down"). Call sites in routing/src/cli/display.rs (lines 534–535, 574) insert these values into tabular output without enforcing explicit column widths. This change removes the previous implicit fixed-width padding, causing column misalignment in CLI tables.

To resolve:

  • Either revert IfState::fmt to return a consistently padded string (e.g., 9 characters), or
  • Update all table-formating call sites to specify explicit width constraints (e.g., {:9} or configure the table formatter to fix column width).

routing/src/cli/display.rs
routing/src/cli/display.rs

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@routing/src/cli/display.rs` around lines 480 - 485, The IfState Display
formatting now emits variable-length text, which breaks alignment in the CLI
tables that print it from the table rendering code in display.rs. Fix this by
either restoring fixed-width padding in IfState::fmt so it always returns a
consistently padded value, or by updating every IfState table call site in the
display formatting paths to apply an explicit width/column constraint so the
output remains aligned.

Comment thread routing/src/router/ctl.rs
Comment on lines 38 to 48
impl Drop for LockGuard {
fn drop(&mut self) {
let tx = self.0.take();
if let Some(tx) = tx {
if let Some(tx) = self.tx.take() {
let waker = Arc::clone(&self.waker);
task::spawn(async move {
if let Err(e) = tx.send(RouterCtlMsg::GuardedUnlock).await {
error!("Fatal: could not send unlock request!!: {e}");
} else {
let _ = waker.wake();
}
});

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n routing/src/router/ctl.rs | head -100

Repository: githedgehog/dataplane

Length of output: 3781


🏁 Script executed:

rg -A 5 -B 5 "LockGuard" routing/src/ | head -80

Repository: githedgehog/dataplane

Length of output: 2393


Avoid ambient tokio::task::spawn in Drop implementation.

LockGuard::drop invokes tokio::task::spawn without capturing a Runtime handle. If LockGuard is dropped outside an active Tokio runtime context (e.g., in synchronous code or during process shutdown), spawn will panic.

Store a tokio::runtime::Handle within LockGuard at creation and use handle.spawn(...) to ensure the task is spawned safely regardless of the caller's runtime context.

Current unsafe implementation
impl Drop for LockGuard {
    fn drop(&mut self) {
        if let Some(tx) = self.tx.take() {
            let waker = Arc::clone(&self.waker);
            task::spawn(async move {
                // ...
            });
        }
    }
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@routing/src/router/ctl.rs` around lines 38 - 48, LockGuard::drop currently
uses ambient tokio::task::spawn, which can panic if the guard is dropped outside
an active Tokio runtime. Update LockGuard to store a tokio::runtime::Handle when
it is created, then replace the spawn call in drop with handle.spawn(...) so the
unlock task is scheduled safely regardless of runtime context. Ensure the handle
is threaded through the LockGuard constructor and any call sites that build it.

Comment thread routing/src/router/rio.rs
Comment thread routing/src/router/rio.rs

@qmonnet qmonnet left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some small comments and suggestions below, but not blocking - PR looks good from my side 🥘. Thanks!

However:

  • You have one commit missing a sign-off
  • AI have raised some actual concerns, it seems, please address where necessary
  • The “concurrency” job failure looks like something you need to address, too
  • (The Miri job on powerpc64 is currently broken though, not related to this PR.)

Comment thread routing/src/router/rio.rs Outdated
Comment thread interface-manager/src/monitor/mod.rs Outdated
Comment thread mgmt/src/processor/launch.rs Outdated
Comment thread mgmt/src/processor/proc.rs
@qmonnet qmonnet added the bug Something isn't working label Jun 26, 2026
@Fredi-raspall Fredi-raspall force-pushed the pr/fredi/misc branch 3 times, most recently from 290325c to d136d7a Compare June 29, 2026 13:08

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@concurrency/src/sync/loom_backend.rs`:
- Line 49: Remove the redundant manual Receiver implementation for Arc<T> in
loom_backend so the blanket core Receiver impl via Deref is used instead. Delete
the explicit impl on Arc<T> in the sync code, and remove Receiver from the
core::ops import since it is no longer needed.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 61415c00-0a56-41e6-bddc-932682a9f294

📥 Commits

Reviewing files that changed from the base of the PR and between e1bcc10 and d136d7a.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (23)
  • Dockerfile
  • concurrency/src/sync/loom_backend.rs
  • dataplane/src/drivers/kernel/worker.rs
  • dataplane/src/runtime.rs
  • interface-manager/Cargo.toml
  • interface-manager/src/lib.rs
  • interface-manager/src/monitor/mod.rs
  • lifecycle/src/lib.rs
  • mgmt/src/processor/launch.rs
  • mgmt/src/processor/proc.rs
  • mgmt/src/tests/mgmt.rs
  • nat/src/masquerade/nf.rs
  • net/src/packet/meta.rs
  • net/src/packet/mod.rs
  • net/src/packet/stats_display.rs
  • routing/Cargo.toml
  • routing/src/cli/display.rs
  • routing/src/interfaces/iftable.rs
  • routing/src/interfaces/iftablerw.rs
  • routing/src/interfaces/interface.rs
  • routing/src/router/ctl.rs
  • routing/src/router/revent.rs
  • routing/src/router/rio.rs
💤 Files with no reviewable changes (1)
  • routing/src/interfaces/iftablerw.rs
✅ Files skipped from review due to trivial changes (3)
  • interface-manager/src/lib.rs
  • net/src/packet/stats_display.rs
  • mgmt/src/tests/mgmt.rs
🚧 Files skipped from review as they are similar to previous changes (17)
  • routing/Cargo.toml
  • routing/src/interfaces/iftable.rs
  • routing/src/router/revent.rs
  • Dockerfile
  • interface-manager/Cargo.toml
  • dataplane/src/drivers/kernel/worker.rs
  • net/src/packet/meta.rs
  • lifecycle/src/lib.rs
  • mgmt/src/processor/launch.rs
  • routing/src/interfaces/interface.rs
  • routing/src/cli/display.rs
  • net/src/packet/mod.rs
  • mgmt/src/processor/proc.rs
  • nat/src/masquerade/nf.rs
  • dataplane/src/runtime.rs
  • interface-manager/src/monitor/mod.rs
  • routing/src/router/rio.rs

Comment thread concurrency/src/sync/loom_backend.rs Outdated
Comment thread dataplane/src/runtime.rs Outdated
match sig {
DpSignal::SIGTERM | DpSignal::SIGINT | DpSignal::SIGQUIT => root.cancel(),
DpSignal::SIGUSR1 | DpSignal::SIGUSR2 | DpSignal::SIGHUP | DpSignal::SIGALRM | DpSignal::SIGPIPE => {}
DpSignal::SIGUSR2 | DpSignal::SIGHUP | DpSignal::SIGALRM | DpSignal::SIGPIPE => {},

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not use inotify to monitor the path? And maybe have a timer which reconnects if it finds the socket missing. This seems an odd choice for a k8s product

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good idea. I can give it a try.

Comment thread interface-manager/src/monitor/mod.rs Outdated
};

// we allow veths for testing
if info.is_some() && !is_veth && phy_only {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have other means of determining if an interface is physical.

More, I don't understand exactly what "physical" means in many cases, especially regarding sr-iov and virtio. We may want to rethink this

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant not vrf or not bridge or not vlan (so no "virtual" in the generic sense or not a sub-interface). The net device representing a nic (as close as possible) so that the carrier indication could reliably indicate if it was operational.

Comment thread interface-manager/src/monitor/mod.rs Outdated
}

/// Convert a netlink message to an `EthEvent`
fn netlink_to_event(msg: NetlinkMessage<RouteNetlinkMessage>, phy_only: bool) -> Option<EthEvent> {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this should be a method on InterfaceMonitor?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess you meant associated function?

Comment thread interface-manager/src/monitor/mod.rs Outdated
#[wrap(with_caps([Capability::CAP_NET_ADMIN]))]
#[n_vm::in_vm]
#[cfg_attr(not(emulated), traced_test)]
#[ignore = "could not make this test to run"]

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

???

This needs work

Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
The rebinding should happen when processing a message over the
control channel.

Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
Split the current signal handler in two parts: a signal catcher and
a signal handler. The former is generic and relays signals to the
handler, which is dataplane-specific. This change is needed to avoid
a circular dependency that would occur because the existing handler
is defined in lifecycle. An alternative would be to move the prior
signal handler to dataplane, but opted to keep the lifecycle crate
complete with that piece. The signal catcher just registers signal
listeners and relays them to dataplane, which does the actual
handling.

The prior handler is extended with listeners for other signals
which would otherwise terminate dataplane. Dataplane can now be
stopped with: SIGINT, SIGTERM, SIGQUIT.

Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
Cause the cli to rebind to the configured path. This allows ensuring
the availability of the CLI in case the filesystem path is accidentally
removed (e.g. by starting another dataplane).

Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
When sending a packet once processed, its buffer needs to be updated
according to the changes made in its headers. This can fail when
"de-parsing" the headers or in case there is no headroom in the
buffer. Avoid unreachable! on a fallible method (which should not
otherwise fail) and mark the packet in case of failure so that if
those problems happen, they are not silently overlooked at.

Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
Adds a small netlink listener that produces events upon parsing
netlink messages sent by the kernel and can broadcast them to
multiple subscribers. This will be used to monitor flapping
interfaces. This code will require small adjustments when the new
interface model lands.

Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
Instead of creating the ConfigProcessor explicitly in both config
drivers, create and run it before branching. This makes it clearer
that the distinct config entry points just require a config client
to drive the processor.

Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
Start the interface monitor in the management subsystem and relay
interface events to the router.

Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
Handle interface events by updating the state of the interfaces as
seen by the routing crate and logging any change.

Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
The prior code logged interface status changes in methods that
update the interface objects in the interface table. This is
problematic since the interface table is implemented using left-right.
As a result those methods are called twice (one for each copy
of the interface table), causing logs to be duplicated and appear
at seemingly random times. Fix this by moving the logic outside
the reach of the absorb trait.

Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
The control channel was never integrated in the router IO poller.
As a result, control messages were serviced, in the worst case,
every timeout interval of the poller. This is problematic now that
the router also gets interface events (which may be more frequent
than the prior type of messages). Solve this by waking up the
poller anytime a message is queued towards the router with the
aid of a Waker.

Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
Some smart nics are known to accept frames larger than the standard
jumbo size. Enlarge the reception buffer in the kernel driver to
accommodate for that.

Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
@Fredi-raspall

Copy link
Copy Markdown
Contributor Author

Changes since prior version:

  • let the interface monitor get a list of interfaces (net devs) to track. This avoids guess-work on the type since it does not matter.
  • Do not use signals to rebind CLI socket (as that requires user intervention). Instead, use inotify to detect CLI sock path deletions and rebind automatically.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@interface-manager/src/monitor/mod.rs`:
- Around line 183-188: Remove the duplicate tracing-test macro on the test in
mod.rs: the function currently has both the unconditional #[traced_test] and the
#[cfg_attr(not(emulated), traced_test)] variant. Keep only one of them so the
test is not wrapped twice; adjust the attributes on the affected test function
to use either the unconditional form or the cfg_attr form, but not both.

In `@routing/src/router/rio.rs`:
- Around line 116-121: The CLI socket path handling in setup_clipath_watcher
currently assumes Path::parent will exist, so relative paths like cli.sock fail
later instead of being handled cleanly. Update setup_clipath_watcher in
router::rio to either treat a missing parent as the current directory (using "."
as the watch target) or enforce absolute CLI socket paths earlier during config
parsing before this watcher is called. Keep the fix centered on the
setup_clipath_watcher path resolution logic and the config/CLI path validation
flow.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 534b238f-887d-441f-90b2-11bf0a378000

📥 Commits

Reviewing files that changed from the base of the PR and between d136d7a and 4a569e6.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (22)
  • Dockerfile
  • dataplane/src/drivers/kernel/worker.rs
  • dataplane/src/runtime.rs
  • interface-manager/Cargo.toml
  • interface-manager/src/lib.rs
  • interface-manager/src/monitor/mod.rs
  • lifecycle/src/lib.rs
  • mgmt/src/processor/launch.rs
  • mgmt/src/processor/proc.rs
  • mgmt/src/tests/mgmt.rs
  • nat/src/masquerade/nf.rs
  • net/src/packet/meta.rs
  • net/src/packet/mod.rs
  • net/src/packet/stats_display.rs
  • routing/Cargo.toml
  • routing/src/cli/display.rs
  • routing/src/interfaces/iftable.rs
  • routing/src/interfaces/iftablerw.rs
  • routing/src/interfaces/interface.rs
  • routing/src/router/ctl.rs
  • routing/src/router/revent.rs
  • routing/src/router/rio.rs
💤 Files with no reviewable changes (1)
  • routing/src/interfaces/iftablerw.rs
✅ Files skipped from review due to trivial changes (4)
  • mgmt/src/tests/mgmt.rs
  • net/src/packet/stats_display.rs
  • routing/src/cli/display.rs
  • nat/src/masquerade/nf.rs
🚧 Files skipped from review as they are similar to previous changes (13)
  • interface-manager/src/lib.rs
  • routing/src/interfaces/iftable.rs
  • routing/src/router/revent.rs
  • net/src/packet/meta.rs
  • interface-manager/Cargo.toml
  • dataplane/src/drivers/kernel/worker.rs
  • mgmt/src/processor/proc.rs
  • Dockerfile
  • routing/src/interfaces/interface.rs
  • net/src/packet/mod.rs
  • lifecycle/src/lib.rs
  • routing/src/router/ctl.rs
  • mgmt/src/processor/launch.rs

Comment thread interface-manager/src/monitor/mod.rs Outdated
Comment thread routing/src/router/rio.rs
Add an inotify watcher to the directory containing the file path
for the CLI Unix socket and rebind the socket in case the file path
is removed.

Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
Rewrite interface monitor so that arbitrary_self_types is not
needed, as that feature seems unstable.

Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
@Fredi-raspall Fredi-raspall force-pushed the pr/fredi/misc branch 2 times, most recently from 6253738 to 2f09e5c Compare July 1, 2026 13:37
Polish logs, remove duped methods.

Signed-off-by: Fredi Raspall <fredi@githedgehog.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants