Bgp: add Adj-RIB-In support and route withdrawal failover#1110
Bgp: add Adj-RIB-In support and route withdrawal failover#1110giovanninardini wants to merge 12 commits into
Conversation
…and scenario parameters
…-run the decision process after route withdrawal
…s the behavior of some BGP examples)
… and make Ipv6RoutingTable configure interfaces before the Ipv6NodeConfigurator tries to assign IPv6 addresses. This fixes a problem that caused the interfaces to not re-obtain the global address after resuming its operational stateAdd
| for (auto& elem : adjRibInTable) | ||
| delete elem; |
There was a problem hiding this comment.
🔴 Loc-RIB route entries leak on router destruction because they are never freed
The Loc-RIB entries are never freed in the destructor (BgpRouter::~BgpRouter at src/inet/routing/bgpv4/BgpRouter.cc:32-44) despite being separately-allocated clones, so every BGP route installed during the simulation leaks on shutdown or crash.
Impact: Memory leaks accumulate on each router lifecycle (shutdown/crash/restart), worsening in long-running simulations.
Loc-RIB ownership changed from shared to exclusive without matching destructor update
In the old code, bgpRoutingTable entries and rt (IP routing table) entries were the SAME object — rt->deleteRoute() freed them, so the destructor didn't need to. The PR changes this: installBestRouteForPrefix at src/inet/routing/bgpv4/BgpRouter.cc:973 creates a clone for the Loc-RIB (selected = cloneRoute(best)) and a SEPARATE clone for rt (routeForRt = cloneRoute(selected) at line 992). The Loc-RIB clone is pushed into bgpRoutingTable at line 983 and is now exclusively owned by BgpRouter.
Similarly, addToAdvertiseList at line 243-248 pushes locally-originated entries into bgpRoutingTable.
The destructor frees adjRibInTable entries (line 37-38) and _prefixListINOUT entries (line 40-41) but never frees bgpRoutingTable entries. Bgp::removeBgpRoutes() only removes routes from the IP routing table (the other clones), not from bgpRoutingTable.
| for (auto& elem : adjRibInTable) | |
| delete elem; | |
| for (auto& elem : adjRibInTable) | |
| delete elem; | |
| for (auto& elem : bgpRoutingTable) | |
| delete elem; | |
Was this helpful? React with 👍 or 👎 to provide feedback.
| ospfModule->insertExternalRoute(ie->getInterfaceId(), ospfNetAddr); | ||
| } | ||
|
|
||
| updateSendProcess(oldEntry == nullptr ? NEW_ROUTE_ADDED : ROUTE_DESTINATION_CHANGED, selected->getSourceSessionId(), selected); |
There was a problem hiding this comment.
🟡 Dangling pointer used as a boolean flag after the route it points to is freed
A previously-valid pointer is compared to null (oldEntry == nullptr at src/inet/routing/bgpv4/BgpRouter.cc:1009) after the object it pointed to was freed by deleteLocRibEntry at line 964, so the comparison reads an indeterminate pointer value.
Impact: Undefined behavior per the C++ standard; in practice the check works correctly on all real compilers but may confuse sanitizers or future optimizers.
Pointer value used after delete without resetting to nullptr
At line 961, oldEntry is set to current (a valid pointer into bgpRoutingTable). At line 964, deleteLocRibEntry(oldEntry) is called, which at src/inet/routing/bgpv4/BgpRouter.cc:875 executes delete entry — freeing the object that oldEntry points to. After this, oldEntry is a dangling pointer.
At line 1009, oldEntry == nullptr is evaluated to decide between NEW_ROUTE_ADDED and ROUTE_DESTINATION_CHANGED. Since oldEntry was non-null before deletion and was never reassigned, the comparison yields the intended result on all practical implementations. However, reading the value of a pointer after the object it refers to has been deleted is undefined behavior (C++11 §3.7.4.2).
The fix is to capture the boolean before the delete:
bool hadPreviousRoute = (oldEntry != nullptr);and use hadPreviousRoute at line 1009.
Prompt for agents
In BgpRouter::installBestRouteForPrefix (src/inet/routing/bgpv4/BgpRouter.cc), the variable oldEntry is used after its pointee is freed by deleteLocRibEntry(oldEntry) at line 964. At line 1009, the comparison oldEntry == nullptr reads a dangling pointer value (undefined behavior). The fix is to capture the boolean before the delete: add bool hadPreviousRoute = (current != nullptr); near line 961 (before deleteLocRibEntry is called), and then use hadPreviousRoute instead of oldEntry == nullptr at line 1009.
Was this helpful? React with 👍 or 👎 to provide feedback.
| if (startupTime == SIMTIME_ZERO) | ||
| startupTime = SimTime(1, SIMTIME_S); // TODO what value should go here? |
There was a problem hiding this comment.
🚩 Forced 1-second startup delay on restart overrides user configuration
In Bgp::startBgp() at src/inet/routing/bgpv4/Bgp.cc:129-130, when the router restarts (simTime() > 0) and the user-configured startupTime parameter is 0s, the code forces it to 1 second with a TODO comment. This means the startupTime parameter's documented default of 0s (from src/inet/routing/bgpv4/Bgp.ned:58) is silently overridden on every restart. While the comment acknowledges this is a workaround to avoid race conditions with interface bring-up, it creates non-obvious behavior: a router that started instantly at t=0 will take 1s to restart after a crash/shutdown, regardless of user intent. The value of 1s is arbitrary and may be too long or too short depending on the scenario.
Was this helpful? React with 👍 or 👎 to provide feedback.
| STAGE_LOCAL, // for changes that don't depend on other modules | ||
| STAGE_PHYSICAL_LAYER, | ||
| STAGE_LINK_LAYER, | ||
| STAGE_NETWORK_INTERFACE_CONFIGURATION, |
There was a problem hiding this comment.
🚩 New STAGE_NETWORK_INTERFACE_CONFIGURATION may affect other modules' lifecycle ordering
The new STAGE_NETWORK_INTERFACE_CONFIGURATION stage inserted in ModuleOperations.h between STAGE_LINK_LAYER and STAGE_NETWORK_LAYER (for start) and between STAGE_NETWORK_LAYER and STAGE_LINK_LAYER (for stop) changes the numeric values of all subsequent stage enums. Any module that uses a hardcoded integer (rather than the enum name) for its lifecycle stage will now execute at the wrong time. The Ipv6RoutingTable is updated to use this new stage (src/inet/networklayer/ipv6/Ipv6RoutingTable.cc:1046), but other modules should be audited to ensure they still use the correct symbolic stage names and that the new ordering doesn't create unexpected dependencies.
Was this helpful? React with 👍 or 👎 to provide feedback.
Summary
This PR adds Adj-RIB-In based route tracking to the BGP implementation so routes learned from peers are retained separately from the selected Loc-RIB route. When a peer withdraws a prefix or a BGP session fails, the router now removes only the affected learned paths, reruns best-path selection for those prefixes, and either installs an alternate route or advertises a withdrawal.
Details
BgpRouterfor all peer-learned routes.Examples
Adds two new diamond-topology examples that demonstrate failover, failback, and a second failure:
examples/bgpv4/BgpDiamondFailoverexamples/bgpv4/BgpDiamondFailover6Both examples prefer the
A-B-Cpath initially, fail over toA-D-Cwhen routerBgoes down, fail back afterBrestarts, and then exercise failover again after a crash.Testing