This is a follow-up to my earlier post about OPNsense's failover system killing active connections when no failover target exists. After identifying the bug, I wrote a patch, submitted a pull request, and OPNsense closed it without merging. The conversation was locked.

So this post is for anyone running a similar setup: a primary WAN with a standby backup—a USB 5G modem, a Starlink dish, a second ISP—that you plug in when you need it and leave disconnected the rest of the time. You want state killing to work when the backup is available, but not when it isn't.

The Problem, Briefly

OPNsense gateway groups support a “Failover States” option that kills active connections when a gateway goes down, forcing traffic onto the backup. This is useful—it's exactly what you want during a real failover event.

The problem: it fires unconditionally. If your backup interface is disabled, unplugged, or also marked down, OPNsense kills your states anyway. The routing system is smart enough to skip the failover—it logs ROUTING: ignoring down gateways—but the state-killing logic doesn't check. Connections die for nothing.

What I Tried to Fix

I submitted PR #9823, which added a has_viable_failover_target() check to the monitor_killstates function. Before killing states, it verified that at least one other group member was actually up and reachable, or that default gateway switching could provide an alternative. If no viable target existed, it skipped the state kill and logged:

ROUTING: skipping state kill for gateway WAN_DHCP, no viable failover target

Simple, targeted, and it preserved the existing behavior for every scenario where failover actually had somewhere to go.
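
To make the condition concrete, here is a minimal Python sketch of the decision described above. It is not the code from the PR: the Gateway class, its fields, and the function parameters are names I made up for illustration, and the only thing taken from the patch is the idea (another group member is up, or default gateway switching has an alternative).

```python
from dataclasses import dataclass


@dataclass
class Gateway:
    # Hypothetical model of a gateway, for illustration only.
    name: str
    up: bool              # the monitor currently considers it reachable
    enabled: bool = True  # the interface/gateway is not disabled


def has_viable_failover_target(failed, group_members, all_gateways=(),
                               default_gw_switching=False):
    """Return True if traffic leaving `failed` has somewhere else to go."""
    def usable(gw):
        return gw.name != failed and gw.enabled and gw.up

    # Case 1: another member of the same gateway group is up and reachable.
    if any(usable(gw) for gw in group_members):
        return True
    # Case 2: default gateway switching could pick another usable gateway.
    if default_gw_switching and any(usable(gw) for gw in all_gateways):
        return True
    return False


# Primary WAN is down and the backup modem is unplugged: nothing to fail over to.
wan = Gateway("WAN_DHCP", up=False)
backup = Gateway("LTE_BACKUP", up=False, enabled=False)
if not has_viable_failover_target("WAN_DHCP", [wan, backup]):
    print("ROUTING: skipping state kill for gateway WAN_DHCP, "
          "no viable failover target")
```

With the backup plugged in and reported up, the same call returns True and the state kill would proceed as it does today.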

OPNsense's Response

The maintainer closed the PR, calling it a “non-issue” with too much code to be worth discussing. The conversation was locked. This mirrored the outcome of the original bug report (#9789), which was closed as a “configuration issue.”

The suggested configuration fixes don't address the actual use case: wanting state killing to work when a backup is present, but not when it isn't. That's not a configuration problem—it's a missing condition.

How to Protect Yourself

Since the fix won't be coming upstream, here are your options depending on your setup:

Option 1: Disable Gateway Monitoring (Simplest)

If you only have one active WAN and the backup is rarely connected, disable monitoring on the primary gateway entirely. Go to System → Gateways → Single, edit your WAN gateway, and enable the Disable Gateway Monitoring option. OPNsense then treats the gateway as always up, so dpinger never declares it down and the state-killing logic never triggers.

Downside: you lose gateway-down alerts. Replace them with an independent monitor like Monit:

check host cloudflare with address 1.1.1.1
    if failed ping count 5 with timeout 10 seconds then alert
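
If you would rather not run Monit, the same kind of check can be done from any other box on the network. The sketch below is a rough Python equivalent, not a drop-in replacement: the function names are my own, it shells out to the system ping command, and it only alerts when every probe in a round fails, so a single lost packet stays quiet.

```python
import subprocess


def ping_once(host, timeout_s=2):
    """Send one ICMP echo via the system ping; True if a reply came back."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        # No reply in time, or ping is not installed: count it as a failure.
        return False
    return result.returncode == 0


def host_unreachable(host, count=5, probe=ping_once):
    """True only if all `count` probes fail (stops at the first success)."""
    return not any(probe(host) for _ in range(count))


# Example use from cron or a loop (left commented so importing has no side effects):
# if host_unreachable("1.1.1.1"):
#     print("ALERT: 1.1.1.1 did not answer 5 pings")
```

The probe argument exists so the check can be exercised without touching the network, and so you can swap in a TCP or HTTP probe if ICMP is filtered on your path.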

Option 2: Raise the Latency Thresholds

Under your gateway settings, increase the Latency High Threshold and Loss Threshold so dpinger doesn't escalate to “down” over slow ICMP responses. Defaults are aggressive—a few hundred milliseconds of latency with zero packet loss shouldn't trigger a failover event.
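
As a rough illustration of why the thresholds matter, the sketch below assumes a simplified rule: the gateway is treated as down when either the average latency or the packet loss exceeds its high threshold. The function name and every number here are hypothetical, chosen only to show the effect; check your own gateway settings for the real values.

```python
def is_down(avg_latency_ms, loss_pct, latency_high_ms, loss_high_pct):
    """Assumed rule: down if either high threshold is exceeded."""
    return avg_latency_ms > latency_high_ms or loss_pct > loss_high_pct


# A slow but lossless link: 600 ms round trips, 0% packet loss.
sample = dict(avg_latency_ms=600, loss_pct=0)

print(is_down(**sample, latency_high_ms=500, loss_high_pct=20))   # True
print(is_down(**sample, latency_high_ms=2000, loss_high_pct=20))  # False
```

Under that assumption, the same congested-but-working link is marked down with a tight latency ceiling and left alone with a looser one, while real packet loss still trips the loss threshold either way.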

Option 3: Disable Failover States in the Gateway Group

Under System → Gateways → Group, you can disable the “Failover States” option. This stops state killing entirely for that group. The downside: when you do plug in your backup and a real failover happens, existing connections won't be killed and migrated—they'll hang until they time out naturally.

Option 4: Manually Apply the Patch

If you want the actual fix—state killing that checks whether a viable failover target exists before destroying connections—you can apply the patch from PR #9823 yourself. The changes are to a single file in the OPNsense routing subsystem.

Warning: Any OPNsense firmware update will overwrite your changes. You'll need to re-apply the patch after every update. This is a manual maintenance burden—only go this route if you're comfortable with that and understand the code you're modifying.

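Options 1 through 3 don't give you what the patch does: state killing that's aware of whether failover is actually possible. Each one trades something away: gateway monitoring and its built-in alerts (Option 1), sensitivity to a degraded link (Option 2), or state migration during real failover events (Option 3). Only Option 4 keeps all three, at the cost of maintaining the patch yourself.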

Which Option to Pick

If your backup WAN is rarely connected (you plug it in during outages): go with Option 1. Disable monitoring, set up Monit or an external ping check for alerts, and avoid the state-killing problem entirely. When you plug in the backup and need failover, temporarily re-enable monitoring or manually switch gateways.

If your backup WAN is always connected but you're hitting false positives from dpinger: go with Option 2. Keep monitoring active but make it less trigger-happy.

If you don't care about fast connection migration during failover: Option 3 is the least disruptive. Connections will eventually re-establish on the backup gateway on their own.

If you want the proper fix and don't mind re-applying it after updates: Option 4. This is the only option that gives you intelligent state killing—connections get killed when there's somewhere to fail over to, and left alone when there isn't.

The patch that would have fixed this properly is available in PR #9823. If OPNsense revisits it in the future, the logic is there. Until then, the workarounds above will keep your connections alive.