Fail2Ban is a piece of software which can watch log files and take an arbitrary action when a certain number of matches are found.
It is most commonly used to read logs from an SSH daemon in order to insert a firewall rule against hosts that repeatedly fail to log in. Hence Fail → Ban.
Wherever possible, it is best to require public key and/or multi-factor authentication for SSH login. Then, it does not matter how many times an attacker tries to guess passwords as they should never succeed. It’s just log noise.
Sadly I have some hosts where some users require password authentication to be available from the public Internet. Also, even on the hosts that can have password authentication disabled, it is irritating to see the same IPs trying over and over.
Putting SSH on a different port is not sufficient, by the way. It may cut down the log noise a little, but the advent of services that scan the entire Internet and then sell the results has meant that if you run an SSH daemon on any port, it will be found and be the subject of dictionary attacks.
The usual firewall on Linux is iptables. By default, when Fail2Ban wants to block an IP address it will insert a rule and then when the block expires it will remove it again.
iptables Interaction With Configuration Management
That worked great when the firewall rules were only managed in the config management, but Fail2Ban introduces firewall changes itself.
Now, it’s been many years since I moved on from Puppet so perhaps a way around this has been found there now. At the time though, I was using the Puppetlabs firewall module and it really did not like seeing changes from outside itself. It would keep reverting them.
It was possible to tell it not to meddle with rules that it didn’t add, but it never did work completely correctly. I would still see changes at every run.
Blackholes To The Rescue
I never did manage to come up with a way to control the firewall rules in Puppet but still allow Fail2Ban to add and remove its rules and chains, without there being modifications at every Puppet run.
Instead I sidestepped the problem by using the “route” action of Fail2Ban instead of the “iptables” action. The “route” action simply inserts a blackhole route, as if you did this at the command line:
# ip route add blackhole 192.168.1.1
That blocks all traffic to/from that IP address. Some people may have wanted to only block SSH traffic from those hosts but in my view those hosts are bad actors and I am happy to drop all traffic from/to them.
Problem solved? Well, not entirely.
Multiple Jailhouse Blues
Fail2Ban isn’t just restricted to processing logs for one service. Taken together, the criteria for banning for a given time over a given set of log files is called a jail, and there can be multiple jails.
When using iptables as the jail action this isn’t much of an issue because the rules are added to separate iptables chains named after the jail itself, e.g. f2b-sshd. You can therefore have the same IP address appearing in multiple different chains and whichever is hit first will ban it.
A common way to configure Fail2Ban is to have one jail banning hosts that have a short burst of failures for a relatively short period of time, and then another jail that bans persistent attackers for a much longer period of time. For example, there could be an sshd jail that looks for 3 failures in 3 minutes and bans for 20 minutes, and then an sshd-hourly jail that looks for 5 failures in an hour and bans for a day.
This doesn’t work with the “route” action because there is only one routing table and you can’t have duplicate routes in it.
Initially you may think you can cause the actual execution of the actions to still succeed with something like this:
actionban = ip route add blackhole <ip> || true
actionunban = ip route del blackhole <ip> || true
i.e. force them to always succeed even if the IP is already banned or already expired.
The problem now is that the short-term jails can remove bans that the long-term jails have added. It’s a race condition as to which order the adds and removes are done in.
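To make that race concrete, here is a toy model in plain shell. It does not touch the real routing table: ROUTES is just a variable standing in for it, the ban and unban functions are hypothetical stand-ins for the two actions above, and 192.0.2.1 is a documentation address picked for illustration.

```shell
#!/bin/sh
# Toy model: one routing table shared by every jail, keyed by destination only.
ROUTES=""   # space-separated list of blackholed destinations

is_banned() {
    case " $ROUTES " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

ban() {    # models: ip route add blackhole <ip> || true
    is_banned "$1" || ROUTES="$ROUTES $1"
    return 0
}

unban() {  # models: ip route del blackhole <ip> || true
    new=""
    for r in $ROUTES; do
        [ "$r" = "$1" ] || new="$new $r"
    done
    ROUTES=$new
    return 0
}

ban 192.0.2.1     # sshd jail bans after 3 failures in 3 minutes
ban 192.0.2.1     # sshd-hourly jail bans too; the route already exists, so nothing changes
unban 192.0.2.1   # sshd jail's 20 minute ban expires

if is_banned 192.0.2.1; then
    STATE="still banned"
else
    STATE="unbanned early"
fi
echo "192.0.2.1 is $STATE"
```

The sshd-hourly jail still believes the address is banned for a day, but the only route that enforced it went away with the short jail's unban.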
Ansible iptables_raw Deal
As I say, I switched to Ansible quite a while ago, and for firewalling here I chose the iptables_raw module.
This has the same issues with changed rules as all my earlier Puppet efforts did.
The docs say that you can set keep_unmanaged and then rules from outside of this module won’t be meddled with. This is true, but still Ansible reports changes on every host every time. It isn’t actually doing a change, it is just noting a change.
I think this is because every time iptables_raw changes the rules, it uses iptables-save to save them out to a file. Then Fail2Ban adds and removes some rules, and next time iptables_raw compares the live rule set with the save file that it saved out last time. So there’s always changes (assuming any Fail2Ban activity).
Someone did ask about the possibility of ignoring some chains, which would be ideal for ignoring all the f2b-* chains, but the response seems to indicate that this will not be happening.
So I am still looking for a way to manage Linux host firewalls in Ansible that can ignore some chains and not want to be in sole control of all rules.
Switching to ferm is a possibility, but if I am going to rewrite all of that I think I should probably do it with something that is going to support nftables, which ferm apparently doesn’t.
The Metric System
All is not lost, though it is severely bodged.
Routes can have metrics. On Linux the metric is an unsigned 32-bit number, so it goes from 0 to 4294967295, and the lower the number the more preferred the route is.
There can be multiple routes for the same destination but with different metrics; for example if you have a metric 10 route and a metric 20 route for the same destination, the metric 10 route is chosen.
That means that you can use a different metric for each jail, and then each jail can ban and unban the same IPs without interfering with other jails.
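Here is a rough sketch of why that works, again as a toy model in plain shell rather than anything that touches the real routing table. Routes are keyed by destination and metric together. The function names are hypothetical, 192.0.2.1 is a documentation address, and the metric of 9999 for the short jail is an assumption for illustration.

```shell
#!/bin/sh
# Toy model: routes are keyed by destination AND metric, so each jail owns its own entry.
ROUTES=""   # space-separated "destination/metric" pairs

ban() {    # models: ip route add blackhole <ip> metric <metric>
    case " $ROUTES " in
        *" $1/$2 "*) ;;                    # this jail already banned the address
        *) ROUTES="$ROUTES $1/$2" ;;
    esac
}

unban() {  # models: ip route del blackhole <ip> metric <metric>
    new=""
    for r in $ROUTES; do
        [ "$r" = "$1/$2" ] || new="$new $r"
    done
    ROUTES=$new
}

is_banned() {  # banned while a route for the address exists at any metric
    case " $ROUTES " in
        *" $1/"*) return 0 ;;
        *) return 1 ;;
    esac
}

ban 192.0.2.1 9999     # sshd jail (assumed metric)
ban 192.0.2.1 9998     # sshd-hourly jail
unban 192.0.2.1 9999   # the short ban expires and removes only its own route

if is_banned 192.0.2.1; then
    STATE="still banned"
else
    STATE="unbanned early"
fi
echo "192.0.2.1 is $STATE; remaining:$ROUTES"
```

The short jail's unban only removes the entry with its own metric, so the long jail's blackhole stays in place until that jail expires it.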
Here’s an action file for the action “route-metric”:
[Definition]
actionban = ip route add blackhole <ip> metric <metric>
actionunban = ip route del blackhole <ip> metric <metric>
On Debian you might put that in a file called /etc/fail2ban/action.d/route-metric.conf and then in a jail definition use it like this:
[sshd-hourly]
logpath = /var/log/auth.log
filter = sshd
enabled = true
action = route-metric[metric=9998]
# 5 tries
maxretry = 5
# in one hour
findtime = 3600
# bans for 24 hours
bantime = 86400
Just make sure to use a different metric number (9998 here) for each jail and that solves that problem.
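Since nothing enforces that uniqueness, a small check can help. The sketch below reads jail configuration text on stdin and prints any metric that more than one route-metric action uses. The function name and the sample jails (including the postfix one and the 9999 metric) are made up for illustration, and it assumes each action is written on a single line in the form shown above.

```shell
#!/bin/sh
# Print every metric value that appears in more than one
# "action = route-metric[metric=N]" line of the text read from stdin.
duplicate_metrics() {
    sed -n 's/^[[:space:]]*action[[:space:]]*=.*route-metric\[metric=\([0-9][0-9]*\)].*/\1/p' |
        sort -n | uniq -d
}

# Hypothetical sample configuration: two jails accidentally share 9998.
SAMPLE='[sshd]
action = route-metric[metric=9999]

[sshd-hourly]
action = route-metric[metric=9998]

[postfix]
action = route-metric[metric=9998]'

DUPES=$(printf '%s\n' "$SAMPLE" | duplicate_metrics)
echo "duplicate metrics: ${DUPES:-none}"
```

On a real host you would feed it your own jail files instead of the sample, for example by piping in whatever lives under /etc/fail2ban/jail.local and /etc/fail2ban/jail.d/, if that is where your jails are defined.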
Clearly that doesn’t solve it in a very nice way though. If you use Ansible and manage your firewall rules in it, what do you use?
Possibly this could instead be worked around by having multiple routing tables.