My experience is with systems that handle nearly 1,000 pageviews per second. We did use a spread of HAProxy servers to handle routing and SNI, but they were fed offender lists by external analysis tools (built in-house).
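Here is a minimal sketch of the handoff between an analysis tool and the proxies. It assumes the offender list is a plain file with one IP or CIDR per line. HAProxy can load a file like that as ACL patterns, for example `acl offender src -f /etc/haproxy/offenders.lst` followed by `http-request deny if offender`. The path and the helper name `write_offender_list` are hypothetical and not taken from the setup described above.

```python
import ipaddress
import os
import tempfile


def write_offender_list(offenders, path):
    """Hypothetical helper: write one IP/CIDR per line, replacing `path` atomically.

    Returns the number of unique networks written.
    """
    # Normalise every entry (bare IPs become /32 or /128); raises ValueError on junk.
    networks = sorted(
        {ipaddress.ip_network(o, strict=False) for o in offenders},
        # Sort by IP version first so IPv4 and IPv6 addresses are never compared.
        key=lambda n: (n.version, n.network_address, n.prefixlen),
    )
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temp file in the same directory, then rename it over the target,
    # so a proxy reload never reads a half-written list.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".offenders-")
    try:
        with os.fdopen(fd, "w") as f:
            for net in networks:
                f.write(f"{net}\n")
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise
    return len(networks)
```

HAProxy reads ACL files when it starts or reloads, so a sketch like this would be paired with a reload after each write.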
Most often because they don't download any of the CSS or external JS files from the pages they scrape. But there are a lot of other patterns you can detect once you have their traffic logs loaded into a time series database. I used an ELK stack back in the day.
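Here is a rough sketch of that first heuristic. It flags clients that request plenty of pages but never a stylesheet or script. It assumes the logs are already parsed into `(client_ip, path)` pairs. The threshold of 20 pages and the suffix list are made-up illustration values. The sketch only sees requests that hit your own servers, so assets loaded from a third-party CDN would need their logs too. It is a signal and not proof, because a real browser with a warm cache also skips re-fetching CSS and JS.

```python
from collections import defaultdict

# Hypothetical values, chosen only for illustration.
ASSET_SUFFIXES = (".css", ".js")
MIN_PAGE_VIEWS = 20


def flag_assetless_clients(requests, min_pages=MIN_PAGE_VIEWS):
    """Return sorted client IPs with >= min_pages page hits and zero CSS/JS hits.

    `requests` is an iterable of (client_ip, path) pairs; that shape is an
    assumption of this sketch, not a real log format.
    """
    pages = defaultdict(int)
    assets = defaultdict(int)
    for ip, path in requests:
        # Drop the query string before classifying (e.g. site.css?v=3).
        clean = path.split("?", 1)[0].lower()
        if clean.endswith(ASSET_SUFFIXES):
            assets[ip] += 1
        else:
            pages[ip] += 1
    return sorted(
        ip for ip, n in pages.items() if n >= min_pages and assets[ip] == 0
    )
```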
When I was serving high-volume sites (that were targeted by scrapers), I had a collection of files on our CDN that contained nothing but the word "no" over and over. Scrapers who barely hit our detection thresholds saw all their requests go to the 50 MB version. Super aggressive scrapers got the 10 GB version. And the scripts that just wouldn't stop got the 50 GB version.
It didn't move the needle on budget, but hopefully it cost them.
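Here is a toy sketch of that escalation, with two pieces. The first generates a "no no no..." file of an exact size, streamed in chunks so the 50 GB version never has to fit in memory. The second maps a client's offense score to one of the three decoys. The score thresholds, the `cdn.example.com` URLs and the function names are all hypothetical. The only details taken from the description above are the three sizes and their order.

```python
# Hypothetical thresholds and URLs: highest tier first.
TIERS = [
    (1_000, "https://cdn.example.com/no-50G.txt"),  # scripts that just won't stop
    (100, "https://cdn.example.com/no-10G.txt"),    # super aggressive scrapers
    (10, "https://cdn.example.com/no-50M.txt"),     # barely over the threshold
]


def decoy_for(score):
    """Return the decoy URL for an offense score, or None if the client isn't flagged."""
    for threshold, url in TIERS:
        if score >= threshold:
            return url
    return None


def write_no_file(path, size_bytes, chunk=b"no " * 4096):
    """Write exactly size_bytes of repeated "no " to path, one chunk at a time."""
    remaining = size_bytes
    with open(path, "wb") as f:
        while remaining > 0:
            piece = chunk[:remaining]  # the last piece is trimmed to fit
            f.write(piece)
            remaining -= len(piece)
```

For example, `write_no_file("no-50M.txt", 50 * 1000**2)` would produce the smallest of the three files.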
I went long enough without using Google (probably a year-ish) that, when I accidentally made a Google search a few days ago, it was a jarring experience.
It felt wrong the same way other search engines did when I first deGoogled. It was kind of nice actually.
Yeah. They all come with risks, but I psychologically struggle to run shell scripts unless I know what's in them. And the same brain dysfunction makes me automatically distrust a script that doesn't set pipefail.
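Here is why pipefail matters. Without it, a pipeline's exit status is that of its last command, so an earlier failure is silently swallowed. This small sketch demonstrates the difference by running a pipeline under bash with and without `set -o pipefail`. It assumes `bash` is on the PATH, and `pipeline_status` is a hypothetical helper name.

```python
import subprocess


def pipeline_status(pipeline, pipefail=False):
    """Hypothetical helper: run `pipeline` under bash and return its exit status."""
    script = ("set -o pipefail; " if pipefail else "") + pipeline
    return subprocess.run(["bash", "-c", script]).returncode


# `false` fails, but `true` at the end of the pipe succeeds.
print(pipeline_status("false | true"))                 # 0: the failure is hidden
print(pipeline_status("false | true", pipefail=True))  # 1: the failure propagates
```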
I never fully trust a shell script, and I usually end up reading any I have to use first so I know what it does. And after so many years, dpkg holds no mysteries for me, and Discover will install .debs if I double-click them in KDE.
A stab at my personal ranking: .deb > AppImage > Flatpak > curling a shell script
I can't help but love a .deb file (even when not via a repo); I've almost exclusively used Debian and its derivatives since the late 90s. And Snap isn't on the list because it got stored in a loopback device I removed.
When I was in college, my roommates and I would open all those offers standing at the mailbox, seal the empty envelopes back up, then put them right back in the mailbox for the carrier to grab the next day (or maybe mail thieves, who knows). We figured just mailing them all back was going to cost them something.
This terrible chemical directly caused the pressure increase that led to Chernobyl.