June 08, 2020

I own a small website that I use for some search engine marketing experiments. Of course there's some content and a Facebook sharing button for each and every post.

The website is so small that it runs on a “single controller” PHP app + a 400 KB SQLite db, yet it can generate hundreds of thousands of pages.

Everything is hosted (along with a bunch of other websites) on a cheap DigitalOcean machine + the free Cloudflare plan for some caching. One of those websites has some alerting, and it started to alert me about being down.

After some investigation I found the problem… the Facebook Crawler.

That crawler was making more than 7M requests per day (with a peak of 300 req/s) to that website.
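To put those two numbers side by side, here is a quick back-of-the-envelope calculation. It only uses the figures quoted above and assumes the 7M requests were spread over a full 24 hours. Since the real count was "more than 7M", the average is a lower bound.

```python
# Rough arithmetic on the figures quoted in the post.
SECONDS_PER_DAY = 24 * 60 * 60      # 86,400

requests_per_day = 7_000_000        # "more than 7M", so a lower bound
peak_rps = 300                      # reported peak, requests per second

# Average rate if the traffic were spread evenly over the day.
average_rps = requests_per_day / SECONDS_PER_DAY

print(f"average: about {average_rps:.0f} req/s")
print(f"peak is about {peak_rps / average_rps:.1f}x the average")
```

So even the "quiet" average is roughly 81 requests every second, all day long, against a site that lives on a cheap shared machine.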

Their documentation was no help on how to block the bot.

  • og:ttl -> disregarded
  • robots.txt -> disregarded
  • HTTP 429 -> disregarded

I had to block the user-agent using Cloudflare rules.
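For illustration, the same user-agent check could look roughly like this at the application level. This is only a sketch in Python, not the actual Cloudflare rule, and it rests on assumptions: that the crawler identifies itself with a user-agent containing `facebookexternalhit` (check your own logs), and that the names `BLOCKED_AGENT_MARKERS`, `is_blocked_agent` and `status_for` are made up for this example.

```python
# Hypothetical sketch: decide whether to refuse a request based on its User-Agent.
# The marker below is an assumption about how the crawler identifies itself;
# adjust it to whatever actually shows up in your access logs.
BLOCKED_AGENT_MARKERS = ("facebookexternalhit",)


def is_blocked_agent(user_agent):
    """Return True if the User-Agent header contains a blocked marker."""
    ua = (user_agent or "").lower()  # a missing header is treated as empty
    return any(marker in ua for marker in BLOCKED_AGENT_MARKERS)


def status_for(user_agent):
    """Return the HTTP status to answer with: 403 for blocked agents, else 200."""
    return 403 if is_blocked_agent(user_agent) else 200


print(status_for("facebookexternalhit/1.1"))
print(status_for("Mozilla/5.0 (X11; Linux x86_64)"))
```

Doing this in the app still means every request reaches the origin server. Blocking at the Cloudflare edge, as described above, stops the traffic before it gets that far.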

If anyone working on that crawler is reading this, please stop ignoring basic Web netiquette about crawlers.

Next time you could hit somebody on AWS. And then they'll probably ask you to pay the bill 😉