Logging Sucks

Recently, my manager shared an interactive article with me:

Logging Sucks

And honestly, it’s true. When building software, we rarely add the data we need to debug what actually went wrong.

Logs were designed for a different era. An era of monoliths, single servers, and problems you could reproduce locally. Today, a single user request might touch 15 services, 3 databases, 2 caches, and a message queue. Your logs are still acting like it’s 2005. – Boris Tane, loggingsucks.com

Trudging through log lines that carry no context is hard enough. Once concurrent requests start interleaving their output, you quickly lose the thread.

That easy-to-read log in your local environment gets overwhelmed by real traffic. Queues stall, checkouts fail, and scrolling through a 10 MB error.log gives you no idea why.

Instead, use wide events: attach everything relevant to each request so a single event shows its full context. Then load those events into a service that’s easy to query, and you can extract critical information:

  • Which customers are unable to checkout? Enterprise or Pro?
  • How many cart failures have there been since deploying version 3.45.1?
  • When did the email queue stall?
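To make the idea concrete, here is a minimal sketch in Python. It assumes an in-memory list stands in for the queryable backend. The names `WideEvent`, `handle_checkout`, and `failures_by_plan`, and the field names (`plan`, `outcome`, `version`), are hypothetical illustrations. They do not come from the article or from any logging library. Each request builds up one dictionary of context and writes it out as a single JSON line at the end.

```python
import json
import time
import uuid
from collections import Counter


class WideEvent:
    """Hypothetical helper: collects context for one request, emits it as one event."""

    def __init__(self, **fields):
        self.fields = {"request_id": str(uuid.uuid4()), **fields}
        self._start = time.monotonic()

    def add(self, **fields):
        # Attach more context as the request moves through the handler.
        self.fields.update(fields)

    def emit(self, sink):
        # Record the duration and write exactly one JSON line for the whole request.
        self.fields["duration_ms"] = round((time.monotonic() - self._start) * 1000, 2)
        sink.append(json.dumps(self.fields, sort_keys=True))


def handle_checkout(user, cart, sink, version="3.45.1"):
    """Hypothetical checkout handler instrumented with a single wide event."""
    event = WideEvent(service="checkout", route="/checkout", version=version)
    event.add(
        user_id=user["id"],
        plan=user["plan"],
        cart_items=len(cart["items"]),
        cart_total_cents=cart["total_cents"],
    )
    try:
        # Stand-in for real payment logic: an empty cart counts as a failure.
        if cart["total_cents"] <= 0:
            raise ValueError("empty cart")
        event.add(outcome="success")
    except ValueError as exc:
        event.add(outcome="error", error=str(exc))
    finally:
        event.emit(sink)
    return event.fields


def failures_by_plan(lines, version=None):
    """Count failed checkouts per plan, optionally for a single deployed version."""
    counts = Counter()
    for line in lines:
        event = json.loads(line)
        if event.get("route") != "/checkout" or event.get("outcome") != "error":
            continue
        if version is not None and event.get("version") != version:
            continue
        counts[event.get("plan", "unknown")] += 1
    return counts


if __name__ == "__main__":
    sink = []  # In-memory stand-in for a queryable event store.
    handle_checkout({"id": "u1", "plan": "enterprise"}, {"items": ["sku-1"], "total_cents": 1200}, sink)
    handle_checkout({"id": "u2", "plan": "pro"}, {"items": [], "total_cents": 0}, sink)
    handle_checkout({"id": "u3", "plan": "pro"}, {"items": [], "total_cents": 0}, sink, version="3.45.0")
    print(failures_by_plan(sink))
    print(failures_by_plan(sink, version="3.45.1"))
```

Each request produces one self-contained line, so concurrent requests can't scatter a request's context across interleaved log lines. Questions like "which plan is failing checkout?" then become simple filters and group-bys rather than grep sessions. A real setup would send these lines to an observability service instead of a Python list.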

If you have time today, give it a read. It made me rethink my observability strategy, and it may just save you from a critical outage.