Sharpen your debugging skills with realistic production scenarios. Built for both learners and hiring managers.
Authentication Service Latency
Users are reporting intermittently slow logins, some of which end in timeouts. Investigate the authentication service and its dependencies to find the root cause of the latency.
An image-processing service is crashing with out-of-memory (OOM) errors. The crashes are sporadic but appear to correlate with high-resolution image uploads. Find the memory leak.
Orders are failing intermittently during checkout. The payment service is returning 500 errors, but only for some users. Customer support is overwhelmed with complaints about failed purchases.
A recently deployed microservice is stuck in CrashLoopBackOff on Kubernetes. The deployment passed CI/CD and the container builds fine, but the pods keep restarting. The team is blocked on shipping the new release.
Your application is hitting rate limits on a third-party geocoding API, causing address lookups to fail for users. This started after a "minor refactor" was deployed yesterday. No new features were added.
A microservices platform is experiencing random connection failures between services. The errors are sporadic and affect different service pairs at different times. The infrastructure team says "nothing changed," and SRE is escalating.