Account Access and Trading Outage Postmortem 04/26/2022

April 26, 2022
Newton Team

Root causes:

  • Unexpected networking issues bumped our market data services off the production load balancer. This led to timeouts on the core application's "rates" endpoint, which caused trades to fail.
  • These timeouts caused a backlog of requests to build up in our core application's containers. The backlog quickly overwhelmed the containers, crashing them and triggering replacements. Each replacement was then overwhelmed and crashed in turn, continuing the cycle.
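
To see why a slow dependency turns into a backlog, Little's law is a useful rough guide: the average number of requests in flight is about the arrival rate multiplied by the time each request takes. The sketch below uses made-up numbers (200 requests per second, a 50 ms healthy response, a 30 s wait on a hung call). These are illustrative assumptions, not measurements from this incident.

```python
def requests_in_flight(arrivals_per_second: int, latency_ms: int) -> float:
    """Little's law: average in-flight requests = arrival rate x time per request."""
    return arrivals_per_second * latency_ms / 1000


# Hypothetical figures, chosen only to show the scale of the effect.
ARRIVALS_PER_SECOND = 200
HEALTHY_LATENCY_MS = 50
HUNG_CALL_WAIT_MS = 30_000

healthy = requests_in_flight(ARRIVALS_PER_SECOND, HEALTHY_LATENCY_MS)
degraded = requests_in_flight(ARRIVALS_PER_SECOND, HUNG_CALL_WAIT_MS)

print(f"healthy: {healthy:.0f} in flight, degraded: {degraded:.0f} in flight")
# prints "healthy: 10 in flight, degraded: 6000 in flight"
```

With the same traffic, the number of requests a container has to hold open grows by the same factor as the latency, which is how a pool sized for normal load can be exhausted within seconds.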

Fixes:

  • Set a short timeout on calls from the core application to our market data services, so that unresponsive calls fail fast instead of tying up threads for extended periods of time.
  • Improve image caching and tune CPU resources to reduce the risk of a cascading failure.
  • In the long term, we will continue moving more functionality out of the core application and into microservices so that future failures won't impact unrelated parts of the application.
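
As an illustration of the first fix, here is a minimal sketch of a call with a short timeout, using only the Python standard library. It is not our actual implementation: the function name `fetch_rates`, the `RatesUnavailable` exception, and the 0.5 second value are all hypothetical and chosen for the example.

```python
import urllib.request

# Assumed value for illustration; the right number depends on normal latency.
RATES_TIMEOUT_SECONDS = 0.5


class RatesUnavailable(Exception):
    """Raised when the (hypothetical) market data service fails or is too slow."""


def fetch_rates(url: str, timeout: float = RATES_TIMEOUT_SECONDS) -> bytes:
    """Fetch rates, giving up quickly instead of blocking the calling thread."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except OSError as exc:
        # URLError, connection errors, and socket timeouts are all OSError subclasses.
        raise RatesUnavailable(f"rates call failed or timed out: {exc}") from exc
```

A caller can catch `RatesUnavailable` and reject the trade right away, which frees the thread for other requests. Note that the `timeout` passed to `urlopen` applies to each blocking socket operation rather than to the request as a whole.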

Thank you for your patience and understanding while we worked to fix this issue.

- The Newton team

Want to help us shape the products we build? We're always looking for valuable insight and feedback from our users. If you have features you'd like to see or are interested in joining our user research group, email support@newton.co.
