Resolved -
The API has remained stable, so we are marking this incident as resolved.
Jun 12, 14:15 UTC
Monitoring -
We have stopped the memcache servers from restarting, and our latency and error rates are trending back to normal. We are continuing to monitor the situation.
Jun 12, 13:20 UTC
Update -
We are experiencing elevated latency and error rates.
The root cause is that our memcache cluster is undergoing an unwanted rolling restart. Each time a node restarts, we’re seeing momentarily elevated latency and error rates. About half of the nodes have restarted. We’re attempting to mitigate by adding server and database capacity, but we expect performance to return to normal once all nodes have restarted.
Server-to-server calls are primarily affected. The SDK has mitigations that should reduce the impact to zero.
Jun 12, 13:08 UTC
Identified -
We are facing issues with our caching infrastructure and are temporarily increasing database capacity to handle the load.
Jun 12, 12:33 UTC
Investigating -
We are facing issues with our caching infrastructure and are temporarily increasing database capacity to handle the load.
Jun 12, 12:29 UTC