Investigating increased error rate in our API

Incident Report for DoControl

Resolved

AWS has confirmed that the issue is now resolved.
To summarize this event:
From 3:00 AM UTC until 11:00 AM UTC, an error affected our platform, and around 25% of the incoming traffic was not processed. The rest of the errors were backfilled by us.
Posted Oct 14, 2021 - 13:29 UTC

Update

To share more information: some of the workloads failed to process.
In most cases, we were able to reprocess them, but some incoming webhooks failed to be ingested into our platform.
Posted Oct 14, 2021 - 11:21 UTC

Monitoring

We are in touch with AWS regarding this incident. They have confirmed that it originates in the US-EAST-1 region and have pushed a fix. We are still monitoring to see whether the issue is resolved.
Posted Oct 14, 2021 - 11:16 UTC

Investigating

Some API calls are returning 500 errors, and the UI is displaying an error message.
Posted Oct 14, 2021 - 09:17 UTC
This incident affected: Backend API (Graphql Api).