Site unavailability due to Storage Backend issues

Incident Report for amazee.io

Resolved

This incident has been resolved. We'll follow up with a post mortem next week.

Posted May 12, 2021 - 20:53 UTC

Monitoring

The situation remains stable. We're monitoring the storage backend for issues and will take appropriate action if needed.

Posted May 12, 2021 - 08:36 UTC

Update

To reduce the load on the storage backend, we scaled down the development environments on CH1. The environments will be scaled up again automatically as soon as the load on the storage backend allows.
If you need a specific development environment scaled up earlier, you can either trigger a deployment of that environment or contact amazee.io support.

Posted May 06, 2021 - 11:00 UTC

Update

We are continuing to work on a fix for this issue.

Posted May 06, 2021 - 10:37 UTC

Identified

The issue has been identified and a fix is being implemented.

Posted May 06, 2021 - 10:01 UTC

Update

We are continuing to monitor for any further issues.

Posted May 06, 2021 - 09:28 UTC

Update

We see that a subset of sites is still impacted by the underlying storage issues. We're working on fixing the situation for those sites.

Posted May 06, 2021 - 09:07 UTC

Monitoring

The situation is stable again, and we're continuing the work in the background. If you see issues with your sites, feel free to contact support via support@amazee.io or via Slack / Rocketchat.

Posted May 05, 2021 - 13:31 UTC

Update

We're continuing to work on the situation. Some sites might see intermittent availability issues.

Posted May 05, 2021 - 09:59 UTC

Identified

We're still looking into this issue. The actions taken during yesterday's maintenance do not seem to have solved the storage issues we were encountering.

Posted May 05, 2021 - 08:54 UTC

Update

We are continuing to monitor for any further issues.

Posted May 05, 2021 - 08:51 UTC

Update

Some sites are still having issues because of the pressure on the storage backend. We're working on resolving this issue fully.

Posted May 05, 2021 - 08:21 UTC

Monitoring

The situation remains stable. We're monitoring everything and have started adapting tonight's maintenance to accommodate further steps on the issue we observed today.

Posted May 04, 2021 - 11:53 UTC

Update

The situation is stabilizing further. We're working on a permanent fix.

Posted May 04, 2021 - 11:41 UTC

Update

We are continuing to work on a fix for this issue.

Posted May 04, 2021 - 10:54 UTC

Update

We are continuing to work on a fix for this issue.

Posted May 04, 2021 - 10:49 UTC

Update

We are continuing to work on a fix for this issue.
We've involved engineers from our infrastructure provider to look into this issue together with our engineers.

Posted May 04, 2021 - 10:47 UTC

Update

We are continuing to work on a fix for this issue.

Posted May 04, 2021 - 08:48 UTC

Identified

The issue has been identified and a fix is being implemented.

Posted May 04, 2021 - 08:10 UTC

Investigating

We are currently investigating this issue.

Posted May 04, 2021 - 07:11 UTC

This incident affected: General (Lagoon API, Deployment Infrastructure, Lagoon Dashboard, Lagoon Logs (OpenSearch)).