Site unavailability due to Storage Backend issues
Incident Report for amazee.io
Resolved
This incident has been resolved. We'll follow up with a post mortem next week.
Posted May 12, 2021 - 20:53 UTC
Monitoring
The situation remains stable - We're monitoring the storage backend for issues and will take appropriate action if needed.
Posted May 12, 2021 - 08:36 UTC
Update
In order to reduce the load on the Storage Backend, we scaled down the development environments on CH1. The environments will be automatically scaled up again as soon as the load on the Storage Backend allows us to do so.
If you need a specific development environment scaled up earlier, you can either trigger a deployment of that environment or contact amazee.io support.
Posted May 06, 2021 - 11:00 UTC
Update
We are continuing to work on a fix for this issue.
Posted May 06, 2021 - 10:37 UTC
Identified
The issue has been identified and a fix is being implemented.
Posted May 06, 2021 - 10:01 UTC
Update
We are continuing to monitor for any further issues.
Posted May 06, 2021 - 09:28 UTC
Update
We see that a subset of sites is still impacted by the underlying storage issues. We're working on fixing the situation for those sites.
Posted May 06, 2021 - 09:07 UTC
Monitoring
We're continuing the work in the background - The situation is stable again. If you see issues with your sites, feel free to contact support via support@amazee.io or via Slack / Rocketchat.
Posted May 05, 2021 - 13:31 UTC
Update
We're continuing to work on the situation - Some sites might see intermittent availability issues.
Posted May 05, 2021 - 09:59 UTC
Identified
We're still looking into this issue - The actions taken during yesterday's maintenance do not seem to have solved the storage issues we were encountering.
Posted May 05, 2021 - 08:54 UTC
Update
We are continuing to monitor for any further issues.
Posted May 05, 2021 - 08:51 UTC
Update
We still see some sites having issues due to the pressure on the storage backend. We're still working on fully resolving this issue.
Posted May 05, 2021 - 08:21 UTC
Monitoring
The situation remains stable - We're monitoring everything and have started to adapt tonight's maintenance to accommodate further steps on the issue we observed today.
Posted May 04, 2021 - 11:53 UTC
Update
The situation is further stabilizing - We're working on a permanent fix for this.
Posted May 04, 2021 - 11:41 UTC
Update
We are continuing to work on a fix for this issue.
Posted May 04, 2021 - 10:54 UTC
Update
We are continuing to work on a fix for this issue.
Posted May 04, 2021 - 10:49 UTC
Update
We are continuing to work on a fix for this issue.
We've involved engineers from our Infrastructure provider to look into this issue together with our engineers.
Posted May 04, 2021 - 10:47 UTC
Update
We are continuing to work on a fix for this issue.
Posted May 04, 2021 - 08:48 UTC
Identified
The issue has been identified and a fix is being implemented.
Posted May 04, 2021 - 08:10 UTC
Investigating
We are currently investigating this issue.
Posted May 04, 2021 - 07:11 UTC
This incident affected: General (API, Deployment Infrastructure, Lagoon Dashboard, Lagoon Logs (Kibana)) and Switzerland (ch1.lagoon).