Intermittent availability issues on ch1-lagoon
Incident Report for amazee.io
Resolved
This incident has been resolved.
Posted Jan 16, 2019 - 18:00 UTC
Update
We observed a brief downtime caused by one of the compute nodes today. As an immediate step, we took the node out of the cluster. During the scheduled maintenance, all workloads on this node will be moved to other compute nodes.
Posted Jan 15, 2019 - 15:39 UTC
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Jan 15, 2019 - 09:33 UTC
Identified
Increased load may cause some containers to restart slowly as the cluster stabilizes.
We'll continue to mitigate the issue and will post updates as new information becomes available.
Posted Jan 14, 2019 - 18:43 UTC
Update
We're still seeing some load issues. Our team continues to work on a mitigation for the availability issues observed.
Posted Jan 14, 2019 - 18:31 UTC
Update
During our investigation, we noticed increased load on the compute nodes. We're temporarily scaling the infrastructure to handle the increased load. This can cause short downtimes while the containers restart.

In addition, we identified an issue with our infrastructure provider that requires a short maintenance action, which should not cause any downtime. This maintenance will be carried out at 21:00 CET (20:00 UTC).

The systems currently look stable, so we're switching the status back to Operational. We'll update this incident as new information becomes available.
Posted Jan 14, 2019 - 17:38 UTC
Investigating
We're currently investigating intermittent availability issues with sites hosted on ch1-lagoon.
Posted Jan 14, 2019 - 15:26 UTC