We observed a minor downtime today caused by one of the compute nodes. As an immediate step, we took the node out of the cluster. During the scheduled maintenance, all workloads on this node will be shifted to other compute nodes.
Posted 4 months ago. Jan 15, 2019 - 16:39 CET
A fix has been implemented and we are monitoring the results.
Posted 4 months ago. Jan 15, 2019 - 10:33 CET
Increased load may cause some containers to restart slowly as the cluster stabilizes. We'll continue to mitigate the issue and will update as new information becomes available.
Posted 4 months ago. Jan 14, 2019 - 19:43 CET
We're still seeing some load issues. Our team continues to work on a mitigation for the availability issues observed.
Posted 4 months ago. Jan 14, 2019 - 19:31 CET
During the investigation we noticed increased load on the compute nodes. We're currently scaling the infrastructure temporarily to handle the increased load. This action can lead to short downtimes while the containers restart.
In addition, we identified an issue with our infrastructure provider that requires a short maintenance action, which should not cause any downtime. This action will be carried out at 21:00 CET.
Currently the systems look stable. Thus we're switching the status back to Operational. We'll be updating this issue as new information becomes available.
Posted 4 months ago. Jan 14, 2019 - 18:38 CET
We're currently investigating intermittent availability issues with sites hosted on ch1-lagoon.