ch1-lagoon - Instabilities
Incident Report for amazee.io
Resolved
After we implemented additional mitigations at the infrastructure level last Friday, our monitoring has not detected any further instabilities. We therefore conclude that the issues have been resolved.
Posted 4 days ago. Sep 17, 2018 - 10:46 CEST
Update
We continued monitoring the instability issues over the last 24 hours. The implemented fix resolved most of the instabilities, but unfortunately some brief instabilities, each lasting about 20 seconds, still occur every couple of hours.
We therefore held another all-hands meeting with all involved parties (amazee.io, Hosting Partner, Infrastructure Partner) and added further monitoring to the infrastructure.
This monitoring allowed us to identify the virtual machine at the root of the issue, and we are now investigating what exactly causes the problem on that machine.

We will post an update as soon as we know more or once the instabilities are fully resolved.
Posted 7 days ago. Sep 14, 2018 - 13:33 CEST
Update
We are continuing to monitor for any further issues.
Posted 8 days ago. Sep 13, 2018 - 13:52 CEST
Update
During the course of the day we implemented a few changes on the infrastructure to further stabilize the situation. Since around 16:30 CEST the connection issues have ceased. We continue to monitor the situation closely.
Posted 8 days ago. Sep 12, 2018 - 21:35 CEST
Update
Our engineers found some irregularities in the network stack today. We are restarting all machines during the maintenance window and will check back with the infrastructure provider to confirm that the issues are gone. The situation should already be more stable since late afternoon, when we began implementing another fix. We'll update this incident as soon as new information becomes available.

We also plan to call an all-hands meeting with all involved parties tomorrow morning (September 12, CEST) to discuss the issue.
Posted 9 days ago. Sep 11, 2018 - 23:10 CEST
Update
We're currently adding additional nodes to the cluster.
Posted 13 days ago. Sep 07, 2018 - 15:01 CEST
Update
We implemented a fix and the situation looks stable now. We have started planning to add more resources to the cluster before the weekend. We will update the ticket as soon as new information becomes available.
Posted 14 days ago. Sep 07, 2018 - 00:32 CEST
Update
We're currently investigating issues on ch1-lagoon. We'll update as soon as new information becomes available.
Posted 14 days ago. Sep 06, 2018 - 22:56 CEST
Monitoring
A fix has been implemented and we are monitoring the results.
Posted 15 days ago. Sep 05, 2018 - 17:32 CEST
Investigating
The fix implemented earlier didn't solve the issue. We're looking into it again.
Posted 16 days ago. Sep 05, 2018 - 13:14 CEST
Monitoring
A fix has been implemented and we are monitoring the results.
Posted 16 days ago. Sep 05, 2018 - 11:46 CEST
Investigating
We're seeing services flapping on ch1-lagoon; our engineers are looking into the issue.
Posted 16 days ago. Sep 05, 2018 - 11:45 CEST
This incident affected: Switzerland (ch1.lagoon).