Regular Weekly Maintenance - Europe - extended
Incident Report for

During weekly maintenance, we updated the ch1.lagoon OpenShift cluster to version v3.9.57. Unfortunately, this version has a previously unknown regression in which the “subPath” functionality of VolumeMounts does not work. All mariadb-galera clusters use this functionality to mount their persistent data volumes into the container.
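For readers unfamiliar with the feature, the following is a minimal sketch of what a volume mount using “subPath” looks like and how affected containers could be spotted in a pod spec. The pod spec, the container and volume names, the paths, and the helper function `find_subpath_mounts` are all hypothetical and for illustration only. They are not taken from Lagoon or from the actual mariadb-galera templates.

```python
# Hypothetical pod spec fragment, written as a Python dict.
# Only the field names (containers, volumeMounts, name, mountPath, subPath)
# follow the Kubernetes/OpenShift pod spec; all values are made up.
pod_spec = {
    "containers": [
        {
            "name": "mariadb-galera",
            "volumeMounts": [
                {
                    "name": "data",
                    "mountPath": "/var/lib/mysql",
                    # subPath mounts a sub-directory of the volume
                    # instead of the volume root.
                    "subPath": "mysql",
                },
            ],
        },
    ],
}


def find_subpath_mounts(spec):
    """Return (container, volume, subPath) for every mount that sets subPath."""
    found = []
    for container in spec.get("containers", []):
        for mount in container.get("volumeMounts", []):
            if mount.get("subPath"):
                found.append((container["name"], mount["name"], mount["subPath"]))
    return found


print(find_subpath_mounts(pod_spec))
```

This sketch only inspects regular containers. A fuller check would probably also look at init containers, which can declare volume mounts in the same way.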

Because of this, all mariadb-galera clusters on the ch1.lagoon OpenShift cluster were no longer operational.

After an initial analysis of the problem, the maintenance and on-call engineers decided that a downgrade to the previous OpenShift version was not an option, as it would expose us to the CVE-2018-1002105 vulnerability. Instead, we decided to release a hotfix of Lagoon that removes the use of the “subPath” functionality in mariadb-galera clusters and includes an automated migration script.

After the affected mariadb-galera clusters were redeployed, they fully bootstrapped and were operational again.

We are in contact with Red Hat to find out how such a regression could be released in v3.9.57 of OpenShift, as there should be automated tests for it.

As soon as we have more information, we will update this post mortem.

Posted 6 months ago. Dec 20, 2018 - 00:50 CET

This incident has been resolved.
Posted 6 months ago. Dec 19, 2018 - 04:42 CET
A hotfix was rolled out for the regression and the cluster is now stabilizing.
Posted 6 months ago. Dec 19, 2018 - 03:19 CET
A regression was discovered during maintenance; we are now working to patch the issue.
Posted 6 months ago. Dec 19, 2018 - 02:49 CET
Today's maintenance is running longer than usual.
Posted 6 months ago. Dec 19, 2018 - 02:09 CET
This incident affected: Germany (de1.compact), Finland (fi1.compact), South Africa (sa1.compact), Japan (jp1.compact), United Kingdom (uk1.compact, uk2.compact), General (dev1.compact), Switzerland (zh1.compact, zh2.compact, ch1.lagoon), and on-premise servers.