Rafał is an Engineer, Architect, Manager (in this order), who remembers the times when Java 1.3 was a thing ;-). He works on solving non-functional requirements for complex systems, switching focus between Application Design and Infrastructure. Nowadays he leads a DevOps team, learning a lot about new technologies from teammates and sharing his experience in solving the ever-occurring issues that impact Reliability, High Availability and Stability. Privately, he is surprised by how much Photography, Sailing and Software Design have in common.
Everyone expects stability from software systems. At the same time we expect to be able to deploy features frequently.
Every such deployment - by definition - introduces instability while the change is being deployed. Is this a conflict that can be solved?
In this talk we will guide you through our team's journey towards maintaining the Reliability and Stability of an ever-changing system built from dozens of microservices supporting the critical Tier 0 Tesco Identity Service.
We will present how we tackled this problem by relying on our Team Culture, including its processes and procedures, as well as on solutions like Observability, Alerts, Deployment Automation and Certificate Management. We will not offer a silver bullet to kill a chaos daemon. Instead, we will present the pros and cons of the solutions we carefully selected to improve the stability of our system.