You will take ownership of the operational performance of mission-critical services, ensuring they meet agreed SLAs, SLOs, and business expectations. The role includes leading distributed teams across geographies and time zones, strengthening collaboration between operations, engineering, vendors, and business stakeholders, and building a culture of accountability and continuous improvement.
You will play a key role in crisis situations: leading cross-functional response teams, driving structured incident management, and ensuring effective executive communication. The role also includes shaping and governing a 24/7 Operations Centre or NOC-like function with clear escalation paths, on-call models, and operational procedures.
Continuous improvement is central to the role: you are responsible for identifying structural weaknesses, reducing recurring incidents, and translating insights into measurable improvements in availability, recovery, and operational predictability. You will work closely with engineering teams to ensure operability and resilience are built into solutions while promoting automation, observability, and infrastructure as code.