Energy / Infrastructure · Product Owner, Platform Engineering

Platform Observability @ Enexis

2022 – Present

TL;DR

Problem: Fragmented monitoring across 12+ teams with no unified observability, causing slow incident response
Action: Designed and owned a centralized observability platform (Grafana/Prometheus) with SLO-based alerting and self-service dashboards
Outcome: 65% MTTD reduction, 80+ dashboards, 99.9% uptime SLO achieved across critical infrastructure

65% MTTD Reduction

12+ Teams Onboarded

80+ Dashboards Created

99.9% Uptime SLO

CONTEXT

Building a scalable observability platform for the Dutch energy grid, enabling real-time monitoring of critical infrastructure across 3M+ connections.

THE PROBLEM

Enexis lacked unified visibility into platform health. Teams operated in silos with fragmented monitoring, leading to slow incident response and blind spots in system reliability.

CONSTRAINTS

—Legacy infrastructure with heterogeneous tech stacks across teams
—Strict compliance and data governance requirements in the energy sector
—Needed to onboard 12+ teams without dedicated platform engineers per team
—Budget constraints required an open-source-first tooling strategy

THE APPROACH

Designed and implemented a centralized observability stack using Grafana, Prometheus, and custom dashboards. Introduced SLO-based alerting, runbooks, and a platform-as-product mindset to shift from reactive firefighting to proactive reliability engineering.
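To make "SLO-based alerting" concrete, here is a minimal Python sketch of a multi-window burn-rate check. It is an illustration only, not the actual Enexis alert configuration: the `Window` type, the function names, the request counts, and the 14.4 paging threshold (a commonly cited value for a 1-hour window against a 30-day SLO) are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """Request counts observed over one lookback window (hypothetical shape)."""
    name: str
    total_requests: int
    failed_requests: int


def error_budget(slo_target: float) -> float:
    """Fraction of requests allowed to fail, e.g. about 0.001 for a 99.9% SLO."""
    return 1.0 - slo_target


def allowed_downtime_minutes(slo_target: float, days: int = 30) -> float:
    """Error budget expressed as minutes of full downtime over the period."""
    return error_budget(slo_target) * days * 24 * 60


def burn_rate(window: Window, slo_target: float) -> float:
    """How fast the budget is being spent: 1.0 means exactly on budget."""
    if window.total_requests == 0:
        return 0.0
    error_ratio = window.failed_requests / window.total_requests
    return error_ratio / error_budget(slo_target)


def should_page(long_w: Window, short_w: Window, slo_target: float,
                threshold: float = 14.4) -> bool:
    """Page only if both windows burn fast.

    The long window shows the problem is significant; the short window
    shows it is still happening, so a recovered blip does not page anyone.
    """
    return (burn_rate(long_w, slo_target) >= threshold
            and burn_rate(short_w, slo_target) >= threshold)


if __name__ == "__main__":
    slo = 0.999  # the 99.9% target named in this case study
    last_hour = Window("1h", total_requests=1000, failed_requests=20)
    last_5m = Window("5m", total_requests=1000, failed_requests=5)
    print(f"budget per 30 days: {allowed_downtime_minutes(slo):.1f} min")
    print(f"1h burn rate: {burn_rate(last_hour, slo):.1f}")
    print(f"page on-call: {should_page(last_hour, last_5m, slo)}")
```

A fixed threshold alert would fire on every short spike. The burn-rate version ties the page to how quickly the error budget is being consumed, which is the property that tends to cut alert noise.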

THE OUTCOME

Mean Time To Detection dropped by 65%. Platform teams gained self-service dashboards, and incident postmortems became data-driven. The observability platform became a shared capability across 12+ engineering teams.
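For readers unfamiliar with the metric, the following Python sketch shows one common way to compute Mean Time To Detection and a relative reduction between two periods. The function names and the sample timestamps are hypothetical and purely illustrative; they are not Enexis incident data.

```python
from datetime import datetime, timedelta
from statistics import mean


def mttd(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    """Mean Time To Detection over (started_at, detected_at) pairs."""
    if not incidents:
        raise ValueError("need at least one incident")
    delays = [(detected - started).total_seconds()
              for started, detected in incidents]
    return timedelta(seconds=mean(delays))


def relative_reduction(before: timedelta, after: timedelta) -> float:
    """Fractional improvement, e.g. 0.5 means detection time was halved."""
    return (before - after) / before


if __name__ == "__main__":
    # Made-up incidents: detection took 10 and 20 minutes respectively.
    t0 = datetime(2023, 1, 1, 12, 0)
    sample = [
        (t0, t0 + timedelta(minutes=10)),
        (t0, t0 + timedelta(minutes=20)),
    ]
    print(mttd(sample))
    print(relative_reduction(timedelta(minutes=40), timedelta(minutes=20)))
```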

MY ROLE & OWNERSHIP

As Product Owner, I owned the full observability platform roadmap. I defined the platform vision, prioritized the backlog based on team adoption metrics and incident data, and worked directly with SREs to design alerting strategies. I drove stakeholder alignment across engineering leadership to secure buy-in for the platform-as-product approach. Key ownership areas: roadmap, backlog prioritization, SLO definitions, team onboarding strategy, and vendor/tool evaluation.

LEARNINGS

→Platform adoption is a product challenge, not a technical one — onboarding UX and documentation matter more than features
→SLO-based alerting dramatically reduces alert fatigue vs threshold-based approaches
→Self-service dashboards scale better than centralized dashboard teams
→Incident postmortems become 10x more valuable when backed by observability data

Grafana · Prometheus · Loki · Kubernetes · Azure · Terraform
