The Only Three Questions Your TPM Dashboard Should Answer
Most TPM dashboards track the wrong things. Build success rates and sprint velocity tell you nothing about what matters. Every metric should answer one of three questions: Can we ship safely? Are teams fast enough? Will customers succeed? Stop measuring everything. Start measuring what matters.

Someone asked me last week: "As a TPM, what would you want on your dashboard?"
I gave them an answer about build health, test coverage, and technical debt. But the question stuck with me. After 11 years at Microsoft building platforms for Fortune 100 companies, what information actually helps me do my job?
The answer isn't more metrics. It's better questions.
Every metric on your dashboard should answer one of three questions:
- Can we ship safely?
- Are teams moving fast enough?
- Will customers succeed with what we're building?
If a metric doesn't answer one of these questions, delete it.
Why Traditional Metrics Fail
Walk into any engineering organization and you'll see the same dashboards. Build success rate: 94%. Code coverage: 85%. Average build time: 12 minutes. Sprint velocity: 47 points.
These numbers make you feel informed while hiding what matters.
I've seen teams whose dashboards show all green metrics. 95% build success. 90% code coverage. 15-minute average build time. Leadership loves it.
These are watermelon metrics - green on the outside, red underneath.
What's actually happening? P95 build times are 2+ hours. Security scans pass 95% of the time because developers turn off the hard checks. Code coverage is high because teams test getters and setters, not business logic.
Teams ship on schedule. They also ship security breaches. The dashboard shows green the entire time.
Traditional metrics fail because they measure activity, not results. They track what's easy to measure, not what matters.

Question 1: Can We Ship Safely?
This is the foundation. If you can't ship safely, nothing else matters.
Most teams think they answer this with test coverage and defect counts. They're wrong. Safety isn't about how many tests you have. It's about catching problems before customers do.
Security Gate Pass Rate (First Attempt): Track how often code passes security scans on the first try. If developers understand your security requirements, they write secure code from the start. Show this as a trend line, not a single number. Watch for patterns - are new developers struggling? Did you add new rules without training?
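If you want to see what that looks like in practice, here's a minimal sketch in Python. It assumes you can export scan results as records with a commit, an attempt number, a pass/fail flag, and a date - the field names are placeholders, not any particular tool's schema.

```python
from collections import defaultdict
from datetime import date

# Hypothetical export of security scan results, one record per scan run.
# Field names are assumptions about your own export, not a real tool's schema.
scans = [
    {"commit": "a1b2c3", "attempt": 1, "passed": True,  "date": date(2024, 9, 2)},
    {"commit": "d4e5f6", "attempt": 1, "passed": False, "date": date(2024, 9, 3)},
    {"commit": "d4e5f6", "attempt": 2, "passed": True,  "date": date(2024, 9, 3)},
    {"commit": "0a1b2c", "attempt": 1, "passed": True,  "date": date(2024, 9, 10)},
]

# Group first attempts by ISO week and compute the pass rate per week.
weekly = defaultdict(lambda: {"passed": 0, "total": 0})
for scan in scans:
    if scan["attempt"] != 1:
        continue  # only first attempts count toward this metric
    week = scan["date"].isocalendar()[:2]  # (year, week number)
    weekly[week]["total"] += 1
    weekly[week]["passed"] += scan["passed"]

for week, counts in sorted(weekly.items()):
    rate = counts["passed"] / counts["total"]
    print(f"{week}: {rate:.0%} first-attempt pass rate ({counts['total']} commits)")
```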
Time to Green: When builds break, how long until they're fixed? Track the 95th percentile to catch the outliers. DORA research shows elite teams recover from failures in under an hour.
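A rough way to compute it, assuming your CI system can export when a pipeline went red and when it went green again (the records below are made up):

```python
from datetime import datetime

# Hypothetical CI export: each entry pairs the time a pipeline broke
# with the time it was fixed.
breakages = [
    {"pipeline": "api", "broken_at": datetime(2024, 9, 2, 9, 0),  "fixed_at": datetime(2024, 9, 2, 9, 40)},
    {"pipeline": "api", "broken_at": datetime(2024, 9, 3, 14, 0), "fixed_at": datetime(2024, 9, 3, 18, 30)},
    {"pipeline": "web", "broken_at": datetime(2024, 9, 4, 11, 0), "fixed_at": datetime(2024, 9, 4, 11, 20)},
]

def p95(values):
    """Nearest-rank 95th percentile; crude but good enough for a dashboard."""
    ordered = sorted(values)
    return ordered[max(0, int(round(0.95 * len(ordered))) - 1)]

# Convert each breakage into hours-to-green, then report the P95.
hours = [(b["fixed_at"] - b["broken_at"]).total_seconds() / 3600 for b in breakages]
print(f"P95 time to green: {p95(hours):.1f} hours across {len(hours)} breakages")
```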
Production Incident Root Cause Patterns: Don't count incidents. Group them. Configuration errors? Missing tests? Dependency failures? Track the top 3 patterns weekly. When the same root cause appears twice, you have a system issue.
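Here's a minimal sketch, assuming each postmortem already tags the incident with a root-cause category:

```python
from collections import Counter

# Hypothetical incident log where each incident was tagged during the postmortem.
incidents = [
    {"id": "INC-101", "root_cause": "configuration error"},
    {"id": "INC-102", "root_cause": "missing test"},
    {"id": "INC-103", "root_cause": "configuration error"},
    {"id": "INC-104", "root_cause": "dependency failure"},
]

# Surface the top 3 recurring patterns; anything appearing twice is a system issue.
top_patterns = Counter(i["root_cause"] for i in incidents).most_common(3)
for cause, count in top_patterns:
    flag = "  <-- recurring, fix the system" if count >= 2 else ""
    print(f"{cause}: {count}{flag}")
```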
Change Failure Rate: DORA research shows elite performers maintain 0-15% change failure rates. Track what percentage of your deployments cause problems in production. This measures how well your safety systems work before code reaches customers.
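A sketch of the calculation, assuming you can label each deployment record with whether it caused a production problem (rollback, hotfix, or incident):

```python
# Hypothetical deployment records: True if the deployment caused a production
# problem, False otherwise.
deployments = [
    {"id": "deploy-201", "caused_failure": False},
    {"id": "deploy-202", "caused_failure": True},
    {"id": "deploy-203", "caused_failure": False},
    {"id": "deploy-204", "caused_failure": False},
]

failures = sum(d["caused_failure"] for d in deployments)
rate = failures / len(deployments)
band = "elite (0-15%)" if rate <= 0.15 else "needs attention"
print(f"Change failure rate: {rate:.0%} over {len(deployments)} deployments -> {band}")
```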

Question 2: Are Teams Moving Fast Enough?
Speed without safety is reckless. Safety without speed is stagnation. You need both.
But "fast enough" doesn't mean typing speed or hours worked. It's about removing delays in the development cycle.
Build Time Trends: Average build time hides problems. Track 95th percentile build time for each major component. Google research shows faster builds improve developer productivity. When builds take too long, developers lose focus and context.
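For example, a small sketch that contrasts average and P95 per component, using made-up build records in place of your CI export:

```python
from collections import defaultdict

# Hypothetical CI export: one record per build with its wall-clock duration.
builds = [
    {"component": "api", "minutes": 11.5},
    {"component": "api", "minutes": 14.0},
    {"component": "api", "minutes": 96.0},   # the outlier an average would hide
    {"component": "web", "minutes": 8.5},
    {"component": "web", "minutes": 9.0},
]

def p95(values):
    """Nearest-rank 95th percentile."""
    ordered = sorted(values)
    return ordered[max(0, int(round(0.95 * len(ordered))) - 1)]

by_component = defaultdict(list)
for build in builds:
    by_component[build["component"]].append(build["minutes"])

for component, minutes in sorted(by_component.items()):
    avg = sum(minutes) / len(minutes)
    print(f"{component}: avg {avg:.0f} min, P95 {p95(minutes):.0f} min")
```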
Pull Request Review Time: How long do pull requests wait for review? Google's engineering practices recommend responding within one business day. LinearB research shows elite teams review PRs in under 6 hours. Slow reviews stall progress and push developers to batch changes into bigger, riskier PRs.
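A sketch of the latency calculation, assuming you've exported PR records with an opened timestamp and a first-review timestamp from your Git host (the records below are hypothetical):

```python
from datetime import datetime

# Hypothetical PR records exported from your Git host.
pull_requests = [
    {"number": 4811, "opened_at": datetime(2024, 9, 2, 10, 0), "first_review_at": datetime(2024, 9, 2, 13, 30)},
    {"number": 4812, "opened_at": datetime(2024, 9, 2, 15, 0), "first_review_at": datetime(2024, 9, 4, 9, 0)},
    {"number": 4813, "opened_at": datetime(2024, 9, 3, 9, 0),  "first_review_at": datetime(2024, 9, 3, 11, 0)},
]

# Hours each PR waited for its first review.
waits = sorted(
    (pr["first_review_at"] - pr["opened_at"]).total_seconds() / 3600
    for pr in pull_requests
)
median = waits[len(waits) // 2]
over_a_day = sum(1 for w in waits if w > 24)
print(f"Median time to first review: {median:.1f} hours")
print(f"PRs waiting more than 24 hours: {over_a_day} of {len(waits)}")
```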
Deploy Frequency: How often does each team deploy to production? DORA research shows elite performers deploy multiple times per day. High performers deploy daily to weekly. Teams that deploy less often build up risk.
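A rough way to bucket teams against those DORA bands, using a hypothetical deployment log and team names:

```python
from collections import Counter
from datetime import date

# Hypothetical deployment log: one entry per production deploy per team.
deploys = [
    {"team": "payments", "date": date(2024, 9, 2)},
    {"team": "payments", "date": date(2024, 9, 2)},
    {"team": "payments", "date": date(2024, 9, 3)},
    {"team": "search",   "date": date(2024, 9, 5)},
]

teams = ["payments", "search", "identity"]
WINDOW_DAYS = 7  # look at the trailing week

counts = Counter(d["team"] for d in deploys)
for team in teams:
    count = counts.get(team, 0)
    if count > WINDOW_DAYS:
        band = "multiple deploys per day (elite)"
    elif count >= 1:
        band = "daily to weekly (high)"
    else:
        band = "less than weekly (risk is piling up)"
    print(f"{team}: {count} deploys this week -> {band}")
```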
Developer Wait Time: How long do developers wait for builds, tests, or environments? This is pure waste. Track peak times and patterns. When developers spend time waiting instead of creating, you have an infrastructure problem.

Question 3: Will Customers Succeed?
This is the question most engineering dashboards ignore. You can ship fast and safe, but if customers can't use what you build, you've failed.
Customer success isn't about bugs. It's about building what customers need.
Error Rates by Feature: Which parts of your product fail most for customers? Include all errors, not just server crashes. 4xx errors signal customers who are confused about how to use what you built. High error rates mean customers are struggling.
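A minimal sketch that splits client errors from server errors per feature, assuming you can parse request logs into a feature name and an HTTP status:

```python
from collections import defaultdict

# Hypothetical parsed request log entries: feature/endpoint plus HTTP status.
requests = [
    {"feature": "checkout", "status": 200},
    {"feature": "checkout", "status": 400},
    {"feature": "checkout", "status": 200},
    {"feature": "search",   "status": 200},
    {"feature": "search",   "status": 500},
]

stats = defaultdict(lambda: {"total": 0, "client_errors": 0, "server_errors": 0})
for req in requests:
    s = stats[req["feature"]]
    s["total"] += 1
    if 400 <= req["status"] < 500:
        s["client_errors"] += 1   # confusion: customers calling it wrong
    elif req["status"] >= 500:
        s["server_errors"] += 1   # breakage: we failed them

for feature, s in sorted(stats.items()):
    client = s["client_errors"] / s["total"]
    server = s["server_errors"] / s["total"]
    print(f"{feature}: {client:.0%} client errors, {server:.0%} server errors")
```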
Feature Adoption: For each major feature, track adoption over the first 30 days. Healthy features show steady growth. Flat curves mean customers don't see value. Declining curves mean you built the wrong thing.
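One way to build that curve, sketched against hypothetical usage events (one record per user per day the feature was used):

```python
from datetime import date

LAUNCH = date(2024, 9, 1)

# Hypothetical usage events for a single feature.
events = [
    {"user": "u1", "date": date(2024, 9, 1)},
    {"user": "u2", "date": date(2024, 9, 3)},
    {"user": "u1", "date": date(2024, 9, 3)},
    {"user": "u3", "date": date(2024, 9, 10)},
]

# First day (relative to launch) each user touched the feature.
first_use = {}
for event in sorted(events, key=lambda e: e["date"]):
    day = (event["date"] - LAUNCH).days
    if 0 <= day < 30:
        first_use.setdefault(event["user"], day)

# Cumulative adopters per week of the launch month: a healthy feature keeps climbing.
for week in range(4):
    adopters = sum(1 for day in first_use.values() if day < (week + 1) * 7)
    print(f"Week {week + 1}: {adopters} cumulative adopters")
```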
Customer-Reported Issues: How fast do you fix problems customers care about enough to report? Track response times and patterns. Slow fixes mean customers learn to live with broken features.
Documentation and Onboarding: How many features ship without documentation? How long until docs catch up? When customers can't figure out how to use features, they don't use them.
How These Questions Work Together
The three questions interact in surprising ways.
Speed vs Safety: Push teams to deploy faster and change failure rates often increase. This isn't always bad - it might mean teams are taking smart risks. But if you only track deployment frequency, you'll miss quality problems.
Safety vs Customer Success: Add more security gates and customer features slow down. Increase test coverage requirements and teams spend less time building what customers want. Both matter, but the balance depends on your business needs.
Team Structure Affects What You Can Measure: Conway's Law applies to metrics. Your measurement approach has to match your team structure.
If teams can't deploy independently, deployment frequency becomes meaningless. If you have shared services, platform teams need different metrics than product teams.
Start Small, Think Big: Start with one question, but build your measurement system to handle all three. You'll expand over time. If your first dashboard can't grow, you'll rebuild it twice.
Making the Business Case
Engineering metrics mean nothing if they don't connect to business results.
Safety Metrics Drive Customer Trust: When change failure rates increase, customers experience more bugs. When recovery times are slow, outages last longer.
Take the Azure outage on July 30, 2024. What started as a routine DDoS attack turned into an 8-hour global outage affecting millions of users. Microsoft detected service degradation at 11:47 UTC, but it took until 12:10 UTC to correlate the symptoms to the underlying network issue. Recovery took until 19:43 UTC because teams couldn't see what was happening across their distributed systems.
Track the cost. Calculate revenue lost during outages. Measure customer churn after incidents.
Speed Metrics Drive Competitive Edge: When teams deploy slowly, competitors ship features first. When build processes are slow, developers lose progress and context-switch between tasks.
If slow build processes cause developers to lose just 2 hours per day to context switching and delayed feedback, that's 10 hours per week per developer. For a 50-person team, that's 500 lost hours weekly. At typical loaded costs of $100-150/hour, slow development processes cost $50,000-75,000 per week. Over a year, that's $2.6-3.9M in lost productivity.
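Here's the same back-of-the-envelope math in code, so you can plug in your own team size and loaded cost. The inputs are the assumptions from the paragraph above, not measured data.

```python
# Back-of-the-envelope cost of slow development processes.
hours_lost_per_dev_per_day = 2       # assumed: context switching + waiting on builds
workdays_per_week = 5
team_size = 50                       # assumed team size
loaded_cost_per_hour = (100, 150)    # assumed fully loaded cost range, USD

hours_per_week = hours_lost_per_dev_per_day * workdays_per_week * team_size
weekly_cost = tuple(hours_per_week * rate for rate in loaded_cost_per_hour)
annual_cost = tuple(cost * 52 for cost in weekly_cost)

print(f"Lost hours per week: {hours_per_week}")                                  # 500
print(f"Weekly cost: ${weekly_cost[0]:,} - ${weekly_cost[1]:,}")                 # $50,000 - $75,000
print(f"Annual cost: ${annual_cost[0]/1e6:.1f}M - ${annual_cost[1]/1e6:.1f}M")   # $2.6M - $3.9M
```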
Customer Success Metrics Drive Growth: When API error rates are high, customers struggle to integrate. When documentation lags, feature adoption stays flat. Track feature adoption curves after launches and measure support ticket volume related to poor documentation.
The Reality Check
Here's what happens when you try to build this dashboard: pushback.
Engineers will say tracking takes too much time. It doesn't. Most of these metrics come from systems you already have. Build logs. Git history. APM tools. The data exists. You need to query it correctly.
Middle management will want their pet metrics included. Stand firm. Every metric must answer one of the three questions or it doesn't make the cut.
Executives will ask for aggregates and averages. Educate them. Show them how P95 metrics reveal problems averages hide.
Your 30-Day Plan
Week 1: Pick One Question. Choose your most painful question. Add 2-3 metrics that answer it. Use existing tools. Don't buy anything new yet.
Week 2: Share and Iterate. Show the dashboard in team meetings every day. Make problems visible. Celebrate improvements. Track which metrics drive action and which get ignored.
Week 3: Add Automation. Manual tracking won't last. Write scripts to collect data for your proven metrics.
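Here's one shape that script could take: a nightly job that appends a row to a CSV your dashboard reads. Every function body below is a placeholder for your own CI, Git host, and APM queries - none of these are real APIs.

```python
#!/usr/bin/env python3
"""Nightly metrics collector sketch: pull numbers from systems you already
have and append them to a CSV the dashboard reads."""
import csv
from datetime import date
from pathlib import Path

def p95_build_minutes():
    return 42.0          # TODO: query your CI system's build logs

def median_review_hours():
    return 7.5           # TODO: query your Git host for PR review latency

def change_failure_rate():
    return 0.12          # TODO: join deploy records with incident records

OUTPUT = Path("dashboard_metrics.csv")

row = {
    "date": date.today().isoformat(),
    "p95_build_minutes": p95_build_minutes(),
    "median_review_hours": median_review_hours(),
    "change_failure_rate": change_failure_rate(),
}

# Append one row per run; write the header only on first run.
write_header = not OUTPUT.exists()
with OUTPUT.open("a", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(row))
    if write_header:
        writer.writeheader()
    writer.writerow(row)
```

Schedule it with whatever you already use for cron jobs, and let the dashboard read the CSV.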
Week 4: Expand to Question Two. Add metrics for your second question. Keep the same rhythm: add, share, write scripts. Don't remove working metrics.
Making It Stick
A dashboard only works if people use it.
Daily Stand-up Integration: Start every stand-up with the dashboard. Red metrics get discussed first.
Weekly Team Reviews: Deep dive one question each week. Rotate through all three. Find patterns. Plan improvements.
Monthly Leadership Reviews: Show trends, not snapshots. Connect metrics to business outcomes.
The Three Questions That Matter
Stop measuring everything. Start measuring what matters.
Pick the question causing you the most pain right now. Add 2-3 metrics that answer it. Share them daily. Fix what they reveal. Then expand.
Your developers spend too much time on process. Your customers are struggling with your product. Your executives want answers. Give them a dashboard that helps.
What question will you answer first?
Want more on engineering metrics and technical leadership? Sign up for my newsletter to ensure you don't miss future articles like this one.