How to Measure Engineering Team Performance (Without Breaking It)

2026-02-24·6 min read

I used to have a dashboard. Very beautiful, very complete. Story points, velocity, PR throughput - all tracked, all shown on a big screen. And it meant almost nothing.

It took me a few years to understand why - plus one very bad quarter where I realized I had no idea what was actually happening inside my team, even though the numbers looked fine.

The problem is simple: when you pick a number to optimize for, people will optimize for that number. Not the real thing. Just the number. This is called Goodhart's Law and it applies everywhere, but I think software teams are especially good at accidentally proving it.

Why Velocity and Story Points Don't Work

My first team was five engineers. I was so focused on velocity. We tracked it every sprint, I put it on a screen, we celebrated when it went up. Maybe six weeks later, I noticed something - the estimates had quietly gotten bigger. Nobody said anything. It just happened. Engineers are smart people. They figure out what makes them look good.

Lines of code is even worse. I have seen engineers write 300 lines when 30 would have been enough, because the culture rewarded visible output. You can't really blame them. If that's what gets noticed, that's what people do.

The real problem with these metrics is they measure activity, not progress. You can be very busy and still not be moving forward. Closing tickets that don't matter. Shipping features nobody uses. Having standups where everyone sounds productive but the important things are still blocked.

Three Different Things We Usually Mix Up

I think about it this way:

  • Output is what the team ships - features, PRs, deployments
  • Outcome is what actually changes because of it - user adoption, error rates, performance
  • Impact is what that means for the business - revenue, retention, satisfaction

Most teams measure output because it's easy to count. Outcome is harder - you need to connect engineering work to product data. Impact is even harder because most engineers don't have visibility into business results.

Output matters, I'm not saying it doesn't. But output without outcome is just... work that may or may not matter. I try to think about all three, even when I can only directly see one of them.

The Metrics I Actually Use Now

After trying a lot of things, here's what I keep coming back to.

Deployment frequency and lead time. How often are we shipping, and how long does it take from writing the code to it being in production? These reflect real things - how healthy your process is, how confident the team is in testing, how messy the codebase has become. Hard to fake for a long time.
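To make "lead time" concrete: it's just the gap between starting the work and that work reaching production. Here's a minimal sketch of how you might compute it from deploy records - the record format and the numbers are made up for illustration, not from any real tooling.

```python
from datetime import datetime
from statistics import median

# Hypothetical deploy records: (first commit time, time it reached production).
deploys = [
    (datetime(2026, 2, 2, 9, 0), datetime(2026, 2, 3, 14, 0)),   # 29 hours
    (datetime(2026, 2, 5, 10, 0), datetime(2026, 2, 5, 16, 30)),  # 6.5 hours
    (datetime(2026, 2, 9, 11, 0), datetime(2026, 2, 12, 9, 0)),   # 70 hours
]

def median_lead_time_hours(deploys):
    """Median hours from first commit to production, across deploys."""
    return median((done - start).total_seconds() / 3600 for start, done in deploys)

def deploys_per_week(deploys, weeks):
    """Deployment frequency: how many times we shipped per week."""
    return len(deploys) / weeks

print(median_lead_time_hours(deploys))  # 29.0
```

I use the median rather than the mean on purpose: one stuck branch that took three weeks shouldn't hide the fact that most changes ship in a day.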

Incidents and time to recovery. Not to blame people. But if incidents are going up, something is wrong. Could be the code, could be the process, could be that people are tired and burnt out. When engineers are burning out, it usually shows up in on-call first.

Feature adoption in the first 30 days. This is the one I find most useful for connecting output to outcome. Did anyone actually use the thing we built? If we keep shipping and nobody is using it, that's a real problem - and it's worth surfacing, even if it's uncomfortable.
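The computation itself is simple once you can attach usage events to a launch date. A rough sketch, with a hypothetical event format - in practice this data would come from whatever product analytics you have:

```python
from datetime import datetime, timedelta

launch = datetime(2026, 1, 5)

# Hypothetical events: (user_id, first time that user touched the feature).
events = [
    ("u1", datetime(2026, 1, 6)),
    ("u2", datetime(2026, 1, 20)),
    ("u3", datetime(2026, 3, 1)),  # outside the 30-day window
]

def adoption_rate(events, launch, active_users, window_days=30):
    """Share of active users who used the feature within the window."""
    cutoff = launch + timedelta(days=window_days)
    adopters = {uid for uid, ts in events if launch <= ts < cutoff}
    return len(adopters) / active_users

print(adoption_rate(events, launch, active_users=10))  # 0.2
```

The exact threshold matters less than the trend: if adoption is near zero launch after launch, the team is shipping output without outcome.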

What engineers tell me directly. Not a number. But every week in 1:1s I ask "what slowed you down this sprint?" and I write down the answers. Over time, patterns appear. This is honestly some of the most reliable signal I have.

I Use 1:1s and Retros as Measurement Tools

By the time a problem shows up in your metrics, it has usually been happening for a while. Engineers will often tell you early - maybe just a small comment about something frustrating, or they seem less excited than usual. If you are paying attention, these are signals.

Retros give me qualitative data about friction. What is slowing the team down, what is confusing, what is quietly making people tired. I take notes across retros and look for things that come up multiple times. If the same problem appears three sprints in a row, it is not random.
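Even messy qualitative notes can be counted. A small sketch of the "three sprints in a row" check, assuming you tag each retro note with a theme (the tags and notes here are invented):

```python
from collections import Counter

# Hypothetical tagged notes from recent retros: (sprint, theme).
retro_notes = [
    ("sprint-12", "flaky CI"),
    ("sprint-12", "unclear requirements"),
    ("sprint-13", "flaky CI"),
    ("sprint-14", "flaky CI"),
    ("sprint-14", "slow code review"),
]

def recurring_themes(notes, min_sprints=3):
    """Themes raised in at least `min_sprints` distinct sprints."""
    seen = set()
    sprints_per_theme = Counter()
    for sprint, theme in notes:
        if (sprint, theme) not in seen:  # count each theme once per sprint
            seen.add((sprint, theme))
            sprints_per_theme[theme] += 1
    return [t for t, n in sprints_per_theme.items() if n >= min_sprints]

print(recurring_themes(retro_notes))  # ['flaky CI']
```

The point isn't the script - it's the discipline of writing themes down consistently enough that repeats become visible.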

This kind of measurement is messy. You can't put it in a spreadsheet easily. But I find it more accurate than most of the charts I've made.

Be Careful With Public Metrics

This is something I feel very strongly about. Be careful about which metrics you show to the whole team, or to leadership.

A few years ago I put a "tickets resolved per person" chart in a team retro. I thought it would help us talk about capacity. Instead, engineers started picking up easy tickets and avoiding the complex ones. The person doing the most important architectural work - hard, unglamorous work that didn't fit neatly into tickets - suddenly looked like the worst performer on the chart. She nearly quit.

I took the chart down. We had a long conversation. But the trust problem took months to fix.

Public metrics create incentives whether you want them to or not. If you're going to show something to the whole team, think hard about what behavior it might accidentally push people toward.

Leading vs Lagging - A Practical Split

Lagging indicators like deployment frequency and incident rates tell you how things went. They're good for understanding trends, but by the time you see a problem, it has already happened.

Leading indicators are harder to define. Things like: how people are feeling in 1:1s. Technical debt conversations that keep getting pushed. An engineer who stopped speaking up. These things predict problems before the metrics catch them.

A good measurement system has both. The lagging stuff gives you honest data. The leading stuff gives you time to actually do something.

What I've Stopped Measuring

Story points. Velocity. Lines of code. Number of PRs opened. Meeting attendance. All of these either get gamed, measure the wrong thing, or both.

The honest truth is that good engineering management is not really about finding the perfect metric. It's about building enough trust that people give you real signal - in conversations, in how they talk about their work, in how they show up when something goes wrong.

Metrics can be useful. Sometimes very useful. But they are a tool, not a replacement for actually knowing your team.