Some businesses want to rank developers like salespeople. That's not how it works.

Software development isn't just about coding, and neat measurements of individual productivity are doomed to fail.

If we assume that every company has become a software company, that means software developers have become some of the most valuable (and expensive) employees on the payroll. How do companies know they're getting the most out of their investment in those employees?

It's an old debate in information technology that took on new life last month after McKinsey (yep, that McKinsey) published an article claiming it had developed a new metrics-based approach to measuring individual developer productivity. It was met with an immediate backlash from several corners of the software world, led by a two-part series on the Pragmatic Engineer and extended thoughts from Bryan Finster.

It turns out that the vast majority of engineering managers believe they get the best out of their employees when they treat them less like coding automatons and more like, well, people. There are several popular metrics for measuring the productivity of a software development team, but the idea that you can measure an individual developer's productivity with metrics designed by a notorious consulting firm is an approach waiting to backfire, according to interviews with several people who have been both individual contributors and engineering leaders.

"The activity of building software is still a highly creative and human-centered collaborative exercise, which doesn't lend itself to easy measurement," said Brendan Humphreys, head of engineering at Canva. "There are some parts of the software development process that are amenable to measurement, but those types of measures are fraught with danger."

It's no coincidence that McKinsey's article appeared at a time when business leaders are watching their spending on technology very closely, and developers, like workers in any discipline, fall along a spectrum of productivity and competence. At many companies, engineering managers are being asked to rank their developers in case the business decides it needs to cut costs across the board.

"To truly benefit from measuring productivity, leaders and developers alike need to move past the outdated notion that leaders 'cannot' understand the intricacies of software engineering, or that engineering is too complex to measure," the McKinsey authors wrote.

There are some individual metrics that can give engineering managers a heads-up that a developer isn't living up to their potential or is a wrong fit for the organization. But relying on metrics alone to make those calls all but ensures they will be gamed by smart people who want to save their jobs. It also discourages non-technical business leaders from trying to understand the intricacies of software engineering, because they'll assume they can fall back on McKinsey's "proprietary algorithm to account for nuances" and "talent capability score."

"You should be measuring productivity at a team level to achieve the outcomes that you want to achieve," said Milin Desai, CEO of Sentry. "Because otherwise you end up with busy work without impact."

No I in team

Most engineering organizations measure some individual metrics: the number of lines of code a developer writes, the number of times their code is committed to the live production environment, or how long it takes them to resolve the issues assigned to them, said Arvind Jain, CEO of Glean.

"We tell every individual engineer that, just from a look-back perspective for you, look at how many lines of code you wrote last month, and how did it compare with the month before: What are the long-term trends?" Jain said. "We show all of these metrics and make them available for individuals themselves to get some analytics on (their activity) and try to decide what are some of the things that they could be doing differently."

However, "we've never felt that we could just look at these metrics and make a judgment on anyone," Jain said. Some developers are asked to code more frequently than others, who might be more valuable shaping the big-picture strategy or pairing up with less-experienced colleagues and helping them understand the most efficient way to ship their code, he said.

Dylan Etkin co-founded Sleuth, a developer productivity measurement startup, along with three former colleagues from Atlassian in 2019. The company makes a dashboard based on the principles of the DORA project, a research initiative started about ten years ago by Puppet and other researchers to measure the productivity of software development teams.

"We recognized very early on that developers are very concerned about being measured on things that don't really matter," Etkin said. "We have a long history of the industry trying to measure things that don't matter; lines of code, or cyclomatic complexity, or number of pull requests, and all these sorts of things."

DORA focuses on four key team-oriented metrics: deployment frequency — overwhelmingly cited by those interviewed for this article as the most important productivity metric they worry about — lead time for changes, change failure rate, and time to restore service.

"It really seeks to understand what makes up high-performing teams when it comes to using software for delivering business value," said Nathen Harvey, a developer advocate for both DORA and Google Cloud, which acquired DORA in 2019. The idea is to identify bottlenecks in the overall software development process and assess whether or not development teams are being asked to focus on the right set of priorities to achieve their overall goal: to help their companies make money, he said.

"I know that the minute those (metrics) end up in a dashboard, or in a system, they're eventually going to get rolled up and now an executive is going to make a choice."

The McKinsey article acknowledged the usefulness of DORA and another team-oriented developer productivity framework known as SPACE, but argued "while deployment frequency is a perfectly good metric to assess systems or teams, it depends on all team members doing their respective tasks and is, therefore, not a useful way to track individual performance."

But that's why companies hire engineering managers, said Camille Fournier, managing director and global head of engineering and architecture for J.P. Morgan Chase and Co.'s Corporate Investment Bank division, and author of the popular guide for a generation of tech professionals, The Manager's Path.

"My expectation is a line manager of individual contributors shouldn't generally need to look at a lot of metrics to figure out whether people are productive or not," she said. "I really believe that you need to empower your managers to do the right thing and to push their teams and to ultimately be the deciders of who's doing well and who isn't, because that's part of their job."

Harvey agreed.

"I think that the worry and sort of the fear that I think from a practitioner level that you see is, maybe I don't mind if myself or my team are looking at me individually. But I know that the minute those (metrics) end up in a dashboard, or in a system, they're eventually going to get rolled up and now an executive is going to make a choice," he said. "And they're just going to draw a line without any context for understanding and there are going to be people above and below that line. And that has some potentially really dire consequences for everyone involved, including that executive."

AI fixes this

One of the more compelling aspects of the generative AI boom has been its potential to improve software development productivity.

A poorly kept secret in programming circles is how often even experienced developers search Google for the small snippets of frequently used code that underpin their more creative and differentiated products. Stack Overflow is another popular resource for developers looking to solve mundane problems that are nonetheless blocking their progress.

GitHub's Copilot coding assistant, unveiled before the generative AI craze really kicked off, is already in widespread use among developers looking for an automated way to plug in those code snippets, including developers working for companies featured in this story.

"The way I look at it is, does it boost my top developers to be more creative?" Sentry's Desai said. "Could assistants alleviate maybe 20% or 30% of the mundane work that they have to do sometimes, freeing up that time for them to be more creative?"

Fournier believes coding assistants could also play a very interesting role helping developers program in languages in which they lack experience, greatly reducing the time it takes them to get up and running on something new.

"I'm working in a language that I'm not that familiar with, but I know code; I know how to recognize whether something is kind of logical or not," she said. Developers that understand how code works should be able to examine the raw output produced by a coding assistant in a new language and decide how to move forward, which could otherwise be a painstaking process.

But if businesses decide to bring AI development tools into their organizations in search of productivity while also adopting McKinsey's "coders should code" metrics-driven productivity advice, they'll have just automated the process of gaming those metrics, Desai said.

"If we assume that the way we measure productivity is how many commits (you make) and what you're doing with the lines of code, then essentially with AI, it all becomes extremely mirrored."

McKinsey's system seems to be designed for companies that don't trust their technology leaders, which is, of course, a sign of a much deeper organizational problem. Should it appear at your workplace, it's probably time to test the market.