An Interview With Sprkl

Note: This interview was reworked into 37 Tips For Improving Productivity In Software Development Teams on the Sprkl developer blog. I'm posting the original Q&A version of it here; it's probably less SEO-ready than the bullet list of hot tips, but I think it's a little more readable.

Can you please tell us about yourself? Your technical experience, cool projects you’re working on, and all the fun stuff:)

I’ve been working on web-based applications for as long as the web has been around to work on: the browser in use at my first job was NCSA Mosaic. (We were all very excited when Netscape came along and introduced innovations such as a background color that wasn’t gray.) It was pretty easy to get into the industry back then: you just had to learn all five HTML tags and you were all set. From there it was just a matter of keeping up as new ideas get introduced and old ones fall out of favor. I’ve worked as a designer, as a UX specialist, in product development, and on engineering teams; I’ve done stints as an IC and in management. I even tried the solopreneur thing for a while, during which I learned I really need someone setting external deadlines for me if I’m going to get anything done.

Please describe your organization's R&D department structure. Specifically, your team? (Size, roles, etc.)

I’ve done my share of “real” jobs, but much of my work has been as a consultant, which means I’ve been able to see inside a lot of different organizations at everything from startup to global scale. While the details differ across orgs, in practice their structures boil down to one of three general situations:

  • Startup scale. Small team, low process overhead, everyone has full visibility into what everyone else is doing, the codebase is greenfield and the product definition is still pretty vague.
  • The transitional phase, where the startup finds success and suddenly needs to work at scale. There are lots of potential pitfalls here – practices that worked just fine when you were all in the same room start to fail, and it’s really easy for companies to lose focus or jump to the wrong solutions.
  • The well-established organization: there’s a process for everything, roles are well defined, and the scope of authority for any given team or individual is smaller and more focused. It’s hard to ruin a company at this stage – all the bureaucracy that’s built up over time acts as an insulator against bad ideas – but it’s also hard to keep up the pace or do anything truly innovative. (Unless you spin off a skunkworks and start the cycle anew!)

Each of those phases has different challenges and needs a different structure.

In an early startup you probably shouldn’t even try setting up much process or code architecture; any predictions you make about what you’ll need before you find your product-market fit are likely to be wrong.

In a very large organization the main challenge is visibility and trust: now you’ve got multiple departments – design, product, engineering, program management, etc. – each with multiple layers of management, all trying to work on the same product. All of that structure is necessary to organize the large number of people involved, but it can lead to a disconnect between decision-makers and the reality on the ground.

That middle zone is the really challenging one – you really do have to throw out a lot of what used to work, which can be very painful for the people used to working that way. (It’s also, in my opinion, the most interesting: it’s where you really get the opportunity to make big changes and to build for the future. At a startup you have to just keep trying things until one works; at an established organization most of the interesting decisions have already been made.)

On a scale from 1 to 10, how complex do you think your code base is and why? (e.g., Kubernetes, local clusters, monolith or microservices, DBs, Git methodology (trunk or feature branches), monorepo or multi-repo, etc.)

I’m afraid I’m going to dispute the premise of the question a little bit here. There are different kinds of complexity: some are unavoidable (algorithmic complexity), some are self-inflicted (tech debt or conflicting code styles), some are organizational (business rules evolve faster than code does), and some are tradeoffs you make to reduce other kinds of complexity.

So the trick isn’t “let’s make our codebase less complex,” it’s identifying which complexity is harmful, and which is necessary for the current stage of your company.

The architectural complexity of microservices or Kubernetes is a great example of tradeoff complexity: microservices, Kubernetes, Terraform et al. are inherently much, much more complex than their traditional equivalents; that complexity is the price you pay for scalability. I see way too many small or mid-size companies jump into these complicated solutions too early, or just because they think they need to, and end up accepting that architectural complexity before they’re feeling the pain points that would justify it. If you’re not routinely needing to spin up and configure new containers, you don’t actually need Kubernetes. That pool of Terraform-configured step functions could probably have been built a lot more quickly and easily as a plain old API. Basically, if you’re not working at FAANG scale, you probably don’t need FAANG-scale tooling to get your job done.
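To make that concrete, here’s a minimal sketch of the “plain old API” alternative – just Node’s built-in http module, with a hypothetical /process route standing in for whatever work the step functions were orchestrating:

```typescript
import { createServer } from "node:http";

// A tiny service in place of a pool of orchestrated step functions.
// The /process route and its "work" are hypothetical placeholders.
const server = createServer((req, res) => {
  if (req.method === "POST" && req.url === "/process") {
    // ...do the work the step functions would have coordinated...
    res.writeHead(200, { "Content-Type": "application/json" });
    res.end(JSON.stringify({ status: "done" }));
  } else {
    res.writeHead(404);
    res.end();
  }
});

// One process, one deploy, no orchestration layer to configure.
server.listen(3000);
```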

Git methodologies are another good one. I’m old enough that my first encounter with source control was CVS, in which developers would literally “check out” the file they were working on, blocking anyone else from touching that file until they checked it back in. (That’s one way to prevent merge conflicts!) Subversion improved on this with what could charitably be described as “branches”, but it wasn’t until git that a real branch-and-merge strategy became viable; to work effectively before git you had to communicate a lot and keep your changes small and focused.

What engineers who grew up on git tend to forget is that it was designed for a specific purpose – managing a large number of contributions from a variety of external sources into the Linux kernel – which doesn’t necessarily match the way a lot of engineering teams work. (In a way git itself is another architectural complexity tradeoff: you accept the fact that merges are hard in exchange for allowing independent teams to work simultaneously on the same code.) GitFlow was a fairly elaborate process that evolved mainly to minimize the pain of those merges. The thing is, though, most engineering orgs aren’t the Linux kernel; they’re not made up of teams of external contributors who have no choice but to work independently and deal with the conflicts at the end. So it’s interesting to me that the trend is now starting to swing the other way, as more and more organizations realize that smaller, more focused commits merged constantly are just a better way to work (for lots of reasons, not just avoiding merge headaches, but that’s a key one).

I’m 100% a convert to trunk-based development after seeing how much it improved our process at my last organization. Switching from long-lived branches to quickly-merged code and a simple feature flag system was one of those rare cases of an absolutely unambiguous improvement: the whole problem of code merges just evaporated overnight. Don’t get me wrong, I love git – you couldn’t pay me enough to go back to the bad old days of file locks – but long-lived branches are unnecessary and cause more problems than they solve. I wouldn’t be surprised if some future version of “source control” ends up looking a lot more like real-time collaborative editing than like what we’re used to doing today.
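A feature flag system really can be that simple, too. Here’s a minimal sketch – the flag name and environment-variable source are hypothetical, and a real setup would more likely use a flag service that can be toggled without a redeploy – but the core idea is just this: unfinished work merges to trunk right away and stays switched off in production until it’s ready.

```typescript
// Hypothetical flag and env-var names, for illustration only.
const flags: Record<string, boolean> = {
  "new-checkout-flow": process.env.ENABLE_NEW_CHECKOUT === "true",
};

function isEnabled(flag: string): boolean {
  return flags[flag] ?? false; // unknown flags default to off
}

// Both code paths live on trunk; the flag decides at runtime.
function renderCheckout(): string {
  return isEnabled("new-checkout-flow")
    ? renderNewCheckout() // merged early, dark in production
    : renderOldCheckout();
}

function renderNewCheckout(): string { return "new checkout"; }
function renderOldCheckout(): string { return "old checkout"; }
```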

Can you describe your development process? (Describe a common process from writing the code locally until it reaches production) Do you follow a certain methodology?

Every single organization I’ve worked with in the last couple decades has described themselves as using an Agile methodology. None of them mean the same thing by it – the only thing everyone agrees on about Agile is that everybody else is doing it wrong. “Agile” as a concept is a little overripe at this point; the consultants have got hold of it and formalized a bunch of rules and roles that I’m not convinced fit every organization that tries to use them.

But the core principles of it are still essential:

  • Work in short, iterative cycles, both for planning future feature development and for the design and implementation.
  • Check what you’re building against real users as frequently as possible, to make sure you’re building the right thing.
  • Maintain a culture of open feedback between product, engineering, and design – they are not separate processes.
  • Same thing between labor and management: developers need to feel safe telling their bosses that a decision was wrong.
  • Same thing within engineering teams: developers should be reviewing each other’s code, suggesting improvements, and accepting feedback from one another.
  • Move small changes into production frequently, rather than doing a few big releases with big changes.
  • Iterate, don’t redesign.
  • Automate everything you can, earlier than you think you need to.

Where do you think the blind spot in your delivery process is?

I think the most important and challenging part of coding is not the delivery process, or the build process, or the testing, or the framework, or the documentation – it’s deciding what to deliver in the first place. If you can nail that part, if you can predict accurately enough what your users need, the rest pretty much just follows on its own.

Keeping your releases small and frequent makes this much easier, because you don’t have to predict with perfect accuracy: you can try out a small thing, iterate on it if the users like it, or abandon it without too much sunk cost if they don’t.

This is tremendously challenging for a lot of organizations; especially in those not used to continuous delivery, there can be a lot of pressure to stuff too many ideas into every new feature. That leads to a vicious cycle: overstuffed releases take longer to develop, which increases the pressure to stuff even more things into version n+1, because everyone knows it’s going to be a long time before you’ll get a chance to add anything in version n+2.

At which phase do you perform testing? (e.g., local, CI, production)

Personally I’m a big fan of test-driven development when I’m building something complicated, less because of the resulting code coverage and more because it’s just easier to think through whatever the problem is one edge case at a time instead of trying to reason about it all at once. Write a test for each case, write code until all the tests pass, and you’re done.
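As a toy illustration of that rhythm, here’s what it might look like with Node’s built-in test runner (node --test); the slugify function is a made-up example, not something from a real project:

```typescript
import test from "node:test";
import assert from "node:assert/strict";

// The function under test: a hypothetical URL-slug generator.
function slugify(title: string): string {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse runs of spaces/punctuation
    .replace(/^-+|-+$/g, "");    // no leading or trailing dashes
}

// Each test captures one edge case, thought through in isolation.
test("basic title", () => assert.equal(slugify("Hello World"), "hello-world"));
test("extra whitespace", () => assert.equal(slugify("  Hello   World  "), "hello-world"));
test("punctuation", () => assert.equal(slugify("Hello, World!"), "hello-world"));
test("empty string", () => assert.equal(slugify(""), ""));
```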

This doesn’t necessarily result in tests that will be particularly useful after the code has been written, though. It’s pretty rare that an old unit test will suddenly pick up on a bug in unmodified code. So you also need to be doing some integration testing to make sure all your bits and pieces are working together – it’s much more common for a change in module A to trigger unexpected consequences in module B.

I’m not sure there’s a one-size-fits-all approach to this; it really depends on the product and the scale you’re working at. Maintaining a quality test suite can be a lot of work, especially when your product is going through a lot of changes: it’s really a whole separate codebase that has to be kept up to date with any change in feature or business logic, and there’s always pressure to be writing more features instead of more tests. At small scales, or in periods of rapid change, manual testing may be more efficient than trying to maintain a fully automated test suite; as the features start to stabilize, or grow past the point where testing everything by hand is feasible, it’s important to make time to set up a real staging environment with production-like data, and to run automated workflows against it to make sure the outcome is what you expect.
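For a sense of what one of those automated workflows might look like, here’s a sketch that exercises two hypothetical endpoints against a staging server – the URL, routes, and payload shapes are all placeholders, not a real API:

```typescript
import test from "node:test";
import assert from "node:assert/strict";

// Hypothetical staging host; a real suite would pull this from config.
const STAGING = process.env.STAGING_URL ?? "https://staging.example.com";

test("creating an order makes it show up in the order list", async () => {
  // Module A: create an order through the public API.
  const created = await fetch(`${STAGING}/api/orders`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ sku: "TEST-001", quantity: 1 }),
  }).then((r) => r.json());

  // Module B: confirm the listing endpoint actually picked it up.
  const orders = await fetch(`${STAGING}/api/orders`).then((r) => r.json());
  assert.ok(orders.some((o: { id: string }) => o.id === created.id));
});
```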

And of course you'll miss things. It's inevitable. So you have to have production monitoring, some telemetry that'll notify your engineers when something breaks in real users' hands. (And you have to have a process in place that ensures the engineers actually see those notifications and can claim the time to do something about them.)
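Even a crude version of that telemetry is only a few lines in a browser app. A sketch – the /telemetry/errors endpoint is hypothetical, and in practice you’d likely reach for an off-the-shelf service rather than rolling your own:

```typescript
// Report uncaught errors to a (hypothetical) collection endpoint.
window.addEventListener("error", (event) => {
  navigator.sendBeacon(
    "/telemetry/errors",
    JSON.stringify({
      message: event.message,
      source: event.filename,
      line: event.lineno,
      url: window.location.href,
      timestamp: Date.now(),
    }),
  );
});

// Promise rejections bypass the error event and need their own hook.
window.addEventListener("unhandledrejection", (event) => {
  navigator.sendBeacon(
    "/telemetry/errors",
    JSON.stringify({ message: String(event.reason), url: window.location.href }),
  );
});
```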

Do you see the connection between complexity, feedback, and productivity? Do you measure developers’ productivity, and if you do, how?

I actually spend a lot of my time trying to talk management out of leaning too heavily on productivity metrics. They’re tempting traps, because there are so many numbers you can measure and charts you can make and it all looks so much like science that you could be forgiven for forgetting that they don’t actually measure productivity.

We’ve all figured out that lines of code and hours-on-keyboard are useless metrics – so far, so good; now we just need to learn the same lesson about code coverage, PR size, TTO, TTM, story points per sprint, rework percentage… All of those things can be meaningful signals, but they’re not measurements of productivity. A long cycle time might indicate any number of different things: maybe the requirements were unclear or are changing, maybe the team is having to refactor a lot of legacy code before they can progress, maybe the team got walloped by some other unplanned work, maybe there’s a lot of friction somewhere in your development process, maybe the thing the team is working on is just genuinely difficult, maybe there’s disagreement within the team about how to build the thing, maybe somebody’s slacking off and bottlenecking the rest of the team. The point is you just don’t know from the metric itself whether it even is a problem, let alone what to do about it. (I’d much rather have the team with the slow cycle time because they went back to the product team to clarify the requirements, than the team that just plowed ahead and built the wrong thing very quickly.)

What efforts do you implement or recommend to enhance your or your teams’ productivity?

The thing to remember, the really important thing, is that engineers like solving complicated problems. They like writing code and building things, that’s why they’re in this line of work! Give an engineer a meaty problem, the scope and authority to solve that problem, and a development environment that isn’t going to get in their way, and they’re going to be productive every time.

My main goal as an engineering manager is to get out of my teams’ way, and to do my best to keep the rest of the organization out of their way too.

The biggest productivity killers:

“I don’t know why we’re building this feature” is a huge one. If a team is building something just because they were told to, and not because they understand its value, they’re going to be undermotivated (and will likely build the wrong thing, to boot). The best solution to this is to have as few people as possible between the engineers and the actual users, and to empower your engineering teams to push back on ideas that don’t make sense to them.

“I spent all day in meetings” or “interrupted by questions” or “responding to pagerduty tickets” or similar. This one’s pretty well-known; software engineers need uninterrupted focus time. Make sure they get it. Encourage them to block out segments of the day in their calendars, reduce the number of routine scrum ceremonies and planning sessions you subject them to, and if someone keeps peppering your engineers with questions or one-off requests, empower your engineers to say “no” (and find some less-interruptive way to get that person what they need).

“I spent all day tracking down [a bug / the right config / a particular error message / a stray semicolon]”. Everyone gets hit by the occasional head-scratcher of a bug that takes way longer to sort out than it theoretically should have, and to a point that’s fine. But if it’s happening routinely, that’s another signal that needs looking at: maybe your code isn’t surfacing accurate error messages, making bugfixes take longer than they should; maybe your engineers are working in an environment or context they’re not familiar enough with, or one that’s too bleeding-edge; maybe they’re working in an inefficient way. (This last one is, ironically, especially common in orgs that put a lot of emphasis on fast delivery: engineers under pressure to churn out code don’t feel empowered to take the time to figure out, say, why their sourcemaps stopped working three months ago, and instead spend that time trying to debug compiled production code. You need to give your team time to streamline their own processes.)