A Taxonomy of Tech Debt (2018)

technology.riotgames.com

294 points by jakey_bakey 9 months ago

abc-1 9 months ago

Contagion is exactly why interfaces are one of the most important pieces of design and should be given significant thought. A beautiful interface with a suboptimal implementation can be easily cleaned up when time is allotted. The reverse is rarely true.

majormajor 9 months ago

I don't disagree but I think commonly you are missing one of two things that are necessary for a proper design:
1) time to design it 2) knowledge of exactly what it needs to do today and in a year
Sometimes you're missing both.
In which case I think you can prevent contagion from being too terrible by enforcing smaller modules and single responsibility in a compositional way. That doesn't require as much knowledge of the future or time, but just requires you to avoid high-surface-area interfaces that end up with lots of behavioral variants controlled via parameters in a nesting-doll style. Instead, move your config/parsing/behavioral decisions to the edges of your logic instead of letting them seep into all your underlying models too.
- BobbyJo 9 months ago
  
  > In which case I think you can prevent contagion from being too terrible by enforcing smaller modules and single responsibility in a compositional way.
  I would classify that as thoughtful interface design.
- njtransit 9 months ago
  
  > 1) time to design it
  Do good interfaces take more time than bad interfaces to write? Does adding more time really make interfaces better? I find that engineering quality (of which interface design is one facet) is largely a function of talent and experience. Time doesn't usually play a factor. Writing good code takes the same amount of time as writing bad code, for the most part.
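
A minimal sketch of majormajor's suggestion above (small single-responsibility pieces, with behavioral decisions made at the edge). Every name here (`strip_blank`, `lowercase`, `dedupe`, `build_pipeline`, `run`, and the option keys) is hypothetical and chosen only for illustration; none of it comes from the article.

```python
from typing import Callable, Iterable

# A step is a small, single-purpose transformation with no config knowledge.
Step = Callable[[list[str]], list[str]]


def strip_blank(lines: list[str]) -> list[str]:
    """Drop empty or whitespace-only lines."""
    return [ln for ln in lines if ln.strip()]


def lowercase(lines: list[str]) -> list[str]:
    """Lowercase every line."""
    return [ln.lower() for ln in lines]


def dedupe(lines: list[str]) -> list[str]:
    """Remove duplicates, keeping the first occurrence of each line."""
    return list(dict.fromkeys(lines))


def build_pipeline(options: dict[str, bool]) -> list[Step]:
    """The edge: the only place that reads (hypothetical) config flags."""
    steps: list[Step] = [strip_blank]
    if options.get("case_insensitive"):
        steps.append(lowercase)
    if options.get("unique"):
        steps.append(dedupe)
    return steps


def run(steps: Iterable[Step], lines: list[str]) -> list[str]:
    """The core: applies steps in order and never inspects options."""
    for step in steps:
        lines = step(lines)
    return lines
```

The alternative this avoids is a single `process(lines, case_insensitive=..., unique=..., ...)` function whose flags leak into every layer beneath it. Here a new behavior is one more small step plus one more line at the edge.
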
appplication 9 months ago

Agree, but I’ve found designing robust, future proof interfaces to be one of the hardest problems in developing software. Even intentionally setting out to avoid tech debt at all costs, it’s just hard to do correctly. It requires more than technical bravado and architectural vision. It really does get into the realm of predicting the future.
- grues-dinner 9 months ago
  
  > 15. (Shea's Law) The ability to improve a design occurs primarily at the interfaces. This is also the prime location for screwing it up.
  https://spacecraft.ssl.umd.edu/akins_laws.html
  - mumblemumble 9 months ago
    
    It's important to accept that you will screw it up. Repeatedly. Interfaces have to be designed before you can start using them, which means that you will never have less information about how a module will be used than you do when you design its interface.
    The best defense against this that I've found is to ensure, as much as possible, that interfaces can be replaced. The single responsibility and interface segregation principles can help here. Using small, focused interfaces and letting modules implement more than one of them makes it easier to use the strangler pattern to replace interfaces that no longer work well with new and improved ones.
    Also avoid temporal coupling as much as is feasible. Unnecessary statefulness is the easiest way to make this sort of thing harder than it needs to be.
    
    grues-dinner 9 months ago
    
    Mr Akin's gotchu, fam:
    > 2. To design a spacecraft right takes an infinite amount of effort. This is why it's a good idea to design them to operate when some things are wrong.
    > 3. Design is an iterative process. The necessary number of iterations is one more than the number you have currently done. This is true at any point in time.
    > 4. Your best design efforts will inevitably wind up being useless in the final design. Learn to live with the disappointment.
    Also 9, 10, 11, 12, 13, 14 and a bunch of the others apply too.
- intelVISA 9 months ago
  
  A good middle ground is modularize everything into stateless funcs where possible so it can be reassembled in different configurations without much stress.
  An excellent interface will eventually be deformed beyond recognition chasing the architectural dragon; a well-crafted library will outlive the project.
- abc-1 9 months ago
  
  Look at how mathematicians build minimal yet complete definitions for inspiration. An algebraic system can be created with a set of operations such as multiplication and addition, and existing concepts can be mapped to this system, such as money, but the underlying algebraic system will never change. It is complete.
  Much of the system can be complete like this with forethought. The pieces that cannot can be factored out to the edges.
  - appplication 9 months ago
    
    You’re not wrong in a theoretical sense, but building useful interfaces that your average dev can grok enough to build on top of requires higher level abstractions, approximations, and “reasonable defaults”. My experience is that only a small number of devs actually well understand the codebases they work in (and care enough to be thoughtful in interfacing with it).
    The majority of devs generally are happy to tack on their features and PRs to whatever random scaffolding they can, without regard or awareness for how their individual component fits into the larger system, or how it may be extended. And to be honest it’s not necessarily a bad thing, because they do need to get work done, and merging PRs shouldn’t be reserved for the enlightened.
    I guess I’m just pessimistic. The reason we don’t see perfect software is because we are not capable of producing it. At a certain point it all becomes spaghetti. If you work with software that isn’t spaghetti, it’s only because the people who care about it not becoming spaghetti haven’t left yet. This is good, but eventually they will leave, standards will decline, and you will become one with the pasta.
    
    hnthrow289570 9 months ago
    
    You're not too pessimistic yet. Projects that devolved into spaghetti paid their engineers roughly the same as they will pay new ones. Looking at the incentives, it's hard to take on the burden of undoing technical debt if your salary isn't going to change much. Businesses take advantage of passions to fix things like technical debt because they know they don't have to pay too much extra for it.
    
    abc-1 9 months ago
    
    Keep fighting the good fight. It’s more satisfying, even if entropy inevitably wins ;)
  - pjc50 9 months ago
    
    > forethought
    Forethought is only possible if people tell you the requirements precisely correctly upfront. Real systems design is you get 90% built and someone drops a hard requirement that's also a layering violation on you.
  - AtlasBarfed 9 months ago
    
    That is incredibly naive.
    Math arises from first principles, human behavior does not.
    
    abc-1 9 months ago
    
    Consider how often SQL, a hash implementation, a data compression algorithm, or a standard library changes. Not often, because they are complete systems. If you don’t like them, you don’t change them- you switch to another system. But they can support an infinite variety of use cases. Hopefully that clears it up for you.
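
A minimal sketch of the "minimal yet complete" idea abc-1 describes upthread, using the money example. The `Money` class, its fields, and the `total` helper are assumptions made for illustration; integer cents and integer-only scaling are also assumptions, not anything the thread specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Money:
    """A tiny closed system: adding or scaling Money always yields Money."""
    cents: int
    currency: str

    def __add__(self, other: "Money") -> "Money":
        if not isinstance(other, Money):
            return NotImplemented
        if other.currency != self.currency:
            raise ValueError("cannot add different currencies")
        return Money(self.cents + other.cents, self.currency)

    def __mul__(self, factor: int) -> "Money":
        # Integer scaling only, so the result never needs rounding.
        if not isinstance(factor, int):
            return NotImplemented
        return Money(self.cents * factor, self.currency)

    # Allow `3 * price` as well as `price * 3`.
    __rmul__ = __mul__


def total(prices: list[Money], currency: str) -> Money:
    """Sum a list of prices, starting from the zero element."""
    result = Money(0, currency)
    for price in prices:
        result = result + price
    return result
```

Rounding rules, currency conversion, and display formatting are deliberately absent. Those are the parts that tend to change, so in this sketch they would live at the edges and leave the core operations alone.
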
TexanFeller 9 months ago

One must understand what a good underlying implementation looks like to expose a good interface; it's easy to implicitly bake stupid implementations into an interface that cannot be fixed by just changing the implementation. The example that comes to mind is sorting and paging behavior. Junior devs, and many seniors that should know better by now, OFTEN start with requests that use some variant of limit/offset parameters for paging, which leads to terrible performance issues and anomalous behavior. How paging works efficiently and what sorting options can be supported with good performance is inherently coupled to the shape of your data and your choice of datastore. People that haven't been through this exercise at a lower layer have little chance of shaping the higher level interface appropriately unless they put work into the implementation up front.
- immibis 9 months ago
  
  Another example is synchronous vs asynchronous I/O.
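
A small sketch of the paging problem TexanFeller raises, using Python's built-in `sqlite3`. TexanFeller does not name an alternative; keyset (cursor-style) paging is used here as one common option, and the `items` table and both function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items (id, name) VALUES (?, ?)",
    [(i, f"item-{i}") for i in range(1, 11)],
)


def page_by_offset(limit: int, offset: int) -> list[tuple[int, str]]:
    # Limit/offset paging: the position is "row number", which shifts
    # whenever rows are inserted or deleted between requests.
    return conn.execute(
        "SELECT id, name FROM items ORDER BY id LIMIT ? OFFSET ?",
        (limit, offset),
    ).fetchall()


def page_after(limit: int, after_id: int = 0) -> list[tuple[int, str]]:
    # Keyset paging: the position is "last id the client saw", which the
    # primary-key index can seek to directly.
    return conn.execute(
        "SELECT id, name FROM items WHERE id > ? ORDER BY id LIMIT ?",
        (after_id, limit),
    ).fetchall()


first = page_after(3)                       # ids 1, 2, 3
conn.execute("DELETE FROM items WHERE id = 2")  # data changes between requests

next_offset = page_by_offset(3, 3)          # ids 5, 6, 7: id 4 is skipped
next_keyset = page_after(3, after_id=first[-1][0])  # ids 4, 5, 6
```

The interface consequence is the point: a `limit`/`offset` request shape promises "jump to page N", while an `after_id` shape only promises "give me what follows this item". Switching from one to the other later changes the contract, so it is more than an implementation swap.
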
yodsanklai 9 months ago

Which is why I like languages that make interfaces very explicit, like OCaml or Ada. Most of the time, I don't want to see the implementation, just a properly documented interface. If people can't describe in simple terms the behavior of an interface, something is wrong.
taeric 9 months ago

History seems somewhat full of counterexamples, though? QWERTY is rather famous for not being an optimal physical interface. Steering wheels would probably be up there?
In computers, you have x86 being the poster child of ostensibly suboptimal interfaces.
- TremendousJudge 9 months ago
  
  QWERTY proves GP's point: it's a suboptimal interface and basically impossible to get rid of now, even though we've moved the underlying implementation from typewriters to computer keyboards to touchscreens.
  - shiroiushi 9 months ago
    
    Actually, I wonder if QWERTY is actually better for touchscreens than alternative layouts. In a better layout like Dvorak, the most commonly-used keys are grouped close together, mostly on the home row. This is great for typing because you don't have to move your hands and fingers as much and can reduce wrist strain. But QWERTY does the opposite, moving all the most-used keys to the non-home rows so you have to constantly move between the top and bottom rows. On a computer keyboard, this gives you RSI, but on a small touchscreen, this means the "keys" you're tapping on are generally farther from each other, so perhaps it makes it easier since you rarely tap on two keys that are adjacent.
    
    taeric 9 months ago
    
    Amusingly, I switched my phone to colemak. Mainly as I am just happy with the layout. Though, I confess I hate inputting anything on my phone, as I am not a good phone typist. I can almost make the swipe thing work, but I learned how to touch type on a keyboard and it feels very very weird to try and use a phone's keyboard.
  - taeric 9 months ago
    
    Ah, if the point is just that you can't get rid of bad interfaces, I suppose that works. I was taking it more as a systemic problem caused by bad interfaces. Which is to say, I'd be hesitant to cede that this has caused any actual problems.
    Would be like complaining that AC is being superseded by DC and how this is proof of an early choice locking us into a bad choice. But it ignores all of the progress made in the interim. And the odd reality that enough effort can migrate anything. It just takes a lot of effort. And we are often quite willing to throw effort at things.
InDubioProRubio 9 months ago

But contagion deforms interfaces. It's that moment in discussions where everyone moves away from how it ought to be, to how we must implement it, due to the previously existing modules. You learn about that..
dogleash 9 months ago

> A beautiful interface with a suboptimal implementation can be easily cleaned up when time is allotted.
That won't happen. Why toy around with your ticket database like that? Just close it to WONTFIX.

resonious 9 months ago

I gotta say, it's pretty amazing to me that this was written by an engineering manager. None of the EMs I've worked with would be capable of discussing our codebase at this level of technical detail. Even the ones that used to be engineers.

Although to be fair, we don't have any EMs who were promoted from within. We have a bad habit of hiring managers from outside, as nobody internally really wants to stop doing engineering (myself included).

cynicalpeace 9 months ago

It seems to be missing the most common type of debt I've seen:

Founder's debt.

This was debt that was created by the founders to get the fast, good value tech out the door. Low hanging fruit that ends up being the foundation of the whole shebang.

The founding documents of many countries fall into this category lol (but not USA! USA! USA!)

Macgyver debt and foundational debt come closest, but neither quite outline this phenomenon.

bbor 9 months ago

Great article, from a technical perspective! I would say it’s more a “nomenclature” than a “taxonomy” because it’s neither exhaustive nor discrete (by design), but I might be mistaken there. I loved the physical examples for each especially, really thought provoking.

As always, I have a philosophical nit to pick: the “three axes” introduced at the top are just “Return” and “Investment” from good ol’ RoI, with a subcategory added for a particular type of forward-looking/conditional Return. I’m guessing this decision has worked in practice and I don’t expect video game development practices to be absolutely scientifically sound, but some extra philosophical certainty never hurts!

dang 9 months ago

Discussed at the time:

A Taxonomy of Technical Debt - https://news.ycombinator.com/item?id=16810092 - April 2018 (113 comments)

also this bit:

A Taxonomy of Tech Debt (2018) - https://news.ycombinator.com/item?id=39782923 - March 2024 (1 comment)

ChrisMarshallNY 9 months ago

> I define tech debt as code or data that future developers will pay a cost for.

One of the best descriptions I’ve encountered.

As in all debt, there’s a “threshold” that should be applied, at the time the debt is incurred, which balances the immediate needs, against the future costs. I feel that most people (not just developers) amplify the immediate, and deprecate the future costs.

For myself, I have an almost pathological aversion to debt, of any kind. I will spend an extra day, factoring out stuff that might be useful, in the future. I seem to be right, about 50% of the time. That said, every time I do something like that, I reinforce habit, which accelerates my basic workflow.

jakjak123 9 months ago

I have worked in 3 "startups" now, only coming in after they have started making enough revenue to pay normal-ish salaries. The thing I have seen the most is that several of the founders have a blurry concept of what were ideas they had, what was actually built, and which parts of what was implemented actually work.

brightball 9 months ago

I've used Contagion to describe tech debt ever since I first read this article. Does a great job.

sanitycheck 9 months ago

I'm not sure I'd even call "local debt" technical debt in ordinary circumstances - realistically there's always going to be mess somewhere, and encapsulating it away where it can't hurt anyone is normal. If it probably never needs to change unless requirements change (in which case any other implementation would also need to) it's fine.

Perhaps if 24 minion instances constitute an actual problem (rather than just inelegance), their example for it is actually foundational debt, related to having a "minion" be the simplest primitive that would do the job when maybe something lighter could have existed.

igornadj 9 months ago

The article goes into it a tiny bit, but the cost is the mental cost of understanding it when you do need to work on it and, I would add, of keeping the tooling the same.
Encouraging devs to have their changes include all modules, even those that are old and mature and don't need to be touched, is a good way of ensuring this doesn't build up to where it becomes a problem.

leni536 9 months ago

One important aspect is when you knowingly take on tech debt in return for some short-term benefit. Then this benefit becomes another axis to weigh against.

grues-dinner 9 months ago

Just like real debt. Want a new building now to get work done, not in 15 years when you have the capital? Take out debt, baby!
It's a tool, but a powerful and dangerous tool, and if you don't acknowledge you're using it and respect it, it'll hurt you. Or it'll hurt someone who accepts the grenade from you. Just like real debt.
- et-al 9 months ago
  
  And just like real debt, some tech debt has higher compounding effects than others. (Consider this fix cost and impact in the author's framework.)
  - grues-dinner 9 months ago
    
    Yep. Missing a deadline because your debt kicked off a death spiral is Jimmy the Facestabber coming looking for his vig at 25% a week and breaking your knees; gently pushing back some nice-to-have features next release while you deal with the debt is "only booked a 4 star hotel because the mortgage was paid first that month".
jamesfinlayson 9 months ago

I've heard this called "tactical debt" instead of "technical debt".
salomonk_mur 9 months ago

Typically speed.

ooterness 9 months ago

Great article. The "contagion" factor is a useful concept that I hadn't seen before. Needs a [2018] tag.

APublicMan 9 months ago

My experience at big corporate is that (edit: unmanageable) tech debt is caused by an undisciplined and unorganized scrum team.

When you have a proper backlog of tickets, including tech debt tickets, the team will eventually fix the tech debt when there are not enough feature tickets to exhaust capacity.

bbojan 9 months ago

> the team will eventually fix the tech debt when there are not enough feature tickets to exhaust capacity
I have yet to visit this mysterious universe you describe.
- Swizec 9 months ago
  
  > I have yet to visit this mysterious universe you describe.
  The trick is to have 1 backlog. Tech debt and features live on the same list and it is up to the PM to prioritize. Engineering’s job is to argue cost.
  Good PMs will prioritize relevant tech debt or pull it in with feature work in the same area. They understand the tradeoff of go slow to go fast. They also understand when tech debt will never become relevant (because the feature is getting nixed, or hasn’t shown desired impact yet, or because the cost of interest is waaaay lower than the cost of paying it off in many cases).
  This only works when engineers have the discipline to look stinky awful code in the eye and say “not today” and stay within agreed timeboxes. You blow this estimate once or twice, get the PM in hot water with leadership, and you’ve lost the trust.
  - NAHWheatCracker 9 months ago
    
    All of the teams I've been on have used one list. I've never seen a PM prioritize the technical work. I still think it's a good idea for it to be one list, but it's not sufficient.
    For teams that don't have a good PM, you also need a tech champion. Failing that, engineers need to inflate estimates and do tech work under other stories. Then everything becomes less predictable and teams never develop trust.
    
    rqtwteye 9 months ago
    
    All PMs I have seen so far were just passing on management’s desire for more features quickly. The only approach I have seen work is if engineering adds refactoring as part of the normal work that needs to be done without asking for permission.
    
    NAHWheatCracker 9 months ago
    
    That's the practical advice to engineers who are stuck in a dysfunctional organization where they can't really effect change, which is probably 90%+ of all organizations.
    
    Swizec 9 months ago
    
    > For teams that don't have a good PM, you also need a tech champion
    Yes. And to add some nuance, you need a [trusted] engineer who can say “This will take 3 weeks because of tech debt items A, B, C. We can fix those in 1 week and then take 1 week to implement this. How would you like to proceed?”
    Any decent PM will take the 2 week option that also cleans up the codebase.
    But if fixing the tech debt would take 3 weeks and then another 2 weeks to build the feature, then any decent PM will take the option that doesn’t fix tech debt unless there’s a bunch more stuff coming in this area in which case taking 3 weeks to fix stuff is totally worth it.
    Their job is to make those tradeoffs. Our job is to highlight the tradeoffs they’re making so they can make informed decisions.
    
    gregmac 9 months ago
    
    > “This will take 3 weeks because of tech debt items A, B, C. We can fix those in 1 week and then take 1 week to implement this. How would you like to proceed?”
    I've experienced something like this, but only on a project that mostly had the original team that built it (including me) still working on it. We were able to keep things in check, and in the above case would just do it that way without really asking.
    On many other projects I've been involved in, there's years of tech debt that has accumulated: the typical retrospectively incorrect design decision, followed by layers and layers of band-aids, each time making the real fix more complicated and a bigger scope.
    These things undoubtedly increase the cost of everything else, but it's really hard to articulate. The fixes take weeks, the break-even won't come until months later, the long-term team members are a mix of skeptical and defensive of their work (eg: don't want to do the real fix). In some cases, there's a war story "we heard that about x, but that caused so many bugs we had to revert and abandon it, why is this going to be different?"
    Any tips for anyone working in this environment?
    
    The_Colonel 9 months ago
    
    Those are easy calls for which everyone's incentives are aligned.
    The problems come from the calls where personal incentives are not aligned. A typical example - the team builds a feature hidden by a feature toggle which is, after a period of A/B testing, enabled globally on the product.
    The existence of the feature toggle raises the complexity of the code - let's say it's used in 10 different places, each of those double the amount of possible code paths. Removing it may be a question of a couple of hours of work and is very clearly work paying for itself in the long term, but PM will not schedule this work, because there's no immediate upside for them personally and the cost of keeping the toggle in code is a long term one, spread over the whole organization.
    In other words, PM is more likely to get a bonus by slashing work on such tech debt items (and thus them personally delivering the features faster) rather than punished for keeping the toggles/complexity behind.
    
    NAHWheatCracker 9 months ago
    
    I agree with you completely that you need trust.
    > Our job is to highlight the tradeoffs they’re making so they can make informed decisions.
    This is an oft-stated thing that I oft-disagree with. It states that engineers ought to be subordinate to PMs, which shouldn't always be the case.
    If you have shit engineers and great PMs, the best outcome is likely to shift decision making to PMs. If you have great engineers and shit PMs, decision making should shift towards engineers.
    If they are both equivalently shit or great, it should be a balance. I believe this is the most likely scenario. I believe that balance is thrown out the window if engineers "highlight the tradeoffs" while the actual decision making lies with the PMs.
    How to actually achieve balance is extremely idiosyncratic to the team and organization. It's hard to get people to have adult, non-confrontational discussions about this sort of thing, however. Too many people will treat it as a negotiation.
    
    Swizec 9 months ago
    
    > This is an oft-stated thing that I oft-disagree with. It states that engineers ought to be subordinate to PMs, which shouldn't always be the case.
    I think of it more as a partnership.
    If I’m in charge of getting groceries and you’re in charge of budgets, we need to have an informed discussion on what exactly is our budget and what food we need so we don’t starve. Sure I could blow the whole budget on steak and I might even love eating nothing but steak for 3 days, but eventually some carbs would be nice. Likewise neither of us will be happy if I go max stingy and buy nothing but bags of rice for the week.
    The reason I think PMs should make the final call is not that engineers are subordinate, it’s that PMs are accountable. (RACI – responsible, accountable, consulted, informed). The person whose ass is on the line makes the call.
    Usually when I ask engineers if they want to be accountable for making the call (and its outcome), things get real quiet real fast :)
    
    NAHWheatCracker 9 months ago
    
    If PMs are accountable, then I'm with you. Decision making should lie with those accountable.
    From what I've seen, accountability doesn't mean much. Could be the places I've worked. Poor PMs get promoted despite running projects into the ground, good engineers get held back despite pushing through adverse project plans, vice versa.
    
    jamesfinlayson 9 months ago
    
    I remember working in a team where the backlog was controlled by the PM and he created a separate backlog that developers got to use - unsurprisingly, pretty much nothing ever got moved out of the separate backlog.
    
    patrickmay 9 months ago
    
    > For teams that don't have a good PM, you also need a tech champion.
    That's part of the role of a Technical Program Manager. The Eng Manager, Product Manager, and TPM should form a holy trinity of mutual support, filling in for each other's gaps. When that happens, you get much better odds of having a high performing team.
    Source: I've been both an engineering manager and a TPM. Never the PM, though.
    
    NAHWheatCracker 9 months ago
    
    Perhaps that can work, but I'm skeptical whenever the solution is "another manager".
  - FridgeSeal 9 months ago
    
    > it is up to the PM to prioritize. Engineering’s job is to argue cost.
    That’s a lot of words to say “more features lol” which is basically what every PM I’ve worked with has only wanted.
    
    hinkley 9 months ago
    
    That just lets the lazy devs scapegoat the PM for “not letting them” work on the tech debt.
    Most people don’t want to work on it. That’s why there is so much. Generating it is like eating candy. It’s unhealthy but you just want to have something sweet right now and the bowl is in reach…
    
    FridgeSeal 9 months ago
    
    Hmmm, not sure I buy the argument.
    Most co-workers I’ve had would have _loved_ to fix the shortcuts and hacks that were done to meet deadlines, but were never given the time. “Refactor while you do new features” works sometimes, but doesn’t work on anything larger scale - e.g. if your overall architecture is collapsing under its own weight, it’s hard to “sneak in” the sort of major work you need to do to fix it.
    
    hinkley 9 months ago
    
    Oh there’s always a few of those, but then there’s the corner cutters who make more of the mess than everyone else, and a few impostors who fade into the bushes when there’s a gap in the schedule.
    
    jakjak123 9 months ago
    
    Well, sometimes they want fewer bugs!
  - jakjak123 9 months ago
    
    A good PM will understand that to get to C, we need to build and support A + B before we can build C, and plan for this. Like, if we built B to be a terrible barely working mess, they understand that this will make C basically worthless. But in my experience, this ability is surprisingly rare.
- xarope 9 months ago
  
  you need a smart PM who works closely with the CTO to craft the narrative to sales, that the next critical feature milestone is gated behind fixing said tech debt...
- hinkley 9 months ago
  
  I and one, maybe two other coworkers will fix some of the tech debt while everyone else tries to avoid making eye contact, and we fantasize about a world where voodoo dolls actually work.
- jakjak123 9 months ago
  
  Me too. I have never seen this world
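
The trade-off Swizec puts to the PM upthread is simple arithmetic, sketched below. The function names are hypothetical. The second scenario's "3 weeks per feature with the debt in place" is an assumption carried over from the first scenario, since the thread does not state it.

```python
import math
from typing import Optional


def weeks_without_fix(features: int, weeks_per_feature_with_debt: float) -> float:
    """Total cost if the debt stays and every feature pays for it."""
    return features * weeks_per_feature_with_debt


def weeks_with_fix(features: int, fix_weeks: float,
                   weeks_per_feature_after_fix: float) -> float:
    """Total cost if the debt is paid off first, then features are built."""
    return fix_weeks + features * weeks_per_feature_after_fix


def break_even_features(fix_weeks: float, with_debt: float,
                        after_fix: float) -> Optional[int]:
    """Number of features at which fixing first stops costing more.

    Returns None when the fix never pays for itself.
    """
    saving = with_debt - after_fix
    if saving <= 0:
        return None
    return math.ceil(fix_weeks / saving)


# Scenario 1: 3 weeks as-is, or 1 week of fixes plus 1 week of feature work.
scenario_1 = (weeks_without_fix(1, 3), weeks_with_fix(1, 1, 1))   # 3 vs 2

# Scenario 2: 3 weeks of fixes plus 2 weeks of feature work,
# against an assumed 3 weeks as-is.
scenario_2 = (weeks_without_fix(1, 3), weeks_with_fix(1, 3, 2))   # 3 vs 5
features_needed = break_even_features(3, 3, 2)                    # 3 features
```

Under these assumed numbers, the second fix only breaks even once three features land in the same area, which matches the "unless there’s a bunch more stuff coming in this area" caveat.
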
rqmedes 9 months ago

Agile is perfectly optimised for creating tech debt. Corporate software is almost always impossible to change once released, so it’s obvious that frequent iterative deliverables that you can only code around or on top of propagate technical debt.
- hinkley 9 months ago
  
  Waterfall has time to bury the debt and let the grass grow over the crime scene before people come asking questions.
  This is dev culture not agile culture.
loloquwowndueo 9 months ago

“Not enough feature tickets to exhaust capacity” - I don’t think I’ve ever seen this happen :) PMs and sales always manage to book all available capacity.
- hinkley 9 months ago
  
  I’ve done it once or twice. One particular time there was a lot of hand wringing about how there was nothing to work on. I about saw red. Tech debt and bugs. That’s what you work on.
  That incident really changed my perspective on people who talk about how tech debt is bad. Some of them will roll up their sleeves, but some just want to look high minded without putting in the effort.
deknos 9 months ago

If tech debt depended on some kind of methodology, it would not pop up with XP/Kanban/waterfall.
Tech debt can even pop up in unorganized, slow-mo open source software.
arrjayh 9 months ago

> My experience at big corporate is that (edit: unmanageable) tech debt is caused by an undisciplined and unorganized scrum team.
Yeah, this is 100% correct. I comically left Riot after ~6 months for this exact reason. Obviously it's a large company with many different flavors of teams, and it sounds like this team maybe has gotten it together, but by and large most haven't.
While I was there I was working on some of their core games tooling and felt uneasy about my day-to-day. My team's tech debt was quite literally owning them. Constantly missing sprint scopes, spending countless hours arguing and debating about trivial stuff, it was all a mess. They ended up laying off a number of people from that team in a pretty shifty manner so maybe things have gotten better since then.
- intelVISA 9 months ago
  
  What was the rough team composition?
  - arrjayh 9 months ago
    
    ~20 engineers, ~3 people managers. From what I recall the team had high attrition and shuffled through a number of people managers. When I joined, 1 manager was a new hire, 1 manager was a new-ish hire, and 1 manager was fairly seasoned at Riot and had a "good reputation". It was still a total mess.
pjc50 9 months ago

> not enough feature tickets to exhaust capacity
This puts you at grave risk of redundancies.