Catching up on lean (again)

The most interesting revelations come by chance. During a regular one-on-one meeting with a TL, one thing led to another and he asked me if I had read The Phoenix Project. I vaguely remembered my wife saying that a “Phoenix book” was upsetting her because it was too much like real life. I didn’t need telling thrice; I decided to pick up the book pronto Toronto.

The IT novel captured my attention from the very start. It is relatable, funny and focused on accurately illustrating life in the technology industry. I especially enjoyed the portrayal of day-to-day problems and instinctive solutions. The concepts, although known to me, were so interesting that I decided to pick up the original book that The Phoenix Project is based on: The Goal.

I had my doubts about The Goal. It is set entirely in the manufacturing sector. But it turns out that software development “management” can draw directly on manufacturing lines to explain things that are very close to home (no surprises for many of you, right?).

FWIW, I think the story and the lessons are worth sharing.

Since Jan 2020, two of my teams have been grappling with (if not choking on) a never-ending stream of tickets from our users/customer support (CS). This started with a very justifiable change in our department.

As part of a major process change at that time, we had decided to rip off the band-aid and let go of a centralized “Support initiative” (SI) pod. The SI pod was made up of full stack developers from all of our teams, on a rotation basis. In its place, we decided to have “in-team OPS”. We knew this would solve many problems that I won’t go into. We had hoped that this would increase each team’s domain expertise, allow them to detect and solve root problems faster over time and eventually make for better, healthier ownership. And it did achieve all of those things, for the most part.

From the get-go, it became clear that the domains of these two teams were generating the highest number of issues. In terms of effort, the number of developers required to maintain our SLAs was equal to what the original SI pod model needed.

Fast forward to the summer of 2020. Like many other SaaS companies & startups, we needed to make quick tactical decisions to survive, thrive & inspire. The Phoenix Project talks about the concept of a Value Stream.

A value stream is an ordered sequence of activities required to take an idea/feature from its inception to the hands of the customer.

From a value stream POV, we increased throughput to breakneck speed in order to meet the new market demands and the changes they required. Given our higher education seasonality, almost all of that work had reached production by the time the Fall semester started. We hit many crises, and the team’s (and department’s) dedication got us past the difficulties.

I was determined to understand the root causes and find ways to prevent such a situation in the future. I spent many weeks trying to unpack the contributing factors and arrive at a conclusion. The Goal and its Five Focusing Steps helped me do just that.

“Strengthening any link of a chain (apart from the weakest) is a waste of time and energy.”

Basically, what is our bottleneck? Wait, could it be the domains of these two teams? Obvious? Sh*t!

The moment that idea came to my mind, a wave of supporting evidence hit me all at once: the backlog of work, the projects that keep running into our “cool downs”, how other teams’ work keeps generating CS issues for these two teams. To be clear, I could be wrong, but the evidence is overwhelming. The more I looked, the more data I found. But TBH, just seeing how big the CS backlog of lower priority tickets is would probably be sufficient.

Armed with this new revelation, I’m at the start of a journey to apply lean in a way I never have before. Yes, Scrum & Kanban have been awesome to “work in” as a dev and “work with” as a manager. But working with stakeholders & execs to adjust our department processes so that we work with the bottleneck? Now that seems like a new challenge! So on we go.

“The output of the constraint governs or restricts the output of the organization as a whole.”
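That idea is close to plain arithmetic: in a serial value stream, work can only flow as fast as the slowest stage allows. Here is a minimal Python sketch of it, assuming a simple serial pipeline with fixed weekly capacities. The stage names and numbers are hypothetical, for illustration only, and not taken from our actual value stream.

```python
# Minimal sketch: the constraint caps the output of the whole stream.
# Stage names and weekly capacities are hypothetical, for illustration only.


def throughput(capacities: dict[str, float]) -> float:
    """Throughput of a serial value stream is capped by its slowest stage."""
    return min(capacities.values())


def bottleneck(capacities: dict[str, float]) -> str:
    """Return the name of the stage with the lowest capacity (the constraint)."""
    return min(capacities, key=capacities.get)


# Hypothetical tickets per week that each stage can handle.
stream = {
    "triage (CS)": 40,
    "reproduce (PM/QA)": 30,
    "fix (two constrained teams)": 12,
    "verify & release": 25,
}

print(bottleneck(stream), throughput(stream))  # fix (two constrained teams) 12

# Doubling a non-bottleneck stage changes nothing...
faster_triage = {**stream, "triage (CS)": 80}
print(throughput(faster_triage))  # 12

# ...while a gain at the constraint raises the output of the whole stream.
faster_fix = {**stream, "fix (two constrained teams)": 18}
print(throughput(faster_fix))  # 18
```

It deliberately ignores queues and variability, so treat it as an illustration rather than a model. It still shows why strengthening any link other than the weakest does not move the overall output.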

As in The Goal, we recently worked with CS, Product & QA to plot the value stream. We then aligned on our goal and easily identified the bottleneck to achieving it. After some brainstorming, we decided to add the following info to incoming tickets:
a. “Is reproducible” by PM/QA/Dev in their sandbox
b. “Confidence” that we have in the reach of the issue
Those two factors, along with “impact” & “reach”, will be taken into account for prioritization, in the hope that they maximize the throughput of “fixed” and reduce “can’t repro”, “won’t fix” & “duplicate”. Those latter resolutions could (should?) be viewed as a waste of developers’ time (even though both we & our users can see the pain of these issues in Fullstory).
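One hypothetical way to combine those four factors into a sortable score is sketched below in Python, loosely modeled on RICE-style scoring (reach × impact × confidence) but without an effort term. The `Ticket` fields, the scales, the 0.25 down-weight for unreproduced tickets and the sample tickets are all assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Ticket:
    key: str             # hypothetical ticket id
    reproducible: bool   # "is reproducible" by PM/QA/Dev in their sandbox
    confidence: float    # 0.0-1.0: how sure we are about the reach estimate
    impact: int          # hypothetical 1-3 scale
    reach: int           # hypothetical unit, e.g. number of affected users


def priority(t: Ticket) -> float:
    # Assumption: unreproduced tickets are down-weighted rather than dropped,
    # so they stay visible without crowding out fixable work at the bottleneck.
    repro_weight = 1.0 if t.reproducible else 0.25
    return t.reach * t.impact * t.confidence * repro_weight


backlog = [
    Ticket("CS-1", reproducible=True, confidence=0.9, impact=2, reach=300),
    Ticket("CS-2", reproducible=False, confidence=0.5, impact=3, reach=1000),
    Ticket("CS-3", reproducible=True, confidence=0.8, impact=1, reach=50),
]

# Highest priority first.
for t in sorted(backlog, key=priority, reverse=True):
    print(t.key, round(priority(t), 1))
# CS-1 540.0
# CS-2 375.0
# CS-3 40.0
```

Multiplying the factors means low confidence or low reach pulls a ticket down quickly. Using a down-weight instead of a hard “reproducible only” gate is a judgment call; a gate would protect the constrained teams more aggressively, at the cost of hiding high-reach issues nobody has reproduced yet.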

We have about 2 months to observe whether this works. Then the summer term will begin, and that usually means 10% of the usage we see in our regular Fall & Winter terms. Either way, I’m hopeful that we will learn and iterate on another attempt before summer, or be prepared to try again in the Fall.

There are 3 more steps being planned at the moment. If you have reached this part and hopefully have decent context, I’d be happy to discuss any thoughts or suggestions we could consider for any of those steps.
Until then, I’m staying true to the Stockdale Paradox:

“You must never confuse faith that you will prevail in the end — which you can never afford to lose — with the discipline to confront the most brutal facts of your current reality, whatever they might be.” — Admiral James Stockdale.

Thanks for reading.

Director of Engineering at Top Hat. Worked at successful startups and large technology companies in various high responsibility roles.
