Friday, March 20, 2020

Production Support


I have seen my share of IT processes across a number of companies over the past 25 years. The ones around production support have always been the most interesting, specifically in situations where teams and organizations get themselves into trouble and then spend their lives on crisis/bridge calls trying to get themselves out of it. A lot can be learned about the culture of a company by watching it go through one of these ..

Whilst the basics of best practice around IT production support have been enshrined in standards such as ITIL, just following a book of rules has never been a recipe for the extraordinary.

One specific dynamic I have often debated in my mind is around the "segregation of duties". I remember my time in the early 90s, when it suddenly became a bad thing for us developers to have access to production. Heaven forbid we would make changes in production on the fly .. this was true in telecom IT (at least until telecom became IT), primarily driven by the discipline of managing mission-critical networks and related lifeline services. I did have a certain respect for this, especially given the hard-coded culture of service availability in telecom companies.

Of course, in the finance industry, I found that even data-center sysadmins were not trusted with privileged access to the very servers they were meant to administer. And then there is the two-eyes/four-eyes principle .. something we can thank the banking sector for.

Which brings me to my point, and the picture above. I have tried to capture what I see as an often misunderstood and consequently dysfunctional state of affairs within an IT organization. I define "commando" as the behavior where folks make changes to production without any due diligence, testing, etc. I define "process driven" as everything by the book and, in the extreme case, overly constraining and time consuming (without adding any value).

On the other axis, I define "trial & error" as the mode of analysis/resolution that teams resort to when they lack the technical knowledge/understanding/skills of what they are supporting. This can be methodical and process driven, and it will ultimately yield a result (however, on speed, you need to be lucky). NB> I don't classify the "bounce the servers" solution as necessarily "trial & error", as it is an effective step in cutting your losses on troubleshooting when you have SLAs. I define "knowledge/skills based" as the state where the highest technical skills (typically the original developers/engineers) are applied and engaged in problem solving. NB> This is NOT the line that divides Tier 2 application support from Tier 3.

As for target zones, it really depends on the business impact. Typically, however, in most companies it is a one-size-fits-all approach. Companies have a hard enough time getting consistency in their performance.

Break-glass is an interesting one and is typically meant as a safety or "panic" button for when normal process doesn't work or speed is required. It allows developers to take control and break through the "segregation of duties" barriers.
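
As a purely illustrative sketch (the names and mechanism here are my own assumptions, not any particular product or process), break-glass usually boils down to a time-boxed, audited grant of elevated access tied to an incident:

import java.time.Duration;
import java.time.Instant;

// Illustrative only: the point is that break-glass access is tied to an
// incident, expires automatically, and always leaves an audit trail.
class BreakGlass {

    record Grant(String userId, String incidentId, Instant expiresAt) {
        boolean isActive() { return Instant.now().isBefore(expiresAt); }
    }

    Grant invoke(String userId, String incidentId, Duration window) {
        Grant grant = new Grant(userId, incidentId, Instant.now().plus(window));
        auditLog(grant);   // every grant is recorded, who/why/until when
        return grant;      // access lapses automatically when the grant expires
    }

    private void auditLog(Grant grant) {
        System.out.printf("BREAK-GLASS user=%s incident=%s until=%s%n",
                grant.userId(), grant.incidentId(), grant.expiresAt());
    }
}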

Questions to ask when you assess where you are on the chart:
1. When in a crisis, are your smartest/highest-skilled people engaged (and accountable)?
2. Do they have access when needed (and the tools)?
3. Are they allowed to lead, or are they muted by process?


Friday, March 13, 2020

I am always amazed when I find IT teams with this behavior:
1. Business requires something urgently from IT
2. IT assesses the change
3. IT then designs and solutions the change
4. IT then risk-assesses the change as "high risk"
5. IT then presents the solution with a "high risk" profile to the business, pretty much scaring the pants off everyone
6. Business then backs off the ask
7. End = do nothing. IT feels happy it made a good "risk"-based decision

Bright futures in such companies .. 

Monday, August 23, 2010

Good software starts with good people - not cheap people

In well over a decade of working for software consultancies, I have observed a recurring problem that I find personally frustrating.

Total cost of software development is a very difficult figure to calculate - and even more difficult to compare across projects.  Because of this difficulty, some corporates attempt to use developer day rates as their measure of spending effectiveness.  This approach tends to mean that they hire only the cheapest developers available.  They then turn to "consultancies" to provide additional "expertise" when quality plummets and costs begin to spiral out of control.  The expectation is that the consultancy will provide better developers than they are willing to hire themselves.  The client often seems to assume that, because they are paying more for a consultant than for in-house staff, the consultant will be more skilled.  

Unfortunately, this approach is driven by a fundamental lack of understanding of the software development market.  In reality, the inflated price of a "consultant" is usually driven entirely by the margin that a consultancy must charge in order to make a profit itself.  This margin is often quite large, as a consultancy must cover its overheads while it is short on work - not just while its staff are gainfully employed.

A consultancy is driven by entirely different incentives than its clients.  While their clients expect (in some fashion) to profit off the software they write, a consultancy expects to make money off the writing of the software.  Unless they charge a margin, they will make a loss.

Although hiring better developers can lead to a better quality product and lower development costs, (and my own experience has shown this to be the case, over and over,) these factors are not of direct relevance to a consultancy.  When they are done, they will move on to the next project without ever having to actually use the software.  If the software achieves its return on investment goals, they will generally not see a dime.  Providing better quality (and therefore more expensive) staff is only profitable to a consultancy if they can charge a higher margin - and that is on top of their base cost!  Given the propensity for client organisations to seek lower day rates, it is sometimes difficult to keep a senior consultant charged out at the margin necessary for profitability.  One thing is certain - those clients willing to pay top rates are the ones that get first call on the top talent.

Because they derive no direct benefit from better quality software and/or lower delivery costs, most consultancies (despite what they might prefer) will attempt to address quality and cost only to the extent demanded by the market.  (i.e. their clients.)  The same market that sees most of their clients more driven by day rate than by total cost of development, also forces them to adapt their own hiring practices accordingly.  (In fact, I have personally worked for a consultancy that tried to adopt the opposite approach and only hire the best possible developers.  Unfortunately, they tended to find that their clients did not want to pay the inflated day rate that the best developers required.)  Most software consultancies are forced to just "build the pyramid".  i.e. Hire a few senior developers and supplement (or swamp) them with a multitude of minimum-rate staff.

So, those very organisations that stand to profit the most from hiring top-quality development staff, tend to instead pour their money into paying consultancy margins in order to sort out the problems that they create by cutting down on day rates in the first place.  At best, they will sometimes be lucky enough to find a consultant who is actually worth the margin they are paying.  At worst, they will end up with the cheapest (although still expensive, compared to their in-house staff) consultants they can find and simply "prove" to themselves that more expensive developers are not really worth the extra money.  Regardless, they then get rid of the consultancy (in order to cut costs) as soon as the immediate problem is solved - and the cycle begins again.

Almost laughably, those organisations sometimes then try to address the inadequacies of their minimum-rate development staff by dictating in detail how they should go about their jobs - as if they had a clue themselves!

To those organisations that I am describing, I would like to note, for the record, that I have personally (on too many occasions,) replaced teams of minimum-rate developers with developers whose day rates are 3 times that of the incumbents and still realised overall cost savings of greater than 50%.  (Note that before-and-after comparisons of total development cost are much easier to draw than cross-project comparisons.)  While this might keep me employed, (or run off my feet, more often than not,) that employment is (in light of the larger picture I have attempted to paint here,) gainful only to myself and hardly fulfilling. Perhaps it is time to consider another approach?
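
To make the arithmetic concrete with purely hypothetical numbers: ten developers at a day rate of 500 running for 200 days cost 1,000,000; three developers at 1,500 a day (triple the rate) delivering the same scope in 100 days cost 450,000, a saving of 55%. The day rate goes up, the headcount and duration come down, and the total falls.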

 The only path to good software is through good software developers.

Thursday, January 28, 2010

barely sufficient

I observed a very talented engineering team making a basic mistake today.

I think we often misinterpret what 'barely sufficient' means in the context of agile software development and delivery. It is mistakenly taken as an excuse to cut down requirements. In reality, I believe it applies more to engineering and design than to requirements. There are two different syndromes to be careful of in a software project - 'scope creep' and 'creeping elegance'.

In an agile methodology, we iterate through cycles of 'design a little', 'code a little', 'test a little'. We embark on writing software often without fully understanding the problem. I am a big fan of this versus big design up front. Personally, I pretty much think on the keyboard, as I believe people learn incrementally. That said, I am always aware that there is a risk in this mode until I have a full grasp of the problem. My energy is always directed towards activities or areas that help me flesh out unknowns. Whilst in this mode, I 'hack' for speed and refactor only after I think I have my arms around the business problem.

What I found the team doing was laying down the 'foundation', a.k.a. building middleware. That by itself wasn't a problem; however, they had taken their eyes off the business problem and failed to deliver results in the needed timeframe.

Programming competitions are a great way to teach developers this mindset. You have a fixed duration, so you have to be quick. You have to focus on the problem and only the problem .. no deviating onto bunny trails. You have to solve the problem in the simplest, quickest way. As engineers, we love complexity, so the last discipline is the hardest.

Thursday, June 4, 2009

Service granularity and re-use

Architects in the IT organisation where I work display an interesting tendency to equate re-usability with granularity.  The received wisdom seems to be that the more granular a service is, the more re-usable it is.  To a certain extent it is useful to have the ability to mix and match just the bits of functionality you require.  This becomes detrimental at the point where it starts to push behaviour toward the consuming systems, of which there are usually more than one.

As a case in point, a new system is being written which (among other things) exposes the ability to look up items in a cache.  It employs a side-caching strategy to do this.  Its API looks something like this:
MyItemCache
    findItem(itemId : String) : Item
    createItem(item : Item)
Consumers of this service will call findItem() to check for the item in the cache, using the Item if it is there.  If the Item is not there, they are expected to fetch the Item from the source system - which is a different system entirely - and then add it to the cache, using createItem().
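
To see what that means in practice, here is a rough sketch of the logic every consuming system ends up writing. ItemClient and SourceSystemClient are hypothetical names of my own, standing in for the real consumer and whatever client the source system exposes:

// Illustrative only: ItemClient and SourceSystemClient are invented stand-ins.
interface SourceSystemClient { Item fetchItem(String itemId); }

class ItemClient {
    private final MyItemCache myItemCache;
    private final SourceSystemClient sourceSystem;

    ItemClient(MyItemCache myItemCache, SourceSystemClient sourceSystem) {
        this.myItemCache = myItemCache;
        this.sourceSystem = sourceSystem;
    }

    Item getItem(String itemId) {
        Item item = myItemCache.findItem(itemId);   // check the side cache first
        if (item == null) {
            item = sourceSystem.fetchItem(itemId);  // miss: call the source system directly
            myItemCache.createItem(item);           // manually populate the cache
        }
        return item;
    }
}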

There are a number of issues with this approach.
* The fact that the cache is a side cache (and therefore does not front the source system) means that consumers of this cache must also have knowledge of the source system - making the overall architecture more complicated and brittle.
* Each consuming system is expected to implement over again the logic required to check in the cache, then fetch and add the Item if it is not already cached.

While this last point may seem like a small amount of code to write, it should be remembered that forcing each client system to re-implement this logic means that the difficulty of changing the logic is multiplied by the number of client systems, with all the associated co-ordination of teams that this involves.  Add to that the impact that differing or buggy implementations might add to the mix and you have a problem far more damaging than the cost of a few lines of code.

An alternative approach, which would eliminate both of these problems, would be to make the cache a through-cache, with the cache itself handling the "if not cached: fetch from source" behaviour. Having eliminated the need to manually add things to the cache, the service interface is simplified, as follows:
MyService
    findItem(itemId : String) : Item
Client systems are then relieved of responsibility for implementing this logic over and over, and need not have knowledge of yet another system.  Re-use is enhanced, while complexity is reduced.  Everyone is happy!  :-)
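
For completeness, a minimal sketch of what MyService might look like as a through-cache. The in-memory ConcurrentHashMap and the SourceSystemClient interface are my own assumptions, standing in for whatever caching technology and source-system client are actually in play:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal read-through sketch (illustrative only): on a miss, the cache itself
// fetches from the source system, so callers never need to know it exists.
interface SourceSystemClient { Item fetchItem(String itemId); }

class MyService {
    private final Map<String, Item> cache = new ConcurrentHashMap<>();
    private final SourceSystemClient sourceSystem;

    MyService(SourceSystemClient sourceSystem) {
        this.sourceSystem = sourceSystem;
    }

    public Item findItem(String itemId) {
        // computeIfAbsent fetches, caches and returns the Item in one step
        return cache.computeIfAbsent(itemId, sourceSystem::fetchItem);
    }
}

The check-then-fetch-then-populate dance now lives in exactly one place, which is rather the point.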