Even the best service support system can sometimes behave like a whack-a-mole arcade game with issues popping up like moles from their holes. Whack one mole down with the big mallet and more appear. Solve one problem and two new problems pop up. Solve the new problems and, well, you get the picture. To win the whack-a-mole support game you need the best mallet you can find—I recommend the ITIL best practices framework.
The way the ITIL best practices mallet is swung has been changing. For a long time, incident management was the first ITIL practice implemented at a site, often in parallel with problem management. Recently, I’m seeing that many sites opt to establish a change management practice before problem management.
Starting with incidents aligns with many maturity models, where the first stage is "reactive" and "proactive" comes later. Incident management is reactive. After an interruption or degradation of the service, the incident manager reacts to the incident and acts to return to business as usual. Problem management is proactive. Problems are identified and fixed in order to prevent future incidents.
Still, it’s surprising that sites are implementing incident management, bypassing problem management, and moving straight to change management. Incident and problem management are usually expected to be implemented together, so the two are seldom discussed separately. Service desk vendors usually sell incident and problem management as part of a single package.
Prior to the rise of ITIL, most sites combined elements of both incident and problem management in a single process. Service desk managers would seek out and fix root causes when they had the time and resources. Consequently, most non-ITIL (or perhaps pre-ITIL) service desks have some problem management skills.
If ITIL problem management is not being implemented, is ITIL problem management broken?
I don’t think so. There are practical reasons for deferring—but not skipping—implementation of problem management practice, especially if a site can jump on change management.
It’s all about ROI. Incident management has a high and quick ROI because it is relatively easy to implement and touches on all support issues. In my experience, the primary benefit of incident management comes from the razor-sharp focus it places on impact analysis and service restoration.
A service desk responding to an incident faces two questions: 1) How important is this? 2) How do I fix it? But service desks tend to make mistakes if they combine the two questions. The importance of the incident must determine the urgency and resources that are applied to the resolution.
Frequently, technicians confuse the technical and business significance of an issue. They might assign a high priority to an incident that affects a big piece of equipment but fail to gauge the immediate consequences for the business. In other words, they are led astray by thinking about what has to be fixed before they have decided on its relative importance.
For example, the failure of a large backup storage unit is important, but the odds are good that it can be down for a day or two without any business consequences. At the same time, a minor issue with major business impact may not receive the attention it deserves. A routine printer problem that is holding up publication of an audit, for instance, probably should take precedence over the storage unit. If the team starts on the big fix to the storage unit instead of the big business problem, then after hours of effort the service desk team gets whacked by an angry executive demanding his or her audit report.
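To make the storage-unit-versus-printer example concrete, here is a minimal sketch of a triage queue that answers "How important is this?" before "How do I fix it?". Everything in it is an assumption made for illustration: the `Incident` fields, the 1-to-5 scales, and the `triage` function are hypothetical and do not come from ITIL or from any service desk product.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    summary: str
    technical_size: int      # hypothetical 1-5 scale: how big the fix is
    business_impact: int     # hypothetical 1-5 scale: how much the business is hurt
    hours_until_harm: float  # rough estimate of time before the business feels it


def triage(incidents):
    """Order incidents by business importance, not by the size of the fix.

    Highest business impact comes first; ties go to the incident that will
    hurt soonest. technical_size is deliberately ignored when ranking.
    """
    return sorted(
        incidents,
        key=lambda i: (-i.business_impact, i.hours_until_harm),
    )


incidents = [
    Incident("Backup storage unit failed",
             technical_size=5, business_impact=2, hours_until_harm=48),
    Incident("Printer holding up audit report",
             technical_size=1, business_impact=5, hours_until_harm=2),
]

queue = triage(incidents)
for position, incident in enumerate(queue, start=1):
    print(position, incident.summary)
```

In this sketch the printer incident lands at the head of the queue even though the storage unit is the larger technical job, which is the ordering the example argues for.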
Good incident management is the mallet that hits the right mole. Resolve first the incidents that affect your business most. That is the first step in winning the whack-a-mole support game.