IT incidents are the result of service failures or unplanned interruptions to service, a reduction in the quality of a service, or an event that has not yet impacted the service to the customer. Incident Management is often referred to as the way that the Service Desk puts out the ‘daily fires’.
The challenges are significant: a recent global study by the SANS Institute showed that 61% of businesses reported experiencing at least one critical incident involving a data breach, unauthorized access, denial of service or malware infection over the past two years. The largest percentage of respondents (48%) experienced up to 25 incidents.
Effective incident management requires a framework for proactive planning, testing and activation of incident scenarios, and systems that ensure the right people have the right information to respond in the right manner when incidents invariably occur.
ITIL® Incident Management Lifecycle
Incidents are a daily occurrence in most businesses, and rather than try to eliminate them altogether, most organisations work to resolve these on a staged basis.
The “Incident Management Lifecycle”, derived from ITIL® (Figure 1.1, below), is a good example of a standard framework to follow when your business faces an IT incident.
ITIL® Provides a Series of Recommended Steps for Management of IT Incidents
Major incident management steps
Identify and log
A service manager recognizes a major incident and logs it in the system.
Categorize and prioritize
The type of incident is identified, and the incident is prioritised by potential impact.
The service manager steps outside the regular incident process and alerts the major incident managers on duty.
The major incident manager who accepts the case determines whether the alert is a false alarm and what the incident is. Depending on severity, the incident might be further escalated, or immediately resolved if the current team has the capability to implement a fix.
The final stage is to review the actions that took place during the incident to identify improvements that can prevent a similar incident from occurring again.
Incident Communication Challenges
Incidents are categorized by severity (e.g. High, Medium, Low) based on impact and urgency, and resolved accordingly.
Low-impact incidents can be deprioritized and dealt with when resources become available, while high-impact events need to be treated with appropriate urgency. The real issue with this approach is the overwhelming deluge of notifications flooding the inboxes of IT staff, which in larger organisations can exceed 100,000 per day.
These notifications range from basic updates and progress reports to maintenance updates and outage warnings, making it entirely possible to miss the major incident warnings that do need urgent response.
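One way to keep major incident warnings from being buried is to derive a single priority from impact and urgency and surface only the notifications that cross a threshold. The sketch below is illustrative only: the three-level scale mirrors the High/Medium/Low example above, while the `Notification` class, the scoring rule and the sample messages are assumptions, not part of ITIL® or any particular product.

```python
from dataclasses import dataclass

# Hypothetical three-level scale, mirroring the High/Medium/Low example.
LEVELS = {"low": 1, "medium": 2, "high": 3}


@dataclass
class Notification:
    summary: str
    impact: str   # "low", "medium" or "high"
    urgency: str  # "low", "medium" or "high"


def priority(n: Notification) -> str:
    """Combine impact and urgency into one priority label.

    The rule used here (sum of the two levels) is an assumption made
    for illustration; organisations define their own matrix.
    """
    score = LEVELS[n.impact] + LEVELS[n.urgency]  # ranges from 2 to 6
    if score >= 5:
        return "high"
    if score == 4:
        return "medium"
    return "low"


def urgent_only(notifications):
    """Keep only the notifications that need an immediate response."""
    return [n for n in notifications if priority(n) == "high"]


# Sample inbox (made-up messages).
inbox = [
    Notification("Scheduled maintenance reminder", "low", "low"),
    Notification("Payment gateway outage", "high", "high"),
    Notification("Disk usage at 70%", "medium", "low"),
]

for n in urgent_only(inbox):
    print(n.summary)
```

In practice the threshold and the matrix would be agreed with the business, so that routine updates are still recorded but do not compete for attention with outage warnings.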
The other major challenge is identifying the right person and bringing together everyone on the team who needs to manage the response.
According to the SANS Institute, more than one-quarter of Incident Response (IR) professionals (26%) are dissatisfied with their current organization’s IR capabilities, calling them ineffective, while only 9% categorize their processes as very effective.
68% of respondents to the SANS Survey projected that improvements in their IR capabilities and processes would come from Automation and SIEM integration tools that increase visibility into threats and how they apply to their environment, including scoping and remediation capabilities. State-of-the-art incident management systems built around a Unified Communications platform, combined with proactive scenario planning, provide the tools needed to ensure rapid incident response and resolution.
Resilient Communications - Best Practices
Having a clearly defined plan in place for communicating in different scenarios cuts down response time, improves accuracy of contact, and ensures the right people are able to be reached in a timely manner.
Multi-Channel Message Streams
Message templates should be prepared with specifics which can be rapidly altered during incidents, thereby saving time by providing pre-defined communication and response options.
Where possible, communications platforms should be integrated with monitoring systems, allowing details to be auto-populated into message templates. Tickets can be raised automatically and sent directly to the resolution team members.
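As a rough illustration of auto-populating a prepared template from monitoring data, the sketch below uses Python's standard `string.Template`. The template wording, the event fields (`severity`, `service`, `started_at`, `team`) and the ticket identifier are hypothetical; a real monitoring system will supply its own field names.

```python
from string import Template

# Hypothetical pre-approved template; placeholders are filled at send time.
OUTAGE_TEMPLATE = Template(
    "[$severity] $service is degraded since $started_at. "
    "Ticket $ticket_id has been assigned to $team."
)


def render_alert(event: dict, ticket_id: str) -> str:
    """Fill the template from a monitoring event (a plain dict here)."""
    return OUTAGE_TEMPLATE.substitute(
        severity=event["severity"].upper(),
        service=event["service"],
        started_at=event["started_at"],
        ticket_id=ticket_id,
        team=event["team"],
    )


# Example event as it might arrive from a monitoring integration (made up).
event = {
    "severity": "high",
    "service": "email-gateway",
    "started_at": "09:14 UTC",
    "team": "Messaging Ops",
}

message = render_alert(event, ticket_id="INC-0001")
print(message)
```

Because `substitute` raises an error when a placeholder has no value, an incomplete event is caught before a half-filled message reaches the resolution team.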
Two-way Conversation Flow
It’s not enough to just send messages. There needs to be a system in place to track receipt, allow the receiver to respond as needed, and escalate when required.
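A minimal sketch of that send, track and escalate loop is shown below. It assumes an ordered escalation chain and a fixed acknowledgement timeout; the `Alert` class, the function names and the five-minute default are all hypothetical, and timestamps are passed in explicitly so the behaviour is easy to follow.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Alert:
    message: str
    escalation_chain: list            # ordered contacts, first responder first
    ack_timeout_s: int = 300          # hypothetical default: 5 minutes per level
    level: int = 0                    # index of the contact currently notified
    sent_at: float = 0.0              # when the current contact was notified
    acknowledged_by: Optional[str] = None
    history: list = field(default_factory=list)


def notify(alert: Alert, now: float) -> str:
    """Record that the current contact in the chain has been notified."""
    contact = alert.escalation_chain[alert.level]
    alert.sent_at = now
    alert.history.append(("sent", contact, now))
    return contact


def acknowledge(alert: Alert, contact: str, now: float) -> None:
    """Record a response from the receiver, which stops further escalation."""
    alert.acknowledged_by = contact
    alert.history.append(("ack", contact, now))


def check(alert: Alert, now: float) -> Optional[str]:
    """Escalate if the current contact has not acknowledged in time.

    Returns the newly notified contact, or None if nothing changed.
    """
    if alert.acknowledged_by is not None:
        return None
    if now - alert.sent_at < alert.ack_timeout_s:
        return None
    if alert.level + 1 >= len(alert.escalation_chain):
        return None  # chain exhausted; nobody left to escalate to
    alert.level += 1
    return notify(alert, now)


# Simulated timeline in seconds (contacts are made up).
alert = Alert("Database cluster unreachable",
              ["on-call engineer", "team lead", "major incident manager"])
notify(alert, now=0)
check(alert, now=120)                # still within the timeout, no change
print(check(alert, now=300))         # no acknowledgement, so escalate
acknowledge(alert, "team lead", now=360)
print(check(alert, now=900))         # acknowledged, so no further escalation
```

The `history` list doubles as an audit trail for the post-incident review, since it records who was contacted, when, and who responded.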
The best defence is a good offence. Incidents affecting business operations are a daily occurrence, and without proper management and communications, incidents can escalate into critical events that could put an organization’s survival at risk.