Resuming Business Operations After a Virus Infection
When dealing with viruses (or any variety of malware), the adage "Prevention is the best medicine" rings true. Still, infections are almost inevitable, and even a minor instance can be debilitating. Organizations that rely on 24/7 availability of their IT resources need a documented incident-response (IR) plan that coordinates their resources to fight infections and restore services with minimal downtime, while simultaneously ensuring data integrity.
An overall incident-response plan requires public relations and internal communications departments to relay information at the appropriate times to employees, business partners, customers, and the public. This article focuses on the benchmarks that let you know when the infecting virus or Trojan is neutralized and systems can be restored to production safely.
Immediate Response
At some point in the course of operating a network on the Internet, your organization's infrastructure very likely will become infected. As soon as an infection is identified, the incident-response team must swing into action, quarantining affected systems or files, and then remediating any problems caused by the infection. The following sections discuss these processes.
Identification
Your IT department may learn of the infection through an alert from a security monitoring tool, such as a desktop firewall or antivirus software. The product will indicate that it has encountered a file, email message, connection, connection attempt, or other activity that is suspicious and bears closer scrutiny.
Identification of the infection also may come from Help Desk calls complaining of unexpected behavior on desktop or laptop PCs. It usually takes more than one call before people realize what's happening; unfortunately, this lag time gives the infection time to propagate through the network, compromise hosts, steal or corrupt data, and build its footprint.
Without an incident-response plan in place, confusion often erupts over what exactly is happening and what must be done when an organization's security measures have been penetrated by a virus. Organizations with an IR plan, on the other hand, can marshal resources and manpower more quickly to fight the infection. An IR plan tells everyone which steps need to be taken and in what order. It's important that such plans document when systems can be reconnected and business operations resumed.
Quarantine
After the infection has been verified, the next step is to quarantine all infected hosts, limiting the ability of the infection to spread, transmit data outbound, or receive instructions from command-and-control servers managing a potential bot network. Depending on how far and wide the infection has spread, the quarantine may need to include hosts, subnets, or entire domains. It's possible that restricting the flow of the virus may require disconnecting the entire network from the Internet, at least temporarily. The infected organization must avoid two risks:
- The attacker may alter, destroy, or transmit the organization's data off-network.
- The infected network may serve as a jumping-off point for the infection to affect other networks.
Simultaneously, the staff must log all witnessed behavior and symptoms of the infection. Post this info on the walls of the incident command room to ensure that all known information is shared among staff investigating and fighting the infection. Given the speed of virus propagation and the fact that viruses can lie dormant for varying periods of time, the data-gathering process must continue while you're quarantining hosts and disinfecting machines.
With the infection successfully quarantined, it's time to remediate.
Remediation
Remediation is the process of cleaning the infected hosts or mitigating all witnessed behavior and symptoms of the infection. A host is deemed "clean" if any of the following statements is true:
- A means of deleting the infecting virus, Trojan, or other malware is known, and reinfection has been prevented (for example, if you're using a virus-removal tool or production virus signature from an antivirus/security vendor).
- The host can be compared to a clean image, and data integrity is maintained.
- The host can be rebuilt from a clean image.
These options may not always be available in the midst of an infection, especially in the case of zero-day attacks or infections from "unpopular" viruses against which signatures are not actively being developed.
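As a minimal sketch of the "compare to a clean image" option, the Python below hashes every file under a directory and reports differences between two such manifests. It assumes the clean image can be mounted or extracted to a directory; the function names (sha256_of, build_manifest, diff_manifests) are illustrative and do not come from any particular product.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 64 KiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def diff_manifests(clean: dict[str, str], host: dict[str, str]) -> dict[str, list[str]]:
    """Report files added, removed, or modified relative to the clean image."""
    return {
        "added": sorted(set(host) - set(clean)),
        "removed": sorted(set(clean) - set(host)),
        "modified": sorted(k for k in clean.keys() & host.keys() if clean[k] != host[k]),
    }
```

Files that legitimately change on a working host (logs, user documents) will also appear in the result, so the output is a list of candidates for review rather than a verdict.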
Your IT team may want to stay offline until every infected host has been thoroughly cleaned or rebuilt, even if this process takes a week or longer. However, in many organizations such a loss of computing resources may simply be intolerable. On the other hand, businesses certainly shouldn't operate with an active virus on the network. You need a "middle ground" that allows for resuming operations while disinfection activities are underway. The following section describes such a system.
Mitigation and Countermeasures
With mitigation, security countermeasures are in place to block or deny each witnessed action and behavior of the infection; this is where the running list of the infection's activity is taken into consideration. For example, let's assume that the following is a listing of witnessed actions and behaviors of the infection:
- Changes to normal operations:
  - Windows Explorer breaks
  - Unexpected reboots
  - Regular and frequent reboots
  - Machines shut down upon initial infection
  - Changes to log settings
- System configuration changes:
  - Windows Registry settings edited
  - New user accounts created
- Software installed on host:
  - Keystroke logger
  - Spyware
- Files created on infected hosts:
  - "Password" text file seen on infected hosts
  - Filenames corresponding to the name of the virus
- Unusual communication attempts:
  - Attempted FTP connections from infected machines to multiple unrecognized addresses
  - SMB connections to and between IPC$ shares originated by infected hosts to potentially clean hosts (potential means of virus propagation)
Some of this behavior can be observed by IT personnel, gleaned from user reports to the Help Desk, researched online, or reported by AV vendors or industry and government security alerts. Other evidence requires research into the virus itself. If files are being written to infected hosts, for example, it's helpful to track the path in the directory structure where the files are written, as well as the naming convention used for the directories and files, so this information can be used to search other hosts for signs of the same infection.
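Once the naming convention of the dropped files is known, searching another host for the same signs can be sketched in a few lines of Python. The patterns below are hypothetical placeholders ("examplevirus" stands in for whatever name the real infection uses), and find_suspect_files is an illustrative helper, not part of any antivirus tool.

```python
from fnmatch import fnmatch
from pathlib import Path

# Hypothetical indicators; replace with the filenames actually witnessed.
SUSPECT_PATTERNS = ["password*.txt", "examplevirus*"]


def find_suspect_files(root: Path, patterns: list[str]) -> list[Path]:
    """Return files under root whose name matches any pattern, ignoring case."""
    hits = []
    for path in root.rglob("*"):
        name = path.name.lower()
        if path.is_file() and any(fnmatch(name, pat.lower()) for pat in patterns):
            hits.append(path)
    return sorted(hits)
```

A call such as find_suspect_files(Path("C:/Users"), SUSPECT_PATTERNS) would list candidate files on one host; a filename match alone is only a lead to investigate, since legitimate files can share a name with an indicator.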
Once we know what the virus does, we can design, test, deploy, and verify security measures to block these symptoms. For example, the following measures can address the symptoms from the preceding list:
- Configure and run antivirus software with the latest signatures and rules to block the creation, writing, and execution of all suspected bad files.
- Run scripts in loops to delete all virus-related files and kill its processes and services.
- Prevent repopulation of infected files after deletion.
- Restore Windows Registry settings to correct/default settings.
- Verify normal operations:
  - Windows Explorer operating normally
  - No unrecognized rebooting
- Delete unauthorized accounts
- Prevent further outbound FTP connection attempts to known "bad" IP addresses
- Change system and domain administrator passwords
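One way to keep the running list honest is to record, for each witnessed behavior, which countermeasures are meant to block it, and then flag any behavior that has none. The sketch below is illustrative only: the mapping paraphrases the two lists above, and the SMB entry is left empty purely to show what an uncovered behavior looks like.

```python
# Each witnessed behavior maps to the countermeasures intended to block it.
# The entries are illustrative and paraphrase the lists in this article.
COUNTERMEASURES: dict[str, list[str]] = {
    "suspect files created": ["antivirus signatures and rules", "scripted deletion"],
    "registry settings edited": ["restore registry defaults"],
    "new user accounts created": ["delete unauthorized accounts", "change admin passwords"],
    "outbound FTP attempts": ["block known bad IP addresses"],
    "SMB connections to IPC$ shares": [],  # deliberately empty for illustration
}


def unmitigated(plan: dict[str, list[str]]) -> list[str]:
    """Return the behaviors that have no countermeasure assigned yet."""
    return sorted(behavior for behavior, measures in plan.items() if not measures)
```

Any behavior that unmitigated(COUNTERMEASURES) returns still needs a countermeasure designed, tested, and deployed before services are restored.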
Testing and Adjusting Countermeasures
To ensure that countermeasures are working, network connectivity and services can be restored in a deliberate, phased approach, with testing taking place at each stage to ensure that the infection is held in check. For example, we might allow a server to operate disconnected from the network for a period of time, such as three hours or half of a working day, and monitor its behavior. If we see normal behavior (for instance, no unexpected FTP requests), we can take the further step of restoring internal connections, such as connecting a mitigated LAN or host to the server, again monitoring operations for evidence of virus activity. All business rules should remain implemented and functioning properly.
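The phased approach can be thought of as a gate at each stage: move on only if monitoring at the current stage shows nothing suspicious. The Python below is a sketch under that assumption. The stage names are illustrative, and observe is a hypothetical callback standing in for whatever monitoring the team actually performs during the observation window.

```python
from typing import Callable

# Illustrative stages, ordered from most to least restricted.
STAGES = ["isolated", "internal connections restored", "full services restored"]


def restore_in_phases(
    stages: list[str],
    observe: Callable[[str], list[str]],
) -> tuple[list[str], list[str]]:
    """Advance stage by stage, stopping at the first stage showing virus activity.

    observe(stage) returns the indicators seen while monitoring that stage.
    Returns (stages that passed, indicators seen at the stage that failed).
    """
    passed: list[str] = []
    for stage in stages:
        indicators = observe(stage)
        if indicators:
            return passed, indicators
        passed.append(stage)
    return passed, []
```

If the second element of the result is non-empty, the restoration halts at that stage and the listed indicators feed back into the running list of witnessed behaviors.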
Throughout this time, we look for signs telling us that the virus is active, or is being blocked. We also work with users to ensure data integrity.
During these tests, if new or additional hazardous behaviors are witnessed, additional countermeasures must be developed, tested, and implemented before fully restoring services. For example, if infected files reappear in the original directory and/or the Windows\ directory on one or more hosts after they have been deleted, that fact suggests that the virus isn't fully contained and the services are not ready to be restored.
There are many ways in which such re-creation of once-deleted files may happen:
- The virus/Trojan source code remains resident on the host even after deletion of the discovered infected files, such as under alternate filenames or within a zipped file, and the infected files are rewritten after deletion.
- The files may be written during system startup, prior to the execution of antivirus software.
- Files may be reintroduced to the system through an infected USB data key.
- The virus may be able to write files to the server while acting as an account with greater privileges than those of the antivirus application (or the agent deleting the files).
These situations can be addressed in part by adjusting and adding the following countermeasures:
- Continuous deletion of all infected files through an automated script.
- Implementing antivirus rules that block execution of all infection-related executables.
- Restoring the Windows Registry settings to their correct/default values and checking the settings to ensure that they don't revert to their prior settings.
- Temporarily disabling USB ports on hosts.
- Changing the password for all system, domain, admin, and privileged user accounts. This practice relies on the virus being unable to recapture these passwords. As long as virus execution is blocked, the virus shouldn't be able to recapture the passwords.
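A single pass of the automated deletion script can double as a test of whether files are being re-created. In the sketch below, sweep is an illustrative helper, and the list of paths is assumed to come from the indicators already gathered; a scheduler or loop would call it repeatedly.

```python
from pathlib import Path


def sweep(paths: list[Path]) -> list[Path]:
    """Delete each listed file that exists; return the files that were deleted."""
    removed = []
    for path in paths:
        if path.is_file():
            path.unlink()
            removed.append(path)
    return removed
```

After the first pass, a non-empty result means that something re-created a file between passes, which suggests the infection is not yet contained.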
If no new or additional behavior is witnessed, testing of the countermeasures can be expanded to additional hosts and services. If the countermeasures continue to hold against the infection, management can make the tough decision to bring all systems back online.
Virus Removal
Restoring services doesn't complete the incident-response effort. The task of removing the virus remains, and it must be completed by one of the three processes identified in the "Remediation" section. From a strict security perspective, it's preferable to restore services only when you're fully confident that the infection is cleared out. However, security is often only one of the considerations that CIOs must juggle; productivity and profit concerns play important roles as well.
This compromise (reconnecting services and resuming operations once the known behavior of the infection is neutralized, but before the virus is truly cleaned out) carries risk. Organizations must be willing to accept this risk to get back online prior to fully cleaning the network.
One additional note: The decision to bring services back online, or even how you fight infections, may not be yours alone. Organizations that have connections with business partners over public or private networks may need to make these decisions in concert with their partners. At the very least, security countermeasures used to mitigate an infection must be shared with all partners prior to reconnecting networks and services, and those partners should be prepared to implement similar measures.
On the Internet, we really are all in this together.