Catching hackers in the act

October 8, 2018

In May, the FBI issued a warning to owners of home routers that their devices might have fallen prey to a malware attack by a group of hackers with ties to the Russian military.

The malware, called VPNFilter, allowed hackers to collect personal information and attack other devices. This attack was notable for its breadth — but it certainly wasn’t unique. An estimated 5.99 billion malware attacks took place in the first half of 2018 alone.

Malware is a favorite tool among cyber criminals, who use it in a broad spectrum of cyberattacks, from large-scale banking trojans that steal money from individual accounts to ransomware attacks that destroy data and have led to IT shutdowns at hospitals around the world. Large-scale theft of intellectual property is often accomplished by sophisticated, targeted malware attacks on organizations, such as the 2016 attack on the Democratic National Committee.

At Los Alamos National Laboratory, where some of the nation’s most precious secrets are kept, information is not only closely guarded; researchers are also developing tools to help others detect and respond quickly to targeted attacks.

Understanding the capabilities and intent of malware — a process known as reverse engineering — is a difficult, manual process that can take days or even weeks for an expert analyst. Los Alamos has long been a leader in manual malware analysis, and has found that expert intuition can be augmented by machine learning tools that rapidly identify patterns across large sets of related malware, collected over time.

The lab’s work is loosely based on a biological analogy of code evolution. Functional software — even malicious software — is difficult and expensive to create. Malware developers, like all software engineers, create their programs iteratively by incorporating existing code and refining existing malware to meet their objectives.

Once malware is detected by cyber defenses, attackers make only small changes to circumvent existing detection mechanisms — similar to the small mutations a biological virus develops to avoid destruction by the human immune system. For cyber defenders, it is critical to track these iterative refinements in malware because it allows them to compare new threats to previously analyzed attacks.

Defenders ask: Is this new malware sample simply a cosmetic change to hide old code, or could the small change be a significant new strategy on the part of the attacker?
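
One simple way to frame that question, offered here as a rough sketch and not as the lab's actual method, is to measure how much raw content two samples share. The Python below compares overlapping byte sequences (n-grams) using Jaccard similarity. The function names, the n-gram length of 4, and the toy byte strings are all illustrative assumptions; real malware is usually packed or obfuscated, which is one reason a plain comparison like this falls short and learned methods are of interest.

```python
def byte_ngrams(data: bytes, n: int = 4) -> set:
    """Return the set of all length-n byte substrings in data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def jaccard_similarity(a: bytes, b: bytes, n: int = 4) -> float:
    """Share of distinct n-grams common to both samples (0.0 to 1.0)."""
    grams_a, grams_b = byte_ngrams(a, n), byte_ngrams(b, n)
    if not grams_a and not grams_b:
        return 1.0  # two empty inputs are treated as identical
    return len(grams_a & grams_b) / len(grams_a | grams_b)


# Toy stand-ins for real binaries (hypothetical data, not real malware).
original = b"connect(host); steal(passwords); send(host, loot);"
variant = b"connect(host); steal(passwords); send(host2, loot);"
unrelated = b"print('hello world'); exit(0);"

print(jaccard_similarity(original, variant))    # high: mostly shared code
print(jaccard_similarity(original, unrelated))  # low: little in common
```

A score near 1.0 suggests a cosmetic change to old code, while a low score against every known sample suggests something new. The score alone cannot say whether a small change is significant, which is where an analyst's judgment still comes in.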

Code writers have a style, or voice, similar to writers who have recognizable ways of arranging their words. So the coder leaves fingerprints on the bits of malware code that remain unchanged, leaving a trail back to the source of the threat. This broad evolutionary analysis of malware, especially with an interest in source attribution, distinguishes Los Alamos research from anti-malware efforts that focus largely on blocking malware rather than studying it.

LANL’s newest research is based on a kind of machine learning called deep learning, which is used to compute the similarities between related malware samples that have been disguised by attackers. LANL takes the same approach used in state-of-the-art language translation systems, such as Google Translate.

In language translation, these novel deep learning methods summarize a sentence or paragraph in a language-agnostic, computerized representation. This representation then becomes the key to decode the sentence or paragraph into other languages. Importantly, these language translation approaches are trained in a statistical manner, requiring only translated pairs of training documents in different languages. Similarly, sets of related malware code, collected over time, are used to learn a “translation” that allows us to track adversaries better than existing anti-virus tools.

Keeping up with innovative adversaries means LANL has to anticipate new types of threats and more sophisticated versions of existing ones. Malware analysis won’t prevent all cyberattacks, though.

The future of cyber security might instead depend on analyzing the behavior of an already-infected machine rather than just screening for malware as it arrives. While biological viruses operate according to their own objectives, computer viruses often facilitate remote control of their host by an attacker.

The real signature of cyber attacks, therefore, is left by the actions of an attacker.

Ongoing research at Los Alamos on advanced user-behavior analysis holds the promise of uncovering these patterns of attacks in real time. Whatever the future holds, cyber attacks will only grow increasingly sophisticated with each passing year. So must our ability to stop them.

Juston Moore is a data scientist and project leader in the Advanced Research in Cyber Systems group at Los Alamos National Laboratory.
