
Open source AI model for cybersecurity
Researchers in the US have used three key cybersecurity databases to train an open source AI model to protect all kinds of electronic systems.
A team from the US Department of Energy’s Pacific Northwest National Laboratory, Purdue University, Carnegie Mellon University and Boise State University developed a natural language AI model that automatically links vulnerabilities to specific lines of attack. This should help defenders spot and prevent attacks more often and more quickly.
The model, VWC-MAP, is open source with a portion now available on GitHub. The team will release the rest of the code soon.
The model automatically links vulnerabilities to the appropriate weaknesses with up to 87 percent accuracy, and links weaknesses to appropriate attack patterns with up to 80 percent accuracy.
One hurdle is the lack of labelled data for training: fewer than 1 percent of vulnerabilities are currently linked to specific attacks, leaving little data to learn from.
So the team fine-tuned pretrained natural language models, using both an auto-encoder (BERT) and a sequence-to-sequence model (T5). The first approach used a language model to associate CVEs with CWEs, and then CWEs with CAPECs, through a binary link prediction approach.
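As an illustrative sketch only (not the team's released code), binary link prediction is typically trained on labelled pairs: each known CVE–CWE link becomes a positive example, and unlinked CWEs are sampled as negatives. The identifiers below are made up for the example:

```python
import random

# Hypothetical ground-truth links; identifiers are illustrative only
cve_to_cwe = {
    "CVE-2021-0001": "CWE-79",   # cross-site scripting
    "CVE-2021-0002": "CWE-89",   # SQL injection
    "CVE-2021-0003": "CWE-79",
}
all_cwes = ["CWE-79", "CWE-89", "CWE-120"]

def make_link_prediction_pairs(links, candidates, neg_per_pos=1, seed=0):
    """Turn known CVE->CWE links into (cve, cwe, label) examples.

    Each true link becomes a positive pair (label 1); for each, we
    sample unlinked CWEs as negatives (label 0). A binary classifier,
    such as a fine-tuned BERT over the two entries' descriptions, is
    then trained to predict whether a given pair is a real link.
    """
    rng = random.Random(seed)
    pairs = []
    for cve, true_cwe in links.items():
        pairs.append((cve, true_cwe, 1))
        negatives = [c for c in candidates if c != true_cwe]
        for cwe in rng.sample(negatives, min(neg_per_pos, len(negatives))):
            pairs.append((cve, cwe, 0))
    return pairs

pairs = make_link_prediction_pairs(cve_to_cwe, all_cwes)
```

Framing the mapping as pair classification is what lets a model trained on the sparse labelled links generalise to the vast majority of CVEs that have no link at all.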
The second approach used sequence-to-sequence techniques to translate CWEs to CAPECs with intuitive prompts for ranking the associations. The approaches generated very similar results, which were then validated by the cybersecurity expert on the team.
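In a text-to-text setup like T5's, each mapping task is phrased as a natural-language prompt and the model generates the answer as text. The prompt format below is an illustrative guess at this style, not the paper's actual wording:

```python
def cwe_to_capec_prompt(cwe_id, cwe_description):
    """Format a CWE entry as a text-to-text prompt, so that a
    sequence-to-sequence model (e.g. a fine-tuned T5) can generate
    the identifier of the matching CAPEC attack pattern as output.
    """
    return (f"map weakness to attack pattern: {cwe_id}: "
            f"{cwe_description.strip()}")

prompt = cwe_to_capec_prompt(
    "CWE-89",
    "Improper Neutralization of Special Elements used in an SQL Command",
)
```

Because the model emits free text, candidate CAPECs can be generated with beam search and ranked by their generation scores, which matches the article's description of ranking the associations.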
“Cyber defenders are inundated with information and lines of code. What they need is interpretation and support for prioritization. Where are we vulnerable? What actions can we take?” said Mahantesh Halappanavar, a chief computer scientist at PNNL who led the overall effort.
“If you are a cyber defender, you may be dealing with hundreds of vulnerabilities a day. You need to know how those could be exploited and what you need to do to mitigate those threats. That’s the crucial missing piece,” added Halappanavar. “You want to know the implications of a bug, how that might be exploited, and how to stop that threat.”
The new AI model uses natural language processing and supervised learning to bridge information in three separate cybersecurity databases.
Over 200,000 “common vulnerabilities and exposures”, or CVEs, are listed in the National Vulnerability Database maintained by NIST’s Information Technology Laboratory.
A slimmer set of definitions classifies the vulnerabilities into categories based on what could happen if they were exploited. There are about 1,000 “common weakness enumerations”, or CWEs, listed in the Common Weakness Enumeration database maintained by MITRE.
The ways vulnerabilities are exploited in practice, known as attack patterns, are catalogued in the Common Attack Pattern Enumeration and Classification resource, or CAPEC, also maintained by MITRE.
While all three databases have information crucial for cyber defenders, there have been few attempts to knit all three together so that a user can quickly detect and understand possible threats and their origins, and then weaken or prevent these threats and attacks.
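Once the three databases are linked, answering a defender's question becomes a chained lookup from vulnerability to weakness to attack pattern. The mappings below are hand-made stand-ins for what VWC-MAP learns to predict; CWE-89 and CAPEC-66 are the real identifiers for SQL injection:

```python
# Illustrative mappings standing in for model predictions; real
# entries come from NVD (CVEs) and MITRE's CWE and CAPEC catalogues.
cve_to_cwes = {"CVE-2021-0002": ["CWE-89"]}     # hypothetical CVE
cwe_to_capecs = {"CWE-89": ["CAPEC-66"]}        # CAPEC-66: SQL Injection

def attack_patterns_for_cve(cve_id):
    """Follow CVE -> CWE -> CAPEC links to surface the attack
    patterns a defender should prioritise for a given vulnerability."""
    patterns = []
    for cwe in cve_to_cwes.get(cve_id, []):
        patterns.extend(cwe_to_capecs.get(cwe, []))
    return patterns
```

The chain is also what gives the approach its leverage: one prediction at the CWE level covers every CVE that maps to that weakness.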
“If we can classify the vulnerabilities into general categories, and we know exactly how an attack might proceed, we could neutralize threats much more efficiently,” said Halappanavar. “The higher you go in classifying the bugs, the more threats you can stop with one action. An ideal goal is to prevent all possible exploitations.”
“We’re putting this out there for others to test, to go through the vulnerabilities and make sure the model bins them appropriately,” he said.
