Open source tool tackles untrustworthy AI
The US government has commissioned an open source tool to assess how safe AI frameworks are.
The Dioptra tool tests the effects of adversarial attacks on machine learning models, helping developers and customers tackle untrustworthy AI and see how well a framework stands up to a variety of attacks.
One of the vulnerabilities of an AI system is the model at its core. A model learns to make decisions by being exposed to large amounts of training data. But if adversaries poison the training data with inaccuracies, for example data that causes the model to misidentify stop signs as speed limit signs, the model can make incorrect, potentially disastrous decisions.
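To make the threat concrete, here is a minimal, hypothetical sketch of label-flipping data poisoning using scikit-learn's digits dataset; the dataset, model, and poisoning rates are arbitrary choices for illustration and are not part of Dioptra itself.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical toy experiment: flip a fraction of the training labels
# ("poison" the data) and observe the effect on held-out accuracy.
X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
for poison_rate in (0.0, 0.1, 0.3):
    y_poisoned = y_tr.copy()
    n_flip = int(poison_rate * len(y_tr))
    idx = rng.choice(len(y_tr), size=n_flip, replace=False)
    # Replace each selected label with a random incorrect digit class
    y_poisoned[idx] = (y_poisoned[idx] + rng.integers(1, 10, size=n_flip)) % 10
    clf = LogisticRegression(max_iter=5000).fit(X_tr, y_poisoned)
    print(f"poison rate {poison_rate:.0%}: test accuracy {clf.score(X_te, y_te):.3f}")
```

In general, held-out accuracy degrades as the flip rate grows, which is the kind of effect a tool such as Dioptra is intended to surface and quantify.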
The open-source Dioptra software is available for free download from GitHub to help businesses conduct evaluations to assess AI developers' claims about system performance and so tackle untrustworthy AI.
Dioptra provides a REST API for designing, managing, executing, and tracking experiments; it can be controlled via an intuitive web interface, a Python client, or any REST client library of the user's choice.
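As a purely illustrative sketch of driving such an API from Python with the generic requests library: the base URL, endpoint paths, and payload fields below are assumptions made for the example and may not match the actual Dioptra routes, which are documented in the project's repository.

```python
import requests

# Illustrative only: the base URL, endpoint paths, and payload fields below
# are assumptions for this sketch, not the documented Dioptra API.
BASE_URL = "http://localhost:5000/api"

# Register an experiment
resp = requests.post(f"{BASE_URL}/experiment", json={"name": "fgsm-robustness-test"})
resp.raise_for_status()

# Submit a job that runs an attack entry point against a registered model
job = requests.post(
    f"{BASE_URL}/job",
    json={
        "experimentName": "fgsm-robustness-test",
        "entryPoint": "fgsm_attack",
        "entryPointKwargs": "-P eps=0.1",
    },
).json()

# Poll the job until it finishes, then inspect the recorded metrics
status = requests.get(f"{BASE_URL}/job/{job['jobId']}").json()
print(status)
```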
- NIST warns on untrustworthy AI
- When AI enters the maze of cybersecurity
- ETSI to work on making artificial intelligence secure
The US National Institute of Standards and Technology (NIST) commissioned Dioptra to allow a user to determine what sorts of attacks would make a model perform less effectively and to quantify the resulting performance reduction, so that the user can learn how often and under what circumstances the system would fail. This is key for the use of AI in safety-critical systems.
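As a rough standalone illustration of that kind of quantification (not Dioptra itself), the sketch below measures how the test accuracy of a simple scikit-learn classifier falls under a fast-gradient-sign-style perturbation; the dataset, linear model, and epsilon values are assumptions chosen for the example.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Train a plain linear classifier on a standard binary dataset.
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
scaler = StandardScaler().fit(X_tr)
X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
clean_acc = clf.score(X_te, y_te)

# For a logistic model, the gradient of the loss w.r.t. the input is
# (sigmoid(w.x + b) - y) * w, so a fast-gradient-sign attack perturbs
# each test sample by eps * sign(gradient).
w, b = clf.coef_[0], clf.intercept_[0]
p = 1.0 / (1.0 + np.exp(-(X_te @ w + b)))
grad = (p - y_te)[:, None] * w[None, :]

for eps in (0.05, 0.1, 0.2):
    adv_acc = clf.score(X_te + eps * np.sign(grad), y_te)
    print(f"eps={eps:.2f}  clean={clean_acc:.3f}  adversarial={adv_acc:.3f}  "
          f"drop={clean_acc - adv_acc:.3f}")
```

This clean-versus-adversarial comparison, swept over attack types and strengths, is the kind of measurement a tool like Dioptra automates and tracks as experiments.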
NIST has also developed a profile to identify the unique risks posed by generative AI, another type of untrustworthy AI. The profile proposes actions for generative AI risk management that organizations can align with their own goals and priorities, based on a list of 12 risks and just over 200 actions that developers can take to manage them.
The 12 risks include a lowered barrier to entry for cybersecurity attacks, the production of mis- and disinformation or hate speech and other harmful content, and generative AI systems confabulating or "hallucinating" output. After describing each risk, the document presents a matrix of actions that developers can take to mitigate them, mapped to NIST's AI Risk Management Framework (AI RMF).
Further guidance, Secure Software Development Practices for Generative AI and Dual-Use Foundation Models, is designed to be used alongside the Secure Software Development Framework (SSDF). The SSDF is broadly concerned with software coding practices, and the companion resource expands it in part to address a major concern with generative AI systems: they can be compromised with malicious training data that adversely affects the system's performance.
In addition to covering aspects of the training and use of AI systems, this guidance document identifies potential risk factors and strategies to address them. Among other recommendations, it suggests analyzing training data for signs of poisoning, bias, homogeneity and tampering.
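As a minimal, hypothetical example of such checks (not a method prescribed by the NIST guidance), the snippet below inspects a training set for a skewed label distribution, exact duplicate samples, and file-level tampering via a content hash.

```python
import hashlib
from collections import Counter

import numpy as np

def basic_training_data_checks(X, y, data_path=None):
    """Illustrative checks for label skew, duplicate samples, and tampering."""
    # Label distribution: a heavily skewed class balance can point to bias
    # or an overly homogeneous dataset.
    counts = Counter(np.asarray(y).tolist())
    total = sum(counts.values())
    print("label distribution:", {k: f"{v / total:.1%}" for k, v in sorted(counts.items())})

    # Exact duplicate rows reduce diversity and can inflate apparent performance.
    n_dupes = len(X) - len(np.unique(np.asarray(X), axis=0))
    print("exact duplicate samples:", n_dupes)

    # A content hash recorded when the data was collected can later reveal tampering.
    if data_path is not None:
        with open(data_path, "rb") as f:
            print("sha256 of data file:", hashlib.sha256(f.read()).hexdigest())
```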