End-to-end machine learning tools such as AI Studio from Blaize are aimed at removing the need for data scientists in developing and deploying edge AI applications, especially in camera systems. But those data scientists have also highlighted that AI frameworks are vulnerable to bias in the data that is used for training.
New tools are emerging this week to analyse and even correct bias in such frameworks, which is a key step for edge AI system developers.
Amazon SageMaker Clarify is a new tool released this week that helps customers detect bias in machine learning (ML) models and increase transparency by helping explain the behaviour of the model. These models are built by training algorithms that learn statistical patterns present in datasets, but questions remain over how a model arrives at a prediction and how to detect anomalies.
With AI Studio, launched yesterday, Blaize has tried to address some of this with transparent steps and open frameworks such as ONNX and OpenVX.
Even with the best of intentions, bias issues may exist in datasets and be introduced into models with business, ethical, and regulatory consequences, says Julien Simon, Artificial Intelligence & Machine Learning Evangelist for EMEA at Amazon Web Services (AWS). This means it is important for model administrators to be aware of potential sources of bias in production systems.
For simple and well-understood algorithms like linear regression or tree-based algorithms, it’s reasonably easy to crack the model open, inspect the parameters that it learned during training, and figure out which features it predominantly uses, he says.
However, as models become more and more complex with deep learning, this kind of analysis becomes impossible. Many companies and organizations may need ML models to be explainable before they can be used in production. In addition, some regulations may require explainability when ML models are used as part of consequential decision making. Closing the loop, explainability can also help detect bias.
SageMaker Clarify is integrated with SageMaker Studio, a web-based integrated development environment for ML, as well as with other SageMaker capabilities such as Data Wrangler, Experiments and Model Monitor. This allows data scientists to detect bias in datasets before training and in models after training, and to measure that bias using a variety of statistical metrics. The tool also helps to explain how feature values contribute to the predicted outcome, both for the model overall and for individual predictions, and it can detect bias drift over time through monitoring.
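As a rough illustration of what measuring bias in a dataset before training can mean, the sketch below computes two common pre-training statistics: class imbalance between two groups, and the difference in the proportion of positive labels each group receives. This is a minimal sketch in plain Python, not the SageMaker Clarify API; the function names and the toy data are assumptions made for illustration.

```python
# Illustrative sketch only: hypothetical helper names, toy data,
# and no connection to any vendor's actual implementation.

def class_imbalance(groups, advantaged):
    """(n_a - n_d) / (n_a + n_d): 0 means both groups are equally represented."""
    n_a = sum(1 for g in groups if g == advantaged)
    n_d = len(groups) - n_a
    if n_a + n_d == 0:
        raise ValueError("dataset is empty")
    return (n_a - n_d) / (n_a + n_d)


def positive_proportion_gap(groups, labels, advantaged):
    """Share of positive labels in the advantaged group minus the share in the rest."""
    labels_a = [y for g, y in zip(groups, labels) if g == advantaged]
    labels_d = [y for g, y in zip(groups, labels) if g != advantaged]
    if not labels_a or not labels_d:
        raise ValueError("both groups need at least one record")
    return sum(labels_a) / len(labels_a) - sum(labels_d) / len(labels_d)


# Toy dataset: six records from group "a", two from group "b"; label 1 is positive.
groups = ["a"] * 6 + ["b"] * 2
labels = [1, 1, 1, 1, 0, 0, 1, 0]

ci = class_imbalance(groups, advantaged="a")
gap = positive_proportion_gap(groups, labels, advantaged="a")
print(f"class imbalance: {ci:.3f}")
print(f"positive proportion gap: {gap:.3f}")
```

Values near zero on both statistics suggest the groups are similarly represented and similarly labelled; larger magnitudes flag a dataset worth a closer look before training.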
A tool developed by Synthesized in London aims to correct the problem. The tool is based on synthetic data that allows companies to rebalance biased datasets. Rather than working on image recognition, Synthesized's technology analyses the personally identifiable information (PII) in a dataset and makes it unbiased by creating data for demographics that are missing.
The key is to integrate these bias monitoring and mitigation tools into the edge AI workflow so that developers can use them rather than data scientists.
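To make the idea of rebalancing concrete, the sketch below tops up underrepresented groups until every group has as many records as the largest one. Note the substitution: it uses naive random oversampling (duplicating existing records), which is a much simpler stand-in for generating genuinely new synthetic records and is not Synthesized's method. The function name, the `group` field and the toy records are assumptions.

```python
import random
from collections import Counter, defaultdict


def oversample_to_balance(records, group_key, seed=0):
    """Duplicate randomly chosen records so all groups match the largest group.

    Hypothetical helper for illustration: real synthetic-data tools create
    new records rather than copying existing ones.
    """
    if not records:
        return []
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    by_group = defaultdict(list)
    for record in records:
        by_group[record[group_key]].append(record)
    target = max(len(rows) for rows in by_group.values())
    balanced = []
    for rows in by_group.values():
        balanced.extend(rows)
        # Top up smaller groups by sampling with replacement.
        balanced.extend(rng.choice(rows) for _ in range(target - len(rows)))
    return balanced


# Toy dataset: group "a" has six records, group "b" only two.
records = [{"group": "a", "id": i} for i in range(6)]
records += [{"group": "b", "id": i} for i in range(6, 8)]

balanced = oversample_to_balance(records, group_key="group")
counts = Counter(r["group"] for r in balanced)
print(dict(counts))
```

Copying records equalises group sizes but adds no new information, which is the gap that synthetic data generation is intended to fill.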
“This is important as we recognise there are existing tools and workflows – providing the proper integration points would be the way to connect to the existing tools,” said Dmitry Zakharchenko, VP Research & Development at Blaize. “Tools for the bias is part of the data preparation workflow. We have developed a robust set of APIs to integrate with existing tools especially in the data wrangling tools and we should be taking advantage of those tools,” he added.