How to Incorporate QA in AI/Machine Learning Applications

By Celin Das
16-11-2022
Technology

Quality assurance for AI/ML applications requires a different approach than for traditional applications. Traditional applications encode business rules with clear expected outcomes, whereas AI models keep evolving, which makes their outputs ambiguous and unpredictable. Quality assurance methods must accommodate this complexity and address issues such as extensive scenario coverage, security, privacy, and trust.

Verification and Validation of AI/ML Applications
The standard approach to creating AI models, known as the Cross-Industry Standard Process for Data Mining (CRISP-DM), starts, once the business problem is understood, with data collection, preparation, and cleaning. The resulting data is then used iteratively across multiple modeling approaches before a final model is selected. Testing this model begins with a subset of the data that has gone through the above process. This test data is fed into the model, and multiple combinations of hyperparameters or model variants are run to gauge its correctness or accuracy, measured with appropriate metrics.

These test datasets are randomly sampled from the original dataset and held out from training. Applying the model to them simulates how it would handle new data and indicates how well it is likely to generalize in the future.
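
The hold-out idea can be sketched in a few lines of plain Python. This is a minimal illustration, not a prescribed workflow: the toy data, the `train_test_split` and `accuracy` helpers, and the threshold rule standing in for a model variant are all assumptions made for the example.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and hold out the last test_fraction as test data."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[:-n_test], shuffled[-n_test:]

def accuracy(threshold, rows):
    """Fraction of rows where the rule 'x >= threshold' matches the label."""
    correct = sum((x >= threshold) == label for x, label in rows)
    return correct / len(rows)

# Hypothetical toy data: one numeric feature, label is True when x >= 50.
data = [(x, x >= 50) for x in range(100)]
train, test = train_test_split(data)

# Compare several candidate variants on the training rows only,
# then score the chosen one on the held-out test rows.
candidates = [10, 30, 50, 70, 90]
best = max(candidates, key=lambda t: accuracy(t, train))
test_accuracy = accuracy(best, test)
print(best, test_accuracy)
```

Selecting the variant on the training rows and scoring it only once on the held-out rows keeps the test metric an honest estimate of behavior on unseen data.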

Data-Driven Quality Assurance Challenges for AI/ML Applications
Data-driven testing and quality assurance of AI/ML applications raise many issues, some of which are listed below.

Interpretability
The decision-making algorithm of an AI model has long been regarded as a black box. Recently, there has been a clear trend toward making models transparent by explaining how they arrive at a set of results from a set of inputs. This aids in understanding and improving model performance and helps the people affected understand model behavior. It is even more important in compliance-heavy areas such as insurance or health care. Some countries also require explanations for decisions made in conjunction with AI models.

Post facto analysis is the key to interpretability. By analyzing specific instances that the AI model misclassified, data scientists can understand which parts of the dataset the model actively focuses on when making decisions. Results classified as positive are analyzed in the same way.
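
One simple form of post facto analysis is error slicing: grouping the evaluation log by a feature and comparing misclassification rates across groups. The sketch below is a lightweight stand-in for fuller attribution methods; the record layout, the `region` feature, and the `misclassification_rate_by` helper are hypothetical.

```python
from collections import Counter

def misclassification_rate_by(records, feature):
    """Return {feature value: share of records with that value that were misclassified}."""
    totals, errors = Counter(), Counter()
    for rec in records:
        value = rec[feature]
        totals[value] += 1
        if rec["predicted"] != rec["actual"]:
            errors[value] += 1
    return {value: errors[value] / totals[value] for value in totals}

# Hypothetical evaluation log: one dict per scored instance.
records = [
    {"region": "north", "predicted": 1, "actual": 1},
    {"region": "north", "predicted": 0, "actual": 0},
    {"region": "south", "predicted": 1, "actual": 0},
    {"region": "south", "predicted": 0, "actual": 1},
    {"region": "south", "predicted": 1, "actual": 1},
    {"region": "north", "predicted": 1, "actual": 1},
]
rates = misclassification_rate_by(records, "region")
print(rates)
```

A slice with a markedly higher error rate is a natural starting point for inspecting individual misclassified instances.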

Bias
The decision-making ability of an AI model depends mainly on the quality of the data it is exposed to. There are many cases where bias seeps in through the input data or the way models are built, such as Facebook's gender-biased ads or Amazon's AI-based automated recruiting system, which was found to discriminate against women.

The historical data Amazon used for its system was heavily skewed because men had dominated the workforce and the tech industry over the preceding decade. Even large models such as those from OpenAI or Copilot absorb real-world bias because they are trained on inherently biased global datasets. To remove bias, it is essential to understand how the data was selected and which features contributed to the decision.

Bias in a model can be detected by identifying the attributes that influence it disproportionately. Once these attributes are identified, they are tested to see whether they are representative of the entire dataset.
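
One way to look for attributes with outsized influence is a permutation-style sensitivity check: shuffle one attribute across rows and measure how many predictions change. The sketch below assumes a deliberately biased toy `model`; the attribute names and the `flip_rate` helper are hypothetical and exist only for illustration.

```python
import random

def model(row):
    """Hypothetical biased model: requires experience AND a particular gender value."""
    return row["experience"] >= 5 and row["gender"] == "M"

def flip_rate(rows, feature, seed=0):
    """Share of predictions that change when one feature is shuffled across rows."""
    rng = random.Random(seed)
    values = [r[feature] for r in rows]
    rng.shuffle(values)
    shuffled = [{**r, feature: v} for r, v in zip(rows, values)]
    changed = sum(model(a) != model(b) for a, b in zip(rows, shuffled))
    return changed / len(rows)

# Hypothetical applicant rows; shoe_size is included as an attribute the model ignores.
rows = [
    {"experience": i % 10, "gender": "M" if i % 2 else "F", "shoe_size": 36 + i % 9}
    for i in range(200)
]
rates = {f: flip_rate(rows, f) for f in ("experience", "gender", "shoe_size")}
print(rates)
```

An attribute that should be irrelevant to the decision, yet shows a non-zero flip rate, is a candidate for the representativeness check described above.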

Safety
According to the Deloitte State of AI in Enterprise Survey, 62% of respondents believe cybersecurity risk is an important issue for AI adoption. Forrester Consulting's Emergence of Offensive AI report found that 88% of security industry decision-makers believe offensive AI is on the horizon.

Since AI models are built on the principle of getting more intelligent with each iteration of real data, attacks on such systems also tend to get smarter. Things are further complicated by the advent of adversarial attacks, which target AI models by modifying a small aspect of the input data, sometimes as little as a single pixel in an image. Such small changes can cause severe disruptions in the model, leading to misclassification and erroneous results.
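
The effect of a small, single-feature change can be probed with a simple robustness test. The sketch below assumes a toy linear classifier with made-up weights; `predict` and `single_feature_flips` are hypothetical helpers, and real adversarial testing would use far more sophisticated search.

```python
def predict(weights, bias, x):
    """Linear classifier: returns 1 when the weighted sum is positive, else 0."""
    score = sum(w * v for w, v in zip(weights, x)) + bias
    return 1 if score > 0 else 0

def single_feature_flips(weights, bias, x, epsilon):
    """Indices of features where changing only that feature by +/- epsilon flips the prediction."""
    original = predict(weights, bias, x)
    flips = []
    for i in range(len(x)):
        for delta in (-epsilon, epsilon):
            perturbed = list(x)
            perturbed[i] += delta
            if predict(weights, bias, perturbed) != original:
                flips.append(i)
                break  # one flipping direction is enough to flag this feature
    return flips

# Hypothetical model parameters and input.
weights, bias = [2.0, -1.0, 0.1], -0.5
x = [0.5, 0.3, 1.0]
vulnerable = single_feature_flips(weights, bias, x, epsilon=0.2)
print(vulnerable)
```

Inputs whose prediction flips under such a small change sit close to the decision boundary and deserve attention in a security review.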

The starting point for overcoming such security issues is understanding the types of attacks and the vulnerabilities in the model that hackers can exploit. It is critical to collect literature on such attacks, along with domain knowledge, to build a repository that helps anticipate similar attacks in the future. Employing AI-based cybersecurity systems is an effective technique for deterring hackers. Just as they predict other outcomes, AI-based methods can predict how hackers are likely to act.

Privacy
As privacy regulations such as GDPR and CCPA increasingly apply across all applications and data systems, AI models are also under scrutiny. AI systems also rely heavily on massive amounts of real-time data to make intelligent decisions, and that data can, at the very least, reveal a wealth of information about a person's demographics, behavior, and consumption attributes.

To address privacy concerns, the AI model needs to be examined to assess how it discloses information. Privacy-conscious AI models take appropriate steps to anonymize or pseudonymize data, or use state-of-the-art privacy-enhancing techniques. The model can be evaluated for privacy violations by analyzing how a privacy attacker could extract training data from the model and reverse-engineer it to gain access to personally identifiable information (PII).
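
As one illustration of pseudonymization, a direct identifier can be replaced with a keyed hash before the data reaches the training pipeline. This is a minimal sketch using Python's standard `hmac` and `hashlib` modules; the `pseudonymize` helper, the key, and the record layout are hypothetical, and a production setup would also need proper key management.

```python
import hashlib
import hmac

def pseudonymize(value, secret_key):
    """Replace an identifier with a keyed SHA-256 digest (stable for the same key)."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()

# Hypothetical key; in practice it would be stored separately from the dataset.
key = b"hypothetical-secret-kept-outside-the-dataset"
record = {"email": "jane.doe@example.com", "purchases": 7}

# The identifier is replaced; non-identifying fields are kept as-is.
safe_record = {**record, "email": pseudonymize(record["email"], key)}
print(safe_record)
```

Because the same input and key always yield the same digest, records belonging to one person can still be linked for analysis without exposing the original identifier.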

The two-step process of discovering derivable training data through an inference attack and then identifying the presence or absence of PII in the data helps identify privacy concerns when deploying models.
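
The second step, checking recovered data for PII, can be sketched with a simple pattern scan. The example assumes the inference attack has already produced a list of recovered strings; the `recovered` data, the `find_pii` helper, and the two regular expressions are illustrative assumptions and far from a complete PII detector.

```python
import re

# Illustrative patterns only; real PII detection needs much broader coverage.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def find_pii(texts):
    """Return (index, kind, match) for every pattern hit in the recovered texts."""
    hits = []
    for i, text in enumerate(texts):
        for kind, pattern in PII_PATTERNS.items():
            for match in pattern.findall(text):
                hits.append((i, kind, match))
    return hits

# Hypothetical output of step one (strings derived from the model).
recovered = [
    "order shipped on time",
    "contact jane.doe@example.com for details",
    "call 555-123-4567 after 5pm",
]
hits = find_pii(recovered)
print(hits)
```

Any hit means that data derivable from the model contains personal information, which is a signal to revisit the training data before deployment.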

Conclusion
Accurate testing of AI-based applications requires extending the concept of quality assurance from the scope of performance, reliability, and stability to new dimensions of explainability, security, bias, and privacy. The international standardization community is also working on this idea by extending the traditional ISO 25010 standard to include the above aspects. As AI/ML model development continues, focusing on all of these aspects will result in more robust, continuously learning, and compliant models capable of producing more accurate and realistic results.

Author

Celin Das

Celin Das is the Senior Consultant (QAT Spoc) - Technical Content Writer at Trigent Software Inc. She has 10 years of extensive experience in Quality Assurance Testing. She has a Post Graduate Degree in Computer Application and has worked across various domains during her time in software testing. She excels in writing technical papers and blogs and in researching new technologies.