ISTQB Certified Tester AI Testing (CT-AI) v2.0 - ISTQB CT-AI v2.0 Sample Exam 1
| question.id | Questions |
|---|---|
| question_th |
Q1:
Chapter: 1 - Introduction to Artificial Intelligence
Which of the following statements BEST highlights the difference between conventional and AI- based systems?
A
AI-based systems tend to be more suitable for critical tasks where explainability is crucial, while conventional systems are better at handling complex tasks in unregulated areas
B
AI-based systems tend to be faster than conventional systems because they do not rely on explicit programming, while conventional systems are slower due to rigid rule-based processing
C
AI-based systems tend to be deterministic and explainable, while conventional systems are probabilistic and often difficult to interpret
D
AI-based systems tend to learn from patterns in data and can adapt to new scenarios, while conventional systems follow predefined rules and produce consistent outputs for the same inputs
Correct Answer:
D :
AI-based systems tend to learn from patterns in data and can adapt to new scenarios, while conventional systems follow predefined rules and produce consistent outputs for the same inputs
This correctly differentiates AI-based systems from conventional systems. AI-based/ML systems learn patterns in the data, and many can adapt through self-learning, while conventional systems use predefined logic and produce predictable outputs. Note: this does not apply to rule-based expert systems. |
| question_th |
Q2:
Chapter: 1 - Introduction to Artificial Intelligence
Which TWO of the following statements BEST distinguish between narrow AI, general AI, and super AI?
A
Narrow AI operates independently of human input, general AI is used only in advanced robotics, and super AI improves human decision-making in specialized fields
B
Narrow AI is task-specific, general AI is a concept with limited real-world applications, and super AI describes cutting-edge generative AI models
C
Narrow AI is self-learning, general AI focuses on solving specialized problems, and super AI is limited to tasks defined during its development
D
Narrow AI performs specific tasks efficiently, general AI can handle a wide range of intellectual tasks like a human, and super AI surpasses human intelligence
Correct Answer:
D :
Narrow AI performs specific tasks efficiently, general AI can handle a wide range of intellectual tasks like a human, and super AI surpasses human intelligence
This accurately differentiates the three types of AI, highlighting narrow AI's task-specific nature, general AI's human-like versatility, and super AI's hypothetical nature and superior capabilities. |
| question_th |
Q3:
Chapter: 1 - Introduction to Artificial Intelligence
Which of the following statements BEST describes the relationship between AI, ML, and DL?
A
AI encompasses both ML and DL as distinct but important methodologies, working in parallel to solve complex problems
B
DL and ML are essentially interchangeable terms, while AI represents a separate approach to problem-solving
C
ML is a subset of AI, and DL is a further subset of ML, representing a hierarchy of specialized technologies
D
DL is the foundational technology, and both ML and AI are specialized applications built upon its principles.
Correct Answer:
C :
ML is a subset of AI, and DL is a further subset of ML, representing a hierarchy of specialized technologies
AI is the broadest term. It refers to the general concept of creating machines that can perform tasks that typically require human intelligence. This includes a wide range of approaches and techniques, not just machine learning. ML is a subset of AI. It's a specific approach to achieving AI where machines are given the ability to learn from data without being explicitly programmed. Instead of writing specific rules for every situation, ML algorithms learn patterns and make predictions based on the data. DL is a subset of machine learning. It's a specialized type of ML that utilizes artificial neural networks with multiple layers (hence "deep"). |
| question_th |
Q4:
Chapter: 1 - Introduction to Artificial Intelligence
Which of the following statements BEST explains generative AI?
A
It creates models that are limited to generating text and images and cannot be used for creative tasks like drug discovery or data simulation
B
It focuses mainly on improving existing content, such as text, images, or audio, by optimizing it for classification or prediction tasks
C
It is designed to analyze and understand existing data, such as text or images, allowing machines to categorize information and extract key insights
D
It creates new content, such as text, images, or audio, by learning patterns from training data and generating outputs similar in nature
Correct Answer:
D :
It creates new content, such as text, images, or audio, by learning patterns from training data and generating outputs similar in nature
This accurately describes generative AI, highlighting its ability to create new content based on learned patterns in training data. |
| question_th |
Q5:
Chapter: 1 - Introduction to Artificial Intelligence
Which of the following statements BEST compares the hardware options available for implementing machine learning systems?
A
AI-specific hardware, such as ASICs, is primarily used for training ML models, while GPUs are better suited for edge computing
B
GPUs are well-suited for training and running ML models due to their ability to handle massively parallel processing, while CPUs are better for general-purpose computing
C
Neuromorphic processors are a form of AI hardware specifically optimized for running on the von Neumann architecture
D
CPUs are more efficient than GPUs for ML tasks because they have faster clock speeds and can handle complex operations
Correct Answer:
B :
GPUs are well-suited for training and running ML models due to their ability to handle massively parallel processing, while CPUs are better for general-purpose computing
This comparison accurately highlights the advantages of GPUs over CPUs for machine learning tasks, due to their parallel processing capabilities, while CPUs excel in general-purpose computing. |
| question_th |
Q6:
Chapter: 1 - Introduction to Artificial Intelligence
Given the following statements about AI model development and hosting:
i. It achieves lower development costs by using public cloud resources, eliminating the need for local hardware investment ii. It uses local development of data preparation components for sensitive data for increased security before moving to the cloud for training of the full system iii. It results in lower costs because laptops are used for local development and there are low upfront hardware costs by hosting AI models on public clouds iv. It simplifies development and hosting by standardizing processes on local servers, removing the need for complex cloud-based configurations v. It guarantees the highest security by hosting AI models on private clouds, thereby avoiding the risks associated with local hardware vulnerabilities Which of the following BEST reflects advantages of using a hybrid approach for this development and hosting?
A
i, iv and v
B
iii and iv
C
ii and iii
D
i, ii and v
Correct Answer:
C :
ii and iii
Given the following statements about AI model development and hosting: |
| question_th |
Q7:
Chapter: 2 - Quality Characteristics for AI-Based Systems
Given the following ISO/IEC 25059 quality characteristics:
1. AI Robustness 2. User controllability 3. Functional adaptability 4. Intervenability And the following examples of quality characteristic measures: A. The success rate of a remote operator in forcing a drone into the safe-landing protocol when its AI navigation system exhibits hazardous behavior B. The average time required to successfully override a fraud management system’s automated decision to block a customer's transaction C. The F1-score of an object detection model in an autonomous car in heavy rain D. The time required for an e-commerce recommendation engine to update its suggestions to reflect a new, rapidly emerging fashion trend Which of the following BEST matches the quality characteristics with the example measures?
A
1C – 2B – 3D – 4A
B
1C – 2D – 3B – 4A
C
1A – 2D – 3C – 4B
D
1D – 2A – 3C – 4B
Correct Answer:
A :
1C – 2B – 3D – 4A
Considering each of the examples in turn: |
| question_th |
Q8:
Chapter: 2 - Quality Characteristics for AI-Based Systems
Which of the following statements BEST describes a key challenge when AI is used in safety- related systems?
A
The potential for non-determinism and self-learning makes some AI-based systems unpredictable as they diverge from their original tested state
B
The mature safety-related standards tend to be out-of-date and require the use of outdated AI technologies, which hinders innovative AI solutions in AI-based systems
C
When the requirements are too detailed, it leaves little room for the ML system to learn from the implicit goals contained within the training data
D
Because self-learning safety-related systems stop adapting after deployment, safety is often compromised by the need for manual updates of the operational AI-based system
Correct Answer:
A :
The potential for non-determinism and self-learning makes some AI-based systems unpredictable as they diverge from their original tested state
This unpredictability is a core safety challenge when deploying AI in safety-related systems, as it complicates verification, validation, and ongoing assurance of safe operation. |
| question_th |
Q9:
Chapter: 2 - Quality Characteristics for AI-Based Systems
Which of the following examples is LEAST likely to be a valid acceptance criterion for AI-specific quality characteristics defined in the ISO 25059 standard for an AI-based system?
A
When the analysis tool flags a retinal scan for severe diabetic retinopathy, it shall display a visual heatmap overlay on the image, highlighting the key features
B
The spam alert control system is easy for the user to set up and requires minimal technical expertise to maintain
C
The security guard in the museum control room can trigger an immediate 'all-stop' command that causes the patrol robot to cease all motion within 0.5 seconds to prevent collision with a sculpture
D
A greenhouse control system reacts within 20 minutes when the measured humidity is greater than 10% from the optimum humidity
Correct Answer:
B :
The spam alert control system is easy for the user to set up and requires minimal technical expertise to maintain
It is the LEAST likely option because it focuses on some of the sub-characteristics of usability, such as learnability and operability, which are generic quality characteristics applicable to most systems, and are not directly associated with the AI-specific characteristics in ISO 25059 (i.e. user controllability and transparency, which are sub- characteristics of usability for AI-based systems). The acceptance criterion is also more subjective than the other options and so more challenging to test. |
| question_th |
Q10:
Chapter: 3 - Machine Learning
Given the following forms of ML:
1. Clustering 2. Reinforcement learning 3. Classification 4. Regression And the following examples: A. The mobile game app updates its feedback, response timing, and the number of user options it provides based on how much the players spend B. The language translation app searches the internet to find text provided in multiple languages to improve its translation function C. A manufacturing company predicts when equipment is likely to fail based on sensor data and historical maintenance records D. A social network platform groups its users into communities based on their interactions with each other and their stated interests Which of the following BEST matches the examples with the forms of ML?
A
1C – 2D – 3A – 4B
B
1B – 2C – 3D – 4A
C
1D – 2C – 3B – 4A
D
1A – 2D – 3B – 4C
Correct Answer:
B :
1B – 2C – 3D – 4A
Considering the descriptions of example systems: |
| question_th |
Q11:
Chapter: 3 - Machine Learning
Given the following activities from the ML workflow:
1. Deploy the Model 2. Prepare & Test Data 3. Test the Model 4. Evaluate the Model And, given the following descriptions: A. Model performance is tested using validation data B. The origin of the test data used to test the model is identified C. Test data are used to verify the agreed performance criteria are met D. The model is tested on the target platform Which of the following BEST matches the descriptions with the activities in the ML workflow?
A
1D – 2B – 3C – 4A
B
1C – 2D – 3B – 4A
C
1D – 2B – 3A – 4C
D
1A – 2C – 3B – 4D
Correct Answer:
A :
1D – 2B – 3C – 4A
1. Model performance is tested using validation data This is part of the ‘Evaluate the Model’ activity. (D) |
| question_th |
Q12:
Chapter: 3 - Machine Learning
Which of the following statements about the use of pre-trained models is CORRECT?
A
To successfully adapt a pre-trained neural network, fine-tuning requires that additional training with new data is applied to all layers of the network
B
When applied to a neural network, the RAG approach works by adding additional layers that hold documentation specifically related to the prompt
C
Fine-tuning a biased LLM with high-quality, task-specific data prevents unfair outputs based on sensitive attributes
D
The RAG approach requires the identification and acquisition of data relevant to the task up front, but requires no changes to the pre-trained model
Correct Answer:
D :
The RAG approach requires the identification and acquisition of data relevant to the task up front, but requires no changes to the pre-trained model
RAG relies on curating and indexing relevant external data before use, but it does not require modifying the architecture or parameters of the pre-trained model itself; instead, it augments the model's inputs with retrieved information at runtime. |
| question_th |
Q13:
Chapter: 3 - Machine Learning
Which of the following statements BEST describes a key activity in data preparation for machine learning?
A
Data pre-processing includes augmentation and sampling, which either add or reduce the number of examples in the training data respectively
B
Feature engineering of the data is performed after an ML model has been trained to optimize its performance for real-world deployment
C
Exploratory data analysis (EDA) is a form of exploratory testing applied to the data preparation activities of acquisition, pre-processing and labelling
D
Data preparation comprises the identification and gathering of data from various sources, providing the algorithm with raw data to learn from
Correct Answer:
A :
Data pre-processing includes augmentation and sampling, which either add or reduce the number of examples in the training data respectively
This correctly identifies two common data pre-processing techniques: augmentation (increasing data by creating modified versions) and sampling (reducing data by selecting subsets). |
| question_th |
Q14:
Chapter: 3 - Machine Learning
Which of the following statements BEST contrasts the roles of training, validation, and test datasets in ML model development?
A
The training dataset is used to optimize hyperparameters, the validation dataset is used to tune predictions, and the test dataset is used to generate training data
B
The training dataset ensures the model generalizes well, the validation dataset is used to deploy the model, and the test dataset is used for initial evaluation
C
The training dataset is used for final model evaluation, the validation dataset ensures the model does not overfit, and the test dataset is used for tuning hyperparameters
D
The training dataset is used to create the model, the validation dataset is used to tune the model, and the test dataset evaluates its performance on unseen data
Correct Answer:
D :
The training dataset is used to create the model, the validation dataset is used to tune the model, and the test dataset evaluates its performance on unseen data
The training dataset is used to fit or create the model, the validation dataset is used to tune the model’s hyperparameters and prevent overfitting, and the test dataset is reserved for evaluating the model’s performance on unseen data. It accurately describes the distinct and sequential roles of each dataset in the ML workflow. |
| question_th |
Q15:
Chapter: 3 - Machine Learning
Consider the following confusion matrix for an image classifier: Which of the following options represents the CORRECT formula for calculating the precision of the classifier?
A
(22/100) *100
B
(78/120) *100
C
(20/120) *100
D
(78/100) *100
Correct Answer:
D :
(78/100) *100
The formula for Precision = TP/ (TP+FP) *100 = 78/(78+22) = 78/100 *100 |
| question_th |
Q16:
Chapter: 3 - Machine Learning
During the training of a deep neural network, the network produces an output, and the loss is calculated. Which of the following CORRECTLY describes the next step in the training process?
A
The same training data is passed through the network again to confirm the loss value
B
The network's hidden layers are reset with new random weight values
C
The weights and biases across the network are adjusted to reduce the calculated loss
D
The activation functions are altered to different non-linear formulas to find a better fit
Correct Answer:
C :
The weights and biases across the network are adjusted to reduce the calculated loss
This is the next step in the ML training loop. The loss is fed back through the network to adjust the values of the weights and biases. |
| question_th |
Q17:
Chapter: 4 - Testing AI-Based Systems
Given the following examples of AI-based systems:
i. A system that learns from real-time data to improve its failure predictions and automatically updates maintenance schedules ii. A spam filter for an email app, which identifies spam based on predefined rules iii. A recommendation engine on a streaming service that updates its suggestions based on a user’s changing viewing habits and preferences iv. A personal assistant that learns continuously from its user v. A rule-based system for medical diagnosis Which of the following BEST describes the systems which can be considered to be locked AI-based systems?
A
i, iii, and iv
B
ii and v
C
i, and iii
D
ii, iv, and v
Correct Answer:
B :
ii and v
Considering the provided example AI-based systems: |
| question_th |
Q18:
Chapter: 4 - Testing AI-Based Systems
Which of the following options BEST explains why a statistical approach is necessary when testing an AI-based system?
A
The system is trained on real-world data and therefore does not require a separate test dataset; instead, it requires statistical validation
B
A single test case is sufficient to determine if a model is well-calibrated, but only statistical methods can verify accuracy
C
The system is typically large and complex, making test automation impractical without the use of statistical sampling
D
The system exhibits non-deterministic behavior, requiring large and representative test datasets to draw meaningful conclusions
Correct Answer:
D :
The system exhibits non-deterministic behavior, requiring large and representative test datasets to draw meaningful conclusions
AI-based systems are non-deterministic and require extensive, representative test datasets to achieve statistical significance, especially in assessing functional correctness statistically. |
| question_th |
Q19:
Chapter: 4 - Testing AI-Based Systems
Why is setting a ‘seed’ only a LIMITED solution for addressing the test oracle problem in AI systems?
A
It introduces a high degree of subjectivity into the model's behavior, which complicates the test evaluation process.
B
It is a method that only functions for AI systems built from complete and detailed technical specifications
C
It ensures reproducibility for individual test runs but cannot alter the model's inherent probabilistic nature in production
D
Its success depends upon extensive consultation with domain experts to select the most appropriate seed value for the tests
Correct Answer:
C :
It ensures reproducibility for individual test runs but cannot alter the model's inherent probabilistic nature in production
The setting of a ‘seed’ is bound to a specific test execution and does not solve the fundamental probabilistic nature of the model when it operates in a real-world environment. |
| question_th |
Q20:
Chapter: 4 - Testing AI-Based Systems
Which of the following statements BEST describes a common approach to testing GenAI models?
A
Since GenAI models are probabilistic, formal testing is unnecessary because outputs will always vary and cannot be evaluated effectively
B
GenAI models are tested by verifying that their outputs exactly match predefined expected results
C
It involves manipulating diverse inputs and parameters and then assessing the output's adherence to rules, because a direct input-to-output match is not feasible
D
GenAI models are mainly tested through manual review, as automated testing does not apply to creative AI outputs
Correct Answer:
C :
It involves manipulating diverse inputs and parameters and then assessing the output's adherence to rules, because a direct input-to-output match is not feasible
Because of the variability and complexity in GenAI outputs, testing focuses on coherence, rules compliance, and plausibility, rather than matching fixed expected outputs. Diverse inputs, optional prompts, and parameters all influence the test results. |
| question_th |
Q21:
Chapter: 4 - Testing AI-Based Systems
Your organization plans to deploy a GenAI-powered legal document generator. During implementation planning, the security team argues for focusing red teaming efforts on preventing prompt injection attacks, while the compliance team wants to prioritize bias detection in legal advice generation. The development team suggests running red teaming after the system goes live to save time. Which of the following is the MOST effective red teaming implementation strategy for this scenario?
A
Focus on bias detection since legal advice accuracy is more critical than security concerns
B
Deploy immediately and conduct red teaming reactively based on incidents and user feedback
C
Use attack scenarios covering both security and bias vulnerabilities before deployment
D
Prioritize security vulnerabilities first, then address bias issues in a separate phase after deployment
Correct Answer:
C :
Use attack scenarios covering both security and bias vulnerabilities before deployment
Red teaming should be conducted before deployment, not after and multiple vulnerability types (security and bias) should be tested together. |
| question_th |
Q22:
Chapter: 4 - Testing AI-Based Systems
An ML-based weather prediction system provides excellent results for most locations; however, it has been noticed that the predictions for areas in the UK with an altitude greater than 1250 meters are regularly inaccurate. Which of the following test levels should have been performed MORE thoroughly?
A
ML model testing
B
Input data testing
C
Component integration testing
D
System testing
Correct Answer:
B :
Input data testing
Input data testing focuses on data quality and on the representativeness of data. It appears that areas in the UK with an altitude greater than 1,250 meters were not adequately represented in the training data. |
| question_th |
Q23:
Chapter: 4 - Testing AI-Based Systems
Which of the following statements about applying risk-based testing to ML systems is CORRECT?
A
Security and usability risks can apply to any system, while risks associated with data bias and ML model performance are specific to ML systems
B
Risk management in conventional systems is a static approach, while in a self-learning system risk needs to be adjusted dynamically
C
Conventional systems handle risks related to data bias, whereas ML systems focus on risks related to algorithmic bias
D
In conventional systems, functional correctness is the primary risk factor, whereas functional adaptability is the primary risk factor for an ML system
Correct Answer:
A :
Security and usability risks can apply to any system, while risks associated with data bias and ML model performance are specific to ML systems
Security and usability are generic quality characteristics and can apply to any system. Data bias and model performance are potential risks specific to ML systems. |
| question_th |
Q24:
Chapter: 5 - Input Data Testing for Machine Learning Systems
The team developing a new machine learning model has received a dataset from a new, unverified third-party vendor. They are uncertain about the origin of this dataset and concerned that the raw data may have been tampered with. Which of the following test approaches is MOST suitable for addressing this specific risk?
A
Data representativeness testing
B
Data provenance testing
C
Dataset constraint testing
D
Feature testing
Correct Answer:
B :
Data provenance testing
The scenario describes a need to verify the data's origin and to determine that it hasn't been tampered with. This maps to data provenance testing. |
| question_th |
Q25:
Chapter: 5 - Input Data Testing for Machine Learning Systems
A financial services company has developed an ML system for loan approval. A tester responsible for testing this system wants to determine if there is potential bias in the system related to factors such as gender and age attributes. Which of the following approaches would be MOST suitable for detecting bias in the training data early?
A
Performing disparate impact analysis using counterfactuals based on gender, age or both
B
Reviewing the overall ML workflow and data preparation processes to identify potential sources of bias introduction
C
Testing using a dataset that is representative and analyzing the predictions for statistically significant differences in outcomes across age and gender
D
Conducting static analysis of the model’s source code to identify how its operations on the age and gender related attributes could lead to bias
Correct Answer:
B :
Reviewing the overall ML workflow and data preparation processes to identify potential sources of bias introduction
This approach could help highlight where bias might be introduced, but it relies on process checks rather than actual evidence from the data. |
| question_th |
Q26:
Chapter: 5 - Input Data Testing for Machine Learning Systems
What is a KEY difference in the test strategy for a data pipeline built for training versus one designed for an operational system?
A
Testing a training pipeline would primarily focus on determining that data is handled correctly, while an operational pipeline's testing would emphasize high performance and reliability under load
B
Configuration management reviews would be critical for operational pipelines but are considered unnecessary for the less formal nature of exploratory training pipelines
C
Testing an operational pipeline would focus mainly on validating individual transformation scripts, whereas a training pipeline's testing would prioritize end-to-end system testing
D
A training pipeline would rely almost exclusively on fault injection and back-to-back tests, while an operational pipeline would be limited to unit and component integration testing
Correct Answer:
A :
Testing a training pipeline would primarily focus on determining that data is handled correctly, while an operational pipeline's testing would emphasize high performance and reliability under load
The purpose of the pipeline dictates the test strategy. Training pipelines prioritize generating high quality data, whereas live operational pipelines focus on non-functional aspects such as performance efficiency, scalability, and AI robustness. |
| question_th |
Q27:
Chapter: 5 - Input Data Testing for Machine Learning Systems
What is the purpose of creating a ‘reference dataset’ during data representativeness testing?
A
It is used to apply stratified sampling directly to the training data to verify that all subgroups are covered
B
It serves as a universally applicable dataset derived from trusted industry benchmarks
C
It functions as a quantitative baseline for formally comparing the statistical properties of the training data
D
It serves as the primary dataset for performing the final model validation and testing with high independence
Correct Answer:
C :
It functions as a quantitative baseline for formally comparing the statistical properties of the training data
The reference dataset’s core purpose is to provide a statistical baseline, allowing practitioners to objectively compare the distributions, feature correlations, and coverage of the training data against what is expected in real-world operational environments. |
| question_th |
Q28:
Chapter: 5 - Input Data Testing for Machine Learning Systems
You are testing an ML dataset for a banking loan approval system. The dataset contains the following attributes:
• applicant_id (integer, unique) • annual_income (decimal, in USD) • loan_amount (decimal, in USD) • credit_score (integer, 300-850) • employment_years (integer, 0-50) • monthly_payment (decimal, in USD) The following business rules apply: • Loan amount cannot exceed 5 times annual income • Monthly payment = loan_amount / 324 (for 30-year loans) • Credit scores must be within the standard range • All monetary values must be positive When applying dataset constraint testing to validate the ‘monthly_payment’ for 30-year loans, which type of constraint would be MOST appropriate?
A
Multi-value count constraint
B
Comparison correlate constraint,
C
Single-value range constraint
D
Multi-value duplicate constraint
Correct Answer:
B :
Comparison correlate constraint,
This scenario requires validating a mathematical relationship between two attributes (monthly_payment and loan_amount). |
| question_th |
Q29:
Chapter: 5 - Input Data Testing for Machine Learning Systems
Which of the following BEST explains the role of multiple annotations in data label correctness testing?
A
Multiple annotations point to defects in data points with high model loss
B
Multiple annotations can point to label defects based on the comparison of annotations
C
Multiple annotations compare label distributions across datasets to detect potential defects
D
Multiple annotations can be used to automate label verification to ensure consistency
Correct Answer:
B :
Multiple annotations can point to label defects based on the comparison of annotations
Multiple annotation involves data points being independently labeled by multiple annotators. Comparison of these labels leads to the discovery of disagreements among annotators, highlighting potential defects. |
| question_th |
Q30:
Chapter: 6 - Model Testing for Machine Learning Systems
Given the following test techniques and test types:
1. ML functional performance testing 2. Testing for bias 3. Adversarial testing 4. Drift testing And the following risks: A. The ML model might perform differently for different demographic groups B. Slightly modified inputs to the ML model might cause quite different and unexpected responses C. Predictions made by the ML model might be inaccurate in some cases D. ML model accuracy might have significantly decreased since it was deployed Which of the following options BEST matches the test techniques and test types with the risks?
A
1C – 2A – 3B – 4D
B
1A – 2D – 3C – 4B
C
1A – 2C – 3D – 4B
D
1B – 2A – 3C – 4D
Correct Answer:
A :
1C – 2A – 3B – 4D
Considering each of the risks in turn: |
| question_th |
Q31:
Chapter: 6 - Model Testing for Machine Learning Systems
Which of the following statements about the documentation of ML models is CORRECT?
A
Changes made by self-learning ML systems are fully documented at the time of the update to ensure up-to-date documentation
B
Information related to the speed of model prediction is a part of the documentation of an AI component
C
Bias-related characteristics of input data are challenging to assess as the model documentation doesn’t contain the source of the training data
D
Interaction of AI and non-AI components is within the scope of the model documentation
Correct Answer:
B :
Information related to the speed of model prediction is a part of the documentation of an AI component
This is a part of the non-functional requirement/documentation. |
| question_th |
Q32:
Chapter: 6 - Model Testing for Machine Learning Systems
Upon release, an ML model provided correct predictions fewer times than expected. Additional tests have been conducted, and the level of accuracy for these tests was measured at 83%. Previously, the ML model had achieved an accuracy of 83% ± 4% at a confidence level of 94%. Which of the following options is MOST likely to represent the current situation?
A
accuracy of 83% ± 4% at 92% confidence level
B
accuracy of 85% ± 4% at 94% confidence level
C
accuracy of 83% ± 2% at 94% confidence level
D
accuracy of 83% ± 6% at 94% confidence level
Correct Answer:
C :
accuracy of 83% ± 2% at 94% confidence level
When more tests are run for the same measured accuracy (83%), the margin of error typically decreases if the confidence level stays the same. This is because a larger sample size can reduce the margin of error and/or increase our statistical confidence in the measured value. If the ML model previously achieved 83% accuracy with a margin of error of ± 4% at a 94% confidence level, and additional tests were run while maintaining the same 83% accuracy measurement and maintaining the confidence level at 94%, this would result in a lower margin of error (e.g. ± 2%). This indicates we can be more certain that the true accuracy is indeed around 83%. |
| question_th |
Q33:
Chapter: 6 - Model Testing for Machine Learning Systems
Which of the following statements BEST summarizes adversarial testing of machine learning systems?
A
It verifies the machine learning system’s functionality by using previously working tests to avoid ML model failures during evaluation and tuning
B
It focuses on generating manual adversarial examples, without automation, to test the vulnerability of machine learning systems
C
It is a form of black-box testing, ignoring knowledge of the internals of the machine learning system to create adversarial examples
D
It identifies model vulnerabilities through adversarial examples, which are minimally perturbed inputs that induce misclassification
Correct Answer:
D :
It identifies model vulnerabilities through adversarial examples, which are minimally perturbed inputs that induce misclassification
In adversarial testing, the adversarial examples are often created to identify vulnerabilities by perturbing working inputs. |
| question_th |
Q34:
Chapter: 6 - Model Testing for Machine Learning Systems
An AI-based mobile phone search system provides a list of phones that it believes are most suitable for the user based on its knowledge of the user’s previous mobile phone usage and their specified preferences.
Metamorphic testing is being used with the following source test case: Which of the following options is MOST likely to be a valid list of recommended phones for the follow-up test cases?
A
T1: SnapHappy_X1, SnapHappy_M2, SnapHappy_M3, ClickNow_1000x, ClickNow_1000xs T2: SnapHappy_X1, SnapHappy_M2, SnapHappy_M3
B
T1: SnapHappy_M2, SnapHappy_M3, ClickNow_1000xs T2: SnapHappy_X1, ClickNow_1000x
C
T1: SnapHappy_X1, SnapHappy_M2 T2: ClickNow_1000x, ClickNow_1000xs
D
T1: SnapHappy_X1, SnapHappy_M2, SnapHappy_M3, ClickNow_1000x, ClickNow_1000xs T2: SnapHappy_X1, SnapHappy_M2, SnapHappy_M3, ClickNow_1000x, ClickNow_1000xs
Correct Answer:
B :
T1: SnapHappy_M2, SnapHappy_M3, ClickNow_1000xs T2: SnapHappy_X1, ClickNow_1000x
There is no overlap between the outputs of T1 and T2, and no camera is missing. |
| question_th |
Q35:
Chapter: 6 - Model Testing for Machine Learning Systems
An organization uses an ML model to predict customer churn but has no mechanism to get direct user feedback or track when customers leave. They want to test for drift by analyzing the data being fed into the live ML model. Which test type would be MOST appropriate in this situation and why?
A
Static drift testing, because it identifies drift by detecting changes in the data distributions without requiring ground truth
B
Dynamic drift testing, because comparing ML model predictions to actual results is the most direct way to measure performance degradation
C
Dynamic drift testing, because it can infer the current ground truth by analyzing the input data's statistical properties
D
Static drift testing, because it compares the ML model's live performance metrics against a predefined acceptance threshold
Correct Answer:
A :
Static drift testing, because it identifies drift by detecting changes in the data distributions without requiring ground truth
Because ground truth is unavailable, static drift testing, which doesn't require it, is the only viable option. |
| question_th |
Q36:
Chapter: 6 - Model Testing for Machine Learning Systems
When testing a trained ML model, the development team found that the model was highly accurate when evaluated using validation data but performed poorly with independent test data. Which of the following options is MOST likely to cause this situation?
A
Overfitting
B
Underfitting
C
Concept drift
D
Low acceptance criteria
Correct Answer:
A :
Overfitting
The poor performance on test data and good performance on validation data suggest overfitting. |
| question_th |
Q37:
Chapter: 6 - Model Testing for Machine Learning Systems
Which of the following statements BEST describes how A/B testing is used in the context of a machine learning system (MLS)?
A
A/B testing determines if an updated version of an MLS performs better than the previous version
B
A/B testing is primarily used to generate diverse test cases that cover all possible inputs for an MLS
C
A/B testing is used to verify that all components within an MLS interact correctly with each other
D
A/B testing focuses on analyzing the internal algorithm structure of an MLS to identify potential defects
Correct Answer:
A :
A/B testing determines if an updated version of an MLS performs better than the previous version
As stated in the syllabus, “Whenever the system is updated, A/B testing is used to test that the updated variant performs as well as, or better than, the previous variant.” This accurately describes how A/B testing is used in the context of ML systems. |
| question_th |
Q38:
Chapter: 6 - Model Testing for Machine Learning Systems
Which of the following descriptions of the creation of a pseudo-oracle used to support the back-to- back testing of an ML model is MOST likely to be CORRECT?
A
By fine-tuning the ML model that is being tested
B
By supplementing the ML model that is being tested with retrieval-augmented generation
C
By using a different ML development framework than the ML model that is being tested
D
By varying the hyperparameters used for training the ML model that is being tested
Correct Answer:
C :
By using a different ML development framework than the ML model that is being tested
Using a different ML development framework means using different libraries, potentially different underlying algorithms or implementations of algorithms, and a different ML development environment. This option maximizes the independence of the pseudo- oracle from the SUT, reducing the risk of shared defects and increasing the potential effectiveness of back-to-back testing. |
| question_th |
Q39:
Chapter: 7 - Machine Learning Development Testing
Which TWO of the following scenarios describe ML development risks that can be EFFECTIVELY mitigated by performing ML functional performance testing?
A
After a new installation, the development team needs a quick test to determine if the ML development framework’s essential services are running correctly
B
A team notices that their ML development framework becomes slow and even unresponsive when processing large batches of data
C
A new version of a core library used by the ML development framework appears to be causing the ML model to produce unexpected and inaccurate results
D
A project leader needs to decide between two potential algorithms based on their suitability for the project's goals
Correct Answer:
C :
A new version of a core library used by the ML development framework appears to be causing the ML model to produce unexpected and inaccurate results
This describes a potential used library defect. ML functional performance testing can evaluate model behavior and expose anomalies stemming from the use of a new library that is defective and causing a change in test results. |
| question_th |
Q40:
Chapter: 7 - Machine Learning Development Testing
Which of the following statements CORRECTLY describes how shadow testing differs from canary testing in the deployment of ML models?
A
Canary testing compares ML models running offline, while shadow testing is based on the use of live user data
B
Shadow testing affects the responses from real users, while canary testing does not involve real users
C
Shadow testing mirrors traffic without impacting users, while canary testing provides responses from the new ML model
D
Canary testing is used for performance testing, while shadow testing is mainly used for component integration testing
Correct Answer:
C :
Shadow testing mirrors traffic without impacting users, while canary testing provides responses from the new ML model
Shadow testing provides a risk-free way to see how a new ML model behaves with production data, whereas canary testing deliberately exposes a small subset of users to the new model to observe its real-world performance and impact. |