ISTQB Certified Tester AI Testing (CT-AI) v2.0 - ISTQB CT-AI v2.0 Sample Exam 1

question.id	Questions
question_th	Q1: Chapter: 1 - Introduction to Artificial Intelligence Which of the following statements BEST highlights the difference between conventional and AI- based systems? A AI-based systems tend to be faster than conventional systems because they do not rely on explicit programming, while conventional systems are slower due to rigid rule-based processing B AI-based systems tend to be deterministic and explainable, while conventional systems are probabilistic and often difficult to interpret C AI-based systems tend to be more suitable for critical tasks where explainability is crucial, while conventional systems are better at handling complex tasks in unregulated areas D AI-based systems tend to learn from patterns in data and can adapt to new scenarios, while conventional systems follow predefined rules and produce consistent outputs for the same inputs Correct Answer: D : AI-based systems tend to learn from patterns in data and can adapt to new scenarios, while conventional systems follow predefined rules and produce consistent outputs for the same inputs This correctly differentiates AI-based systems from conventional systems. AI-based/ML systems learn patterns in the data, and many can adapt through self-learning, while conventional systems use predefined logic and produce predictable outputs. Note: this does not apply to rule-based expert systems.
question_th	Q2: Chapter: 1 - Introduction to Artificial Intelligence Which TWO of the following statements BEST distinguish between narrow AI, general AI, and super AI? A Narrow AI operates independently of human input, general AI is used only in advanced robotics, and super AI improves human decision-making in specialized fields B Narrow AI is self-learning, general AI focuses on solving specialized problems, and super AI is limited to tasks defined during its development C Narrow AI performs specific tasks efficiently, general AI can handle a wide range of intellectual tasks like a human, and super AI surpasses human intelligence D Narrow AI is task-specific, general AI is a concept with limited real-world applications, and super AI describes cutting-edge generative AI models Correct Answer: C : Narrow AI performs specific tasks efficiently, general AI can handle a wide range of intellectual tasks like a human, and super AI surpasses human intelligence This accurately differentiates the three types of AI, highlighting narrow AI's task-specific nature, general AI's human-like versatility, and super AI's hypothetical nature and superior capabilities.
question_th	Q3: Chapter: 1 - Introduction to Artificial Intelligence Which of the following statements BEST describes the relationship between AI, ML, and DL? A ML is a subset of AI, and DL is a further subset of ML, representing a hierarchy of specialized technologies B DL and ML are essentially interchangeable terms, while AI represents a separate approach to problem-solving C DL is the foundational technology, and both ML and AI are specialized applications built upon its principles. D AI encompasses both ML and DL as distinct but important methodologies, working in parallel to solve complex problems Correct Answer: A : ML is a subset of AI, and DL is a further subset of ML, representing a hierarchy of specialized technologies AI is the broadest term. It refers to the general concept of creating machines that can perform tasks that typically require human intelligence. This includes a wide range of approaches and techniques, not just machine learning. ML is a subset of AI. It's a specific approach to achieving AI where machines are given the ability to learn from data without being explicitly programmed. Instead of writing specific rules for every situation, ML algorithms learn patterns and make predictions based on the data. DL is a subset of machine learning. It's a specialized type of ML that utilizes artificial neural networks with multiple layers (hence "deep").
question_th	Q4: Chapter: 1 - Introduction to Artificial Intelligence Which of the following statements BEST explains generative AI? A It focuses mainly on improving existing content, such as text, images, or audio, by optimizing it for classification or prediction tasks B It creates new content, such as text, images, or audio, by learning patterns from training data and generating outputs similar in nature C It is designed to analyze and understand existing data, such as text or images, allowing machines to categorize information and extract key insights D It creates models that are limited to generating text and images and cannot be used for creative tasks like drug discovery or data simulation Correct Answer: B : It creates new content, such as text, images, or audio, by learning patterns from training data and generating outputs similar in nature This accurately describes generative AI, highlighting its ability to create new content based on learned patterns in training data.
question_th	Q5: Chapter: 1 - Introduction to Artificial Intelligence Which of the following statements BEST compares the hardware options available for implementing machine learning systems? A Neuromorphic processors are a form of AI hardware specifically optimized for running on the von Neumann architecture B GPUs are well-suited for training and running ML models due to their ability to handle massively parallel processing, while CPUs are better for general-purpose computing C AI-specific hardware, such as ASICs, is primarily used for training ML models, while GPUs are better suited for edge computing D CPUs are more efficient than GPUs for ML tasks because they have faster clock speeds and can handle complex operations Correct Answer: B : GPUs are well-suited for training and running ML models due to their ability to handle massively parallel processing, while CPUs are better for general-purpose computing This comparison accurately highlights the advantages of GPUs over CPUs for machine learning tasks, due to their parallel processing capabilities, while CPUs excel in general-purpose computing.
question_th	Q6: Chapter: 1 - Introduction to Artificial Intelligence Given the following statements about AI model development and hosting: i. It achieves lower development costs by using public cloud resources, eliminating the need for local hardware investment ii. It uses local development of data preparation components for sensitive data for increased security before moving to the cloud for training of the full system iii. It results in lower costs because laptops are used for local development and there are low upfront hardware costs by hosting AI models on public clouds iv. It simplifies development and hosting by standardizing processes on local servers, removing the need for complex cloud-based configurations v. It guarantees the highest security by hosting AI models on private clouds, thereby avoiding the risks associated with local hardware vulnerabilities Which of the following BEST reflects advantages of using a hybrid approach for this development and hosting? A iii and iv B i, iv and v C ii and iii D i, ii and v Correct Answer: C : ii and iii Given the following statements about AI model development and hosting: i. It achieves lower development costs by using public cloud resources, eliminating the need for local hardware investment => This is not a hybrid approach as it only talks about development on the cloud. ii. It uses local development of data preparation components for sensitive data for increased security before moving to the cloud for training of the full system => This is hybrid as we have local development for data preparation and training on the cloud. iii. It results in lower costs because laptops are used for local development and there are low upfront hardware costs by hosting AI models on public clouds => This is hybrid as there is local development on laptops and hosting on public clouds. iv. It simplifies development and hosting by standardizing processes on local servers, removing the need for complex cloud-based configurations => This is not hybrid as everything is on local servers. v. It guarantees the highest security by hosting AI models on private clouds, thereby avoiding the risks associated with local hardware vulnerabilities => This is not hybrid as everything is on private clouds.
question_th	Q7: Chapter: 2 - Quality Characteristics for AI-Based Systems Given the following ISO/IEC 25059 quality characteristics: 1. AI Robustness 2. User controllability 3. Functional adaptability 4. Intervenability And the following examples of quality characteristic measures: A. The success rate of a remote operator in forcing a drone into the safe-landing protocol when its AI navigation system exhibits hazardous behavior B. The average time required to successfully override a fraud management system’s automated decision to block a customer's transaction C. The F1-score of an object detection model in an autonomous car in heavy rain D. The time required for an e-commerce recommendation engine to update its suggestions to reflect a new, rapidly emerging fashion trend Which of the following BEST matches the quality characteristics with the example measures? A 1A – 2D – 3C – 4B B 1C – 2B – 3D – 4A C 1C – 2D – 3B – 4A D 1D – 2A – 3C – 4B Correct Answer: B : 1C – 2B – 3D – 4A Considering each of the examples in turn: A. The success rate of a remote operator in forcing a drone into the safe- landing protocol when its AI navigation system exhibits hazardous behavior. This measure assesses the effectiveness of human intervention in the system's operation during hazardous AI behavior, aligning with the definition of intervenability: the ability to permit external intervention in automated processes. This matches with intervenability (4). B. The average time required to successfully override a fraud management system’s automated decision to block a customer's transaction. This measure is concerned with how quickly a user can override an automated system's decision and the degree to which a user can control or influence the system's actions. This is a direct indicator of user controllability (2). C. The F1-score of an object detection model in an autonomous car in heavy rain. The F1-score in adverse conditions (heavy rain) tests how well the model performs under challenging scenarios, which is a measure of AI robustness (1). D. The time required for an e-commerce recommendation engine to update its suggestions to reflect a new, rapidly emerging fashion trend. The speed at which a system adapts its recommendations to new trends reflects its ability to adapt its functionality to changing requirements, which is functional adaptability (3).
question_th	Q8: Chapter: 2 - Quality Characteristics for AI-Based Systems Which of the following statements BEST describes a key challenge when AI is used in safety- related systems? A The mature safety-related standards tend to be out-of-date and require the use of outdated AI technologies, which hinders innovative AI solutions in AI-based systems B When the requirements are too detailed, it leaves little room for the ML system to learn from the implicit goals contained within the training data C The potential for non-determinism and self-learning makes some AI-based systems unpredictable as they diverge from their original tested state D Because self-learning safety-related systems stop adapting after deployment, safety is often compromised by the need for manual updates of the operational AI-based system Correct Answer: C : The potential for non-determinism and self-learning makes some AI-based systems unpredictable as they diverge from their original tested state This unpredictability is a core safety challenge when deploying AI in safety-related systems, as it complicates verification, validation, and ongoing assurance of safe operation.
question_th	Q9: Chapter: 2 - Quality Characteristics for AI-Based Systems Which of the following examples is LEAST likely to be a valid acceptance criterion for AI-specific quality characteristics defined in the ISO 25059 standard for an AI-based system? A The spam alert control system is easy for the user to set up and requires minimal technical expertise to maintain B The security guard in the museum control room can trigger an immediate 'all-stop' command that causes the patrol robot to cease all motion within 0.5 seconds to prevent collision with a sculpture C A greenhouse control system reacts within 20 minutes when the measured humidity is greater than 10% from the optimum humidity D When the analysis tool flags a retinal scan for severe diabetic retinopathy, it shall display a visual heatmap overlay on the image, highlighting the key features Correct Answer: A : The spam alert control system is easy for the user to set up and requires minimal technical expertise to maintain It is the LEAST likely option because it focuses on some of the sub-characteristics of usability, such as learnability and operability, which are generic quality characteristics applicable to most systems, and are not directly associated with the AI-specific characteristics in ISO 25059 (i.e. user controllability and transparency, which are sub- characteristics of usability for AI-based systems). The acceptance criterion is also more subjective than the other options and so more challenging to test.
question_th	Q10: Chapter: 3 - Machine Learning Given the following forms of ML: 1. Clustering 2. Reinforcement learning 3. Classification 4. Regression And the following examples: A. The mobile game app updates its feedback, response timing, and the number of user options it provides based on how much the players spend B. The language translation app searches the internet to find text provided in multiple languages to improve its translation function C. A manufacturing company predicts when equipment is likely to fail based on sensor data and historical maintenance records D. A social network platform groups its users into communities based on their interactions with each other and their stated interests Which of the following BEST matches the examples with the forms of ML? A 1D – 2C – 3B – 4A B 1B – 2C – 3D – 4A C 1A – 2D – 3B – 4C D 1C – 2D – 3A – 4B Correct Answer: B : 1B – 2C – 3D – 4A Considering the descriptions of example systems: 1. Reinforcement learning - The amount spent can be considered the reward function for this system, with the system changing its behavior to increase the amount spent. (B) 2. Classification - The app utilizes text in what can be considered a source language and a corresponding ‘correct’ translation of this source, employing a form of supervised learning with no explicit reward function mentioned. Classification uses parallel training data—pairs of matching sentences in two languages (e.g., "Hello" in English and "Bonjour" in French)—where each pair has a label linking source to target text. The model learns mappings via supervised training, encoding input text features and predicting the correct translation output, minimizing errors across many such pairs to generalize to new sentences. (C) 3. Regression is used in this scenario to predict a continuous outcome (in this case, the time until equipment failure) based on a set of input variables (sensor data and historical maintenance records). By analyzing the relationship between these variables, the ML model can identify patterns and trends that indicate when equipment is at risk of failure. (D) 4. Clustering - By analyzing user interactions (e.g., likes, comments, shared content) and stated interests, the social network platform can identify patterns and similarities among calibera users. These patterns can then be used to group users into communities or "clusters" that share common interests and behaviors. (A)
question_th	Q11: Chapter: 3 - Machine Learning Given the following activities from the ML workflow: 1. Deploy the Model 2. Prepare & Test Data 3. Test the Model 4. Evaluate the Model And, given the following descriptions: A. Model performance is tested using validation data B. The origin of the test data used to test the model is identified C. Test data are used to verify the agreed performance criteria are met D. The model is tested on the target platform Which of the following BEST matches the descriptions with the activities in the ML workflow? A 1A – 2C – 3B – 4D B 1D – 2B – 3A – 4C C 1C – 2D – 3B – 4A D 1D – 2B – 3C – 4A Correct Answer: D : 1D – 2B – 3C – 4A 1. Model performance is tested using validation data This is part of the ‘Evaluate the Model’ activity. (D) 2. The origin of the test data used to test the model is identified This is part of the ‘Prepare & Test Data’ activity. (B) 3. Test data are used to determine that the agreed performance criteria are met This is part of the ‘Test the Model’ activity. (C) 4. The model is tested on the target platform This is part of the ‘Deploy the Model’ activity. (A)
question_th	Q12: Chapter: 3 - Machine Learning Which of the following statements about the use of pre-trained models is CORRECT? A To successfully adapt a pre-trained neural network, fine-tuning requires that additional training with new data is applied to all layers of the network B The RAG approach requires the identification and acquisition of data relevant to the task up front, but requires no changes to the pre-trained model C Fine-tuning a biased LLM with high-quality, task-specific data prevents unfair outputs based on sensitive attributes D When applied to a neural network, the RAG approach works by adding additional layers that hold documentation specifically related to the prompt Correct Answer: B : The RAG approach requires the identification and acquisition of data relevant to the task up front, but requires no changes to the pre-trained model RAG relies on curating and indexing relevant external data before use, but it does not require modifying the architecture or parameters of the pre-trained model itself; instead, it augments the model's inputs with retrieved information at runtime.
question_th	Q13: Chapter: 3 - Machine Learning Which of the following statements BEST describes a key activity in data preparation for machine learning? A Exploratory data analysis (EDA) is a form of exploratory testing applied to the data preparation activities of acquisition, pre-processing and labelling B Data preparation comprises the identification and gathering of data from various sources, providing the algorithm with raw data to learn from C Feature engineering of the data is performed after an ML model has been trained to optimize its performance for real-world deployment D Data pre-processing includes augmentation and sampling, which either add or reduce the number of examples in the training data respectively Correct Answer: D : Data pre-processing includes augmentation and sampling, which either add or reduce the number of examples in the training data respectively This correctly identifies two common data pre-processing techniques: augmentation (increasing data by creating modified versions) and sampling (reducing data by selecting subsets).
question_th	Q14: Chapter: 3 - Machine Learning Which of the following statements BEST contrasts the roles of training, validation, and test datasets in ML model development? A The training dataset is used to create the model, the validation dataset is used to tune the model, and the test dataset evaluates its performance on unseen data B The training dataset is used to optimize hyperparameters, the validation dataset is used to tune predictions, and the test dataset is used to generate training data C The training dataset is used for final model evaluation, the validation dataset ensures the model does not overfit, and the test dataset is used for tuning hyperparameters D The training dataset ensures the model generalizes well, the validation dataset is used to deploy the model, and the test dataset is used for initial evaluation Correct Answer: A : The training dataset is used to create the model, the validation dataset is used to tune the model, and the test dataset evaluates its performance on unseen data The training dataset is used to fit or create the model, the validation dataset is used to tune the model’s hyperparameters and prevent overfitting, and the test dataset is reserved for evaluating the model’s performance on unseen data. It accurately describes the distinct and sequential roles of each dataset in the ML workflow.
question_th	Q15: Chapter: 3 - Machine Learning Consider the following confusion matrix for an image classifier: Which of the following options represents the CORRECT formula for calculating the precision of the classifier? A (22/100) 100 B (78/100) 100 C (20/120) 100 D (78/120) 100 Correct Answer: B : (78/100) 100 The formula for Precision = TP/ (TP+FP) 100 = 78/(78+22) = 78/100 *100
question_th	Q16: Chapter: 3 - Machine Learning During the training of a deep neural network, the network produces an output, and the loss is calculated. Which of the following CORRECTLY describes the next step in the training process? A The network's hidden layers are reset with new random weight values B The activation functions are altered to different non-linear formulas to find a better fit C The same training data is passed through the network again to confirm the loss value D The weights and biases across the network are adjusted to reduce the calculated loss Correct Answer: D : The weights and biases across the network are adjusted to reduce the calculated loss This is the next step in the ML training loop. The loss is fed back through the network to adjust the values of the weights and biases.
question_th	Q17: Chapter: 4 - Testing AI-Based Systems Given the following examples of AI-based systems: i. A system that learns from real-time data to improve its failure predictions and automatically updates maintenance schedules ii. A spam filter for an email app, which identifies spam based on predefined rules iii. A recommendation engine on a streaming service that updates its suggestions based on a user’s changing viewing habits and preferences iv. A personal assistant that learns continuously from its user v. A rule-based system for medical diagnosis Which of the following BEST describes the systems which can be considered to be locked AI-based systems? A ii, iv, and v B i, iii, and iv C i, and iii D ii and v Correct Answer: D : ii and v Considering the provided example AI-based systems: i. A system that learns from real-time data to improve its failure predictions and automatically updates maintenance schedules. This is a form of adaptive AI-based system that changes the schedules based on real-time data. ii. A spam filter in an email app, which identifies spam based on predefined rules. This is a form of locked AI-based system, which generates predictable test results, as it is based on following predefined spam identification rules that will not be changed until the rule-based model is rebuilt. iii. A recommendation engine on a streaming service that updates its suggestions based on a user’s changing viewing habits and preferences. This is a form of adaptive system that adapts based on the user’s viewing habits and preferences. iv. A personal assistant that learns from its user. This is an adaptive AI-based system because it adapts its parameters based on users’ interactions with the system. v. A rule-based system for medical diagnosis. This is a form of a locked AI-based system, which will not change the domain rules until the rule-based model is rebuilt.
question_th	Q18: Chapter: 4 - Testing AI-Based Systems Which of the following options BEST explains why a statistical approach is necessary when testing an AI-based system? A The system is typically large and complex, making test automation impractical without the use of statistical sampling B The system exhibits non-deterministic behavior, requiring large and representative test datasets to draw meaningful conclusions C The system is trained on real-world data and therefore does not require a separate test dataset; instead, it requires statistical validation D A single test case is sufficient to determine if a model is well-calibrated, but only statistical methods can verify accuracy Correct Answer: B : The system exhibits non-deterministic behavior, requiring large and representative test datasets to draw meaningful conclusions AI-based systems are non-deterministic and require extensive, representative test datasets to achieve statistical significance, especially in assessing functional correctness statistically.
question_th	Q19: Chapter: 4 - Testing AI-Based Systems Why is setting a ‘seed’ only a LIMITED solution for addressing the test oracle problem in AI systems? A It ensures reproducibility for individual test runs but cannot alter the model's inherent probabilistic nature in production B Its success depends upon extensive consultation with domain experts to select the most appropriate seed value for the tests C It is a method that only functions for AI systems built from complete and detailed technical specifications D It introduces a high degree of subjectivity into the model's behavior, which complicates the test evaluation process. Correct Answer: A : It ensures reproducibility for individual test runs but cannot alter the model's inherent probabilistic nature in production The setting of a ‘seed’ is bound to a specific test execution and does not solve the fundamental probabilistic nature of the model when it operates in a real-world environment.
question_th	Q20: Chapter: 4 - Testing AI-Based Systems Which of the following statements BEST describes a common approach to testing GenAI models? A Since GenAI models are probabilistic, formal testing is unnecessary because outputs will always vary and cannot be evaluated effectively B GenAI models are tested by verifying that their outputs exactly match predefined expected results C It involves manipulating diverse inputs and parameters and then assessing the output's adherence to rules, because a direct input-to-output match is not feasible D GenAI models are mainly tested through manual review, as automated testing does not apply to creative AI outputs Correct Answer: C : It involves manipulating diverse inputs and parameters and then assessing the output's adherence to rules, because a direct input-to-output match is not feasible Because of the variability and complexity in GenAI outputs, testing focuses on coherence, rules compliance, and plausibility, rather than matching fixed expected outputs. Diverse inputs, optional prompts, and parameters all influence the test results.
question_th	Q21: Chapter: 4 - Testing AI-Based Systems Your organization plans to deploy a GenAI-powered legal document generator. During implementation planning, the security team argues for focusing red teaming efforts on preventing prompt injection attacks, while the compliance team wants to prioritize bias detection in legal advice generation. The development team suggests running red teaming after the system goes live to save time. Which of the following is the MOST effective red teaming implementation strategy for this scenario? A Use attack scenarios covering both security and bias vulnerabilities before deployment B Prioritize security vulnerabilities first, then address bias issues in a separate phase after deployment C Focus on bias detection since legal advice accuracy is more critical than security concerns D Deploy immediately and conduct red teaming reactively based on incidents and user feedback Correct Answer: A : Use attack scenarios covering both security and bias vulnerabilities before deployment Red teaming should be conducted before deployment, not after and multiple vulnerability types (security and bias) should be tested together.
question_th	Q22: Chapter: 4 - Testing AI-Based Systems An ML-based weather prediction system provides excellent results for most locations; however, it has been noticed that the predictions for areas in the UK with an altitude greater than 1250 meters are regularly inaccurate. Which of the following test levels should have been performed MORE thoroughly? A Input data testing B ML model testing C System testing D Component integration testing Correct Answer: A : Input data testing Input data testing focuses on data quality and on the representativeness of data. It appears that areas in the UK with an altitude greater than 1,250 meters were not adequately represented in the training data.
question_th	Q23: Chapter: 4 - Testing AI-Based Systems Which of the following statements about applying risk-based testing to ML systems is CORRECT? A Security and usability risks can apply to any system, while risks associated with data bias and ML model performance are specific to ML systems B Risk management in conventional systems is a static approach, while in a self-learning system risk needs to be adjusted dynamically C Conventional systems handle risks related to data bias, whereas ML systems focus on risks related to algorithmic bias D In conventional systems, functional correctness is the primary risk factor, whereas functional adaptability is the primary risk factor for an ML system Correct Answer: A : Security and usability risks can apply to any system, while risks associated with data bias and ML model performance are specific to ML systems Security and usability are generic quality characteristics and can apply to any system. Data bias and model performance are potential risks specific to ML systems.
question_th	Q24: Chapter: 5 - Input Data Testing for Machine Learning Systems The team developing a new machine learning model has received a dataset from a new, unverified third-party vendor. They are uncertain about the origin of this dataset and concerned that the raw data may have been tampered with. Which of the following test approaches is MOST suitable for addressing this specific risk? A Feature testing B Data representativeness testing C Data provenance testing D Dataset constraint testing Correct Answer: C : Data provenance testing The scenario describes a need to verify the data's origin and to determine that it hasn't been tampered with. This maps to data provenance testing.
question_th	Q25: Chapter: 5 - Input Data Testing for Machine Learning Systems A financial services company has developed an ML system for loan approval. A tester responsible for testing this system wants to determine if there is potential bias in the system related to factors such as gender and age attributes. Which of the following approaches would be MOST suitable for detecting bias in the training data early? A Conducting static analysis of the model’s source code to identify how its operations on the age and gender related attributes could lead to bias B Testing using a dataset that is representative and analyzing the predictions for statistically significant differences in outcomes across age and gender C Performing disparate impact analysis using counterfactuals based on gender, age or both D Reviewing the overall ML workflow and data preparation processes to identify potential sources of bias introduction Correct Answer: D : Reviewing the overall ML workflow and data preparation processes to identify potential sources of bias introduction This approach could help highlight where bias might be introduced, but it relies on process checks rather than actual evidence from the data.
question_th	Q26: Chapter: 5 - Input Data Testing for Machine Learning Systems What is a KEY difference in the test strategy for a data pipeline built for training versus one designed for an operational system? A A training pipeline would rely almost exclusively on fault injection and back-to-back tests, while an operational pipeline would be limited to unit and component integration testing B Configuration management reviews would be critical for operational pipelines but are considered unnecessary for the less formal nature of exploratory training pipelines C Testing a training pipeline would primarily focus on determining that data is handled correctly, while an operational pipeline's testing would emphasize high performance and reliability under load D Testing an operational pipeline would focus mainly on validating individual transformation scripts, whereas a training pipeline's testing would prioritize end-to-end system testing Correct Answer: C : Testing a training pipeline would primarily focus on determining that data is handled correctly, while an operational pipeline's testing would emphasize high performance and reliability under load The purpose of the pipeline dictates the test strategy. Training pipelines prioritize generating high quality data, whereas live operational pipelines focus on non-functional aspects such as performance efficiency, scalability, and AI robustness.
question_th	Q27: Chapter: 5 - Input Data Testing for Machine Learning Systems What is the purpose of creating a ‘reference dataset’ during data representativeness testing? A It serves as a universally applicable dataset derived from trusted industry benchmarks B It functions as a quantitative baseline for formally comparing the statistical properties of the training data C It is used to apply stratified sampling directly to the training data to verify that all subgroups are covered D It serves as the primary dataset for performing the final model validation and testing with high independence Correct Answer: B : It functions as a quantitative baseline for formally comparing the statistical properties of the training data The reference dataset’s core purpose is to provide a statistical baseline, allowing practitioners to objectively compare the distributions, feature correlations, and coverage of the training data against what is expected in real-world operational environments.
question_th	Q28: Chapter: 5 - Input Data Testing for Machine Learning Systems You are testing an ML dataset for a banking loan approval system. The dataset contains the following attributes: • applicant_id (integer, unique) • annual_income (decimal, in USD) • loan_amount (decimal, in USD) • credit_score (integer, 300-850) • employment_years (integer, 0-50) • monthly_payment (decimal, in USD) The following business rules apply: • Loan amount cannot exceed 5 times annual income • Monthly payment = loan_amount / 324 (for 30-year loans) • Credit scores must be within the standard range • All monetary values must be positive When applying dataset constraint testing to validate the ‘monthly_payment’ for 30-year loans, which type of constraint would be MOST appropriate? A Single-value range constraint B Multi-value duplicate constraint C Multi-value count constraint D Comparison correlate constraint, Correct Answer: D : Comparison correlate constraint, This scenario requires validating a mathematical relationship between two attributes (monthly_payment and loan_amount). The correlate constraint is specifically designed to check that values for one attribute correlate with values for another attribute according to a defined rule - in this case, that monthly_payment equals loan_amount/324.
question_th	Q29: Chapter: 5 - Input Data Testing for Machine Learning Systems Which of the following BEST explains the role of multiple annotations in data label correctness testing? A Multiple annotations point to defects in data points with high model loss B Multiple annotations compare label distributions across datasets to detect potential defects C Multiple annotations can be used to automate label verification to ensure consistency D Multiple annotations can point to label defects based on the comparison of annotations Correct Answer: D : Multiple annotations can point to label defects based on the comparison of annotations Multiple annotation involves data points being independently labeled by multiple annotators. Comparison of these labels leads to the discovery of disagreements among annotators, highlighting potential defects.
question_th	Q30: Chapter: 6 - Model Testing for Machine Learning Systems Given the following test techniques and test types: 1. ML functional performance testing 2. Testing for bias 3. Adversarial testing 4. Drift testing And the following risks: A. The ML model might perform differently for different demographic groups B. Slightly modified inputs to the ML model might cause quite different and unexpected responses C. Predictions made by the ML model might be inaccurate in some cases D. ML model accuracy might have significantly decreased since it was deployed Which of the following options BEST matches the test techniques and test types with the risks? A 1C – 2A – 3B – 4D B 1B – 2A – 3C – 4D C 1A – 2D – 3C – 4B D 1A – 2C – 3D – 4B Correct Answer: A : 1C – 2A – 3B – 4D Considering each of the risks in turn: A. The ML model might perform differently for different demographic groups. This is best matched with Testing for Bias, which explicitly examines fairness across different populations. (2) B. Slightly modified inputs to the ML model might cause quite different and unexpected responses. This pairs correctly with Adversarial Testing, which deliberately tests an ML model's ability to handle small input perturbations. (3) C. Predictions made by the ML model might be inaccurate in some cases. This is best addressed by ML Functional Performance Testing, which directly evaluates whether the ML model produces accurate outputs. (1) D. ML model accuracy might have significantly decreased since it was deployed. This is appropriately matched with Drift Testing, which monitors performance changes over time in production. (4)
question_th	Q31: Chapter: 6 - Model Testing for Machine Learning Systems Which of the following statements about the documentation of ML models is CORRECT? A Changes made by self-learning ML systems are fully documented at the time of the update to ensure up-to-date documentation B Interaction of AI and non-AI components is within the scope of the model documentation C Bias-related characteristics of input data are challenging to assess as the model documentation doesn’t contain the source of the training data D Information related to the speed of model prediction is a part of the documentation of an AI component Correct Answer: D : Information related to the speed of model prediction is a part of the documentation of an AI component This is a part of the non-functional requirement/documentation.
question_th	Q32: Chapter: 6 - Model Testing for Machine Learning Systems Upon release, an ML model provided correct predictions fewer times than expected. Additional tests have been conducted, and the level of accuracy for these tests was measured at 83%. Previously, the ML model had achieved an accuracy of 83% ± 4% at a confidence level of 94%. Which of the following options is MOST likely to represent the current situation? A accuracy of 83% ± 6% at 94% confidence level B accuracy of 85% ± 4% at 94% confidence level C accuracy of 83% ± 4% at 92% confidence level D accuracy of 83% ± 2% at 94% confidence level Correct Answer: D : accuracy of 83% ± 2% at 94% confidence level When more tests are run for the same measured accuracy (83%), the margin of error typically decreases if the confidence level stays the same. This is because a larger sample size can reduce the margin of error and/or increase our statistical confidence in the measured value. If the ML model previously achieved 83% accuracy with a margin of error of ± 4% at a 94% confidence level, and additional tests were run while maintaining the same 83% accuracy measurement and maintaining the confidence level at 94%, this would result in a lower margin of error (e.g. ± 2%). This indicates we can be more certain that the true accuracy is indeed around 83%.
question_th	Q33: Chapter: 6 - Model Testing for Machine Learning Systems Which of the following statements BEST summarizes adversarial testing of machine learning systems? A It focuses on generating manual adversarial examples, without automation, to test the vulnerability of machine learning systems B It is a form of black-box testing, ignoring knowledge of the internals of the machine learning system to create adversarial examples C It identifies model vulnerabilities through adversarial examples, which are minimally perturbed inputs that induce misclassification D It verifies the machine learning system’s functionality by using previously working tests to avoid ML model failures during evaluation and tuning Correct Answer: C : It identifies model vulnerabilities through adversarial examples, which are minimally perturbed inputs that induce misclassification In adversarial testing, the adversarial examples are often created to identify vulnerabilities by perturbing working inputs.
question_th	Q34: Chapter: 6 - Model Testing for Machine Learning Systems An AI-based mobile phone search system provides a list of phones that it believes are most suitable for the user based on its knowledge of the user’s previous mobile phone usage and their specified preferences. Metamorphic testing is being used with the following source test case: Which of the following options is MOST likely to be a valid list of recommended phones for the follow-up test cases? A T1: SnapHappy_X1, SnapHappy_M2, SnapHappy_M3, ClickNow_1000x, ClickNow_1000xs T2: SnapHappy_X1, SnapHappy_M2, SnapHappy_M3, ClickNow_1000x, ClickNow_1000xs B T1: SnapHappy_X1, SnapHappy_M2 T2: ClickNow_1000x, ClickNow_1000xs C T1: SnapHappy_X1, SnapHappy_M2, SnapHappy_M3, ClickNow_1000x, ClickNow_1000xs T2: SnapHappy_X1, SnapHappy_M2, SnapHappy_M3 D T1: SnapHappy_M2, SnapHappy_M3, ClickNow_1000xs T2: SnapHappy_X1, ClickNow_1000x Correct Answer: D : T1: SnapHappy_M2, SnapHappy_M3, ClickNow_1000xs T2: SnapHappy_X1, ClickNow_1000x There is no overlap between the outputs of T1 and T2, and no camera is missing.
question_th	Q35: Chapter: 6 - Model Testing for Machine Learning Systems An organization uses an ML model to predict customer churn but has no mechanism to get direct user feedback or track when customers leave. They want to test for drift by analyzing the data being fed into the live ML model. Which test type would be MOST appropriate in this situation and why? A Dynamic drift testing, because it can infer the current ground truth by analyzing the input data's statistical properties B Dynamic drift testing, because comparing ML model predictions to actual results is the most direct way to measure performance degradation C Static drift testing, because it compares the ML model's live performance metrics against a predefined acceptance threshold D Static drift testing, because it identifies drift by detecting changes in the data distributions without requiring ground truth Correct Answer: D : Static drift testing, because it identifies drift by detecting changes in the data distributions without requiring ground truth Because ground truth is unavailable, static drift testing, which doesn't require it, is the only viable option.
question_th	Q36: Chapter: 6 - Model Testing for Machine Learning Systems When testing a trained ML model, the development team found that the model was highly accurate when evaluated using validation data but performed poorly with independent test data. Which of the following options is MOST likely to cause this situation? A Underfitting B Overfitting C Concept drift D Low acceptance criteria Correct Answer: B : Overfitting The poor performance on test data and good performance on validation data suggest overfitting.
question_th	Q37: Chapter: 6 - Model Testing for Machine Learning Systems Which of the following statements BEST describes how A/B testing is used in the context of a machine learning system (MLS)? A A/B testing determines if an updated version of an MLS performs better than the previous version B A/B testing is primarily used to generate diverse test cases that cover all possible inputs for an MLS C A/B testing is used to verify that all components within an MLS interact correctly with each other D A/B testing focuses on analyzing the internal algorithm structure of an MLS to identify potential defects Correct Answer: A : A/B testing determines if an updated version of an MLS performs better than the previous version As stated in the syllabus, “Whenever the system is updated, A/B testing is used to test that the updated variant performs as well as, or better than, the previous variant.” This accurately describes how A/B testing is used in the context of ML systems.
question_th	Q38: Chapter: 6 - Model Testing for Machine Learning Systems Which of the following descriptions of the creation of a pseudo-oracle used to support the back-to- back testing of an ML model is MOST likely to be CORRECT? A By using a different ML development framework than the ML model that is being tested B By fine-tuning the ML model that is being tested C By supplementing the ML model that is being tested with retrieval-augmented generation D By varying the hyperparameters used for training the ML model that is being tested Correct Answer: A : By using a different ML development framework than the ML model that is being tested Using a different ML development framework means using different libraries, potentially different underlying algorithms or implementations of algorithms, and a different ML development environment. This option maximizes the independence of the pseudo- oracle from the SUT, reducing the risk of shared defects and increasing the potential effectiveness of back-to-back testing.
question_th	Q39: Chapter: 7 - Machine Learning Development Testing Which TWO of the following scenarios describe ML development risks that can be EFFECTIVELY mitigated by performing ML functional performance testing? A A new version of a core library used by the ML development framework appears to be causing the ML model to produce unexpected and inaccurate results B A project leader needs to decide between two potential algorithms based on their suitability for the project's goals C After a new installation, the development team needs a quick test to determine if the ML development framework’s essential services are running correctly D A team notices that their ML development framework becomes slow and even unresponsive when processing large batches of data Correct Answer: A : A new version of a core library used by the ML development framework appears to be causing the ML model to produce unexpected and inaccurate results This describes a potential used library defect. ML functional performance testing can evaluate model behavior and expose anomalies stemming from the use of a new library that is defective and causing a change in test results.
question_th	Q40: Chapter: 7 - Machine Learning Development Testing Which of the following statements CORRECTLY describes how shadow testing differs from canary testing in the deployment of ML models? A Canary testing compares ML models running offline, while shadow testing is based on the use of live user data B Shadow testing affects the responses from real users, while canary testing does not involve real users C Shadow testing mirrors traffic without impacting users, while canary testing provides responses from the new ML model D Canary testing is used for performance testing, while shadow testing is mainly used for component integration testing Correct Answer: C : Shadow testing mirrors traffic without impacting users, while canary testing provides responses from the new ML model Shadow testing provides a risk-free way to see how a new ML model behaves with production data, whereas canary testing deliberately exposes a small subset of users to the new model to observe its real-world performance and impact.