Amazon AWS Certified Machine Learning - Specialty (MLS-C01)

Get full access to the updated question bank and confidently prepare for your exam.

Vendor

Amazon

Certification

Specialty Certifications

Content

390 Qs

Status

Verified

Updated

3 days ago

Best Value Bundle

Premium Bundle

Complete Success Suite

$103 $59

Save $44 Instantly

  • ✓ Full PDF + Interactive Engine: Everything you need to pass
  • ✓ All Advanced Question Types: Drag & Drop, Hotspots, Case Studies
  • ✓ Priority 24/7 Expert Support: Direct line to certification leads
  • ✓ 90 Days Free Priority Updates: Stay current as exams change

Success Metric

98.4% Pass Rate

Verified by 15k+ Students
Secure Checkout
Popular

Standard Simulation

Practice Engine

$54

One-Time Payment

  • Web-Based (Zero Install)
  • Real Testing Environment Virtual & Practice Modes
  • Interactive Engine Drag & Drop, Hotspots
  • 60 Days Free Updates

Compatible with All Devices

Chrome
Verified Secure Checkout

Basic Tier

PDF Study Guide

$49

Digital Access

  • ✓ Exam Questions (PDF)
  • ✓ Mobile Friendly
  • ✓ 60 Days Updates
Download Free Sample PDF

Verified 78-Question Preview (MLS-C01)


Exam Overview

The AWS Certified Machine Learning - Specialty certification is a rigorous validation of your expertise in designing, implementing, deploying, and maintaining machine learning solutions on the Amazon Web Services platform. Achieving this credential signifies a deep understanding of core machine learning concepts, AWS ML services like Amazon SageMaker, and the ability to solve complex business problems using AI/ML. This certification not only enhances your professional credibility and marketability but also empowers you to drive innovation, optimize workflows, and deliver significant business value by leveraging AWS's comprehensive suite of machine learning tools. It's a testament to your capability in operationalizing ML models from data preparation to production deployment.

Questions

65

Passing Score

750/1000

Duration

170 Minutes

Difficulty

Expert

Level

Specialist

Skills Measured

Data Engineering: Ability to design and implement robust data ingestion, transformation, and storage solutions for machine learning workflows using AWS services.
Exploratory Data Analysis (EDA): Expertise in performing statistical analysis, data visualization, and feature engineering to prepare data for model training.
Modeling: Proficiency in selecting appropriate ML algorithms, training, tuning, and evaluating models for various problem types using Amazon SageMaker and other services.
Machine Learning Implementation and Operations (MLOps): Skills in deploying, monitoring, scaling, and managing ML models in production environments, ensuring reliability and performance.
Security and Cost Optimization: Understanding of best practices for securing ML workloads, ensuring data privacy, and optimizing the cost of AWS ML services.

Career Path

Target Roles

Machine Learning Engineer, Data Scientist, Solutions Architect (with ML focus)

Common Questions

Is the material up to date?

Yes. We update our question bank weekly to match the latest Amazon standards. You get free updates for 90 days.

What format do I get?

You get instant access to both the **PDF** (for reading) and our **Premium Test Engine** (for exam simulation).

Is there a guarantee?

Absolutely. If you fail the MLS-C01 exam using our materials, we offer a full money-back guarantee.

When do I get the download?

Instantly. The download link is available in your dashboard immediately after payment is confirmed.

Free Study Guide Samples

Previewing updated MLS-C01 bank (78 Questions).

QUESTION 1

An interactive online dictionary wants to add a widget that displays words used in similar contexts. A Machine Learning Specialist is asked to provide word features for the downstream nearest neighbor model powering the widget.

What should the Specialist do to meet these requirements?

A
Create one-hot word encoding vectors.
B
Produce a set of synonyms for every word using Amazon Mechanical Turk.
C
Create word embedding vectors that store edit distance with every other word.
D
Download word embeddings pre-trained on a large corpus.

Correct Option: D

✅ Download word embeddings pre-trained on a large corpus.
Description: Pre-trained word embeddings are vector representations of words that have been learned from massive text corpora (e.g., Wikipedia, Common Crawl, Google News). These embeddings, such as Word2Vec, GloVe, or fastText, capture semantic relationships and syntactic regularities between words based on their co-occurrence patterns in the training data. Each word is mapped to a vector in a continuous vector space, where words with similar meanings are located closer together.

Why this fits: The downstream nearest neighbor model needs features in which words used in similar contexts lie close together. Pre-trained embeddings provide exactly that, and downloading them avoids the very large text corpus and compute that training embeddings from scratch would require. One-hot vectors (Option A) are all equally distant from one another, so they carry no notion of similarity. Synonym lists from Amazon Mechanical Turk (Option B) are costly to collect and are not numeric word features. Edit distance (Option C) measures similarity of spelling, not similarity of context.

Example: An NLP engineer is building a sentiment analysis model for customer reviews. Instead of training word embeddings from scratch on their relatively small dataset of reviews, they can download pre-trained GloVe embeddings. They then use these pre-trained embeddings as the initial layer in their neural network, fine-tuning them slightly during the training process with their review data to adapt them to the specific domain.
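
As a minimal sketch of how a nearest neighbor model could use such embeddings, the snippet below ranks words by cosine similarity. The three-dimensional vectors and the names `EMBEDDINGS` and `nearest_words` are made up for illustration; real pre-trained vectors would be loaded from a downloaded file and have many more dimensions.

```python
import math

# Hypothetical toy vectors standing in for real pre-trained embeddings.
EMBEDDINGS = {
    "river": [0.9, 0.1, 0.0],
    "stream": [0.8, 0.2, 0.1],
    "bank": [0.5, 0.5, 0.2],
    "loan": [0.1, 0.9, 0.3],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_words(word, embeddings, k=2):
    """Return the k words whose vectors are most similar to `word`."""
    query = embeddings[word]
    scored = [
        (cosine_similarity(query, vector), other)
        for other, vector in embeddings.items()
        if other != word
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [other for _, other in scored[:k]]
```

With these toy vectors, `nearest_words("river", EMBEDDINGS)` ranks "stream" ahead of "bank", which is the behavior the widget needs.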



QUESTION 2

A data engineer is preparing a dataset that a retail company will use to predict the number of visitors to stores. The data engineer created an Amazon S3 bucket. The engineer subscribed the S3 bucket to an AWS Data Exchange data product for general economic indicators. The data engineer wants to join the economic indicator data to an existing table in Amazon Athena to merge with the business data. All these transformations must finish running in 30-60 minutes.

Which solution will meet these requirements MOST cost-effectively?

A
Configure the AWS Data Exchange product as a producer for an Amazon Kinesis data stream. Use an Amazon Kinesis Data Firehose delivery stream to transfer the data to Amazon S3. Run an AWS Glue job that will merge the existing business data with the Athena table. Write the result set back to Amazon S3.
B
Use an S3 event on the AWS Data Exchange S3 bucket to invoke an AWS Lambda function. Program the Lambda function to use Amazon SageMaker Data Wrangler to merge the existing business data with the Athena table. Write the result set back to Amazon S3.
C
Use an S3 event on the AWS Data Exchange S3 bucket to invoke an AWS Lambda function. Program the Lambda function to run an AWS Glue job that will merge the existing business data with the Athena table. Write the results back to Amazon S3.
D
Provision an Amazon Redshift cluster. Subscribe to the AWS Data Exchange product and use the product to create an Amazon Redshift table. Merge the data in Amazon Redshift. Write the results back to Amazon S3.

Correct Option: C

✅ Use an S3 event on the AWS Data Exchange S3 bucket to invoke an AWS Lambda function. Program the Lambda function to run an AWS Glue job that will merge the existing business data with the Athena table. Write the results back to Amazon S3.
Description: This solution outlines an event-driven, serverless data integration and transformation pipeline. AWS Data Exchange provides data products, typically delivering data files to an Amazon S3 bucket within the subscriber's account. An S3 event notification can then automatically trigger an AWS Lambda function when new data arrives. Lambda acts as an orchestrator, initiating an AWS Glue job. AWS Glue is a fully managed extract, transform, and load (ETL) service that can process large datasets, integrating new data with existing business data often cataloged in the AWS Glue Data Catalog and queryable via Amazon Athena. The processed and merged data is then written back to Amazon S3.

Why this fits: This approach is highly scalable, cost-effective, and fits a common data lake pattern.

  1. Event-Driven Ingestion: S3 event notifications provide an immediate, automated trigger when new data from AWS Data Exchange lands in the S3 bucket, eliminating manual checks.
  2. Serverless Orchestration: AWS Lambda is a lightweight, serverless compute service ideal for orchestrating tasks like triggering an AWS Glue job. It executes code in response to events without provisioning or managing servers.
  3. Robust ETL: AWS Glue is specifically designed for ETL workloads. It can read data from S3 (where existing and new data reside), understand schemas via the Data Catalog, perform complex merges and transformations, and write the results back to S3. This makes it ideal for merging data from various sources and formats.
  4. Data Lake Integration: By storing data in S3 and making it queryable via Athena, the solution leverages a data lake architecture, which is flexible and cost-efficient for analytics and machine learning workloads. Merging with an Athena table implies the existing business data is already part of this data lake.
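
The trigger step described above can be sketched roughly as follows. The function takes the Glue client as a parameter; in a real Lambda function it would be `boto3.client("glue")`. The job name, the `--source_path` argument, the bucket and key in `SAMPLE_EVENT`, and the `FakeGlueClient` stand-in are all assumptions made so the sketch runs without AWS access.

```python
from urllib.parse import unquote_plus


class FakeGlueClient:
    """Local stand-in for boto3.client("glue") so the sketch runs offline."""

    def __init__(self):
        self.calls = []

    def start_job_run(self, **kwargs):
        self.calls.append(kwargs)
        return {"JobRunId": f"jr_{len(self.calls)}"}


def start_merge_job(event, glue_client, job_name="merge-economic-indicators"):
    """Start one Glue job run per object listed in an S3 event notification."""
    run_ids = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        response = glue_client.start_job_run(
            JobName=job_name,  # hypothetical job name
            Arguments={"--source_path": f"s3://{bucket}/{key}"},
        )
        run_ids.append(response["JobRunId"])
    return run_ids


# Hypothetical event with the shape S3 sends to Lambda.
SAMPLE_EVENT = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "adx-bucket"},
                "object": {"key": "indicators/2024-01.csv"},
            }
        }
    ]
}

fake_glue = FakeGlueClient()
run_ids = start_merge_job(SAMPLE_EVENT, fake_glue)
```

The Lambda function only starts the job and returns. This matters here because a Lambda invocation is limited to 15 minutes, while the transformations are expected to take 30-60 minutes and therefore have to run inside Glue.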

Example: A financial institution subscribes to a market data product on AWS Data Exchange. When new daily market data files are delivered to their S3 bucket, an S3 event triggers a Lambda function. This Lambda function then starts an AWS Glue job. The Glue job reads the new market data from the S3 bucket, along with the existing historical market data stored in another S3 location (and cataloged by Athena). It merges the new data, deduplicates entries, and potentially enriches it before writing the updated, merged dataset back to a designated S3 prefix, making it immediately available for queries via Athena for financial analysts and ML models.

QUESTION 3

A Machine Learning Specialist is configuring Amazon SageMaker so multiple Data Scientists can access notebooks, train models, and deploy endpoints. To ensure the best operational performance, the Specialist needs to be able to track how often the Scientists are deploying models, GPU and CPU utilization on the deployed SageMaker endpoints, and all errors that are generated when an endpoint is invoked.

Which services are integrated with Amazon SageMaker to track this information? (Choose two.)

A
AWS CloudTrail
B
AWS Health
C
AWS Trusted Advisor
D
Amazon CloudWatch
E
AWS Config

Correct Option: A,D

✅ AWS CloudTrail
Description: AWS CloudTrail is a service that enables governance, compliance, operational auditing, and risk auditing of your AWS account. It records actions taken by a user, role, or an AWS service as events. These events include actions performed through the AWS Management Console, AWS SDKs, command line tools, and other AWS services.

Why this fits: For monitoring AWS Machine Learning workloads, CloudTrail is critical for auditing API calls made to ML services like Amazon SageMaker, Amazon Rekognition, or Amazon Comprehend. This allows organizations to track administrative and data plane activities, such as who initiated a training job, who modified a model endpoint configuration, or who accessed data in an S3 bucket used by an ML pipeline. This provides a comprehensive security and operational audit trail.

Example: A data scientist updates the configuration of an existing SageMaker endpoint. CloudTrail records the UpdateEndpoint API call, capturing details like the user identity, the exact timestamp, the source IP address, and the specific parameters of the update. This record is invaluable for security auditing and compliance.
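
To tie this back to the question, a rough way to count how often each user deploys an endpoint is to look up `CreateEndpoint` events in CloudTrail. The sketch below takes the client as a parameter; in practice it would be `boto3.client("cloudtrail")`. The `FakeCloudTrailClient` class, the user names, and the date range are invented so the example runs offline.

```python
from collections import Counter
from datetime import datetime


class FakeCloudTrailClient:
    """Local stand-in for boto3.client("cloudtrail"); returns canned pages."""

    def __init__(self, pages):
        self._pages = list(pages)

    def lookup_events(self, **kwargs):
        return self._pages.pop(0)


def count_endpoint_deployments(cloudtrail_client, start_time, end_time):
    """Count CreateEndpoint events per user between two datetimes."""
    counts = Counter()
    kwargs = {
        "LookupAttributes": [
            {"AttributeKey": "EventName", "AttributeValue": "CreateEndpoint"}
        ],
        "StartTime": start_time,
        "EndTime": end_time,
    }
    while True:
        page = cloudtrail_client.lookup_events(**kwargs)
        for event in page.get("Events", []):
            counts[event.get("Username", "unknown")] += 1
        token = page.get("NextToken")
        if not token:
            return counts
        kwargs["NextToken"] = token  # fetch the next page of results


# Two hypothetical result pages with made-up user names.
fake_cloudtrail = FakeCloudTrailClient(
    [
        {"Events": [{"Username": "alice"}, {"Username": "bob"}], "NextToken": "t1"},
        {"Events": [{"Username": "alice"}]},
    ]
)
deployments = count_endpoint_deployments(
    fake_cloudtrail, datetime(2024, 1, 1), datetime(2024, 1, 31)
)
```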



✅ Amazon CloudWatch
Description: Amazon CloudWatch is a monitoring and observability service that provides data and actionable insights to monitor applications, respond to system-wide performance changes, optimize resource utilization, and get a unified view of operational health. It collects monitoring and operational data in the form of logs, metrics, and events.

Why this fits: CloudWatch is essential for real-time performance and health monitoring of ML workloads. It collects various metrics from ML services (e.g., SageMaker training job CPU/GPU utilization, memory usage, endpoint invocation latency, error rates). It allows for the creation of custom dashboards to visualize these metrics, and the configuration of alarms to proactively notify stakeholders of potential issues, performance degradation, or operational failures. It also centralizes logs from ML applications and services, aiding in troubleshooting.

Example: During a SageMaker model training job, CloudWatch monitors resource utilization metrics like CPUUtilization, GPUUtilization, and MemoryUtilization. An alarm can be configured to trigger if CPUUtilization drops below a certain threshold for an extended period, potentially indicating an inefficient training job or a stalled process, prompting investigation.
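
For the endpoint errors mentioned in the question, one possible setup is a CloudWatch alarm on the `Invocation5XXErrors` metric in the `AWS/SageMaker` namespace. The helper below only builds the parameters; they could then be passed to `put_metric_alarm` on a `boto3.client("cloudwatch")`. The endpoint name, the alarm name pattern, the `AllTraffic` variant name, and the threshold values are assumptions, not recommendations.

```python
def build_error_alarm(endpoint_name, variant_name="AllTraffic"):
    """Build put_metric_alarm parameters for server-side invocation errors."""
    return {
        "AlarmName": f"{endpoint_name}-5xx-errors",  # hypothetical naming scheme
        "Namespace": "AWS/SageMaker",
        "MetricName": "Invocation5XXErrors",
        "Dimensions": [
            {"Name": "EndpointName", "Value": endpoint_name},
            {"Name": "VariantName", "Value": variant_name},
        ],
        "Statistic": "Sum",
        "Period": 300,  # seconds per evaluation window (assumed)
        "EvaluationPeriods": 1,
        "Threshold": 0,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",
    }


alarm_params = build_error_alarm("loan-model-endpoint")
# With real credentials, something like:
# boto3.client("cloudwatch").put_metric_alarm(**alarm_params)
```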

QUESTION 4

A financial services company wants to automate its loan approval process by building a machine learning (ML) model. Each loan data point contains credit history from a third-party data source and demographic information about the customer. Each loan approval prediction must come with a report that contains an explanation for why the customer was approved for a loan or was denied for a loan. The company will use Amazon SageMaker to build the model.

Which solution will meet these requirements with the LEAST development effort?

A
Use SageMaker Model Debugger to automatically debug the predictions, generate the explanation, and attach the report.
B
Use AWS Lambda to provide feature importance and partial dependence plots. Use the plots to generate and attach the report.
C
Use SageMaker Clarify to generate the report. Attach the report to the predicted results.
D
Use custom Amazon CloudWatch metrics to generate the report. Attach the report to the predicted results.

Correct Option: C

✅ Use SageMaker Clarify to generate the report. Attach the report to the predicted results.
Description: Amazon SageMaker Clarify is a machine learning (ML) capability that helps detect potential bias in ML models and provides tools to explain model predictions. It allows for analysis of data for bias prior to training, and provides post-training explainability for model predictions. It can generate various types of reports, including bias reports and explainability reports (e.g., SHAP values), which can be integrated into model workflows.

Why this fits: Each loan prediction must come with a report explaining why the customer was approved or denied. SageMaker Clarify is purpose-built for this: it generates explainability reports for individual predictions and for overall model behavior, so no custom explanation code has to be written. That makes it the option with the least development effort, and attaching the report to the predicted results gives the transparency the company requires.

Example: A financial institution uses a machine learning model to approve loan applications. To ensure fairness and compliance, they use SageMaker Clarify after model training to detect potential biases in the model's decisions based on demographic features. Clarify generates an explainability report showing the feature importance for each loan approval decision, which is then attached to the decision record to provide transparency to auditors and customers.
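
To give a feel for the SHAP values mentioned above, here is a brute-force sketch of exact Shapley values for a tiny model. It is not how Clarify is implemented, only the underlying idea on a toy example. The scoring rule `toy_loan_score`, its coefficients, and the applicant and baseline values are all hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, instance, baseline):
    """Exact Shapley value of each feature, relative to a baseline input."""
    names = list(instance)
    n = len(names)

    def value(subset):
        # Features in `subset` come from the instance, the rest from the baseline.
        mixed = {
            name: (instance[name] if name in subset else baseline[name])
            for name in names
        }
        return model(mixed)

    result = {}
    for name in names:
        others = [other for other in names if other != name]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                chosen = set(subset)
                total += weight * (value(chosen | {name}) - value(chosen))
        result[name] = total
    return result


def toy_loan_score(features):
    """Hypothetical linear scoring rule, used only for illustration."""
    return (
        0.005 * features["credit_score"]
        - 2.0 * features["debt_ratio"]
        + 0.1 * features["years_employed"]
    )


applicant = {"credit_score": 720, "debt_ratio": 0.4, "years_employed": 5}
baseline = {"credit_score": 650, "debt_ratio": 0.3, "years_employed": 3}
attributions = shapley_values(toy_loan_score, applicant, baseline)
```

The attributions sum to the difference between the applicant's score and the baseline score, which is what lets a report state how much each feature pushed the decision up or down.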



QUESTION 5

A retail chain has been ingesting purchasing records from its network of 20,000 stores to Amazon S3 using Amazon Kinesis Data Firehose. To support training an improved machine learning model, training records will require new but simple transformations, and some attributes will be combined. The model needs to be retrained daily.

Given the large number of stores and the legacy data ingestion, which change will require the LEAST amount of development effort?

A
Require the stores to switch to capturing their data locally on AWS Storage Gateway for loading into Amazon S3, then use AWS Glue to do the transformation.
B
Deploy an Amazon EMR cluster running Apache Spark with the transformation logic, and have the cluster run each day on the accumulating records in Amazon S3, outputting new/transformed records to Amazon S3.
C
Spin up a fleet of Amazon EC2 instances with the transformation logic, have them transform the data records accumulating on Amazon S3, and output the transformed records to Amazon S3.
D
Insert an Amazon Kinesis Data Analytics stream downstream of the Kinesis Data Firehose stream that transforms raw record attributes into simple transformed values using SQL.

Correct Option: D

✅ Insert an Amazon Kinesis Data Analytics stream downstream of the Kinesis Data Firehose stream that transforms raw record attributes into simple transformed values using SQL.
Description: Amazon Kinesis Data Analytics (KDA) is a fully managed service designed to process and analyze streaming data in real time. It allows users to write SQL queries against incoming data streams to perform transformations, aggregations, and enrichments without needing to manage servers or infrastructure. It integrates seamlessly with Kinesis Data Firehose as both an input and output source.

Why this fits: This solution directly addresses the requirement for transforming raw record attributes into simple transformed values using SQL, specifically within a streaming context that leverages Kinesis Data Firehose. Kinesis Data Analytics for SQL applications provides a serverless and scalable way to perform these transformations in real time, making it ideal for processing data immediately after ingestion via Kinesis Data Firehose. It eliminates the operational overhead associated with managing compute instances (like EC2 or EMR) while offering robust real-time processing capabilities through a familiar SQL interface.

Example: A retail company streams raw customer clickstream data from its website via Kinesis Data Firehose. Before archiving this data to Amazon S3, they need to extract the product ID and categorize the event type (e.g., 'view', 'add_to_cart') from a complex JSON payload and calculate a simplified session duration. An Amazon Kinesis Data Analytics SQL application can be inserted downstream of the Firehose. It reads the raw JSON, applies SQL queries like SELECT JSON_VALUE(raw_data, '$.product.id') AS productId, CASE WHEN JSON_VALUE(raw_data, '$.event') LIKE '%view%' THEN 'view' ELSE 'other' END AS eventCategory FROM SOURCE_SQL_STREAM_001 to perform the necessary transformations, and then outputs the clean, transformed data to another Kinesis Data Firehose stream for delivery to S3.

QUESTION 6

A manufacturing company wants to build a machine learning (ML) model to predict defects in the screws that the company produces. There are three types of defects: bent, brittle, and cracked. Each screw can have zero or more of these defects. The company has collected data on all the screws produced in the past 6 months, including any features or defects associated with the screws.

Which algorithm will meet this requirement?

A
Multilayer perceptron (MLP) with a sigmoid function at the output layer
B
Multilayer perceptron (MLP) with a Softmax function at the output layer
C
K-means clustering with 3 clusters.
D
K-nearest neighbors with 3 neighbors

Correct Option: A

✅ Multilayer perceptron (MLP) with a sigmoid function at the output layer
Description: A multilayer perceptron (MLP) is a feedforward neural network with an input layer, one or more hidden layers, and an output layer. With a sigmoid activation on each output node, every node produces an independent value between 0 and 1 that can be read as the probability that its label applies. The outputs are not required to sum to 1.

Why this fits: Each screw can have zero or more of the three defects, so the labels are not mutually exclusive. This is a multi-label classification problem. One sigmoid output per defect type lets the model flag any combination of bent, brittle, and cracked, including none of them. A Softmax output layer (Option B) normalizes the outputs into a single probability distribution that sums to 1, which suits problems where each input belongs to exactly one class. K-means clustering (Option C) is unsupervised and would ignore the defect labels the company has collected, and setting the number of neighbors to 3 in k-nearest neighbors (Option D) has no connection to the number of defect types.

Example: For a given screw, the three sigmoid outputs might be [0.91 (bent), 0.08 (brittle), 0.77 (cracked)]. With a threshold of 0.5, the model reports the screw as both bent and cracked.
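
Because each screw can have zero or more defects, it may help to see how the sigmoid output layer from Option A turns raw scores into a set of labels. This is a small sketch; the logits and the 0.5 threshold are arbitrary illustrative values.

```python
import math

DEFECTS = ["bent", "brittle", "cracked"]


def sigmoid(z):
    """Map a raw score to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_defects(logits, threshold=0.5):
    """Return every defect whose independent probability meets the threshold."""
    return [
        label for label, z in zip(DEFECTS, logits) if sigmoid(z) >= threshold
    ]
```

Each label is decided on its own, so `predict_defects([2.0, -3.0, 1.0])` can report two defects at once and `predict_defects([-2.0, -2.0, -2.0])` can report none.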

QUESTION 7

A Machine Learning Specialist is building a convolutional neural network (CNN) that will classify 10 types of animals. The Specialist has built a series of layers in a neural network that will take an input image of an animal, pass it through a series of convolutional and pooling layers, and then finally pass it through a dense and fully connected layer with 10 nodes. The Specialist would like to get an output from the neural network that is a probability distribution of how likely it is that the input image belongs to each of the 10 classes.

Which function will produce the desired output?

A
Dropout
B
Smooth L1 loss
C
Softmax
D
Rectified linear units (ReLU)

Correct Option: C

✅ Softmax
Description: Softmax is an activation function used in the output layer of a neural network primarily for multi-class classification problems. It takes a vector of arbitrary real-valued scores (logits) and squashes them into a probability distribution over multiple classes. Each element of the output vector is a probability between 0 and 1, and the sum of all elements in the output vector is 1.

Why this fits: For multi-class classification, where an input belongs to exactly one of several possible classes, Softmax is the appropriate choice for the output layer's activation function. It ensures that the model's output can be interpreted as the probability of the input belonging to each class, which is crucial for decision-making and often paired with a cross-entropy loss function during training.

Example: In an image classification task aiming to identify whether an image contains a "cat," "dog," or "bird," the output layer of the neural network would typically have three neurons, one for each class. Applying the Softmax function to the outputs of these three neurons would yield a probability distribution, for instance, [0.1, 0.85, 0.05], indicating an 85% probability that the image is a "dog."
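
The normalization described above can be written in a few lines. This sketch subtracts the largest logit before exponentiating, a common trick to avoid overflow that does not change the result.

```python
import math


def softmax(logits):
    """Turn a list of raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

Applied to the 10 outputs of the final dense layer, the function returns 10 non-negative values that sum to 1, and the largest logit always receives the largest probability.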



QUESTION 8

A company's machine learning (ML) specialist is building a computer vision model to classify 10 different traffic signs. The company has stored 100 images of each class in Amazon S3, and the company has another 10,000 unlabeled images. All the images come from dash cameras and are a size of 224 pixels x 224 pixels. After several training runs, the model is overfitting on the training data.

Which actions should the ML specialist take to address this problem? (Select TWO.)

A
Use Amazon SageMaker Ground Truth to label the unlabeled images.
B
Use image preprocessing to transform the images into grayscale images.
C
Use data augmentation to rotate and translate the labeled images.
D
Replace the activation of the last layer with a sigmoid.
E
Use the Amazon SageMaker k-nearest neighbors (k-NN) algorithm to label the unlabeled images.

Correct Option: A,C

✅ Use Amazon SageMaker Ground Truth to label the unlabeled images.
Description: Amazon SageMaker Ground Truth is a fully managed data labeling service that makes it easy to build high-quality training datasets for machine learning. It can use human annotators (via Mechanical Turk, private workforce, or vendor workforce) or machine learning to label data, accelerating the data labeling process for various data types, including images.

Why this fits: Overfitting is more likely when the training set is small, and here only 1,000 labeled images (100 for each of the 10 classes) are available. Labeling the additional 10,000 images with Ground Truth gives the model far more real training examples, which helps it generalize instead of memorizing the training set. Ground Truth provides a scalable way to obtain those labels, which would otherwise be tedious and error-prone to produce by hand.

Example: For a computer vision task requiring a model to identify different types of vehicles in images, a developer could use Amazon SageMaker Ground Truth to have human workers draw bounding boxes around vehicles and label them as "car," "truck," or "motorcycle" in a large dataset of raw images.



✅ Use data augmentation to rotate and translate the labeled images.
Description: Data augmentation is a technique used to artificially increase the size and diversity of a training dataset by creating modified versions of existing data. For image data, common augmentation techniques include rotation, translation (shifting), scaling, flipping, shearing, and adjusting brightness or contrast. These transformations create new, slightly different training examples from the original labeled data.

Why this fits: When training a machine learning model, especially with a limited dataset, data augmentation helps improve the model's robustness and generalization capabilities, reducing overfitting. By exposing the model to various orientations and positions of the objects within images (e.g., via rotation and translation), the model learns to recognize patterns regardless of their exact presentation, leading to better performance on unseen data.

Example: If a dataset of cat images is used to train an image classifier, applying data augmentation could involve rotating some cat images by 15 degrees, translating others slightly to the left or right, or flipping a subset horizontally. This creates a larger, more varied training set without collecting new original images, helping the model become more resilient to variations in real-world cat photos.
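
As a bare-bones illustration of translation and rotation, the sketch below works on an image stored as a nested list of pixel values. Real pipelines would normally use an image library and small rotation angles; the 90-degree rotation and the function names here are simplifications chosen to keep the example dependency-free.

```python
def translate_right(image, shift, fill=0):
    """Shift every row to the right by `shift` pixels, padding with `fill`."""
    width = len(image[0])
    shift = min(shift, width)
    return [[fill] * shift + row[: width - shift] for row in image]


def rotate_90_clockwise(image):
    """Rotate the image a quarter turn clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def augment(image):
    """Return the original image plus two modified copies with the same label."""
    return [image, translate_right(image, 1), rotate_90_clockwise(image)]
```

Each labeled image yields several training examples that keep the original label, which is how augmentation enlarges the 1,000-image training set without any new labeling.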

QUESTION 9

A Machine Learning Specialist trained a regression model, but the first iteration needs optimizing. The Specialist needs to understand whether the model is more frequently overestimating or underestimating the target.

What option can the Specialist use to determine whether it is overestimating or underestimating the target value?

A
Root Mean Square Error (RMSE)
B
Residual plots
C
Area under the curve
D
Confusion matrix

Correct Option: B

✅ Residual plots
Description: Residual plots are graphical tools used in regression analysis to visualize the difference between the observed and predicted values (residuals) from a model. These plots typically display the residuals on the y-axis against the predicted values, an independent variable, or the order of observations on the x-axis.

Why this fits: A residual is the difference between an observed value and the predicted value, so its sign shows the direction of each error. With residuals defined as observed minus predicted, points above zero are underestimates and points below zero are overestimates, and the plot shows at a glance which is more frequent. RMSE (Option A) squares the errors and therefore discards their sign, while area under the curve and the confusion matrix (Options C and D) are classification metrics. Residual plots also help check regression assumptions such as linearity, homoscedasticity (constant variance of errors), and independence of errors: systematic patterns (a curve, a funnel shape, or unusual clusters) point to model inadequacies or the need for a different model form.

Example: After training a linear regression model to predict housing prices using Amazon SageMaker, a data scientist might generate a residual plot. If the plot shows residuals randomly scattered around zero with no clear pattern, it suggests that the linear model is a good fit and its assumptions are largely met. However, if the plot reveals a distinct "U" shape, it indicates that the linear model might be underfitting and that a non-linear relationship (e.g., quadratic term for a feature) may exist, requiring model refinement.
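
The same question can be answered numerically by counting the signs of the residuals, which is what the plot shows visually. This sketch assumes the convention residual = observed minus predicted; the function name is illustrative.

```python
def estimation_bias(actual, predicted):
    """Summarize how often a regression model over- or underestimates."""
    residuals = [a - p for a, p in zip(actual, predicted)]
    return {
        # Negative residual: the prediction was above the observed value.
        "overestimates": sum(1 for r in residuals if r < 0),
        # Positive residual: the prediction was below the observed value.
        "underestimates": sum(1 for r in residuals if r > 0),
        "mean_residual": sum(residuals) / len(residuals),
    }
```

A clearly negative mean residual, together with more overestimates than underestimates, suggests the model tends to predict too high.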



QUESTION 10

A machine learning (ML) engineer is integrating a production model with a customer metadata repository for real-time inference. The repository is hosted in Amazon SageMaker Feature Store. The engineer wants to retrieve only the latest version of the customer metadata record for a single customer at a time.

Which solution will meet these requirements?

A
Use the SageMaker Feature Store BatchGetRecord API with the record identifier. Filter to find the latest record.
B
Create an Amazon Athena query to retrieve the data from the feature table.
C
Create an Amazon Athena query to retrieve the data from the feature table. Use the write_time value to find the latest record.
D
Use the SageMaker Feature Store GetRecord API with the record identifier.

Correct Option: D

✅ Use the SageMaker Feature Store GetRecord API with the record identifier.
Description: The GetRecord API in Amazon SageMaker Feature Store is designed for low-latency, real-time retrieval of a single record from the Online Store. It requires the FeatureGroupName and the RecordIdentifierValueAsString (which uniquely identifies the entity). The Online Store is optimized to serve the latest available features for a given record identifier.

Why this fits: When an application needs the most up-to-date features for a specific entity (identified by its record identifier) quickly, GetRecord is the most direct and efficient method. The SageMaker Feature Store's Online Store automatically ensures that GetRecord returns the absolute latest version of the record, eliminating the need for manual filtering by write_time or other timestamps. This is crucial for real-time inference where low latency is paramount.

Example: A real-time recommendation engine needs to fetch the latest browsing history and user preferences for a user entering a website. By calling GetRecord on a user_features Feature Group with the user's ID as the record identifier, the engine can instantly retrieve the most recent feature values for that user to personalize recommendations.
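
A call along these lines might look like the sketch below. The client is passed in; in practice it would be `boto3.client("sagemaker-featurestore-runtime")`. The feature group name, the customer ID, the feature names, and the `FakeFeatureStoreRuntime` stand-in are hypothetical and exist only so the example runs offline.

```python
class FakeFeatureStoreRuntime:
    """Local stand-in for boto3.client("sagemaker-featurestore-runtime")."""

    def __init__(self, records):
        self._records = records

    def get_record(self, FeatureGroupName, RecordIdentifierValueAsString):
        key = (FeatureGroupName, RecordIdentifierValueAsString)
        return {"Record": self._records.get(key, [])}


def get_latest_features(featurestore_runtime, feature_group_name, record_id):
    """Fetch the latest record for one identifier as a plain dictionary."""
    response = featurestore_runtime.get_record(
        FeatureGroupName=feature_group_name,
        RecordIdentifierValueAsString=str(record_id),
    )
    return {
        feature["FeatureName"]: feature["ValueAsString"]
        for feature in response.get("Record", [])
    }


# Hypothetical feature group contents for a single customer.
fake_runtime = FakeFeatureStoreRuntime(
    {
        ("customer-metadata", "42"): [
            {"FeatureName": "customer_id", "ValueAsString": "42"},
            {"FeatureName": "segment", "ValueAsString": "premium"},
        ]
    }
)
features = get_latest_features(fake_runtime, "customer-metadata", 42)
```

No timestamp filtering is needed in the caller, because the online store keeps only the latest record for each identifier.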

QUESTION 11

A company wants to classify user behavior as either fraudulent or normal. Based on internal research, a Machine Learning Specialist would like to build a binary classifier based on two features: age of account and transaction month. The class distribution for these features is illustrated in the figure provided.

Based on this information, which model would have the HIGHEST recall with respect to the fraudulent class?

A
Decision tree
B
Linear support vector machine (SVM)
C
Naive Bayesian classifier
D
Single Perceptron with sigmoidal activation function

Premium Solution Locked

Unlock all 390 answers & explanations

QUESTION 12

A data scientist is using an Amazon SageMaker notebook instance and needs to securely access data stored in a specific Amazon S3 bucket.

How should the data scientist accomplish this?

A
Add an S3 bucket policy allowing GetObject, PutObject, and ListBucket permissions to the Amazon SageMaker notebook ARN as principal.
B
Encrypt the objects in the S3 bucket with a custom AWS Key Management Service (AWS KMS) key that only the notebook owner has access to.
C
Attach the policy to the IAM role associated with the notebook that allows GetObject, PutObject, and ListBucket operations to the specific S3 bucket.
D
Use a script in a lifecycle configuration to configure the AWS CLI on the instance with an access key ID and secret.
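
For reference, the three permissions named in options A and C, scoped to a single bucket, can be written as an IAM policy document. This is a minimal sketch built as a Python dictionary; the function name and the bucket name example-ml-data-bucket are hypothetical.

```python
import json


def s3_bucket_policy(bucket_name):
    """Build an IAM policy document limited to one S3 bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # ListBucket applies to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket_name}",
            },
            {
                # GetObject and PutObject apply to the objects in the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            },
        ],
    }


print(json.dumps(s3_bucket_policy("example-ml-data-bucket"), indent=2))
```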

QUESTION 13

A Machine Learning Specialist kicks off a hyperparameter tuning job for a tree-based ensemble model using Amazon SageMaker with Area Under the ROC Curve (AUC) as the objective metric. This workflow will eventually be deployed in a pipeline that retrains and tunes hyperparameters each night to model click-through on data that goes stale every 24 hours.

With the goal of decreasing the amount of time it takes to train these models, and ultimately to decrease costs, the Specialist wants to reconfigure the input hyperparameter range(s).

Which visualization will accomplish this?

A
A histogram showing whether the most important input feature is Gaussian.
B
A scatter plot with points colored by target variable that uses t-Distributed Stochastic Neighbor Embedding (t-SNE) to visualize the large number of input variables in an easier-to-read dimension.
C
A scatter plot showing the performance of the objective metric over each training iteration.
D
A scatter plot showing the correlation between maximum tree depth and the objective metric.

QUESTION 14

A machine learning (ML) specialist uploads 5 TB of data to an Amazon SageMaker Studio environment. The ML specialist performs initial data cleansing. Before the ML specialist begins to train a model, the ML specialist needs to create and view an analysis report that details potential bias in the uploaded data.

Which combination of actions will meet these requirements with the LEAST operational overhead? (Choose two.)

A
Use SageMaker Clarify to automatically detect data bias.
B
Turn on the bias detection option in SageMaker Ground Truth to automatically analyze data features.
C
Use SageMaker Model Monitor to generate a bias drift report.
D
Configure SageMaker Data Wrangler to generate a bias report.
E
Use SageMaker Experiments to perform a data check.

QUESTION 15

A Machine Learning Specialist is creating a new natural language processing application that processes a dataset comprised of 1 million sentences. The aim is to then run Word2Vec to generate embeddings of the sentences and enable different types of predictions.

Here is an example from the dataset:

"The quck BROWN FOX jumps over the lazy dog."

Which of the following are the operations the Specialist needs to perform to correctly sanitize and prepare the data in a repeatable manner? (Choose three.)

A
Perform part-of-speech tagging and keep the action verb and the nouns only.
B
Normalize all words by making the sentence lowercase.
C
Remove stop words using an English stopword dictionary.
D
Correct the typography on "quck" to "quick."
E
One-hot encode all words in the sentence.
F
Tokenize the sentence into words.
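
Three of the listed operations (lowercasing, tokenizing, and stop-word removal) can be sketched with the Python standard library. This is a minimal sketch; the function name and the tiny stop-word set are hypothetical, and the misspelled token is left as it appears in the data.

```python
import re

# Tiny illustrative stop-word set (an assumption); a real pipeline would
# use a fuller English stop-word dictionary.
STOP_WORDS = {"the", "over", "a", "an", "of", "and"}


def preprocess(sentence):
    """Lowercase the sentence, split it into word tokens, drop stop words."""
    lowered = sentence.lower()
    tokens = re.findall(r"[a-z0-9']+", lowered)
    return [token for token in tokens if token not in STOP_WORDS]


print(preprocess("The quck BROWN FOX jumps over the lazy dog."))
# ['quck', 'brown', 'fox', 'jumps', 'lazy', 'dog']
```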

QUESTION 16

A company wants to conduct targeted marketing to sell solar panels to homeowners. The company wants to use machine learning (ML) technologies to identify which houses already have solar panels. The company has collected 8,000 satellite images as training data and will use Amazon SageMaker Ground Truth to label the data.

The company has a small internal team that is working on the project. The internal team has no ML expertise and no ML experience.

Which solution will meet these requirements with the LEAST amount of effort from the internal team?

A
Set up a private workforce that consists of the internal team. Use the private workforce and the SageMaker Ground Truth active learning feature to label the data. Use Amazon Rekognition Custom Labels for model training and hosting.
B
Set up a private workforce that consists of the internal team. Use the private workforce to label the data. Use Amazon Rekognition Custom Labels for model training and hosting.
C
Set up a private workforce that consists of the internal team. Use the private workforce and the SageMaker Ground Truth active learning feature to label the data. Use the SageMaker Object Detection algorithm to train a model. Use SageMaker batch transform for inference.
D
Set up a public workforce. Use the public workforce to label the data. Use the SageMaker Object Detection algorithm to train a model. Use SageMaker batch transform for inference.

QUESTION 17

A company is using Amazon Polly to translate plaintext documents to speech for automated company announcements. However, company acronyms are being mispronounced in the current documents.

How should a Machine Learning Specialist address this issue for future documents?

A
Convert current documents to SSML with pronunciation tags.
B
Create an appropriate pronunciation lexicon.
C
Output speech marks to guide in pronunciation.
D
Use Amazon Lex to preprocess the text files for pronunciation.

QUESTION 18

A company is building a line-counting application for use in a quick-service restaurant. The company wants to use video cameras pointed at the line of customers at a given register to measure how many people are in line and deliver notifications to managers if the line grows too long. The restaurant locations have limited bandwidth for connections to external services and cannot accommodate multiple video streams without impacting other operations.

Which solution should a machine learning specialist implement to meet these requirements?

A
Install cameras compatible with Amazon Kinesis Video Streams to stream the data to AWS over the restaurant's existing internet connection. Write an AWS Lambda function to take an image and send it to Amazon Rekognition to count the number of faces in the image. Send an Amazon Simple Notification Service (Amazon SNS) notification if the line is too long.
B
Deploy AWS DeepLens cameras in the restaurant to capture video. Enable Amazon Rekognition on the AWS DeepLens device, and use it to trigger a local AWS Lambda function when a person is recognized. Use the Lambda function to send an Amazon Simple Notification Service (Amazon SNS) notification if the line is too long.
C
Build a custom model in Amazon SageMaker to recognize the number of people in an image. Install cameras compatible with Amazon Kinesis Video Streams in the restaurant. Write an AWS Lambda function to take an image. Use the SageMaker endpoint to call the model to count people. Send an Amazon Simple Notification Service (Amazon SNS) notification if the line is too long.
D
Build a custom model in Amazon SageMaker to recognize the number of people in an image. Deploy AWS DeepLens cameras in the restaurant. Deploy the model to the cameras. Deploy an AWS Lambda function to the cameras to use the model to count people and send an Amazon Simple Notification Service (Amazon SNS) notification if the line is too long.

QUESTION 19

When submitting Amazon SageMaker training jobs using one of the built-in algorithms, which common parameters MUST be specified? (Choose three.)

A
The training channel identifying the location of training data on an Amazon S3 bucket.
B
The validation channel identifying the location of validation data on an Amazon S3 bucket.
C
The IAM role that Amazon SageMaker can assume to perform tasks on behalf of the users.
D
Hyperparameters in a JSON array as documented for the algorithm used.
E
The Amazon EC2 instance class specifying whether training will be run using CPU or GPU.
F
The output path specifying where on an Amazon S3 bucket the trained model will persist.

QUESTION 20

A company is planning a marketing campaign to promote a new product to existing customers. The company has data for past promotions that are similar. The company decides to try an experiment to send a more expensive marketing package to a smaller number of customers. The company wants to target the marketing campaign to customers who are most likely to buy the new product. The experiment requires that at least 90% of the customers who are likely to purchase the new product receive the marketing materials.

The company trains a model by using the linear learner algorithm in Amazon SageMaker. The model has a recall score of 80% and a precision of 75%.

How should the company retrain the model to meet these requirements?

A
Set the target_recall hyperparameter to 90%. Set the binary_classifier_model_selection_criteria hyperparameter to recall_at_target_precision.
B
Set the target_precision hyperparameter to 90%. Set the binary_classifier_model_selection_criteria hyperparameter to precision_at_target_recall.
C
Use 90% of the historical data for training. Set the number of epochs to 20.
D
Set the normalize_label hyperparameter to true. Set the number of classes to 2.
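
The two scores quoted in the scenario follow from confusion-matrix counts: precision is TP / (TP + FP) and recall is TP / (TP + FN). The requirement that at least 90% of likely buyers receive the materials is a statement about recall. A minimal sketch, with hypothetical counts chosen only to reproduce the stated 75% and 80%:

```python
def precision(tp, fp):
    """Share of predicted buyers who actually buy."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Share of actual buyers the model manages to find."""
    return tp / (tp + fn)


# Hypothetical counts chosen to match the scenario's scores.
tp, fp, fn = 60, 20, 15
print(precision(tp, fp))  # 0.75
print(recall(tp, fn))     # 0.8
```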

QUESTION 21

A monitoring service generates 1 TB of scale metrics record data every minute. A Research team performs queries on this data using Amazon Athena. The queries run slowly due to the large volume of data, and the team requires better performance.

How should the records be stored in Amazon S3 to improve query performance?

A
CSV files
B
Parquet files
C
Compressed JSON
D
RecordIO

QUESTION 22

A wildlife research company has a set of images of lions and cheetahs. The company created a dataset of the images. The company labeled each image with a binary label that indicates whether an image contains a lion or cheetah. The company wants to train a model to identify whether new images contain a lion or cheetah.

Which Amazon SageMaker algorithm will meet this requirement?

A
XGBoost.
B
Image Classification - TensorFlow
C
Object Detection - TensorFlow
D
Semantic segmentation - MXNet.

QUESTION 23

A Machine Learning Specialist is working with a media company to perform classification on popular articles from the company's website. The company is using random forests to classify how popular an article will be before it is published. A sample of the data being used is below.

Given the dataset, the Specialist wants to convert the Day_Of_Week column to binary values. What technique should be used to convert this column to binary values?

A
Binarization
B
One-hot encoding
C
Tokenization
D
Normalization transformation
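
One of the listed techniques, one-hot encoding, turns a categorical column into a set of 0/1 columns with exactly one 1 per row. The data sample is not shown here, so this minimal sketch assumes the Day_Of_Week column holds English weekday names; the function name is hypothetical.

```python
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def one_hot(day, categories=DAYS):
    """Return a list of 0/1 flags with a single 1 at the day's position."""
    if day not in categories:
        raise ValueError(f"unknown category: {day!r}")
    return [1 if day == category else 0 for category in categories]


print(one_hot("Wednesday"))  # [0, 0, 1, 0, 0, 0, 0]
```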

QUESTION 24

A data scientist for a medical diagnostic testing company has developed a machine learning (ML) model to identify patients who have a specific disease. The dataset that the scientist used to train the model is imbalanced. The dataset contains a large number of healthy patients and only a small number of patients who have the disease. The model should consider that patients who are incorrectly identified as positive for the disease will increase costs for the company.

Which metric will MOST accurately evaluate the performance of this model?

A
Recall
B
F1 score
C
Accuracy
D
Precision

QUESTION 25

A gaming company has launched an online game where people can start playing for free, but they need to pay if they choose to use certain features. The company needs to build an automated system to predict whether or not a new user will become a paid user within 1 year. The company has gathered a labeled dataset from 1 million users.

The training dataset consists of 1,000 positive samples (from users who ended up paying within 1 year) and 999,000 negative samples (from users who did not use any paid features). Each data sample consists of 200 features including user age, device, location, and play patterns.

Using this dataset for training, the Data Science team trained a random forest model that converged with over 99% accuracy on the training set. However, the prediction results on a test dataset were not satisfactory.

Which of the following approaches should the Data Science team take to mitigate this issue? (Choose two.)

A
Add more deep trees to the random forest to enable the model to learn more features.
B
Include a copy of the samples in the test dataset in the training dataset.
C
Generate more positive samples by duplicating the positive samples and adding a small amount of noise to the duplicated data.
D
Change the cost function so that false negatives have a higher impact on the cost value than false positives.
E
Change the cost function so that false positives have a higher impact on the cost value than false negatives.
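
The class counts in the scenario show why accuracy alone says little here. A minimal sketch of the arithmetic, using a hypothetical baseline that always predicts the negative class:

```python
positives, negatives = 1_000, 999_000

# Baseline that always predicts "will not pay".
tp, fn = 0, positives
tn, fp = negatives, 0

accuracy = (tp + tn) / (positives + negatives)
recall = tp / (tp + fn)

print(f"accuracy={accuracy:.3f} recall={recall:.1f}")
# accuracy=0.999 recall=0.0
```

A model can therefore exceed 99% accuracy on this dataset without identifying a single paying user.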

QUESTION 26

A network security vendor needs to ingest telemetry data from thousands of endpoints that run all over the world. The data is transmitted every 30 seconds in the form of records that contain 50 fields. Each record is up to 1 KB in size. The security vendor uses Amazon Kinesis Data Streams to ingest the data. The vendor requires hourly summaries of the records that Kinesis Data Streams ingests. The vendor will use Amazon Athena to query the records and to generate the summaries. The Athena queries will target 7 to 12 of the available data fields.

Which solution will meet these requirements with the LEAST amount of customization to transform and store the ingested data?

A
Use AWS Lambda to read and aggregate the data hourly. Transform the data and store it in Amazon S3 by using Amazon Kinesis Data Firehose.
B
Use Amazon Kinesis Data Firehose to read and aggregate the data hourly. Transform the data and store it in Amazon S3 by using a short-lived Amazon EMR cluster.
C
Use Amazon Kinesis Data Analytics to read and aggregate the data hourly. Transform the data and store it in Amazon S3 by using Amazon Kinesis Data Firehose.
D
Use Amazon Kinesis Data Firehose to read and aggregate the data hourly. Transform the data and store it in Amazon S3 by using AWS Lambda.

QUESTION 27

A Data Scientist is developing a machine learning model to predict future patient outcomes based on information collected about each patient and their treatment plans. The model should output a continuous value as its prediction. The data available includes labeled outcomes for a set of 4,000 patients. The study was conducted on a group of individuals over the age of 65 who have a particular disease that is known to worsen with age.

Initial models have performed poorly. While reviewing the underlying data, the Data Scientist notices that, out of 4,000 patient observations, there are 450 where the patient age has been input as 0. The other features for these observations appear normal compared to the rest of the sample population.

How should the Data Scientist correct this issue?

A
Drop all records from the dataset where age has been set to 0.
B
Replace the age field value for records with a value of 0 with the mean or median value from the dataset.
C
Drop the age feature from the dataset and train the model using the rest of the features.
D
Use k-means clustering to handle missing features.

QUESTION 28

A machine learning (ML) specialist is training a multilayer perceptron (MLP) on a dataset with multiple classes. The target class of interest is unique compared to the other classes in the dataset, but it does not achieve an acceptable recall metric. The ML specialist varies the number and size of the MLP's hidden layers, but the results do not improve significantly.

Which solution will improve recall in the LEAST amount of time?

A
Add class weights to the MLP's loss function, and then retrain.
B
Gather more data by using Amazon Mechanical Turk, and then retrain.
C
Train a k-means algorithm instead of an MLP.
D
Train an anomaly detection model instead of an MLP.

QUESTION 29

A Data Science team is designing a dataset repository where it will store a large amount of training data commonly used in its machine learning models. As Data Scientists may create an arbitrary number of new datasets every day, the solution has to scale automatically and be cost-effective. Also, it must be possible to explore the data using SQL.

Which storage scheme is MOST adapted to this scenario?

A
Store datasets as files in Amazon S3.
B
Store datasets as files in an Amazon EBS volume attached to an Amazon EC2 instance.
C
Store datasets as tables in a multi-node Amazon Redshift cluster.
D
Store datasets as global tables in Amazon DynamoDB.

QUESTION 30

A company has a podcast platform that has thousands of users. The company implemented an algorithm to detect low podcast engagement based on a 10-minute running window of user events such as listening to, pausing, and closing the podcast. A machine learning (ML) specialist is designing the ingestion process for these events. The ML specialist needs to transform the data to prepare the data for inference.

How should the ML specialist design the transformation step to meet these requirements with the LEAST operational effort?

A
Use an Amazon Managed Streaming for Apache Kafka (Amazon MSK) cluster to ingest event data. Use Amazon Kinesis Data Analytics to transform the most recent 10 minutes of data before inference.
B
Use Amazon Kinesis Data Streams to ingest event data. Store the data in Amazon S3 by using Amazon Kinesis Data Firehose. Use AWS Lambda to transform the most recent 10 minutes of data before inference.
C
Use Amazon Kinesis Data Streams to ingest event data. Use Amazon Kinesis Data Analytics to transform the most recent 10 minutes of data before inference.
D
Use an Amazon Managed Streaming for Apache Kafka (Amazon MSK) cluster to ingest event data. Use AWS Lambda to transform the most recent 10 minutes of data before inference.
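
The "10-minute running window" in the scenario can be pictured independently of any ingestion service. This is a minimal sketch of the windowing idea only, assuming events arrive in timestamp order; the class name and event labels are hypothetical.

```python
from collections import deque

WINDOW_SECONDS = 10 * 60


class RunningWindow:
    """Keep only the events from the most recent 10 minutes."""

    def __init__(self, window_seconds=WINDOW_SECONDS):
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp_seconds, event_type), oldest first

    def add(self, timestamp, event_type):
        self.events.append((timestamp, event_type))
        # Evict events that fall outside the window ending at the newest event.
        cutoff = timestamp - self.window_seconds
        while self.events and self.events[0][0] <= cutoff:
            self.events.popleft()

    def counts(self):
        """Count the events of each type currently inside the window."""
        totals = {}
        for _, event_type in self.events:
            totals[event_type] = totals.get(event_type, 0) + 1
        return totals
```

For example, after adding events at 0 s, 300 s, and 601 s, the event at 0 s has aged out and counts() covers only the last two.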

QUESTION 31

A Machine Learning Specialist deployed a model that provides product recommendations on a company's website. Initially, the model was performing very well and resulted in customers buying more products on average. However, within the past few months, the Specialist has noticed that the effect of product recommendations has diminished and customers are starting to return to their original habits of spending less. The Specialist is unsure of what happened, as the model has not changed from its initial deployment over a year ago.

Which method should the Specialist try to improve model performance?

A
The model needs to be completely re-engineered because it is unable to handle product inventory changes.
B
The model's hyperparameters should be periodically updated to prevent drift.
C
The model should be periodically retrained from scratch using the original data while adding a regularization term to handle product inventory changes.
D
The model should be periodically retrained using the original training data plus new data as product inventory changes.

QUESTION 32

A machine learning (ML) specialist uploads a dataset to an Amazon S3 bucket that is protected by server-side encryption with AWS KMS keys (SSE-KMS). The ML specialist needs to ensure that an Amazon SageMaker notebook instance can read the dataset that is in Amazon S3.

Which solution will meet these requirements?

A
Define security groups to allow all HTTP inbound and outbound traffic. Assign the security groups to the SageMaker notebook instance.
B
Configure the SageMaker notebook instance to have access to the VPC. Grant permission in the AWS Key Management Service (AWS KMS) key policy to the notebook's VPC.
C
Assign an IAM role that provides S3 read access for the dataset to the SageMaker notebook. Grant permission in the KMS key policy to the IAM role.
D
Assign the same KMS key that encrypts the data in Amazon S3 to the SageMaker notebook instance.

QUESTION 33

A Machine Learning Specialist working for an online fashion company wants to build a data ingestion solution for the company's Amazon S3-based data lake.

The Specialist wants to create a set of ingestion mechanisms that will enable future capabilities comprised of:

Real-time analytics

Interactive analytics of historical data

Clickstream analytics

Product recommendations

Which services should the Specialist use?

A
AWS Glue as the data catalog; Amazon Kinesis Data Streams and Amazon Kinesis Data Analytics for real-time data insights; Amazon Kinesis Data Firehose for delivery to Amazon ES for clickstream analytics; Amazon EMR to generate personalized product recommendations
B
Amazon Athena as the data catalog; Amazon Kinesis Data Streams and Amazon Kinesis Data Analytics for near-real-time data insights; Amazon Kinesis Data Firehose for clickstream analytics; AWS Glue to generate personalized product recommendations
C
AWS Glue as the data catalog; Amazon Kinesis Data Streams and Amazon Kinesis Data Analytics for historical data insights; Amazon Kinesis Data Firehose for delivery to Amazon ES for clickstream analytics; Amazon EMR to generate personalized product recommendations
D
Amazon Athena as the data catalog; Amazon Kinesis Data Streams and Amazon Kinesis Data Analytics for historical data insights; Amazon DynamoDB streams for clickstream analytics; AWS Glue to generate personalized product recommendations

QUESTION 34

A company wants to enhance audits for its machine learning (ML) systems. The auditing system must be able to perform metadata analysis on the features that the ML models use. The audit solution must generate a report that analyzes the metadata. The solution also must be able to set the data sensitivity and authorship of features.

Which solution will meet these requirements with the LEAST development effort?

A
Use Amazon SageMaker Feature Store to select the features. Create a data flow to perform feature-level metadata analysis. Create an Amazon DynamoDB table to store feature-level metadata. Use Amazon QuickSight to analyze the metadata.
B
Use Amazon SageMaker Feature Store to set feature groups for the current features that the ML models use. Assign the required metadata for each feature. Use SageMaker Studio to analyze the metadata.
C
Use Amazon SageMaker Feature Store to apply custom algorithms to analyze the feature-level metadata that the company requires. Create an Amazon DynamoDB table to store feature-level metadata. Use Amazon QuickSight to analyze the metadata.
D
Use Amazon SageMaker Feature Store to set feature groups for the current features that the ML models use. Assign the required metadata for each feature. Use Amazon QuickSight to analyze the metadata.

QUESTION 35

A company is observing low accuracy while training on the default built-in image classification algorithm in Amazon SageMaker. The Data Science team wants to use an Inception neural network architecture instead of a ResNet architecture.

Which of the following will accomplish this? (Choose two.)

A
Customize the built-in image classification algorithm to use Inception and use this for model training.
B
Create a support case with the SageMaker team to change the default image classification algorithm to Inception.
C
Bundle a Docker container with TensorFlow Estimator loaded with an Inception network and use this for model training.
D
Use custom code in Amazon SageMaker with TensorFlow Estimator to load the model with an Inception network, and use this for model training.
E
Download and apt-get install the inception network code into an Amazon EC2 instance and use this instance as a Jupyter notebook in Amazon SageMaker.

QUESTION 36

A company is deploying a new machine learning (ML) model in a production environment. The company is concerned that the ML model will drift over time, so the company creates a script to aggregate all inputs and predictions into a single file at the end of each day. The company stores the file as an object in an Amazon S3 bucket. The total size of the daily file is 100 GB. The daily file size will increase over time.

Four times a year, the company samples the data from the previous 90 days to check the ML model for drift. After the 90-day period, the company must keep the files for compliance reasons.

The company needs to use S3 storage classes to minimize costs. The company wants to maintain the same storage durability of the data.

Which solution will meet these requirements?

A
Store the daily objects in the S3 Standard-Infrequent Access (S3 Standard-IA) storage class. Configure an S3 Lifecycle rule to move the objects to S3 Glacier Flexible Retrieval after 90 days.
B
Store the daily objects in the S3 One Zone-Infrequent Access (S3 One Zone-IA) storage class. Configure an S3 Lifecycle rule to move the objects to S3 Glacier Flexible Retrieval after 90 days.
C
Store the daily objects in the S3 Standard-Infrequent Access (S3 Standard-IA) storage class. Configure an S3 Lifecycle rule to move the objects to S3 Glacier Deep Archive after 90 days.
D
Store the daily objects in the S3 One Zone-Infrequent Access (S3 One Zone-IA) storage class. Configure an S3 Lifecycle rule to move the objects to S3 Glacier Deep Archive after 90 days.

QUESTION 37

A Machine Learning Specialist built an image classification deep learning model. However, the Specialist ran into an overfitting problem in which the training and testing accuracies were 99% and 75%, respectively.

How should the Specialist address this issue and what is the reason behind it?

A
The learning rate should be increased because the optimization process was trapped at a local minimum.
B
The dropout rate at the flatten layer should be increased because the model is not generalized enough.
C
The dimensionality of dense layer next to the flatten layer should be increased because the model is not complex enough.
D
The epoch number should be increased because the optimization process was terminated before it reached the global minimum.

QUESTION 38

A company hosts a machine learning (ML) dataset repository on Amazon S3. A data scientist is preparing the repository to train a model. The data scientist needs to redact personally identifiable information (PII) from the dataset.

Which solution will meet these requirements with the LEAST development effort?

A
Use Amazon SageMaker Data Wrangler with a custom transformation to identify and redact the PII.
B
Create a custom AWS Lambda function to read the files, identify the PII, and redact the PII.
C
Use AWS Glue DataBrew to identify and redact the PII.
D
Use an AWS Glue development endpoint to implement the PII redaction from within a notebook.

QUESTION 39

A Machine Learning team uses Amazon SageMaker to train an Apache MXNet handwritten digit classifier model using a research dataset. The team wants to receive a notification when the model is overfitting.

Auditors want to view the Amazon SageMaker log activity report to ensure there are no unauthorized API calls.

What should the Machine Learning team do to address the requirements with the least amount of code and fewest steps?

A
Implement an AWS Lambda function to log Amazon SageMaker API calls to Amazon S3. Add code to push a custom metric to Amazon CloudWatch. Create an alarm in CloudWatch with Amazon SNS to receive a notification when the model is overfitting.
B
Use AWS CloudTrail to log Amazon SageMaker API calls to Amazon S3. Add code to push a custom metric to Amazon CloudWatch. Create an alarm in CloudWatch with Amazon SNS to receive a notification when the model is overfitting.
C
Implement an AWS Lambda function to log Amazon SageMaker API calls to AWS CloudTrail. Add code to push a custom metric to Amazon CloudWatch. Create an alarm in CloudWatch with Amazon SNS to receive a notification when the model is overfitting.
D
Use AWS CloudTrail to log Amazon SageMaker API calls to Amazon S3. Set up Amazon SNS to receive a notification when the model is overfitting.

QUESTION 40

A media company wants to create a solution that identifies celebrities in pictures that users upload. The company also wants to identify the IP address and the timestamp details from the users so the company can prevent users from uploading pictures from unauthorized locations.

Which solution will meet these requirements with the LEAST development effort?

A
Use AWS Panorama to identify celebrities in the pictures. Use AWS CloudTrail to capture IP address and timestamp details.
B
Use AWS Panorama to identify celebrities in the pictures. Make calls to the AWS Panorama Device SDK to capture IP address and timestamp details.
C
Use Amazon Rekognition to identify celebrities in the pictures. Use AWS CloudTrail to capture IP address and timestamp details.
D
Use Amazon Rekognition to identify celebrities in the pictures. Use the text detection feature to capture IP address and timestamp details.

QUESTION 41

A Machine Learning Specialist is building a prediction model for a large number of features using linear models, such as linear regression and logistic regression. During exploratory data analysis, the Specialist observes that many features are highly correlated with each other. This may make the model unstable.

What should be done to reduce the impact of having such a large number of features?

A
Perform one-hot encoding on highly correlated features.
B
Use matrix multiplication on highly correlated features.
C
Create a new feature space using principal component analysis (PCA).
D
Apply the Pearson correlation coefficient.

QUESTION 42

A social media company wants to develop a machine learning (ML) model to detect inappropriate or offensive content in images. The company has collected a large dataset of labeled images and plans to use the built-in Amazon SageMaker image classification algorithm to train the model. The company also intends to use SageMaker pipe mode to speed up the training.

The company splits the dataset into training, validation, and testing datasets. The company stores the training and validation images in folders that are named Training and Validation, respectively. The folders contain subfolders that correspond to the names of the dataset classes. The company resizes the images to the same size and generates two input manifest files named training.lst and validation.lst, for the training dataset and the validation dataset, respectively. Finally, the company creates two separate Amazon S3 buckets for uploads of the training dataset and the validation dataset.

Which additional data preparation steps should the company take before uploading the files to Amazon S3?

A
Generate two Apache Parquet files, training.parquet and validation.parquet, by reading the images into a Pandas data frame and storing the data frame as a Parquet file. Upload the Parquet files to the training S3 bucket.
B
Compress the training and validation directories by using the Snappy compression library. Upload the manifest and compressed files to the training S3 bucket.
C
Compress the training and validation directories by using the gzip compression library. Upload the manifest and compressed files to the training S3 bucket.
D
Generate two RecordIO files, training.rec and validation.rec, from the manifest files by using the im2rec Apache MXNet utility tool. Upload the RecordIO files to the training S3 bucket.
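
For context on the manifest files named in the scenario, the sketch below builds the rows of a .lst file. It assumes the three-column, tab-separated layout (image index, class label index, relative image path) described for .lst files in the SageMaker image classification documentation; the helper name, class names, and file names are hypothetical.

```python
def build_lst_lines(class_to_files):
    """Return .lst rows: image index, class label index, relative path (tab-separated)."""
    lines = []
    index = 0
    # Class label indices are assigned in alphabetical order of the folder names.
    for label, class_name in enumerate(sorted(class_to_files)):
        for filename in sorted(class_to_files[class_name]):
            lines.append(f"{index}\t{label}\t{class_name}/{filename}")
            index += 1
    return lines

# Hypothetical class subfolders and image names.
example = {
    "offensive": ["img_002.jpg"],
    "acceptable": ["img_001.jpg", "img_003.jpg"],
}
lst_lines = build_lst_lines(example)
```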

QUESTION 43

A Machine Learning Specialist is implementing a full Bayesian network on a dataset that describes public transit in New York City. One of the random variables is discrete, and represents the number of minutes New Yorkers wait for a bus given that the buses cycle every 10 minutes, with a mean of 3 minutes.

Which prior probability distribution should the ML Specialist use for this variable?

A
Poisson distribution
B
Uniform distribution
C
Normal distribution
D
Binomial distribution
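
To make two of the listed candidates concrete, the sketch below evaluates a Poisson probability mass function with the stated mean of 3 and a discrete uniform distribution over 0 to 9 minutes. The 0 to 9 support and the variable names are assumptions made for illustration.

```python
import math

def poisson_pmf(k, mean):
    """Probability of observing exactly k events under a Poisson distribution."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

mean_wait = 3  # mean stated in the question, in minutes

# Poisson probabilities for waits of 0..9 minutes.
poisson_probs = [poisson_pmf(k, mean_wait) for k in range(10)]

# Discrete uniform over the same 0..9 minute support (an assumed support).
uniform_probs = [1 / 10] * 10
uniform_mean = sum(k * p for k, p in enumerate(uniform_probs))
```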

QUESTION 44

A financial company sends special offers to customers through weekly email campaigns. A bulk email marketing system takes the list of email addresses as an input and sends the marketing campaign messages in batches. Few customers use the offers from the campaign messages. The company does not want to send irrelevant offers to customers.

A machine learning (ML) team at the company is using Amazon SageMaker to build a model to recommend specific offers to each customer based on the customer's profile and the offers that the customer has accepted in the past.

Which solution will meet these requirements with the MOST operational efficiency?

A
Use the Factorization Machines algorithm to build a model that can generate personalized offer recommendations for customers. Deploy a SageMaker endpoint to generate offer recommendations. Feed the offer recommendations into the bulk email marketing system.
B
Use the Neural Collaborative Filtering algorithm to build a model that can generate personalized offer recommendations for customers. Deploy a SageMaker endpoint to generate offer recommendations. Feed the offer recommendations into the bulk email marketing system.
C
Use the Neural Collaborative Filtering algorithm to build a model that can generate personalized offer recommendations for customers. Deploy a SageMaker batch inference job to generate offer recommendations. Feed the offer recommendations into the bulk email marketing system.
D
Use the Factorization Machines algorithm to build a model that can generate personalized offer recommendations for customers. Deploy a SageMaker batch inference job to generate offer recommendations. Feed the offer recommendations into the bulk email marketing system.

QUESTION 45

A Data Science team within a large company uses Amazon SageMaker notebooks to access data stored in Amazon S3 buckets. The IT Security team is concerned that internet-enabled notebook instances create a security vulnerability where malicious code running on the instances could compromise data privacy. The company mandates that all instances stay within a secured VPC with no internet access, and data communication traffic must stay within the AWS network.

How should the Data Science team configure the notebook instance placement to meet these requirements?

A
Associate the Amazon SageMaker notebook with a private subnet in a VPC. Place the Amazon SageMaker endpoint and S3 buckets within the same VPC.
B
Associate the Amazon SageMaker notebook with a private subnet in a VPC. Use IAM policies to grant access to Amazon S3 and Amazon SageMaker.
C
Associate the Amazon SageMaker notebook with a private subnet in a VPC. Ensure the VPC has S3 VPC endpoints and Amazon SageMaker VPC endpoints attached to it.
D
Associate the Amazon SageMaker notebook with a private subnet in a VPC. Ensure the VPC has a NAT gateway and an associated security group allowing only outbound connections to Amazon S3 and Amazon SageMaker.

QUESTION 46

A company is creating an application to identify, count, and classify animal images that are uploaded to the company's website. The company is using the Amazon SageMaker image classification algorithm with an ImageNetV2 convolutional neural network (CNN). The solution works well for most animal images but does not recognize many animal species that are less common.

The company obtains 10,000 labeled images of less common animal species and stores the images in Amazon S3. A machine learning (ML) engineer needs to incorporate the images into the model by using Pipe mode in SageMaker.

Which combination of steps should the ML engineer take to train the model? (Choose two.)

A
Use a ResNet model. Initiate full training mode by initializing the network with random weights.
B
Use an Inception model that is available with the SageMaker image classification algorithm.
C
Create a .lst file that contains a list of image files and corresponding class labels. Upload the .lst file to Amazon S3.
D
Initiate transfer learning. Train the model by using the images of less common species.
E
Use an augmented manifest file in JSON Lines format.

QUESTION 47

A Machine Learning Specialist has created a deep learning neural network model that performs well on the training data but performs poorly on the test data.

Which of the following methods should the Specialist consider using to correct this? (Choose three.)

A
Decrease regularization.
B
Increase regularization.
C
Increase dropout.
D
Decrease dropout.
E
Increase feature combinations.
F
Decrease feature combinations.

QUESTION 48

An automotive company uses computer vision in its autonomous cars. The company trained its object detection models successfully by using transfer learning from a convolutional neural network (CNN). The company trained the models by using PyTorch through the Amazon SageMaker SDK.

The vehicles have limited hardware and compute power. The company wants to optimize the model to reduce memory, battery, and hardware consumption without a significant sacrifice in accuracy.

Which solution will improve the computational efficiency of the models?

A
Use Amazon CloudWatch metrics to gain visibility into the SageMaker training weights, gradients, biases, and activation outputs. Compute the filter ranks based on the training information. Apply pruning to remove the low-ranking filters. Set new weights based on the pruned set of filters. Run a new training job with the pruned model.
B
Use Amazon SageMaker Ground Truth to build and run data labeling workflows. Collect a larger labeled dataset with the labelling workflows. Run a new training job that uses the new labeled data with previous training data.
C
Use Amazon SageMaker Debugger to gain visibility into the training weights, gradients, biases, and activation outputs. Compute the filter ranks based on the training information. Apply pruning to remove the low-ranking filters. Set the new weights based on the pruned set of filters. Run a new training job with the pruned model.
D
Use Amazon SageMaker Model Monitor to gain visibility into the ModelLatency metric and OverheadLatency metric of the model after the company deploys the model. Increase the model learning rate. Run a new training job.

QUESTION 49

A Data Scientist needs to create a serverless ingestion and analytics solution for high-velocity, real-time streaming data.

The ingestion process must buffer and convert incoming records from JSON to a query-optimized, columnar format without data loss. The output datastore must be highly available, and Analysts must be able to run SQL queries against the data and connect to existing business intelligence dashboards.

Which solution should the Data Scientist build to satisfy the requirements?

A
Create a schema in the AWS Glue Data Catalog of the incoming data format. Use an Amazon Kinesis Data Firehose delivery stream to stream the data and transform the data to Apache Parquet or ORC format using the AWS Glue Data Catalog before delivering to Amazon S3. Have the Analysts query the data directly from Amazon S3 using Amazon Athena, and connect to BI tools using the Athena Java Database Connectivity (JDBC) connector.
B
Write each JSON record to a staging location in Amazon S3. Use the S3 Put event to trigger an AWS Lambda function that transforms the data into Apache Parquet or ORC format and writes the data to a processed data location in Amazon S3. Have the Analysts query the data directly from Amazon S3 using Amazon Athena, and connect to BI tools using the Athena Java Database Connectivity (JDBC) connector.
C
Write each JSON record to a staging location in Amazon S3. Use the S3 Put event to trigger an AWS Lambda function that transforms the data into Apache Parquet or ORC format and inserts it into an Amazon RDS PostgreSQL database. Have the Analysts query and run dashboards from the RDS database.
D
Use Amazon Kinesis Data Analytics to ingest the streaming data and perform real-time SQL queries to convert the records to Apache Parquet before delivering to Amazon S3. Have the Analysts query the data directly from Amazon S3 using Amazon Athena and connect to BI tools using the Athena Java Database Connectivity (JDBC) connector.

QUESTION 50

A manufacturing company has a production line with sensors that collect hundreds of quality metrics. The company has stored sensor data and manual inspection results in a data lake for several months. To automate quality control, the machine learning team must build an automated mechanism that determines whether the produced goods are good quality, replacement market quality, or scrap quality based on the manual inspection results.

Which modeling approach will deliver the MOST accurate prediction of product quality?

A
Amazon SageMaker DeepAR forecasting algorithm.
B
Amazon SageMaker XGBoost algorithm
C
Amazon SageMaker Latent Dirichlet Allocation (LDA) algorithm
D
A convolutional neural network (CNN) and ResNet.

QUESTION 51

An online reseller has a large, multi-column dataset with one column missing 30% of its data. A Machine Learning Specialist believes that certain columns in the dataset could be used to reconstruct the missing data.

Which reconstruction approach should the Specialist use to preserve the integrity of the dataset?

A
Listwise deletion
B
Last observation carried forward
C
Multiple imputation
D
Mean substitution

QUESTION 52

A company's data scientist has trained a new machine learning model that performs better on test data than the company's existing model performs in the production environment. The data scientist wants to replace the existing model that runs on an Amazon SageMaker endpoint in the production environment. However, the company is concerned that the new model might not work well on the production environment data.

The data scientist needs to perform A/B testing in the production environment to evaluate whether the new model performs well on production environment data.

Which combination of steps must the data scientist take to perform the A/B testing? (Choose two.)

A
Create a new endpoint configuration that includes a production variant for each of the two models.
B
Create a new endpoint configuration that includes two target variants that point to different endpoints.
C
Deploy the new model to the existing endpoint.
D
Update the existing endpoint to activate the new model.
E
Update the existing endpoint to use the new endpoint configuration.
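
As background on how traffic can be split between models on a SageMaker endpoint, the sketch below builds request parameters for an endpoint configuration with two production variants. The field names follow the SageMaker `CreateEndpointConfig` API; the configuration name, model names, instance type, and 10% traffic share are hypothetical values chosen for illustration.

```python
def ab_test_endpoint_config(config_name, existing_model, new_model, new_model_share=0.1):
    """Build CreateEndpointConfig parameters with one production variant per model."""
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": "existing",
                "ModelName": existing_model,
                "InstanceType": "ml.m5.large",  # assumed instance type
                "InitialInstanceCount": 1,
                "InitialVariantWeight": 1.0 - new_model_share,
            },
            {
                "VariantName": "candidate",
                "ModelName": new_model,
                "InstanceType": "ml.m5.large",  # assumed instance type
                "InitialInstanceCount": 1,
                "InitialVariantWeight": new_model_share,
            },
        ],
    }

params = ab_test_endpoint_config("ab-test-config", "model-v1", "model-v2")

# With AWS credentials configured, the parameters could be passed on like this
# (the endpoint name is hypothetical):
#   import boto3
#   sm = boto3.client("sagemaker")
#   sm.create_endpoint_config(**params)
#   sm.update_endpoint(EndpointName="prod-endpoint",
#                      EndpointConfigName=params["EndpointConfigName"])
```

The variant weights are relative, so the share of requests each variant receives is its weight divided by the sum of all weights.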

QUESTION 53

A company is setting up an Amazon SageMaker environment. The corporate data security policy does not allow communication over the internet.

How can the company enable the Amazon SageMaker service without enabling direct internet access to Amazon SageMaker notebook instances?

A
Create a NAT gateway within the corporate VPC.
B
Route Amazon SageMaker traffic through an on-premises network.
C
Create Amazon SageMaker VPC interface endpoints within the corporate VPC.
D
Create VPC peering with Amazon VPC hosting Amazon SageMaker.

QUESTION 54

Each morning, a data scientist at a rental car company creates insights about the previous day's rental car reservation demands. The company needs to automate this process by streaming the data to Amazon S3 in near real time. The solution must detect high-demand rental cars at each of the company's locations. The solution also must create a visualization dashboard that automatically refreshes with the most recent data.

Which solution will meet these requirements with the LEAST development time?

A
Use Amazon Kinesis Data Firehose to stream the reservation data directly to Amazon S3. Detect high-demand outliers by using Amazon QuickSight ML Insights. Visualize the data in QuickSight.
B
Use Amazon Kinesis Data Streams to stream the reservation data directly to Amazon S3. Detect high-demand outliers by using the Random Cut Forest (RCF) trained model in Amazon SageMaker. Visualize the data in Amazon QuickSight.
C
Use Amazon Kinesis Data Firehose to stream the reservation data directly to Amazon S3. Detect high-demand outliers by using the Random Cut Forest (RCF) trained model in Amazon SageMaker. Visualize the data in Amazon QuickSight.
D
Use Amazon Kinesis Data Streams to stream the reservation data directly to Amazon S3. Detect high-demand outliers by using Amazon QuickSight ML Insights. Visualize the data in QuickSight.

QUESTION 55

A Machine Learning Specialist is training a model to identify the make and model of vehicles in images. The Specialist wants to use transfer learning and an existing model trained on images of general objects. The Specialist collated a large custom dataset of pictures containing different vehicle makes and models.

What should the Specialist do to initialize the model to re-train it with the custom data?

A
Initialize the model with random weights in all layers including the last fully connected layer.
B
Initialize the model with pre-trained weights in all layers and replace the last fully connected layer.
C
Initialize the model with random weights in all layers and replace the last fully connected layer.
D
Initialize the model with pre-trained weights in all layers including the last fully connected layer.

QUESTION 56

A company is using a legacy telephony platform and has several years remaining on its contract. The company wants to move to AWS and wants to implement the following machine learning features:

• Call transcription in multiple languages

• Categorization of calls based on the transcript

• Detection of the main customer issues in the calls

• Customer sentiment analysis for each line of the transcript, with positive or negative indication and scoring of that sentiment

Which AWS solution will meet these requirements with the LEAST amount of custom model training?

A
Use Amazon Transcribe to process audio calls to produce transcripts, categorize calls, and detect issues. Use Amazon Comprehend to analyze sentiment.
B
Use Amazon Transcribe to process audio calls to produce transcripts. Use Amazon Comprehend to categorize calls, detect issues, and analyze sentiment.
C
Use Contact Lens for Amazon Connect to process audio calls to produce transcripts, categorize calls, detect issues, and analyze sentiment.
D
Use Contact Lens for Amazon Connect to process audio calls to produce transcripts. Use Amazon Comprehend to categorize calls, detect issues, and analyze sentiment.

QUESTION 57

An office security agency conducted a successful pilot using 100 cameras installed at key locations within the main office. Images from the cameras were uploaded to Amazon S3 and tagged using Amazon Rekognition, and the results were stored in Amazon ES. The agency is now looking to expand the pilot into a full production system using thousands of video cameras in its office locations globally. The goal is to identify activities performed by non-employees in real time.

Which solution should the agency consider?

A
Use a proxy server at each local office and for each camera, and stream the RTSP feed to a unique Amazon Kinesis Video Streams video stream. On each stream, use Amazon Rekognition Video and create a stream processor to detect faces from a collection of known employees, and alert when non-employees are detected.
B
Use a proxy server at each local office and for each camera, and stream the RTSP feed to a unique Amazon Kinesis Video Streams video stream. On each stream, use Amazon Rekognition Image to detect faces from a collection of known employees and alert when non-employees are detected.
C
Install AWS DeepLens cameras and use the DeepLens_Kinesis_Video module to stream video to Amazon Kinesis Video Streams for each camera. On each stream, use Amazon Rekognition Video and create a stream processor to detect faces from a collection on each stream, and alert when non-employees are detected.
D
Install AWS DeepLens cameras and use the DeepLens_Kinesis_Video module to stream video to Amazon Kinesis Video Streams for each camera. On each stream, run an AWS Lambda function to capture image fragments and then call Amazon Rekognition Image to detect faces from a collection of known employees, and alert when non-employees are detected.

QUESTION 58

A company wants to predict the classification of documents that are created from an application. New documents are saved to an Amazon S3 bucket every 3 seconds. The company has developed three versions of a machine learning (ML) model within Amazon SageMaker to classify document text. The company wants to deploy these three versions to predict the classification of each document.

Which approach will meet these requirements with the LEAST operational overhead?

A
Configure an S3 event notification that invokes an AWS Lambda function when new documents are created. Configure the Lambda function to create three SageMaker batch transform jobs, one batch transform job for each model for each document.
B
Deploy all the models to a single SageMaker endpoint. Treat each model as a production variant. Configure an S3 event notification that invokes an AWS Lambda function when new documents are created. Configure the Lambda function to call each production variant and return the results of each model.
C
Deploy each model to its own SageMaker endpoint. Configure an S3 event notification that invokes an AWS Lambda function when new documents are created. Configure the Lambda function to call each endpoint and return the results of each model.
D
Deploy each model to its own SageMaker endpoint. Create three AWS Lambda functions. Configure each Lambda function to call a different endpoint and return the results. Configure three S3 event notifications to invoke the Lambda functions when new documents are created.

QUESTION 59

A Marketing Manager at a pet insurance company plans to launch a targeted marketing campaign on social media to acquire new customers. Currently, the company has the following data in Amazon Aurora:

• Profiles for all past and existing customers

• Profiles for all past and existing insured pets

• Policy-level information

• Premiums received

• Claims paid

What steps should be taken to implement a machine learning model to identify potential new customers on social media?

A
Use regression on customer profile data to understand key characteristics of consumer segments. Find similar profiles on social media.
B
Use clustering on customer profile data to understand key characteristics of consumer segments. Find similar profiles on social media.
C
Use a recommendation engine on customer profile data to understand key characteristics of consumer segments. Find similar profiles on social media.
D
Use a decision tree classifier engine on customer profile data to understand key characteristics of consumer segments. Find similar profiles on social media.

QUESTION 60

A healthcare company wants to create a machine learning (ML) model to predict patient outcomes. A data science team developed an ML model by using a custom ML library. The company wants to use Amazon SageMaker to train this model. The data science team creates a custom SageMaker image to train the model. When the team tries to launch the custom image in SageMaker Studio, the data scientists encounter an error within the application.

Which service can the data scientists use to access the logs for this error?

A
Amazon S3
B
Amazon Elastic Block Store (Amazon EBS)
C
AWS CloudTrail
D
Amazon CloudWatch

QUESTION 61

A manufacturing company has a large set of labeled historical sales data. The manufacturer would like to predict how many units of a particular part should be produced each quarter.

Which machine learning approach should be used to solve this problem?

A
Logistic regression
B
Random Cut Forest (RCF)
C
Principal component analysis (PCA)
D
Linear regression

QUESTION 62

A company wants to create a data repository in the AWS Cloud for machine learning (ML) projects. The company wants to use AWS to perform complete ML lifecycles and wants to use Amazon S3 for the data storage. All of the company's data currently resides on premises and is 40 TB in size.

The company wants a solution that can transfer and automatically update data between the on-premises object storage and Amazon S3. The solution must support encryption, scheduling, monitoring, and data integrity validation.

Which solution meets these requirements?

A
Use the S3 sync command to compare the source S3 bucket and the destination S3 bucket. Determine which source files do not exist in the destination S3 bucket and which source files were modified.
B
Use AWS Transfer for FTPS to transfer the files from the on-premises storage to Amazon S3.
C
Use AWS DataSync to make an initial copy of the entire dataset. Schedule subsequent incremental transfers of changing data until the final cutover from on premises to AWS.
D
Use S3 Batch Operations to pull data periodically from the on-premises storage. Enable S3 Versioning on the S3 bucket to protect against accidental overwrites.

QUESTION 63

A financial services company is building a robust serverless data lake on Amazon S3. The data lake should be flexible and meet the following requirements:

Support querying old and new data on Amazon S3 through Amazon Athena and Amazon Redshift Spectrum.

Support event-driven ETL pipelines

Provide a quick and easy way to understand metadata

Which approach meets these requirements?

A
Use an AWS Glue crawler to crawl S3 data, an AWS Lambda function to trigger an AWS Glue ETL job, and an AWS Glue Data Catalog to search and discover metadata.
B
Use an AWS Glue crawler to crawl S3 data, an AWS Lambda function to trigger an AWS Batch job, and an external Apache Hive metastore to search and discover metadata.
C
Use an AWS Glue crawler to crawl S3 data, an Amazon CloudWatch alarm to trigger an AWS Batch job, and an AWS Glue Data Catalog to search and discover metadata.
D
Use an AWS Glue crawler to crawl S3 data, an Amazon CloudWatch alarm to trigger an AWS Glue ETL job, and an external Apache Hive metastore to search and discover metadata.

QUESTION 64

A data scientist is using the Amazon SageMaker Neural Topic Model (NTM) algorithm to build a model that recommends tags from blog posts. The raw blog post data is stored in an Amazon S3 bucket in JSON format. During model evaluation, the data scientist discovered that the model recommends certain stopwords such as "a," "an," and "the" as tags to certain blog posts, along with a few rare words that are present only in certain blog entries. After a few iterations of tag review with the content team, the data scientist notices that the rare words are unusual but feasible. The data scientist also must ensure that the tag recommendations of the generated model do not include the stopwords.

What should the data scientist do to meet these requirements?

A
Use the Amazon Comprehend entity recognition API operations. Remove the detected words from the blog post data. Replace the blog post data source in the S3 bucket.
B
Run the SageMaker built-in principal component analysis (PCA) algorithm with the blog post data from the S3 bucket as the data source. Replace the blog post data in the S3 bucket with the results of the training job.
C
Use the SageMaker built-in Object Detection algorithm instead of the NTM algorithm for the training job to process the blog post data.
D
Remove the stopwords from the blog post data by using the CountVectorizer function in the scikit-learn library. Replace the blog post data in the S3 bucket with the results of the vectorizer.

QUESTION 65

A company's Machine Learning Specialist needs to improve the training speed of a time-series forecasting model using TensorFlow. The training is currently implemented on a single-GPU machine and takes approximately 23 hours to complete. The training needs to be run daily.

The model accuracy is acceptable, but the company anticipates a continuous increase in the size of the training data and a need to update the model on an hourly, rather than a daily, basis. The company also wants to minimize coding effort and infrastructure changes.

What should the Machine Learning Specialist do to the training solution to allow it to scale for future demand?

A
Do not change the TensorFlow code. Change the machine to one with a more powerful GPU to speed up the training.
B
Change the TensorFlow code to implement a Horovod distributed framework supported by Amazon SageMaker. Parallelize the training to as many machines as needed to achieve the business goals.
C
Switch to using a built-in AWS SageMaker DeepAR model. Parallelize the training to as many machines as needed to achieve the business goals.
D
Move the training to Amazon EMR and distribute the workload to as many machines as needed to achieve the business goals.

QUESTION 66

A machine learning (ML) specialist wants to create a data preparation job that uses a PySpark script with complex window aggregation operations to create data for training and testing. The ML specialist needs to evaluate the impact of the number of features and the sample count on model performance.

Which approach should the ML specialist use to determine the ideal data transformations for the model?

A
Add an Amazon SageMaker Debugger hook to the script to capture key metrics. Run the script as an AWS Glue job.
B
Add an Amazon SageMaker Experiments tracker to the script to capture key metrics. Run the script as an AWS Glue job.
C
Add an Amazon SageMaker Debugger hook to the script to capture key parameters. Run the script as a SageMaker processing job.
D
Add an Amazon SageMaker Experiments tracker to the script to capture key parameters. Run the script as a SageMaker processing job.

QUESTION 67

Which of the following metrics should a Machine Learning Specialist generally use to compare/evaluate machine learning classification models against each other?

A
Recall
B
Misclassification rate
C
Mean absolute percentage error (MAPE)
D
Area Under the ROC Curve (AUC)
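
To illustrate how two of the listed metrics behave, the sketch below computes recall at a chosen decision threshold and the area under the ROC curve from predicted scores. The labels, scores, and function names are hypothetical; the AUC is computed as the fraction of positive-negative pairs in which the positive example receives the higher score, with ties counted as half.

```python
def recall_at(labels, scores, threshold):
    """Recall when scores at or above the threshold are predicted positive."""
    true_pos = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    false_neg = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    return true_pos / (true_pos + false_neg)

def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise ranking of positives against negatives."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical labels and model scores.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

recall_strict = recall_at(labels, scores, threshold=0.5)
recall_loose = recall_at(labels, scores, threshold=0.3)
auc = roc_auc(labels, scores)
```

In this example, recall changes with the threshold, while the AUC is a single value computed across all thresholds.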

QUESTION 68

A developer wants to build an application that detects when customers enter personally identifiable information (PII), such as bank account numbers, into a customer survey before those responses are saved into a third-party database as records. The survey responses allow 100 words maximum and are less than 1 KB in size. The developer has never built a machine learning (ML) model before and wants a solution that requires the least development effort to build.

Which solution will meet these requirements with the LEAST development effort?

A
Use a subset of the survey responses to train an Amazon Comprehend custom classifier to determine which documents contain PII data.
B
Use a subset of the survey responses to train an Amazon Comprehend custom entity recognition to identify PII data in the survey responses.
C
Send the survey responses to the Amazon Comprehend DetectPiiEntities API to identify PII data in the survey responses.
D
Send a subset of the survey responses to the Amazon Comprehend DetectPiiEntities API to identify PII data in the survey responses.
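
For reference, a call to the `DetectPiiEntities` operation named in the options might look like the sketch below. It assumes a boto3 Comprehend client and the documented response shape (an `Entities` list with `Type`, `Score`, `BeginOffset`, and `EndOffset`). The helper name, the score cutoff, the survey text, and the stub client that stands in for the real service are all hypothetical.

```python
def find_pii(comprehend_client, text, min_score=0.5):
    """Return (entity type, matched text) pairs for PII found in one survey response."""
    response = comprehend_client.detect_pii_entities(Text=text, LanguageCode="en")
    return [
        (entity["Type"], text[entity["BeginOffset"]:entity["EndOffset"]])
        for entity in response["Entities"]
        if entity["Score"] >= min_score
    ]

class StubComprehend:
    """Offline stand-in that returns a canned response in the documented shape."""

    def detect_pii_entities(self, Text, LanguageCode):
        start = Text.index("12345678")
        return {
            "Entities": [
                {
                    "Type": "BANK_ACCOUNT_NUMBER",
                    "Score": 0.99,
                    "BeginOffset": start,
                    "EndOffset": start + 8,
                }
            ]
        }

survey = "Please refund account 12345678 soon."
hits = find_pii(StubComprehend(), survey)

# Against the real service, the stub would be replaced with:
#   import boto3
#   hits = find_pii(boto3.client("comprehend"), survey)
```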

QUESTION 69

A company is running a machine learning prediction service that generates 100 TB of predictions every day. A Machine Learning Specialist must generate a visualization of the daily precision-recall curve from the predictions, and forward a read-only version to the Business team.

Which solution requires the LEAST coding effort?

A
Run a daily Amazon EMR workflow to generate precision-recall data, and save the results in Amazon S3. Give the Business team read-only access to S3.
B
Generate daily precision-recall data in Amazon QuickSight, and publish the results in a dashboard shared with the Business team.
C
Run a daily Amazon EMR workflow to generate precision-recall data, and save the results in Amazon S3. Visualize the arrays in Amazon QuickSight, and publish them in a dashboard shared with the Business team.
D
Generate daily precision-recall data in Amazon ES, and publish the results in a dashboard shared with the Business team.

QUESTION 70

A data science team is working with a tabular dataset that the team stores in Amazon S3. The team wants to experiment with different feature transformations such as categorical feature encoding. Then the team wants to visualize the resulting distribution of the dataset. After the team finds an appropriate set of feature transformations, the team wants to automate the workflow for feature transformations.

Which solution will meet these requirements with the MOST operational efficiency?

A
Use Amazon SageMaker Data Wrangler preconfigured transformations to explore feature transformations. Use SageMaker Data Wrangler templates for visualization. Export the feature processing workflow to a SageMaker pipeline for automation.
B
Use an Amazon SageMaker notebook instance to experiment with different feature transformations. Save the transformations to Amazon S3. Use Amazon QuickSight for visualization. Package the feature processing steps into an AWS Lambda function for automation.
C
Use AWS Glue Studio with custom code to experiment with different feature transformations. Save the transformations to Amazon S3. Use Amazon QuickSight for visualization. Package the feature processing steps into an AWS Lambda function for automation.
D
Use Amazon SageMaker Data Wrangler preconfigured transformations to experiment with different feature transformations. Save the transformations to Amazon S3. Use Amazon QuickSight for visualization. Package each feature transformation step into a separate AWS Lambda function. Use AWS Step Functions for workflow automation.
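
As a reminder of what "categorical feature encoding" does to a column, here is a minimal one-hot encoding sketch in plain Python. The function name and sample values are hypothetical, and this is not how any AWS service implements the transformation; it only shows the shape of the output and how the resulting distribution can be read from it.

```python
def one_hot_encode(values):
    """Map each categorical value to a 0/1 indicator row, one column per category."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1
        rows.append(row)
    return categories, rows


# Hypothetical column of four observations.
categories, encoded = one_hot_encode(["red", "green", "red", "blue"])

# Column sums give the count of each category after encoding.
distribution = dict(zip(categories, (sum(col) for col in zip(*encoded))))
```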

QUESTION 71

A Machine Learning Specialist is required to build a supervised image-recognition model to identify a cat. The ML Specialist performs some tests and records the following results for a neural network-based image classifier:

Total number of images available = 1,000
Test set images = 100 (constant test set)

The ML Specialist notices that, in over 75% of the misclassified images, the cats were held upside down by their owners.

Which techniques can be used by the ML Specialist to improve this specific test error?

A
Increase the training data by adding variation in rotation for training images.
B
Increase the number of epochs for model training.
C
Increase the number of layers for the neural network.
D
Increase the dropout rate for the second-to-last layer.
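
To make "variation in rotation" concrete, the sketch below generates rotated copies of an image represented as a 2D grid of pixel values. It is a plain-Python illustration under that simplifying assumption; the function names are hypothetical, and a real pipeline would typically use an image library and arbitrary angles rather than only quarter turns.

```python
def rotate_90(image):
    """Rotate a 2D grid of pixels 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def rotation_variants(image):
    """Return the original image plus its 90, 180, and 270 degree rotations."""
    variants = [image]
    for _ in range(3):
        variants.append(rotate_90(variants[-1]))
    return variants


# Hypothetical 2x2 "image"; index 2 of the result is the upside-down copy.
image = [[1, 2],
         [3, 4]]
augmented = rotation_variants(image)
```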

QUESTION 72

A data engineer is evaluating customer data in Amazon SageMaker Data Wrangler. The data engineer will use the customer data to create a new model to predict customer behavior.

The engineer needs to increase the model performance by checking for multicollinearity in the dataset.

Which steps can the data engineer take to accomplish this with the LEAST operational effort? (Select TWO.)

A
Use SageMaker Data Wrangler to refit and transform the dataset by applying one-hot encoding to category-based variables.
B
Use SageMaker Data Wrangler diagnostic visualization. Use principal components analysis (PCA) and singular value decomposition (SVD) to calculate singular values.
C
Use the SageMaker Data Wrangler Quick Model visualization to quickly evaluate the dataset and to produce importance scores for each feature.
D
Use the SageMaker Data Wrangler Min Max Scaler transform to normalize the data.
E
Use SageMaker Data Wrangler diagnostic visualization. Use least absolute shrinkage and selection operator (LASSO) to plot coefficient values from a LASSO model that is trained on the dataset.
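
As background on what multicollinearity looks like numerically, the following sketch computes the Pearson correlation between two features and the variance inflation factor (VIF) for the two-feature case, where VIF = 1 / (1 - r^2). This is a generic statistical illustration in plain Python, not the Data Wrangler implementation; the function names and sample columns are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def vif_two_features(x, y):
    """VIF when the model has exactly two predictors: 1 / (1 - r^2)."""
    r = pearson(x, y)
    return 1.0 / (1.0 - r * r)


# Hypothetical columns: tenure in years vs. nearly the same quantity in months.
tenure_years = [1, 2, 3, 4, 5]
tenure_months = [12, 24, 37, 48, 60]
unrelated = [3, 1, 4, 1, 5]

vif_collinear = vif_two_features(tenure_years, tenure_months)
vif_unrelated = vif_two_features(tenure_years, unrelated)
```

A very large VIF (a common rule of thumb flags values above 5 or 10) suggests that one feature is almost a linear function of the other.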

QUESTION 73

A Machine Learning Specialist needs to be able to ingest streaming data and store it in Apache Parquet files for exploration and analysis.

Which of the following services would both ingest and store this data in the correct format?

A
AWS DMS
B
Amazon Kinesis Data Streams
C
Amazon Kinesis Data Firehose
D
Amazon Kinesis Data Analytics

QUESTION 74

A company hosts a public web application on AWS. The application provides a user feedback feature that consists of free-text fields where users can submit text to provide feedback. The company receives a large amount of free-text user feedback from the online web application. The product managers at the company classify the feedback into a set of fixed categories, including user interface issues, performance issues, new feature requests, and chat issues, for further action by the company's engineering teams.

A machine learning (ML) engineer at the company must automate the classification of new user feedback into these fixed categories by using Amazon SageMaker. A large set of accurate data is available from the historical user feedback that the product managers previously classified.

Which solution should the ML engineer apply to perform multi-class text classification of the user feedback?

A
Use the SageMaker Latent Dirichlet Allocation (LDA) algorithm.
B
Use the SageMaker BlazingText algorithm.
C
Use the SageMaker Neural Topic Model (NTM) algorithm.
D
Use the SageMaker CatBoost algorithm.
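
For context on how labeled text is commonly prepared for supervised text classification, the sketch below writes one training line per feedback item using the fastText-style `__label__` prefix, which is also the format SageMaker BlazingText expects in its supervised mode. The helper name, the lowercase whitespace tokenization, and the sample feedback are assumptions made for illustration.

```python
def to_labeled_line(category, text):
    """Build one training line: '__label__<category>' followed by space-separated tokens."""
    # Assumption: labels are lowercased and spaces become underscores.
    label = "__label__" + category.strip().lower().replace(" ", "_")
    # Assumption: simple lowercase whitespace tokenization.
    tokens = text.lower().split()
    return " ".join([label] + tokens)


# Hypothetical historical feedback item that a product manager already classified.
line = to_labeled_line("performance issues", "The page takes too long to load")
```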

QUESTION 75

A Data Scientist is developing a machine learning model to classify whether a financial transaction is fraudulent. The labeled data available for training consists of 100,000 non-fraudulent observations and 1,000 fraudulent observations.

The Data Scientist applies the XGBoost algorithm to the data, resulting in the following confusion matrix when the trained model is applied to a previously unseen validation dataset. The accuracy of the model is 99.1%, but the Data Scientist has been asked to reduce the number of false negatives.

Which combination of steps should the Data Scientist take to reduce the number of false negative predictions by the model? (Choose two.)

A
Change the XGBoost eval_metric parameter to optimize based on rmse instead of error.
B
Increase the XGBoost scale_pos_weight parameter to adjust the balance of positive and negative weights.
C
Increase the XGBoost max_depth parameter because the model is currently underfitting the data.
D
Change the XGBoost eval_metric parameter to optimize based on AUC instead of error.
E
Decrease the XGBoost max_depth parameter because the model is currently overfitting the data.
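
The class counts in this question allow a quick worked calculation. The sketch below uses only the stated training counts (100,000 non-fraudulent and 1,000 fraudulent); it does not reproduce the confusion matrix, which is not shown here. The ratio of negative to positive instances is the starting value that the XGBoost documentation suggests considering for `scale_pos_weight`; the function name is hypothetical.

```python
def imbalance_summary(negatives, positives):
    """Return the majority-class baseline accuracy and the negative/positive ratio."""
    # Accuracy of a trivial model that always predicts the negative class.
    baseline_accuracy = negatives / (negatives + positives)
    # Typical starting value for scale_pos_weight: sum(negative) / sum(positive).
    ratio = negatives / positives
    return baseline_accuracy, ratio


baseline_accuracy, suggested_scale_pos_weight = imbalance_summary(100_000, 1_000)
```

With these counts the trivial baseline is already about 99.0% accurate, which is why accuracy alone says little about how well the fraudulent class is detected.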

QUESTION 76

A company is building a new supervised classification model in an AWS environment. The company's data science team notices that the dataset has a large quantity of variables. All the variables are numeric.

The model accuracy for training and validation is low. The model's processing time is affected by high latency. The data science team needs to increase the accuracy of the model and decrease the processing time.

What should the data science team do to meet these requirements?

A
Create new features and interaction variables.
B
Use a principal component analysis (PCA) model.
C
Apply normalization on the feature set.
D
Use a multiple correspondence analysis (MCA) model.

QUESTION 77

A Machine Learning Specialist is assigned a TensorFlow project using Amazon SageMaker for training, and needs to continue working for an extended period with no Wi-Fi access.

Which approach should the Specialist use to continue working?

A
Install Python 3 and boto3 on their laptop and continue the code development using that environment.
B
Download the TensorFlow Docker container used in Amazon SageMaker from GitHub to their local environment, and use the Amazon SageMaker Python SDK to test the code.
C
Download TensorFlow from tensorflow.org to emulate the TensorFlow kernel in the SageMaker environment.
D
Download the SageMaker notebook to their local environment, then install Jupyter Notebooks on their laptop and continue the development in a local notebook.

QUESTION 78

A company's data scientist has built a machine learning (ML) classification system that can determine whether the company's promotional items are present in an image. The company wants the ML classification system to also determine how many times each type of promotional item appears in an image and the exact position of each item in the image.

Which solution will provide all the annotation data that the data scientist needs to train a supervised model to accomplish this task?

A
For each image, use a JSON file with an array of objects that contain bounding box coordinates and the item name for each occurrence of a promotional item in the image.
B
For each image, use a TEXT file with one promotional item per line for each occurrence. Order the file from left to right.
C
For each image, use a CSV file with one promotional item per line and three columns: one column for the promotional item name, one column for the object width, and one column for the object height.
D
For each image, use a JSONL file with one promotional item per line for each occurrence in the image. Order the file by the item name.
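
To illustrate what a per-image annotation with bounding boxes can carry, the sketch below builds a small JSON document and derives the per-item counts from it. The field names (`image`, `annotations`, `label`, `left`, `top`, `width`, `height`), the file name, and the item names are hypothetical and are not tied to any specific labeling tool's schema.

```python
import json
from collections import Counter

# Hypothetical annotation for one image: one object per item occurrence.
annotation = {
    "image": "shelf_001.jpg",
    "annotations": [
        {"label": "mug", "left": 10, "top": 20, "width": 50, "height": 60},
        {"label": "mug", "left": 120, "top": 25, "width": 48, "height": 58},
        {"label": "tote_bag", "left": 200, "top": 40, "width": 90, "height": 110},
    ],
}

# Round-trip through JSON text, as the file on disk would be read back.
serialized = json.dumps(annotation, indent=2)
loaded = json.loads(serialized)

# Counts per item type follow directly from the list of occurrences.
counts = Counter(entry["label"] for entry in loaded["annotations"])
```

Because every occurrence has its own entry with coordinates, both the count per item type and the position of each item can be recovered from the same file.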
