SageMaker Capabilities

Jan 9, 2022

Automatic Model Tuning

SageMaker finds the optimal hyperparameters by launching a hyperparameter tuning job, which in turn starts many training jobs across the hyperparameters and ranges you specify.

Do’s and Don’ts
A. Don’t optimize too many hyperparameters at once
B. Keep each range as small as possible
C. Use logarithmic scales when possible
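
The three guidelines above show up directly in the shape of a tuning request. A minimal sketch, assuming the request layout used by boto3's create_hyper_parameter_tuning_job; the hyperparameter names, ranges, metric name, and job limits are illustrative assumptions, and nothing is sent to AWS:

```python
import json


def build_tuning_config(max_jobs=20, max_parallel=2):
    """Build the HyperParameterTuningJobConfig part of a tuning request.

    Only two hyperparameters are tuned (guideline A), each range is narrow
    (guideline B), and the learning rate is searched on a log scale
    (guideline C). All names and values here are illustrative assumptions.
    """
    return {
        "Strategy": "Bayesian",
        "HyperParameterTuningJobObjective": {
            "Type": "Minimize",
            "MetricName": "validation:rmse",  # assumed metric name
        },
        "ResourceLimits": {
            "MaxNumberOfTrainingJobs": max_jobs,
            "MaxParallelTrainingJobs": max_parallel,
        },
        "ParameterRanges": {
            "ContinuousParameterRanges": [
                {
                    "Name": "eta",  # assumed learning-rate hyperparameter
                    "MinValue": "0.01",
                    "MaxValue": "0.3",
                    "ScalingType": "Logarithmic",
                },
            ],
            "IntegerParameterRanges": [
                {
                    "Name": "max_depth",  # assumed tree-depth hyperparameter
                    "MinValue": "3",
                    "MaxValue": "8",
                    "ScalingType": "Linear",
                },
            ],
        },
    }


config = build_tuning_config()
print(json.dumps(config["ParameterRanges"], indent=2))
```

Passing this dictionary as HyperParameterTuningJobConfig, together with a training job definition, is what would start the tuning job.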

SageMaker integration with Apache Spark

Apache Spark is extremely good at pre-processing data for model training. SageMaker provides the sagemaker-spark library for using the two together.

How to
Connect a SageMaker notebook to a remote EMR cluster running Spark -> build the DataFrames -> call fit on a SageMakerEstimator to train a model -> call transform on the resulting SageMaker model to make inferences
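
The fit and transform steps can be sketched with the PySpark flavor of the library. This assumes pyspark and sagemaker_pyspark are installed and that the Spark session was started with the sagemaker_pyspark jars on its classpath. The choice of K-Means, the instance types, k, and the feature dimension are illustrative assumptions, and train_and_infer is a hypothetical helper name:

```python
def train_and_infer(training_df, test_df, role_arn):
    """Train K-Means on SageMaker from a Spark DataFrame, then run inference.

    Both DataFrames are assumed to carry a "features" vector column.
    K-Means and every setting below are illustrative choices.
    """
    # Imported lazily so this module still loads where the library is absent.
    from sagemaker_pyspark import IAMRole
    from sagemaker_pyspark.algorithms import KMeansSageMakerEstimator

    estimator = KMeansSageMakerEstimator(
        sagemakerRole=IAMRole(role_arn),
        trainingInstanceType="ml.m4.xlarge",
        trainingInstanceCount=1,
        endpointInstanceType="ml.m4.xlarge",
        endpointInitialInstanceCount=1,
    )
    estimator.setK(10)
    estimator.setFeatureDim(784)

    # fit() runs a SageMaker training job and deploys an endpoint,
    # returning a model object that acts as a Spark Transformer.
    model = estimator.fit(training_df)

    # transform() sends rows to the endpoint and appends the predictions.
    return model.transform(test_df)
```

Calling it starts real training and hosting resources in the AWS account, so it is worth trying on a small DataFrame first.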

SageMaker Studio

Visual IDE for machine learning.

Create and share Jupyter notebooks with SageMaker Studio
Switch between hardware configurations
SageMaker Experiments: organize, capture, compare, and search ML jobs

SageMaker Debugger

It saves the internal model state at regular intervals during training.

Saves gradients and tensors over time during training
Define rules to detect unwanted conditions. Each rule runs as its own Debugger job
Sends logs to CloudWatch, where events can be created for further actions
SageMaker Debugger Insights dashboard
Auto-generated training reports and built-in rules for monitoring system bottlenecks (CPU, GPU memory), profiling model framework metrics (max initialization time, overall framework metrics, step outliers, system usage), and debugging model parameters
Built-in actions such as StopTraining(), Email(), or SMS(), with SNS integration
Supported frameworks: TensorFlow, PyTorch, MXNet, XGBoost, and the SageMaker generic estimator
Debugger APIs for further integration (constructing hooks and rules) through the SMDebug client library
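
To make the rule-plus-action idea concrete, here is a conceptual sketch in plain Python. It does not use the SMDebug library; the rule logic, thresholds, and function names are assumptions chosen for illustration. It replays saved loss values and triggers an action when the loss stops improving:

```python
def loss_not_decreasing(losses, window=3, min_improvement=0.01):
    """Return the first step at which the rule fires, or None.

    The rule fires at step i when the loss has failed to drop by at least
    `min_improvement` (as a fraction) relative to the loss `window` steps
    earlier.
    """
    for i in range(window, len(losses)):
        previous = losses[i - window]
        if previous <= 0:
            continue  # relative improvement is undefined here; skip the step
        improvement = (previous - losses[i]) / previous
        if improvement < min_improvement:
            return i
    return None


def run_training(losses, rule, on_trigger):
    """Replay saved losses step by step and call the action if the rule fires."""
    seen = []
    for step, loss in enumerate(losses):
        seen.append(loss)
        if rule(seen) is not None:
            on_trigger(step)  # stand-in for a StopTraining-style action
            return step
    return None


saved_losses = [1.00, 0.80, 0.65, 0.64, 0.64, 0.64, 0.64]
events = []
stopped_at = run_training(saved_losses, loss_not_decreasing, events.append)
print(stopped_at, events)
```

In the real service the saved tensors live in S3 and the rule runs as a separate job, but the shape is the same: saved state in, a condition evaluated per step, an action out.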

SageMaker Autopilot

Automates model selection, data processing, model tuning, and infrastructure selection.

Load data to S3 -> select the target column for prediction -> Autopilot creates a model leaderboard with model recommendations -> pick a model -> Autopilot creates the model in a notebook, where it can be tweaked
Problem types: binary classification, multiclass classification, and regression
Algorithm types: linear regression, XGBoost, deep learning (MLP)
Data files must be tabular CSV
Integrates with SageMaker Clarify to identify bias and explain how a model arrives at a result (each feature is assigned an importance value for a prediction)
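
The first two steps of that flow (data in S3, target column chosen) map onto a single job request. A minimal sketch, assuming the request layout of boto3's create_auto_ml_job; the bucket, column, job, and role names are hypothetical, and the helper only builds the dictionary without calling AWS:

```python
def build_autopilot_request(job_name, train_s3_uri, target_column,
                            output_s3_uri, role_arn,
                            problem_type="BinaryClassification", metric="F1"):
    """Build keyword arguments for an Autopilot job over tabular CSV in S3."""
    allowed = {"BinaryClassification", "MulticlassClassification", "Regression"}
    if problem_type not in allowed:
        raise ValueError(f"unsupported problem type: {problem_type}")
    return {
        "AutoMLJobName": job_name,
        "InputDataConfig": [{
            "DataSource": {
                "S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": train_s3_uri},
            },
            # The column Autopilot should learn to predict.
            "TargetAttributeName": target_column,
        }],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "ProblemType": problem_type,
        # The objective metric accompanies an explicit problem type.
        "AutoMLJobObjective": {"MetricName": metric},
        "RoleArn": role_arn,
    }


request = build_autopilot_request(
    job_name="churn-autopilot-demo",                  # hypothetical
    train_s3_uri="s3://example-bucket/churn/train/",  # hypothetical
    target_column="churned",                          # hypothetical
    output_s3_uri="s3://example-bucket/churn/output/",
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
)
print(request["ProblemType"], request["InputDataConfig"][0]["TargetAttributeName"])
```

The leaderboard and the generated notebooks appear once the job has run.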

SageMaker Model Monitor

Get alerts on quality deviations on deployed models via CloudWatch
Visualize data quality drift based on Normalized Discounted Cumulative Gain (NDCG), and bias drift
Detect anomalies and outliers
Detect new features arriving in new data
Create monitoring jobs via a Monitoring Schedule
Integrate with TensorBoard, QuickSight, Tableau
Integrate with Ground Truth
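
The checks above boil down to comparing live data against a baseline taken from the training data. A conceptual sketch in plain Python, not the Model Monitor implementation; the min/max baseline, the violation labels, and the sample records are assumptions for illustration:

```python
def build_baseline(records):
    """Record, per feature, the minimum and maximum seen in training data."""
    baseline = {}
    for record in records:
        for name, value in record.items():
            low, high = baseline.get(name, (value, value))
            baseline[name] = (min(low, value), max(high, value))
    return baseline


def check_record(record, baseline):
    """Return (feature, violation) pairs for one incoming record."""
    violations = []
    for name, value in record.items():
        if name not in baseline:
            violations.append((name, "new_feature"))
            continue
        low, high = baseline[name]
        if not low <= value <= high:
            violations.append((name, "out_of_range"))
    for name in baseline:
        if name not in record:
            violations.append((name, "missing_feature"))
    return violations


training = [{"age": 25, "income": 40000}, {"age": 60, "income": 90000}]
baseline = build_baseline(training)
incoming = {"age": 130, "income": 50000, "device": 1}
violations = check_record(incoming, baseline)
print(violations)
```

A scheduled monitoring job would run checks like these over captured endpoint traffic and publish the violations as metrics that CloudWatch can alert on.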

SageMaker JumpStart: select from over 150 open-source models

SageMaker Data Wrangler: import, transform, analyze, export

SageMaker Feature Store: find, discover, and share features; supports online and offline modes; features are organized in groups
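
A feature group with both modes enabled can be sketched as one request. This assumes the request layout of boto3's create_feature_group; the group, feature, bucket, and role names are hypothetical, and the helper only builds the dictionary:

```python
def build_feature_group_request(name, feature_types, record_id, event_time,
                                offline_s3_uri, role_arn, online=True):
    """Build keyword arguments describing one feature group.

    feature_types maps feature name to "Integral", "Fractional", or "String".
    """
    for required in (record_id, event_time):
        if required not in feature_types:
            raise ValueError(f"{required} must be one of the defined features")
    return {
        "FeatureGroupName": name,
        "RecordIdentifierFeatureName": record_id,
        "EventTimeFeatureName": event_time,
        "FeatureDefinitions": [
            {"FeatureName": feature, "FeatureType": kind}
            for feature, kind in feature_types.items()
        ],
        # Online store: low-latency lookups of the latest record per id.
        "OnlineStoreConfig": {"EnableOnlineStore": online},
        # Offline store: full history in S3 for training and batch use.
        "OfflineStoreConfig": {"S3StorageConfig": {"S3Uri": offline_s3_uri}},
        "RoleArn": role_arn,
    }


fg_request = build_feature_group_request(
    name="customers-demo",  # hypothetical
    feature_types={"customer_id": "String", "event_time": "Fractional",
                   "total_orders": "Integral"},
    record_id="customer_id",
    event_time="event_time",
    offline_s3_uri="s3://example-bucket/feature-store/",  # hypothetical
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
)
print(len(fg_request["FeatureDefinitions"]))
```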

SageMaker Edge Manager: agent for edge devices, with models optimized by SageMaker Neo. Collects sample data from devices for monitoring, labeling, and retraining

Written by rememberme