SageMaker Capabilities

rememberme
2 min read · Jan 9, 2022


Automatic Model Tuning

SageMaker finds the optimal hyperparameters by launching a “Hyperparameter Tuning Job”, which runs many training jobs over the hyperparameters and ranges you specify.
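As a rough sketch of what "specified hyperparameters and ranges" can look like, the snippet below builds a dictionary in the shape of the `HyperParameterTuningJobConfig` argument of boto3's `create_hyper_parameter_tuning_job`. The hyperparameter names (`eta`, `max_depth`), the metric name, and the job limits are assumptions chosen for illustration, and the dictionary is only built here, never submitted.

```python
def build_tuning_config(max_jobs=20, max_parallel=2):
    """Build a HyperParameterTuningJobConfig-style dict (illustrative values)."""
    return {
        "Strategy": "Bayesian",
        "HyperParameterTuningJobObjective": {
            "Type": "Maximize",
            "MetricName": "validation:auc",  # assumed objective metric
        },
        # Caps how many training jobs the tuning job may launch.
        "ResourceLimits": {
            "MaxNumberOfTrainingJobs": max_jobs,
            "MaxParallelTrainingJobs": max_parallel,
        },
        # Range bounds are passed as strings in this request shape.
        "ParameterRanges": {
            "ContinuousParameterRanges": [
                {"Name": "eta", "MinValue": "0.01", "MaxValue": "0.3",
                 "ScalingType": "Logarithmic"},
            ],
            "IntegerParameterRanges": [
                {"Name": "max_depth", "MinValue": "3", "MaxValue": "8",
                 "ScalingType": "Linear"},
            ],
        },
    }


config = build_tuning_config()
print(config["ResourceLimits"])
```

Only two hyperparameters are tuned and both ranges are narrow, which keeps the number of training jobs needed to cover the search space small.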

Do’s and Don’ts

A. Don’t optimize too many hyperparameters at once

B. Keep ranges as small as possible

C. Use logarithmic scales when possible
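To see why a logarithmic scale helps, here is a small standard-library sketch. It is not how SageMaker's search strategy works internally; it only compares drawing a hypothetical learning rate between 1e-5 and 1e-1 uniformly on a linear scale against drawing it uniformly on a log scale.

```python
import math
import random


def sample_linear(low, high, rng):
    """Draw uniformly between low and high."""
    return rng.uniform(low, high)


def sample_log(low, high, rng):
    """Draw uniformly in log space, then map back to the original scale."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def fraction_below(samples, threshold):
    """Share of samples that fall below the threshold."""
    return sum(s < threshold for s in samples) / len(samples)


rng = random.Random(0)  # fixed seed so the run is repeatable
low, high = 1e-5, 1e-1  # assumed range for a learning rate

linear = [sample_linear(low, high, rng) for _ in range(10_000)]
logarithmic = [sample_log(low, high, rng) for _ in range(10_000)]

print(f"linear scale, share below 1e-3: {fraction_below(linear, 1e-3):.3f}")
print(f"log scale,    share below 1e-3: {fraction_below(logarithmic, 1e-3):.3f}")
```

On a linear scale almost every draw lands in the top decade of the range, so values such as 1e-4 are barely explored. On a log scale each order of magnitude gets a similar share of the trials.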

SageMaker integration with Apache Spark

Apache Spark is extremely good at pre-processing data for model training. SageMaker provides the sagemaker-spark library to connect the two.

How to

Connect a SageMaker notebook to a remote EMR cluster running Spark -> get the pre-processed DataFrames -> call fit on a SageMakerEstimator to train a model -> call transform on the resulting model to make inferences

SageMaker Studio

Visual IDE for machine learning.

Create and share Jupyter notebooks with SageMaker Studio

Switch between hardware configurations

SageMaker Experiments: organize, capture, compare, and search ML jobs

SageMaker Debugger

Saves the internal model state at regular intervals during training

Saves gradients and tensors over time during training

Define rules to detect unwanted conditions. Each rule runs as its own debugger job

Sends logs to CloudWatch, where events can be created for further actions

SageMaker Debugger Insights dashboard: auto-generated training reports and built-in rules for monitoring system bottlenecks (CPU, GPU memory), profiling framework metrics (max initialization time, overall framework metrics, step outliers, system usage), and debugging model parameters

Built-in actions like StopTraining(), Email(), or SMS(). Integration with SNS

Supported frameworks: TensorFlow, PyTorch, MXNet, XGBoost, SageMaker generic estimator

Debugger APIs for further integration (constructing hooks and rules) through the SMDebug client library
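The idea of a rule that detects an unwanted condition can be sketched in plain Python. This is a conceptual illustration, not the SMDebug API: the function name, the `patience` and `min_delta` parameters, and the sample loss values are all assumptions. It scans loss values saved at regular intervals and fires when the loss stops improving.

```python
def loss_not_decreasing(losses, patience=3, min_delta=1e-4):
    """Return the step index at which the rule fires, or None.

    The rule fires once the loss has failed to beat its best value
    by at least `min_delta` for `patience` consecutive saved steps.
    """
    best = float("inf")
    stale = 0
    for step, loss in enumerate(losses):
        if loss < best - min_delta:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return step
    return None


# Hypothetical loss values captured at each saved step.
healthy = [1.0, 0.8, 0.6, 0.5, 0.4]
stalled = [1.0, 0.8, 0.8, 0.81, 0.8, 0.79]

print(loss_not_decreasing(healthy))
print(loss_not_decreasing(stalled))
```

In a real training job, the point where such a rule fires is where an action like StopTraining() would be triggered, so a stalled run does not keep consuming instances.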

SageMaker Autopilot

Automates model selection, data processing, model tuning, and infrastructure selection.

Load data to S3 -> select the target column for prediction -> Autopilot creates a model leaderboard with model recommendations -> pick a model -> Autopilot generates a notebook for the model, where it can be tweaked

Problem types: binary or multiclass classification, and regression

Algorithm types: linear regression, XGBoost, deep learning (MLP)

Data files must be tabular CSV

Integrates with SageMaker Clarify to identify biases and explain how the model arrives at a result (each feature is assigned an importance value for a prediction)
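One common way to assign each feature an importance value for a single prediction is with Shapley values. The sketch below computes them exactly for a tiny model. It is a conceptual illustration under stated assumptions, not Clarify's implementation: the toy model, its feature names, and the all-zero baseline are made up, and the exact method used here is only feasible for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one prediction.

    Features outside a coalition are replaced by their baseline value.
    The cost grows as 2^n, so this only suits a few features.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                phi[i] += weight * (value(members | {i}) - value(members))
    return phi


def toy_model(features):
    """Hypothetical linear scoring model over (age, income, tenure)."""
    age, income, tenure = features
    return 0.5 * age + 2.0 * income - 1.0 * tenure


x = [4.0, 3.0, 2.0]         # the record being explained
baseline = [0.0, 0.0, 0.0]  # assumed reference record
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

The attributions add up to the difference between the prediction for `x` and the prediction for the baseline, which is what makes them readable as per-feature contributions.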

SageMaker Model Monitor

Get alerts on quality deviations on deployed models via CloudWatch

Visualize data quality drift, based on Normalized Discounted Cumulative Gain (NDCG), and bias drift

Detect anomalies and outliers

Detect new features arriving in new data

Create monitoring jobs via a Monitoring Schedule

Integrate with TensorBoard, QuickSight, Tableau

Integrate with Ground Truth
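The checks listed above (drift, outliers, new features) all come down to comparing live data against a baseline taken from the training data. The following is a toy illustration of that idea using only the standard library, not Model Monitor's actual algorithm. The feature names, the sample rows, and the `z_threshold` parameter are assumptions.

```python
from statistics import mean, stdev


def build_baseline(rows):
    """Summarize training data as per-feature mean and standard deviation."""
    baseline = {}
    for feature in rows[0]:
        values = [row[feature] for row in rows]
        baseline[feature] = {"mean": mean(values), "std": stdev(values)}
    return baseline


def check_batch(baseline, rows, z_threshold=3.0):
    """Return a list of (kind, feature) violations for a batch of live rows."""
    violations = []
    known = set(baseline)
    seen = set()
    for row in rows:
        seen.update(row)
    # Features that were never part of the training data.
    for feature in sorted(seen - known):
        violations.append(("new_feature", feature))
    # Features whose batch mean moved far from the baseline mean.
    for feature, stats in baseline.items():
        values = [row[feature] for row in rows if feature in row]
        if not values or stats["std"] == 0:
            continue
        shift = abs(mean(values) - stats["mean"]) / stats["std"]
        if shift > z_threshold:
            violations.append(("drift", feature))
    return violations


training = [
    {"amount": 10, "age": 30}, {"amount": 12, "age": 35},
    {"amount": 11, "age": 40}, {"amount": 9, "age": 45},
    {"amount": 13, "age": 50},
]
live = [
    {"amount": 30, "age": 38, "channel_id": 1},
    {"amount": 32, "age": 41, "channel_id": 1},
    {"amount": 31, "age": 42, "channel_id": 1},
]

baseline = build_baseline(training)
violations = check_batch(baseline, live)
print(violations)
```

A scheduled monitoring job would run a comparison of this kind on each new window of captured data and raise a CloudWatch alert when violations appear.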

SageMaker JumpStart: select from over 150 open-source models

SageMaker Data Wrangler: import, transform, analyze, export

SageMaker Feature Store: find, discover, and share features; online and offline modes; features are organized in groups

SageMaker Edge Manager: agent for edge devices, optimized with SageMaker Neo. Collects sample data from devices for monitoring, labeling, and retraining
