SageMaker Capabilities
Automatic Model Tuning
SageMaker finds optimal hyperparameters by launching a hyperparameter tuning job. The tuning job runs many training jobs, each using a different combination of values drawn from the hyperparameters and ranges you specify.
Do’s and Don’ts
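A. Don’t optimize too many hyperparameters at once (every extra hyperparameter enlarges the search space)
B. Keep ranges as small as possible
C. Use logarithmic scales when a hyperparameter's values span several orders of magnitude (e.g. learning rate)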
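Tips B and C can be illustrated with a small, library-free sketch: random search over a learning-rate range sampled on a logarithmic scale, plus a comparison showing how linear sampling wastes most trials at the top of a wide range. The objective `toy_validation_loss` is a hypothetical stand-in for a real training job. In the SageMaker Python SDK the same idea is expressed by passing `scaling_type="Logarithmic"` to a `ContinuousParameter` range; the code below only models the concept.

```python
import math
import random
from typing import List, Optional, Tuple


def sample_log_uniform(low: float, high: float, rng: random.Random) -> float:
    """Draw a value whose logarithm is uniform between log(low) and log(high)."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def fraction_below(threshold: float, samples: List[float]) -> float:
    """Share of samples that fall below the threshold."""
    return sum(s < threshold for s in samples) / len(samples)


def toy_validation_loss(learning_rate: float) -> float:
    """Hypothetical objective standing in for a training job; best near 1e-3."""
    return (math.log10(learning_rate) + 3.0) ** 2


def random_search(low: float, high: float, trials: int, seed: int = 0) -> Tuple[float, float]:
    """Run `trials` simulated training jobs on a log scale; return (best_lr, best_loss)."""
    rng = random.Random(seed)
    best: Optional[Tuple[float, float]] = None
    for _ in range(trials):
        lr = sample_log_uniform(low, high, rng)
        loss = toy_validation_loss(lr)
        if best is None or loss < best[1]:
            best = (lr, loss)
    if best is None:
        raise ValueError("trials must be >= 1")
    return best


if __name__ == "__main__":
    rng = random.Random(42)
    n = 10_000
    linear = [rng.uniform(1e-5, 1e-1) for _ in range(n)]
    logged = [sample_log_uniform(1e-5, 1e-1, rng) for _ in range(n)]
    # Linear sampling puts ~90% of trials above 1e-2; log sampling only ~25%.
    print("linear, share below 1e-2:", fraction_below(1e-2, linear))
    print("log,    share below 1e-2:", fraction_below(1e-2, logged))
    print("best (lr, loss):", random_search(1e-5, 1e-1, trials=50))
```

With a range spanning four orders of magnitude, log sampling gives each decade an equal share of trials, which is why it tends to find a good learning rate with fewer training jobs.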
SageMaker integration with Apache Spark
Apache Spark is very well suited to pre-processing data for model training. SageMaker provides the sagemaker-spark library for using SageMaker from Spark.
How to
Connect a SageMaker notebook to a remote EMR cluster running Spark -> pre-process the data into Spark DataFrames -> call fit() on a SageMakerEstimator to train a model -> call transform() on the resulting SageMakerModel to make inferences
SageMaker Studio
Visual IDE for machine learning.
Create and share Jupyter notebooks with SageMaker Studio
Switch between hardware configurations
SageMaker Experiments: organize, capture, compare, and search ML jobs
SageMaker Debugger
Saves the internal model state at regular intervals during training
Captures gradients and tensors as they evolve over the course of training
Define rules to detect unwanted training conditions; each rule creates its own Debugger job
Sends logs to CloudWatch, where events can trigger further actions
SageMaker Debugger Insights dashboard
Auto-generated training reports and built-in rules, e.g. for monitoring system bottlenecks (CPU, GPU, memory), profiling framework metrics (Max Initialization Time, Overall Framework Metrics, Step Outlier), overall system usage, and debugging model parameters
Built-in actions such as StopTraining(), Email(), and SMS(); email and SMS notifications are delivered through Amazon SNS
Supported frameworks: TensorFlow, PyTorch, MXNet, XGBoost, and the SageMaker generic estimator
Debugger APIs for further integration (constructing hooks and rules) through the SMDebug client library
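The save-then-evaluate loop behind a Debugger rule can be sketched without any framework: a hook saves gradient norms every N steps, a rule scans those snapshots for an unwanted condition (here, vanishing gradients), and an action fires when it does. All names (`Snapshot`, `save_every`, `vanishing_gradient_rule`, `run_rule`), the threshold, and the toy gradients are hypothetical; the real mechanism uses SMDebug hooks and built-in rule configurations, not this code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Snapshot:
    """Hypothetical saved state: a training step and its per-layer gradient norms."""
    step: int
    grad_norms: Dict[str, float]


def save_every(interval: int, total_steps: int,
               grad_fn: Callable[[int], Dict[str, float]]) -> List[Snapshot]:
    """Mimic a hook that saves gradient norms every `interval` steps."""
    return [Snapshot(step, grad_fn(step)) for step in range(0, total_steps, interval)]


def vanishing_gradient_rule(snapshots: List[Snapshot], threshold: float = 1e-7) -> Optional[int]:
    """Return the first saved step where all gradient norms are below `threshold`."""
    for snap in snapshots:
        if all(norm < threshold for norm in snap.grad_norms.values()):
            return snap.step
    return None


def run_rule(snapshots: List[Snapshot],
             rule: Callable[[List[Snapshot]], Optional[int]],
             action: Callable[[int], None]) -> bool:
    """Evaluate a rule and invoke the action (e.g. stop training or notify) if it fires."""
    step = rule(snapshots)
    if step is None:
        return False
    action(step)
    return True


def decaying_gradients(step: int) -> Dict[str, float]:
    """Toy gradients that shrink 10x every 100 steps."""
    return {"dense_1": 10 ** (-step / 100), "dense_2": 10 ** (-step / 100 - 1)}


if __name__ == "__main__":
    snaps = save_every(interval=100, total_steps=1000, grad_fn=decaying_gradients)
    run_rule(snaps,
             lambda s: vanishing_gradient_rule(s, threshold=5e-8),
             lambda step: print(f"vanishing gradients at step {step}: stop training"))
```

Keeping the rule separate from the action mirrors the notes above: the same rule could trigger a stop, an email, or an SMS.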
SageMaker Autopilot
Automates model selection, data pre-processing, model tuning, and infrastructure selection.
Load data to S3 -> select the target column to predict -> Autopilot builds a leaderboard of candidate models with a recommendation -> pick a model -> Autopilot generates a notebook with the model's code, where it can be tweaked
Problem types: binary classification, multiclass classification, and regression
Algorithm types: linear regression, XGBoost, deep learning (MLP)
Data files must be tabular CSV
Integrates with SageMaker Clarify to identify bias and explain how a model arrives at a result (each feature is assigned an importance value for a given prediction)
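Clarify's per-feature importance values are based on Shapley values (SHAP). To show what such a value means, the sketch below computes exact Shapley values by brute-force enumeration for a tiny hypothetical model, treating "absent" features as set to a baseline. Clarify itself uses an approximation (Kernel SHAP) that scales to real models; `toy_model`, the input, and the baseline here are made up.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, List, Set


def shapley_values(model: Callable[[Dict[str, float]], float],
                   x: Dict[str, float],
                   baseline: Dict[str, float]) -> Dict[str, float]:
    """Exact Shapley values: each feature's weighted average marginal contribution
    over all coalitions of the other features (absent features use the baseline)."""
    features: List[str] = list(x)
    n = len(features)

    def value(coalition: Set[str]) -> float:
        inputs = {f: (x[f] if f in coalition else baseline[f]) for f in features}
        return model(inputs)

    phi: Dict[str, float] = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):  # size of the coalition formed from the other features
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


def toy_model(r: Dict[str, float]) -> float:
    """Hypothetical model with an interaction between age and tenure."""
    return 2.0 * r["age"] + 0.5 * r["income"] + r["age"] * r["tenure"]


if __name__ == "__main__":
    x = {"age": 1.0, "income": 4.0, "tenure": 2.0}
    baseline = {"age": 0.0, "income": 0.0, "tenure": 0.0}
    phi = shapley_values(toy_model, x, baseline)
    # Efficiency property: the values sum to model(x) - model(baseline).
    print(phi, sum(phi.values()), toy_model(x) - toy_model(baseline))
```

For this input the interaction term is split equally between age and tenure, so age gets 3.0, income 2.0, and tenure 1.0. Enumeration is exponential in the number of features, which is why an approximation is needed in practice.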
SageMaker Model Monitor
Get alerts via CloudWatch on quality deviations in deployed models
Visualize data quality drift, bias drift, and feature attribution drift (the latter is measured with Normalized Discounted Cumulative Gain, NDCG)
Detect anomalies and outliers
Detect new features appearing in incoming data
Create monitoring jobs with a monitoring schedule
Integrate with TensorBoard, QuickSight, Tableau
Integrate with Ground Truth
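The NDCG score used for feature attribution drift compares how features are ranked by attribution on training (baseline) data versus live data; 1.0 means the ranking is unchanged. A minimal sketch, assuming relevance is taken from the baseline attribution scores and that both inputs cover the same features; the attribution values and the 0.9 alert threshold are hypothetical.

```python
import math
from typing import Dict


def ndcg(baseline_attr: Dict[str, float], live_attr: Dict[str, float]) -> float:
    """NDCG of the live feature ranking, using baseline attributions as relevance.

    DCG orders features by live attribution; the ideal DCG orders them by
    baseline attribution. Position i (0-based) is discounted by log2(i + 2).
    """
    live_order = sorted(live_attr, key=live_attr.get, reverse=True)
    ideal_order = sorted(baseline_attr, key=baseline_attr.get, reverse=True)
    dcg = sum(baseline_attr[f] / math.log2(i + 2) for i, f in enumerate(live_order))
    idcg = sum(baseline_attr[f] / math.log2(i + 2) for i, f in enumerate(ideal_order))
    return dcg / idcg


def attribution_drift_alert(baseline_attr: Dict[str, float],
                            live_attr: Dict[str, float],
                            threshold: float = 0.9) -> bool:
    """Hypothetical check: alert when NDCG drops below the threshold."""
    return ndcg(baseline_attr, live_attr) < threshold


if __name__ == "__main__":
    baseline = {"age": 0.5, "income": 0.3, "tenure": 0.2}
    same_order = {"age": 0.6, "income": 0.25, "tenure": 0.15}
    reversed_order = {"age": 0.1, "income": 0.2, "tenure": 0.7}
    print(ndcg(baseline, same_order), attribution_drift_alert(baseline, same_order))
    print(ndcg(baseline, reversed_order), attribution_drift_alert(baseline, reversed_order))
```

Only the ranking matters: the live scores in `same_order` differ from the baseline but keep the same order, so NDCG stays at 1.0, while reversing the order drops it to about 0.81.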
SageMaker JumpStart: select from over 150 open-source models
SageMaker Data Wrangler: import, transform, analyze, and export data
SageMaker Feature Store: find, discover, and share features; online and offline stores; features are organized into feature groups
SageMaker Edge Manager: agent for edge devices running models optimized with SageMaker Neo; collects sample data from devices for monitoring, labeling, and retraining
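The Feature Store split between online and offline stores can be pictured with a small in-memory sketch: the online store keeps only the latest record per record identifier for low-latency lookups, while the offline store keeps every ingested record for building training sets and point-in-time queries. `FeatureGroupSketch` and its methods are hypothetical and only model the idea; they are not the Feature Store API, although the record identifier and event time names mirror the roles those features play in a real feature group.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FeatureGroupSketch:
    """Hypothetical in-memory model of a feature group with online and offline stores."""
    record_id_name: str
    event_time_name: str
    online: Dict[str, dict] = field(default_factory=dict)
    offline: List[dict] = field(default_factory=list)

    def put_record(self, record: dict) -> None:
        # Offline store: append-only history of every ingested record.
        self.offline.append(dict(record))
        # Online store: this sketch keeps only the newest record (by event time) per id.
        rid = record[self.record_id_name]
        current = self.online.get(rid)
        if current is None or record[self.event_time_name] >= current[self.event_time_name]:
            self.online[rid] = dict(record)

    def get_record(self, rid: str) -> Optional[dict]:
        """Low-latency lookup of the latest features (online store)."""
        return self.online.get(rid)

    def as_of(self, rid: str, event_time: float) -> Optional[dict]:
        """Point-in-time lookup from the history (offline store)."""
        candidates = [r for r in self.offline
                      if r[self.record_id_name] == rid and r[self.event_time_name] <= event_time]
        return max(candidates, key=lambda r: r[self.event_time_name], default=None)


if __name__ == "__main__":
    fg = FeatureGroupSketch(record_id_name="customer_id", event_time_name="event_time")
    fg.put_record({"customer_id": "c1", "event_time": 1.0, "avg_spend": 20.0})
    fg.put_record({"customer_id": "c1", "event_time": 2.0, "avg_spend": 35.0})
    print(fg.get_record("c1"))  # latest record: avg_spend 35.0
    print(fg.as_of("c1", 1.5))  # historical record: avg_spend 20.0
    print(len(fg.offline))      # 2: both records kept offline
```

The point-in-time lookup is what makes the offline side useful for training: it returns the feature values as they were at a given moment, avoiding leakage of later data.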