Scale SageMaker Workloads with Application Auto Scaling

Scale a SageMaker endpoint with the ACK Application Auto Scaling service controller

The Application Auto Scaling ACK service controller makes it easier for developers to automatically scale resources for individual AWS services. Application Auto Scaling allows you to configure automatic scaling for resources such as Amazon SageMaker endpoint variants.

In this tutorial, we will use the Application Auto Scaling ACK service controller in conjunction with the SageMaker ACK service controller to automatically scale a deployed machine learning model.


Although it is not necessary to use Amazon Elastic Kubernetes Service (Amazon EKS) with ACK, this guide assumes that you have access to an Amazon EKS cluster. If this is your first time creating an Amazon EKS cluster, see Amazon EKS Setup. For automated cluster creation using eksctl, see Getting started with Amazon EKS - eksctl and create your cluster with Amazon EC2 Linux managed nodes.

This guide also assumes that you have a trained machine learning model that you are ready to dynamically scale with the Application Auto Scaling ACK service controller. To train a machine learning model using the SageMaker ACK service controller, see Machine Learning with the ACK Service Controller and return to this guide when you have successfully completed a SageMaker training job.


This guide assumes that you have:

  • Created an EKS cluster with Kubernetes version 1.16 or higher.
  • AWS IAM permissions to create roles and attach policies to roles.
  • A trained machine learning model that you want to scale dynamically.
  • Installed the following tools on the client machine used to access your Kubernetes cluster:
    • AWS CLI - A command line tool for interacting with AWS services.
    • kubectl - A command line tool for working with Kubernetes clusters.
    • eksctl - A command line tool for working with EKS clusters.
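    • yq - A command line tool for YAML processing. (For Linux environments, use the wget plain binary installation.)
    • jq - A command line tool for JSON processing, used when looking up the controller release version.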
    • Helm 3.7+ - A tool for installing and managing Kubernetes applications.
    • curl - A command line tool for transmitting data with URLs.
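
Before continuing, it can help to confirm that the tools are actually on your PATH. The following is a minimal sketch, not part of the official setup: missing_tools is a hypothetical helper name, and the check only confirms that each command exists, not that its version is recent enough. It also checks for jq, because the release-version command later in this guide pipes its output through jq.

```shell
# Print the name of every listed tool that cannot be found on PATH.
# missing_tools is a hypothetical helper, not part of any AWS tooling.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Tools named in the prerequisites list, plus jq.
missing="$(missing_tools aws kubectl eksctl yq helm curl jq)"
if [ -n "$missing" ]; then
  printf 'Missing tools:\n%s\n' "$missing" >&2
else
  echo "All prerequisite tools were found."
fi
```

If the script lists any tool, install it before moving on to the IAM configuration.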

Configure IAM permissions

Create an IAM role and attach an IAM policy to that role to ensure that your Application Auto Scaling service controller has access to the appropriate AWS resources. First, check to make sure that you are connected to an Amazon EKS cluster.

aws eks update-kubeconfig --name $CLUSTER_NAME --region $SERVICE_REGION
kubectl config current-context
kubectl get nodes
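
The commands in this guide read CLUSTER_NAME and SERVICE_REGION from your shell environment but never set them. As a rough sketch (require_vars is a hypothetical helper name), you could check that both are defined before running anything that talks to AWS:

```shell
# Report each named variable that is unset or empty.
# Returns non-zero if at least one variable is missing.
# Only pass trusted, literal variable names: the lookup uses eval.
require_vars() {
  local name value status=0
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "error: $name is not set" >&2
      status=1
    fi
  done
  return "$status"
}

if require_vars CLUSTER_NAME SERVICE_REGION; then
  echo "Cluster name and region are set."
else
  echo "Export the variables listed above, then retry." >&2
fi
```

Export the variables rather than only assigning them, so that scripts you run later in the guide can read them too.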

Before you can deploy your ACK service controllers using an IAM role, associate an IAM OpenID Connect (OIDC) provider with your cluster so that Kubernetes service accounts in the cluster can authenticate with the IAM service.

eksctl utils associate-iam-oidc-provider --cluster ${CLUSTER_NAME} \
--region ${SERVICE_REGION} --approve

Get the following OIDC information for future reference:

export AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query "Account" --output text)
export OIDC_PROVIDER_URL=$(aws eks describe-cluster --name $CLUSTER_NAME --region $SERVICE_REGION \
--query "cluster.identity.oidc.issuer" --output text | cut -c9-)
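
The cut -c9- step above removes the leading https:// from the issuer URL, because the IAM OIDC provider ARN uses the URL without its scheme. The sketch below illustrates this with a made-up issuer value (the ID is a placeholder, not a real cluster) and shows an equivalent parameter expansion that removes the prefix by name rather than by character position.

```shell
# Hypothetical issuer URL in the general shape that describe-cluster returns.
issuer="https://oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE0123456789ABCDEF"

# cut -c9- keeps everything from the ninth character onward,
# which drops the eight characters of "https://".
with_cut="$(printf '%s\n' "$issuer" | cut -c9-)"

# Parameter expansion strips the literal prefix instead of counting characters.
with_expansion="${issuer#https://}"

echo "$with_cut"
echo "$with_expansion"
```

Both lines print the same scheme-less value. The expansion form is a little more forgiving, since it leaves the string untouched if the prefix is ever absent.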

In your working directory, create a file named trust.json using the following trust relationship code block:

printf '{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::'$AWS_ACCOUNT_ID':oidc-provider/'$OIDC_PROVIDER_URL'"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "'$OIDC_PROVIDER_URL':aud": "",
          "'$OIDC_PROVIDER_URL':sub": [
          ]
        }
      }
    }
  ]
}
' > ./trust.json

Updating an Application Auto Scaling Scalable Target requires additional permissions. First, create a service-linked role for Application Auto Scaling.

 aws iam create-service-linked-role --aws-service-name

Create a file named pass_role_policy.json to create the policy required for the IAM role.

printf '{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "iam:PassRole",
      "Resource": "arn:aws:iam::'$AWS_ACCOUNT_ID':role/aws-service-role/"
    }
  ]
}
' > ./pass_role_policy.json

Run the iam create-role command to create an IAM role with the trust relationship you just defined in trust.json. This IAM role enables the Amazon EKS cluster to get and refresh credentials from IAM.

export OIDC_ROLE_NAME=ack-controller-role-$CLUSTER_NAME
aws --region $SERVICE_REGION iam create-role --role-name $OIDC_ROLE_NAME --assume-role-policy-document file://trust.json

Attach the AmazonSageMakerFullAccess policy to the IAM role to ensure that your SageMaker service controller has access to the appropriate resources.

aws --region $SERVICE_REGION iam attach-role-policy --role-name $OIDC_ROLE_NAME --policy-arn arn:aws:iam::aws:policy/AmazonSageMakerFullAccess

Attach the iam:PassRole policy required for updating an Application Auto Scaling Scalable Target.

aws iam put-role-policy --role-name $OIDC_ROLE_NAME --policy-name "iam-pass-role-policy" --policy-document file://pass_role_policy.json

Get the following IAM role information for future reference:

export IAM_ROLE_ARN_FOR_IRSA=$(aws --region $SERVICE_REGION iam get-role --role-name $OIDC_ROLE_NAME --output text --query 'Role.Arn')

For more information on authorization and access for ACK service controllers, including details regarding recommended IAM policies, see Configure Permissions.

Install the Application Auto Scaling ACK service controller

Get the Application Auto Scaling Helm chart and make it available on the client machine with the following commands:

export SERVICE=applicationautoscaling
export RELEASE_VERSION=$(curl -sL${SERVICE}-controller/releases/latest | jq -r '.tag_name | ltrimstr("v")')

if [[ -z "$RELEASE_VERSION" ]]; then

export CHART_EXPORT_PATH=/tmp/chart
export CHART_REF=$SERVICE-chart


helm pull oci://$CHART_REPO --version $RELEASE_VERSION -d $CHART_EXPORT_PATH

Update the Helm chart values for a cluster-scoped installation.

# Update the following values in the Helm chart
yq e '.aws.region = env(SERVICE_REGION)' -i values.yaml
yq e '.serviceAccount.annotations."" = env(IAM_ROLE_ARN_FOR_IRSA)' -i values.yaml
cd -

Install the relevant custom resource definitions (CRDs) for the Application Auto Scaling ACK service controller.

kubectl apply -f $CHART_EXPORT_PATH/$SERVICE-chart/crds

Create a namespace and install the Application Auto Scaling ACK service controller with the Helm chart.

export ACK_K8S_NAMESPACE=ack-system
helm install -n $ACK_K8S_NAMESPACE --create-namespace --skip-crds ack-$SERVICE-controller \

Verify that the CRDs and Helm charts were deployed with the following commands:

kubectl get pods -A | grep applicationautoscaling
kubectl get crd | grep applicationautoscaling

To scale a SageMaker endpoint variant with the Application Auto Scaling ACK service controller, you will also need the SageMaker ACK service controller. For step-by-step installation instructions, see Install the SageMaker ACK Service Controller.

Prepare your pretrained model

To scale a SageMaker endpoint with Application Auto Scaling, we first need a pretrained model in an S3 bucket. For this example, we will be using a pretrained XGBoost model.

First, create a variable for the S3 bucket:

export ACCOUNT_ID=$(aws sts get-caller-identity --query "Account" --output text)
export SAGEMAKER_BUCKET=ack-sagemaker-bucket-$ACCOUNT_ID

Then, create a file named with the following code block:

printf '
#!/usr/bin/env bash
# Create the S3 bucket
if [[ $SERVICE_REGION != "us-east-1" ]]; then
  aws s3api create-bucket --bucket "$SAGEMAKER_BUCKET" --region "$SERVICE_REGION" --create-bucket-configuration LocationConstraint="$SERVICE_REGION"
else
  aws s3api create-bucket --bucket "$SAGEMAKER_BUCKET" --region "$SERVICE_REGION"
fi' > ./
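
The region check in the script exists because us-east-1 is the default S3 location and does not accept an explicit LocationConstraint, while buckets in other regions need one. The sketch below isolates that decision in a hypothetical helper, create_bucket_args, which only prints the arguments and does not call AWS. The bucket name uses a placeholder account ID.

```shell
# Print the arguments for `aws s3api create-bucket` for a bucket and region.
# create_bucket_args is a hypothetical helper for illustration only.
create_bucket_args() {
  local bucket="$1" region="$2"
  if [ "$region" = "us-east-1" ]; then
    # Default location: no bucket configuration is passed.
    printf '%s\n' "--bucket $bucket --region $region"
  else
    # Any other region: the constraint must name the same region.
    printf '%s\n' "--bucket $bucket --region $region --create-bucket-configuration LocationConstraint=$region"
  fi
}

create_bucket_args ack-sagemaker-bucket-111122223333 us-east-1
create_bucket_args ack-sagemaker-bucket-111122223333 us-west-2
```

Because the script body is written inside single quotes, SERVICE_REGION and SAGEMAKER_BUCKET are read when the script runs, so both need to be exported in the shell that runs it.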

Run the script to create an S3 bucket.

chmod +x

Get the pretrained model and copy it into your S3 bucket.

aws s3 cp xgb-churn-prediction-model.tar.gz s3://$SAGEMAKER_BUCKET

Configure permissions for your SageMaker endpoint

The SageMaker endpoint that we deploy will need an IAM role to access Amazon S3 and Amazon SageMaker. Run the following commands to create a SageMaker execution IAM role that will be used by SageMaker to access the appropriate AWS resources:

export SAGEMAKER_EXECUTION_ROLE_NAME=ack-sagemaker-execution-role-$ACCOUNT_ID

TRUST="{ \"Version\": \"2012-10-17\", \"Statement\": [ { \"Effect\": \"Allow\", \"Principal\": { \"Service\": \"\" }, \"Action\": \"sts:AssumeRole\" } ] }"
aws iam create-role --role-name ${SAGEMAKER_EXECUTION_ROLE_NAME} --assume-role-policy-document "$TRUST"
aws iam attach-role-policy --role-name ${SAGEMAKER_EXECUTION_ROLE_NAME} --policy-arn arn:aws:iam::aws:policy/AmazonSageMakerFullAccess
aws iam attach-role-policy --role-name ${SAGEMAKER_EXECUTION_ROLE_NAME} --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess

SAGEMAKER_EXECUTION_ROLE_ARN=$(aws iam get-role --role-name ${SAGEMAKER_EXECUTION_ROLE_NAME} --output text --query 'Role.Arn')


Deploy a SageMaker endpoint

Use the SageMaker ACK service controller to create a model, an endpoint configuration, and an endpoint.

# Random suffix that keeps the resource names unique
export RANDOM_VAR=$RANDOM
export MODEL_NAME=ack-xgboost-model-$RANDOM_VAR
export ENDPOINT_CONFIG_NAME=ack-xgboost-endpoint-config-$RANDOM_VAR
export ENDPOINT_NAME=ack-xgboost-endpoint-$RANDOM_VAR
Change XGBoost image URI based on region
IMPORTANT: If your SERVICE_REGION is not us-east-1, you must change the XGBOOST_IMAGE URI. To find your region-specific XGBoost image URI, choose your region in the SageMaker Docker Registry Paths page, and then select XGBoost (algorithm). For this example, use version 1.2-1.

Use the following deploy.yaml file to deploy the model on an ml.m5.large instance. To use your own model, change the modelDataURL value.

printf '
kind: Model
  name: '$MODEL_NAME'
  modelName: '$MODEL_NAME'
    containerHostname: xgboost
    # The source of the model data
    modelDataURL: s3://'$SAGEMAKER_BUCKET'/xgb-churn-prediction-model.tar.gz
    image: '$XGBOOST_IMAGE'
kind: EndpointConfig
  endpointConfigName: '$ENDPOINT_CONFIG_NAME'
  - modelName: '$MODEL_NAME'
    variantName: AllTraffic
    instanceType: ml.m5.large
    initialInstanceCount: 1
kind: Endpoint
  name: '$ENDPOINT_NAME'
  endpointName: '$ENDPOINT_NAME'
  endpointConfigName: '$ENDPOINT_CONFIG_NAME'
' > ./deploy.yaml

Deploy the endpoint by applying the deploy.yaml file.

kubectl apply -f deploy.yaml

After applying the deploy.yaml file, the output should show one "created" line each for the model, the endpoint configuration, and the endpoint.

Watch the process with the kubectl get command. Deploying the endpoint may take some time.

kubectl get endpoints.sagemaker --watch

The endpoint status will be InService when the endpoint is successfully deployed and ready for use.

NAME                        ENDPOINTSTATUS
ack-xgboost-endpoint-7420   Creating         
ack-xgboost-endpoint-7420   InService    
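
If you would rather script the wait than watch the output, a polling loop is one option. The following is a sketch under stated assumptions: wait_for_status is a hypothetical helper, and it takes the status command as arguments so that it does not hard-code any kubectl query. It stops early if the command reports Failed.

```shell
# Poll a status command until it prints the wanted value.
# Usage: wait_for_status WANT ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Returns 0 when the wanted status is seen, 1 on "Failed" or when attempts run out.
wait_for_status() {
  local want="$1" attempts="$2" delay="$3" status i=1
  shift 3
  while [ "$i" -le "$attempts" ]; do
    # A failing status command is treated as "no status yet".
    status="$("$@" 2>/dev/null || true)"
    echo "attempt $i: ${status:-unknown}"
    if [ "$status" = "$want" ]; then
      return 0
    fi
    if [ "$status" = "Failed" ]; then
      return 1
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}
```

A possible invocation is wait_for_status InService 60 30 kubectl get endpoints.sagemaker "$ENDPOINT_NAME" -o jsonpath='{.status.endpointStatus}'. The jsonpath field name is an assumption inferred from the ENDPOINTSTATUS column shown above, so confirm it against the output of kubectl describe before relying on it.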

Automatically scale your SageMaker endpoint

Scale your SageMaker endpoint using the Application Auto Scaling ScalableTarget and ScalingPolicy resources.

Create a scalable target

Create a scalable target with the scalable-target.yaml file. The following file designates that a specified SageMaker endpoint variant can automatically scale to up to three instances.

printf '
kind: ScalableTarget
  name: ack-tutorial-endpoint-scalable-target
  maxCapacity: 3
  minCapacity: 1
  resourceID: endpoint/'$ENDPOINT_NAME'/variant/AllTraffic
  scalableDimension: "sagemaker:variant:DesiredInstanceCount"
  serviceNamespace: sagemaker
 ' > ./scalable-target.yaml

Apply your scalable-target.yaml file:

kubectl apply -f scalable-target.yaml

After applying your scalable target, the output should report that the ScalableTarget resource was created.

You can verify that the ScalableTarget was created with the kubectl describe command.

kubectl describe scalabletarget.applicationautoscaling
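
The resourceID in scalable-target.yaml follows the pattern endpoint/<endpoint-name>/variant/<variant-name>, where the variant name must match the variantName in the endpoint configuration (AllTraffic in this guide). As a small sketch, a hypothetical helper can build the string in one place so that every file that refers to the same variant uses an identical value:

```shell
# Build the Application Auto Scaling resource ID for a SageMaker endpoint variant.
# variant_resource_id is a hypothetical helper; the variant defaults to AllTraffic,
# the name used in this guide's endpoint configuration.
variant_resource_id() {
  local endpoint="$1" variant="${2:-AllTraffic}"
  printf 'endpoint/%s/variant/%s\n' "$endpoint" "$variant"
}

# Example with the endpoint name from the sample output above.
variant_resource_id ack-xgboost-endpoint-7420
# prints endpoint/ack-xgboost-endpoint-7420/variant/AllTraffic
```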

Create a scaling policy

Create a scaling policy with the scaling-policy.yaml file. The following file creates a target tracking scaling policy that scales a specified SageMaker endpoint based on the number of variant invocations per instance. The scaling policy adds or removes capacity as required to keep this number close to the target value of 60.

printf '
kind: ScalingPolicy
  name: ack-tutorial-endpoint-scaling-policy
  policyName: ack-tutorial-endpoint-scaling-policy
  policyType: TargetTrackingScaling
  resourceID: endpoint/'$ENDPOINT_NAME'/variant/AllTraffic
  scalableDimension: "sagemaker:variant:DesiredInstanceCount"
  serviceNamespace: sagemaker
    targetValue: 60
    scaleInCooldown: 700
    scaleOutCooldown: 300
        predefinedMetricType: SageMakerVariantInvocationsPerInstance
 ' > ./scaling-policy.yaml

Apply your scaling-policy.yaml file:

kubectl apply -f scaling-policy.yaml

After applying your scaling policy, the output should report that the ScalingPolicy resource was created.

You can verify that the ScalingPolicy was created with the kubectl describe command.

kubectl describe scalingpolicy.applicationautoscaling
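
To get a feel for what a target value of 60 means, the arithmetic can be approximated by hand: divide the total invocations per minute by the target, round up, and clamp the result to the scalable target's minCapacity and maxCapacity. The sketch below does exactly that with a hypothetical estimate_instances helper. It is a simplified model for intuition only; the actual scaling decisions are made by Application Auto Scaling from CloudWatch metrics and are also shaped by the cooldown periods.

```shell
# Estimate how many instances keep per-instance invocations at or below the target.
# Usage: estimate_instances TOTAL_INVOCATIONS_PER_MINUTE TARGET MIN MAX
# estimate_instances is a hypothetical helper, not the service's real algorithm.
estimate_instances() {
  local total="$1" target="$2" min="$3" max="$4" wanted
  # Integer ceiling of total / target.
  wanted=$(( (total + target - 1) / target ))
  if [ "$wanted" -lt "$min" ]; then
    wanted="$min"
  fi
  if [ "$wanted" -gt "$max" ]; then
    wanted="$max"
  fi
  echo "$wanted"
}

estimate_instances 30 60 1 3    # prints 1
estimate_instances 100 60 1 3   # prints 2
estimate_instances 250 60 1 3   # prints 3 (capped at maxCapacity)
```

With the values in this guide, sustained traffic above roughly 120 invocations per minute would already call for the maximum of three instances.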

Next steps

To learn more about Application Auto Scaling on a SageMaker endpoint, see the Application Auto Scaling controller samples repository.


To update the ScalableTarget and ScalingPolicy parameters after the resources are created, make any changes to the scalable-target.yaml or scaling-policy.yaml files and reapply them with kubectl apply.

kubectl apply -f scalable-target.yaml
kubectl apply -f scaling-policy.yaml


You can delete your models, endpoint configurations, endpoints, scalable targets, and scaling policies with the kubectl delete command.

kubectl delete -f deploy.yaml
kubectl delete -f scalable-target.yaml
kubectl delete -f scaling-policy.yaml

To remove the SageMaker and Application Auto Scaling ACK service controllers, related CRDs, and namespaces, see ACK Cleanup.

It is recommended to delete any additional resources such as S3 buckets, IAM roles, and IAM policies when you no longer need them. You can delete these resources with the following commands or directly in the AWS console.

# Delete S3 bucket
aws s3 rb s3://$SAGEMAKER_BUCKET --force

# Delete SageMaker execution role
aws iam detach-role-policy --role-name $SAGEMAKER_EXECUTION_ROLE_NAME --policy-arn arn:aws:iam::aws:policy/AmazonSageMakerFullAccess
aws iam detach-role-policy --role-name $SAGEMAKER_EXECUTION_ROLE_NAME --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess
aws iam delete-role --role-name $SAGEMAKER_EXECUTION_ROLE_NAME

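# Delete IAM role created for IRSA
aws iam detach-role-policy --role-name $OIDC_ROLE_NAME --policy-arn arn:aws:iam::aws:policy/AmazonSageMakerFullAccess
# Inline policies must be removed before the role can be deleted
aws iam delete-role-policy --role-name $OIDC_ROLE_NAME --policy-name "iam-pass-role-policy"
aws iam delete-role --role-name $OIDC_ROLE_NAME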

To delete your EKS clusters, see Amazon EKS - Deleting a cluster.
