HyperParameterTuningJob

sagemaker.services.k8s.aws/v1alpha1

GoDoc: sagemaker-controller/apis/v1alpha1#HyperParameterTuningJob

Metadata

Scope: Namespaced
Kind: HyperParameterTuningJob
ListKind: HyperParameterTuningJobList
Plural: hyperparametertuningjobs
Singular: hyperparametertuningjob

Spec

hyperParameterTuningJobConfig: 
  hyperParameterTuningJobObjective: 
    metricName: string
    type_: string
  parameterRanges: 
    categoricalParameterRanges:
    - name: string
      values:
      - string
    continuousParameterRanges:
    - maxValue: string
      minValue: string
      name: string
      scalingType: string
    integerParameterRanges:
    - maxValue: string
      minValue: string
      name: string
      scalingType: string
  resourceLimits: 
    maxNumberOfTrainingJobs: integer
    maxParallelTrainingJobs: integer
  strategy: string
  trainingJobEarlyStoppingType: string
  tuningJobCompletionCriteria: 
    targetObjectiveMetricValue: number
hyperParameterTuningJobName: string
tags:
- key: string
  value: string
trainingJobDefinition: 
  algorithmSpecification: 
    algorithmName: string
    metricDefinitions:
    - name: string
      regex: string
    trainingImage: string
    trainingInputMode: string
  checkpointConfig: 
    localPath: string
    s3URI: string
  definitionName: string
  enableInterContainerTrafficEncryption: boolean
  enableManagedSpotTraining: boolean
  enableNetworkIsolation: boolean
  hyperParameterRanges: 
    categoricalParameterRanges:
    - name: string
      values:
      - string
    continuousParameterRanges:
    - maxValue: string
      minValue: string
      name: string
      scalingType: string
    integerParameterRanges:
    - maxValue: string
      minValue: string
      name: string
      scalingType: string
  inputDataConfig:
  - channelName: string
    compressionType: string
    contentType: string
    dataSource: 
      fileSystemDataSource: 
        directoryPath: string
        fileSystemAccessMode: string
        fileSystemID: string
        fileSystemType: string
      s3DataSource: 
        attributeNames:
        - string
        instanceGroupNames:
        - string
        s3DataDistributionType: string
        s3DataType: string
        s3URI: string
    inputMode: string
    recordWrapperType: string
    shuffleConfig: 
      seed: integer
  outputDataConfig: 
    kmsKeyID: string
    s3OutputPath: string
  resourceConfig: 
    instanceCount: integer
    instanceGroups:
    - instanceCount: integer
      instanceGroupName: string
      instanceType: string
    instanceType: string
    keepAlivePeriodInSeconds: integer
    volumeKMSKeyID: string
    volumeSizeInGB: integer
  retryStrategy: 
    maximumRetryAttempts: integer
  roleARN: string
  staticHyperParameters: {}
  stoppingCondition: 
    maxRuntimeInSeconds: integer
    maxWaitTimeInSeconds: integer
  tuningObjective: 
    metricName: string
    type_: string
  vpcConfig: 
    securityGroupIDs:
    - string
    subnets:
    - string
trainingJobDefinitions:
- algorithmSpecification: 
    algorithmName: string
    metricDefinitions:
    - name: string
      regex: string
    trainingImage: string
    trainingInputMode: string
  checkpointConfig: 
    localPath: string
    s3URI: string
  definitionName: string
  enableInterContainerTrafficEncryption: boolean
  enableManagedSpotTraining: boolean
  enableNetworkIsolation: boolean
  hyperParameterRanges: 
    categoricalParameterRanges:
    - name: string
      values:
      - string
    continuousParameterRanges:
    - maxValue: string
      minValue: string
      name: string
      scalingType: string
    integerParameterRanges:
    - maxValue: string
      minValue: string
      name: string
      scalingType: string
  inputDataConfig:
  - channelName: string
    compressionType: string
    contentType: string
    dataSource: 
      fileSystemDataSource: 
        directoryPath: string
        fileSystemAccessMode: string
        fileSystemID: string
        fileSystemType: string
      s3DataSource: 
        attributeNames:
        - string
        instanceGroupNames:
        - string
        s3DataDistributionType: string
        s3DataType: string
        s3URI: string
    inputMode: string
    recordWrapperType: string
    shuffleConfig: 
      seed: integer
  outputDataConfig: 
    kmsKeyID: string
    s3OutputPath: string
  resourceConfig: 
    instanceCount: integer
    instanceGroups:
    - instanceCount: integer
      instanceGroupName: string
      instanceType: string
    instanceType: string
    keepAlivePeriodInSeconds: integer
    volumeKMSKeyID: string
    volumeSizeInGB: integer
  retryStrategy: 
    maximumRetryAttempts: integer
  roleARN: string
  staticHyperParameters: {}
  stoppingCondition: 
    maxRuntimeInSeconds: integer
    maxWaitTimeInSeconds: integer
  tuningObjective: 
    metricName: string
    type_: string
  vpcConfig: 
    securityGroupIDs:
    - string
    subnets:
    - string
warmStartConfig: 
  parentHyperParameterTuningJobs:
  - hyperParameterTuningJobName: string
  warmStartType: string
Field Descriptions
hyperParameterTuningJobConfig
Required
object
The HyperParameterTuningJobConfig object that describes the tuning job, including the search strategy, the objective metric used to evaluate training jobs, ranges of parameters to search, and resource limits for the tuning job. For more information, see How Hyperparameter Tuning Works (https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning-how-it-works.html).
hyperParameterTuningJobConfig.hyperParameterTuningJobObjective
Optional
object
Defines the objective metric for a hyperparameter tuning job. Hyperparameter tuning uses the value of this metric to evaluate the training jobs it launches, and returns the training job that results in either the highest or lowest value for this metric, depending on the value you specify for the Type parameter.
hyperParameterTuningJobConfig.hyperParameterTuningJobObjective.metricName
Optional
string
hyperParameterTuningJobConfig.hyperParameterTuningJobObjective.type_
Optional
string
hyperParameterTuningJobConfig.parameterRanges
Optional
object
Specifies ranges of integer, continuous, and categorical hyperparameters that a hyperparameter tuning job searches. The hyperparameter tuning job launches training jobs with hyperparameter values within these ranges to find the combination of values that results in the best-performing training job, as measured by the objective metric of the hyperparameter tuning job.
The maximum number of items specified for Array Members refers to the maximum number of hyperparameters for each range and also the maximum for the hyperparameter tuning job itself. That is, the sum of the number of hyperparameters for all the ranges can’t exceed the maximum number specified.
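A sketch of a `parameterRanges` block that uses all three range types follows. The hyperparameter names are hypothetical. Bounds are quoted because the CRD types `minValue` and `maxValue` as strings. The `scalingType` values assume the SageMaker API options `Auto`, `Linear`, `Logarithmic`, and `ReverseLogarithmic`.

```yaml
hyperParameterTuningJobConfig:
  parameterRanges:
    categoricalParameterRanges:
    - name: optimizer                 # hypothetical categorical hyperparameter
      values: ["sgd", "adam"]
    continuousParameterRanges:
    - name: learning_rate             # hypothetical continuous hyperparameter
      minValue: "0.0001"              # logarithmic scaling needs values greater than 0
      maxValue: "0.1"
      scalingType: Logarithmic        # spread the search across orders of magnitude
    integerParameterRanges:
    - name: num_layers                # hypothetical integer hyperparameter
      minValue: "2"
      maxValue: "8"
      scalingType: Linear
```

This block tunes three hyperparameters. All three count together toward the single per-job maximum described above.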
hyperParameterTuningJobConfig.parameterRanges.categoricalParameterRanges
Optional
array
hyperParameterTuningJobConfig.parameterRanges.categoricalParameterRanges.[]
Required
object
A list of categorical hyperparameters to tune.
hyperParameterTuningJobConfig.parameterRanges.categoricalParameterRanges.[].values
Optional
array
hyperParameterTuningJobConfig.parameterRanges.categoricalParameterRanges.[].values.[]
Required
string
hyperParameterTuningJobConfig.parameterRanges.continuousParameterRanges.[]
Required
object
A list of continuous hyperparameters to tune.
hyperParameterTuningJobConfig.parameterRanges.continuousParameterRanges.[].minValue
Optional
string
hyperParameterTuningJobConfig.parameterRanges.continuousParameterRanges.[].name
Optional
string
hyperParameterTuningJobConfig.parameterRanges.continuousParameterRanges.[].scalingType
Optional
string
hyperParameterTuningJobConfig.parameterRanges.integerParameterRanges
Optional
array
hyperParameterTuningJobConfig.parameterRanges.integerParameterRanges.[]
Required
object
For a hyperparameter of the integer type, specifies the range that a hyperparameter tuning job searches.
hyperParameterTuningJobConfig.parameterRanges.integerParameterRanges.[].minValue
Optional
string
hyperParameterTuningJobConfig.parameterRanges.integerParameterRanges.[].name
Optional
string
hyperParameterTuningJobConfig.parameterRanges.integerParameterRanges.[].scalingType
Optional
string
hyperParameterTuningJobConfig.resourceLimits
Optional
object
Specifies the maximum number of training jobs and parallel training jobs that a hyperparameter tuning job can launch.
hyperParameterTuningJobConfig.resourceLimits.maxNumberOfTrainingJobs
Optional
integer
hyperParameterTuningJobConfig.resourceLimits.maxParallelTrainingJobs
Optional
integer
hyperParameterTuningJobConfig.strategy
Optional
string
The strategy hyperparameter tuning uses to find the best combination of hyperparameters for your model.
hyperParameterTuningJobConfig.trainingJobEarlyStoppingType
Optional
string
hyperParameterTuningJobConfig.tuningJobCompletionCriteria
Optional
object
The criteria that determine when the tuning job is considered complete.
hyperParameterTuningJobConfig.tuningJobCompletionCriteria.targetObjectiveMetricValue
Optional
number
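A sketch of the fields that control the search follows. The enumeration values come from the SageMaker API. The target value of 0.95 is arbitrary. The sketch assumes that once a training job reaches that objective value, the tuning job stops launching new training jobs.

```yaml
hyperParameterTuningJobConfig:
  strategy: Bayesian                  # the API also accepts Random, Hyperband, and Grid
  trainingJobEarlyStoppingType: Auto  # Off disables early stopping of training jobs
  tuningJobCompletionCriteria:
    targetObjectiveMetricValue: 0.95  # arbitrary example target for the objective metric
```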
hyperParameterTuningJobName
Required
string
The name of the tuning job. This name is the prefix for the names of all training jobs that this tuning job launches. The name must be unique within the same Amazon Web Services account and Amazon Web Services Region. The name must have 1 to 32 characters. Valid characters are a-z, A-Z, 0-9, and : + = @ _ % - (hyphen). The name is not case sensitive.
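A minimal sketch of a complete manifest follows, for orientation. Every name, ARN, bucket, image URI, metric, and hyperparameter in it is a placeholder or hypothetical value, loosely modeled on a gradient-boosting algorithm. Replace each one with a value your algorithm actually uses. Range bounds and static hyperparameters are quoted because the CRD types them as strings.

```yaml
apiVersion: sagemaker.services.k8s.aws/v1alpha1
kind: HyperParameterTuningJob
metadata:
  name: tuning-example                          # hypothetical Kubernetes object name
spec:
  hyperParameterTuningJobName: tuning-example   # 1-32 characters, unique per account and Region
  hyperParameterTuningJobConfig:
    strategy: Bayesian
    hyperParameterTuningJobObjective:
      metricName: validation:auc                # placeholder; must be a metric the algorithm emits
      type_: Maximize
    resourceLimits:
      maxNumberOfTrainingJobs: 20
      maxParallelTrainingJobs: 2
    parameterRanges:
      continuousParameterRanges:
      - name: eta                               # hypothetical hyperparameter
        minValue: "0.01"
        maxValue: "0.3"
        scalingType: Auto
  trainingJobDefinition:
    algorithmSpecification:
      trainingImage: "<account>.dkr.ecr.<region>.amazonaws.com/<image>:<tag>"  # placeholder
      trainingInputMode: File
    roleARN: arn:aws:iam::111122223333:role/SageMakerExecutionRole  # placeholder
    staticHyperParameters:
      num_round: "100"                          # hypothetical; map values are strings
    inputDataConfig:
    - channelName: train
      dataSource:
        s3DataSource:
          s3DataType: S3Prefix
          s3URI: s3://amzn-s3-demo-bucket/train/
    - channelName: validation
      dataSource:
        s3DataSource:
          s3DataType: S3Prefix
          s3URI: s3://amzn-s3-demo-bucket/validation/
    outputDataConfig:
      s3OutputPath: s3://amzn-s3-demo-bucket/output/
    resourceConfig:
      instanceCount: 1
      instanceType: ml.m5.xlarge
      volumeSizeInGB: 10
    stoppingCondition:
      maxRuntimeInSeconds: 3600
```

Once the ACK SageMaker controller is installed in the cluster, apply the manifest with `kubectl apply -f <file>`. The controller then creates the corresponding tuning job in SageMaker.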
tags
Optional
array
An array of key-value pairs. You can use tags to categorize your Amazon Web Services resources in different ways, for example, by purpose, owner, or environment. For more information, see Tagging Amazon Web Services Resources (https://docs.aws.amazon.com/general/latest/gr/aws_tagging.html).
Tags that you specify for the tuning job are also added to all training jobs that the tuning job launches.
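A short sketch of the `tags` array follows. The keys and values are hypothetical. As described above, SageMaker also applies these tags to every training job the tuning job launches.

```yaml
tags:
- key: team              # hypothetical tag key and value
  value: ml-research
- key: environment
  value: dev
```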
tags.[]
Required
object
A tag object that consists of a key and an optional value, used to manage metadata for SageMaker Amazon Web Services resources.
You can add tags to notebook instances, training jobs, hyperparameter tuning jobs, batch transform jobs, models, labeling jobs, work teams, endpoint configurations, and endpoints. For more information on adding tags to SageMaker resources, see AddTags.
For more information on adding metadata to your Amazon Web Services resources with tagging, see Tagging Amazon Web Services resources (https://docs.aws.amazon.com/general/latest/gr/aws_tagging.html). For advice on best practices for managing Amazon Web Services resources with tagging, see Tagging Best Practices: Implement an Effective Amazon Web Services Resource Tagging Strategy (https://d1.awsstatic.com/whitepapers/aws-tagging-best-practices.pdf).
tags.[].value
Optional
string
trainingJobDefinition
Optional
object
The HyperParameterTrainingJobDefinition object that describes the training jobs that this tuning job launches, including static hyperparameters, input data configuration, output data configuration, resource configuration, and stopping condition.
trainingJobDefinition.algorithmSpecification
Optional
object
Specifies which training algorithm to use for training jobs that a hyperparameter tuning job launches and the metrics to monitor.
trainingJobDefinition.algorithmSpecification.algorithmName
Optional
string
trainingJobDefinition.algorithmSpecification.metricDefinitions
Optional
array
trainingJobDefinition.algorithmSpecification.metricDefinitions.[]
Required
object
Specifies a metric that the training algorithm writes to stderr or stdout. SageMaker hyperparameter tuning captures all defined metrics. You specify one metric that a hyperparameter tuning job uses as its objective metric to choose the best training job.
trainingJobDefinition.algorithmSpecification.metricDefinitions.[].regex
Optional
string
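A `metricDefinitions` entry tells SageMaker how to extract a metric from a custom training image's log output. The value is read from the regular expression's capture group. This sketch assumes a hypothetical container that prints lines such as `val_accuracy=0.913`; the metric name, regex, and image are placeholders. The objective's `metricName` must match one of the defined names.

```yaml
hyperParameterTuningJobConfig:
  hyperParameterTuningJobObjective:
    metricName: validation:accuracy      # must match a metricDefinitions name below
    type_: Maximize
trainingJobDefinition:
  algorithmSpecification:
    trainingImage: "<account>.dkr.ecr.<region>.amazonaws.com/<image>:<tag>"  # placeholder custom image
    trainingInputMode: File
    metricDefinitions:
    - name: validation:accuracy          # hypothetical metric name
      regex: 'val_accuracy=([0-9\.]+)'   # single quotes keep the backslash literal in YAML
```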
trainingJobDefinition.algorithmSpecification.trainingImage
Optional
string
trainingJobDefinition.algorithmSpecification.trainingInputMode
Optional
string
The training input mode that the algorithm supports. For more information about input modes, see Algorithms (https://docs.aws.amazon.com/sagemaker/latest/dg/algos.html).
Pipe mode
If an algorithm supports Pipe mode, Amazon SageMaker streams data directly from Amazon S3 to the container.
File mode
If an algorithm supports File mode, SageMaker downloads the training data from S3 to the provisioned ML storage volume, and mounts the directory to the Docker volume for the training container.
You must provision the ML storage volume with sufficient capacity to accommodate the data downloaded from S3. In addition to the training data, the ML storage volume also stores the output model. The algorithm container also uses the ML storage volume to store any intermediate information.
For distributed algorithms, training data is distributed uniformly. Training duration is predictable if the input data objects are approximately the same size. SageMaker does not split the files any further for model training. If the object sizes are skewed, the data distribution is skewed as well, so one host in the training cluster can become overloaded and turn into a training bottleneck.
FastFile mode
If an algorithm supports FastFile mode, SageMaker streams data directly from S3 to the container with no code changes, and provides file system access to the data. Users can author their training script to interact with these files as if they were stored on disk.
FastFile mode works best when the data is read sequentially. Augmented manifest files aren’t supported. The startup time is lower when there are fewer files in the S3 bucket provided.
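The following sketch sets FastFile as the default input mode and overrides one channel with File mode. It assumes that a channel-level `inputMode` takes precedence over `algorithmSpecification.trainingInputMode` for that channel, as in the SageMaker `Channel` API. Channel names, the image URI, and S3 paths are placeholders.

```yaml
trainingJobDefinition:
  algorithmSpecification:
    trainingImage: "<account>.dkr.ecr.<region>.amazonaws.com/<image>:<tag>"  # placeholder
    trainingInputMode: FastFile        # default for every channel
  inputDataConfig:
  - channelName: train                 # inherits FastFile; suits sequential reads
    dataSource:
      s3DataSource:
        s3DataType: S3Prefix
        s3URI: s3://amzn-s3-demo-bucket/train/
  - channelName: lookup                # hypothetical small channel that needs random access
    inputMode: File                    # downloaded to the ML storage volume instead
    dataSource:
      s3DataSource:
        s3DataType: S3Prefix
        s3URI: s3://amzn-s3-demo-bucket/lookup/
```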
trainingJobDefinition.checkpointConfig
Optional
object
Contains information about the output location for managed spot training checkpoint data.
trainingJobDefinition.checkpointConfig.localPath
Optional
string
trainingJobDefinition.checkpointConfig.s3URI
Optional
string
trainingJobDefinition.definitionName
Optional
string
trainingJobDefinition.enableInterContainerTrafficEncryption
Optional
boolean
trainingJobDefinition.enableManagedSpotTraining
Optional
boolean
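A sketch of managed spot training with checkpointing follows. It assumes, as in the SageMaker API, that `maxWaitTimeInSeconds` is required when managed spot training is enabled and must be at least `maxRuntimeInSeconds`. `/opt/ml/checkpoints` is SageMaker's default local checkpoint path. The bucket is a placeholder.

```yaml
trainingJobDefinition:
  enableManagedSpotTraining: true
  checkpointConfig:
    s3URI: s3://amzn-s3-demo-bucket/checkpoints/   # placeholder bucket for checkpoint uploads
    localPath: /opt/ml/checkpoints                 # where the container writes checkpoints
  stoppingCondition:
    maxRuntimeInSeconds: 3600    # cap on actual training time
    maxWaitTimeInSeconds: 7200   # training time plus time spent waiting for Spot capacity
```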
trainingJobDefinition.enableNetworkIsolation
Optional
boolean
trainingJobDefinition.hyperParameterRanges
Optional
object
Specifies ranges of integer, continuous, and categorical hyperparameters that a hyperparameter tuning job searches. The hyperparameter tuning job launches training jobs with hyperparameter values within these ranges to find the combination of values that results in the best-performing training job, as measured by the objective metric of the hyperparameter tuning job.
The maximum number of items specified for Array Members refers to the maximum number of hyperparameters for each range and also the maximum for the hyperparameter tuning job itself. That is, the sum of the number of hyperparameters for all the ranges can’t exceed the maximum number specified.
trainingJobDefinition.hyperParameterRanges.categoricalParameterRanges
Optional
array
trainingJobDefinition.hyperParameterRanges.categoricalParameterRanges.[]
Required
object
A list of categorical hyperparameters to tune.
trainingJobDefinition.hyperParameterRanges.categoricalParameterRanges.[].values
Optional
array
trainingJobDefinition.hyperParameterRanges.categoricalParameterRanges.[].values.[]
Required
string
trainingJobDefinition.hyperParameterRanges.continuousParameterRanges.[]
Required
object
A list of continuous hyperparameters to tune.
trainingJobDefinition.hyperParameterRanges.continuousParameterRanges.[].minValue
Optional
string
trainingJobDefinition.hyperParameterRanges.continuousParameterRanges.[].name
Optional
string
trainingJobDefinition.hyperParameterRanges.continuousParameterRanges.[].scalingType
Optional
string
trainingJobDefinition.hyperParameterRanges.integerParameterRanges
Optional
array
trainingJobDefinition.hyperParameterRanges.integerParameterRanges.[]
Required
object
For a hyperparameter of the integer type, specifies the range that a hyperparameter tuning job searches.
trainingJobDefinition.hyperParameterRanges.integerParameterRanges.[].minValue
Optional
string
trainingJobDefinition.hyperParameterRanges.integerParameterRanges.[].name
Optional
string
trainingJobDefinition.hyperParameterRanges.integerParameterRanges.[].scalingType
Optional
string
trainingJobDefinition.inputDataConfig
Optional
array
trainingJobDefinition.inputDataConfig.[]
Required
object
A channel is a named input source that training algorithms can consume.
trainingJobDefinition.inputDataConfig.[].compressionType
Optional
string
trainingJobDefinition.inputDataConfig.[].contentType
Optional
string
trainingJobDefinition.inputDataConfig.[].dataSource
Optional
object
Describes the location of the channel data.
trainingJobDefinition.inputDataConfig.[].dataSource.fileSystemDataSource
Optional
object
Specifies a file system data source for a channel.
trainingJobDefinition.inputDataConfig.[].dataSource.fileSystemDataSource.directoryPath
Optional
string
trainingJobDefinition.inputDataConfig.[].dataSource.fileSystemDataSource.fileSystemAccessMode
Optional
string
trainingJobDefinition.inputDataConfig.[].dataSource.fileSystemDataSource.fileSystemID
Optional
string
trainingJobDefinition.inputDataConfig.[].dataSource.fileSystemDataSource.fileSystemType
Optional
string
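The following sketch shows a channel backed by Amazon EFS rather than S3. The file system ID and directory path are placeholders. The `fileSystemType` (`EFS` or `FSxLustre`) and `fileSystemAccessMode` (`ro` or `rw`) values assume the SageMaker API enumerations. A file system data source also needs a `vpcConfig` whose subnets and security groups can reach the file system.

```yaml
inputDataConfig:
- channelName: train
  dataSource:
    fileSystemDataSource:
      fileSystemType: EFS
      fileSystemID: fs-0123456789abcdef0   # placeholder file system ID
      directoryPath: /training-data        # placeholder path inside the file system
      fileSystemAccessMode: ro             # training only needs to read the data
```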
trainingJobDefinition.inputDataConfig.[].dataSource.s3DataSource
Optional
object
Describes the S3 data source.
trainingJobDefinition.inputDataConfig.[].dataSource.s3DataSource.attributeNames
Optional
array
trainingJobDefinition.inputDataConfig.[].dataSource.s3DataSource.attributeNames.[]
Required
string
trainingJobDefinition.inputDataConfig.[].dataSource.s3DataSource.instanceGroupNames.[]
Required
string
trainingJobDefinition.inputDataConfig.[].dataSource.s3DataSource.s3DataType
Optional
string
trainingJobDefinition.inputDataConfig.[].dataSource.s3DataSource.s3URI
Optional
string
trainingJobDefinition.inputDataConfig.[].inputMode
Optional
string
The training input mode that the algorithm supports. For more information about input modes, see Algorithms (https://docs.aws.amazon.com/sagemaker/latest/dg/algos.html).
Pipe mode
If an algorithm supports Pipe mode, Amazon SageMaker streams data directly from Amazon S3 to the container.
File mode
If an algorithm supports File mode, SageMaker downloads the training data from S3 to the provisioned ML storage volume, and mounts the directory to the Docker volume for the training container.
You must provision the ML storage volume with sufficient capacity to accommodate the data downloaded from S3. In addition to the training data, the ML storage volume also stores the output model. The algorithm container also uses the ML storage volume to store any intermediate information.
For distributed algorithms, training data is distributed uniformly. Training duration is predictable if the input data objects are approximately the same size. SageMaker does not split the files any further for model training. If the object sizes are skewed, the data distribution is skewed as well, so one host in the training cluster can become overloaded and turn into a training bottleneck.
FastFile mode
If an algorithm supports FastFile mode, SageMaker streams data directly from S3 to the container with no code changes, and provides file system access to the data. Users can author their training script to interact with these files as if they were stored on disk.
FastFile mode works best when the data is read sequentially. Augmented manifest files aren’t supported. The startup time is lower when there are fewer files in the S3 bucket provided.
trainingJobDefinition.inputDataConfig.[].recordWrapperType
Optional
string
trainingJobDefinition.inputDataConfig.[].shuffleConfig
Optional
object
A configuration for a shuffle option for input data in a channel. If you use S3Prefix for S3DataType, the results of the S3 key prefix matches are shuffled. If you use ManifestFile, the order of the S3 object references in the ManifestFile is shuffled. If you use AugmentedManifestFile, the order of the JSON lines in the AugmentedManifestFile is shuffled. The shuffling order is determined using the Seed value.
For Pipe input mode, when ShuffleConfig is specified, shuffling is done at the start of every epoch. With large datasets, this ensures that the order of the training data is different for each epoch, which helps reduce bias and possible overfitting. In a multi-node training job, when ShuffleConfig is combined with an S3DataDistributionType of ShardedByS3Key, the data is shuffled across nodes, so the content sent to a particular node in the first epoch might be sent to a different node in the second epoch.
trainingJobDefinition.inputDataConfig.[].shuffleConfig.seed
Optional
integer
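The following sketch combines a sharded, shuffled channel with Pipe mode, the multi-node combination described above. The seed value and S3 path are arbitrary placeholders.

```yaml
inputDataConfig:
- channelName: train
  inputMode: Pipe
  dataSource:
    s3DataSource:
      s3DataType: S3Prefix
      s3URI: s3://amzn-s3-demo-bucket/train/    # placeholder
      s3DataDistributionType: ShardedByS3Key    # each instance receives a subset of the objects
  shuffleConfig:
    seed: 42                                    # makes the shuffle sequence reproducible
```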
trainingJobDefinition.outputDataConfig
Optional
object
Provides information about how to store model training results (model artifacts).
trainingJobDefinition.outputDataConfig.kmsKeyID
Optional
string
trainingJobDefinition.outputDataConfig.s3OutputPath
Optional
string
trainingJobDefinition.resourceConfig
Optional
object
Describes the resources, including machine learning (ML) compute instances and ML storage volumes, to use for model training.
trainingJobDefinition.resourceConfig.instanceCount
Optional
integer
trainingJobDefinition.resourceConfig.instanceGroups
Optional
array
trainingJobDefinition.resourceConfig.instanceGroups.[]
Required
object
Defines an instance group for heterogeneous cluster training. When requesting a training job using the CreateTrainingJob (https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTrainingJob.html) API, you can configure multiple instance groups.
trainingJobDefinition.resourceConfig.instanceGroups.[].instanceGroupName
Optional
string
trainingJobDefinition.resourceConfig.instanceGroups.[].instanceType
Optional
string
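The following sketch configures heterogeneous cluster training with two instance groups. One channel is delivered only to the group listed in `instanceGroupNames`. Group names, instance types, and the S3 path are placeholders. The sketch assumes, as in the SageMaker API, that `instanceGroups` is used instead of the top-level `instanceCount` and `instanceType`.

```yaml
trainingJobDefinition:
  resourceConfig:
    volumeSizeInGB: 30
    instanceGroups:
    - instanceGroupName: data-group      # hypothetical CPU group for data preprocessing
      instanceType: ml.c5.xlarge
      instanceCount: 2
    - instanceGroupName: gpu-group       # hypothetical GPU group for training
      instanceType: ml.p3.2xlarge
      instanceCount: 1
  inputDataConfig:
  - channelName: train
    dataSource:
      s3DataSource:
        s3DataType: S3Prefix
        s3URI: s3://amzn-s3-demo-bucket/train/   # placeholder
        instanceGroupNames:
        - data-group                     # only this group receives the channel
```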
trainingJobDefinition.resourceConfig.instanceType
Optional
string
trainingJobDefinition.resourceConfig.keepAlivePeriodInSeconds
Optional
integer
trainingJobDefinition.resourceConfig.volumeKMSKeyID
Optional
string
trainingJobDefinition.resourceConfig.volumeSizeInGB
Optional
integer
trainingJobDefinition.retryStrategy
Optional
object
The retry strategy to use when a training job fails due to an InternalServerError. RetryStrategy is specified as part of the CreateTrainingJob and CreateHyperParameterTuningJob requests. You can add the StoppingCondition parameter to the request to limit the training time for the complete job.
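A sketch that pairs a retry strategy with a stopping condition follows, as the paragraph above suggests. The retry count and runtime are arbitrary example values.

```yaml
trainingJobDefinition:
  retryStrategy:
    maximumRetryAttempts: 2      # retry a training job that fails with InternalServerError up to 2 times
  stoppingCondition:
    maxRuntimeInSeconds: 3600    # bounds how long each training job may run
```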
trainingJobDefinition.retryStrategy.maximumRetryAttempts
Optional
integer
trainingJobDefinition.roleARN
Optional
string
trainingJobDefinition.staticHyperParameters
Optional
object
trainingJobDefinition.stoppingCondition
Optional
object
Specifies a limit to how long a model training job or model compilation job can run. It also specifies how long a managed spot training job has to complete. When the job reaches the time limit, SageMaker ends the training or compilation job. Use this API to cap model training costs.
To stop a training job, SageMaker sends the algorithm the SIGTERM signal, which delays job termination for 120 seconds. Algorithms can use this 120-second window to save the model artifacts, so the results of training are not lost.
The training algorithms provided by SageMaker automatically save the intermediate results of a model training job when possible. This is only a best-effort attempt, because the model might not be in a state from which it can be saved; for example, if training has just started, the model might not be ready to save. When saved, this intermediate data is a valid model artifact. You can use it to create a model with CreateModel.
The Neural Topic Model (NTM) currently does not support saving intermediate model artifacts. When training NTMs, make sure that the maximum runtime is sufficient for the training job to complete.
trainingJobDefinition.stoppingCondition.maxRuntimeInSeconds
Optional
integer
trainingJobDefinition.stoppingCondition.maxWaitTimeInSeconds
Optional
integer
trainingJobDefinition.tuningObjective
Optional
object
Defines the objective metric for a hyperparameter tuning job. Hyperparameter tuning uses the value of this metric to evaluate the training jobs it launches, and returns the training job that results in either the highest or lowest value for this metric, depending on the value you specify for the Type parameter.
trainingJobDefinition.tuningObjective.metricName
Optional
string
trainingJobDefinition.tuningObjective.type_
Optional
string
trainingJobDefinition.vpcConfig
Optional
object
Specifies a VPC that your training jobs and hosted models have access to. Control access to and from your training and model containers by configuring the VPC. For more information, see Protect Endpoints by Using an Amazon Virtual Private Cloud (https://docs.aws.amazon.com/sagemaker/latest/dg/host-vpc.html) and Protect Training Jobs by Using an Amazon Virtual Private Cloud (https://docs.aws.amazon.com/sagemaker/latest/dg/train-vpc.html).
trainingJobDefinition.vpcConfig.securityGroupIDs
Optional
array
trainingJobDefinition.vpcConfig.securityGroupIDs.[]
Required
string
trainingJobDefinition.vpcConfig.subnets.[]
Required
string
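The following sketch runs training jobs inside a VPC with network isolation and inter-container encryption enabled. The security group and subnet IDs are placeholders.

```yaml
trainingJobDefinition:
  enableNetworkIsolation: true                  # the container cannot make outbound network calls
  enableInterContainerTrafficEncryption: true   # encrypt traffic between instances in distributed jobs
  vpcConfig:
    securityGroupIDs:
    - sg-0123456789abcdef0                      # placeholder
    subnets:
    - subnet-0123456789abcdef0                  # placeholder
    - subnet-0fedcba9876543210                  # placeholder, ideally in another Availability Zone
```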
trainingJobDefinitions.[]
Required
object
Defines the training jobs launched by a hyperparameter tuning job.
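The following sketch tunes two algorithm definitions in one job. It assumes, as in the SageMaker API, that `trainingJobDefinitions` is used instead of (not alongside) `trainingJobDefinition`. It also assumes that each entry carries its own `definitionName`, `tuningObjective`, and `hyperParameterRanges`. Definition names, images, metrics, and hyperparameters are hypothetical. The other settings each definition needs are elided in comments.

```yaml
trainingJobDefinitions:
- definitionName: algo-a                       # hypothetical
  algorithmSpecification:
    trainingImage: "<account>.dkr.ecr.<region>.amazonaws.com/<image-a>:<tag>"  # placeholder
    trainingInputMode: File
  tuningObjective:
    metricName: validation:auc                 # placeholder metric
    type_: Maximize
  hyperParameterRanges:
    continuousParameterRanges:
    - name: eta                                # hypothetical hyperparameter
      minValue: "0.01"
      maxValue: "0.3"
  # roleARN, inputDataConfig, outputDataConfig, resourceConfig, and
  # stoppingCondition are also needed; omitted here for brevity.
- definitionName: algo-b                       # hypothetical
  algorithmSpecification:
    trainingImage: "<account>.dkr.ecr.<region>.amazonaws.com/<image-b>:<tag>"  # placeholder
    trainingInputMode: File
  tuningObjective:
    metricName: validation:auc                 # same metric keeps results comparable
    type_: Maximize
  hyperParameterRanges:
    integerParameterRanges:
    - name: max_depth                          # hypothetical hyperparameter
      minValue: "3"
      maxValue: "10"
  # roleARN, inputDataConfig, outputDataConfig, resourceConfig, and
  # stoppingCondition are also needed; omitted here for brevity.
```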
trainingJobDefinitions.[].algorithmSpecification.algorithmName
Optional
string
trainingJobDefinitions.[].algorithmSpecification.metricDefinitions
Optional
array
trainingJobDefinitions.[].algorithmSpecification.metricDefinitions.[]
Required
object
Specifies a metric that the training algorithm writes to stderr or stdout. SageMaker hyperparameter tuning captures all defined metrics. You specify one metric that a hyperparameter tuning job uses as its objective metric to choose the best training job.
trainingJobDefinitions.[].algorithmSpecification.metricDefinitions.[].regex
Optional
string
trainingJobDefinitions.[].algorithmSpecification.trainingImage
Optional
string
trainingJobDefinitions.[].algorithmSpecification.trainingInputMode
Optional
string
The training input mode that the algorithm supports. For more information about input modes, see Algorithms (https://docs.aws.amazon.com/sagemaker/latest/dg/algos.html).
Pipe mode
If an algorithm supports Pipe mode, Amazon SageMaker streams data directly from Amazon S3 to the container.
File mode
If an algorithm supports File mode, SageMaker downloads the training data from S3 to the provisioned ML storage volume, and mounts the directory to the Docker volume for the training container.
You must provision the ML storage volume with sufficient capacity to accommodate the data downloaded from S3. In addition to the training data, the ML storage volume also stores the output model. The algorithm container also uses the ML storage volume to store any intermediate information.
For distributed algorithms, training data is distributed uniformly. Training duration is predictable if the input data objects are approximately the same size. SageMaker does not split the files any further for model training. If the object sizes are skewed, the data distribution is skewed as well, so one host in the training cluster can become overloaded and turn into a training bottleneck.
FastFile mode
If an algorithm supports FastFile mode, SageMaker streams data directly from S3 to the container with no code changes, and provides file system access to the data. Users can author their training script to interact with these files as if they were stored on disk.
FastFile mode works best when the data is read sequentially. Augmented manifest files aren’t supported. The startup time is lower when there are fewer files in the S3 bucket provided.
trainingJobDefinitions.[].checkpointConfig
Optional
object
Contains information about the output location for managed spot training checkpoint data.
trainingJobDefinitions.[].checkpointConfig.localPath
Optional
string
trainingJobDefinitions.[].checkpointConfig.s3URI
Optional
string
trainingJobDefinitions.[].definitionName
Optional
string
trainingJobDefinitions.[].enableInterContainerTrafficEncryption
Optional
boolean
trainingJobDefinitions.[].enableManagedSpotTraining
Optional
boolean
trainingJobDefinitions.[].enableNetworkIsolation
Optional
boolean
trainingJobDefinitions.[].hyperParameterRanges
Optional
object
Specifies ranges of integer, continuous, and categorical hyperparameters that a hyperparameter tuning job searches. The hyperparameter tuning job launches training jobs with hyperparameter values within these ranges to find the combination of values that results in the best-performing training job, as measured by the objective metric of the hyperparameter tuning job.
The maximum number of items specified for Array Members refers to the maximum number of hyperparameters for each range and also the maximum for the hyperparameter tuning job itself. That is, the sum of the number of hyperparameters for all the ranges can’t exceed the maximum number specified.
trainingJobDefinitions.[].hyperParameterRanges.categoricalParameterRanges
Optional
array
trainingJobDefinitions.[].hyperParameterRanges.categoricalParameterRanges.[]
Required
object
A list of categorical hyperparameters to tune.
trainingJobDefinitions.[].hyperParameterRanges.categoricalParameterRanges.[].values
Optional
array
trainingJobDefinitions.[].hyperParameterRanges.categoricalParameterRanges.[].values.[]
Required
string
trainingJobDefinitions.[].hyperParameterRanges.continuousParameterRanges.[]
Required
object
A list of continuous hyperparameters to tune.
trainingJobDefinitions.[].hyperParameterRanges.continuousParameterRanges.[].minValue
Optional
string
trainingJobDefinitions.[].hyperParameterRanges.continuousParameterRanges.[].name
Optional
string
trainingJobDefinitions.[].hyperParameterRanges.continuousParameterRanges.[].scalingType
Optional
string
trainingJobDefinitions.[].hyperParameterRanges.integerParameterRanges
Optional
array
trainingJobDefinitions.[].hyperParameterRanges.integerParameterRanges.[]
Required
object
For a hyperparameter of the integer type, specifies the range that a hyperparameter tuning job searches.
trainingJobDefinitions.[].hyperParameterRanges.integerParameterRanges.[].minValue
Optional
string
trainingJobDefinitions.[].hyperParameterRanges.integerParameterRanges.[].name
Optional
string
trainingJobDefinitions.[].hyperParameterRanges.integerParameterRanges.[].scalingType
Optional
string
trainingJobDefinitions.[].inputDataConfig
Optional
array
trainingJobDefinitions.[].inputDataConfig.[]
Required
object
A channel is a named input source that training algorithms can consume.
trainingJobDefinitions.[].inputDataConfig.[].compressionType
Optional
string
trainingJobDefinitions.[].inputDataConfig.[].contentType
Optional
string
trainingJobDefinitions.[].inputDataConfig.[].dataSource
Optional
object
Describes the location of the channel data.
trainingJobDefinitions.[].inputDataConfig.[].dataSource.fileSystemDataSource
Optional
object
Specifies a file system data source for a channel.
trainingJobDefinitions.[].inputDataConfig.[].dataSource.fileSystemDataSource.directoryPath
Optional
string
trainingJobDefinitions.[].inputDataConfig.[].dataSource.fileSystemDataSource.fileSystemAccessMode
Optional
string
trainingJobDefinitions.[].inputDataConfig.[].dataSource.fileSystemDataSource.fileSystemID
Optional
string
trainingJobDefinitions.[].inputDataConfig.[].dataSource.fileSystemDataSource.fileSystemType
Optional
string
trainingJobDefinitions.[].inputDataConfig.[].dataSource.s3DataSource
Optional
object
Describes the S3 data source.
trainingJobDefinitions.[].inputDataConfig.[].dataSource.s3DataSource.attributeNames
Optional
array
trainingJobDefinitions.[].inputDataConfig.[].dataSource.s3DataSource.attributeNames.[]
Required
string
trainingJobDefinitions.[].inputDataConfig.[].dataSource.s3DataSource.instanceGroupNames.[]
Required
string
trainingJobDefinitions.[].inputDataConfig.[].dataSource.s3DataSource.s3DataType
Optional
string
trainingJobDefinitions.[].inputDataConfig.[].dataSource.s3DataSource.s3URI
Optional
string
trainingJobDefinitions.[].inputDataConfig.[].inputMode
Optional
string
The training input mode that the algorithm supports. For more information about input modes, see Algorithms (https://docs.aws.amazon.com/sagemaker/latest/dg/algos.html).
Pipe mode
If an algorithm supports Pipe mode, Amazon SageMaker streams data directly from Amazon S3 to the container.
File mode
If an algorithm supports File mode, SageMaker downloads the training data from S3 to the provisioned ML storage volume, and mounts the directory to the Docker volume for the training container.
You must provision the ML storage volume with sufficient capacity to accommodate the data downloaded from S3. In addition to the training data, the ML storage volume also stores the output model. The algorithm container also uses the ML storage volume to store intermediate information, if any.
For distributed algorithms, training data is distributed uniformly. Training duration is predictable when the input data objects are approximately the same size, because SageMaker does not split the files any further for model training. If the object sizes are skewed, the data distribution is skewed too: one host in the training cluster can be overloaded and become a bottleneck for the whole training job.
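The skew problem described for File mode can be made concrete with a small sketch. Purely for illustration, it assumes that whole S3 objects are dealt round-robin to hosts; SageMaker's actual assignment is not described here. Because objects are never split, a single large object puts its entire size on one host. `assign_whole_objects` and `imbalance` are hypothetical helpers.

```python
from typing import List


def assign_whole_objects(object_sizes: List[int], host_count: int) -> List[int]:
    """Assign each object, unsplit, to a host and return the load per host.

    Round-robin assignment is an assumption for this sketch only.
    """
    if host_count < 1:
        raise ValueError("host_count must be >= 1")
    per_host = [0] * host_count
    for i, size in enumerate(object_sizes):
        per_host[i % host_count] += size  # the whole object goes to one host
    return per_host


def imbalance(per_host: List[int]) -> float:
    """Busiest host's load divided by the mean load (1.0 means perfectly even)."""
    mean = sum(per_host) / len(per_host)
    return max(per_host) / mean if mean else 1.0
```

With eight equal objects of size 100 on four hosts, the imbalance is 1.0. Replacing the first object with one of size 700 yields loads of [800, 200, 200, 200]. The busiest host then carries about 2.3 times the mean load, and the job waits on that host.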
FastFile mode
If an algorithm supports FastFile mode, SageMaker streams data directly from S3 to the container with no code changes, and provides file system access to the data. Users can author their training script to interact with these files as if they were stored on disk.
FastFile mode works best when the data is read sequentially. Augmented manifest files aren’t supported. The startup time is lower when there are fewer files in the S3 bucket provided.
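Two of the inputMode rules above can be checked before applying a manifest: the mode must be one of the three documented values, and FastFile cannot read augmented manifest files. The sketch below is a minimal pre-flight check over an inputDataConfig entry represented as a dict. `check_channel` is a hypothetical helper. A channel that omits inputMode is not flagged, on the assumption that it inherits the job-level algorithmSpecification.trainingInputMode.

```python
from typing import List

# The three input modes described above.
VALID_INPUT_MODES = {"Pipe", "File", "FastFile"}


def check_channel(channel: dict) -> List[str]:
    """Return problems found in one inputDataConfig entry (empty list if none).

    Hypothetical helper: checks only the two rules stated in the description.
    """
    problems = []
    mode = channel.get("inputMode")
    if mode is not None and mode not in VALID_INPUT_MODES:
        problems.append(f"unknown inputMode {mode!r}")
    s3_type = (channel.get("dataSource", {})
                      .get("s3DataSource", {})
                      .get("s3DataType"))
    if mode == "FastFile" and s3_type == "AugmentedManifestFile":
        problems.append("FastFile mode does not support augmented manifest files")
    return problems
```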
trainingJobDefinitions.[].inputDataConfig.[].recordWrapperType
Optional
string
trainingJobDefinitions.[].inputDataConfig.[].shuffleConfig
Optional
object
A configuration for a shuffle option for input data in a channel. If you use S3Prefix for S3DataType, the results of the S3 key prefix matches are shuffled. If you use ManifestFile, the order of the S3 object references in the ManifestFile is shuffled. If you use AugmentedManifestFile, the order of the JSON lines in the AugmentedManifestFile is shuffled. The shuffling order is determined using the Seed value.
For Pipe input mode, when ShuffleConfig is specified, shuffling is done at the start of every epoch. With large datasets, this ensures that the order of the training data differs from epoch to epoch, which helps reduce bias and possible overfitting. In a multi-node training job, when ShuffleConfig is combined with an S3DataDistributionType of ShardedByS3Key, the data is shuffled across nodes, so the content sent to a particular node in the first epoch might be sent to a different node in the second epoch.
trainingJobDefinitions.[].inputDataConfig.[].shuffleConfig.seed
Optional
integer
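The following sketch illustrates seed-driven, per-epoch shuffling combined with ShardedByS3Key-style sharding. It is not SageMaker's implementation. Seeding the generator from both the seed and the epoch number, and giving each node every node_count-th key, are assumptions chosen to reproduce the documented behavior: a reproducible order for a given seed, a new order each epoch, and keys that can move between nodes across epochs.

```python
import random
from typing import List


def epoch_order(keys: List[str], seed: int, epoch: int) -> List[str]:
    """Reproducible shuffle of S3 keys for one epoch.

    Assumption: mixing the epoch into the RNG seed; the real scheme is not documented here.
    """
    rng = random.Random(f"{seed}:{epoch}")
    order = list(keys)
    rng.shuffle(order)
    return order


def shard_for_node(keys: List[str], seed: int, epoch: int,
                   node: int, node_count: int) -> List[str]:
    """Keys one node receives in an epoch (strided split is an assumption)."""
    return epoch_order(keys, seed, epoch)[node::node_count]
```

Because the shuffled order changes every epoch, the strided split hands each node a different subset of keys in each epoch, while the same seed always reproduces the same sequence of assignments.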
trainingJobDefinitions.[].outputDataConfig
Optional
object
Provides information about how to store model training results (model artifacts).
trainingJobDefinitions.[].outputDataConfig.kmsKeyID
Optional
string
trainingJobDefinitions.[].outputDataConfig.s3OutputPath
Optional
string
trainingJobDefinitions.[].resourceConfig
Optional
object
Describes the resources, including machine learning (ML) compute instances and ML storage volumes, to use for model training.
trainingJobDefinitions.[].resourceConfig.instanceCount
Optional
integer
trainingJobDefinitions.[].resourceConfig.instanceGroups
Optional
array
trainingJobDefinitions.[].resourceConfig.instanceGroups.[]
Required
object
Defines an instance group for heterogeneous cluster training. When requesting a training job using the CreateTrainingJob (https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTrainingJob.html) API, you can configure multiple instance groups.
trainingJobDefinitions.[].resourceConfig.instanceGroups.[].instanceGroupName
Optional
string
trainingJobDefinitions.[].resourceConfig.instanceGroups.[].instanceType
Optional
string
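In a heterogeneous cluster, each channel's dataSource.s3DataSource.instanceGroupNames is expected to name groups declared under resourceConfig.instanceGroups. The sketch below is a minimal cross-check of those manifest fields, represented as plain dicts. `unknown_instance_groups` is a hypothetical helper, and the group names in the test are illustrative.

```python
from typing import Dict, List


def unknown_instance_groups(resource_config: dict,
                            input_data_config: List[dict]) -> Dict[str, List[str]]:
    """Map channelName -> instance group names not defined in resourceConfig.instanceGroups.

    Hypothetical helper; an empty result means every reference resolves.
    """
    defined = {g.get("instanceGroupName")
               for g in resource_config.get("instanceGroups", [])}
    missing: Dict[str, List[str]] = {}
    for channel in input_data_config:
        s3 = channel.get("dataSource", {}).get("s3DataSource", {})
        bad = [name for name in s3.get("instanceGroupNames", [])
               if name not in defined]
        if bad:
            missing[channel.get("channelName", "<unnamed>")] = bad
    return missing
```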
trainingJobDefinitions.[].resourceConfig.instanceType
Optional
string
trainingJobDefinitions.[].resourceConfig.keepAlivePeriodInSeconds
Optional
integer
trainingJobDefinitions.[].resourceConfig.volumeKMSKeyID
Optional
string
trainingJobDefinitions.[].resourceConfig.volumeSizeInGB
Optional
integer
trainingJobDefinitions.[].retryStrategy
Optional
object
The retry strategy to use when a training job fails due to an InternalServerError. RetryStrategy is specified as part of the CreateTrainingJob and CreateHyperParameterTuningJob requests. You can add the StoppingCondition parameter to the request to limit the training time for the complete job.
trainingJobDefinitions.[].retryStrategy.maximumRetryAttempts
Optional
integer
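A minimal sketch of the retry behavior described for retryStrategy, written as a client-side loop for illustration only, since SageMaker performs these retries itself. It assumes maximumRetryAttempts counts retries after the first attempt. `InternalServerError` here is a hypothetical local exception class that stands in for the service error, and `run_with_retries` is a hypothetical helper.

```python
from typing import Callable


class InternalServerError(Exception):
    """Hypothetical stand-in for the transient service error that triggers a retry."""


def run_with_retries(start_job: Callable[[], str], maximum_retry_attempts: int) -> str:
    """Call start_job, retrying only on InternalServerError.

    Assumption: maximum_retry_attempts counts retries after the first attempt.
    Any other exception propagates immediately without a retry.
    """
    attempt = 0
    while True:
        try:
            return start_job()
        except InternalServerError:
            if attempt >= maximum_retry_attempts:
                raise  # retries exhausted: surface the last error
            attempt += 1
```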
trainingJobDefinitions.[].roleARN
Optional
string
trainingJobDefinitions.[].staticHyperParameters
Optional
object
trainingJobDefinitions.[].stoppingCondition
Optional
object
Specifies a limit to how long a model training job or model compilation job can run. It also specifies how long a managed spot training job has to complete. When the job reaches the time limit, SageMaker ends the training or compilation job. Use this API to cap model training costs.
To stop a training job, SageMaker sends the algorithm the SIGTERM signal, which delays job termination for 120 seconds. Algorithms can use this 120-second window to save the model artifacts, so the results of training are not lost.
The training algorithms provided by SageMaker automatically save the intermediate results of a model training job when possible. This is a best-effort attempt only, because the model might not be in a state from which it can be saved; for example, if training has just started, the model might not be ready to save. When saved, this intermediate data is a valid model artifact that you can use to create a model with CreateModel.
The Neural Topic Model (NTM) currently does not support saving intermediate model artifacts. When training NTMs, make sure that the maximum runtime is sufficient for the training job to complete.
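For a custom training container, the 120-second SIGTERM window can be used as sketched below. The signal handler only records that a stop was requested, and the training loop saves artifacts at the next epoch boundary. `GracefulStopper`, `train`, and the `save_artifacts` callback are hypothetical names. The sketch also assumes that one epoch finishes well within the 120 seconds.

```python
import signal
from typing import Callable


class GracefulStopper:
    """Records that SIGTERM arrived so the training loop can save and exit (hypothetical)."""

    def __init__(self) -> None:
        self.stop_requested = False

    def handle(self, signum, frame) -> None:
        # Keep the handler tiny: set a flag and let the loop do the real work.
        self.stop_requested = True

    def install(self) -> None:
        """Register the handler; Python only allows this from the main thread."""
        signal.signal(signal.SIGTERM, self.handle)


def train(num_epochs: int, stopper: GracefulStopper,
          save_artifacts: Callable[[int], None]) -> int:
    """Run up to num_epochs, saving early if a stop was requested. Returns epochs completed."""
    for epoch in range(num_epochs):
        if stopper.stop_requested:
            save_artifacts(epoch)  # must finish inside the 120-second window
            return epoch
        # ... one epoch of real training would run here ...
    save_artifacts(num_epochs)
    return num_epochs
```

In a real entry point, call `stopper.install()` from the main thread before training starts. Doing the save in the loop rather than in the handler avoids writing files from inside a signal handler.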
trainingJobDefinitions.[].stoppingCondition.maxRuntimeInSeconds
Optional
integer
trainingJobDefinitions.[].stoppingCondition.maxWaitTimeInSeconds
Optional
integer
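For managed spot training, maxWaitTimeInSeconds covers both the time spent waiting for Spot capacity and the run time, so SageMaker requires it to be at least maxRuntimeInSeconds. The sketch below checks only that rule on the stoppingCondition fields. `check_stopping_condition` is a hypothetical helper.

```python
from typing import List


def check_stopping_condition(enable_managed_spot_training: bool,
                             stopping_condition: dict) -> List[str]:
    """Return problems with stoppingCondition for the spot setting (empty if none).

    Hypothetical helper: checks only maxWaitTimeInSeconds >= maxRuntimeInSeconds.
    """
    problems = []
    runtime = stopping_condition.get("maxRuntimeInSeconds")
    wait = stopping_condition.get("maxWaitTimeInSeconds")
    if enable_managed_spot_training and wait is not None and runtime is not None:
        if wait < runtime:
            problems.append(
                f"maxWaitTimeInSeconds ({wait}) must be >= maxRuntimeInSeconds ({runtime})")
    return problems
```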
trainingJobDefinitions.[].tuningObjective
Optional
object
Defines the objective metric for a hyperparameter tuning job. Hyperparameter tuning uses the value of this metric to evaluate the training jobs it launches, and returns the training job that results in either the highest or lowest value for this metric, depending on the value you specify for the Type parameter.
trainingJobDefinitions.[].tuningObjective.metricName
Optional
string
trainingJobDefinitions.[].tuningObjective.type_
Optional
string
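The tuningObjective description says the tuning job returns the training job with either the highest or the lowest metric value, depending on the Type. The sketch below shows that selection over training job summaries shaped like the bestTrainingJob status object. `best_training_job` is a hypothetical helper. It assumes the SageMaker objective types Maximize and Minimize, and it skips jobs that never reported a final metric.

```python
from typing import List, Optional


def best_training_job(summaries: List[dict], objective_type: str) -> Optional[dict]:
    """Pick the summary with the best finalHyperParameterTuningJobObjectiveMetric.value.

    Hypothetical helper; objective_type is "Maximize" or "Minimize".
    """
    if objective_type not in ("Maximize", "Minimize"):
        raise ValueError(f"unexpected objective type {objective_type!r}")
    scored = [s for s in summaries
              if s.get("finalHyperParameterTuningJobObjectiveMetric", {}).get("value") is not None]
    if not scored:
        return None

    def metric(summary: dict) -> float:
        return summary["finalHyperParameterTuningJobObjectiveMetric"]["value"]

    pick = max if objective_type == "Maximize" else min
    return pick(scored, key=metric)
```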
trainingJobDefinitions.[].vpcConfig
Optional
object
Specifies a VPC that your training jobs and hosted models have access to. Control access to and from your training and model containers by configuring the VPC. For more information, see Protect Endpoints by Using an Amazon Virtual Private Cloud (https://docs.aws.amazon.com/sagemaker/latest/dg/host-vpc.html) and Protect Training Jobs by Using an Amazon Virtual Private Cloud (https://docs.aws.amazon.com/sagemaker/latest/dg/train-vpc.html).
trainingJobDefinitions.[].vpcConfig.securityGroupIDs
Optional
array
trainingJobDefinitions.[].vpcConfig.securityGroupIDs.[]
Required
string
trainingJobDefinitions.[].vpcConfig.subnets.[]
Required
string
warmStartConfig.parentHyperParameterTuningJobs
Optional
array
warmStartConfig.parentHyperParameterTuningJobs.[]
Required
object
A previously completed or stopped hyperparameter tuning job to be used as a starting point for a new hyperparameter tuning job.
warmStartConfig.warmStartType
Optional
string

Status

ackResourceMetadata: 
  arn: string
  ownerAccountID: string
  region: string
bestTrainingJob: 
  creationTime: string
  failureReason: string
  finalHyperParameterTuningJobObjectiveMetric: 
    metricName: string
    type_: string
    value: number
  objectiveStatus: string
  trainingEndTime: string
  trainingJobARN: string
  trainingJobDefinitionName: string
  trainingJobName: string
  trainingJobStatus: string
  trainingStartTime: string
  tunedHyperParameters: {}
  tuningJobName: string
conditions:
- lastTransitionTime: string
  message: string
  reason: string
  status: string
  type: string
failureReason: string
hyperParameterTuningJobStatus: string
overallBestTrainingJob: 
  creationTime: string
  failureReason: string
  finalHyperParameterTuningJobObjectiveMetric: 
    metricName: string
    type_: string
    value: number
  objectiveStatus: string
  trainingEndTime: string
  trainingJobARN: string
  trainingJobDefinitionName: string
  trainingJobName: string
  trainingJobStatus: string
  trainingStartTime: string
  tunedHyperParameters: {}
  tuningJobName: string
Field | Description
ackResourceMetadata
Optional
object
All CRs managed by ACK have a common Status.ACKResourceMetadata member that contains the resource sync state, account ownership, and the constructed ARN for the resource.
ackResourceMetadata.arn
Optional
string
ARN is the Amazon Resource Name for the resource. It is a globally unique identifier and is set only by the ACK service controller, either once the controller has orchestrated the creation of the resource, or once it has verified that an “adopted” resource (a resource where the ARN annotation was set by the Kubernetes user on the CR) exists and matches the supplied CR’s Spec field values.
ackResourceMetadata.ownerAccountID
Required
string
OwnerAccountID is the AWS Account ID of the account that owns the backend AWS service API resource.
ackResourceMetadata.region
Required
string
Region is the AWS region in which the resource exists or will exist.
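Because an ARN embeds the region and account ID, the three ackResourceMetadata fields can be cross-checked. The sketch below splits an ARN of the general form arn:partition:service:region:account-id:resource and compares it with ownerAccountID and region. `parse_arn` and `metadata_consistent` are hypothetical helpers, and the ARN and account ID in the test are illustrative values.

```python
from typing import NamedTuple


class Arn(NamedTuple):
    partition: str
    service: str
    region: str
    account_id: str
    resource: str


def parse_arn(arn: str) -> Arn:
    """Split arn:partition:service:region:account-id:resource (hypothetical helper)."""
    parts = arn.split(":", 5)  # the resource part may itself contain ':'
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {arn!r}")
    return Arn(*parts[1:])


def metadata_consistent(metadata: dict) -> bool:
    """Check that ackResourceMetadata.arn agrees with ownerAccountID and region."""
    if "arn" not in metadata:
        return True  # the controller has not set the ARN yet
    parsed = parse_arn(metadata["arn"])
    return (parsed.account_id == metadata["ownerAccountID"]
            and parsed.region == metadata["region"])
```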
bestTrainingJob
Optional
object
A TrainingJobSummary object that describes the training job that completed with the best current HyperParameterTuningJobObjective.
bestTrainingJob.creationTime
Optional
string
bestTrainingJob.failureReason
Optional
string
bestTrainingJob.finalHyperParameterTuningJobObjectiveMetric
Optional
object
Shows the latest objective metric emitted by a training job that was launched by a hyperparameter tuning job. You define the objective metric in the HyperParameterTuningJobObjective parameter of HyperParameterTuningJobConfig.
bestTrainingJob.finalHyperParameterTuningJobObjectiveMetric.metricName
Optional
string
bestTrainingJob.finalHyperParameterTuningJobObjectiveMetric.type_
Optional
string
bestTrainingJob.finalHyperParameterTuningJobObjectiveMetric.value
Optional
number
bestTrainingJob.objectiveStatus
Optional
string
bestTrainingJob.trainingEndTime
Optional
string
bestTrainingJob.trainingJobARN
Optional
string
bestTrainingJob.trainingJobDefinitionName
Optional
string
bestTrainingJob.trainingJobName
Optional
string
bestTrainingJob.trainingJobStatus
Optional
string
bestTrainingJob.trainingStartTime
Optional
string
bestTrainingJob.tunedHyperParameters
Optional
object
bestTrainingJob.tuningJobName
Optional
string
conditions
Optional
array
All CRs managed by ACK have a common Status.Conditions member that contains a collection of ackv1alpha1.Condition objects describing the various terminal states of the CR and its backend AWS service API resource.
conditions.[]
Required
object
Condition is the common struct used by all CRDs managed by ACK service controllers to indicate terminal states of the CR and its backend AWS service API resource.
conditions.[].message
Optional
string
A human-readable message indicating details about the transition.
conditions.[].reason
Optional
string
The reason for the condition’s last transition.
conditions.[].status
Optional
string
Status of the condition, one of True, False, Unknown.
conditions.[].type
Optional
string
Type is the type of the Condition.
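The conditions array can be inspected from the CR as a dict, for example after parsing the JSON output of `kubectl get hyperparametertuningjob <name> -o json`. The condition type `ACK.ResourceSynced` used below is an assumption about the ACK runtime's condition names, not something this reference defines, so check which types your controller version actually emits. `find_condition` and `is_synced` are hypothetical helpers.

```python
from typing import Optional


def find_condition(cr: dict, condition_type: str) -> Optional[dict]:
    """Return the first status.conditions entry with the given type, or None."""
    for cond in cr.get("status", {}).get("conditions", []) or []:
        if cond.get("type") == condition_type:
            return cond
    return None


def is_synced(cr: dict) -> bool:
    """True when the (assumed) ACK.ResourceSynced condition has status "True"."""
    cond = find_condition(cr, "ACK.ResourceSynced")
    return cond is not None and cond.get("status") == "True"
```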
failureReason
Optional
string
If the tuning job failed, the reason it failed.
hyperParameterTuningJobStatus
Optional
string
The status of the tuning job: InProgress, Completed, Failed, Stopping, or Stopped.
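Of the five statuses listed, Completed, Failed, and Stopped are final, while InProgress and Stopping are transient. A minimal polling sketch follows. `get_status` is a hypothetical caller-supplied function that reads .status.hyperParameterTuningJobStatus from the CR, and the poll interval and timeout defaults are arbitrary.

```python
import time
from typing import Callable

# Statuses after which the tuning job will not change again.
TERMINAL_STATUSES = {"Completed", "Failed", "Stopped"}


def wait_for_tuning_job(get_status: Callable[[], str],
                        poll_seconds: float = 30.0,
                        timeout_seconds: float = 3600.0,
                        sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until hyperParameterTuningJobStatus is terminal and return it.

    get_status is a hypothetical callback; "Stopping" is treated as transient.
    """
    waited = 0.0
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"still {status!r} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds
```

Injecting `sleep` keeps the loop testable without real waiting.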
overallBestTrainingJob
Optional
object
If the hyperparameter tuning job is a warm start tuning job with a WarmStartType of IDENTICAL_DATA_AND_ALGORITHM, this is the TrainingJobSummary for the training job with the best objective metric value among all training jobs launched by this tuning job and all parent jobs specified for the warm start tuning job.
overallBestTrainingJob.creationTime
Optional
string
overallBestTrainingJob.failureReason
Optional
string
overallBestTrainingJob.finalHyperParameterTuningJobObjectiveMetric
Optional
object
Shows the latest objective metric emitted by a training job that was launched by a hyperparameter tuning job. You define the objective metric in the HyperParameterTuningJobObjective parameter of HyperParameterTuningJobConfig.
overallBestTrainingJob.finalHyperParameterTuningJobObjectiveMetric.metricName
Optional
string
overallBestTrainingJob.finalHyperParameterTuningJobObjectiveMetric.type_
Optional
string
overallBestTrainingJob.finalHyperParameterTuningJobObjectiveMetric.value
Optional
number
overallBestTrainingJob.objectiveStatus
Optional
string
overallBestTrainingJob.trainingEndTime
Optional
string
overallBestTrainingJob.trainingJobARN
Optional
string
overallBestTrainingJob.trainingJobDefinitionName
Optional
string
overallBestTrainingJob.trainingJobName
Optional
string
overallBestTrainingJob.trainingJobStatus
Optional
string
overallBestTrainingJob.trainingStartTime
Optional
string
overallBestTrainingJob.tunedHyperParameters
Optional
object
overallBestTrainingJob.tuningJobName
Optional
string
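The overallBestTrainingJob description can be illustrated by pooling the training job summaries of this tuning job with those of its warm-start parents. The comparison applies only when warmStartType is IDENTICAL_DATA_AND_ALGORITHM. A minimal sketch follows. `overall_best` is a hypothetical helper that assumes Maximize/Minimize objective types and summaries shaped like the status objects above.

```python
from typing import List, Optional


def overall_best(current_jobs: List[dict], parent_jobs: List[dict],
                 objective_type: str, warm_start_type: str) -> Optional[dict]:
    """Best training job across this tuning job and its warm-start parents.

    Hypothetical helper; returns None unless warm_start_type is
    IDENTICAL_DATA_AND_ALGORITHM or when no job reported a final metric.
    """
    if warm_start_type != "IDENTICAL_DATA_AND_ALGORITHM":
        return None
    candidates = [j for j in current_jobs + parent_jobs
                  if j.get("finalHyperParameterTuningJobObjectiveMetric", {}).get("value") is not None]
    if not candidates:
        return None

    def metric(job: dict) -> float:
        return job["finalHyperParameterTuningJobObjectiveMetric"]["value"]

    pick = max if objective_type == "Maximize" else min
    return pick(candidates, key=metric)
```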