HyperParameterTuningJob
sagemaker.services.k8s.aws/v1alpha1
Type | Link |
---|---|
GoDoc | sagemaker-controller/apis/v1alpha1#HyperParameterTuningJob |
Metadata
Property | Value |
---|---|
Scope | Namespaced |
Kind | HyperParameterTuningJob |
ListKind | HyperParameterTuningJobList |
Plural | hyperparametertuningjobs |
Singular | hyperparametertuningjob |
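Kubernetes uses the metadata above to address these objects. The resource is namespaced, so its REST path follows the standard Kubernetes layout `/apis/<group>/<version>/namespaces/<namespace>/<plural>/<name>`. The following is a minimal sketch built only from the values in the table. The namespace and object name in the usage example are hypothetical.

```python
from typing import Optional

# Values taken from the Metadata table.
API_GROUP = "sagemaker.services.k8s.aws"
API_VERSION = "v1alpha1"
PLURAL = "hyperparametertuningjobs"


def resource_path(namespace: str, name: Optional[str] = None) -> str:
    """Return the Kubernetes REST path for HyperParameterTuningJob objects.

    The resource is namespaced, so the path always includes a namespace.
    Without a name, the path addresses the collection, which the API server
    returns as a HyperParameterTuningJobList.
    """
    path = f"/apis/{API_GROUP}/{API_VERSION}/namespaces/{namespace}/{PLURAL}"
    return f"{path}/{name}" if name else path


if __name__ == "__main__":
    # "ml-team" and "xgb-tuning-demo" are hypothetical names.
    print(resource_path("ml-team"))
    print(resource_path("ml-team", "xgb-tuning-demo"))
```

The plural name is also the form accepted by `kubectl get hyperparametertuningjobs -n <namespace>`.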
Spec
```yaml
hyperParameterTuningJobConfig:
  hyperParameterTuningJobObjective:
    metricName: string
    type_: string
  parameterRanges:
    categoricalParameterRanges:
    - name: string
      values:
      - string
    continuousParameterRanges:
    - maxValue: string
      minValue: string
      name: string
      scalingType: string
    integerParameterRanges:
    - maxValue: string
      minValue: string
      name: string
      scalingType: string
  resourceLimits:
    maxNumberOfTrainingJobs: integer
    maxParallelTrainingJobs: integer
  strategy: string
  trainingJobEarlyStoppingType: string
  tuningJobCompletionCriteria:
    targetObjectiveMetricValue: number
hyperParameterTuningJobName: string
tags:
- key: string
  value: string
trainingJobDefinition:
  algorithmSpecification:
    algorithmName: string
    metricDefinitions:
    - name: string
      regex: string
    trainingImage: string
    trainingInputMode: string
  checkpointConfig:
    localPath: string
    s3URI: string
  definitionName: string
  enableInterContainerTrafficEncryption: boolean
  enableManagedSpotTraining: boolean
  enableNetworkIsolation: boolean
  hyperParameterRanges:
    categoricalParameterRanges:
    - name: string
      values:
      - string
    continuousParameterRanges:
    - maxValue: string
      minValue: string
      name: string
      scalingType: string
    integerParameterRanges:
    - maxValue: string
      minValue: string
      name: string
      scalingType: string
  inputDataConfig:
  - channelName: string
    compressionType: string
    contentType: string
    dataSource:
      fileSystemDataSource:
        directoryPath: string
        fileSystemAccessMode: string
        fileSystemID: string
        fileSystemType: string
      s3DataSource:
        attributeNames:
        - string
        instanceGroupNames:
        - string
        s3DataDistributionType: string
        s3DataType: string
        s3URI: string
    inputMode: string
    recordWrapperType: string
    shuffleConfig:
      seed: integer
  outputDataConfig:
    kmsKeyID: string
    s3OutputPath: string
  resourceConfig:
    instanceCount: integer
    instanceGroups:
    - instanceCount: integer
      instanceGroupName: string
      instanceType: string
    instanceType: string
    keepAlivePeriodInSeconds: integer
    volumeKMSKeyID: string
    volumeSizeInGB: integer
  retryStrategy:
    maximumRetryAttempts: integer
  roleARN: string
  staticHyperParameters: {}
  stoppingCondition:
    maxRuntimeInSeconds: integer
    maxWaitTimeInSeconds: integer
  tuningObjective:
    metricName: string
    type_: string
  vpcConfig:
    securityGroupIDs:
    - string
    subnets:
    - string
trainingJobDefinitions:
- algorithmSpecification:
    algorithmName: string
    metricDefinitions:
    - name: string
      regex: string
    trainingImage: string
    trainingInputMode: string
  checkpointConfig:
    localPath: string
    s3URI: string
  definitionName: string
  enableInterContainerTrafficEncryption: boolean
  enableManagedSpotTraining: boolean
  enableNetworkIsolation: boolean
  hyperParameterRanges:
    categoricalParameterRanges:
    - name: string
      values:
      - string
    continuousParameterRanges:
    - maxValue: string
      minValue: string
      name: string
      scalingType: string
    integerParameterRanges:
    - maxValue: string
      minValue: string
      name: string
      scalingType: string
  inputDataConfig:
  - channelName: string
    compressionType: string
    contentType: string
    dataSource:
      fileSystemDataSource:
        directoryPath: string
        fileSystemAccessMode: string
        fileSystemID: string
        fileSystemType: string
      s3DataSource:
        attributeNames:
        - string
        instanceGroupNames:
        - string
        s3DataDistributionType: string
        s3DataType: string
        s3URI: string
    inputMode: string
    recordWrapperType: string
    shuffleConfig:
      seed: integer
  outputDataConfig:
    kmsKeyID: string
    s3OutputPath: string
  resourceConfig:
    instanceCount: integer
    instanceGroups:
    - instanceCount: integer
      instanceGroupName: string
      instanceType: string
    instanceType: string
    keepAlivePeriodInSeconds: integer
    volumeKMSKeyID: string
    volumeSizeInGB: integer
  retryStrategy:
    maximumRetryAttempts: integer
  roleARN: string
  staticHyperParameters: {}
  stoppingCondition:
    maxRuntimeInSeconds: integer
    maxWaitTimeInSeconds: integer
  tuningObjective:
    metricName: string
    type_: string
  vpcConfig:
    securityGroupIDs:
    - string
    subnets:
    - string
warmStartConfig:
  parentHyperParameterTuningJobs:
  - hyperParameterTuningJobName: string
  warmStartType: string
```
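The following Python sketch shows how the fields in the spec nest. It builds a minimal `HyperParameterTuningJob` manifest as a plain dictionary and serializes it to JSON, which `kubectl apply -f` accepts as well as YAML. The `apiVersion` and `kind` come from this page. All other values are hypothetical placeholders for an XGBoost-style job: the image URI, role ARN, S3 locations, metric name, hyperparameter names, and instance type. Check the values SageMaker accepts for `strategy`, `type_`, and `scalingType`, and which fields it requires, against the SageMaker API reference. The name pre-check encodes the rule stated for `hyperParameterTuningJobName` in the table below. The service may enforce a stricter pattern.

```python
import json
import re

# Stated rule for hyperParameterTuningJobName: 1 to 32 characters drawn from
# a-z, A-Z, 0-9 and : + = @ _ % -. Assumption: this is only a client-side
# pre-check; the service may apply a stricter pattern.
_NAME_PATTERN = re.compile(r"[A-Za-z0-9:+=@_%-]{1,32}")


def is_valid_tuning_job_name(name: str) -> bool:
    """Return True if name meets the stated length and character rules."""
    return _NAME_PATTERN.fullmatch(name) is not None


def build_tuning_job(name: str, namespace: str, image: str, role_arn: str,
                     train_s3_uri: str, output_s3_path: str) -> dict:
    """Return a minimal HyperParameterTuningJob manifest as a dict.

    Metric, hyperparameter, and instance values are hypothetical placeholders.
    """
    if not is_valid_tuning_job_name(name):
        raise ValueError(f"invalid tuning job name: {name!r}")
    return {
        "apiVersion": "sagemaker.services.k8s.aws/v1alpha1",
        "kind": "HyperParameterTuningJob",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "hyperParameterTuningJobName": name,
            "hyperParameterTuningJobConfig": {
                "strategy": "Bayesian",
                "hyperParameterTuningJobObjective": {
                    "metricName": "validation:error",
                    "type_": "Minimize",
                },
                "resourceLimits": {
                    "maxNumberOfTrainingJobs": 10,
                    "maxParallelTrainingJobs": 2,
                },
                "parameterRanges": {
                    # Range bounds are strings in this spec, even for numbers.
                    "continuousParameterRanges": [
                        {"name": "eta", "minValue": "0.1", "maxValue": "0.5",
                         "scalingType": "Auto"},
                    ],
                    "integerParameterRanges": [
                        {"name": "max_depth", "minValue": "3", "maxValue": "9",
                         "scalingType": "Linear"},
                    ],
                },
            },
            "trainingJobDefinition": {
                "algorithmSpecification": {
                    "trainingImage": image,
                    "trainingInputMode": "File",
                },
                "roleARN": role_arn,
                # staticHyperParameters is a string-to-string map.
                "staticHyperParameters": {"num_round": "100"},
                "inputDataConfig": [{
                    "channelName": "train",
                    "dataSource": {"s3DataSource": {
                        "s3DataType": "S3Prefix",
                        "s3URI": train_s3_uri,
                        "s3DataDistributionType": "FullyReplicated",
                    }},
                }],
                "outputDataConfig": {"s3OutputPath": output_s3_path},
                "resourceConfig": {
                    "instanceCount": 1,
                    "instanceType": "ml.m5.xlarge",
                    "volumeSizeInGB": 10,
                },
                "stoppingCondition": {"maxRuntimeInSeconds": 3600},
            },
        },
    }


if __name__ == "__main__":
    manifest = build_tuning_job(
        name="xgb-tuning-demo",
        namespace="default",
        image="<training-image-uri>",                       # hypothetical
        role_arn="arn:aws:iam::111122223333:role/example",  # hypothetical
        train_s3_uri="s3://example-bucket/train/",          # hypothetical
        output_s3_path="s3://example-bucket/output/",       # hypothetical
    )
    # kubectl accepts JSON manifests as well as YAML.
    print(json.dumps(manifest, indent=2))
```

Using a plain dictionary keeps the sketch dependency-free. A YAML library would produce output that looks more like the spec, but the field names and nesting are the same.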
Field | Description |
---|---|
hyperParameterTuningJobConfig Required | object The HyperParameterTuningJobConfig object that describes the tuning job, including the search strategy, the objective metric used to evaluate training jobs, ranges of parameters to search, and resource limits for the tuning job. For more information, see How Hyperparameter Tuning Works (https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning-how-it-works.html). |
hyperParameterTuningJobConfig.hyperParameterTuningJobObjective Optional | object Defines the objective metric for a hyperparameter tuning job. Hyperparameter tuning uses the value of this metric to evaluate the training jobs it launches, and returns the training job that results in either the highest or lowest value for this metric, depending on the value you specify for the type_ field. |
hyperParameterTuningJobConfig.hyperParameterTuningJobObjective.metricName Optional | string |
hyperParameterTuningJobConfig.hyperParameterTuningJobObjective.type_ Optional | string Whether to maximize or minimize the objective metric (Maximize or Minimize). |
hyperParameterTuningJobConfig.parameterRanges Optional | object Specifies ranges of integer, continuous, and categorical hyperparameters that a hyperparameter tuning job searches. The hyperparameter tuning job launches training jobs with hyperparameter values within these ranges to find the combination of values that results in the training job with the best performance, as measured by the objective metric of the hyperparameter tuning job. The maximum number of array members applies both to each range and to the tuning job as a whole: the total number of hyperparameters across all the ranges can’t exceed that maximum. |
hyperParameterTuningJobConfig.parameterRanges.categoricalParameterRanges Optional | array |
hyperParameterTuningJobConfig.parameterRanges.categoricalParameterRanges.[] Required | object A list of categorical hyperparameters to tune. |
hyperParameterTuningJobConfig.parameterRanges.categoricalParameterRanges.[].name Optional | string |
hyperParameterTuningJobConfig.parameterRanges.categoricalParameterRanges.[].values Optional | array |
hyperParameterTuningJobConfig.parameterRanges.categoricalParameterRanges.[].values.[] Required | string |
hyperParameterTuningJobConfig.parameterRanges.continuousParameterRanges Optional | array |
hyperParameterTuningJobConfig.parameterRanges.continuousParameterRanges.[] Required | object A list of continuous hyperparameters to tune. |
hyperParameterTuningJobConfig.parameterRanges.continuousParameterRanges.[].maxValue Optional | string |
hyperParameterTuningJobConfig.parameterRanges.continuousParameterRanges.[].minValue Optional | string |
hyperParameterTuningJobConfig.parameterRanges.continuousParameterRanges.[].name Optional | string |
hyperParameterTuningJobConfig.parameterRanges.continuousParameterRanges.[].scalingType Optional | string |
hyperParameterTuningJobConfig.parameterRanges.integerParameterRanges Optional | array |
hyperParameterTuningJobConfig.parameterRanges.integerParameterRanges.[] Required | object For a hyperparameter of the integer type, specifies the range that a hyperparameter tuning job searches. |
hyperParameterTuningJobConfig.parameterRanges.integerParameterRanges.[].maxValue Optional | string |
hyperParameterTuningJobConfig.parameterRanges.integerParameterRanges.[].minValue Optional | string |
hyperParameterTuningJobConfig.parameterRanges.integerParameterRanges.[].name Optional | string |
hyperParameterTuningJobConfig.parameterRanges.integerParameterRanges.[].scalingType Optional | string |
hyperParameterTuningJobConfig.resourceLimits Optional | object Specifies the maximum number of training jobs and parallel training jobs that a hyperparameter tuning job can launch. |
hyperParameterTuningJobConfig.resourceLimits.maxNumberOfTrainingJobs Optional | integer |
hyperParameterTuningJobConfig.resourceLimits.maxParallelTrainingJobs Optional | integer |
hyperParameterTuningJobConfig.strategy Optional | string The strategy hyperparameter tuning uses to find the best combination of hyperparameters for your model. |
hyperParameterTuningJobConfig.trainingJobEarlyStoppingType Optional | string |
hyperParameterTuningJobConfig.tuningJobCompletionCriteria Optional | object The job completion criteria. |
hyperParameterTuningJobConfig.tuningJobCompletionCriteria.targetObjectiveMetricValue Optional | number |
hyperParameterTuningJobName Required | string The name of the tuning job. This name is the prefix for the names of all training jobs that this tuning job launches. The name must be unique within the same Amazon Web Services account and Amazon Web Services Region. The name must have 1 to 32 characters. Valid characters are a-z, A-Z, 0-9, and : + = @ _ % - (hyphen). The name is not case sensitive. |
tags Optional | array An array of key-value pairs. You can use tags to categorize your Amazon Web Services resources in different ways, for example, by purpose, owner, or environment. For more information, see Tagging Amazon Web Services Resources (https://docs.aws.amazon.com/general/latest/gr/aws_tagging.html). Tags that you specify for the tuning job are also added to all training jobs that the tuning job launches. |
tags.[] Required | object A tag object that consists of a key and an optional value, used to manage metadata for SageMaker Amazon Web Services resources. You can add tags to notebook instances, training jobs, hyperparameter tuning jobs, batch transform jobs, models, labeling jobs, work teams, endpoint configurations, and endpoints. For more information on adding tags to SageMaker resources, see AddTags. For more information on adding metadata to your Amazon Web Services resources with tagging, see Tagging Amazon Web Services resources (https://docs.aws.amazon.com/general/latest/gr/aws_tagging.html). For advice on best practices for managing Amazon Web Services resources with tagging, see Tagging Best Practices: Implement an Effective Amazon Web Services Resource Tagging Strategy (https://d1.awsstatic.com/whitepapers/aws-tagging-best-practices.pdf). |
tags.[].key Optional | string |
tags.[].value Optional | string |
trainingJobDefinition Optional | object The HyperParameterTrainingJobDefinition object that describes the training jobs that this tuning job launches, including static hyperparameters, input data configuration, output data configuration, resource configuration, and stopping condition. |
trainingJobDefinition.algorithmSpecification Optional | object Specifies which training algorithm to use for training jobs that a hyperparameter tuning job launches and the metrics to monitor. |
trainingJobDefinition.algorithmSpecification.algorithmName Optional | string |
trainingJobDefinition.algorithmSpecification.metricDefinitions Optional | array |
trainingJobDefinition.algorithmSpecification.metricDefinitions.[] Required | object Specifies a metric that the training algorithm writes to stderr or stdout. SageMaker hyperparameter tuning captures all defined metrics. You specify one metric that a hyperparameter tuning job uses as its objective metric to choose the best training job. |
trainingJobDefinition.algorithmSpecification.metricDefinitions.[].name Optional | string |
trainingJobDefinition.algorithmSpecification.metricDefinitions.[].regex Optional | string |
trainingJobDefinition.algorithmSpecification.trainingImage Optional | string |
trainingJobDefinition.algorithmSpecification.trainingInputMode Optional | string The training input mode that the algorithm supports. For more information about input modes, see Algorithms (https://docs.aws.amazon.com/sagemaker/latest/dg/algos.html). Pipe mode: if an algorithm supports Pipe mode, Amazon SageMaker streams data directly from Amazon S3 to the container. File mode: if an algorithm supports File mode, SageMaker downloads the training data from S3 to the provisioned ML storage volume and mounts the directory to the Docker volume for the training container. You must provision the ML storage volume with enough capacity to hold the data downloaded from S3. Besides the training data, the ML storage volume also stores the output model and any intermediate information the algorithm container writes. For distributed algorithms, training data is distributed uniformly, and training duration is predictable if the input data objects are approximately the same size. SageMaker does not split the files any further for model training, so if the object sizes are skewed, the data distribution is skewed too: one host in the training cluster is overloaded and becomes a bottleneck. FastFile mode: if an algorithm supports FastFile mode, SageMaker streams data directly from S3 to the container with no code changes and provides file system access to the data, so you can write your training script to read these files as if they were stored on disk. FastFile mode works best when the data is read sequentially. Augmented manifest files aren’t supported. Startup time is lower when the provided S3 bucket contains fewer files. |
trainingJobDefinition.checkpointConfig Optional | object Contains information about the output location for managed spot training checkpoint data. |
trainingJobDefinition.checkpointConfig.localPath Optional | string |
trainingJobDefinition.checkpointConfig.s3URI Optional | string |
trainingJobDefinition.definitionName Optional | string |
trainingJobDefinition.enableInterContainerTrafficEncryption Optional | boolean |
trainingJobDefinition.enableManagedSpotTraining Optional | boolean |
trainingJobDefinition.enableNetworkIsolation Optional | boolean |
trainingJobDefinition.hyperParameterRanges Optional | object Specifies ranges of integer, continuous, and categorical hyperparameters that a hyperparameter tuning job searches. The hyperparameter tuning job launches training jobs with hyperparameter values within these ranges to find the combination of values that results in the training job with the best performance, as measured by the objective metric of the hyperparameter tuning job. The maximum number of array members applies both to each range and to the tuning job as a whole: the total number of hyperparameters across all the ranges can’t exceed that maximum. |
trainingJobDefinition.hyperParameterRanges.categoricalParameterRanges Optional | array |
trainingJobDefinition.hyperParameterRanges.categoricalParameterRanges.[] Required | object A list of categorical hyperparameters to tune. |
trainingJobDefinition.hyperParameterRanges.categoricalParameterRanges.[].name Optional | string |
trainingJobDefinition.hyperParameterRanges.categoricalParameterRanges.[].values Optional | array |
trainingJobDefinition.hyperParameterRanges.categoricalParameterRanges.[].values.[] Required | string |
trainingJobDefinition.hyperParameterRanges.continuousParameterRanges Optional | array |
trainingJobDefinition.hyperParameterRanges.continuousParameterRanges.[] Required | object A list of continuous hyperparameters to tune. |
trainingJobDefinition.hyperParameterRanges.continuousParameterRanges.[].maxValue Optional | string |
trainingJobDefinition.hyperParameterRanges.continuousParameterRanges.[].minValue Optional | string |
trainingJobDefinition.hyperParameterRanges.continuousParameterRanges.[].name Optional | string |
trainingJobDefinition.hyperParameterRanges.continuousParameterRanges.[].scalingType Optional | string |
trainingJobDefinition.hyperParameterRanges.integerParameterRanges Optional | array |
trainingJobDefinition.hyperParameterRanges.integerParameterRanges.[] Required | object For a hyperparameter of the integer type, specifies the range that a hyperparameter tuning job searches. |
trainingJobDefinition.hyperParameterRanges.integerParameterRanges.[].maxValue Optional | string |
trainingJobDefinition.hyperParameterRanges.integerParameterRanges.[].minValue Optional | string |
trainingJobDefinition.hyperParameterRanges.integerParameterRanges.[].name Optional | string |
trainingJobDefinition.hyperParameterRanges.integerParameterRanges.[].scalingType Optional | string |
trainingJobDefinition.inputDataConfig Optional | array |
trainingJobDefinition.inputDataConfig.[] Required | object A channel is a named input source that training algorithms can consume. |
trainingJobDefinition.inputDataConfig.[].channelName Optional | string |
trainingJobDefinition.inputDataConfig.[].compressionType Optional | string |
trainingJobDefinition.inputDataConfig.[].contentType Optional | string |
trainingJobDefinition.inputDataConfig.[].dataSource Optional | object Describes the location of the channel data. |
trainingJobDefinition.inputDataConfig.[].dataSource.fileSystemDataSource Optional | object Specifies a file system data source for a channel. |
trainingJobDefinition.inputDataConfig.[].dataSource.fileSystemDataSource.directoryPath Optional | string |
trainingJobDefinition.inputDataConfig.[].dataSource.fileSystemDataSource.fileSystemAccessMode Optional | string |
trainingJobDefinition.inputDataConfig.[].dataSource.fileSystemDataSource.fileSystemID Optional | string |
trainingJobDefinition.inputDataConfig.[].dataSource.fileSystemDataSource.fileSystemType Optional | string |
trainingJobDefinition.inputDataConfig.[].dataSource.s3DataSource Optional | object Describes the S3 data source. |
trainingJobDefinition.inputDataConfig.[].dataSource.s3DataSource.attributeNames Optional | array |
trainingJobDefinition.inputDataConfig.[].dataSource.s3DataSource.attributeNames.[] Required | string |
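trainingJobDefinition.inputDataConfig.[].dataSource.s3DataSource.instanceGroupNames Optional | array |
trainingJobDefinition.inputDataConfig.[].dataSource.s3DataSource.instanceGroupNames.[] Required | string |
trainingJobDefinition.inputDataConfig.[].dataSource.s3DataSource.s3DataDistributionType Optional | string |
trainingJobDefinition.inputDataConfig.[].dataSource.s3DataSource.s3DataType Optional | string |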
trainingJobDefinition.inputDataConfig.[].dataSource.s3DataSource.s3URI Optional | string |
trainingJobDefinition.inputDataConfig.[].inputMode Optional | string The training input mode that the algorithm supports. For more information about input modes, see Algorithms (https://docs.aws.amazon.com/sagemaker/latest/dg/algos.html). Pipe mode: if an algorithm supports Pipe mode, Amazon SageMaker streams data directly from Amazon S3 to the container. File mode: if an algorithm supports File mode, SageMaker downloads the training data from S3 to the provisioned ML storage volume and mounts the directory to the Docker volume for the training container. You must provision the ML storage volume with enough capacity to hold the data downloaded from S3. Besides the training data, the ML storage volume also stores the output model and any intermediate information the algorithm container writes. For distributed algorithms, training data is distributed uniformly, and training duration is predictable if the input data objects are approximately the same size. SageMaker does not split the files any further for model training, so if the object sizes are skewed, the data distribution is skewed too: one host in the training cluster is overloaded and becomes a bottleneck. FastFile mode: if an algorithm supports FastFile mode, SageMaker streams data directly from S3 to the container with no code changes and provides file system access to the data, so you can write your training script to read these files as if they were stored on disk. FastFile mode works best when the data is read sequentially. Augmented manifest files aren’t supported. Startup time is lower when the provided S3 bucket contains fewer files. |
trainingJobDefinition.inputDataConfig.[].recordWrapperType Optional | string |
trainingJobDefinition.inputDataConfig.[].shuffleConfig Optional | object A configuration for shuffling the input data in a channel. If you use S3Prefix for S3DataType, the results of the S3 key prefix matches are shuffled. If you use ManifestFile, the order of the S3 object references in the ManifestFile is shuffled. If you use AugmentedManifestFile, the order of the JSON lines in the AugmentedManifestFile is shuffled. The shuffling order is determined by the seed value. For Pipe input mode, when ShuffleConfig is specified, shuffling is done at the start of every epoch. With large datasets, this ensures that the order of the training data is different for each epoch, which helps reduce bias and possible overfitting. In a multi-node training job, when ShuffleConfig is combined with an S3DataDistributionType of ShardedByS3Key, the data is shuffled across nodes, so the content sent to a particular node in the first epoch might be sent to a different node in the second epoch. |
trainingJobDefinition.inputDataConfig.[].shuffleConfig.seed Optional | integer |
trainingJobDefinition.outputDataConfig Optional | object Provides information about how to store model training results (model artifacts). |
trainingJobDefinition.outputDataConfig.kmsKeyID Optional | string |
trainingJobDefinition.outputDataConfig.s3OutputPath Optional | string |
trainingJobDefinition.resourceConfig Optional | object Describes the resources, including machine learning (ML) compute instances and ML storage volumes, to use for model training. |
trainingJobDefinition.resourceConfig.instanceCount Optional | integer |
trainingJobDefinition.resourceConfig.instanceGroups Optional | array |
trainingJobDefinition.resourceConfig.instanceGroups.[] Required | object Defines an instance group for heterogeneous cluster training. When requesting a training job using the CreateTrainingJob (https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTrainingJob.html) API, you can configure multiple instance groups. |
trainingJobDefinition.resourceConfig.instanceGroups.[].instanceCount Optional | integer |
trainingJobDefinition.resourceConfig.instanceGroups.[].instanceGroupName Optional | string |
trainingJobDefinition.resourceConfig.instanceGroups.[].instanceType Optional | string |
trainingJobDefinition.resourceConfig.instanceType Optional | string |
trainingJobDefinition.resourceConfig.keepAlivePeriodInSeconds Optional | integer |
trainingJobDefinition.resourceConfig.volumeKMSKeyID Optional | string |
trainingJobDefinition.resourceConfig.volumeSizeInGB Optional | integer |
trainingJobDefinition.retryStrategy Optional | object The retry strategy to use when a training job fails due to an InternalServerError. RetryStrategy is specified as part of the CreateTrainingJob and CreateHyperParameterTuningJob requests. You can add the StoppingCondition parameter to the request to limit the training time for the complete job. |
trainingJobDefinition.retryStrategy.maximumRetryAttempts Optional | integer |
trainingJobDefinition.roleARN Optional | string |
trainingJobDefinition.staticHyperParameters Optional | object |
trainingJobDefinition.stoppingCondition Optional | object Specifies a limit to how long a model training job or model compilation job can run. It also specifies how long a managed spot training job has to complete. When the job reaches the time limit, SageMaker ends the training or compilation job. Use this to cap model training costs. To stop a training job, SageMaker sends the algorithm the SIGTERM signal, which delays job termination for 120 seconds. Algorithms can use this 120-second window to save the model artifacts, so the results of training are not lost. The training algorithms provided by SageMaker automatically save the intermediate results of a model training job when possible. This attempt to save artifacts is best effort only, because the model might not be in a state from which it can be saved; for example, if training has just started, the model might not be ready to save. When saved, this intermediate data is a valid model artifact, and you can use it to create a model with CreateModel. The Neural Topic Model (NTM) currently does not support saving intermediate model artifacts. When training NTMs, make sure that the maximum runtime is sufficient for the training job to complete. |
trainingJobDefinition.stoppingCondition.maxRuntimeInSeconds Optional | integer |
trainingJobDefinition.stoppingCondition.maxWaitTimeInSeconds Optional | integer |
trainingJobDefinition.tuningObjective Optional | object Defines the objective metric for a hyperparameter tuning job. Hyperparameter tuning uses the value of this metric to evaluate the training jobs it launches, and returns the training job that results in either the highest or lowest value for this metric, depending on the value you specify for the type_ field. |
trainingJobDefinition.tuningObjective.metricName Optional | string |
trainingJobDefinition.tuningObjective.type_ Optional | string Whether to maximize or minimize the objective metric (Maximize or Minimize). |
trainingJobDefinition.vpcConfig Optional | object Specifies a VPC that your training jobs and hosted models have access to. Control access to and from your training and model containers by configuring the VPC. For more information, see Protect Endpoints by Using an Amazon Virtual Private Cloud (https://docs.aws.amazon.com/sagemaker/latest/dg/host-vpc.html) and Protect Training Jobs by Using an Amazon Virtual Private Cloud (https://docs.aws.amazon.com/sagemaker/latest/dg/train-vpc.html). |
trainingJobDefinition.vpcConfig.securityGroupIDs Optional | array |
trainingJobDefinition.vpcConfig.securityGroupIDs.[] Required | string |
trainingJobDefinition.vpcConfig.subnets Optional | array |
trainingJobDefinition.vpcConfig.subnets.[] Required | string |
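trainingJobDefinitions Optional | array |
trainingJobDefinitions.[] Required | object Defines the training jobs launched by a hyperparameter tuning job. |
trainingJobDefinitions.[].algorithmSpecification Optional | object Specifies which training algorithm to use for training jobs that a hyperparameter tuning job launches and the metrics to monitor. |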
trainingJobDefinitions.[].algorithmSpecification.algorithmName Optional | string |
trainingJobDefinitions.[].algorithmSpecification.metricDefinitions Optional | array |
trainingJobDefinitions.[].algorithmSpecification.metricDefinitions.[] Required | object Specifies a metric that the training algorithm writes to stderr or stdout. SageMaker hyperparameter tuning captures all defined metrics. You specify one metric that a hyperparameter tuning job uses as its objective metric to choose the best training job. |
trainingJobDefinitions.[].algorithmSpecification.metricDefinitions.[].name Optional | string |
trainingJobDefinitions.[].algorithmSpecification.metricDefinitions.[].regex Optional | string |
trainingJobDefinitions.[].algorithmSpecification.trainingImage Optional | string |
trainingJobDefinitions.[].algorithmSpecification.trainingInputMode Optional | string The training input mode that the algorithm supports. For more information about input modes, see Algorithms (https://docs.aws.amazon.com/sagemaker/latest/dg/algos.html). Pipe mode: if an algorithm supports Pipe mode, Amazon SageMaker streams data directly from Amazon S3 to the container. File mode: if an algorithm supports File mode, SageMaker downloads the training data from S3 to the provisioned ML storage volume and mounts the directory to the Docker volume for the training container. You must provision the ML storage volume with enough capacity to hold the data downloaded from S3. Besides the training data, the ML storage volume also stores the output model and any intermediate information the algorithm container writes. For distributed algorithms, training data is distributed uniformly, and training duration is predictable if the input data objects are approximately the same size. SageMaker does not split the files any further for model training, so if the object sizes are skewed, the data distribution is skewed too: one host in the training cluster is overloaded and becomes a bottleneck. FastFile mode: if an algorithm supports FastFile mode, SageMaker streams data directly from S3 to the container with no code changes and provides file system access to the data, so you can write your training script to read these files as if they were stored on disk. FastFile mode works best when the data is read sequentially. Augmented manifest files aren’t supported. Startup time is lower when the provided S3 bucket contains fewer files. |
trainingJobDefinitions.[].checkpointConfig Optional | object Contains information about the output location for managed spot training checkpoint data. |
trainingJobDefinitions.[].checkpointConfig.localPath Optional | string |
trainingJobDefinitions.[].checkpointConfig.s3URI Optional | string |
trainingJobDefinitions.[].definitionName Optional | string |
trainingJobDefinitions.[].enableInterContainerTrafficEncryption Optional | boolean |
trainingJobDefinitions.[].enableManagedSpotTraining Optional | boolean |
trainingJobDefinitions.[].enableNetworkIsolation Optional | boolean |
trainingJobDefinitions.[].hyperParameterRanges Optional | object Specifies ranges of integer, continuous, and categorical hyperparameters that a hyperparameter tuning job searches. The hyperparameter tuning job launches training jobs with hyperparameter values within these ranges to find the combination of values that results in the training job with the best performance, as measured by the objective metric of the hyperparameter tuning job. The maximum number of array members applies both to each range and to the tuning job as a whole: the total number of hyperparameters across all the ranges can’t exceed that maximum. |
trainingJobDefinitions.[].hyperParameterRanges.categoricalParameterRanges Optional | array |
trainingJobDefinitions.[].hyperParameterRanges.categoricalParameterRanges.[] Required | object A list of categorical hyperparameters to tune. |
trainingJobDefinitions.[].hyperParameterRanges.categoricalParameterRanges.[].name Optional | string |
trainingJobDefinitions.[].hyperParameterRanges.categoricalParameterRanges.[].values Optional | array |
trainingJobDefinitions.[].hyperParameterRanges.categoricalParameterRanges.[].values.[] Required | string |
trainingJobDefinitions.[].hyperParameterRanges.continuousParameterRanges Optional | array |
trainingJobDefinitions.[].hyperParameterRanges.continuousParameterRanges.[] Required | object A list of continuous hyperparameters to tune. |
trainingJobDefinitions.[].hyperParameterRanges.continuousParameterRanges.[].maxValue Optional | string |
trainingJobDefinitions.[].hyperParameterRanges.continuousParameterRanges.[].minValue Optional | string |
trainingJobDefinitions.[].hyperParameterRanges.continuousParameterRanges.[].name Optional | string |
trainingJobDefinitions.[].hyperParameterRanges.continuousParameterRanges.[].scalingType Optional | string |
trainingJobDefinitions.[].hyperParameterRanges.integerParameterRanges Optional | array |
trainingJobDefinitions.[].hyperParameterRanges.integerParameterRanges.[] Required | object For a hyperparameter of the integer type, specifies the range that a hyperparameter tuning job searches. |
trainingJobDefinitions.[].hyperParameterRanges.integerParameterRanges.[].maxValue Optional | string |
trainingJobDefinitions.[].hyperParameterRanges.integerParameterRanges.[].minValue Optional | string |
trainingJobDefinitions.[].hyperParameterRanges.integerParameterRanges.[].name Optional | string |
trainingJobDefinitions.[].hyperParameterRanges.integerParameterRanges.[].scalingType Optional | string |
trainingJobDefinitions.[].inputDataConfig Optional | array |
trainingJobDefinitions.[].inputDataConfig.[] Required | object A channel is a named input source that training algorithms can consume. |
trainingJobDefinitions.[].inputDataConfig.[].channelName Optional | string |
trainingJobDefinitions.[].inputDataConfig.[].compressionType Optional | string |
trainingJobDefinitions.[].inputDataConfig.[].contentType Optional | string |
trainingJobDefinitions.[].inputDataConfig.[].dataSource Optional | object Describes the location of the channel data. |
trainingJobDefinitions.[].inputDataConfig.[].dataSource.fileSystemDataSource Optional | object Specifies a file system data source for a channel. |
trainingJobDefinitions.[].inputDataConfig.[].dataSource.fileSystemDataSource.directoryPath Optional | string |
trainingJobDefinitions.[].inputDataConfig.[].dataSource.fileSystemDataSource.fileSystemAccessMode Optional | string |
trainingJobDefinitions.[].inputDataConfig.[].dataSource.fileSystemDataSource.fileSystemID Optional | string |
trainingJobDefinitions.[].inputDataConfig.[].dataSource.fileSystemDataSource.fileSystemType Optional | string |
trainingJobDefinitions.[].inputDataConfig.[].dataSource.s3DataSource Optional | object Describes the S3 data source. |
trainingJobDefinitions.[].inputDataConfig.[].dataSource.s3DataSource.attributeNames Optional | array |
trainingJobDefinitions.[].inputDataConfig.[].dataSource.s3DataSource.attributeNames.[] Required | string |
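trainingJobDefinitions.[].inputDataConfig.[].dataSource.s3DataSource.instanceGroupNames Optional | array |
trainingJobDefinitions.[].inputDataConfig.[].dataSource.s3DataSource.instanceGroupNames.[] Required | string |
trainingJobDefinitions.[].inputDataConfig.[].dataSource.s3DataSource.s3DataDistributionType Optional | string |
trainingJobDefinitions.[].inputDataConfig.[].dataSource.s3DataSource.s3DataType Optional | string |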
trainingJobDefinitions.[].inputDataConfig.[].dataSource.s3DataSource.s3URI Optional | string |
trainingJobDefinitions.[].inputDataConfig.[].inputMode Optional | string The training input mode that the algorithm supports. For more information about input modes, see Algorithms (https://docs.aws.amazon.com/sagemaker/latest/dg/algos.html). Pipe mode: if an algorithm supports Pipe mode, Amazon SageMaker streams data directly from Amazon S3 to the container. File mode: if an algorithm supports File mode, SageMaker downloads the training data from S3 to the provisioned ML storage volume and mounts the directory to the Docker volume for the training container. You must provision the ML storage volume with enough capacity to hold the data downloaded from S3. Besides the training data, the ML storage volume also stores the output model and any intermediate information the algorithm container writes. For distributed algorithms, training data is distributed uniformly, and training duration is predictable if the input data objects are approximately the same size. SageMaker does not split the files any further for model training, so if the object sizes are skewed, the data distribution is skewed too: one host in the training cluster is overloaded and becomes a bottleneck. FastFile mode: if an algorithm supports FastFile mode, SageMaker streams data directly from S3 to the container with no code changes and provides file system access to the data, so you can write your training script to read these files as if they were stored on disk. FastFile mode works best when the data is read sequentially. Augmented manifest files aren’t supported. Startup time is lower when the provided S3 bucket contains fewer files. |
trainingJobDefinitions.[].inputDataConfig.[].recordWrapperType Optional | string |
trainingJobDefinitions.[].inputDataConfig.[].shuffleConfig Optional | object A configuration for a shuffle option for input data in a channel. If you use S3Prefix for S3DataType, the results of the S3 key prefix matches are shuffled. If you use ManifestFile, the order of the S3 object references in the ManifestFile is shuffled. If you use AugmentedManifestFile, the order of the JSON lines in the AugmentedManifestFile is shuffled. The shuffling order is determined using the Seed value. For Pipe input mode, when ShuffleConfig is specified, shuffling is done at the start of every epoch. With large datasets, this ensures that the order of the training data is different for each epoch, which helps reduce bias and possible overfitting. In a multi-node training job, when ShuffleConfig is combined with an S3DataDistributionType of ShardedByS3Key, the data is shuffled across nodes so that the content sent to a particular node on the first epoch might be sent to a different node on the second epoch. |
trainingJobDefinitions.[].inputDataConfig.[].shuffleConfig.seed Optional | integer |
trainingJobDefinitions.[].outputDataConfig Optional | object Provides information about how to store model training results (model artifacts). |
trainingJobDefinitions.[].outputDataConfig.kmsKeyID Optional | string |
trainingJobDefinitions.[].outputDataConfig.s3OutputPath Optional | string |
trainingJobDefinitions.[].resourceConfig Optional | object Describes the resources, including machine learning (ML) compute instances and ML storage volumes, to use for model training. |
trainingJobDefinitions.[].resourceConfig.instanceCount Optional | integer |
trainingJobDefinitions.[].resourceConfig.instanceGroups Optional | array |
trainingJobDefinitions.[].resourceConfig.instanceGroups.[] Required | object Defines an instance group for heterogeneous cluster training. When requesting a training job using the CreateTrainingJob (https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTrainingJob.html) API, you can configure multiple instance groups. |
trainingJobDefinitions.[].resourceConfig.instanceGroups.[].instanceGroupName Optional | string |
trainingJobDefinitions.[].resourceConfig.instanceGroups.[].instanceType Optional | string |
trainingJobDefinitions.[].resourceConfig.instanceType Optional | string |
trainingJobDefinitions.[].resourceConfig.keepAlivePeriodInSeconds Optional | integer |
trainingJobDefinitions.[].resourceConfig.volumeKMSKeyID Optional | string |
trainingJobDefinitions.[].resourceConfig.volumeSizeInGB Optional | integer |
trainingJobDefinitions.[].retryStrategy Optional | object The retry strategy to use when a training job fails due to an InternalServerError. RetryStrategy is specified as part of the CreateTrainingJob and CreateHyperParameterTuningJob requests. You can add the StoppingCondition parameter to the request to limit the training time for the complete job. |
trainingJobDefinitions.[].retryStrategy.maximumRetryAttempts Optional | integer |
trainingJobDefinitions.[].roleARN Optional | string |
trainingJobDefinitions.[].staticHyperParameters Optional | object |
trainingJobDefinitions.[].stoppingCondition Optional | object Specifies a limit to how long a model training job or model compilation job can run. It also specifies how long a managed spot training job has to complete. When the job reaches the time limit, SageMaker ends the training or compilation job. Use this API to cap model training costs. To stop a training job, SageMaker sends the algorithm the SIGTERM signal, which delays job termination for 120 seconds. Algorithms can use this 120-second window to save the model artifacts, so the results of training are not lost. The training algorithms provided by SageMaker automatically save the intermediate results of a model training job when possible. This attempt to save artifacts is best-effort only, because the model might not be in a state from which it can be saved. For example, if training has just started, the model might not be ready to save. When saved, this intermediate data is a valid model artifact. You can use it to create a model with CreateModel. The Neural Topic Model (NTM) currently does not support saving intermediate model artifacts. When training NTMs, make sure that the maximum runtime is sufficient for the training job to complete. |
trainingJobDefinitions.[].stoppingCondition.maxRuntimeInSeconds Optional | integer |
trainingJobDefinitions.[].stoppingCondition.maxWaitTimeInSeconds Optional | integer |
trainingJobDefinitions.[].tuningObjective Optional | object Defines the objective metric for a hyperparameter tuning job. Hyperparameter tuning uses the value of this metric to evaluate the training jobs it launches, and returns the training job that results in either the highest or lowest value for this metric, depending on the value you specify for the Type parameter. |
trainingJobDefinitions.[].tuningObjective.metricName Optional | string |
trainingJobDefinitions.[].tuningObjective.type_ Optional | string |
trainingJobDefinitions.[].vpcConfig Optional | object Specifies a VPC that your training jobs and hosted models have access to. Control access to and from your training and model containers by configuring the VPC. For more information, see Protect Endpoints by Using an Amazon Virtual Private Cloud (https://docs.aws.amazon.com/sagemaker/latest/dg/host-vpc.html) and Protect Training Jobs by Using an Amazon Virtual Private Cloud (https://docs.aws.amazon.com/sagemaker/latest/dg/train-vpc.html). |
trainingJobDefinitions.[].vpcConfig.securityGroupIDs Optional | array |
trainingJobDefinitions.[].vpcConfig.securityGroupIDs.[] Required | string |
trainingJobDefinitions.[].vpcConfig.subnets.[] Required | string |
warmStartConfig.parentHyperParameterTuningJobs Optional | array |
warmStartConfig.parentHyperParameterTuningJobs.[] Required | object A previously completed or stopped hyperparameter tuning job to be used as a starting point for a new hyperparameter tuning job. |
warmStartConfig.warmStartType Optional | string |
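The fragment below sketches how a single `trainingJobDefinitions` entry might combine several of the fields in this table: a FastFile input channel sharded across two instances with a shuffle seed, managed spot training with checkpointing and a stopping condition, a retry strategy, a tuning objective, and a VPC configuration. It is an unverified sketch, not a tested manifest. Every name, ARN, image URI, S3 URI, subnet and security group ID, metric name, and the `epochs` hyperparameter are hypothetical placeholders. It assumes each entry accepts the same fields as the singular `trainingJobDefinition` shown in the Spec schema (for example `enableManagedSpotTraining` and `checkpointConfig`), and it omits required parts of the spec such as `hyperParameterTuningJobConfig` and the per-definition `hyperParameterRanges`.

```yaml
# Hypothetical, partial HyperParameterTuningJob manifest.
# All names, ARNs, IDs, and S3 URIs are placeholders.
apiVersion: sagemaker.services.k8s.aws/v1alpha1
kind: HyperParameterTuningJob
metadata:
  name: example-tuning-job
spec:
  hyperParameterTuningJobName: example-tuning-job
  # hyperParameterTuningJobConfig omitted for brevity.
  trainingJobDefinitions:
    - definitionName: example-definition
      # hyperParameterRanges omitted for brevity.
      staticHyperParameters:
        epochs: "10"
      algorithmSpecification:
        trainingImage: 111122223333.dkr.ecr.us-east-1.amazonaws.com/example-algorithm:latest
        trainingInputMode: File
        metricDefinitions:
          - name: validation:accuracy
            regex: 'validation accuracy: ([0-9\.]+)'
      roleARN: arn:aws:iam::111122223333:role/ExampleSageMakerRole
      inputDataConfig:
        - channelName: train
          # Channel-level mode; overrides trainingInputMode for this channel.
          inputMode: FastFile
          dataSource:
            s3DataSource:
              s3DataType: S3Prefix
              # Each instance receives a different subset of the S3 objects.
              s3DataDistributionType: ShardedByS3Key
              s3URI: s3://example-bucket/train/
          # The seed determines the shuffled order of the S3 key prefix matches.
          shuffleConfig:
            seed: 42
      outputDataConfig:
        s3OutputPath: s3://example-bucket/output/
      resourceConfig:
        instanceCount: 2
        instanceType: ml.m5.xlarge
        volumeSizeInGB: 50
      # Managed spot training, with checkpoints so interrupted jobs can resume.
      enableManagedSpotTraining: true
      checkpointConfig:
        s3URI: s3://example-bucket/checkpoints/
      stoppingCondition:
        maxRuntimeInSeconds: 3600
        # Total time including waiting for spot capacity.
        maxWaitTimeInSeconds: 7200
      retryStrategy:
        maximumRetryAttempts: 2
      tuningObjective:
        # Must match one of the metricDefinitions names above.
        metricName: validation:accuracy
        type_: Maximize
      vpcConfig:
        securityGroupIDs:
          - sg-0123456789abcdef0
        subnets:
          - subnet-0123456789abcdef0
```

In SageMaker, `maxWaitTimeInSeconds` is only meaningful for managed spot training and must be equal to or greater than `maxRuntimeInSeconds`. The channel-level `inputMode` is set here only to show that a single channel can use a different mode than `algorithmSpecification.trainingInputMode`. Whether FastFile works depends on the algorithm supporting it.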
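A warm start tuning job uses one or more previously completed or stopped tuning jobs as its starting point. The sketch below shows only the `warmStartConfig` part of a spec. The parent job name is a hypothetical placeholder. The `hyperParameterTuningJobName` key inside each `parentHyperParameterTuningJobs` entry is an assumption based on the SageMaker `ParentHyperParameterTuningJob` API shape, because the table above does not list that object's fields.

```yaml
# Hypothetical warm start fragment; merge into a full HyperParameterTuningJob spec.
spec:
  warmStartConfig:
    # SageMaker accepts IdenticalDataAndAlgorithm or TransferLearning.
    warmStartType: IdenticalDataAndAlgorithm
    parentHyperParameterTuningJobs:
      # Assumed field name (not listed in this reference).
      - hyperParameterTuningJobName: previous-tuning-job
```

With `IdenticalDataAndAlgorithm`, `status.overallBestTrainingJob` summarizes the best training job across this tuning job and its parent jobs.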
Status
ackResourceMetadata:
  arn: string
  ownerAccountID: string
  region: string
bestTrainingJob:
  creationTime: string
  failureReason: string
  finalHyperParameterTuningJobObjectiveMetric:
    metricName: string
    type_: string
    value: number
  objectiveStatus: string
  trainingEndTime: string
  trainingJobARN: string
  trainingJobDefinitionName: string
  trainingJobName: string
  trainingJobStatus: string
  trainingStartTime: string
  tunedHyperParameters: {}
  tuningJobName: string
conditions:
- lastTransitionTime: string
  message: string
  reason: string
  status: string
  type: string
failureReason: string
hyperParameterTuningJobStatus: string
overallBestTrainingJob:
  creationTime: string
  failureReason: string
  finalHyperParameterTuningJobObjectiveMetric:
    metricName: string
    type_: string
    value: number
  objectiveStatus: string
  trainingEndTime: string
  trainingJobARN: string
  trainingJobDefinitionName: string
  trainingJobName: string
  trainingJobStatus: string
  trainingStartTime: string
  tunedHyperParameters: {}
  tuningJobName: string
Field | Description |
---|---|
ackResourceMetadata Optional | object All CRs managed by ACK have a common Status.ACKResourceMetadata member that contains the resource sync state, account ownership, and constructed ARN for the resource. |
ackResourceMetadata.arn Optional | string ARN is the Amazon Resource Name for the resource. This is a globally unique identifier and is set only by the ACK service controller once the controller has orchestrated the creation of the resource, or when it has verified that an “adopted” resource (a resource where the ARN annotation was set by the Kubernetes user on the CR) exists and matches the supplied CR’s Spec field values. |
ackResourceMetadata.ownerAccountID Required | string OwnerAccountID is the AWS Account ID of the account that owns the backend AWS service API resource. |
ackResourceMetadata.region Required | string Region is the AWS region in which the resource exists or will exist. |
bestTrainingJob Optional | object A TrainingJobSummary object that describes the training job that completed with the best current HyperParameterTuningJobObjective. |
bestTrainingJob.creationTime Optional | string |
bestTrainingJob.failureReason Optional | string |
bestTrainingJob.finalHyperParameterTuningJobObjectiveMetric Optional | object Shows the latest objective metric emitted by a training job that was launched by a hyperparameter tuning job. You define the objective metric in the HyperParameterTuningJobObjective parameter of HyperParameterTuningJobConfig. |
bestTrainingJob.finalHyperParameterTuningJobObjectiveMetric.metricName Optional | string |
bestTrainingJob.finalHyperParameterTuningJobObjectiveMetric.type_ Optional | string |
bestTrainingJob.finalHyperParameterTuningJobObjectiveMetric.value Optional | number |
bestTrainingJob.objectiveStatus Optional | string |
bestTrainingJob.trainingEndTime Optional | string |
bestTrainingJob.trainingJobARN Optional | string |
bestTrainingJob.trainingJobDefinitionName Optional | string |
bestTrainingJob.trainingJobName Optional | string |
bestTrainingJob.trainingJobStatus Optional | string |
bestTrainingJob.trainingStartTime Optional | string |
bestTrainingJob.tunedHyperParameters Optional | object |
bestTrainingJob.tuningJobName Optional | string |
conditions Optional | array All CRs managed by ACK have a common Status.Conditions member that contains a collection of ackv1alpha1.Condition objects that describe the various terminal states of the CR and its backend AWS service API resource. |
conditions.[] Required | object Condition is the common struct used by all CRDs managed by ACK service controllers to indicate terminal states of the CR and its backend AWS service API resource. |
conditions.[].message Optional | string A human-readable message indicating details about the transition. |
conditions.[].reason Optional | string The reason for the condition’s last transition. |
conditions.[].status Optional | string Status of the condition: one of True, False, or Unknown. |
conditions.[].type Optional | string Type is the type of the Condition. |
failureReason Optional | string If the tuning job failed, the reason it failed. |
hyperParameterTuningJobStatus Optional | string The status of the tuning job: InProgress, Completed, Failed, Stopping, or Stopped. |
overallBestTrainingJob Optional | object If the hyperparameter tuning job is a warm start tuning job with a WarmStartType of IDENTICAL_DATA_AND_ALGORITHM, this is the TrainingJobSummary for the training job with the best objective metric value of all training jobs launched by this tuning job and all parent jobs specified for the warm start tuning job. |
overallBestTrainingJob.creationTime Optional | string |
overallBestTrainingJob.failureReason Optional | string |
overallBestTrainingJob.finalHyperParameterTuningJobObjectiveMetric Optional | object Shows the latest objective metric emitted by a training job that was launched by a hyperparameter tuning job. You define the objective metric in the HyperParameterTuningJobObjective parameter of HyperParameterTuningJobConfig. |
overallBestTrainingJob.finalHyperParameterTuningJobObjectiveMetric.metricName Optional | string |
overallBestTrainingJob.finalHyperParameterTuningJobObjectiveMetric.type_ Optional | string |
overallBestTrainingJob.finalHyperParameterTuningJobObjectiveMetric.value Optional | number |
overallBestTrainingJob.objectiveStatus Optional | string |
overallBestTrainingJob.trainingEndTime Optional | string |
overallBestTrainingJob.trainingJobARN Optional | string |
overallBestTrainingJob.trainingJobDefinitionName Optional | string |
overallBestTrainingJob.trainingJobName Optional | string |
overallBestTrainingJob.trainingJobStatus Optional | string |
overallBestTrainingJob.trainingStartTime Optional | string |
overallBestTrainingJob.tunedHyperParameters Optional | object |
overallBestTrainingJob.tuningJobName Optional | string |