TrainingJob

sagemaker.services.k8s.aws/v1alpha1

Type | Link
GoDoc | sagemaker-controller/apis/v1alpha1#TrainingJob

Metadata

Property | Value
Scope | Namespaced
Kind | TrainingJob
ListKind | TrainingJobList
Plural | trainingjobs
Singular | trainingjob

Contains information about a training job.

Spec

algorithmSpecification: 
  algorithmName: string
  enableSageMakerMetricsTimeSeries: boolean
  metricDefinitions:
  - name: string
    regex: string
  trainingImage: string
  trainingInputMode: string
checkpointConfig: 
  localPath: string
  s3URI: string
debugHookConfig: 
  collectionConfigurations:
  - collectionName: string
    collectionParameters: {}
  hookParameters: {}
  localPath: string
  s3OutputPath: string
debugRuleConfigurations:
- instanceType: string
  localPath: string
  ruleConfigurationName: string
  ruleEvaluatorImage: string
  ruleParameters: {}
  s3OutputPath: string
  volumeSizeInGB: integer
enableInterContainerTrafficEncryption: boolean
enableManagedSpotTraining: boolean
enableNetworkIsolation: boolean
environment: {}
experimentConfig: 
  experimentName: string
  trialComponentDisplayName: string
  trialName: string
hyperParameters: {}
infraCheckConfig: 
  enableInfraCheck: boolean
inputDataConfig:
- channelName: string
  compressionType: string
  contentType: string
  dataSource: 
    fileSystemDataSource: 
      directoryPath: string
      fileSystemAccessMode: string
      fileSystemID: string
      fileSystemType: string
    s3DataSource: 
      attributeNames:
      - string
      instanceGroupNames:
      - string
      s3DataDistributionType: string
      s3DataType: string
      s3URI: string
  inputMode: string
  recordWrapperType: string
  shuffleConfig: 
    seed: integer
outputDataConfig: 
  compressionType: string
  kmsKeyID: string
  s3OutputPath: string
profilerConfig: 
  profilingIntervalInMilliseconds: integer
  profilingParameters: {}
  s3OutputPath: string
profilerRuleConfigurations:
- instanceType: string
  localPath: string
  ruleConfigurationName: string
  ruleEvaluatorImage: string
  ruleParameters: {}
  s3OutputPath: string
  volumeSizeInGB: integer
remoteDebugConfig: 
  enableRemoteDebug: boolean
resourceConfig: 
  instanceCount: integer
  instanceGroups:
  - instanceCount: integer
    instanceGroupName: string
    instanceType: string
  instanceType: string
  keepAlivePeriodInSeconds: integer
  volumeKMSKeyID: string
  volumeSizeInGB: integer
retryStrategy: 
  maximumRetryAttempts: integer
roleARN: string
stoppingCondition: 
  maxPendingTimeInSeconds: integer
  maxRuntimeInSeconds: integer
  maxWaitTimeInSeconds: integer
tags:
- key: string
  value: string
tensorBoardOutputConfig: 
  localPath: string
  s3OutputPath: string
trainingJobName: string
vpcConfig: 
  securityGroupIDs:
  - string
  subnets:
  - string
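For orientation, here is a minimal sketch of a TrainingJob manifest that sets only the top-level fields marked Required below, plus the sub-fields SageMaker generally needs to start a job. The object name, image URI, role ARN, bucket, and instance type are hypothetical placeholders; substitute values from your own account and Region.

```yaml
apiVersion: sagemaker.services.k8s.aws/v1alpha1
kind: TrainingJob
metadata:
  name: example-training-job            # Kubernetes object name (hypothetical)
spec:
  trainingJobName: example-training-job # must be unique per account and Region
  roleARN: "arn:aws:iam::111122223333:role/ExampleSageMakerRole"   # placeholder role
  algorithmSpecification:
    trainingImage: "<account>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>"  # placeholder
    trainingInputMode: File
  outputDataConfig:
    s3OutputPath: s3://example-bucket/output/   # placeholder bucket
  resourceConfig:
    instanceCount: 1
    instanceType: ml.m5.xlarge
    volumeSizeInGB: 10
  stoppingCondition:
    maxRuntimeInSeconds: 3600           # SageMaker stops the job after one hour
```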
Field | Description
algorithmSpecification
Required
object
The registry path of the Docker image that contains the training algorithm
and algorithm-specific metadata, including the input mode. For more information
about algorithms provided by SageMaker, see Algorithms (https://docs.aws.amazon.com/sagemaker/latest/dg/algos.html).
For information about providing your own algorithms, see Using Your Own Algorithms
with Amazon SageMaker (https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms.html).
algorithmSpecification.algorithmName
Optional
string
algorithmSpecification.enableSageMakerMetricsTimeSeries
Optional
boolean
algorithmSpecification.metricDefinitions
Optional
array
algorithmSpecification.metricDefinitions.[]
Required
object
Specifies a metric that the training algorithm writes to stderr or stdout.
You can view these logs to understand how your training job performs and
check for any errors encountered during training. SageMaker hyperparameter
tuning captures all defined metrics. Specify one of the defined metrics to
use as an objective metric using the TuningObjective (https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_HyperParameterTrainingJobDefinition.html#sagemaker-Type-HyperParameterTrainingJobDefinition-TuningObjective)
parameter in the HyperParameterTrainingJobDefinition API to evaluate job
performance during hyperparameter tuning.
algorithmSpecification.metricDefinitions.[].regex
Optional
string
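As an illustration, suppose a hypothetical training script prints lines such as "epoch=3 val_accuracy=0.912" to stdout. A metric definition along these lines would let SageMaker pick the value out of the regex capture group; the metric name, log format, and image URI are assumptions.

```yaml
spec:
  algorithmSpecification:
    trainingImage: "<account>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>"  # placeholder
    trainingInputMode: File
    metricDefinitions:
      # Hypothetical log line matched: "epoch=3 val_accuracy=0.912"
      - name: "validation:accuracy"
        regex: 'val_accuracy=([0-9.]+)'   # the captured number becomes the metric value
```

The same name can then be referenced as the objective metric when the job is launched by a hyperparameter tuning job.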
algorithmSpecification.trainingImage
Optional
string
algorithmSpecification.trainingInputMode
Optional
string
The training input mode that the algorithm supports. For more information
about input modes, see Algorithms (https://docs.aws.amazon.com/sagemaker/latest/dg/algos.html).


Pipe mode


If an algorithm supports Pipe mode, Amazon SageMaker streams data directly
from Amazon S3 to the container.


File mode


If an algorithm supports File mode, SageMaker downloads the training data
from S3 to the provisioned ML storage volume, and mounts the directory to
the Docker volume for the training container.


You must provision the ML storage volume with sufficient capacity to accommodate
the data downloaded from S3. In addition to the training data, the ML storage
volume also stores the output model. The algorithm container also uses the ML
storage volume to store intermediate information, if any.


For distributed algorithms, training data is distributed uniformly. Your
training duration is predictable if the input data objects are approximately
the same size. SageMaker does not split the files any further for model training.
If the object sizes are skewed, the data distribution is also skewed: one host
in the training cluster becomes overloaded and turns into a bottleneck, so
training is not optimal.


FastFile mode


If an algorithm supports FastFile mode, SageMaker streams data directly from
S3 to the container with no code changes, and provides file system access
to the data. Users can author their training script to interact with these
files as if they were stored on disk.


FastFile mode works best when the data is read sequentially. Augmented manifest
files aren’t supported. The startup time is lower when there are fewer files
in the S3 bucket provided.
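In File mode the storage volume has to hold the downloaded training data, the output model, and any intermediate files. A sketch of sizing it, assuming a hypothetical dataset of roughly 40 GB; the extra headroom is an arbitrary choice, not a SageMaker rule.

```yaml
spec:
  algorithmSpecification:
    trainingImage: "<account>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>"  # placeholder
    trainingInputMode: File     # data is copied to the ML storage volume before training starts
  resourceConfig:
    instanceCount: 1
    instanceType: ml.m5.2xlarge
    # ~40 GB of training data (hypothetical) plus room for the model and scratch files
    volumeSizeInGB: 60
```

Switching trainingInputMode to FastFile or Pipe streams the data instead, so the volume no longer needs to hold the full dataset.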
checkpointConfig
Optional
object
Contains information about the output location for managed spot training
checkpoint data.
checkpointConfig.localPath
Optional
string
checkpointConfig.s3URI
Optional
string
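A sketch of a checkpoint configuration. The training script is assumed to write its checkpoints to localPath, which SageMaker keeps in sync with s3URI so an interrupted job can resume; /opt/ml/checkpoints is, as far as we know, the SageMaker default local path, and the bucket is a placeholder.

```yaml
spec:
  checkpointConfig:
    localPath: /opt/ml/checkpoints                    # where the training script writes checkpoints
    s3URI: s3://example-bucket/checkpoints/my-job/   # placeholder; synced with localPath
```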
debugHookConfig
Optional
object
Configuration information for the Amazon SageMaker Debugger hook parameters,
metric and tensor collections, and storage paths. To learn more about how
to configure the DebugHookConfig parameter, see Use the SageMaker and Debugger
Configuration API Operations to Create, Update, and Debug Your Training Job
(https://docs.aws.amazon.com/sagemaker/latest/dg/debugger-createtrainingjob-api.html).
debugHookConfig.collectionConfigurations
Optional
array
debugHookConfig.collectionConfigurations.[]
Required
object
Configuration information for the Amazon SageMaker Debugger output tensor
collections.
debugHookConfig.collectionConfigurations.[].collectionParameters
Optional
object
debugHookConfig.hookParameters
Optional
object
debugHookConfig.localPath
Optional
string
debugHookConfig.s3OutputPath
Optional
string
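A sketch of a Debugger hook configuration. The parameter key save_interval and the built-in collection name losses are assumptions based on the SageMaker Debugger (smdebug) conventions; check the Debugger documentation linked above for the keys your framework supports. Values are quoted because these maps hold strings.

```yaml
spec:
  debugHookConfig:
    s3OutputPath: s3://example-bucket/debugger/   # placeholder; tensors are saved here
    hookParameters:
      save_interval: "100"        # assumed hook parameter: save every 100 steps
    collectionConfigurations:
      - collectionName: losses    # assumed built-in collection name
        collectionParameters:
          save_interval: "50"     # assumed per-collection override
```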
debugRuleConfigurations
Optional
array
Configuration information for Amazon SageMaker Debugger rules for debugging
output tensors.
debugRuleConfigurations.[]
Required
object
Configuration information for SageMaker Debugger rules for debugging. To
learn more about how to configure the DebugRuleConfiguration parameter, see
Use the SageMaker and Debugger Configuration API Operations to Create, Update,
and Debug Your Training Job (https://docs.aws.amazon.com/sagemaker/latest/dg/debugger-createtrainingjob-api.html).
debugRuleConfigurations.[].localPath
Optional
string
debugRuleConfigurations.[].ruleConfigurationName
Optional
string
debugRuleConfigurations.[].ruleEvaluatorImage
Optional
string
debugRuleConfigurations.[].ruleParameters
Optional
object
debugRuleConfigurations.[].s3OutputPath
Optional
string
debugRuleConfigurations.[].volumeSizeInGB
Optional
integer
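A sketch of attaching a built-in Debugger rule. The rule evaluator image is Region-specific, so it is left as a placeholder; the rule name VanishingGradient and the rule_to_invoke parameter are assumptions drawn from the SageMaker Debugger built-in rules documentation.

```yaml
spec:
  debugRuleConfigurations:
    - ruleConfigurationName: VanishingGradient
      # Region-specific Debugger rule image; placeholder URI
      ruleEvaluatorImage: "<debugger-rule-image-uri-for-your-region>"
      ruleParameters:
        rule_to_invoke: VanishingGradient   # assumed built-in rule parameter
```

Each rule runs as a separate evaluation job; its progress is reported under debugRuleEvaluationStatuses in the resource status.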
enableInterContainerTrafficEncryption
Optional
boolean
To encrypt all communications between ML compute instances in distributed
training, choose True. Encryption provides greater security for distributed
training, but training might take longer. How long it takes depends on the
amount of communication between compute instances, especially if you use
a deep learning algorithm in distributed training. For more information,
see Protect Communications Between ML Compute Instances in a Distributed
Training Job (https://docs.aws.amazon.com/sagemaker/latest/dg/train-encrypt.html).
enableManagedSpotTraining
Optional
boolean
To train models using managed spot training, choose True. Managed spot training
provides a fully managed and scalable infrastructure for training machine
learning models. This option is useful when training jobs can be interrupted
and when there is flexibility in when the training job runs.


The complete and intermediate results of jobs are stored in an Amazon S3
bucket, and can be used as a starting point to train models incrementally.
Amazon SageMaker provides metrics and logs in CloudWatch. They can be used
to see when managed spot training jobs are running, interrupted, resumed,
or completed.
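A sketch of a managed spot training job. Checkpointing lets an interrupted job resume from its last saved state, and maxWaitTimeInSeconds bounds the total time spent waiting for Spot capacity plus running; as we understand the SageMaker API, it must be at least maxRuntimeInSeconds. The bucket is a placeholder.

```yaml
spec:
  enableManagedSpotTraining: true
  checkpointConfig:
    localPath: /opt/ml/checkpoints
    s3URI: s3://example-bucket/checkpoints/spot-job/   # placeholder
  stoppingCondition:
    maxRuntimeInSeconds: 3600    # cap on actual training time
    maxWaitTimeInSeconds: 7200   # waiting for Spot capacity + training; >= maxRuntimeInSeconds
```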
enableNetworkIsolation
Optional
boolean
Isolates the training container. No inbound or outbound network calls can
be made, except for calls between peers within a training cluster for distributed
training. If you enable network isolation for training jobs that are configured
to use a VPC, SageMaker downloads and uploads customer data and model artifacts
through the specified VPC, but the training container does not have network
access.
environment
Optional
object
The environment variables to set in the Docker container.
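A sketch of setting environment variables; the variable names are hypothetical and only meaningful if your training code reads them. Quoting the values keeps them strings, which is what environment variables are.

```yaml
spec:
  environment:
    LOG_LEVEL: "debug"    # hypothetical variable read by the training script
    NUM_WORKERS: "4"      # quoted so it is not parsed as an integer
```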
experimentConfig
Optional
object
Associates a SageMaker job as a trial component with an experiment and trial.
Specified when you call the following APIs:


* CreateProcessingJob (https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateProcessingJob.html)


* CreateTrainingJob (https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTrainingJob.html)


* CreateTransformJob (https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTransformJob.html)
experimentConfig.experimentName
Optional
string
experimentConfig.trialComponentDisplayName
Optional
string
experimentConfig.trialName
Optional
string
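A sketch of associating the training job with SageMaker Experiments. The names are hypothetical, and the experiment and trial are assumed to exist already.

```yaml
spec:
  experimentConfig:
    experimentName: churn-model            # hypothetical existing experiment
    trialName: churn-model-trial-1         # hypothetical existing trial
    trialComponentDisplayName: xgboost-baseline
```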
hyperParameters
Optional
object
Algorithm-specific parameters that influence the quality of the model. You
set hyperparameters before you start the learning process. For a list of
hyperparameters for each training algorithm provided by SageMaker, see Algorithms
(https://docs.aws.amazon.com/sagemaker/latest/dg/algos.html).


You can specify a maximum of 100 hyperparameters. Each hyperparameter is
a key-value pair. Each key and value is limited to 256 characters, as specified
by the Length Constraint.


Do not include any security-sensitive information, including account access
IDs, secrets, or tokens, in any hyperparameter field. If the use of security-sensitive
credentials is detected, SageMaker rejects your training job request
and returns an exception error.
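A sketch of hyperparameters for the built-in XGBoost algorithm, used here only as an illustration. Every key and value is a string, so numbers are quoted to stop YAML from parsing them as integers or floats.

```yaml
spec:
  hyperParameters:
    objective: "binary:logistic"
    num_round: "100"
    max_depth: "5"
    eta: "0.2"
```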
infraCheckConfig
Optional
object
Contains information about the infrastructure health check configuration
for the training job.
infraCheckConfig.enableInfraCheck
Optional
boolean
inputDataConfig
Optional
array
An array of Channel objects. Each channel is a named input source. InputDataConfig
describes the input data and its location.


Algorithms can accept input data from one or more channels. For example,
an algorithm might have two channels of input data, training_data and validation_data.
The configuration for each channel provides the S3, EFS, or FSx location
where the input data is stored. It also provides information about the stored
data: the MIME type, compression method, and whether the data is wrapped
in RecordIO format.


Depending on the input mode that the algorithm supports, SageMaker either
copies input data files from an S3 bucket to a local directory in the Docker
container, or makes it available as input streams. For example, if you specify
an EFS location, input data files are available as input streams. They do
not need to be downloaded.


Your input must be in the same Amazon Web Services region as your training
job.
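A sketch of two S3-backed channels. Channel names are whatever the algorithm expects (train and validation are assumed here), and the bucket paths are placeholders.

```yaml
spec:
  inputDataConfig:
    - channelName: train
      contentType: text/csv
      compressionType: "None"
      dataSource:
        s3DataSource:
          s3DataType: S3Prefix                         # every object under the prefix
          s3URI: s3://example-bucket/data/train/       # placeholder
          s3DataDistributionType: FullyReplicated     # each instance gets all objects
    - channelName: validation
      contentType: text/csv
      dataSource:
        s3DataSource:
          s3DataType: S3Prefix
          s3URI: s3://example-bucket/data/validation/  # placeholder
          s3DataDistributionType: FullyReplicated
```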
inputDataConfig.[]
Required
object
A channel is a named input source that training algorithms can consume.
inputDataConfig.[].compressionType
Optional
string
inputDataConfig.[].contentType
Optional
string
inputDataConfig.[].dataSource
Optional
object
Describes the location of the channel data.
inputDataConfig.[].dataSource.fileSystemDataSource
Optional
object
Specifies a file system data source for a channel.
inputDataConfig.[].dataSource.fileSystemDataSource.directoryPath
Optional
string
inputDataConfig.[].dataSource.fileSystemDataSource.fileSystemAccessMode
Optional
string
inputDataConfig.[].dataSource.fileSystemDataSource.fileSystemID
Optional
string
inputDataConfig.[].dataSource.fileSystemDataSource.fileSystemType
Optional
string
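A sketch of a channel backed by Amazon EFS instead of S3. The file system ID, directory path, and network IDs are placeholders; as we understand it, file system data sources are reached through your VPC, so the job also needs a vpcConfig that can reach the file system.

```yaml
spec:
  inputDataConfig:
    - channelName: train
      dataSource:
        fileSystemDataSource:
          fileSystemType: EFS                  # or FSxLustre
          fileSystemID: fs-0123456789abcdef0   # placeholder
          fileSystemAccessMode: ro             # read-only mount
          directoryPath: /training-data        # placeholder path on the file system
  vpcConfig:
    securityGroupIDs:
      - sg-0123456789abcdef0                   # placeholder
    subnets:
      - subnet-0123456789abcdef0               # placeholder
```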
inputDataConfig.[].dataSource.s3DataSource
Optional
object
Describes the S3 data source.


Your input bucket must be in the same Amazon Web Services region as your
training job.
inputDataConfig.[].dataSource.s3DataSource.attributeNames
Optional
array
inputDataConfig.[].dataSource.s3DataSource.attributeNames.[]
Required
string
inputDataConfig.[].dataSource.s3DataSource.instanceGroupNames.[]
Required
string
inputDataConfig.[].dataSource.s3DataSource.s3DataType
Optional
string
inputDataConfig.[].dataSource.s3DataSource.s3URI
Optional
string
inputDataConfig.[].inputMode
Optional
string
The training input mode that the algorithm supports. For more information
about input modes, see Algorithms (https://docs.aws.amazon.com/sagemaker/latest/dg/algos.html).


Pipe mode


If an algorithm supports Pipe mode, Amazon SageMaker streams data directly
from Amazon S3 to the container.


File mode


If an algorithm supports File mode, SageMaker downloads the training data
from S3 to the provisioned ML storage volume, and mounts the directory to
the Docker volume for the training container.


You must provision the ML storage volume with sufficient capacity to accommodate
the data downloaded from S3. In addition to the training data, the ML storage
volume also stores the output model. The algorithm container also uses the ML
storage volume to store intermediate information, if any.


For distributed algorithms, training data is distributed uniformly. Your
training duration is predictable if the input data objects are approximately
the same size. SageMaker does not split the files any further for model training.
If the object sizes are skewed, the data distribution is also skewed: one host
in the training cluster becomes overloaded and turns into a bottleneck, so
training is not optimal.


FastFile mode


If an algorithm supports FastFile mode, SageMaker streams data directly from
S3 to the container with no code changes, and provides file system access
to the data. Users can author their training script to interact with these
files as if they were stored on disk.


FastFile mode works best when the data is read sequentially. Augmented manifest
files aren’t supported. The startup time is lower when there are fewer files
in the S3 bucket provided.
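A sketch of overriding the input mode per channel, assuming the algorithm supports both modes. A channel that sets inputMode uses it; a channel that omits it falls back to algorithmSpecification.trainingInputMode. Paths and the image URI are placeholders.

```yaml
spec:
  algorithmSpecification:
    trainingImage: "<account>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>"  # placeholder
    trainingInputMode: File        # default for channels without their own inputMode
  inputDataConfig:
    - channelName: train
      inputMode: FastFile          # large channel: stream instead of downloading
      dataSource:
        s3DataSource:
          s3DataType: S3Prefix
          s3URI: s3://example-bucket/data/train/
    - channelName: validation      # no inputMode: uses File from trainingInputMode
      dataSource:
        s3DataSource:
          s3DataType: S3Prefix
          s3URI: s3://example-bucket/data/validation/
```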
inputDataConfig.[].recordWrapperType
Optional
string
inputDataConfig.[].shuffleConfig
Optional
object
A configuration for a shuffle option for input data in a channel. If you
use S3Prefix for S3DataType, the results of the S3 key prefix matches are
shuffled. If you use ManifestFile, the order of the S3 object references
in the ManifestFile is shuffled. If you use AugmentedManifestFile, the order
of the JSON lines in the AugmentedManifestFile is shuffled. The shuffling
order is determined using the Seed value.


For Pipe input mode, when ShuffleConfig is specified, shuffling is done at
the start of every epoch. With large datasets, this ensures that the order
of the training data is different for each epoch, which helps reduce bias
and possible overfitting. In a multi-node training job, when ShuffleConfig
is combined with an S3DataDistributionType of ShardedByS3Key, the data is shuffled
across nodes, so the content sent to a particular node in the first epoch
might be sent to a different node in the second epoch.
inputDataConfig.[].shuffleConfig.seed
Optional
integer
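A sketch of the multi-node case described above: a Pipe-mode channel sharded across two instances with shuffling enabled. The seed value, instance type, and paths are arbitrary placeholders.

```yaml
spec:
  resourceConfig:
    instanceCount: 2
    instanceType: ml.m5.xlarge
    volumeSizeInGB: 30
  inputDataConfig:
    - channelName: train
      inputMode: Pipe
      shuffleConfig:
        seed: 42                                  # shuffle order is derived from this value
      dataSource:
        s3DataSource:
          s3DataType: S3Prefix
          s3URI: s3://example-bucket/data/train/  # placeholder
          s3DataDistributionType: ShardedByS3Key  # each instance receives a subset of objects
```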
outputDataConfig
Required
object
Specifies the path to the S3 location where you want to store model artifacts.
SageMaker creates subfolders for the artifacts.
outputDataConfig.compressionType
Optional
string
outputDataConfig.kmsKeyID
Optional
string
outputDataConfig.s3OutputPath
Optional
string
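A sketch of an output configuration with a customer-managed KMS key. The bucket and key ARN are placeholders; GZIP and NONE are, as we understand the API, the accepted compressionType values.

```yaml
spec:
  outputDataConfig:
    s3OutputPath: s3://example-bucket/output/   # placeholder; SageMaker adds per-job subfolders
    kmsKeyID: "arn:aws:kms:<region>:<account>:key/<key-id>"   # placeholder; encrypts artifacts
    compressionType: GZIP                       # package model artifacts as a gzip archive
```

The resulting archive location is reported in status.modelArtifacts.s3ModelArtifacts once the job completes.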
profilerConfig
Optional
object
Configuration information for Amazon SageMaker Debugger system monitoring,
framework profiling, and storage paths.
profilerConfig.profilingIntervalInMilliseconds
Optional
integer
profilerConfig.profilingParameters
Optional
object
profilerConfig.s3OutputPath
Optional
string
profilerRuleConfigurations
Optional
array
Configuration information for Amazon SageMaker Debugger rules for profiling
system and framework metrics.
profilerRuleConfigurations.[]
Required
object
Configuration information for profiling rules.
profilerRuleConfigurations.[].localPath
Optional
string
profilerRuleConfigurations.[].ruleConfigurationName
Optional
string
profilerRuleConfigurations.[].ruleEvaluatorImage
Optional
string
profilerRuleConfigurations.[].ruleParameters
Optional
object
profilerRuleConfigurations.[].s3OutputPath
Optional
string
profilerRuleConfigurations.[].volumeSizeInGB
Optional
integer
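A sketch combining system profiling with a profiling rule. The ProfilerReport rule name and the rule_to_invoke parameter are assumptions from the SageMaker Debugger documentation, which also restricts profilingIntervalInMilliseconds to a fixed set of values (500 is believed to be one of them); the image URI and bucket are placeholders.

```yaml
spec:
  profilerConfig:
    s3OutputPath: s3://example-bucket/profiler/   # placeholder
    profilingIntervalInMilliseconds: 500          # system-metric sampling interval
  profilerRuleConfigurations:
    - ruleConfigurationName: ProfilerReport
      ruleEvaluatorImage: "<debugger-rule-image-uri-for-your-region>"   # placeholder
      ruleParameters:
        rule_to_invoke: ProfilerReport            # assumed built-in rule parameter
```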
remoteDebugConfig
Optional
object
Configuration for remote debugging. To learn more about the remote debugging
functionality of SageMaker, see Access a training container through Amazon
Web Services Systems Manager (SSM) for remote debugging (https://docs.aws.amazon.com/sagemaker/latest/dg/train-remote-debugging.html).
remoteDebugConfig.enableRemoteDebug
Optional
boolean
resourceConfig
Required
object
The resources, including the ML compute instances and ML storage volumes,
to use for model training.


ML storage volumes store model artifacts and incremental states. Training
algorithms might also use ML storage volumes for scratch space. If you want
SageMaker to use the ML storage volume to store the training data, choose
File as the TrainingInputMode in the algorithm specification. For distributed
training algorithms, specify an instance count greater than 1.
resourceConfig.instanceCount
Optional
integer
resourceConfig.instanceGroups
Optional
array
resourceConfig.instanceGroups.[]
Required
object
Defines an instance group for heterogeneous cluster training. When requesting
a training job using the CreateTrainingJob (https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTrainingJob.html)
API, you can configure multiple instance groups.
resourceConfig.instanceGroups.[].instanceGroupName
Optional
string
resourceConfig.instanceGroups.[].instanceType
Optional
string
resourceConfig.instanceType
Optional
string
resourceConfig.keepAlivePeriodInSeconds
Optional
integer
Customer-requested period, in seconds, for which the training cluster is kept
alive after the job finishes.
resourceConfig.volumeKMSKeyID
Optional
string
resourceConfig.volumeSizeInGB
Optional
integer
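A sketch of a heterogeneous cluster with two hypothetical instance groups, where one channel is delivered only to the data-processing group. As we understand the API, the top-level instanceType and instanceCount are left unset when instanceGroups is used. Group names, types, and paths are placeholders.

```yaml
spec:
  resourceConfig:
    volumeSizeInGB: 50
    instanceGroups:
      - instanceGroupName: data-group     # hypothetical CPU group for data loading
        instanceType: ml.c5.2xlarge
        instanceCount: 2
      - instanceGroupName: gpu-group      # hypothetical GPU group for training
        instanceType: ml.p3.2xlarge
        instanceCount: 1
  inputDataConfig:
    - channelName: train
      dataSource:
        s3DataSource:
          s3DataType: S3Prefix
          s3URI: s3://example-bucket/data/train/   # placeholder
          instanceGroupNames:
            - data-group                  # only data-group instances receive this channel
```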
retryStrategy
Optional
object
The number of times to retry the job when the job fails due to an InternalServerError.
retryStrategy.maximumRetryAttempts
Optional
integer
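A sketch of a retry policy; the count is an arbitrary example. Retries apply only when the job fails because of an InternalServerError, not because of errors in your training code.

```yaml
spec:
  retryStrategy:
    maximumRetryAttempts: 2   # retry the job up to two times after an InternalServerError
```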
roleARN
Required
string
The Amazon Resource Name (ARN) of an IAM role that SageMaker can assume to
perform tasks on your behalf.


During model training, SageMaker needs your permission to read input data
from an S3 bucket, download a Docker image that contains training code, write
model artifacts to an S3 bucket, write logs to Amazon CloudWatch Logs, and
publish metrics to Amazon CloudWatch. You grant permissions for all of these
tasks to an IAM role. For more information, see SageMaker Roles (https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-roles.html).


To be able to pass this role to SageMaker, the caller of this API must have
the iam:PassRole permission.
stoppingCondition
Required
object
Specifies a limit to how long a model training job can run. It also specifies
how long a managed Spot training job has to complete. When the job reaches
the time limit, SageMaker ends the training job. Use this field to cap model
training costs.


To stop a job, SageMaker sends the algorithm the SIGTERM signal, which delays
job termination for 120 seconds. Algorithms can use this 120-second window
to save the model artifacts, so the results of training are not lost.
stoppingCondition.maxPendingTimeInSeconds
Optional
integer
Maximum job scheduler pending time in seconds.
stoppingCondition.maxRuntimeInSeconds
Optional
integer
stoppingCondition.maxWaitTimeInSeconds
Optional
integer
tags
Optional
array
An array of key-value pairs. You can use tags to categorize your Amazon Web
Services resources in different ways, for example, by purpose, owner, or
environment. For more information, see Tagging Amazon Web Services Resources
(https://docs.aws.amazon.com/general/latest/gr/aws_tagging.html).
tags.[]
Required
object
A tag object that consists of a key and an optional value, used to manage
metadata for SageMaker Amazon Web Services resources.

You can add tags to notebook instances, training jobs, hyperparameter tuning jobs, batch transform jobs, models, labeling jobs, work teams, endpoint configurations, and endpoints. For more information on adding tags to SageMaker resources, see AddTags (https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_AddTags.html).

For more information on adding metadata to your Amazon Web Services resources with tagging, see Tagging Amazon Web Services resources (https://docs.aws.amazon.com/general/latest/gr/aws_tagging.html). For advice on best practices for managing Amazon Web Services resources with tagging, see Tagging Best Practices: Implement an Effective Amazon Web Services Resource Tagging Strategy (https://d1.awsstatic.com/whitepapers/aws-tagging-best-practices.pdf).
tags.[].key
Optional
string
tags.[].value
Optional
string
tensorBoardOutputConfig
Optional
object
Configuration of storage locations for the Amazon SageMaker Debugger TensorBoard
output data.
tensorBoardOutputConfig.localPath
Optional
string
tensorBoardOutputConfig.s3OutputPath
Optional
string
trainingJobName
Required
string
The name of the training job. The name must be unique within an Amazon Web
Services Region in an Amazon Web Services account.
vpcConfig
Optional
object
A VpcConfig (https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_VpcConfig.html)
object that specifies the VPC that you want your training job to connect
to. Control access to and from your training container by configuring the
VPC. For more information, see Protect Training Jobs by Using an Amazon Virtual
Private Cloud (https://docs.aws.amazon.com/sagemaker/latest/dg/train-vpc.html).
vpcConfig.securityGroupIDs
Optional
array
vpcConfig.securityGroupIDs.[]
Required
string
vpcConfig.subnets
Optional
array
vpcConfig.subnets.[]
Required
string
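A sketch of running the job inside a VPC with network isolation enabled, so SageMaker moves data and artifacts through the VPC while the container itself has no network access. The security group and subnet IDs are placeholders.

```yaml
spec:
  enableNetworkIsolation: true        # the training container gets no inbound/outbound access
  vpcConfig:
    securityGroupIDs:
      - sg-0123456789abcdef0          # placeholder
    subnets:
      - subnet-0123456789abcdef0      # placeholder
      - subnet-0fedcba9876543210      # placeholder
```

Because data is downloaded through the VPC, the subnets typically need a route to Amazon S3, for example through an S3 gateway VPC endpoint.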

Status

ackResourceMetadata: 
  arn: string
  ownerAccountID: string
  region: string
conditions:
- lastTransitionTime: string
  message: string
  reason: string
  status: string
  type: string
creationTime: string
debugRuleEvaluationStatuses:
- lastModifiedTime: string
  ruleConfigurationName: string
  ruleEvaluationJobARN: string
  ruleEvaluationStatus: string
  statusDetails: string
failureReason: string
lastModifiedTime: string
modelArtifacts: 
  s3ModelArtifacts: string
profilerRuleEvaluationStatuses:
- lastModifiedTime: string
  ruleConfigurationName: string
  ruleEvaluationJobARN: string
  ruleEvaluationStatus: string
  statusDetails: string
profilingStatus: string
secondaryStatus: string
trainingJobStatus: string
warmPoolStatus: 
  resourceRetainedBillableTimeInSeconds: integer
  reusedByJob: string
  status: string
Field | Description
ackResourceMetadata
Optional
object
All CRs managed by ACK have a common Status.ACKResourceMetadata member
that is used to contain resource sync state, account ownership, and the
constructed ARN for the resource.
ackResourceMetadata.arn
Optional
string
ARN is the Amazon Resource Name for the resource. This is a
globally unique identifier and is set only by the ACK service controller
once the controller has orchestrated the creation of the resource, or
when it has verified that an “adopted” resource (a resource where the
ARN annotation was set by the Kubernetes user on the CR) exists and
matches the supplied CR’s Spec field values.
ackResourceMetadata.ownerAccountID
Required
string
OwnerAccountID is the AWS Account ID of the account that owns the
backend AWS service API resource.
ackResourceMetadata.region
Required
string
Region is the AWS region in which the resource exists or will exist.
conditions
Optional
array
All CRs managed by ACK have a common Status.Conditions member that
contains a collection of ackv1alpha1.Condition objects that describe
the various terminal states of the CR and its backend AWS service API
resource.
conditions.[]
Required
object
Condition is the common struct used by all CRDs managed by ACK service
controllers to indicate terminal states of the CR and its backend AWS
service API resource.
conditions.[].message
Optional
string
A human-readable message indicating details about the transition.
conditions.[].reason
Optional
string
The reason for the condition’s last transition.
conditions.[].status
Optional
string
Status of the condition, one of True, False, Unknown.
conditions.[].type
Optional
string
Type is the type of the condition.
creationTime
Optional
string
A timestamp that indicates when the training job was created.
debugRuleEvaluationStatuses
Optional
array
Evaluation status of Amazon SageMaker Debugger rules for debugging on a training
job.
debugRuleEvaluationStatuses.[]
Required
object
Information about the status of the rule evaluation.
debugRuleEvaluationStatuses.[].ruleConfigurationName
Optional
string
debugRuleEvaluationStatuses.[].ruleEvaluationJobARN
Optional
string
debugRuleEvaluationStatuses.[].ruleEvaluationStatus
Optional
string
debugRuleEvaluationStatuses.[].statusDetails
Optional
string
failureReason
Optional
string
If the training job failed, the reason it failed.
lastModifiedTime
Optional
string
A timestamp that indicates when the status of the training job was last modified.
modelArtifacts
Optional
object
Information about the Amazon S3 location that is configured for storing model
artifacts.
modelArtifacts.s3ModelArtifacts
Optional
string
profilerRuleEvaluationStatuses
Optional
array
Evaluation status of Amazon SageMaker Debugger rules for profiling on a training
job.
profilerRuleEvaluationStatuses.[]
Required
object
Information about the status of the rule evaluation.
profilerRuleEvaluationStatuses.[].ruleConfigurationName
Optional
string
profilerRuleEvaluationStatuses.[].ruleEvaluationJobARN
Optional
string
profilerRuleEvaluationStatuses.[].ruleEvaluationStatus
Optional
string
profilerRuleEvaluationStatuses.[].statusDetails
Optional
string
profilingStatus
Optional
string
Profiling status of a training job.
secondaryStatus
Optional
string
Provides detailed information about the state of the training job. For detailed
information on the secondary status of the training job, see StatusMessage
under SecondaryStatusTransition (https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_SecondaryStatusTransition.html).


SageMaker provides primary statuses and secondary statuses that apply to
each of them:


InProgress


* Starting - Starting the training job.


* Downloading - An optional stage for algorithms that support File training
input mode. It indicates that data is being downloaded to the ML storage
volumes.


* Training - Training is in progress.


* Interrupted - The job stopped because the managed spot training instances
were interrupted.


* Uploading - Training is complete and the model artifacts are being uploaded
to the S3 location.


Completed


* Completed - The training job has completed.


Failed


* Failed - The training job has failed. The reason for the failure is
returned in the FailureReason field of DescribeTrainingJobResponse.


Stopped


* MaxRuntimeExceeded - The job stopped because it exceeded the maximum
allowed runtime.


* MaxWaitTimeExceeded - The job stopped because it exceeded the maximum
allowed wait time.


* Stopped - The training job has stopped.


Stopping


* Stopping - Stopping the training job.


Valid values for SecondaryStatus are subject to change.


We no longer support the following secondary statuses:


* LaunchingMLInstances


* PreparingTraining


* DownloadingTrainingImage
trainingJobStatus
Optional
string
The status of the training job.


SageMaker provides the following training job statuses:


* InProgress - The training is in progress.


* Completed - The training job has completed.


* Failed - The training job has failed. To see the reason for the failure,
see the FailureReason field in the response to a DescribeTrainingJob call
(surfaced on this resource as failureReason).


* Stopping - The training job is stopping.


* Stopped - The training job has stopped.


For more detailed information, see secondaryStatus.
warmPoolStatus
Optional
object
The status of the warm pool associated with the training job.
warmPoolStatus.resourceRetainedBillableTimeInSeconds
Optional
integer
Indicates how many seconds the resource stayed in the ResourceRetained
state. Populated only after the resource reaches the ResourceReused or
ResourceReleased state.
warmPoolStatus.reusedByJob
Optional
string
warmPoolStatus.status
Optional
string