TransformJob

sagemaker.services.k8s.aws/v1alpha1

Type	Link
GoDoc	sagemaker-controller/apis/v1alpha1#TransformJob

Metadata

Property	Value
Scope	Namespaced
Kind	TransformJob
ListKind	TransformJobList
Plural	transformjobs
Singular	transformjob

A batch transform job. For information about SageMaker batch transform, see Use Batch Transform (https://docs.aws.amazon.com/sagemaker/latest/dg/batch-transform.html).
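As an illustrative sketch, a minimal manifest covering the required fields might look like the following. All names, S3 locations, and the instance type are placeholders; the model named in modelName must already exist in SageMaker.

```yaml
apiVersion: sagemaker.services.k8s.aws/v1alpha1
kind: TransformJob
metadata:
  name: my-transform-job           # placeholder Kubernetes object name
spec:
  transformJobName: my-transform-job   # must be unique per Region per account
  modelName: my-existing-model         # an existing SageMaker model
  transformInput:
    contentType: text/csv
    splitType: Line
    dataSource:
      s3DataSource:
        s3DataType: S3Prefix
        s3URI: s3://my-bucket/input/
  transformOutput:
    s3OutputPath: s3://my-bucket/output/
  transformResources:
    instanceCount: 1
    instanceType: ml.m5.large
```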

Spec

batchStrategy: string
dataProcessing: 
  inputFilter: string
  joinSource: string
  outputFilter: string
environment: {}
experimentConfig: 
  experimentName: string
  trialComponentDisplayName: string
  trialName: string
maxConcurrentTransforms: integer
maxPayloadInMB: integer
modelClientConfig: 
  invocationsMaxRetries: integer
  invocationsTimeoutInSeconds: integer
modelName: string
tags:
- key: string
  value: string
transformInput: 
  compressionType: string
  contentType: string
  dataSource: 
    s3DataSource: 
      s3DataType: string
      s3URI: string
  splitType: string
transformJobName: string
transformOutput: 
  accept: string
  assembleWith: string
  kmsKeyID: string
  s3OutputPath: string
transformResources: 
  instanceCount: integer
  instanceType: string
  volumeKMSKeyID: string
Field	Description
batchStrategy
Optional
string
Specifies the number of records to include in a mini-batch for an HTTP inference
request. A record is a single unit of input data that inference can be made
on. For example, a single line in a CSV file is a record.


To enable the batch strategy, you must set the SplitType property to Line,
RecordIO, or TFRecord.


To use only one record when making an HTTP invocation request to a container,
set BatchStrategy to SingleRecord and SplitType to Line.


To fit as many records in a mini-batch as can fit within the MaxPayloadInMB
limit, set BatchStrategy to MultiRecord and SplitType to Line.
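Putting the two settings together, a multi-record CSV batch could be sketched as follows (only the two relevant fields are shown):

```yaml
spec:
  batchStrategy: MultiRecord   # pack as many records as fit within maxPayloadInMB
  transformInput:
    splitType: Line            # batchStrategy takes effect only with Line, RecordIO, or TFRecord
```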
dataProcessing
Optional
object
The data structure used to specify the data to be used for inference in a
batch transform job and to associate the data that is relevant to the prediction
results in the output. The input filter provided allows you to exclude input
data that is not needed for inference in a batch transform job. The output
filter provided allows you to include input data relevant to interpreting
the predictions in the output from the job. For more information, see Associate
Prediction Results with their Corresponding Input Records (https://docs.aws.amazon.com/sagemaker/latest/dg/batch-transform-data-processing.html).
dataProcessing.inputFilter
Optional
string
dataProcessing.joinSource
Optional
string
dataProcessing.outputFilter
Optional
string
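These three fields take JSONPath expressions (SageMaker supports a subset of JSONPath; see the linked data-processing page). The attribute names below are hypothetical, for illustration only:

```yaml
spec:
  dataProcessing:
    inputFilter: "$.features"                  # send only this attribute to the model
    joinSource: Input                          # join each prediction with its input record
    outputFilter: "$['id','SageMakerOutput']"  # keep the record ID plus the prediction
```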
environment
Optional
object
The environment variables to set in the Docker container. We support up to
16 key-value entries in the map.
experimentConfig
Optional
object
Associates a SageMaker job as a trial component with an experiment and trial.
Specified when you call the following APIs:


* CreateProcessingJob (https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateProcessingJob.html)


* CreateTrainingJob (https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTrainingJob.html)


* CreateTransformJob (https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTransformJob.html)
experimentConfig.experimentName
Optional
string
experimentConfig.trialComponentDisplayName
Optional
string
experimentConfig.trialName
Optional
string
maxConcurrentTransforms
Optional
integer
The maximum number of parallel requests that can be sent to each instance
in a transform job. If MaxConcurrentTransforms is set to 0 or left unset,
Amazon SageMaker checks the optional execution-parameters to determine the
settings for your chosen algorithm. If the execution-parameters endpoint
is not enabled, the default value is 1. For more information on execution-parameters,
see How Containers Serve Requests (https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-batch-code.html#your-algorithms-batch-code-how-containe-serves-requests).
For built-in algorithms, you don’t need to set a value for MaxConcurrentTransforms.
maxPayloadInMB
Optional
integer
The maximum allowed size of the payload, in MB. A payload is the data portion
of a record (without metadata). The value in MaxPayloadInMB must be greater
than, or equal to, the size of a single record. To estimate the size of a
record in MB, divide the size of your dataset by the number of records. To
ensure that the records fit within the maximum payload size, we recommend
using a slightly larger value. The default value is 6 MB.


The value of MaxPayloadInMB cannot be greater than 100 MB. If you specify
the MaxConcurrentTransforms parameter, the value of (MaxConcurrentTransforms
* MaxPayloadInMB) also cannot exceed 100 MB.


For cases where the payload might be arbitrarily large and is transmitted
using HTTP chunked encoding, set the value to 0. This feature works only
in supported algorithms. Currently, Amazon SageMaker built-in algorithms
do not support HTTP chunked encoding.
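The sizing rule above is straightforward to check ahead of time. A small sketch of that validation, using only the limits stated in the text (not an actual API call):

```python
MAX_TOTAL_MB = 100  # documented ceiling for maxConcurrentTransforms * maxPayloadInMB


def payload_settings_valid(max_payload_in_mb: int,
                           max_concurrent_transforms: int = 1) -> bool:
    """Return True if the combination respects the documented 100 MB ceiling.

    A payload size of 0 signals HTTP chunked encoding and is exempt
    from the product rule.
    """
    if max_payload_in_mb == 0:
        return True
    if max_payload_in_mb > MAX_TOTAL_MB:
        return False
    return max_payload_in_mb * max_concurrent_transforms <= MAX_TOTAL_MB
```

For example, the default 6 MB payload allows up to 16 concurrent transforms per instance (6 * 16 = 96 MB), while 17 would exceed the limit.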
modelClientConfig
Optional
object
Configures the timeout and maximum number of retries for processing a transform
job invocation.
modelClientConfig.invocationsMaxRetries
Optional
integer
modelClientConfig.invocationsTimeoutInSeconds
Optional
integer
modelName
Required
string
The name of the model that you want to use for the transform job. ModelName
must be the name of an existing Amazon SageMaker model within an Amazon Web
Services Region in an Amazon Web Services account.
tags
Optional
array
(Optional) An array of key-value pairs. For more information, see Using Cost
Allocation Tags (https://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/cost-alloc-tags.html#allocation-what)
in the Amazon Web Services Billing and Cost Management User Guide.
tags.[]
Required
object
A tag object that consists of a key and an optional value, used to manage
metadata for SageMaker Amazon Web Services resources.

You can add tags to notebook instances, training jobs, hyperparameter tuning jobs, batch transform jobs, models, labeling jobs, work teams, endpoint configurations, and endpoints. For more information on adding tags to SageMaker resources, see AddTags (https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_AddTags.html).

For more information on adding metadata to your Amazon Web Services resources with tagging, see Tagging Amazon Web Services resources (https://docs.aws.amazon.com/general/latest/gr/aws_tagging.html). For advice on best practices for managing Amazon Web Services resources with tagging, see Tagging Best Practices: Implement an Effective Amazon Web Services Resource Tagging Strategy (https://d1.awsstatic.com/whitepapers/aws-tagging-best-practices.pdf).
tags.[].key
Optional
string
tags.[].value
Optional
string
transformInput
Required
object
Describes the input source and the way the transform job consumes it.
transformInput.compressionType
Optional
string
transformInput.contentType
Optional
string
transformInput.dataSource
Optional
object
Describes the location of the channel data.
transformInput.dataSource.s3DataSource
Optional
object
Describes the S3 data source.
transformInput.dataSource.s3DataSource.s3DataType
Optional
string
transformInput.dataSource.s3DataSource.s3URI
Optional
string
transformInput.splitType
Optional
string
transformJobName
Required
string
The name of the transform job. The name must be unique within an Amazon Web
Services Region in an Amazon Web Services account.
transformOutput
Required
object
Describes the results of the transform job.
transformOutput.accept
Optional
string
transformOutput.assembleWith
Optional
string
transformOutput.kmsKeyID
Optional
string
transformOutput.s3OutputPath
Optional
string
transformResources
Required
object
Describes the resources, including ML instance types and ML instance count,
to use for the transform job.
transformResources.instanceCount
Optional
integer
transformResources.instanceType
Optional
string
transformResources.volumeKMSKeyID
Optional
string

Status

ackResourceMetadata: 
  arn: string
  ownerAccountID: string
  region: string
conditions:
- lastTransitionTime: string
  message: string
  reason: string
  status: string
  type: string
failureReason: string
transformJobStatus: string
Field	Description
ackResourceMetadata
Optional
object
All CRs managed by ACK have a common Status.ACKResourceMetadata member
that is used to contain resource sync state, account ownership, and the
constructed ARN for the resource.
ackResourceMetadata.arn
Optional
string
ARN is the Amazon Resource Name for the resource. This is a
globally-unique identifier and is set only by the ACK service controller
once the controller has orchestrated the creation of the resource OR
when it has verified that an “adopted” resource (a resource where the
ARN annotation was set by the Kubernetes user on the CR) exists and
matches the supplied CR’s Spec field values.
ackResourceMetadata.ownerAccountID
Required
string
OwnerAccountID is the AWS Account ID of the account that owns the
backend AWS service API resource.
ackResourceMetadata.region
Required
string
Region is the AWS region in which the resource exists or will exist.
conditions
Optional
array
All CRs managed by ACK have a common Status.Conditions member that
contains a collection of ackv1alpha1.Condition objects that describe
the various terminal states of the CR and its backend AWS service API
resource.
conditions.[]
Required
object
Condition is the common struct used by all CRDs managed by ACK service
controllers to indicate terminal states of the CR and its backend AWS
service API resource
conditions.[].message
Optional
string
A human readable message indicating details about the transition.
conditions.[].reason
Optional
string
The reason for the condition’s last transition.
conditions.[].status
Optional
string
Status of the condition, one of True, False, Unknown.
conditions.[].type
Optional
string
Type is the type of the Condition
failureReason
Optional
string
If the transform job failed, FailureReason describes why it failed. A transform
job creates a log file, which includes error messages, and stores it as an
Amazon S3 object. For more information, see Log Amazon SageMaker Events with
Amazon CloudWatch (https://docs.aws.amazon.com/sagemaker/latest/dg/logging-cloudwatch.html).
transformJobStatus
Optional
string
The status of the transform job. If the transform job failed, the reason
is returned in the FailureReason field.
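As a sketch, the observed status on a finished job might look like the following. All values are illustrative; the account ID, Region, and job name are placeholders:

```yaml
status:
  transformJobStatus: Completed
  ackResourceMetadata:
    arn: arn:aws:sagemaker:us-west-2:123456789012:transform-job/my-transform-job
    ownerAccountID: "123456789012"
    region: us-west-2
  conditions:
  - type: ACK.ResourceSynced
    status: "True"
```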