ProcessingJob

sagemaker.services.k8s.aws/v1alpha1

Type | Link
GoDoc | sagemaker-controller/apis/v1alpha1#ProcessingJob

Metadata

Property | Value
Scope | Namespaced
Kind | ProcessingJob
ListKind | ProcessingJobList
Plural | processingjobs
Singular | processingjob

An Amazon SageMaker processing job that is used to analyze data and evaluate models. For more information, see Process Data and Evaluate Models (https://docs.aws.amazon.com/sagemaker/latest/dg/processing-job.html).
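A minimal ProcessingJob manifest might look like the following sketch. It uses only fields from the Spec below and fills in the fields marked Required. The job name, IAM role ARN, account ID, and image URI are hypothetical placeholders, and the instance type and volume size are illustrative choices rather than defaults.

```yaml
apiVersion: sagemaker.services.k8s.aws/v1alpha1
kind: ProcessingJob
metadata:
  name: my-processing-job                    # hypothetical Kubernetes object name
spec:
  processingJobName: my-processing-job       # hypothetical; must be unique per Region and account
  roleARN: arn:aws:iam::111122223333:role/MySageMakerRole   # hypothetical IAM role
  appSpecification:
    imageURI: 111122223333.dkr.ecr.us-west-2.amazonaws.com/my-processor:latest  # hypothetical image
  processingResources:
    clusterConfig:
      instanceCount: 1
      instanceType: ml.m5.xlarge             # illustrative instance type
      volumeSizeInGB: 30
  stoppingCondition:
    maxRuntimeInSeconds: 3600                # stop the job after one hour
```

Saved to a file, a manifest like this can be submitted with `kubectl apply -f`, after which the ACK controller creates the corresponding SageMaker processing job.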

Spec

appSpecification: 
  containerArguments:
  - string
  containerEntrypoint:
  - string
  imageURI: string
environment: {}
experimentConfig: 
  experimentName: string
  trialComponentDisplayName: string
  trialName: string
networkConfig: 
  enableInterContainerTrafficEncryption: boolean
  enableNetworkIsolation: boolean
  vpcConfig: 
    securityGroupIDs:
    - string
    subnets:
    - string
processingInputs:
- appManaged: boolean
  datasetDefinition: 
    athenaDatasetDefinition: 
      catalog: string
      database: string
      kmsKeyID: string
      outputCompression: string
      outputFormat: string
      outputS3URI: string
      queryString: string
      workGroup: string
    dataDistributionType: string
    inputMode: string
    localPath: string
    redshiftDatasetDefinition: 
      clusterID: string
      clusterRoleARN: string
      database: string
      dbUser: string
      kmsKeyID: string
      outputCompression: string
      outputFormat: string
      outputS3URI: string
      queryString: string
  inputName: string
  s3Input: 
    localPath: string
    s3CompressionType: string
    s3DataDistributionType: string
    s3DataType: string
    s3InputMode: string
    s3URI: string
processingJobName: string
processingOutputConfig: 
  kmsKeyID: string
  outputs:
  - appManaged: boolean
    featureStoreOutput: 
      featureGroupName: string
    outputName: string
    s3Output: 
      localPath: string
      s3URI: string
      s3UploadMode: string
processingResources: 
  clusterConfig: 
    instanceCount: integer
    instanceType: string
    volumeKMSKeyID: string
    volumeSizeInGB: integer
roleARN: string
stoppingCondition: 
  maxRuntimeInSeconds: integer
tags:
- key: string
  value: string

Field | Description
appSpecification
Required
object
Configures the processing job to run a specified Docker container image.
appSpecification.containerArguments
Optional
array
appSpecification.containerArguments.[]
Required
string
appSpecification.containerEntrypoint.[]
Required
string
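As an illustration, a script-based container could be configured as in the fragment below. The image, script path, and arguments are hypothetical. containerEntrypoint and containerArguments are assumed to behave roughly like a Docker entrypoint and its arguments; because both are string arrays, numeric-looking arguments are quoted.

```yaml
spec:
  appSpecification:
    imageURI: 111122223333.dkr.ecr.us-west-2.amazonaws.com/my-processor:latest  # hypothetical image
    containerEntrypoint:                      # command run inside the container
    - python3
    - /opt/ml/processing/code/preprocess.py   # hypothetical script path
    containerArguments:                       # extra arguments appended to the entrypoint
    - --train-test-split-ratio                # hypothetical flag understood by the script
    - "0.2"                                   # quoted so YAML keeps it a string
```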
environment
Optional
object
The environment variables to set in the Docker container. The map supports
up to 100 key-value entries.
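A short sketch of the environment map follows. The variable names and values are hypothetical. The values are assumed to be strings, so anything YAML would otherwise read as a number or boolean is quoted.

```yaml
spec:
  environment:
    LOG_LEVEL: INFO          # hypothetical variable read by the container
    NUM_WORKERS: "4"         # quoted so it stays a string, not an integer
    ENABLE_PROFILING: "false"  # quoted so it stays a string, not a boolean
```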
experimentConfig
Optional
object
Associates a SageMaker job as a trial component with an experiment and trial.
You can specify it when you call the following APIs:
* CreateProcessingJob (https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateProcessingJob.html)
* CreateTrainingJob (https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTrainingJob.html)
* CreateTransformJob (https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTransformJob.html)
experimentConfig.experimentName
Optional
string
experimentConfig.trialComponentDisplayName
Optional
string
experimentConfig.trialName
Optional
string
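The fragment below sketches an experimentConfig block. All three names are hypothetical, and experimentName and trialName are assumed to refer to an experiment and trial that already exist in SageMaker.

```yaml
spec:
  experimentConfig:
    experimentName: my-experiment              # hypothetical, assumed to exist
    trialName: my-trial                        # hypothetical, assumed to exist
    trialComponentDisplayName: preprocessing   # hypothetical display name for this job
```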
networkConfig
Optional
object
Networking options for a processing job, such as whether to allow inbound
and outbound network calls to and from processing containers, and the VPC
subnets and security groups to use for VPC-enabled processing jobs.
networkConfig.enableInterContainerTrafficEncryption
Optional
boolean
networkConfig.enableNetworkIsolation
Optional
boolean
networkConfig.vpcConfig
Optional
object
Specifies an Amazon Virtual Private Cloud (VPC) that your SageMaker jobs,
hosted models, and compute resources have access to. You can control access
to and from your resources by configuring a VPC. For more information, see
Give SageMaker Access to Resources in your Amazon VPC (https://docs.aws.amazon.com/sagemaker/latest/dg/infrastructure-give-access.html).
networkConfig.vpcConfig.securityGroupIDs
Optional
array
networkConfig.vpcConfig.securityGroupIDs.[]
Required
string
networkConfig.vpcConfig.subnets.[]
Required
string
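For a VPC-enabled job, networkConfig might be filled in as follows. The security group and subnet IDs are hypothetical, and the boolean settings are illustrative choices, not recommendations.

```yaml
spec:
  networkConfig:
    enableInterContainerTrafficEncryption: true   # encrypt traffic between the job's instances
    enableNetworkIsolation: false                 # true would block network calls to and from the containers
    vpcConfig:
      securityGroupIDs:
      - sg-0123456789abcdef0                      # hypothetical security group
      subnets:
      - subnet-0123456789abcdef0                  # hypothetical subnet
```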
processingInputs.[]
Required
object
An input for a processing job. Each processing input must specify exactly
one of s3Input or datasetDefinition.
processingInputs.[].datasetDefinition
Optional
object
Configuration for Dataset Definition inputs. A Dataset Definition input
must specify exactly one of athenaDatasetDefinition or redshiftDatasetDefinition.
processingInputs.[].datasetDefinition.athenaDatasetDefinition
Optional
object
Configuration for Athena Dataset Definition input.
processingInputs.[].datasetDefinition.athenaDatasetDefinition.catalog
Optional
string
The name of the data catalog used in Athena query execution.
processingInputs.[].datasetDefinition.athenaDatasetDefinition.database
Optional
string
The name of the database used in the Athena query execution.
processingInputs.[].datasetDefinition.athenaDatasetDefinition.kmsKeyID
Optional
string
processingInputs.[].datasetDefinition.athenaDatasetDefinition.outputCompression
Optional
string
The compression used for Athena query results.
processingInputs.[].datasetDefinition.athenaDatasetDefinition.outputFormat
Optional
string
The data storage format for Athena query results.
processingInputs.[].datasetDefinition.athenaDatasetDefinition.outputS3URI
Optional
string
processingInputs.[].datasetDefinition.athenaDatasetDefinition.queryString
Optional
string
The SQL query statements to be executed.
processingInputs.[].datasetDefinition.athenaDatasetDefinition.workGroup
Optional
string
The name of the workgroup in which the Athena query is being started.
processingInputs.[].datasetDefinition.dataDistributionType
Optional
string
processingInputs.[].datasetDefinition.inputMode
Optional
string
processingInputs.[].datasetDefinition.localPath
Optional
string
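An input backed by an Athena query could look like the sketch below. It sets athenaDatasetDefinition and omits redshiftDatasetDefinition, following the exactly-one rule above. The database, table, query, and bucket are hypothetical; AwsDataCatalog and primary are Athena's default catalog and workgroup names, and PARQUET and SNAPPY are one valid format and compression pairing among several.

```yaml
spec:
  processingInputs:
  - inputName: athena-input                       # hypothetical input name
    datasetDefinition:
      localPath: /opt/ml/processing/input/athena  # where the results appear in the container
      dataDistributionType: FullyReplicated       # every instance receives the full dataset
      inputMode: File
      athenaDatasetDefinition:
        catalog: AwsDataCatalog
        database: my_database                     # hypothetical
        queryString: SELECT * FROM my_table       # hypothetical query
        workGroup: primary
        outputS3URI: s3://my-bucket/athena-results/   # hypothetical bucket
        outputFormat: PARQUET
        outputCompression: SNAPPY
```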
processingInputs.[].datasetDefinition.redshiftDatasetDefinition
Optional
object
Configuration for Redshift Dataset Definition input.
processingInputs.[].datasetDefinition.redshiftDatasetDefinition.clusterID
Optional
string
The Redshift cluster identifier.
processingInputs.[].datasetDefinition.redshiftDatasetDefinition.clusterRoleARN
Optional
string
processingInputs.[].datasetDefinition.redshiftDatasetDefinition.database
Optional
string
The name of the Redshift database used in Redshift query execution.
processingInputs.[].datasetDefinition.redshiftDatasetDefinition.dbUser
Optional
string
The database user name used in Redshift query execution.
processingInputs.[].datasetDefinition.redshiftDatasetDefinition.kmsKeyID
Optional
string
processingInputs.[].datasetDefinition.redshiftDatasetDefinition.outputCompression
Optional
string
The compression used for Redshift query results.
processingInputs.[].datasetDefinition.redshiftDatasetDefinition.outputFormat
Optional
string
The data storage format for Redshift query results.
processingInputs.[].datasetDefinition.redshiftDatasetDefinition.outputS3URI
Optional
string
processingInputs.[].datasetDefinition.redshiftDatasetDefinition.queryString
Optional
string
The SQL query statements to be executed.
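The Redshift variant follows the same pattern, sketched below. The cluster, role, database user, query, and bucket are hypothetical placeholders; CSV with GZIP is one supported format and compression combination.

```yaml
spec:
  processingInputs:
  - inputName: redshift-input                       # hypothetical input name
    datasetDefinition:
      localPath: /opt/ml/processing/input/redshift
      dataDistributionType: FullyReplicated
      inputMode: File
      redshiftDatasetDefinition:
        clusterID: my-redshift-cluster              # hypothetical cluster
        clusterRoleARN: arn:aws:iam::111122223333:role/MyRedshiftRole  # hypothetical role
        database: dev                               # hypothetical database
        dbUser: awsuser                             # hypothetical database user
        queryString: SELECT * FROM my_table         # hypothetical query
        outputS3URI: s3://my-bucket/redshift-results/  # hypothetical bucket
        outputFormat: CSV
        outputCompression: GZIP
```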
processingInputs.[].inputName
Optional
string
processingInputs.[].s3Input
Optional
object
Configuration for downloading input data from Amazon S3 into the processing
container.
processingInputs.[].s3Input.localPath
Optional
string
processingInputs.[].s3Input.s3CompressionType
Optional
string
processingInputs.[].s3Input.s3DataDistributionType
Optional
string
processingInputs.[].s3Input.s3DataType
Optional
string
processingInputs.[].s3Input.s3InputMode
Optional
string
processingInputs.[].s3Input.s3URI
Optional
string
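A typical S3-backed input is sketched below. The bucket and input name are hypothetical. localPath must be an absolute path under /opt/ml/processing/, and the enum values shown are one valid choice per field; for example, ShardedByS3Key instead of FullyReplicated would split the S3 objects across instances.

```yaml
spec:
  processingInputs:
  - inputName: raw-data                        # hypothetical input name
    s3Input:
      s3URI: s3://my-bucket/raw/               # hypothetical bucket and prefix
      localPath: /opt/ml/processing/input/raw  # download target inside the container
      s3DataType: S3Prefix                     # treat s3URI as a key prefix
      s3InputMode: File                        # copy data to local disk before the job starts
      s3DataDistributionType: FullyReplicated  # every instance gets all objects
      s3CompressionType: None
```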
processingJobName
Required
string
The name of the processing job. The name must be unique within an Amazon
Web Services Region in the Amazon Web Services account.
processingOutputConfig
Optional
object
Output configuration for the processing job.
processingOutputConfig.kmsKeyID
Optional
string
processingOutputConfig.outputs
Optional
array
processingOutputConfig.outputs.[]
Required
object
Describes one output of a processing job. Each processing output must specify
exactly one of s3Output or featureStoreOutput.
processingOutputConfig.outputs.[].featureStoreOutput
Optional
object
Configuration for processing job outputs in Amazon SageMaker Feature Store.
processingOutputConfig.outputs.[].featureStoreOutput.featureGroupName
Optional
string
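An output that writes to SageMaker Feature Store needs only the feature group name, as in this sketch. The group name is hypothetical and is assumed to exist already.

```yaml
spec:
  processingOutputConfig:
    outputs:
    - outputName: features                     # hypothetical output name
      featureStoreOutput:
        featureGroupName: my-feature-group     # hypothetical, assumed to exist
```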
processingOutputConfig.outputs.[].outputName
Optional
string
processingOutputConfig.outputs.[].s3Output
Optional
object
Configuration for uploading output data to Amazon S3 from the processing
container.
processingOutputConfig.outputs.[].s3Output.localPath
Optional
string
processingOutputConfig.outputs.[].s3Output.s3URI
Optional
string
processingOutputConfig.outputs.[].s3Output.s3UploadMode
Optional
string
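An S3 output configuration might look like the following. The KMS key alias, bucket, and output name are hypothetical. The top-level kmsKeyID applies to all outputs, and s3UploadMode can be EndOfJob (upload once the job finishes) or Continuous.

```yaml
spec:
  processingOutputConfig:
    kmsKeyID: alias/my-output-key                  # hypothetical KMS key alias; optional
    outputs:
    - outputName: train                            # hypothetical output name
      s3Output:
        localPath: /opt/ml/processing/output/train # directory the container writes to
        s3URI: s3://my-bucket/processed/train/     # hypothetical destination
        s3UploadMode: EndOfJob                     # upload after the job completes
```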
processingResources
Required
object
Identifies the resources (ML compute instances and ML storage volumes) to
deploy for a processing job. For distributed processing, specify more than
one instance.
processingResources.clusterConfig
Optional
object
Configuration for the cluster used to run a processing job.
processingResources.clusterConfig.instanceCount
Optional
integer
processingResources.clusterConfig.instanceType
Optional
string
processingResources.clusterConfig.volumeKMSKeyID
Optional
string
processingResources.clusterConfig.volumeSizeInGB
Optional
integer
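To run a job across more than one instance, clusterConfig could be set as in this sketch. The instance type, count, and volume size are illustrative values, not defaults; volumeKMSKeyID is omitted here.

```yaml
spec:
  processingResources:
    clusterConfig:
      instanceCount: 2              # more than one instance for distributed processing
      instanceType: ml.m5.xlarge    # illustrative instance type
      volumeSizeInGB: 50            # ML storage volume attached to each instance
```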
roleARN
Required
string
The Amazon Resource Name (ARN) of an IAM role that Amazon SageMaker can assume
to perform tasks on your behalf.
stoppingCondition
Optional
object
The time limit for how long the processing job is allowed to run.
stoppingCondition.maxRuntimeInSeconds
Optional
integer
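The stopping condition is a single time limit in seconds; for example, 86400 seconds caps the job at 24 hours. The value here is illustrative.

```yaml
spec:
  stoppingCondition:
    maxRuntimeInSeconds: 86400   # 24 hours * 3600 seconds
```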
tags
Optional
array
An array of key-value pairs. For more information, see Using Cost
Allocation Tags (https://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/cost-alloc-tags.html#allocation-whatURL)
in the Amazon Web Services Billing and Cost Management User Guide.
tags.[]
Required
object
A tag object that consists of a key and an optional value, used to manage
metadata for SageMaker Amazon Web Services resources.

You can add tags to notebook instances, training jobs, hyperparameter tuning jobs, batch transform jobs, models, labeling jobs, work teams, endpoint configurations, and endpoints. For more information on adding tags to SageMaker resources, see AddTags (https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_AddTags.html).

For more information on adding metadata to your Amazon Web Services resources with tagging, see Tagging Amazon Web Services resources (https://docs.aws.amazon.com/general/latest/gr/aws_tagging.html). For advice on best practices for managing Amazon Web Services resources with tagging, see Tagging Best Practices: Implement an Effective Amazon Web Services Resource Tagging Strategy (https://d1.awsstatic.com/whitepapers/aws-tagging-best-practices.pdf).
tags.[].key
Optional
string
tags.[].value
Optional
string
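Tags are a list of key and value objects, sketched below with hypothetical keys and values. Numeric-looking values are quoted so YAML keeps them as strings.

```yaml
spec:
  tags:
  - key: project              # hypothetical tag key
    value: churn-model        # hypothetical tag value
  - key: cost-center
    value: "1234"             # quoted so it stays a string
```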

Status

ackResourceMetadata: 
  arn: string
  ownerAccountID: string
  region: string
conditions:
- lastTransitionTime: string
  message: string
  reason: string
  status: string
  type: string
failureReason: string
processingJobStatus: string

Field | Description
ackResourceMetadata
Optional
object
All CRs managed by ACK have a common Status.ACKResourceMetadata member
that contains the resource sync state, account ownership, and constructed
ARN for the resource.
ackResourceMetadata.arn
Optional
string
ARN is the Amazon Resource Name for the resource. This is a
globally unique identifier and is set only by the ACK service controller
once the controller has orchestrated the creation of the resource, or
when it has verified that an “adopted” resource (a resource where the
ARN annotation was set by the Kubernetes user on the CR) exists and
matches the supplied CR’s Spec field values.
ackResourceMetadata.ownerAccountID
Required
string
OwnerAccountID is the AWS Account ID of the account that owns the
backend AWS service API resource.
ackResourceMetadata.region
Required
string
Region is the AWS region in which the resource exists or will exist.
conditions
Optional
array
All CRs managed by ACK have a common Status.Conditions member that
contains a collection of ackv1alpha1.Condition objects describing
the various terminal states of the CR and its backend AWS service API
resource.
conditions.[]
Required
object
Condition is the common struct used by all CRDs managed by ACK service
controllers to indicate terminal states of the CR and its backend AWS
service API resource.
conditions.[].message
Optional
string
A human-readable message indicating details about the transition.
conditions.[].reason
Optional
string
The reason for the condition’s last transition.
conditions.[].status
Optional
string
Status of the condition, one of True, False, Unknown.
conditions.[].type
Optional
string
Type is the type of the condition.
failureReason
Optional
string
A string, up to 1 KB in size, that contains the reason a processing job
failed, if it failed.
processingJobStatus
Optional
string
Provides the status of a processing job.