ProcessingJob

sagemaker.services.k8s.aws/v1alpha1

Type	Link
GoDoc	sagemaker-controller/apis/v1alpha1#ProcessingJob

Metadata

Property	Value
Scope	Namespaced
Kind	`ProcessingJob`
ListKind	`ProcessingJobList`
Plural	`processingjobs`
Singular	`processingjob`

An Amazon SageMaker processing job that is used to analyze data and evaluate models. For more information, see Process Data and Evaluate Models (https://docs.aws.amazon.com/sagemaker/latest/dg/processing-job.html).

Spec

appSpecification: 
  containerArguments:
  - string
  containerEntrypoint:
  - string
  imageURI: string
environment: {}
experimentConfig: 
  experimentName: string
  trialComponentDisplayName: string
  trialName: string
networkConfig: 
  enableInterContainerTrafficEncryption: boolean
  enableNetworkIsolation: boolean
  vpcConfig: 
    securityGroupIDs:
    - string
    subnets:
    - string
processingInputs:
- appManaged: boolean
  datasetDefinition: 
    athenaDatasetDefinition: 
      catalog: string
      database: string
      kmsKeyID: string
      outputCompression: string
      outputFormat: string
      outputS3URI: string
      queryString: string
      workGroup: string
    dataDistributionType: string
    inputMode: string
    localPath: string
    redshiftDatasetDefinition: 
      clusterID: string
      clusterRoleARN: string
      database: string
      dbUser: string
      kmsKeyID: string
      outputCompression: string
      outputFormat: string
      outputS3URI: string
      queryString: string
  inputName: string
  s3Input: 
    localPath: string
    s3CompressionType: string
    s3DataDistributionType: string
    s3DataType: string
    s3InputMode: string
    s3URI: string
processingJobName: string
processingOutputConfig: 
  kmsKeyID: string
  outputs:
  - appManaged: boolean
    featureStoreOutput: 
      featureGroupName: string
    outputName: string
    s3Output: 
      localPath: string
      s3URI: string
      s3UploadMode: string
processingResources: 
  clusterConfig: 
    instanceCount: integer
    instanceType: string
    volumeKMSKeyID: string
    volumeSizeInGB: integer
roleARN: string
stoppingCondition: 
  maxRuntimeInSeconds: integer
tags:
- key: string
  value: string
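
As a rough illustration of how the schema above maps onto a concrete resource, the following is a minimal sketch of a manifest with one S3 input and one S3 output. Every name, ARN, account ID, image URI, bucket, and path is a hypothetical placeholder, and the sketch assumes the processing script is already baked into the container image.

```yaml
# Hypothetical example: all names, ARNs, URIs, and paths are placeholders.
apiVersion: sagemaker.services.k8s.aws/v1alpha1
kind: ProcessingJob
metadata:
  name: example-processing-job          # Kubernetes object name
spec:
  # Must match ^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}$ and be unique per Region and account.
  processingJobName: example-processing-job
  roleARN: arn:aws:iam::123456789012:role/example-sagemaker-role
  appSpecification:
    imageURI: 123456789012.dkr.ecr.us-west-2.amazonaws.com/example-image:latest
    containerEntrypoint:
    - python3
    - /app/process.py                   # assumed to exist inside the image
  processingResources:
    clusterConfig:
      instanceCount: 1
      instanceType: ml.m5.large
      volumeSizeInGB: 20
  processingInputs:
  - inputName: raw-data
    s3Input:                            # each input uses s3Input or datasetDefinition, not both
      s3URI: s3://example-bucket/input/
      localPath: /opt/ml/processing/input
      s3DataType: S3Prefix
      s3InputMode: File
  processingOutputConfig:
    outputs:
    - outputName: processed-data
      s3Output:                         # each output uses s3Output or featureStoreOutput, not both
        s3URI: s3://example-bucket/output/
        localPath: /opt/ml/processing/output
        s3UploadMode: EndOfJob
```

Assuming the SageMaker ACK controller is installed in the cluster and the manifest is saved as `processing-job.yaml` (a hypothetical file name), it could be submitted with `kubectl apply -f processing-job.yaml`.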

Field	Description
appSpecification Required	object Configures the processing job to run a specified Docker container image.
appSpecification.containerArguments Optional	array
appSpecification.containerArguments.[] Required	string
appSpecification.containerEntrypoint.[] Required	string
environment Optional	object The environment variables to set in the Docker container. Up to 100 key-value entries in the map are supported.
experimentConfig Optional	object Associates a SageMaker job as a trial component with an experiment and trial. Specified when you call the following APIs: CreateProcessingJob (https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateProcessingJob.html), CreateTrainingJob (https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTrainingJob.html), and CreateTransformJob (https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTransformJob.html).
experimentConfig.experimentName Optional	string
experimentConfig.trialComponentDisplayName Optional	string
experimentConfig.trialName Optional	string
networkConfig Optional	object Networking options for a processing job, such as whether to allow inbound and outbound network calls to and from processing containers, and the VPC subnets and security groups to use for VPC-enabled processing jobs.
networkConfig.enableInterContainerTrafficEncryption Optional	boolean
networkConfig.enableNetworkIsolation Optional	boolean
networkConfig.vpcConfig Optional	object Specifies an Amazon Virtual Private Cloud (VPC) that your SageMaker jobs, hosted models, and compute resources have access to. You can control access to and from your resources by configuring a VPC. For more information, see Give SageMaker Access to Resources in your Amazon VPC (https://docs.aws.amazon.com/sagemaker/latest/dg/infrastructure-give-access.html).
networkConfig.vpcConfig.securityGroupIDs Optional	array
networkConfig.vpcConfig.securityGroupIDs.[] Required	string
networkConfig.vpcConfig.subnets.[] Required	string
processingInputs.[] Required	object The inputs for a processing job. The processing input must specify exactly one of either S3Input or DatasetDefinition types.
processingInputs.[].datasetDefinition Optional	object Configuration for Dataset Definition inputs. The Dataset Definition input must specify exactly one of either AthenaDatasetDefinition or RedshiftDatasetDefinition types.
processingInputs.[].datasetDefinition.athenaDatasetDefinition Optional	object Configuration for Athena Dataset Definition input.
processingInputs.[].datasetDefinition.athenaDatasetDefinition.catalog Optional	string The name of the data catalog used in Athena query execution.
processingInputs.[].datasetDefinition.athenaDatasetDefinition.database Optional	string The name of the database used in the Athena query execution.
processingInputs.[].datasetDefinition.athenaDatasetDefinition.kmsKeyID Optional	string
processingInputs.[].datasetDefinition.athenaDatasetDefinition.outputCompression Optional	string The compression used for Athena query results.
processingInputs.[].datasetDefinition.athenaDatasetDefinition.outputFormat Optional	string The data storage format for Athena query results.
processingInputs.[].datasetDefinition.athenaDatasetDefinition.outputS3URI Optional	string
processingInputs.[].datasetDefinition.athenaDatasetDefinition.queryString Optional	string The SQL query statements to be executed.
processingInputs.[].datasetDefinition.athenaDatasetDefinition.workGroup Optional	string The name of the workgroup in which the Athena query is being started.
processingInputs.[].datasetDefinition.dataDistributionType Optional	string
processingInputs.[].datasetDefinition.inputMode Optional	string
processingInputs.[].datasetDefinition.localPath Optional	string
processingInputs.[].datasetDefinition.redshiftDatasetDefinition Optional	object Configuration for Redshift Dataset Definition input.
processingInputs.[].datasetDefinition.redshiftDatasetDefinition.clusterID Optional	string The Redshift cluster Identifier.
processingInputs.[].datasetDefinition.redshiftDatasetDefinition.clusterRoleARN Optional	string
processingInputs.[].datasetDefinition.redshiftDatasetDefinition.database Optional	string The name of the Redshift database used in Redshift query execution.
processingInputs.[].datasetDefinition.redshiftDatasetDefinition.dbUser Optional	string The database user name used in Redshift query execution.
processingInputs.[].datasetDefinition.redshiftDatasetDefinition.kmsKeyID Optional	string
processingInputs.[].datasetDefinition.redshiftDatasetDefinition.outputCompression Optional	string The compression used for Redshift query results.
processingInputs.[].datasetDefinition.redshiftDatasetDefinition.outputFormat Optional	string The data storage format for Redshift query results.
processingInputs.[].datasetDefinition.redshiftDatasetDefinition.outputS3URI Optional	string
processingInputs.[].datasetDefinition.redshiftDatasetDefinition.queryString Optional	string The SQL query statements to be executed.
processingInputs.[].inputName Optional	string
processingInputs.[].s3Input Optional	object Configuration for downloading input data from Amazon S3 into the processing container.
processingInputs.[].s3Input.localPath Optional	string
processingInputs.[].s3Input.s3CompressionType Optional	string
processingInputs.[].s3Input.s3DataDistributionType Optional	string
processingInputs.[].s3Input.s3DataType Optional	string
processingInputs.[].s3Input.s3InputMode Optional	string
processingInputs.[].s3Input.s3URI Optional	string
processingJobName Required	string The name of the processing job. The name must be unique within an Amazon Web Services Region in the Amazon Web Services account. Regex Pattern: `^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}$`
processingOutputConfig Optional	object Output configuration for the processing job.
processingOutputConfig.kmsKeyID Optional	string
processingOutputConfig.outputs Optional	array
processingOutputConfig.outputs.[] Required	object Describes the results of a processing job. The processing output must specify exactly one of either S3Output or FeatureStoreOutput types.
processingOutputConfig.outputs.[].featureStoreOutput Optional	object Configuration for processing job outputs in Amazon SageMaker Feature Store.
processingOutputConfig.outputs.[].featureStoreOutput.featureGroupName Optional	string
processingOutputConfig.outputs.[].outputName Optional	string
processingOutputConfig.outputs.[].s3Output Optional	object Configuration for uploading output data to Amazon S3 from the processing container.
processingOutputConfig.outputs.[].s3Output.localPath Optional	string
processingOutputConfig.outputs.[].s3Output.s3URI Optional	string
processingOutputConfig.outputs.[].s3Output.s3UploadMode Optional	string
processingResources Required	object Identifies the resources, ML compute instances, and ML storage volumes to deploy for a processing job. In distributed training, you specify more than one instance.
processingResources.clusterConfig Optional	object Configuration for the cluster used to run a processing job.
processingResources.clusterConfig.instanceCount Optional	integer
processingResources.clusterConfig.instanceType Optional	string
processingResources.clusterConfig.volumeKMSKeyID Optional	string
processingResources.clusterConfig.volumeSizeInGB Optional	integer
roleARN Required	string The Amazon Resource Name (ARN) of an IAM role that Amazon SageMaker can assume to perform tasks on your behalf. Regex Pattern: `^arn:aws[a-z\-]*:iam::\d{12}:role/?[a-zA-Z_0-9+=,.@\-_/]+$`
stoppingCondition Optional	object The time limit for how long the processing job is allowed to run.
stoppingCondition.maxRuntimeInSeconds Optional	integer
tags Optional	array (Optional) An array of key-value pairs. For more information, see Using Cost Allocation Tags (https://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/cost-alloc-tags.html#allocation-whatURL) in the Amazon Web Services Billing and Cost Management User Guide.
tags.[] Required	object A tag object that consists of a key and an optional value, used to manage metadata for SageMaker Amazon Web Services resources. You can add tags to notebook instances, training jobs, hyperparameter tuning jobs, batch transform jobs, models, labeling jobs, work teams, endpoint configurations, and endpoints. For more information on adding tags to SageMaker resources, see AddTags (https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_AddTags.html). For more information on adding metadata to your Amazon Web Services resources with tagging, see Tagging Amazon Web Services resources (https://docs.aws.amazon.com/general/latest/gr/aws_tagging.html). For advice on best practices for managing Amazon Web Services resources with tagging, see Tagging Best Practices: Implement an Effective Amazon Web Services Resource Tagging Strategy (https://d1.awsstatic.com/whitepapers/aws-tagging-best-practices.pdf).
tags.[].key Optional	string
tags.[].value Optional	string
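
The "exactly one of" rules in the field descriptions can be easier to follow with a concrete fragment. The sketch below is a partial `spec` (not a complete manifest) that uses a Dataset Definition input backed by Athena, a Feature Store output, and several optional blocks. All identifiers, queries, buckets, and IDs are hypothetical placeholders, and setting `appManaged: true` on the Feature Store output is an assumption based on the SageMaker API describing that output type as supported only when AppManaged is specified.

```yaml
# Partial spec for illustration only; all values are placeholders.
spec:
  processingInputs:
  - inputName: athena-input
    datasetDefinition:                  # chosen instead of s3Input
      localPath: /opt/ml/processing/athena
      inputMode: File
      dataDistributionType: FullyReplicated
      athenaDatasetDefinition:          # chosen instead of redshiftDatasetDefinition
        catalog: AwsDataCatalog
        database: example_db
        queryString: SELECT * FROM example_table
        outputS3URI: s3://example-bucket/athena-results/
        outputFormat: PARQUET
        workGroup: primary
  processingOutputConfig:
    outputs:
    - outputName: features
      appManaged: true                  # assumption: needed for Feature Store outputs
      featureStoreOutput:               # chosen instead of s3Output
        featureGroupName: example-feature-group
  networkConfig:
    enableNetworkIsolation: false
    enableInterContainerTrafficEncryption: true
    vpcConfig:
      securityGroupIDs:
      - sg-0123456789abcdef0
      subnets:
      - subnet-0123456789abcdef0
  stoppingCondition:
    maxRuntimeInSeconds: 3600           # stop the job after at most one hour
  environment:
    LOG_LEVEL: info                     # free-form key-value map, up to 100 entries
  tags:
  - key: team
    value: data-science
```

The required fields (`appSpecification`, `processingJobName`, `processingResources`, and `roleARN`) are omitted here for brevity and would still need to be present in a real manifest.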

Status

ackResourceMetadata: 
  arn: string
  ownerAccountID: string
  region: string
conditions:
- lastTransitionTime: string
  message: string
  reason: string
  status: string
  type: string
failureReason: string
processingJobStatus: string

Field	Description
ackResourceMetadata Optional	object All CRs managed by ACK have a common `Status.ACKResourceMetadata` member that is used to contain resource sync state, account ownership, and the constructed ARN for the resource.
ackResourceMetadata.arn Optional	string ARN is the Amazon Resource Name for the resource. This is a globally-unique identifier and is set only by the ACK service controller once the controller has orchestrated the creation of the resource OR when it has verified that an “adopted” resource (a resource where the ARN annotation was set by the Kubernetes user on the CR) exists and matches the supplied CR’s Spec field values. https://github.com/aws/aws-controllers-k8s/issues/270
ackResourceMetadata.ownerAccountID Required	string OwnerAccountID is the AWS Account ID of the account that owns the backend AWS service API resource.
ackResourceMetadata.region Required	string Region is the AWS region in which the resource exists or will exist.
conditions Optional	array All CRs managed by ACK have a common `Status.Conditions` member that contains a collection of `ackv1alpha1.Condition` objects that describe the various terminal states of the CR and its backend AWS service API resource.
conditions.[] Required	object Condition is the common struct used by all CRDs managed by ACK service controllers to indicate terminal states of the CR and its backend AWS service API resource.
conditions.[].message Optional	string A human readable message indicating details about the transition.
conditions.[].reason Optional	string The reason for the condition’s last transition.
conditions.[].status Optional	string Status of the condition, one of True, False, Unknown.
conditions.[].type Optional	string Type is the type of the Condition
failureReason Optional	string A string, up to one KB in size, that contains the reason a processing job failed, if it failed.
processingJobStatus Optional	string Provides the status of a processing job.
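
For orientation, a status block on a finished job might look roughly like the sketch below. This is illustrative only: the ARN, account ID, Region, and timestamp are placeholders, and the `ACK.ResourceSynced` condition type is an assumption about what the ACK runtime reports rather than something stated on this page.

```yaml
# Illustrative status only; values are placeholders, not real output.
status:
  ackResourceMetadata:
    arn: arn:aws:sagemaker:us-west-2:123456789012:processing-job/example-processing-job
    ownerAccountID: "123456789012"
    region: us-west-2
  conditions:
  - type: ACK.ResourceSynced            # assumed condition type
    status: "True"                      # one of True, False, Unknown
    lastTransitionTime: "2024-01-01T00:00:00Z"
  processingJobStatus: Completed
```

A status like this can typically be inspected with `kubectl get processingjob example-processing-job -o yaml`, where the object name is the hypothetical one used above. If a job fails, `failureReason` would be expected to carry the explanation described in the table.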
