API Inference
This document discusses how ACK introspects an AWS API model file and
determines which CustomResourceDefinitions (CRDs) to construct and what the
structure of those CRDs looks like.
The Kubernetes Resource Model
The Kubernetes Resource Model (KRM) is a set of standards
and naming conventions that govern how an Object may be created and
updated.
An Object includes some metadata about the object: a GroupVersionKind (GVK),
a Name, a Namespace, and zero or more Labels and Annotations.
In addition to this metadata, each Object has a Spec field, which is a
struct that contains the desired state of the Object. Objects are
typically denoted using YAML, like so:
apiVersion: s3.services.k8s.aws/v1alpha1
kind: Bucket
metadata:
  name: my-amazing-bucket
  annotations:
    pronounced-as: boo-kay
spec:
  name: my-amazing-bucket
Above, the Object has a GVK of “s3.services.k8s.aws/v1alpha1:Bucket” with an
internal-to-Kubernetes Name of “my-amazing-bucket” and a single
Annotation key/value pair “pronounced-as: boo-kay”.
The Spec field is a structure containing desired state fields about this
Bucket. You can see here that there is a Spec.Name field representing the
Bucket name that will be passed to the S3 CreateBucket API as the name of the
Bucket. Note that the Metadata.Name field value is the same as the
Spec.Name field value here, but there’s nothing mandatory about this.
When a Kubernetes user creates an Object, typically by passing some YAML to
the kubectl create or kubectl apply CLI command, the Kubernetes API server
reads the manifest and determines whether the supplied contents are valid.
To determine whether a manifest is valid, the Kubernetes API server must
look up the definition of the specified GroupVersionKind. For the
resources that ACK is concerned with, this means that the Kubernetes
API server will search for the CustomResourceDefinition (CRD) matching
the GroupVersionKind.
This CRD describes the fields that make up Objects of that particular
GroupVersionKind. These Objects are called CustomResources (CRs).
In the next sections we discuss:
- how ACK determines what will become a CRD
- how ACK determines the fields that go into each CRD’s Spec and Status
Which things become ACK Resources?
As mentioned in the code generation documentation, ACK reads AWS API model files when generating its API types and controller implementations. These model files are JSON files that contain important information about the structure of the AWS service API, including a set of Operation definitions (commonly called “Actions” in the official AWS API documentation) and a set of Shape definitions.
Some AWS APIs expose dozens (even hundreds!) of Operations.
Consider EC2’s API. It has over 400 separate Actions. Out of all those
Operations, how are we to tell which ones refer to something that we can model
as a Kubernetes CustomResource?
Well, we could look at the EC2 API’s list of Operations and manually decide which ones seem “resource-y”. Operations like “AdvertiseByoipCidr” and “AcceptTransitGatewayVpcAttachment” don’t seem very “resource-y”. Operations like “CreateKeyPair” and “DeleteKeyPair”, however, do seem like they would match a resource called “KeyPair”.
And this is actually how ACK decides what is a CustomResource and what isn’t.
It uses a simple heuristic: look through the list of Operations in the API
model file and select the ones that start with the string “Create”. If what
comes after the word “Create” describes a singular noun, then we create a
CustomResource of that Kind.
It really is that simple.
How is an ACK Resource Defined?
Let’s take a look at the CRD for ACK’s S3 Bucket (the
s3.services.k8s.aws/Bucket GroupKind (GK)), snipped slightly for brevity:
---
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: buckets.s3.services.k8s.aws
spec:
  group: s3.services.k8s.aws
  names:
    kind: Bucket
  scope: Namespaced
  versions:
  - name: v1alpha1
    schema:
      openAPIV3Schema:
        description: Bucket is the Schema for the Buckets API
        properties:
          apiVersion:
            type: string
          kind:
            type: string
          metadata:
            type: object
          spec:
            description: BucketSpec defines the desired state of Bucket
            properties:
              acl:
                type: string
              createBucketConfiguration:
                properties:
                  locationConstraint:
                    type: string
                type: object
              grantFullControl:
                type: string
              grantRead:
                type: string
              grantReadACP:
                type: string
              grantWrite:
                type: string
              grantWriteACP:
                type: string
              name:
                type: string
              objectLockEnabledForBucket:
                type: boolean
            required:
            - name
            type: object
          status:
            description: BucketStatus defines the observed state of Bucket
            properties:
              ackResourceMetadata:
                properties:
                  arn:
                    type: string
                  ownerAccountID:
                    type: string
                required:
                - ownerAccountID
                type: object
              conditions:
                items:
                  properties:
                    lastTransitionTime:
                      format: date-time
                      type: string
                    message:
                      type: string
                    reason:
                      type: string
                    status:
                      type: string
                    type:
                      type: string
                  required:
                  - status
                  - type
                  type: object
                type: array
              location:
                type: string
            required:
            - ackResourceMetadata
            - conditions
            type: object
        type: object
The above YAML representation of a CustomResourceDefinition (CRD) is actually
generated from a set of Go type definitions. These Go type definitions live in
each ACK service’s services/$SERVICE/apis/$VERSION directory.
This section of our documentation discusses how we create those Go type definitions.
The YAML itself is produced by the controller-gen crd CLI command and is a
convenient human-readable representation of the CustomResourceDefinition.
The Bucket CR’s Spec field is defined above as containing a set of fields:
“acl”, “createBucketConfiguration”, “name”, etc. Each field has a JSONSchema
type that corresponds with the Go type from the associated field member.
You will also notice that in addition to the definition of a Spec field,
there is also the definition of a Status field for the Bucket CRs. Above,
this Status contains fields that represent the “observed” state of the Bucket
CRs. The above shows three fields in the Bucket’s Status:
ackResourceMetadata, conditions and location.
You might be wondering how the ACK code generator determined which fields go
into the Bucket’s Spec and which fields go into the Bucket’s Status.
Well, it’s definitely not a manual process. Everything in ACK is code-generated and discovered by inspecting the AWS API model files.
The API model files live in the aws-sdk-go project. (Look
for the api-2.json files in the service-specific directories…)
Let’s take a look at a tiny bit of the AWS S3 API model file and
you can start to see how we identify the things that go into the Spec and
Status fields.
{
  "metadata":{
    "serviceId":"S3"
  },
  "operations":{
    "CreateBucket":{
      "name":"CreateBucket",
      "http":{
        "method":"PUT",
        "requestUri":"/{Bucket}"
      },
      "input":{"shape":"CreateBucketRequest"},
      "output":{"shape":"CreateBucketOutput"}
    }
  },
  "shapes":{
    "BucketCannedACL":{
      "type":"string",
      "enum":[
        "private",
        "public-read",
        "public-read-write",
        "authenticated-read"
      ]
    },
    "BucketName":{"type":"string"},
    "CreateBucketConfiguration":{
      "type":"structure",
      "members":{
        "LocationConstraint":{"shape":"BucketLocationConstraint"}
      }
    },
    "CreateBucketOutput":{
      "type":"structure",
      "members":{
        "Location":{
          "shape":"Location"
        }
      }
    },
    "CreateBucketRequest":{
      "type":"structure",
      "required":["Bucket"],
      "members":{
        "ACL":{
          "shape":"BucketCannedACL"
        },
        "Bucket":{
          "shape":"BucketName"
        },
        "CreateBucketConfiguration":{
          "shape":"CreateBucketConfiguration"
        },
        "GrantFullControl":{
          "shape":"GrantFullControl"
        },
        "GrantRead":{
          "shape":"GrantRead"
        },
        "GrantReadACP":{
          "shape":"GrantReadACP"
        },
        "GrantWrite":{
          "shape":"GrantWrite"
        },
        "GrantWriteACP":{
          "shape":"GrantWriteACP"
        },
        "ObjectLockEnabledForBucket":{
          "shape":"ObjectLockEnabledForBucket"
        }
      }
    }
  }
}
As mentioned above, we determine what things in an API are
CustomResourceDefinitions by looking for Operations that begin with the
string “Create” and where the remainder of the Operation name refers to a
singular noun.
For the S3 API, there happens to be only a single Operation that begins with
the string “Create”, and it happens to be “CreateBucket”.
And since “Bucket” refers to a singular noun, that is the
CustomResourceDefinition that is identified by the ACK code generator.
The ACK code generator writes a file apis/v1alpha1/bucket.go
that contains a BucketSpec struct definition, a BucketStatus struct
definition and a Bucket struct definition that ties the Spec and Status
together into our CRD.
In determining the structure of the s3.services.k8s.aws/Bucket CRD, the ACK
code generator inspects the Shapes referred to by the “input” and “output”
members of the “CreateBucket” Operation: “CreateBucketRequest” and
“CreateBucketOutput” respectively.
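
To make that lookup concrete, the following Go sketch decodes a cut-down model excerpt with encoding/json and resolves the input and output Shapes of a Create Operation. The struct and function names (apiModel, createShapes and so on) are hypothetical and cover only the keys used here; ACK's real generator is more involved.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// shapeRef is a reference from an Operation or member to a named Shape.
type shapeRef struct {
	Shape string `json:"shape"`
}

// operation holds only the parts of an Operation definition used here.
type operation struct {
	Input  shapeRef `json:"input"`
	Output shapeRef `json:"output"`
}

// shape holds only the parts of a Shape definition used here.
type shape struct {
	Type    string              `json:"type"`
	Members map[string]shapeRef `json:"members"`
}

// apiModel mirrors the top-level "operations" and "shapes" keys.
type apiModel struct {
	Operations map[string]operation `json:"operations"`
	Shapes     map[string]shape     `json:"shapes"`
}

// createShapes returns the input and output Shapes of the Create Operation
// for the given Kind.
func createShapes(model []byte, kind string) (in, out shape, err error) {
	var m apiModel
	if err = json.Unmarshal(model, &m); err != nil {
		return in, out, err
	}
	op, ok := m.Operations["Create"+kind]
	if !ok {
		return in, out, fmt.Errorf("no Create%s operation in model", kind)
	}
	return m.Shapes[op.Input.Shape], m.Shapes[op.Output.Shape], nil
}

// trimmedModel is a hand-written excerpt for illustration, not the full model.
const trimmedModel = `{
  "operations": {
    "CreateBucket": {
      "input": {"shape": "CreateBucketRequest"},
      "output": {"shape": "CreateBucketOutput"}
    }
  },
  "shapes": {
    "CreateBucketOutput": {
      "type": "structure",
      "members": {"Location": {"shape": "Location"}}
    },
    "CreateBucketRequest": {
      "type": "structure",
      "members": {
        "ACL": {"shape": "BucketCannedACL"},
        "Bucket": {"shape": "BucketName"}
      }
    }
  }
}`

func main() {
	in, out, err := createShapes([]byte(trimmedModel), "Bucket")
	if err != nil {
		panic(err)
	}
	fmt.Println("input members:", len(in.Members))
	fmt.Println("output members:", len(out.Members))
}
```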
Determining the Spec fields
For the BucketSpec fields, we grab members of the Input shape. The
generated Go type definition for the BucketSpec ends up looking
like this:
// BucketSpec defines the desired state of Bucket
type BucketSpec struct {
	ACL                       *string                    `json:"acl,omitempty"`
	CreateBucketConfiguration *CreateBucketConfiguration `json:"createBucketConfiguration,omitempty"`
	GrantFullControl          *string                    `json:"grantFullControl,omitempty"`
	GrantRead                 *string                    `json:"grantRead,omitempty"`
	GrantReadACP              *string                    `json:"grantReadACP,omitempty"`
	GrantWrite                *string                    `json:"grantWrite,omitempty"`
	GrantWriteACP             *string                    `json:"grantWriteACP,omitempty"`
	// +kubebuilder:validation:Required
	Name                       *string `json:"name"`
	ObjectLockEnabledForBucket *bool   `json:"objectLockEnabledForBucket,omitempty"`
}
Let’s take a closer look at the BucketSpec fields.
The ACL, GrantFullControl, GrantRead, GrantReadACP, GrantWrite and
GrantWriteACP fields are simple *string types. However, if we look at the
CreateBucketRequest Shape definition in the API model file, we see that these
fields actually are differently-named Shapes, not *string. Why is this? Well,
the ACK code generator “flattens” some Shapes when it notices that a named
Shape is just an alias for a simple scalar type (like *string).
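
The flattening idea can be sketched in a few lines of Go. This is an illustration under stated assumptions rather than ACK's code: the shape struct, the scalarGoTypes table and goTypeFor are hypothetical names, the scalar mapping is assumed, and the Shape types in the example are inferred from the generated BucketSpec. Lists and maps are not handled.

```go
package main

import "fmt"

// shape is a minimal stand-in for a Shape definition in the API model file.
type shape struct {
	Type string
}

// scalarGoTypes maps scalar model types to Go pointer types. The exact
// mapping is an assumption made for this sketch.
var scalarGoTypes = map[string]string{
	"string":  "*string",
	"boolean": "*bool",
	"integer": "*int64",
}

// goTypeFor returns the Go type for a member that refers to shapeName.
// A Shape that merely aliases a scalar is flattened to the scalar's pointer
// type; a structure Shape keeps its own name as a pointer to a nested struct.
func goTypeFor(shapeName string, shapes map[string]shape) (string, error) {
	s, ok := shapes[shapeName]
	if !ok {
		return "", fmt.Errorf("unknown shape %q", shapeName)
	}
	if goType, ok := scalarGoTypes[s.Type]; ok {
		return goType, nil
	}
	if s.Type == "structure" {
		return "*" + shapeName, nil
	}
	return "", fmt.Errorf("unsupported shape type %q", s.Type)
}

func main() {
	shapes := map[string]shape{
		"GrantRead":                  {Type: "string"},
		"ObjectLockEnabledForBucket": {Type: "boolean"},
		"CreateBucketConfiguration":  {Type: "structure"},
	}
	names := []string{"GrantRead", "ObjectLockEnabledForBucket", "CreateBucketConfiguration"}
	for _, name := range names {
		goType, err := goTypeFor(name, shapes)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s -> %s\n", name, goType)
	}
}
```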
Why *string and not string? The reason for this lies in aws-sdk-go. All
types for all Shape members are pointer types, even when the underlying
data type is a simple scalar type like bool or int. Yes, even when
the field is required…
Note that even though the ACL field has a Shape of BucketCannedACL, that
Shape is actually just a string with a set of enumerated values. Enumerated
values are collected and written out by the ACK code generator into an
apis/v1alpha1/enums.go file, with content like this:
type BucketCannedACL string
const (
	BucketCannedACL_private            BucketCannedACL = "private"
	BucketCannedACL_public_read        BucketCannedACL = "public-read"
	BucketCannedACL_public_read_write  BucketCannedACL = "public-read-write"
	BucketCannedACL_authenticated_read BucketCannedACL = "authenticated-read"
)
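
The constant names above appear to follow a simple pattern: the Shape name, an underscore, then the enum value with characters that are not valid in a Go identifier replaced by underscores. The Go sketch below reproduces that pattern; the rule is inferred from this one example and enumConstName is a hypothetical helper, so treat it as an assumption rather than the generator's actual naming logic.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// enumConstName builds a Go constant name from a Shape name and one of its
// enumerated values. Letters and digits are kept; everything else becomes "_".
func enumConstName(shapeName, value string) string {
	var b strings.Builder
	b.WriteString(shapeName)
	b.WriteByte('_')
	for _, r := range value {
		if unicode.IsLetter(r) || unicode.IsDigit(r) {
			b.WriteRune(r)
		} else {
			b.WriteByte('_')
		}
	}
	return b.String()
}

func main() {
	values := []string{"private", "public-read", "public-read-write", "authenticated-read"}
	for _, v := range values {
		fmt.Printf("%s = %q\n", enumConstName("BucketCannedACL", v), v)
	}
}
```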
The CreateBucketConfiguration field is of type *CreateBucketConfiguration.
All this means is that the field refers to a nested struct. All struct type
definitions for CRD Spec or Status field members are placed by the ACK code
generator into an apis/v1alpha1/types.go file.
Here is a snippet of that file that contains the type definition for
the CreateBucketConfiguration struct:
type CreateBucketConfiguration struct {
	LocationConstraint *string `json:"locationConstraint,omitempty"`
}
Now, the Name field in the BucketSpec struct seems out of place, no? There
is no “Name” member of the CreateBucketRequest Shape, so why is there a
Name field in BucketSpec?
Well, this is an example of ACK’s code generator using some special
instructions contained in something called the generator.yaml (or “generator
config”) for the S3 service controller.
Each service in the services/ directory can have a generator.yaml file that
contains overrides and special instructions for how to interpret and transform
parts of the service’s API.
Here is part of the S3 service’s generator.yaml file:
resources:
  Bucket:
    renames:
      operations:
        CreateBucket:
          input_fields:
            Bucket: Name
As you can see, the generator config for the ACK S3 service controller is
renaming the CreateBucket Operation’s Input Shape Bucket field to Name.
We do this for some APIs to add a little consistency and a more
Kubernetes-native experience for the CRDs. In Kubernetes, there is a
Metadata.Name (internal Kubernetes name) and there is typically a Spec.Name
field which refers to the external Name of the resource. So, in order to
align the s3.services.k8s.aws/Bucket’s definition to be more Kubernetes-like,
we rename the Bucket field to Name.
We do this renaming for other things that produce a bit of a “stutter”, as well as where the name of a field does not conform to Go exported name constraints or naming best practices.
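
A minimal Go sketch of how such a rename could be applied when building Spec field names follows. The in-memory inputFieldRenames map stands in for the renames.operations.<Operation>.input_fields section of the generator config; it and specFieldName are hypothetical names, and the sketch skips YAML parsing entirely.

```go
package main

import "fmt"

// inputFieldRenames maps an Operation name to a table of
// original-member-name -> new-field-name entries.
type inputFieldRenames map[string]map[string]string

// specFieldName returns the Spec field name for an Input Shape member,
// applying a rename when one is configured for the Operation.
func specFieldName(op, member string, renames inputFieldRenames) string {
	if newName, ok := renames[op][member]; ok {
		return newName
	}
	return member
}

func main() {
	renames := inputFieldRenames{
		"CreateBucket": {"Bucket": "Name"},
	}
	for _, member := range []string{"ACL", "Bucket", "GrantRead"} {
		fmt.Printf("%s -> %s\n", member, specFieldName("CreateBucket", member, renames))
	}
}
```

Members without a configured rename pass through unchanged, so only the Bucket member ends up as Name.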
Determining the Status fields
Remember that fields in a CR’s Status struct are not mutable by normal
Kubernetes users. Instead, these fields represent the latest observed state of
a resource (instead of the desired state of that resource which is
represented by fields in the CR’s Spec struct).
The ACK code generator takes the members of the Create Operation’s Output
shape and puts those fields into the CR’s Status struct.
We assume that fields in the Output that have the same name as fields in the
Input shape for the Create Operation refer to the same resource field that was
set in the Spec. We are therefore only interested in fields in the
Output that are not in the Input.
Looking at the BucketStatus struct definition that was generated after
processing the S3 API model file, we find this:
// BucketStatus defines the observed state of Bucket
type BucketStatus struct {
	// All CRs managed by ACK have a common `Status.ACKResourceMetadata` member
	// that is used to contain resource sync state, account ownership,
	// constructed ARN for the resource
	ACKResourceMetadata *ackv1alpha1.ResourceMetadata `json:"ackResourceMetadata"`
	// All CRs managed by ACK have a common `Status.Conditions` member that
	// contains a collection of `ackv1alpha1.Condition` objects that describe
	// the various terminal states of the CR and its backend AWS service API
	// resource
	Conditions []*ackv1alpha1.Condition `json:"conditions"`
	Location   *string                  `json:"location,omitempty"`
}
Let’s discuss each of the fields shown above.
First, the ACKResourceMetadata field is included in every ACK CRD’s Status
field. It is a pointer to an ackv1alpha1.ResourceMetadata struct.
This struct contains some standard and important pieces of information about
the resource, including the AWS Resource Name (ARN) and the Owner AWS Account
ID.
The ARN is a globally-unique identifier for the resource in AWS. The Owner AWS Account ID is the 12-digit AWS account ID that is billed for the resource.
The Conditions field is also included in every ACK CRD’s Status field. It is
a slice of pointers to ackv1alpha1.Condition structs. The
Condition struct is responsible for conveying information about the latest
observed sync state of a resource, including any terminal condition states that
cause the resource to be “unsyncable”.
Next is the Location field. This field gets its definition from the S3
CreateBucketOutput.Location field:
"CreateBucketOutput":{
  "type":"structure",
  "members":{
    "Location":{
      "shape":"Location"
    }
  }
},