
aws-stepfunctions-tasks: state machine role is missing sagemaker:AddTags permission for SageMakerCreateTransformJob task #26012

@l3ku

Describe the bug

Using Python, I am trying to create a Step Functions state machine that runs an Amazon SageMaker batch transform job using the .sync version of the API, like this:

batch_inference_job = sfn_tasks.SageMakerCreateTransformJob(
    self,
    "BatchInferenceTransformJob",
    integration_pattern=sfn.IntegrationPattern.RUN_JOB,
    transform_job_name=sfn.JsonPath.string_at("$.transform_job_name"),
    model_name="xxx",
    ...
)

state_machine = sfn.StateMachine(
    self,
    "BatchInferencePipeline",
    state_machine_name="xxx",
    definition=batch_inference_job
)

Note that I am not providing anything in the role parameter when I instantiate the StateMachine. When I try to execute the state machine, I get the following type of error:

User: arn:aws:sts::xxx:assumed-role/xxx is not authorized to perform: sagemaker:AddTags on resource: arn:aws:sagemaker:eu-west-1:xxx:transform-job/xxx because no identity-based policy allows the sagemaker:AddTags action (Service: AmazonSageMaker; Status Code: 400; Error Code: AccessDeniedException; Request ID: xxx; Proxy: null)

When I take a closer look at which tags Step Functions is trying to set for the transform job (I am not setting any tags for the job myself), I see an AWS managed tag, which the Step Functions service presumably appends:

{
    "Key": "MANAGED_BY_AWS",
    "Value": "STARTED_BY_STEP_FUNCTIONS"
}

So, from my viewpoint, the role that CDK generates for the state machine should include, by default, a policy that allows the sagemaker:AddTags action. When I instead started the batch transform job with sfn.IntegrationPattern.REQUEST_RESPONSE, Step Functions did not try to set any tags, and submitting the job worked as expected.

Expected Behavior

The default role generated by CDK for the Step Functions state machine should have all the permissions needed to start a job when using integration_pattern=sfn.IntegrationPattern.RUN_JOB, including sagemaker:AddTags.

Current Behavior

I got an error when Step Functions tried to create the batch transform job:

User: arn:aws:sts::xxx:assumed-role/xxx is not authorized to perform: sagemaker:AddTags on resource: arn:aws:sagemaker:eu-west-1:xxx:transform-job/xxx because no identity-based policy allows the sagemaker:AddTags action (Service: AmazonSageMaker; Status Code: 400; Error Code: AccessDeniedException; Request ID: xxx; Proxy: null)

Reproduction Steps

batch_inference_job = sfn_tasks.SageMakerCreateTransformJob(
    self,
    "BatchInferenceTransformJob",
    integration_pattern=sfn.IntegrationPattern.RUN_JOB,
    transform_job_name=sfn.JsonPath.string_at("$.transform_job_name"),
    model_name="xxx",
    ...
)

state_machine = sfn.StateMachine(
    self,
    "BatchInferencePipeline",
    state_machine_name="xxx",
    definition=batch_inference_job
)

Possible Solution

Not tested, but it seems that the policies for the role are added in: https://github.com/aws/aws-cdk/blob/main/packages/aws-cdk-lib/aws-stepfunctions-tasks/lib/sagemaker/create-transform-job.ts#L273. A possible fix would be to add a policy statement there that allows sagemaker:AddTags.
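
Until the generated role includes the permission, a workaround on the user side might be to grant it explicitly. The sketch below is untested against a real deployment and only builds the IAM statement as a plain dictionary. The helper name add_tags_statement, the account ID 123456789012, and the wildcard job name are illustrative assumptions, not part of the report. The resource ARN follows the shape shown in the error message above.

```python
import json


def add_tags_statement(region: str, account: str, job_name_pattern: str = "*") -> dict:
    """Build an IAM statement that allows sagemaker:AddTags on transform jobs.

    Hypothetical helper: the ARN layout mirrors the one in the
    AccessDeniedException message from the report.
    """
    return {
        "Effect": "Allow",
        "Action": ["sagemaker:AddTags"],
        "Resource": [
            f"arn:aws:sagemaker:{region}:{account}:transform-job/{job_name_pattern}"
        ],
    }


# Placeholder region and account ID; substitute real values.
statement = add_tags_statement("eu-west-1", "123456789012")
print(json.dumps(statement, indent=2))

# Inside the stack from the report, the statement could then be attached
# to the role that CDK generated for the state machine, for example:
#   from aws_cdk import aws_iam as iam
#   state_machine.add_to_role_policy(iam.PolicyStatement.from_json(statement))
```

Scoping the resource to transform-job ARNs keeps the extra permission narrow; widening it to other SageMaker resource types would be a separate decision.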

Additional Information/Context

No response

CDK CLI Version

2.81.0

Framework Version

No response

Node.js Version

v18.16.0

OS

MacOS 12.5

Language

Python

Language Version

3.10.9

Other information

No response
