(stepfunctions-tasks): AthenaStartQueryExecution construct creates insufficient IAM PolicyDocument #25875

@svbfromnl

Describe the bug

When the following CDK (Python) snippet is synthesized, the resulting IAM policy does not allow the query results to be written to the designated output_bucket location:

from aws_cdk import aws_s3 as s3
from aws_cdk import aws_stepfunctions as sfn
from aws_cdk import aws_stepfunctions_tasks as sfn_tasks

# Inside a Stack's __init__: self is the Stack, and query, database and
# output_bucket (the bucket name, a str) are defined elsewhere in that method.
start_query_execution_job = sfn_tasks.AthenaStartQueryExecution(
    self,
    "Start Athena Query",
    query_string=query,
    integration_pattern=sfn.IntegrationPattern.RUN_JOB,
    query_execution_context=sfn_tasks.QueryExecutionContext(database_name=database),
    result_configuration=sfn_tasks.ResultConfiguration(
        encryption_configuration=sfn_tasks.EncryptionConfiguration(
            encryption_option=sfn_tasks.EncryptionOption.S3_MANAGED,
        ),
        # Results are expected under s3://<output_bucket>/results/
        output_location=s3.Location(bucket_name=output_bucket, object_key="results"),
    ),
)

Expected Behavior

I expect the Resource of the S3 write statement to end in a slash and an asterisk (/*), so that the required actions are allowed on the objects inside the results folder:

{
  "Action": [
    "s3:AbortMultipartUpload",
    "s3:ListBucketMultipartUploads",
    "s3:ListMultipartUploadParts",
    "s3:PutObject"
  ],
  "Effect": "Allow",
  "Resource": "arn:aws:s3:::S3_BUCKET_NAME/results/*"
}

Current Behavior

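In the synthesized policy, the Resource for these S3 write actions does not end in /*. It names the results prefix itself, which does not match the result objects written under that prefix, so writing the output fails with a permission error: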

    "PolicyDocument": {
     "Statement": [
      {
       "Action": [
        "athena:getDataCatalog",
        "athena:getQueryExecution",
        "athena:startQueryExecution"
       ],
       "Effect": "Allow",
       "Resource": [
        "arn:aws:athena:us-east-1:317XXXXXX577:datacatalog/AwsDataCatalog",
        "arn:aws:athena:us-east-1:317XXXXXX577:workgroup/primary"
       ]
      },
      {
       "Action": [
        "athena:getQueryResults",
        "lakeformation:GetDataAccess",
        "s3:CreateBucket",
        "s3:GetBucketLocation",
        "s3:GetObject",
        "s3:ListBucket"
       ],
       "Effect": "Allow",
       "Resource": "*"
      },
      {
       "Action": [
        "s3:AbortMultipartUpload",
        "s3:ListBucketMultipartUploads",
        "s3:ListMultipartUploadParts",
        "s3:PutObject"
       ],
       "Effect": "Allow",
       "Resource": "arn:aws:s3:::S3_BUCKET_NAME/results"
      },

As a result, the Athena query fails with the following error:

Access denied when writing output to url: s3://S3_BUCKET_NAME/results/2d282a78-6b8b-48c3-aaf2-819e867ca3dc.csv . Please ensure you are allowed to access the S3 bucket. If specifying an expected bucket owner, confirm the bucket is owned by the expected account. If you are encrypting query results with KMS key, please ensure you are allowed to access your KMS key
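The error follows from how IAM matches Resource ARNs: a pattern without a wildcard must equal the object ARN exactly, so ".../results" does not cover ".../results/<id>.csv". The sketch below models this with a hypothetical helper, iam_resource_matches, that treats "*" as any run of characters (including "/") and "?" as one character. It is a simplified illustration that matches the whole ARN as one string, not the real IAM evaluation engine.

```python
import re


def iam_resource_matches(pattern: str, arn: str) -> bool:
    """Simplified model of IAM Resource matching (hypothetical helper).

    '*' matches any run of characters (including '/'), '?' matches exactly
    one character, and every other character must match literally.
    """
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, arn, flags=re.DOTALL) is not None


# Object ARN of the result file named in the Access denied error above.
result_object = (
    "arn:aws:s3:::S3_BUCKET_NAME/results/"
    "2d282a78-6b8b-48c3-aaf2-819e867ca3dc.csv"
)

current = "arn:aws:s3:::S3_BUCKET_NAME/results"     # Resource CDK synthesizes
expected = "arn:aws:s3:::S3_BUCKET_NAME/results/*"  # Resource the issue expects

print(iam_resource_matches(current, result_object))   # False: exact match only
print(iam_resource_matches(expected, result_object))  # True: '/*' covers objects
```

Because "*" also spans "/", the expected pattern would cover nested keys such as results/2023/x.csv as well.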

Reproduction Steps

Synthesize the snippet shown under "Describe the bug" and run the resulting state machine; the Athena query fails with the access-denied error shown under "Current Behavior".

Possible Solution

No response

Additional Information/Context

No response

CDK CLI Version

2.82.0 (build 3a8648a)

Framework Version

No response

Node.js Version

Node.js v20.1.0

OS

macOS Ventura 13.4

Language

Python

Language Version

Python 3.11.0

Other information

Setting object_key to "results/*" produces the expected IAM policy, but the "/*" then also becomes part of the output location, so the result files are written under a literal "*" folder:
s3://S3_BUCKET_NAME/results/*/b7f41696-c586-49ec-a915-51e7e1379110.csv
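This workaround fails because a single object_key value feeds both the Athena output location and the IAM resource, and the two need different shapes: the location wants a plain prefix, while the policy wants that prefix followed by "/*". Below is a minimal sketch of deriving both from one prefix. derive_output_paths is a hypothetical helper, and the "s3://bucket/key/" location format is an assumption inferred from the result paths quoted in this issue, not taken from the CDK source.

```python
def derive_output_paths(bucket_name: str, object_key: str) -> tuple[str, str]:
    """Return (Athena output location, IAM resource ARN) for one prefix.

    Hypothetical helper: both strings come from the same prefix, but only
    the IAM resource gets the trailing '/*' wildcard.
    """
    prefix = object_key.strip("/")  # normalize "results", "results/", "/results"
    output_location = f"s3://{bucket_name}/{prefix}/"
    resource_arn = f"arn:aws:s3:::{bucket_name}/{prefix}/*"
    return output_location, resource_arn


location, arn = derive_output_paths("S3_BUCKET_NAME", "results")
print(location)  # s3://S3_BUCKET_NAME/results/
print(arn)       # arn:aws:s3:::S3_BUCKET_NAME/results/*

# Passing the wildcard through object_key leaks it into the output location,
# which reproduces the literal "results/*/" folder observed above.
bad_location, _ = derive_output_paths("S3_BUCKET_NAME", "results/*")
print(bad_location)  # s3://S3_BUCKET_NAME/results/*/
```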

Labels: bug, effort/small, good first issue, p1
