
feat!: Unified variables and adds support for IAM policies #341

Merged
happyhuman merged 5 commits into main from infra-vars on Apr 12, 2022

adlersantos (Member) commented Apr 12, 2022

Description

Unified variables

This PR adds support for an optional variables file, .vars.{ENV}.yaml, in each dataset folder. The file can hold the values of all the variables needed by that dataset's infrastructure and pipeline configurations.

The YAML file supports two namespaces:

  • infra: a set of key-value pairs that are copied as Terraform variables into infra/terraform.tfvars
  • pipelines: a JSON object that contains dataset-specific variables (Airflow variables), copied to .{env}/datasets/{DATASET}/pipelines/{dataset}_variables.json
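As a rough illustration of how the two namespaces might be split, here is a minimal Python sketch. It assumes the YAML file has already been parsed into a dict, that every infra key is a valid Terraform identifier written verbatim as a name = value line of terraform.tfvars, and that the pipelines object is written unchanged as the variables JSON. The helper name split_env_vars is hypothetical and is not taken from the PR's scripts.

```python
import json
from typing import Any, Dict, Tuple


def split_env_vars(env_vars: Dict[str, Any]) -> Tuple[str, str]:
    """Render the two namespaces of a parsed .vars.{ENV}.yaml file.

    Returns (tfvars_text, pipeline_json_text). Hypothetical helper for
    illustration only, not the PR's actual implementation.
    """
    infra = env_vars.get("infra") or {}
    pipelines = env_vars.get("pipelines") or {}

    # terraform.tfvars: one "name = value" assignment per infra key.
    # For simple values, json.dumps output is also valid HCL2 syntax
    # (strings containing "${" or "%{" would need escaping, not handled here).
    tfvars_lines = [f"{key} = {json.dumps(value)}" for key, value in infra.items()]
    tfvars_text = "\n".join(tfvars_lines) + ("\n" if tfvars_lines else "")

    # {dataset}_variables.json: the pipelines object serialized as JSON,
    # in a form that can be imported as Airflow variables.
    pipeline_json_text = json.dumps(pipelines, indent=2)
    return tfvars_text, pipeline_json_text
```

The sketch reuses JSON serialization because plain strings, numbers, booleans, lists, and maps written by json.dumps are also accepted by HCL2, which avoids a separate HCL writer for simple variable values.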

Because these YAML files might contain sensitive information, they aren't checked into the repo.

Support for IAM policies for GCS buckets and BQ datasets

This PR also adds support for attaching IAM policies to the Terraform resource definitions of GCS buckets and BQ datasets, configured through the YAML variables file described above. The convention is to use

infra:
  iam_policies:
    storage_buckets:
      bucket_name:
        - role: roles/storage.objectViewer
          members:
            - user:some-user@google.com
    bigquery_datasets:
      dataset_name:
        - role: roles/bigquery.dataViewer
          members:
            - allAuthenticatedUsers
            - user:another-user@example.com

in the YAML variables file to associate a list of IAM roles with a specific GCS bucket or BQ dataset. These roles are included in the generated Terraform files as IAM policy resources.
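One way to consume this structure is to flatten it into one binding per resource and role before rendering any Terraform. The minimal Python sketch below does that for the already-parsed infra mapping; the Binding type, the flatten_iam_policies name, and the member-format check are illustrative assumptions, not code from the PR.

```python
from typing import Any, Dict, List, NamedTuple

# Member formats accepted by Google Cloud IAM bindings (not exhaustive).
_SPECIAL_MEMBERS = {"allUsers", "allAuthenticatedUsers"}
_MEMBER_PREFIXES = ("user:", "group:", "serviceAccount:", "domain:")


class Binding(NamedTuple):
    resource_type: str  # "storage_buckets" or "bigquery_datasets"
    resource_name: str  # bucket name or dataset name
    role: str
    members: List[str]


def flatten_iam_policies(infra: Dict[str, Any]) -> List[Binding]:
    """Flatten infra.iam_policies into one Binding per (resource, role) entry."""
    bindings: List[Binding] = []
    policies = infra.get("iam_policies") or {}
    for resource_type in ("storage_buckets", "bigquery_datasets"):
        for name, entries in (policies.get(resource_type) or {}).items():
            for entry in entries:
                members = list(entry["members"])
                # Reject member strings that match none of the known formats.
                for member in members:
                    if member not in _SPECIAL_MEMBERS and not member.startswith(_MEMBER_PREFIXES):
                        raise ValueError(f"unrecognized IAM member {member!r} for {name}")
                bindings.append(Binding(resource_type, name, entry["role"], members))
    return bindings
```

Each binding maps naturally onto a binding block of Terraform's google_iam_policy data source, whose policy_data can then be attached with google_storage_bucket_iam_policy or google_bigquery_dataset_iam_policy. Note that these policy resources are authoritative: they replace any existing IAM policy on the bucket or dataset.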

Checklist

Note: If an item applies to you, all of its sub-items must be fulfilled

  • (Required) This pull request is appropriately labeled
  • Please merge this pull request after it's approved
  • I'm adding or editing a feature
    • I have updated the README accordingly
    • I have added tests for the feature
  • I'm adding or editing a dataset
    • The Google Cloud Datasets team is aware of the proposed dataset
    • I put all my code inside datasets/<DATASET_NAME> and nothing outside of that directory
  • I'm adding/editing documentation
  • I'm submitting a bugfix
    • I have added tests to my bugfix (see the tests folder)
  • I'm refactoring or cleaning up some code

adlersantos added the "feature request", "cleanup", and "revision: feature" labels on Apr 12, 2022
adlersantos requested a review from happyhuman on April 12, 2022 at 15:08

happyhuman (Contributor) left a comment

LGTM. Just one comment about open(...) usage.

Review comment on scripts/deploy_dag.py:
gcs_uri = f"gs://{composer_bucket}/data/variables/{filename}"
pipeline_vars_file = f"{dataset_id}_variables.json"
env_vars_file = DATASETS_PATH / dataset_id / f".vars{env_path.name}.yaml"
env_vars = yaml.load(open(env_vars_file)) if env_vars_file.exists() else {}

happyhuman (Contributor):

It may be better to open the file using a with statement.
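Following the review suggestion, a minimal sketch of loading the variables file inside a with statement might look like this. The function name load_env_vars is hypothetical, and the YAML parser is passed in as a parameter because the excerpt does not show which YAML library scripts/deploy_dag.py imports.

```python
from pathlib import Path
from typing import IO, Any, Callable, Dict


def load_env_vars(vars_file: Path, parse: Callable[[IO[str]], Any]) -> Dict[str, Any]:
    """Load a dataset's .vars.{ENV}.yaml file, or return {} if it does not exist.

    `parse` is the YAML loader (for example PyYAML's yaml.safe_load); it is
    injected so this sketch does not assume a particular YAML library.
    """
    if not vars_file.exists():
        return {}
    # The with statement closes the file as soon as parsing finishes, even if
    # `parse` raises, instead of leaving the handle for garbage collection.
    with open(vars_file, encoding="utf-8") as f:
        # An empty YAML document parses to None, so fall back to {}.
        return parse(f) or {}
```

With PyYAML, for example, the call would be load_env_vars(env_vars_file, yaml.safe_load). Returning {} for an empty file mirrors the original expression's fallback for a missing file.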

happyhuman merged commit c4a45a0 into main on Apr 12, 2022
happyhuman deleted the infra-vars branch on April 12, 2022 at 17:16