Skip to content

Enable usage of GCS URL as Module.Source#1523

Merged
mr0re1 merged 1 commit into
GoogleCloudPlatform:developfrom
mr0re1:getter
Oct 25, 2023
Merged

Enable usage of GCS URL as Module.Source#1523
mr0re1 merged 1 commit into
GoogleCloudPlatform:developfrom
mr0re1:getter

Conversation

@mr0re1

@mr0re1 mr0re1 commented Jun 29, 2023

Copy link
Copy Markdown
Collaborator
  • Add GCSBucket to list of available detectors and getters;
  • Rename GitSourceReader -> GoGetterSourceReader.
  • Update go-getter dependency v1.7.2 -> v1.7.3

https://developer.hashicorp.com/terraform/language/modules/sources#gcs-bucket

Usage:

source: gcs::[https://www.googleapis.com/storage/v1/<bucket>/<path_to_module_dir](https://www.googleapis.com/storage/v1/%3Cbucket%3E/%3Cpath_to_module_dir)>

Note:

  • Doesn't support package-notation (//) at the bucket level,
    a dir in root of the bucket is highest possible level for a package.

Testing done:

$ gsutil ls gs://<bucket>/dir_1/dir_2
gs://<bucket>/dir_1/dir_2/
gs://<bucket>/dir_1/dir_2/custom-image/
gs://<bucket>/dir_1/dir_2/pre-existing-vpc/
gs://<bucket>/dir_1/dir_2/vpc/
---
blueprint_name: tst
vars:
  project_id:  <project>
  deployment_name: tst
  region: us-central1
deployment_groups:
- group: primary
  modules:
  - id: mod
    source: <source>

$ make && rm -rf tst && time ./ghpc create test.yaml

$ make && rm -rf tst && time ./ghpc create test.yaml


* `gcs::[https://www.googleapis.com/storage/v1/<bucket>/dir_1/dir_2/vpc`](https://www.googleapis.com/storage/v1/%3Cbucket%3E/dir_1/dir_2/vpc%60) _no package_
```sh
...
real    0m1.536s
$ terraform -chdir=tst/primary init
...
Downloading gcs::[https://www.googleapis.com/storage/v1/<bucket>/dir_1/dir_2/vpc](https://www.googleapis.com/storage/v1/%3Cbucket%3E/dir_1/dir_2/vpc) for mod...
- mod in .terraform/modules/mod
... # OK
...
real    0m1.538s
...
$ terraform -chdir=tst/primary init
...
Downloading gcs::[https://www.googleapis.com/storage/v1/<bucket>/dir_1/dir_2/vpc/](https://www.googleapis.com/storage/v1/%3Cbucket%3E/dir_1/dir_2/vpc/) for mod...
- mod in .terraform/modules/mod
... # OK
...
real    0m2.112s
...
$ terraform -chdir=tst/primary init
...
Downloading gcs::[https://www.googleapis.com/storage/v1/<bucket>/dir_1](https://www.googleapis.com/storage/v1/%3Cbucket%3E/dir_1) for mod...
- mod in .terraform/modules/mod/dir_2/vpc
... # OK
$ ls tst/primary/.terraform/modules/mod/dir_2/
pre-existing-vpc  vpc
Error: failed to get module at gcs::[https://www.googleapis.com/storage/v1/<bucket](https://www.googleapis.com/storage/v1/%3Cbucket)> to /tmp/get-module-655452250/mod: error downloading '[https://www.googleapis.com/storage/v1/<bucket](https://www.googleapis.com/storage/v1/%3Cbucket)>': URL is not a valid GCS URL
11:     source: gcs::[https://www.googleapis.com/storage/v1/<bucket>//dir_1/dir_2/vpc](https://www.googleapis.com/storage/v1/%3Cbucket%3E//dir_1/dir_2/vpc)

go-getter problem:

$ go-getter gcs::[https://www.googleapis.com/storage/v1/<bucket>//dir_1/dir_2/vpc](https://www.googleapis.com/storage/v1/%3Cbucket%3E//dir_1/dir_2/vpc) ~/tmp/ll
2023/10/09 23:42:36 Error downloading: URL is not a valid GCS URL
Error: lstat /tmp/get-module-2223695200/mod: no such file or directory
12:     source: gcs::[https://www.googleapis.com/storage/v1/<bucket>/missing_dir](https://www.googleapis.com/storage/v1/%3Cbucket%3E/missing_dir)

This is a issue with go-getter@v1.7.2 the DirMode prevented raise of an error, after update to v1.7.3 WAI:

Error: failed to get module at gcs::[https://www.googleapis.com/storage/v1/<bucket>/missing_dir](https://www.googleapis.com/storage/v1/%3Cbucket%3E/missing_dir) to /tmp/get-module-3980916664/mod: storage: object doesn't exist
12:     source: gcs::[https://www.googleapis.com/storage/v1/<bucket>/missing_dir](https://www.googleapis.com/storage/v1/%3Cbucket%3E/missing_dir)
  • wut::kpss://wrong_protocol wrong protocol
Error: failed to get module at wut::kpss://wrong_protocol to /tmp/get-module-3977420508/mod: download not supported for scheme 'wut'
12:     source: wut::kpss://wrong_protocol
                ^
real    0m0.070s
---
blueprint_name: tst
vars:
  project_id:  <project>
  deployment_name: tst
deployment_groups:
- group: primary
  modules:
  - id: mod
    kind: packer
    source: <source>
    settings:
      subnetwork_name: any
      zone: danger

$ make && rm -rf tst && time ./ghpc create test.yaml
real    0m2.002s # OK
$ tree tst/primary/
tst/primary/
└── mod # + hcl files
real    0m4.980s # OK

$ tree tst/primary/mod/
tst/primary/mod/
└── dir_2
    ├── custom-image     # + hcl files
    ├── pre-existing-vpc # + tf files
    └── vpc              # + tf files
  • missing dir & wrong protocol yield the same result as Terraform test cases; WAI.

@mr0re1 mr0re1 requested a review from tpdownes June 29, 2023 00:13
@mr0re1 mr0re1 requested a review from nick-stroud June 29, 2023 06:08
@mr0re1 mr0re1 assigned nick-stroud and tpdownes and unassigned tpdownes and nick-stroud Jun 29, 2023
@tpdownes

Copy link
Copy Markdown
Contributor

@mr0re1 can you add link to module docs in description?

https://developer.hashicorp.com/terraform/language/modules/sources#gcs-bucket

It's unclear to me if not supporting packages is a design decision or a limitation of Terraform support for GCS? I understood plan to be to pass through whatever Terraform could take (and reject GCS entirely for Packer modules)

@mr0re1

mr0re1 commented Jun 29, 2023

Copy link
Copy Markdown
Collaborator Author

@tpdownes

It's unclear to me if not supporting packages is a design decision or a limitation of Terraform support for GCS?

It's GoGetter limitation, I wasn't able to fetch GCS URL with // in it.

reject GCS entirely for Packer modules

Is it something we want to limit intentionally? Shall we add an extra restrictions if it already works "out of the box", though without support for //?

@tpdownes tpdownes left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I've observed what you said in the general PR conversation. If I use this blueprint, ghpc hangs:

---
blueprint_name: remote-packer

vars:
  project_id:  ## Set GCP Project ID Here ##
  deployment_name: images
  region: us-central1
  zone: us-central1-c

# Documentation for each of the modules used below can be found at
# https://github.com/GoogleCloudPlatform/hpc-toolkit/blob/main/modules/README.md

deployment_groups:
- group: network
  modules:
  - id: network0
    source: modules/network/vpc

- group: packer
  modules:
  - id: gcs-image
    source: gcs::https://www.googleapis.com/storage/v1/BUCKET/hpc-toolkit//modules/packer/custom-image
    kind: packer
    use:
    - network0
    settings:
      disk_size: 50
      image_family: test-htc
      state_timeout: 15m

If I manually invoke the command, it also hangs:

go-getter gcs::https://www.googleapis.com/storage/v1/BUCKET/hpc-toolkit//modules/packer/custom-image dst

I don't see any evidence that it is slowly downloading the entire directory. These problems don't occur without //.

So I conclude that go-getter (both library and CLI) appears to have very bad problems with // in the URI. I think we should reject any gcs URI with // in it. WDYT?

@tpdownes

Copy link
Copy Markdown
Contributor

Hrm. What I may be seeing is extreme slowness when it encounters hidden directories with lots of files (e.g. .git .terraform). What should I see if I use // @mr0re1 ?

@tpdownes tpdownes assigned mr0re1 and unassigned tpdownes Jun 29, 2023
@tpdownes tpdownes self-requested a review June 29, 2023 22:13

@tpdownes tpdownes left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see a few problems. If I point to a bucket that exists, but a bad path to the module, I get:

lstat /var/folders/0_/ms1tmdh54jq7qwwvgdydj20r00s9jb/T/get-module-192337453/mod: no such file or directory

Please take action to improve a better message.

If I add a double-slash to a Terraform or Packer source, I get:

module source gcs::https://www.googleapis.com/storage/v1/BUCKET//modules/network/vpc included "//" package syntax; the "//" should typically be placed at the root of the repository:
failed to get module at gcs::https://www.googleapis.com/storage/v1/BUCKET to /var/folders/0_/ms1tmdh54jq7qwwvgdydj20r00s9jb/T/get-module-109715465/mod: error downloading 'https://www.googleapis.com/storage/v1/BUCKET': URL is not a valid GCS URL

This is bad error message for Terraform; it should tell the user not to include "//" notation at all.

For Packer, I think we might be able to add support by adding a trailing slash to the URL. Please pick one of the 2 options:

  1. Don't support "//" notation for Packer and GCS URLs
  2. Investigate adding a trailing slash to the "package root URL" to see if it works.

@mr0re1

mr0re1 commented Oct 16, 2023

Copy link
Copy Markdown
Collaborator Author

@tpdownes I've addressed your comments, added a test-log to the description. Could you please take another look?

lstat error

Addressed

Doesn't support packages

Clarified that it doesn't support it at the bucket level, bucket itself can't be treated as a package, though directory within can;

Packer specificity

Haven't found any differences in Packer VS Terraform use cases.

Trailing slashes

Seem to work fine.

Extremely slow

Added a time, seems to be reasonable

@mr0re1 mr0re1 added the release-improvements Added to release notes under the "Improvements" heading. label Oct 16, 2023
@mr0re1 mr0re1 assigned tpdownes and unassigned mr0re1 Oct 16, 2023
@mr0re1 mr0re1 requested a review from tpdownes October 16, 2023 23:31

@tpdownes tpdownes left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this PR looks good from technical perspective and we should document it in modules/README.md in this PR. This should not take long.

I would re-format modules/README.md to read something like:

### Source (Required)
...
* Remote modules using [Terraform URL syntax](https://developer.hashicorp.com/terraform/language/modules/sources)
  * Hosted on [GitHub](https://developer.hashicorp.com/terraform/language/modules/sources#github)
  * Google Cloud Storage [Buckets](https://developer.hashicorp.com/terraform/language/modules/sources#gcs-bucket)
  * Generic [git repositories](https://developer.hashicorp.com/terraform/language/modules/sources#generic-git-repository)
* when modules are in a subdirectory of the git repository, a special
  double-slash "//" notation can be required as described below

Then add a section below "Generic Git Modules" with a very simple example:

##### Google Cloud Storage modules

(YOUR SIMPLE EXAMPLE TEXT)

@tpdownes tpdownes removed the request for review from nick-stroud October 23, 2023 16:39
@tpdownes tpdownes assigned mr0re1 and unassigned tpdownes Oct 23, 2023
@mr0re1

mr0re1 commented Oct 24, 2023

Copy link
Copy Markdown
Collaborator Author

Updated README

@mr0re1 mr0re1 requested a review from tpdownes October 24, 2023 21:56
@mr0re1 mr0re1 assigned tpdownes and unassigned mr0re1 Oct 24, 2023
* Rename `GitSourceReader` -> `GoGetterSourceReader`.
* Update `go-getter` dependency `v1.7.2 -> v1.7.3`

https://developer.hashicorp.com/terraform/language/modules/sources#gcs-bucket

**Usage:**
```yaml
source: gcs::https://www.googleapis.com/storage/v1/<bucket>/<path_to_module_dir>
```

**Note:**
* Doesn't support package-notation (`//`) at the bucket level,
a dir in root of the bucket is highest possible level for a package.

```sh
$ gsutil ls gs://<bucket>/dir_1/dir_2
gs://<bucket>/dir_1/dir_2/
gs://<bucket>/dir_1/dir_2/custom-image/
gs://<bucket>/dir_1/dir_2/pre-existing-vpc/
gs://<bucket>/dir_1/dir_2/vpc/
```

```yaml
---
blueprint_name: tst
vars:
  project_id:  <project>
  deployment_name: tst
  region: us-central1
deployment_groups:
- group: primary
  modules:
  - id: mod
    source: <source>

$ make && rm -rf tst && time ./ghpc create test.yaml
```

$ make && rm -rf tst && time ./ghpc create test.yaml
```

* `gcs::https://www.googleapis.com/storage/v1/<bucket>/dir_1/dir_2/vpc` _no package_
```sh
...
real    0m1.536s
$ terraform -chdir=tst/primary init
...
Downloading gcs::https://www.googleapis.com/storage/v1/<bucket>/dir_1/dir_2/vpc for mod...
- mod in .terraform/modules/mod
... # OK
```

* `gcs::https://www.googleapis.com/storage/v1/<bucket>/dir_1/dir_2/vpc/` _ends with slash_
```sh
...
real    0m1.538s
...
$ terraform -chdir=tst/primary init
...
Downloading gcs::https://www.googleapis.com/storage/v1/<bucket>/dir_1/dir_2/vpc/ for mod...
- mod in .terraform/modules/mod
... # OK
```

* `gcs::https://www.googleapis.com/storage/v1/<bucket>/dir_1//dir_2/vpc` _package_
```sh
...
real    0m2.112s
...
$ terraform -chdir=tst/primary init
...
Downloading gcs::https://www.googleapis.com/storage/v1/<bucket>/dir_1 for mod...
- mod in .terraform/modules/mod/dir_2/vpc
... # OK
$ ls tst/primary/.terraform/modules/mod/dir_2/
pre-existing-vpc  vpc
```

* `gcs::https://www.googleapis.com/storage/v1/<bucket>/dir_1//dir_2/vpc/` _package, ends with slash_
_same as above_

* `gcs::https://www.googleapis.com/storage/v1/<bucket>//dir_1/dir_2/vpc` _package separator after bucket name_
```sh
Error: failed to get module at gcs::https://www.googleapis.com/storage/v1/<bucket> to /tmp/get-module-655452250/mod: error downloading 'https://www.googleapis.com/storage/v1/<bucket>': URL is not a valid GCS URL
11:     source: gcs::https://www.googleapis.com/storage/v1/<bucket>//dir_1/dir_2/vpc
```

`go-getter` problem:

```sh
$ go-getter gcs::https://www.googleapis.com/storage/v1/<bucket>//dir_1/dir_2/vpc ~/tmp/ll
2023/10/09 23:42:36 Error downloading: URL is not a valid GCS URL
```

* `gcs::https://www.googleapis.com/storage/v1/<bucket>/missing_dir` _missing dir_
```sh
Error: lstat /tmp/get-module-2223695200/mod: no such file or directory
12:     source: gcs::https://www.googleapis.com/storage/v1/<bucket>/missing_dir
```

This is a issue with `go-getter@v1.7.2` the `DirMode` prevented raise of an error, after update to `v1.7.3` WAI:
```sh
Error: failed to get module at gcs::https://www.googleapis.com/storage/v1/<bucket>/missing_dir to /tmp/get-module-3980916664/mod: storage: object doesn't exist
12:     source: gcs::https://www.googleapis.com/storage/v1/<bucket>/missing_dir
```

* `wut::kpss://wrong_protocol` _wrong protocol_
```sh
Error: failed to get module at wut::kpss://wrong_protocol to /tmp/get-module-3977420508/mod: download not supported for scheme 'wut'
12:     source: wut::kpss://wrong_protocol
                ^
real    0m0.070s
```

```yaml
---
blueprint_name: tst
vars:
  project_id:  <project>
  deployment_name: tst
deployment_groups:
- group: primary
  modules:
  - id: mod
    kind: packer
    source: <source>
    settings:
      subnetwork_name: any
      zone: danger

$ make && rm -rf tst && time ./ghpc create test.yaml
```

* `gcs::https://www.googleapis.com/storage/v1/<bucket>/dir_1/dir_2/custom-image` _no package_
```sh
real    0m2.002s # OK
$ tree tst/primary/
tst/primary/
└── mod # + hcl files
```

* `gcs::https://www.googleapis.com/storage/v1/<bucket>/dir_1//dir_2/custom-image` _package_
```sh
real    0m4.980s # OK

$ tree tst/primary/mod/
tst/primary/mod/
└── dir_2
    ├── custom-image     # + hcl files
    ├── pre-existing-vpc # + tf files
    └── vpc              # + tf files
```

* _missing dir & wrong protocol_ yield the same result as Terraform test cases; WAI.
@tpdownes tpdownes assigned mr0re1 and unassigned tpdownes Oct 25, 2023
@mr0re1 mr0re1 merged commit 9392730 into GoogleCloudPlatform:develop Oct 25, 2023
@mr0re1 mr0re1 deleted the getter branch October 25, 2023 16:51
max-nag pushed a commit to max-nag/hpc-toolkit that referenced this pull request Oct 27, 2023
…CloudPlatform#1523)

* Rename `GitSourceReader` -> `GoGetterSourceReader`.
* Update `go-getter` dependency `v1.7.2 -> v1.7.3`

https://developer.hashicorp.com/terraform/language/modules/sources#gcs-bucket

**Usage:**
```yaml
source: gcs::https://www.googleapis.com/storage/v1/<bucket>/<path_to_module_dir>
```

**Note:**
* Doesn't support package-notation (`//`) at the bucket level,
a dir in root of the bucket is highest possible level for a package.

```sh
$ gsutil ls gs://<bucket>/dir_1/dir_2
gs://<bucket>/dir_1/dir_2/
gs://<bucket>/dir_1/dir_2/custom-image/
gs://<bucket>/dir_1/dir_2/pre-existing-vpc/
gs://<bucket>/dir_1/dir_2/vpc/
```

```yaml
---
blueprint_name: tst
vars:
  project_id:  <project>
  deployment_name: tst
  region: us-central1
deployment_groups:
- group: primary
  modules:
  - id: mod
    source: <source>

$ make && rm -rf tst && time ./ghpc create test.yaml
```

$ make && rm -rf tst && time ./ghpc create test.yaml
```

* `gcs::https://www.googleapis.com/storage/v1/<bucket>/dir_1/dir_2/vpc` _no package_
```sh
...
real    0m1.536s
$ terraform -chdir=tst/primary init
...
Downloading gcs::https://www.googleapis.com/storage/v1/<bucket>/dir_1/dir_2/vpc for mod...
- mod in .terraform/modules/mod
... # OK
```

* `gcs::https://www.googleapis.com/storage/v1/<bucket>/dir_1/dir_2/vpc/` _ends with slash_
```sh
...
real    0m1.538s
...
$ terraform -chdir=tst/primary init
...
Downloading gcs::https://www.googleapis.com/storage/v1/<bucket>/dir_1/dir_2/vpc/ for mod...
- mod in .terraform/modules/mod
... # OK
```

* `gcs::https://www.googleapis.com/storage/v1/<bucket>/dir_1//dir_2/vpc` _package_
```sh
...
real    0m2.112s
...
$ terraform -chdir=tst/primary init
...
Downloading gcs::https://www.googleapis.com/storage/v1/<bucket>/dir_1 for mod...
- mod in .terraform/modules/mod/dir_2/vpc
... # OK
$ ls tst/primary/.terraform/modules/mod/dir_2/
pre-existing-vpc  vpc
```

* `gcs::https://www.googleapis.com/storage/v1/<bucket>/dir_1//dir_2/vpc/` _package, ends with slash_
_same as above_

* `gcs::https://www.googleapis.com/storage/v1/<bucket>//dir_1/dir_2/vpc` _package separator after bucket name_
```sh
Error: failed to get module at gcs::https://www.googleapis.com/storage/v1/<bucket> to /tmp/get-module-655452250/mod: error downloading 'https://www.googleapis.com/storage/v1/<bucket>': URL is not a valid GCS URL
11:     source: gcs::https://www.googleapis.com/storage/v1/<bucket>//dir_1/dir_2/vpc
```

`go-getter` problem:

```sh
$ go-getter gcs::https://www.googleapis.com/storage/v1/<bucket>//dir_1/dir_2/vpc ~/tmp/ll
2023/10/09 23:42:36 Error downloading: URL is not a valid GCS URL
```

* `gcs::https://www.googleapis.com/storage/v1/<bucket>/missing_dir` _missing dir_
```sh
Error: lstat /tmp/get-module-2223695200/mod: no such file or directory
12:     source: gcs::https://www.googleapis.com/storage/v1/<bucket>/missing_dir
```

This is a issue with `go-getter@v1.7.2` the `DirMode` prevented raise of an error, after update to `v1.7.3` WAI:
```sh
Error: failed to get module at gcs::https://www.googleapis.com/storage/v1/<bucket>/missing_dir to /tmp/get-module-3980916664/mod: storage: object doesn't exist
12:     source: gcs::https://www.googleapis.com/storage/v1/<bucket>/missing_dir
```

* `wut::kpss://wrong_protocol` _wrong protocol_
```sh
Error: failed to get module at wut::kpss://wrong_protocol to /tmp/get-module-3977420508/mod: download not supported for scheme 'wut'
12:     source: wut::kpss://wrong_protocol
                ^
real    0m0.070s
```

```yaml
---
blueprint_name: tst
vars:
  project_id:  <project>
  deployment_name: tst
deployment_groups:
- group: primary
  modules:
  - id: mod
    kind: packer
    source: <source>
    settings:
      subnetwork_name: any
      zone: danger

$ make && rm -rf tst && time ./ghpc create test.yaml
```

* `gcs::https://www.googleapis.com/storage/v1/<bucket>/dir_1/dir_2/custom-image` _no package_
```sh
real    0m2.002s # OK
$ tree tst/primary/
tst/primary/
└── mod # + hcl files
```

* `gcs::https://www.googleapis.com/storage/v1/<bucket>/dir_1//dir_2/custom-image` _package_
```sh
real    0m4.980s # OK

$ tree tst/primary/mod/
tst/primary/mod/
└── dir_2
    ├── custom-image     # + hcl files
    ├── pre-existing-vpc # + tf files
    └── vpc              # + tf files
```

* _missing dir & wrong protocol_ yield the same result as Terraform test cases; WAI.
@mr0re1 mr0re1 mentioned this pull request Nov 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release-improvements Added to release notes under the "Improvements" heading.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants