feat: Onboard NASA wildfire #275

Merged
adlersantos merged 14 commits into GoogleCloudPlatform:main from gkodukula:new_nasa_wildfire on May 19, 2022
Conversation

@gkodukula

@gkodukula gkodukula commented Jan 24, 2022

Contributor

Description

Pipeline: past_week

Checklist

Note: Delete items below that aren't applicable to your pull request.

  • Please merge this PR for me once it is approved.
  • If this PR adds or edits a dataset or pipeline, it was reviewed and approved by the Google Cloud Public Datasets team beforehand.
  • If this PR adds or edits a dataset or pipeline, I put all my code inside datasets/nasa_wildfire and nothing outside of that directory.
  • This PR is appropriately labeled.

Comment thread datasets/nasa_wildfire/pipelines/_images/run_csv_transform_kub/csv_transform.py Outdated
download_file(source_url, source_file)

logging.info("Reading file ...")
df = pd.read_csv(str(source_file))

Contributor

Should this read be chunked? What is the typical number of records or file size?
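For reference, one way to address the chunking question is to pass `chunksize` to `pd.read_csv`, which then returns an iterator of DataFrames instead of loading the whole file at once. The following is a minimal sketch, not the code that was merged: the function name `transform_in_chunks`, the default chunk size, and the pass-through "transform" are assumptions made for illustration.

```python
import pandas as pd


def transform_in_chunks(source_file, target_file, chunksize=100_000):
    """Read a CSV in fixed-size chunks and append each chunk to target_file.

    Hypothetical helper: the name, the default chunk size, and the
    absence of any real transformation are illustrative assumptions.
    Returns the total number of data rows written.
    """
    total_rows = 0
    # With chunksize set, read_csv yields DataFrames of at most `chunksize` rows.
    for i, chunk in enumerate(pd.read_csv(str(source_file), chunksize=chunksize)):
        # Any per-chunk cleanup (renaming columns, casting types) would go here.
        chunk.to_csv(
            str(target_file),
            mode="w" if i == 0 else "a",  # overwrite on the first chunk, then append
            header=(i == 0),              # write the header row only once
            index=False,
        )
        total_rows += len(chunk)
    return total_rows
```

Whether this is worth doing depends on the answer to the second question: for a file that comfortably fits in memory, a single `pd.read_csv` call is simpler.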

Comment thread datasets/nasa_wildfire/pipelines/_images/run_csv_transform_kub/csv_transform.py Outdated
Comment thread datasets/nasa_wildfire/pipelines/_images/run_csv_transform_kub/csv_transform.py Outdated
Comment thread datasets/nasa_wildfire/pipelines/past_week/past_week_dag.py Outdated
Comment thread datasets/nasa_wildfire/pipelines/past_week/past_week_dag.py Outdated
TARGET_GCS_BUCKET: "{{ var.value.composer_bucket }}"
TARGET_GCS_PATH: "data/nasa_wildfire/past_week/data_output.csv"
PIPELINE_NAME: "past_week"
CSV_HEADERS: >-

Contributor

Change this to a multiline value.
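As background, a YAML `>-` folded block scalar joins its lines with single spaces and drops the trailing newline, so a long value can be spread over several lines without changing how it parses. The sketch below assumes, hypothetically, that `CSV_HEADERS` carries a JSON array that the transform script reads as a string. The helper name `parse_headers` and the column names are placeholders, not the dataset's actual schema.

```python
import json


def parse_headers(raw_value):
    """Parse a JSON array of column names from a string.

    Hypothetical helper: assumes the headers arrive as a JSON array,
    for example from an environment variable set by the pipeline.
    """
    headers = json.loads(raw_value)
    if not isinstance(headers, list) or not all(isinstance(h, str) for h in headers):
        raise ValueError("expected a JSON array of strings")
    return headers


# Roughly what a folded multiline value looks like once YAML has joined
# its lines with spaces (placeholder column names).
folded_value = '["latitude", "longitude", "acq_date", "acq_time"]'
print(parse_headers(folded_value))
```

JSON ignores whitespace between tokens, so the extra spaces introduced by folding do not affect the parsed list.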

Comment thread datasets/nasa_wildfire/pipelines/past_week/pipeline.yaml
@gkodukula

Contributor Author

@adlersantos @happyhuman @nlarge-google Please review the code again; I have made the changes requested in the review comments.

@nlarge-google nlarge-google left a comment

Contributor

One change: remove the storage bucket in dataset.yaml.

Comment thread datasets/nasa_wildfire/pipelines/dataset.yaml Outdated
@adlersantos adlersantos requested a review from nlarge-google May 19, 2022 03:31

@nlarge-google nlarge-google left a comment

Contributor

Approved

@adlersantos adlersantos merged commit f593161 into GoogleCloudPlatform:main May 19, 2022
3 participants