Skip to content

feat(aws-resources): Add IAM Access Advisor#8726

Merged
kodiakhq[bot] merged 4 commits intocloudquery:mainfrom
erezrokah:feat/aws_iam_users_access_advisor
Mar 7, 2023
Merged

feat(aws-resources): Add IAM Access Advisor#8726
kodiakhq[bot] merged 4 commits intocloudquery:mainfrom
erezrokah:feat/aws_iam_users_access_advisor

Conversation

@erezrokah
Copy link
Copy Markdown
Member

@erezrokah erezrokah commented Mar 6, 2023

Summary

Fixes #1402
Fixes #1389
Fixes #1388
Fixes #1387
Fixes #1386

This PR implements https://docs.aws.amazon.com/cli/latest/reference/iam/get-service-last-accessed-details.html, but not https://docs.aws.amazon.com/cli/latest/reference/iam/get-service-last-accessed-details-with-entities.html as getting the entities is what makes this super slow.
I put the table definitions and fetchers in the same file as I think it's easier to follow the code that way, but happy to split into multiple files (one per table).

This follows a pattern of using relations to get maximum concurrency based on the spec setting.

Explanation about PKs:

Syncing these tables takes 2m21s on our playground account

BEGIN_COMMIT_OVERRIDE
feat(aws-resources): Add IAM Access Advisor tables: aws_iam_group_last_accessed_details, aws_iam_policy_last_accessed_details, aws_iam_role_last_accessed_details and aws_iam_user_last_accessed_details. These might be slow to sync on some accounts. You can skip them if needed via skip_tables: ["aws_iam_*_last_accessed_details"]
END_COMMIT_OVERRIDE

@github-actions
Copy link
Copy Markdown

github-actions bot commented Mar 6, 2023

This PR has the following changes to source plugin(s) tables:

  • Table aws_iam_group_last_accessed_details was added
  • Table aws_iam_policy_last_accessed_details was added
  • Table aws_iam_role_last_accessed_details was added
  • Table aws_iam_user_last_accessed_details was added

@erezrokah erezrokah force-pushed the feat/aws_iam_users_access_advisor branch from 371249a to 639937e Compare March 6, 2023 14:06
@erezrokah erezrokah added the automerge Automatically merge once required checks pass label Mar 6, 2023
@bbernays
Copy link
Copy Markdown
Collaborator

bbernays commented Mar 6, 2023

How much concurrency does AWS support?

@erezrokah erezrokah removed the automerge Automatically merge once required checks pass label Mar 6, 2023
@erezrokah
Copy link
Copy Markdown
Member Author

How much concurrency does AWS support?

@bbernays can you explain your comment? Do you mean API limits? Concurrent report jobs limit?

I couldn't find anything about this in the AWS docs

@bbernays
Copy link
Copy Markdown
Collaborator

bbernays commented Mar 6, 2023

How much concurrency does AWS support?

@bbernays can you explain your comment? Do you mean API limits? Concurrent report jobs limit?

I couldn't find anything about this in the AWS docs

Yeah I mean about concurrent reports jobs limit. I know the original PR had all of the jobs being created sequentially, so I wonder if there was a technical reason for that

@erezrokah
Copy link
Copy Markdown
Member Author

Yeah I mean about concurrent reports jobs limit. I know the original PR had all of the jobs being created sequentially, so I wonder if there was a technical reason for that

I did not hit any limits on our playground account 😄 nor could find any docs about it

@hermanschaaf
Copy link
Copy Markdown
Contributor

I'd like to check my understanding of this code:

  1. for every group / policy / role/ user, a last_accessed_jobs resolver will get called, because of the parent-child relationship with the corresponding iam table
  2. this child resolver creates a new job and writes the ARN and job ID to the corresponding last_accessed_jobs table. Then,
  3. a last_accessed_details resolver kicks off (again because of parent-child relationship), sleeping until the job either fails or succeeds. It then writes the details to the last_accessed_details table.

Is my understanding correct? This is a clever solution! I think it subtly changes the semantics of our tables though - especially the last_accessed_jobs table. Usually, a table is expected to fetch all available data for the particular resource. In the case of last_accessed_jobs though, it's more or less being employed as a queue--users won't get a full list of all "last accessed jobs" that have ever been created on their accounts, they will only get the last accessed jobs that have been generated by this particular sync. I think that's maybe not ideal? Is it necessary to have the jobs table at all? Why not have only a details table, where we both generate the job and wait for its results?

@erezrokah
Copy link
Copy Markdown
Member Author

Is my understanding correct? This is a clever solution! I think it subtly changes the semantics of our tables though - especially the last_accessed_jobs table. Usually, a table is expected to fetch all available data for the particular resource. In the case of last_accessed_jobs though, it's more or less being employed as a queue--users won't get a full list of all "last accessed jobs" that have ever been created on their accounts, they will only get the last accessed jobs that have been generated by this particular sync. I think that's maybe not ideal? Is it necessary to have the jobs table at all? Why not have only a details table, where we both generate the job and wait for its results?

You correct! And you have a very good point. I think we can drop the jobs table as there's a 1-1 mapping between job and details. There will be a slight behavioral change (depending on the scheduler) as the details tables will take a bit longer and also get better concurrency.
The idea was to have as little API calls as possible per table, so the SDK can manage the concurrency of the resolvers.
Another thing to note is that a job ID is only valid within the same session:
image
so I didn't see a point in saving all of them.

TDLR: I'll drop the jobs tables

@erezrokah erezrokah force-pushed the feat/aws_iam_users_access_advisor branch from 8185ce0 to d3ed6b9 Compare March 7, 2023 10:18
@erezrokah
Copy link
Copy Markdown
Member Author

@hermanschaaf cleaned up the job tables in d3ed6b9

@erezrokah erezrokah added the automerge Automatically merge once required checks pass label Mar 7, 2023
@erezrokah erezrokah requested a review from hermanschaaf March 7, 2023 10:23
@kodiakhq kodiakhq bot merged commit bc53529 into cloudquery:main Mar 7, 2023
@erezrokah erezrokah deleted the feat/aws_iam_users_access_advisor branch March 7, 2023 10:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

automerge Automatically merge once required checks pass

Projects

None yet

Development

Successfully merging this pull request may close these issues.

IAM Access Advisor IAM Access Advisor: groups IAM Access Advisor: users IAM Access Advisor: roles IAM Access Advisor: policies

4 participants