Skip to content

fix(destinations): Propagate write errors#461

Closed
erezrokah wants to merge 2 commits intocloudquery:mainfrom
erezrokah:fix/propagate_write_errors
Closed

fix(destinations): Propagate write errors#461
erezrokah wants to merge 2 commits intocloudquery:mainfrom
erezrokah:fix/propagate_write_errors

Conversation

@erezrokah
Copy link
Copy Markdown
Member

@erezrokah erezrokah commented Dec 5, 2022

Summary

Related to cloudquery/cloudquery#5152 and #460

The error returned from the error group Wait() function is the error reported by the Go routine, not the error of the Wait() function

This is only a patch. I think we should separate application level errors (e.g. write) from protocol/communication level errors. I'll open a separate issue for that.


Use the following steps to ensure your PR is ready to be reviewed

  • Read the contribution guidelines 🧑‍🎓
  • Run go fmt to format your code 🖊
  • Lint your changes via golangci-lint run 🚨 (install golangci-lint here)
  • Update or add tests 🧪
  • Ensure the status checks below are successful ✅

@github-actions github-actions bot added fix and removed fix labels Dec 5, 2022
s.Logger.Error().Err(err).Msg("failed to wait")
return fmt.Errorf("plugin write failed: %w", err)
}
return ctx.Err()
Copy link
Copy Markdown
Member Author

@erezrokah erezrokah Dec 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ctx.Err() returns context canceled when the write fails, which is not the error we want

@github-actions
Copy link
Copy Markdown

github-actions bot commented Dec 5, 2022

⏱️ Benchmark results

Comparing with 3fecc61

  • DefaultConcurrency-2 resources/s: 11,309 ⬇️ 4.70% decrease vs. 3fecc61
  • Glob-2 ns/op: 191 ⬇️ 7.33% decrease vs. 3fecc61
  • TablesWithChildrenDefaultConcurrency-2 resources/s: 29,626 ⬆️ 10.37% increase vs. 3fecc61
  • BufferedScanner-2 ns/op: 14.63 ⬆️ 15.11% increase vs. 3fecc61
  • LogReader-2 ns/op: 36.76 ⬇️ 3.59% decrease vs. 3fecc61

close(resources)
if err := eg.Wait(); err != nil {
s.Logger.Error().Err(err).Msg("failed to unmarshal resource. failed to wait for plugin")
return fmt.Errorf("plugin write failed: %w", err)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are losing the information from the original JSON unmarshal error here now - that's why the previous implementation only logged the Wait error and then returned the original JSON unmarshal error. Maybe we should keep it that way? I'm open to other options too, but we shouldn't drop that error without logging it at least

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm not sure. I actually think we should return a list of errors, but that's a bigger change.
I think the Write error is the first one that occurred so it makes more sense to return it.
I'll add a log for the unmarshal error

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#487 should allow us to report both

kodiakhq bot pushed a commit that referenced this pull request Dec 27, 2022

Related to cloudquery/cloudquery#5152.

This fixes an issue when if the channel gets closed (for example there's an error during write), we report the `EOF` error instead of the original error.

Scenario from the issue:
1. Users runs `sync --no-migrate` on a non initialized database
2. [`Write`](https://github.com/cloudquery/plugin-sdk/blob/ab7ca972e0b187a7dfb66132a03f07479cd29bb7/internal/servers/destinations.go#L84) fails on the first table write, returning an error from this function:
https://github.com/cloudquery/plugin-sdk/blob/ab7ca972e0b187a7dfb66132a03f07479cd29bb7/internal/servers/destinations.go#L65, causing the channel to close
3.  The next iteration of `for resource := range resources` starts, receiving an `EOF` error since the channel is closed, and `EOF` error is reported.

> This is only a patch. I think we should separate application level errors (e.g. write) from protocol/communication level errors. I'll open a separate issue for that

**Also, [another related fix coming](#461

---
@erezrokah
Copy link
Copy Markdown
Member Author

Closed by #529

@erezrokah erezrokah closed this Dec 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants