fix(spanner): context canceled in ExecuteStreamingSql#12630
fix(spanner): context canceled in ExecuteStreamingSql#12630yordis wants to merge 5 commits intogoogleapis:mainfrom
Conversation
feat(spanner): add otel semconv to spanner
|
@rahul2393 here, I am adding a test file in a few, cleaning up a bit |
33c109f to
12c16cd
Compare
|
@yordis can you fix go vet issues? |
|
@rahul2393 working on making CI to pass, yes |
31e6f7d to
a88eb08
Compare
…r for performance Signed-off-by: Yordis Prieto <yordis.prieto@gmail.com>
a88eb08 to
ed8dae6
Compare
|
It is 3AM, so I am going to sleep, I will come back to this around 11 AM EST |
|
@yordis |
|
Calling last Recv in go routine was indeed to fix that #12235 |
|
@yordis Can you share what does the trace look like at your end in case of context cancelled error, for me in mainline code status is always unset in traces, does it happen for all ExecuteStreamingSQL traces or just few at random? |
|
I dont see anything weird, it will help a ton to get #12631 so I can have better data (well, the downside I can not upgrade 😭)
|
|
@yordis Is it possible for you to test this patch and see if context cancel still happens |
Just run this test on your local and check if it passes. |
I think context cancel is happening because of this #11854 (comment), we should not cancel in iter.Stop if it's already planned to be cancelled in separate go routine in case of skipping trailers. |
That make sense to me. What do you think the route should be here? I do not feel qualify to understand the entire situation at this point. I am just trying to be proactive, but I rather not be a distraction or slowness here |
|
Is there a way for you to verify in your test environment this patch and confirm issue is not happening, if yes then I can include the change in next release |
|
@yordis Another case is when application does not iterate over all the rows and call iter.Stop, then context cancel will appear in traces, and that's a valid case. Are we sure we are not skipping rows? example code which can cause "context cancel" issue is below. Above code is wrong since we called iter.Stop without iterating over all the rows |
Let me dig into this, I feel this is the case! In 1.80, did you fix reporting to the OTEL, correct? I want to confirm because before we did not have any error in the trace, but now we do, so it could be the case that the app was "broken" before, and we are chasing ghosts here. |
|
Well, I was using 1.83.0, so could be the case that wasn't fixed properly? And what I mean by "fix" is that, I wouldn't see a "failure" in my trace. Otherwise we would have "error" signalling when it is expected behavior |
It is fixed in 1.81.1 and I am not able to see context cancel errors for valid cases, but if we try to do this #12630 (comment) then it will happen in traces and that's valid case, one should not call iter.Stop without iterating over all the rows. |
|
Can you write a test replicate the context cancel span behaviour using the public methods available? I am not able to replicate your case. |
|
Let me report back, I think that we actually have a "bug" in our end, and we dont need to fix anything here. |
What if I don't want to iterate over the whole list and we do not want to see an issue? Disclaimer, we are using https://github.com/googleapis/go-gorm-spanner so we aren't calling Here is the one that is actually failing: func apiCall(ctx context.Context, db *gorm.DB, shopSellerId uuid.UUID) (uuid.UUID, error) {
var shopSeller models.ShopSeller
err := db.WithContext(ctx).
Where("id = ? AND deleted_at IS NULL", shopSellerId).
First(&shopSeller).Error
if err != nil {
if errors.Is(err, gorm.ErrRecordNotFound) {
return uuid.Nil, fmt.Errorf("...", ErrShopSellerResolutionFailed, shopSellerId)
}
return uuid.Nil, fmt.Errorf("...", err)
}
shopId, err := uuid.Parse(shopSeller.ShopID)
if err != nil {
return uuid.Nil, fmt.Errorf("...", err)
}
return shopId, nil
}The interesting thing is that, I am unsure why Gorm would decide to use "streaming" version for this query; I dont understand Spanner, but it feels streaming would be for long-running queries, or something that wouldn't be selecting a single row
|
|
Could be we need https://github.com/googleapis/go-gorm-spanner/releases/tag/v1.8.1 (the weird thing is that I can not see file changes from the PR ...) |
|
There is no file changes because it just update the spanner dependency and dummy PR was to run the release process there, let me dig into GORM now since so far I was just looking into spanner client, what version of GORM spanner are you using? |
|
I just tested in latest everything just in case So I am gonna test my actual branch by now ... sorry, just trying to gather more data just in case |
|
btw @rahul2393 would be better to jump into a Google Meet? Your team should have our contact info if I am not mistaken |
|
Since I do not have #12631 it is really difficult to see what queries are those; but the fact that it is the very last one, makes me think the issue still present
|
I assume you wanted to attach before and after report, but there is only one report is it before? |
See the drop in the graph |
|
@yordis I was able to replicate and track what's happening |
|
So what should be the resolution here based on your understanding thus far? |
|
@yordis |
|
Closing in favor of #12635 |







related to #11854