Skip to content

Add tracing for physical connection open#6091

Merged
vonzshik merged 3 commits intomainfrom
4136-trace-physical-open
Apr 9, 2025
Merged

Add tracing for physical connection open#6091
vonzshik merged 3 commits intomainfrom
4136-trace-physical-open

Conversation

@vonzshik
Copy link
Contributor

@vonzshik vonzshik commented Apr 7, 2025

Closes #4136

@vonzshik vonzshik requested a review from roji as a code owner April 7, 2025 14:54
Copy link
Member

@roji roji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vonzshik this looks great - thanks, but let's maybe discuss whether the Activity should contain data source bootstrapping (and therefore type loading)?

_connectionLogger = dataSourceConfig.LoggingConfiguration.ConnectionLogger;
_commandLogger = dataSourceConfig.LoggingConfiguration.CommandLogger;

// Make sure we do not flow AsyncLocals like Activity.Current
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Though a part of me thinks we should maybe start the activity only after the data source has been set up - that really isn't part of physical connection open. What do you think?

// At no point should we ever have an activity here
Debug.Assert(Activity.Current is null);
// Set current activity as the one from the command
// So child activities from physical open are bound to it
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good thinking here.

Though at some point we should have the conversation about whether we want to keep multiplexing, given that it's known to be non-scalable (e.g. and is no longer used in the TE benchmarks).

LogMessages.OpeningPhysicalConnection(ConnectionLogger, Host, Port, Database, UserFacingConnectionString);
var startOpenTimestamp = Stopwatch.GetTimestamp();

var activity = NpgsqlActivitySource.ConnectionOpen(this);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As written below, there may be a case for only starting the activity after the data source is created and bootstrapped. Otherwise, the first "physical connection" is longer because it contains e.g. type loading and stuff like that; at minimum that skews stuff like average time calculations for physical open.

If we really wanted to surface startup costs like type loading properly, we could have another activity for that - though that seems like a bit much.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As written below, there may be a case for only starting the activity after the data source is created and bootstrapped. Otherwise, the first "physical connection" is longer because it contains e.g. type loading and stuff like that; at minimum that skews stuff like average time calculations for physical open.

I do not think anyone really calculates average time for any sort of traces (too much stuff to store), there are metrics specifically for stuff like this (for example, in Npgslq we have db.client.commands.duration). Beside, right now when we log physical open we pass the total time, including type loading.

If we really wanted to surface startup costs like type loading properly, we could have another activity for that - though that seems like a bit much.

This might actually be not a bad idea, since this will also allow us to trace NpgsqlConnection.ReloadTypes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, makes sense. If it's not much work/changes, I'd try to exclude type loading from the activity span you're introducing here (simply because it really shouldn't be part of it). We can consider introducing a new activity for it, though I don't think that's very important.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, makes sense. If it's not much work/changes, I'd try to exclude type loading from the activity span you're introducing here (simply because it really shouldn't be part of it).

The main problem is that we do consider it a part of physical open. That is, if we get an exception during bootstrap or from DataSourceConnectionInitializer, we'll break the physical connection. In that case, should we pass that exception to activity? It would be really weird if we don't (since we'll throw the exception but the trace will be green, even though the connection is already closed). But then, that means that we can't close the activity until we're absolutely sure the connection is stable.

Copy link
Contributor Author

@vonzshik vonzshik Apr 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or to be more specific, we can specify Activity.Duration via Activity.SetEndTime, so while it's still be considered as active, the duration will remains as is, but IDK

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think anyone really calculates average time for any sort of traces (too much stuff to store), there are metrics specifically for stuff like this (for example, in Npgslq we have db.client.commands.duration). Beside, right now when we log physical open we pass the total time, including type loading.

FWIW calculating aggregate metrics from tracing data is a pretty standard thing to do, and indeed we could also not include the type loading time when logging.

But good point about the exception. I still think that fundamentally, type loading really shouldn't be counted as part of physical connection open; physical connections get opened all the time, whereas type loading is a one-time startup query that we happen to need (unless it's disabled by the user). It seems to just weirdly skew things to look at it as part of connection open.

But I'm OK with setting this aside for now and moving on - we can always change this in the future if we want.

@vonzshik vonzshik merged commit 251d73b into main Apr 9, 2025
14 checks passed
@vonzshik vonzshik deleted the 4136-trace-physical-open branch April 9, 2025 12:09
This was referenced Nov 23, 2025
github-actions bot pushed a commit to awslabs/aurora-dsql-connectors that referenced this pull request Mar 16, 2026
Updated [Npgsql](https://github.com/npgsql/npgsql) from 9.0.3 to 10.0.2.

<details>
<summary>Release notes</summary>

_Sourced from [Npgsql's
releases](https://github.com/npgsql/npgsql/releases)._

## 10.0.2

v10.0.2 contains several minor bug fixes.

[Milestone
issues](https://github.com/npgsql/npgsql/milestone/135?closed=1)

**Full Changelog**:
npgsql/npgsql@v10.0.1...v10.0.2

## 10.0.1

v10.0.1 contains several minor bug fixes.

[Milestone
issues](https://github.com/npgsql/npgsql/milestone/134?closed=1)

**Full Changelog**:
npgsql/npgsql@v10.0.0...v10.0.1

## 10.0.0

See the [release
notes](https://www.npgsql.org/doc/release-notes/10.0.html).

The full list of changes is available
[here](https://github.com/npgsql/npgsql/milestone/122?closed=1).

## What's Changed
* STJ 9.0 alternative approach by @​NinoFloris in
npgsql/npgsql#5941
* Remove support for net6.0 by @​roji in
npgsql/npgsql#5947
* Some leftover cleanup for removing net6.0 by @​roji in
npgsql/npgsql#5949
* Map date/time to DateOnly/TimeOnly by default by @​roji in
npgsql/npgsql#5948
* Make the cidr<->IPNetwork mapping the default by @​roji in
npgsql/npgsql#5950
* Fix connecting with VerifyCA and VerifyFull by @​vonzshik in
npgsql/npgsql#5944
* Remove stopwatch allocations by @​vonzshik in
npgsql/npgsql#5977
* Bump actions/setup-dotnet from 4.1.0 to 4.2.0 by @​dependabot[bot] in
npgsql/npgsql#5983
* Use exception convenience methods by @​bbowyersmyth in
npgsql/npgsql#5982
* Bump actions/setup-dotnet from 4.2.0 to 4.3.0 by @​dependabot[bot] in
npgsql/npgsql#6007
* Add support for postgresql type names with dots by @​dvas-hash in
npgsql/npgsql#5971
* Send close_notify TLS alert on connection shutdown by @​vonzshik in
npgsql/npgsql#5995
* Remove DisplayClass struct creation in PgReader by @​bbowyersmyth in
npgsql/npgsql#6014
* Always dispose RemoteCertificate on SslStream by @​vonzshik in
npgsql/npgsql#6022
* Remove LongRunningConnection field from NpgsqlConnector by @​vonzshik
in npgsql/npgsql#6024
* Tighten SCRAM-SHA-256 SASL check by @​vonzshik in
npgsql/npgsql#6023
* Add SHA3 hash algorithms for SASL authentication by @​vonzshik in
npgsql/npgsql#6028
* Remove dotnet SDK version from CI (use global.json) by @​roji in
npgsql/npgsql#6037
* Add support for specifying allowed auth methods by @​vonzshik in
npgsql/npgsql#6036
* Migrate to SLNX by @​roji in
npgsql/npgsql#6053
* Switch to Ubuntu 24.04 in CI by @​roji in
npgsql/npgsql#6054
* Bump actions/setup-dotnet from 4.3.0 to 4.3.1 by @​dependabot[bot] in
npgsql/npgsql#6059
* Add basic testing for tracing by @​vonzshik in
npgsql/npgsql#6051
* parameter-collection Clone() should set correct collection instance by
@​mgravell in npgsql/npgsql#6066
* Fix brew on mac CI by @​NinoFloris in
npgsql/npgsql#6071
* Fix adding to hash lookup while renaming an unnamed parameter by
@​vonzshik in npgsql/npgsql#6073
* Update LICENSE date (2024 -> 2025) by @​kurnakovv in
npgsql/npgsql#6082
* Add tracing for physical connection open by @​vonzshik in
npgsql/npgsql#6091
* Start testing on .NET 9 by @​vonzshik in
npgsql/npgsql#5945
* Turn on <IsAotCompatible> by @​roji in
npgsql/npgsql#6097
* Reenable public API analyzer by @​roji in
npgsql/npgsql#6101
* Update Npgsql to .NET 9 by @​vonzshik in
npgsql/npgsql#6099
* Ignore system CA store if root certificate is provided by @​vonzshik
in npgsql/npgsql#6102
* Fix reading columns asynchronously via JsonNet plugin by @​vonzshik in
npgsql/npgsql#6109
* Fixes #​6107 missed should buffer in biginteger numeric converter by
@​NinoFloris in npgsql/npgsql#6117
* Fix logging parameters with batches by @​vonzshik in
npgsql/npgsql#6079
* Implement GSSAPI session encryption by @​vonzshik in
npgsql/npgsql#6131
* feat: add support for PGAPPNAME to set application name by
@​michael-todorovic in npgsql/npgsql#6139
* Fix returning null from KerberosUsernameProvider.GetUsername with
concurrent calls by @​vonzshik in
npgsql/npgsql#6137
* Add NpgsqlTsVector.Empty by @​roji in
npgsql/npgsql#6145
* Add assert to NpgsqlCommand.Transaction if it's completed by
@​vonzshik in npgsql/npgsql#6151
* Compare normalized type names by @​0MG-DEN in
npgsql/npgsql#6011
* Do CI testing for PG18 (beta) by @​roji in
npgsql/npgsql#6155
* Fix infinite consume on error with connection break by @​vonzshik in
npgsql/npgsql#6161
* Bump actions/checkout from 4 to 5 by @​dependabot[bot] in
npgsql/npgsql#6174
 ... (truncated)

## 10.0.0-rc.1



## 9.0.5

v9.0.5 contains several minor bug fixes.

[Milestone
issues](https://github.com/npgsql/npgsql/milestone/131?closed=1)

**Full Changelog**:
npgsql/npgsql@v9.0.4...v9.0.5

## 9.0.4

v9.0.4 contains several minor bug fixes.

[Milestone
issues](https://github.com/npgsql/npgsql/milestone/127?closed=1)

**Full Changelog**:
npgsql/npgsql@v9.0.3...v9.0.4

Commits viewable in [compare
view](npgsql/npgsql@v9.0.3...v10.0.2).
</details>

[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=Npgsql&package-manager=nuget&previous-version=9.0.3&new-version=10.0.2)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

OpenTelemetry: instrument physical connection open/close

2 participants