Conversation

@iand675 (Contributor) commented Nov 19, 2021

This PR builds on top of #1327

It adds support for hooking into various aspects of the lifecycle of connections and statements. This functionality opens up support for tracing queries within the context of a larger chunk of code:
[screenshot of an example trace]

To see how this is being used, see the hs-opentelemetry project: https://github.com/iand675/hs-opentelemetry/blob/main/instrumentation/persistent/src/OpenTelemetry/Instrumentation/Persistent.hs
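As a rough illustration of what this enables (a sketch, not code from this PR): with the new SqlPoolHooks record shown in full in the review below, a tracing integration amounts to filling in the four hook fields. The putStrLn calls are placeholders for real span handling, and how the record is handed to the pool runner is left out here.

-- Hypothetical sketch; SqlBackend comes from Database.Persist.Sql and
-- SqlPoolHooks from the module this PR adds. Logging stands in for tracing.
loggingHooks :: SqlPoolHooks IO SqlBackend
loggingHooks = SqlPoolHooks
    { alterBackend = pure  -- leave the backend unchanged
    , runBefore = \_backend _iso -> putStrLn "transaction: begin"
    , runAfter = \_backend _iso -> putStrLn "transaction: commit"
    , runOnException = \_backend _iso e ->
        putStrLn ("transaction: rollback: " <> show e)
    }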

Before submitting your PR, check that you've:

  • Documented new APIs with Haddock markup
  • Added @since declarations to the Haddock
  • Run stylish-haskell on any changed files
  • Adhered to the code style (see the .editorconfig file for details)

After submitting your PR:

  • Update the Changelog.md file with a link to your PR
  • Bump the version number if there isn't an (unreleased) section in the Changelog already
  • Check that CI passes (or, if it fails, that it fails for reasons unrelated to your change, like CI timeouts)

@parsonsmatt added this to the 2.14 milestone Nov 19, 2021
@parsonsmatt (Collaborator) left a comment

OK, having thoroughly reviewed this code:

I like it a lot. Great new feature. Excellent implementation. But we can make it a bit better.

  1. Only expose data constructors and fields via the .Internal modules so we don't have to major-bump to add a field or change a field type. Expose accessors/modifiers/etc. as appropriate. If you want to hand this off to me then I can take it home.
  2. Don't change the type of MkSqlBackendArgs and instead convert from IORef (Map Text Statement) in mkSqlBackend. This allows us to release this as a minor version bump, instead of a major one.

Thanks!!!

setConnInsertManySql insertManySql' $
maybe id setConnRepsertManySql (upsertFunction repsertManySql serverVersion) $
mkSqlBackend MkSqlBackendArgs
modifyConnVault (Vault.insert underlyingConnectionKey conn) $ mkSqlBackend MkSqlBackendArgs
Collaborator

To get access to the underlying backend, use the RawPostgresql type

data RawPostgresql backend = RawPostgresql

Contributor Author

This isn't sufficient for instrumenting existing codebases, unfortunately. Stashing the connection in the vault lets us wrap existing connections without requiring pervasive downstream changes.

Comment on lines +371 to +373
underlyingConnectionKey :: Vault.Key PG.Connection
underlyingConnectionKey = unsafePerformIO Vault.newKey
{-# NOINLINE underlyingConnectionKey #-}
Collaborator

This doesn't feel right to me. My understanding of this behavior is:

  1. We create this Vault.Key PG.Connection lazily, the first time the value is forced. Then the value is cached because it's a top-level constant applicative form (CAF) and NOINLINE keeps GHC from duplicating it. So, we generate exactly one Key Connection here, and it is a global constant.
  2. When we call createBackend or getSimpleConn (whichever one comes first) then we create the underlyingConnectionKey. If we call getSimpleConn first, then we'll get Nothing in that function. If we call createBackend afterwards, then the Key Conn already exists, and we stuff the Conn into the Vault.
  3. So, I think the problem with this, is that we are overwriting the underlying Conn with every call to createBackend. Is this a problem?

withPostgresqlPool delegates to withSqlPool open', which itself calls return $ constructor (createBackend logFunc ver smap) conn. That means that, on every new SqlBackend that is created, we end up overwriting the value here, so getSimpleConn is going to return the most recent PG.Connection that has been created.

Collaborator

OK, so this is probably not an issue. While we do generate a single Key Conn, it is used to index into each SqlBackend's internal Vault, not some globally available Vault. And, since there's only ever exactly one key, we can easily recover the PG.Connection for a given SqlBackend.

This is a bit gnarly but it shouldn't have any problems.

@iand675 (Contributor Author) Jan 27, 2022

Points 1 and 2 are intentional. We need a stable key in order to be able to reference the connection from external instrumentation tooling. AFAICT, there's no way you could call getSimpleConn prior to constructing a backend via createBackend (at least without digging into internals), so the order of operations guarantees that the key is initialized and can be used to return the underlying PostgreSQL connection. Worst-case scenario, you can't prove that the connection is to a PostgreSQL database and you skip the enhanced instrumentation.

Regarding 3, I don't see how we're overwriting the underlying connection in createBackend, since it's just a pure function that constructs a backend from its constituent parts?
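For concreteness, the lookup side of this amounts to roughly the following (a sketch only; the getConnVault accessor name is an assumption, and the real getSimpleConn may be more general):

-- Sketch: recover the PG.Connection stashed in the backend's vault by
-- createBackend; Nothing if this backend wasn't built that way.
getSimpleConn :: SqlBackend -> Maybe PG.Connection
getSimpleConn backend = Vault.lookup underlyingConnectionKey (getConnVault backend)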

Comment on lines 108 to 119
data SqlPoolHooks m backend = SqlPoolHooks
    { alterBackend :: backend -> m backend
    -- ^ Alter the backend prior to executing any actions with it.
    , runBefore :: backend -> Maybe IsolationLevel -> m ()
    -- ^ Run this action immediately before the action is performed.
    , runAfter :: backend -> Maybe IsolationLevel -> m ()
    -- ^ Run this action immediately after the action is completed.
    , runOnException :: backend -> Maybe IsolationLevel -> UE.SomeException -> m ()
    -- ^ This action is performed when an exception is received. The
    -- exception is provided as a convenience - it is rethrown once this
    -- cleanup function is complete.
    }
Collaborator

This is nice!

To make backwards compatibility easier, I'd want to hide this constructor and its accessors in an .Internal module, and expose an API for setting/modifying/adding hooks, e.g.:

module Database.Persist.SqlBackend.SqlPoolHooks 
  ( SqlPoolHooks
  , defaultSqlPoolHooks
  , setAlterBackend
  , modifyAlterBackend
  , addAlterBackendHook
  , addRunBeforeHook
  , addRunAfterHook
  -- etc..
  )
  where

module Database.Persist.SqlBackend.SqlPoolHooks.Internal where

data SqlPoolHooks m backend = SqlPoolHooks
    { alterBackend :: backend -> m backend
    -- ^ Alter the backend prior to executing any actions with it.
    , runBefore :: backend -> Maybe IsolationLevel -> m ()
    -- ^ Run this action immediately before the action is performed.
    , runAfter :: backend -> Maybe IsolationLevel -> m ()
    -- ^ Run this action immediately after the action is completed.
    , runOnException :: backend -> Maybe IsolationLevel -> UE.SomeException -> m ()
    -- ^ This action is performed when an exception is received. The
    -- exception is provided as a convenience - it is rethrown once this
    -- cleanup function is complete.
    }
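For example, one of those add* helpers could be written roughly like this (a sketch of the intent, not a final API):

-- Sketch: run an extra action after the existing runBefore hook,
-- leaving the other fields untouched.
addRunBeforeHook
    :: Monad m
    => (backend -> Maybe IsolationLevel -> m ())
    -> SqlPoolHooks m backend
    -> SqlPoolHooks m backend
addRunBeforeHook extra hooks = hooks
    { runBefore = \backend iso -> do
        runBefore hooks backend iso
        extra backend iso
    }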

SqlBackend
, mkSqlBackend
, MkSqlBackendArgs(..)
, SqlBackendHooks(..)
Collaborator

Suggested change
, SqlBackendHooks(..)
, SqlBackendHooks

I'd really like to minimize additions to the public API that require breaking changes to modify (e.g. directly exposing data constructors).

-- ^ This function generates the SQL and values necessary for
-- performing an insert against the database.
, connStmtMap :: IORef (Map Text Statement)
, connStmtMap :: StatementCache
Collaborator

Hmm. We may be able to avoid making this a breaking change, if there's a way to go from IORef (Map Text Statement) to a StatementCache in the function that translates a MkSqlBackendArgs into a SqlBackend.

Collaborator

OK, yeah, this can definitely be a minor version bump! Which means there's no urgency to figure out the other breaking changes and try to bundle them together. I think this is the only breaking change in the PR.

Contributor Author

I want to mention that I'd like to ensure alternative statement caches can be provided too. In particular, the end goal is that we could, for example, expose counters on the statement cache to detect memory leaks (a real situation that can occur today for long-lived PostgreSQL connections, AFAICT), or provide an LRU version of the statement cache that can evict entries. That's why this was introduced as a breaking change, though it could technically be deferred until a later date.

Collaborator

You should be able to write setConnStatementCache :: StatementCache -> SqlBackend -> SqlBackend, without breaking this.

In hindsight, I wish I'd made this a Maybe field - or even omitted it from the record entirely. There's an obvious default, though using it would mean mkSqlBackend :: MkSqlBackendArgs -> IO SqlBackend has to run in IO to produce the cache.

So, for a 2.13 release, can we avoid breaking this?

Then, for 2.14, we can either remove this field (and only use setConnStatementCache), or make it a Maybe field and also move mkSqlBackend into IO.
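A sketch of that setter (the connStmtMap field name is taken from the diff above; updating it directly assumes access to the internal record):

setConnStatementCache :: StatementCache -> SqlBackend -> SqlBackend
setConnStatementCache cache backend = backend { connStmtMap = cache }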

Comment on lines +2 to +9
module Database.Persist.SqlBackend.StatementCache
( StatementCache
, StatementCacheKey
, mkCacheKeyFromQuery
, MkStatementCache(..)
, mkSimpleStatementCache
, mkStatementCache
) where
Collaborator

blessed API 🙌🏻

Comment on lines 20 to 42
data MkStatementCache = MkStatementCache
  { statementCacheLookup :: StatementCacheKey -> IO (Maybe Statement)
  -- ^ Retrieve a statement from the cache, or return nothing if it is not found.
  --
  -- @since 2.14.0
  , statementCacheInsert :: StatementCacheKey -> Statement -> IO ()
  -- ^ Put a new statement into the cache. An immediate lookup of
  -- the statement MUST return the inserted statement for the given
  -- cache key. Depending on the implementation, the statement cache MAY
  -- choose to evict other statements from the cache within this function.
  --
  -- @since 2.14.0
  , statementCacheClear :: IO ()
  -- ^ Remove all statements from the cache. Implementations of this
  -- should be sure to call `stmtFinalize` on all statements removed
  -- from the cache.
  --
  -- @since 2.14.0
  , statementCacheSize :: IO Int
  -- ^ Get the current size of the cache.
  --
  -- @since 2.14.0
  }
Collaborator

Suggested change
data MkStatementCache = MkStatementCache
    { statementCacheLookup :: StatementCacheKey -> IO (Maybe Statement)
    -- ^ Retrieve a statement from the cache, or return nothing if it is not found.
    --
    -- @since 2.14.0
    , statementCacheInsert :: StatementCacheKey -> Statement -> IO ()
    -- ^ Put a new statement into the cache. An immediate lookup of
    -- the statement MUST return the inserted statement for the given
    -- cache key. Depending on the implementation, the statement cache MAY
    -- choose to evict other statements from the cache within this function.
    --
    -- @since 2.14.0
    , statementCacheClear :: IO ()
    -- ^ Remove all statements from the cache. Implementations of this
    -- should be sure to call `stmtFinalize` on all statements removed
    -- from the cache.
    --
    -- @since 2.14.0
    , statementCacheSize :: IO Int
    -- ^ Get the current size of the cache.
    --
    -- @since 2.14.0
    }

codebase uses 4 space indent

Comment on lines 45 to 59
-- | Make a simple statement cache that will cache statements if they are not currently cached.
--
-- @since 2.14.0
mkSimpleStatementCache :: IO MkStatementCache
mkSimpleStatementCache = do
  stmtMap <- newIORef Map.empty
  pure $ MkStatementCache
    { statementCacheLookup = \sql -> Map.lookup (cacheKey sql) <$> readIORef stmtMap
    , statementCacheInsert = \sql stmt ->
        modifyIORef' stmtMap (Map.insert (cacheKey sql) stmt)
    , statementCacheClear = do
        oldStatements <- atomicModifyIORef' stmtMap (\oldStatements -> (Map.empty, oldStatements))
        traverse_ stmtFinalize oldStatements
    , statementCacheSize = Map.size <$> readIORef stmtMap
    }
Collaborator

The only thing here that needs IO is the newIORef, so if we extract+purify,

mkSimpleStatementCache :: IORef (Map k v) -> MkStatementCache

meaning that we can go from IORef (Map Text Statement) to a StatementCache purely, and can therefore do it in mkSqlBackend.
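Concretely, the extracted version could look like this (a sketch; it assumes the cache is keyed by the Text returned by cacheKey, as in the code above):

-- Sketch: same bodies as mkSimpleStatementCache above, but the IORef is
-- passed in, so this is pure and mkSqlBackend can wrap an existing
-- IORef (Map Text Statement) without extra IO.
mkSimpleStatementCache :: IORef (Map Text Statement) -> MkStatementCache
mkSimpleStatementCache stmtMap = MkStatementCache
  { statementCacheLookup = \sql -> Map.lookup (cacheKey sql) <$> readIORef stmtMap
  , statementCacheInsert = \sql stmt ->
      modifyIORef' stmtMap (Map.insert (cacheKey sql) stmt)
  , statementCacheClear = do
      oldStatements <- atomicModifyIORef' stmtMap (\old -> (Map.empty, old))
      traverse_ stmtFinalize oldStatements
  , statementCacheSize = Map.size <$> readIORef stmtMap
  }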

@iand675 requested a review from parsonsmatt January 28, 2022 18:56
@parsonsmatt removed this from the 2.14 milestone Jan 28, 2022
@parsonsmatt (Collaborator) left a comment

looks great! thank you so much for the PR 😄

@parsonsmatt merged commit d7a67f0 into yesodweb:master Jan 29, 2022