Conversation

@iand675 (Contributor) commented Nov 19, 2021

This PR builds on top of #1327

It adds support for hooking into various aspects of the lifecycle of connections and statements. This functionality opens up support for tracing queries within the context of a larger chunk of code:
[screenshot of an example trace]

To see how this is being used, see the hs-opentelemetry project: https://github.com/iand675/hs-opentelemetry/blob/main/instrumentation/persistent/src/OpenTelemetry/Instrumentation/Persistent.hs
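As a rough illustration of what this enables (a sketch, not code from this PR): with the new SqlPoolHooks record shown in full in the review below, a tracing integration amounts to filling in the four hook fields. The putStrLn calls are placeholders for real span handling, and how the record is handed to the pool runner is left out here.

-- Hypothetical sketch; SqlBackend comes from Database.Persist.Sql and
-- SqlPoolHooks from the module this PR adds. Logging stands in for tracing.
loggingHooks :: SqlPoolHooks IO SqlBackend
loggingHooks = SqlPoolHooks
    { alterBackend = pure  -- leave the backend unchanged
    , runBefore = \_backend _iso -> putStrLn "transaction: begin"
    , runAfter = \_backend _iso -> putStrLn "transaction: commit"
    , runOnException = \_backend _iso e ->
        putStrLn ("transaction: rollback: " <> show e)
    }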

Before submitting your PR, check that you've:

  • Documented new APIs with Haddock markup
  • Added @since declarations to the Haddock
  • Run stylish-haskell on any changed files
  • Adhered to the code style (see the .editorconfig file for details)

After submitting your PR:

  • Update the Changelog.md file with a link to your PR
  • Bump the version number if there isn't an (unreleased) section in the Changelog already
  • Check that CI passes (or, if it fails, that it fails for reasons unrelated to your change, like CI timeouts)

@parsonsmatt added this to the 2.14 milestone Nov 19, 2021
@parsonsmatt (Collaborator) left a comment

OK, having thoroughly reviewed this code:

I like it a lot. Great new feature. Excellent implementation. But we can make it a bit better.

  1. Only expose data constructors and fields via the .Internal modules so we don't have to major-bump to add a field or change a field type. Expose accessors/modifiers/etc. as appropriate. If you want to hand this off to me then I can take it home.
  2. Don't change the type of MkSqlBackendArgs and instead convert from IORef (Map Text Statement) in mkSqlBackend. This allows us to release this as a minor version bump, instead of a major one.

Thanks!!!

setConnInsertManySql insertManySql' $
maybe id setConnRepsertManySql (upsertFunction repsertManySql serverVersion) $
mkSqlBackend MkSqlBackendArgs
modifyConnVault (Vault.insert underlyingConnectionKey conn) $ mkSqlBackend MkSqlBackendArgs
Collaborator

To get access to the underlying backend, use the RawPostgresql type

data RawPostgresql backend = RawPostgresql

Contributor Author

This isn't sufficient for instrumenting existing codebases, unfortunately. Stashing the connection in the vault lets us wrap existing connections without requiring pervasive downstream changes.

Comment on lines +371 to +373
underlyingConnectionKey :: Vault.Key PG.Connection
underlyingConnectionKey = unsafePerformIO Vault.newKey
{-# NOINLINE underlyingConnectionKey #-}
Collaborator

This doesn't feel right to me. My understanding of this behavior is:

  1. We create this Vault.Key PG.Connection lazily, the first time the value is forced. Then the value is cached because it's a top-level constant applicative form (CAF) and NOINLINE keeps GHC from duplicating it. So, we generate exactly one Key Connection here, and it is a global constant.
  2. When we call createBackend or getSimpleConn (whichever one comes first) then we create the underlyingConnectionKey. If we call getSimpleConn first, then we'll get Nothing in that function. If we call createBackend afterwards, then the Key Conn already exists, and we stuff the Conn into the Vault.
  3. So, I think the problem with this, is that we are overwriting the underlying Conn with every call to createBackend. Is this a problem?

withPostgresqlPool delegates to withSqlPool open', which itself calls return $ constructor (createBackend logFunc ver smap) conn. That means that, on every new SqlBackend that is created, we end up overwriting the value here, so getSimpleConn is going to return the most recent PG.Connection that has been created.

Collaborator

OK, so this is probably not an issue. While we do generate a single Key Conn, it is used to index into each SqlBackend's internal Vault, not some globally available Vault. And, since there's only ever exactly one key, we can easily recover the PG.Connection for a given SqlBackend.

This is a bit gnarly but it shouldn't have any problems.

@iand675 (Contributor Author) Jan 27, 2022

Points 1 and 2 are intentional. We need a stable key in order to be able to reference the connection from external instrumentation tooling. AFAICT, there's no way you could call getSimpleConn prior to constructing a backend via createBackend (at least without digging into internals), so the order of operations guarantees that the key is initialized and can be used to return the underlying PostgreSQL connection. Worst-case scenario, you can't prove that the connection is to a PostgreSQL database and you skip the enhanced instrumentation.

Regarding 3, I don't see how we're overwriting the underlying connection in createBackend, since it's just a pure function that constructs a backend from its constituent parts?
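For concreteness, the lookup side of this amounts to roughly the following (a sketch only; the getConnVault accessor name is an assumption, and the real getSimpleConn may be more general):

-- Sketch: recover the PG.Connection stashed in the backend's vault by
-- createBackend; Nothing if this backend wasn't built that way.
getSimpleConn :: SqlBackend -> Maybe PG.Connection
getSimpleConn backend = Vault.lookup underlyingConnectionKey (getConnVault backend)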

Comment on lines 108 to 119
data SqlPoolHooks m backend = SqlPoolHooks
    { alterBackend :: backend -> m backend
    -- ^ Alter the backend prior to executing any actions with it.
    , runBefore :: backend -> Maybe IsolationLevel -> m ()
    -- ^ Run this action immediately before the action is performed.
    , runAfter :: backend -> Maybe IsolationLevel -> m ()
    -- ^ Run this action immediately after the action is completed.
    , runOnException :: backend -> Maybe IsolationLevel -> UE.SomeException -> m ()
    -- ^ This action is performed when an exception is received. The
    -- exception is provided as a convenience - it is rethrown once this
    -- cleanup function is complete.
    }
Collaborator

This is nice!

To make backwards compatibility easier, I'd want to hide this constructor and its accessors in an .Internal module, and expose an API for setting/modifying/adding hooks, e.g.:

module Database.Persist.SqlBackend.SqlPoolHooks 
  ( SqlPoolHooks
  , defaultSqlPoolHooks
  , setAlterBackend
  , modifyAlterBackend
  , addAlterBackendHook
  , addRunBeforeHook
  , addRunAfterHook
  -- etc..
  )
  where

module Database.Persist.SqlBackend.SqlPoolHooks.Internal where

data SqlPoolHooks m backend = SqlPoolHooks
    { alterBackend :: backend -> m backend
    -- ^ Alter the backend prior to executing any actions with it.
    , runBefore :: backend -> Maybe IsolationLevel -> m ()
    -- ^ Run this action immediately before the action is performed.
    , runAfter :: backend -> Maybe IsolationLevel -> m ()
    -- ^ Run this action immediately after the action is completed.
    , runOnException :: backend -> Maybe IsolationLevel -> UE.SomeException -> m ()
    -- ^ This action is performed when an exception is received. The
    -- exception is provided as a convenience - it is rethrown once this
    -- cleanup function is complete.
    }
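For example, one of those add* helpers could be written roughly like this (a sketch of the intent, not a final API):

-- Sketch: run an extra action after the existing runBefore hook,
-- leaving the other fields untouched.
addRunBeforeHook
    :: Monad m
    => (backend -> Maybe IsolationLevel -> m ())
    -> SqlPoolHooks m backend
    -> SqlPoolHooks m backend
addRunBeforeHook extra hooks = hooks
    { runBefore = \backend iso -> do
        runBefore hooks backend iso
        extra backend iso
    }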

SqlBackend
, mkSqlBackend
, MkSqlBackendArgs(..)
, SqlBackendHooks(..)
Collaborator

Suggested change
, SqlBackendHooks(..)
, SqlBackendHooks

I'd really like to minimize additions to the public API that require breaking changes to modify (e.g. directly exposing data constructors).

-- ^ This function generates the SQL and values necessary for
-- performing an insert against the database.
, connStmtMap :: IORef (Map Text Statement)
, connStmtMap :: StatementCache
Collaborator

Hmm. We may be able to avoid making this a breaking change, if there's a way to go from IORef (Map Text Statement) to a StatementCache in the function that translates a MkSqlBackendArgs into a SqlBackend.

Collaborator

OK, yeah, this can definitely be a minor version bump! Which means there's no urgency to figure out the other breaking changes and try to bundle them together. I think this is the only breaking change in the PR.

Contributor Author

I want to mention that I'd like to ensure alternative statement caches can be provided too. In particular, the end goal is that we could, for example, expose counters on the statement cache to detect memory leaks (a real situation that can occur today for long-lived PostgreSQL connections, AFAICT), or provide an LRU version of the statement cache that can evict entries. That's why this was introduced as a breaking change, though it could technically be deferred until a later date.

Collaborator

You should be able to write setConnStatementCache :: StatementCache -> SqlBackend -> SqlBackend, without breaking this.

In hindsight, I wish I'd made this a Maybe field - or even omitted it from the record entirely. There's an obvious default, though using it would mean mkSqlBackend :: MkSqlBackendArgs -> IO SqlBackend has to run in IO to produce the cache.

So, for a 2.13 release, can we avoid breaking this?

Then, for 2.14, we can either remove this field (and only use setConnStatementCache), or make it a Maybe field and also move mkSqlBackend into IO.
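A sketch of that setter (the connStmtMap field name is taken from the diff above; updating it directly assumes access to the internal record):

setConnStatementCache :: StatementCache -> SqlBackend -> SqlBackend
setConnStatementCache cache backend = backend { connStmtMap = cache }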

Comment on lines +2 to +9
module Database.Persist.SqlBackend.StatementCache
( StatementCache
, StatementCacheKey
, mkCacheKeyFromQuery
, MkStatementCache(..)
, mkSimpleStatementCache
, mkStatementCache
) where
Collaborator

blessed API 🙌🏻

Comment on lines 20 to 42
data MkStatementCache = MkStatementCache
  { statementCacheLookup :: StatementCacheKey -> IO (Maybe Statement)
  -- ^ Retrieve a statement from the cache, or return nothing if it is not found.
  --
  -- @since 2.14.0
  , statementCacheInsert :: StatementCacheKey -> Statement -> IO ()
  -- ^ Put a new statement into the cache. An immediate lookup of
  -- the statement MUST return the inserted statement for the given
  -- cache key. Depending on the implementation, the statement cache MAY
  -- choose to evict other statements from the cache within this function.
  --
  -- @since 2.14.0
  , statementCacheClear :: IO ()
  -- ^ Remove all statements from the cache. Implementations of this
  -- should be sure to call `stmtFinalize` on all statements removed
  -- from the cache.
  --
  -- @since 2.14.0
  , statementCacheSize :: IO Int
  -- ^ Get the current size of the cache.
  --
  -- @since 2.14.0
  }
Collaborator

Suggested change
data MkStatementCache = MkStatementCache
    { statementCacheLookup :: StatementCacheKey -> IO (Maybe Statement)
    -- ^ Retrieve a statement from the cache, or return nothing if it is not found.
    --
    -- @since 2.14.0
    , statementCacheInsert :: StatementCacheKey -> Statement -> IO ()
    -- ^ Put a new statement into the cache. An immediate lookup of
    -- the statement MUST return the inserted statement for the given
    -- cache key. Depending on the implementation, the statement cache MAY
    -- choose to evict other statements from the cache within this function.
    --
    -- @since 2.14.0
    , statementCacheClear :: IO ()
    -- ^ Remove all statements from the cache. Implementations of this
    -- should be sure to call `stmtFinalize` on all statements removed
    -- from the cache.
    --
    -- @since 2.14.0
    , statementCacheSize :: IO Int
    -- ^ Get the current size of the cache.
    --
    -- @since 2.14.0
    }

codebase uses 4 space indent

Comment on lines 45 to 59
-- | Make a simple statement cache that will cache statements if they are not currently cached.
--
-- @since 2.14.0
mkSimpleStatementCache :: IO MkStatementCache
mkSimpleStatementCache = do
  stmtMap <- newIORef Map.empty
  pure $ MkStatementCache
    { statementCacheLookup = \sql -> Map.lookup (cacheKey sql) <$> readIORef stmtMap
    , statementCacheInsert = \sql stmt ->
        modifyIORef' stmtMap (Map.insert (cacheKey sql) stmt)
    , statementCacheClear = do
        oldStatements <- atomicModifyIORef' stmtMap (\oldStatements -> (Map.empty, oldStatements))
        traverse_ stmtFinalize oldStatements
    , statementCacheSize = Map.size <$> readIORef stmtMap
    }
Collaborator

The only thing here that needs IO is the newIORef, so if we extract+purify,

mkSimpleStatementCache :: IORef (Map k v) -> MkStatementCache

meaning that we can go from IORef (Map Text Statement) to a StatementCache purely, and can therefore do it in mkSqlBackend.
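Concretely, the extracted version could look like this (a sketch; it assumes the cache is keyed by the Text returned by cacheKey, as in the code above):

-- Sketch: same bodies as mkSimpleStatementCache above, but the IORef is
-- passed in, so this is pure and mkSqlBackend can wrap an existing
-- IORef (Map Text Statement) without extra IO.
mkSimpleStatementCache :: IORef (Map Text Statement) -> MkStatementCache
mkSimpleStatementCache stmtMap = MkStatementCache
  { statementCacheLookup = \sql -> Map.lookup (cacheKey sql) <$> readIORef stmtMap
  , statementCacheInsert = \sql stmt ->
      modifyIORef' stmtMap (Map.insert (cacheKey sql) stmt)
  , statementCacheClear = do
      oldStatements <- atomicModifyIORef' stmtMap (\old -> (Map.empty, old))
      traverse_ stmtFinalize oldStatements
  , statementCacheSize = Map.size <$> readIORef stmtMap
  }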

@iand675 requested a review from parsonsmatt January 28, 2022 18:56
@parsonsmatt removed this from the 2.14 milestone Jan 28, 2022
@parsonsmatt (Collaborator) left a comment

looks great! thank you so much for the PR 😄

@parsonsmatt merged commit d7a67f0 into yesodweb:master Jan 29, 2022