Metadata caching - Parsed SQL and prepared statement handle caching#272
Conversation
Codecov Report

```diff
@@            Coverage Diff             @@
##                dev     #272      +/-   ##
============================================
+ Coverage     37.23%   37.79%   +0.55%
- Complexity     1652     1972     +320
============================================
  Files           102      107       +5
  Lines         23660    28006    +4346
  Branches       3873     4846     +973
============================================
+ Hits           8810    10585    +1775
- Misses        13250    15527    +2277
- Partials       1600     1894     +294
```

Continue to review full report at Codecov.
If the parse cache is going to be shared across multiple connections, you'll need a way to evict invalid entries. Otherwise the following will break a long-running application until it restarts:
Step 4 would repeatedly give an error with no way to succeed until the app is bounced and the cache is reset. One solution would be to evict entries when methods that use the cache throw an exception. An error would still happen if the cache is out of date, but at least it'd only happen once per running application; subsequent attempts to repeat the same operation would succeed, as the next attempt would get fresh metadata.

Alternatively, having the cache at the Statement level mostly avoids this issue, as you'd only run into it if the DDL change happens in the middle of repeated usage of the same Statement. As most JDBC usage involves creating Statements and then closing them when the unit of work is complete, the errant window would be much smaller. Breaking it out into pros / cons:
Note that I'm not advocating one over the other ... just providing some clarity on the options. If it can be disabled via connection properties and gracefully handles errant evictions, the shared cache would likely be best for the common case where DDL changes are not happening.
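The evict-on-exception idea above can be sketched roughly as follows. This is a hypothetical toy cache, not the driver's actual types; `schemaVersion` just simulates a DDL change invalidating cached entries.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy sketch of evict-on-error: if an operation using a cached entry fails,
// drop the entry so the next attempt re-parses with fresh metadata instead
// of failing forever. All names here are illustrative.
public class EvictOnError {
    static final Map<String, Integer> parseCache = new ConcurrentHashMap<>();
    static int schemaVersion = 1; // bumped to simulate a DDL change

    static int execute(String sql) {
        int entry = parseCache.computeIfAbsent(sql, s -> schemaVersion);
        if (entry != schemaVersion) {   // stale entry -> server-side error
            parseCache.remove(sql);     // evict so the retry gets fresh metadata
            throw new IllegalStateException("cached entry is stale");
        }
        return entry;
    }
}
```

With this shape the error still happens once after a DDL change, but the very next attempt succeeds, matching the "only once per running application" behavior described above.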
Wow, @brettwooldridge & @sehrope, thanks for all of the feedback! I just added a simple implementation of actual reuse of created server-side prepared statement handles. The prepared statement handle cache is currently controlled by the `SQLServerConnection.preparedStatementHandleCacheSize_SHOULD_BE_CONNECTION_STRING_PROPERTY` field; a value > 0 means the cache is on :-). This cache should further improve performance. We now have two caches:
Trying to reply as best I can:
Thanks!
Regarding the Guava dependency, a few things.
Regarding the static cache, I think you are correct. As a pure parsed SQL cache, I don't see a reason not to share it VM-wide. I was thinking ahead, and injecting my own assumptions about the implementation, assuming that the metadata cache would eventually be married with the parsed SQL cache. But that is not necessarily so, clearly.

Regarding the following:

```java
988 private ResultSet buildExecuteMetaData() throws SQLServerException {
989     String fmtSQL = sqlCommand;
990     if (fmtSQL.indexOf(LEFT_CURLY_BRACKET) >= 0) {
991         fmtSQL = (new JDBCSyntaxTranslator()).translate(fmtSQL);
992     }
993
994     ResultSet emptyResultSet = null;
995     try {
996         fmtSQL = replaceMarkerWithNull(fmtSQL);
...
```

...But it seems the result of line 199, `userSQL = translator.translate(sql);` ...

Great to see the appearance of a metadata cache ... or the beginnings of one! 🎉 I'll need to give this a closer reading, so take these comments with a grain of salt... I think the prepared statement handles can be shared across Connections, right? I believe the server-side access plans are shared across connections, so it should be possible -- assuming an access scheme keyed by hostname/port/database like that discussed above. I also believe that, in the end, one of the biggest performance wins will come from caching the metadata returned by ...
Thx @brettwooldridge! For Guava, your point about it no longer being a simple drop-in replacement settles it for me. Will replace :-)

Execution plans are shared across connections, and even users, but prepared handles are not; they are tied to a specific connection. You can see this if you try sp_prepare on two connections: both will get handle number 1 back. Grain of salt applied :-)

WRT the metadata from sp_prepare, we actually never call sp_prepare. All of our drivers now call sp_prepexec instead (which always returns metadata). I.e., we don't do the prepare until the first execution, and then we prepare and execute at the same time. Subsequent executes call sp_execute. buildExecuteMetaData seems to be used only if you want metadata for a statement without executing it (in fact this should be changed from using set fmtonly to sp_describe_first_result_set).

Good point on row 991! Like you say, it should be changed to: ... Made the change :-) Thanks!
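A rough sketch of the two cache levels as described in this thread: parsed SQL shared VM-wide, prepared handles per connection. The handle numbers and all names here are illustrative stand-ins, not the driver's real types or keys.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative two-level cache: parsed SQL text can be shared VM-wide,
// but prepared statement handles are connection-scoped, so each
// "connection" keeps its own handle map and numbering.
public class TwoLevelCache {
    // Level 1 (VM-wide): SQL text -> translated SQL.
    static final Map<String, String> parsedSqlCache = new ConcurrentHashMap<>();

    // Level 2 (per connection): translated SQL -> server handle.
    final Map<String, Integer> handleCache = new ConcurrentHashMap<>();
    final AtomicInteger nextHandle = new AtomicInteger(1);

    int prepare(String sql) {
        String parsed = parsedSqlCache.computeIfAbsent(sql, s -> "translated:" + s);
        return handleCache.computeIfAbsent(parsed, p -> nextHandle.getAndIncrement());
    }
}
```

This mirrors the observation above that two connections each get handle number 1 back: handle numbering is per connection, while the parse result is shared.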
@TobiasSQL Most Java ORMs call the equivalent of Connection.prepareStatement() followed by stmt.getParameterMetaData(). It was the poor performance of that pattern in jTDS that led to my implementing metadata caching back in the day. However we implement the metadata caching, it is good to keep that common pattern in mind.
@TobiasSQL Could you please open this PR against the dev branch? Thank you.
@v-afrafi, changed to dev, thanks for noticing! |
As an alternative to Guava's cache, how about Caffeine? (https://github.com/ben-manes/caffeine) It is a near drop-in replacement for Guava's cache, it was written by the guy who made the original Guava cache (with lessons learned), and it has a much smaller footprint in general compared to Guava.
A synchronized LRU is supported by Java's `LinkedHashMap`, but a single lock becomes a bottleneck under concurrent access. Attempting to solve that led to ConcurrentLinkedHashMap. This was used heavily (Cassandra, etc.), and often embedded (Groovy, Camel, etc.). It is small and JDK6-based.

Guava's Cache was retrofitted with CLHM's design and embeds many features worked out in custom decorators. This provides many best practices, such as using memoization rather than racy get-compute-put. Unfortunately its original design around soft references hindered our efforts, since we had to work within that framework rather than from a stronger foundation.

Caffeine is a JDK8 rewrite that improves on the above. It uses a modern eviction policy, offers more features, and has had very healthy adoption. However, this richness comes at the cost of a modestly sized jar. Some do shade it, but it's probably not ideal for a JDBC driver. I'd recommend embedding CLHM as it is probably the best fit.
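For reference, the synchronized-LRU baseline looks roughly like this; the capacity and key/value types are illustrative. It is the simple approach that ConcurrentLinkedHashMap and its successors improve on for highly concurrent workloads.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// A synchronized LRU built on LinkedHashMap in access order: the eldest
// (least recently used) entry is evicted once capacity is exceeded.
public class LruCache {
    static <K, V> Map<K, V> create(int capacity) {
        return Collections.synchronizedMap(new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity;
            }
        });
    }
}
```

The third constructor argument (`true`) selects access order, so a `get` counts as a "use" and protects hot entries from eviction.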
@ben-manes & @cogman, thanks! Given that concurrency is important for at least the parsed SQL text cache, CLHM sounds like a good option to embed. I will take a closer look at it once I get things into reasonable shape :-)

@brettwooldridge, thanks for pointing out the ORM pattern using getParameterMetaData(), makes sense. I reworked things to also handle caching this metadata. Overall I am reworking a bunch of stuff from the first prototype as it has a few issues; I should hopefully be able to push some changes soon.
Cleaned up the prototype and fixed a bunch of issues. Depending on feedback, the next POR is to replace the cache implementation and start using a hash for the cache key.
I would suggest against using SHA-1 for the hash key. SHA-1 was designed to reduce collisions in a cryptographically safe manner, but it is slow. Faster algorithms exist that have low collision rates; in particular, FNV or MurmurHash would be preferable as high-performance hashing algorithms with low collision rates. That said, it would probably be a better idea to use the string itself as the key but limit the size of statements that will be cached, since large statements are less likely to be reused. If you are concerned about size, you could compress the statement, but I wouldn't do that as a first step.
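For concreteness, FNV-1a (one of the fast non-cryptographic hashes in this family) is only a few lines; the constants below are the standard 64-bit FNV offset basis and prime.

```java
// FNV-1a 64-bit: a fast non-cryptographic hash with good collision behavior
// for short keys, in contrast to SHA-1's cryptographic (and slower) design.
public class Fnv1a64 {
    private static final long OFFSET_BASIS = 0xcbf29ce484222325L;
    private static final long PRIME = 0x100000001b3L;

    static long hash(byte[] data) {
        long h = OFFSET_BASIS;
        for (byte b : data) {
            h ^= (b & 0xff); // xor in the next byte, then multiply by the prime
            h *= PRIME;
        }
        return h;
    }
}
```

Because the prime is odd, the final multiply is a bijection on 64-bit values, so two inputs that differ only in their last byte always hash differently.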
As an FYI, the commits coming in now are fixes based on findings from our test suites that have not been pushed to GitHub yet.
Got it. I'll run the benchmark later today. |
@TobiasSQL Have you had a look at my ... ? The reference counting is tightened up further still.
Here are the benchmark results, best of three runs:

- mssql-jdbc-6.1.7.jre8.preview.jar
- mssql-jdbc-6.1.8-SNAPSHOT.jre8.jar ( ... )
- mssql-jdbc-6.1.8-SNAPSHOT.jre8.jar ( ... )
The previous benchmark was the oltpbenchmark ...

- mssql-jdbc-6.1.7.jre8.preview.jar
- mssql-jdbc-6.1.8-SNAPSHOT.jre8.jar ( ... )
- mssql-jdbc-6.1.8-SNAPSHOT.jre8.jar ( ... )
Left comments on your PR @brettwooldridge, sorry for the delay! I am hoping we get the current caching finalized for release next week. The week after should hopefully work great; I am thinking you can then submit a PR against the dev branch (rather than metadataCaching). We mainly need to make sure all issues found by the other test cases are resolved first.
@TobiasSQL Sure, sounds great. I'm still testing and tweaking my ...

- mssql-jdbc-6.1.7.jre8.preview.jar 🐢
- mssql-jdbc-6.1.8.jre8.preview.jar ( ... )
Beautiful! @brettwooldridge, is the comparison branch based on the current state of metadataCaching, dev, master, or something else?
The comparison was with v6.1.7. |
@brettwooldridge, you can now open a PR against dev for your improvements :-) |
Never mind, we have some more things to go through before the merge. Expecting it next week. |
@brettwooldridge, this is a great point. I agree with the implementation; calling setCapacity makes sense to me. I was noodling on this as well. I'm not sure about tying it to the discard queue, though; they seem to have at best a distant relationship ;-) Two options come to mind:
Additionally, looking around, I think we need to provide a limit for the max statement text size to place in the parse cache, to prevent accidental memory hogging. My suggestion is to start with a conservative and fairly large value (say 10k); we can make it configurable in a following release. Thoughts?
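The size guard could look something like this. The 10k limit and all names are illustrative, not the driver's actual API; oversized statements still work, they just bypass the cache so huge SQL text can't hog memory.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of a parse cache that skips very large statements: they are
// parsed as usual but never stored, bounding the cache's key memory.
public class BoundedParseCache {
    static final int MAX_CACHED_SQL_LENGTH = 10_000; // conservative starting value
    static final Map<String, String> cache = new ConcurrentHashMap<>();

    static String parse(String sql) {
        return "parsed:" + sql; // stand-in for the real SQL translation
    }

    static String parseCached(String sql) {
        if (sql.length() > MAX_CACHED_SQL_LENGTH) {
            return parse(sql); // too large to be worth caching
        }
        return cache.computeIfAbsent(sql, BoundedParseCache::parse);
    }
}
```

This also fits the earlier observation that large statements are less likely to be reused, so skipping them costs little in hit rate.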
Given that the statement cache isn't static (which I forgot), it seems to me that only option 1 above makes sense, i.e. a separate configuration knob (get/setParsedSQLCacheSize).
Still a prototype, opened PR to allow for conversation.