Skip to content

Add CachingConnectionPool class#565

Closed
ibrewster wants to merge 7 commits intopsycopg:masterfrom
ibrewster:master
Closed

Add CachingConnectionPool class#565
ibrewster wants to merge 7 commits intopsycopg:masterfrom
ibrewster:master

Conversation

@ibrewster
Copy link
Copy Markdown

With existing pool classes, the pool opens a fixed number of connections upon instantiation. If more connections are needed, it will open a connection, and then immediately close the connection upon it being returned to the pool. If fewer connections are needed, the unused connections are still held open forever, using resources on the DB server unnecessarily.

This commit adds a caching pool class that will keep returned connections open for a (user-configurable) period of time, enabling connection re-use and reducing connect/disconnect overhead during periods of heavy use, while freeing up resources by closing old connections during periods of lighter use, and removing the need to figure out a "magic number" of minion's to balance overhead and server resource use.

@dvarrazzo
Copy link
Copy Markdown
Member

Thank you for your contribution. As mentioned in #563, would you mind having a chat in the ML about this change?

I'm sure a lot of people would benefit of a more useful pool implementation; I haven't used the pool object very often myself, so I'm not the best person to assess if your design is the best thing to do (rest assured that I find it an improvement).

Would it also be possible to have pool-related tests in the test suite? I don't think there is any.

@ibrewster
Copy link
Copy Markdown
Author

ibrewster commented Jun 8, 2017

Yeah, I noticed the lack of tests. It was slightly disappointing, as I was planning to just copy and modify the existing tests for the ThreadedConnectionPool object for this one, but there wasn't one. I can look into what it would take to add some tests.

I have posted on the main python mailing list about this (in fact, I did that before I ever posted the original comment). Only one person there commented, suggesting I just up the minconn parameter. It looks like the mailing list link on the psycopg2 page links to the PostgreSQL mailing list, I could certainly post there as well.

Edit: Oh, I see, there's a psycopg board on the PostgreSQL mailing list site. Yeah, I'll go ahead and post there.

@dvarrazzo
Copy link
Copy Markdown
Member

Docs need update too :)

@ibrewster
Copy link
Copy Markdown
Author

You mean in general for the pool classes or just in relation to this addition? :) I've never been terribly good at documenting stuff. My boss harps on that every few months :P

@dvarrazzo
Copy link
Copy Markdown
Member

Just this class: http://initd.org/psycopg/docs/pool.html

The page is a mix of what introspected from the docstring and docs-specific material, if that is more fitting to the docs than to the docstring: see doc/src/pool.rst.

To generate the docs should be enough to run:

cd doc
make env
make html

@ibrewster
Copy link
Copy Markdown
Author

Well, the ML discussion doesn't appear to be generating much interest so far. Maybe I'm the only one concerned about connection overhead :)

When I was writing the tests, I noticed that some of the utility test base classes provided didn't seem to be python3 compatible. Am I just missing something?

@dvarrazzo
Copy link
Copy Markdown
Member

All the psycopg code and tests is written for python2 and converted to python3 by 2to3 (as it was customary when py3 support was added). you can use make PYTHON=python3 to convert the code and make PYTHON=python3 check to run the tests with a different Python interpreter (which results in a single command that you can run from command line anyway).

@floqqi
Copy link
Copy Markdown

floqqi commented Oct 9, 2017

@ibrewster @dvarrazzo any progress here? Would love to see this in the new release :)

@ibrewster
Copy link
Copy Markdown
Author

Code is written and tested. I put it out on the mailing list for comment, but never really got any feedback.

@zackw
Copy link
Copy Markdown

zackw commented Jan 29, 2018

I have a use for this code in a web application that alternates between brief periods of extremely high load (so we don't want to be starting a fresh connection for every request) and multi-week periods of no load at all (during which time the database server processes should not hang around and tie up resources needed for other activity on the shared host).

So, with my end-user-of-psycopg2 hat on, I would like to vote for the feature; but then I would also like to offer some feedback on the implementation. First off, it seems like expiry should be added at the level of AbstractConnectionPool so that it is available in all of the concrete pool classes; to put it another way, one could reasonably want this feature in the non-thread-safe pool as well as the thread-safe pool (maybe you're doing concurrency 100% with async/await). Relatedly, it looks like the code for CachingConnectionPool repeats a lot of the ThreadedConnectionPool logic, can you please try to factor that out? And finally, it doesn't seem sufficient to me to do cleanup only when a connection is released, because at that point you don't know if any more connections are going to come in before the expiration interval runs out. Shouldn't there be some sort of timer that gets started on putconn, and reset on getconn, and only if the timer runs out do you call prune?

@amachanic
Copy link
Copy Markdown

I just came here searching for a better connection pool than the default ones available, and discovered this rather old PR. Why was it never accepted?

Having spent a lot of time in the .NET world, I expect connection pools to work slightly differently than the Psycopg implementation. In ADO.NET the pool starts at the configured min size, and grows to the configured max size -- but along the way new connections are never closed. The pool keeps them, operating under the assumption that if at some point a high water mark is hit, it will probably be hit again. I think that's a much better default than the existing state of affairs in Psycopg.

That said, I really like the basic idea behind this PR. It is based on a much more sensible assumption than that of the built in pools, and can achieve an even better balance than the ADO.NET behavior. But I think the timing of the cache check is a bit weird (to @zackw's point). It seems to me that when I really want expiration is at some time after a spike, when things have cooled off. Consider that something kicked in, my program used 10 connections, and now it's an hour later and all is quiet. I'd like those connections to go away. I think that it makes sense, therefore, to launch a background thread that will wake up from time to time to do the appropriate work.

@ibrewster
Copy link
Copy Markdown
Author

The expiration method I wrote was done for simplicity and because it works well enough for my use case. I can certainly agree that it could be improved, however, to make it more useful for a wider variety of uses. That said, given that this PR has seen virtually no activity, it makes it harder to be motivated to improve on the code :)

I also agree with the points on refactoring - this module was based heavily off the ThreadedConnectionPool logic, and so it does share a lot of code. I didn't want to mess with any of the existing code, however, since it isn't mine, so I wrote it to be completely independent. That's something that could be changed if the authors of psycopg2 wanted it to be.

@dvarrazzo
Copy link
Copy Markdown
Member

@amachanic what you describes is a policy which belongs to an application. I don't think a library should start a thread, ever.

This MR slipped below my attention threshold possibly because of the lack of interest it has generated, yes. I find the proposed behaviour reasonable, but the pool is not an object I ever needed, or that I could ever use without further customization, and then there is pgpool/pgbouncer too which can be used. Smells like integration, again too many decisions to make for a driver to get them right.

We have still some time before releasing 2.8. @ibrewster would you like to hear again from the ML if they think your design of pool and connection lifetime make sense to other people, and what does @fogzot thinks about it. If there is enough consensus that moving there is a good idea I'm happy to merge. Otherwise I can make a judgement call myself, but it wouldn't be the most informed one.

@ibrewster I'd be also ok with the base classes being refactored in order to accommodate your subclass better.

In general, if someone thinks a certain feature would be useful for psycopg, and are not able to have it merged because opposition or because lack of focus, there is always the possibility to put your name on it and release it as an additional python module on pypi: the pool is just a normal object using psycopg objects, doesn't strictly need to be distributed by us.

@ibrewster
Copy link
Copy Markdown
Author

pgpool/pgbouncer actually don't address the issue for which I wrote this pool: the need to open/close many connections from the application, and the overhead involved with that. They'll reduce the load on the server, sure, as well as providing connecting queuing and load balancing, if desired, but the application still has to open and close connections to them just as often. So unless the overhead of connecting to one of them is significantly less than the overhead of connecting directly to the DB, they don't help one iota in improving performance of an application that makes many short-lived connections :)

That said, I do agree with your point about the (apparent) lack of interest. Aside from the comments this PR has received, no one seemed interested - at least, not enough to comment on the ML. I can try again though, see if anything comes of it this time. Or I might just go ahead and release it separately as you mentioned. Might be the easiest option - that way it is available if anyone wants it, but the psycopg2 code base doesn't get cluttered with unused modules.

@Changaco
Copy link
Copy Markdown
Contributor

I'm interested in this. I didn't realize that the existing pool classes were flawed until I saw this PR, but now that I know about it I'm guessing that it has a measurable impact on latency.

@Changaco
Copy link
Copy Markdown
Contributor

I'm going to open a new PR with a rebased and refactored version of the code in this branch.

@Changaco
Copy link
Copy Markdown
Contributor

The new PR is ready for review: #974.

@dvarrazzo
Copy link
Copy Markdown
Member

The psycopg2 pool probably will not receive further improvements. The psycopg3 pool behaves in ways compatible with what you kindly suggested and implemented here: see the description here.

@dvarrazzo dvarrazzo closed this May 26, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants