I am excited to continue the #PGSQLPhriday blogging series, which Ryan Booz started earlier this year. We are now on the third edition, and this one is all about the community.
What is the PostgreSQL community to you?
I personally am relatively new to the PG community; I only started attending events and working with PG about five years ago. I’ve talked to a lot of people in the community and found they have very different ideas of what the community is and where it lives.
I would love to hear more varied views and opinions on this topic. Here are some ideas if you need help getting started with a blog post.
What resources do you commonly use in the community?
What favorite event(s) do you attend to stay closely connected with the community?
A story of help or work provided by you or someone else in the community.
How would you get started in the community?
How does #PGSQLPhriday work?
Post your blog post about “What is the PostgreSQL community to you?” by Friday, December 2nd.
Include “PGSQL Phriday #003” in the title or first paragraph of the blog posting.
Link back to this blog posting, for example: “Pat, here is my voice on Community! ‘PGSQL Phriday #003’”.
Announce your blog post in one or any of these places:
Always have a backup! You might need it if you are working on a restore as well.
This blog post is related to #PGSQLPhriday. The original post by ads can be seen here.
Here are the questions that ads asked in the post.
Describe how you do backups for your PostgreSQL databases.
Which tool(s) are you using, where do you store backups, how often do you do backups?
Are there any recommendations you can give the reader on how to improve their backups?
Any lesser known features in your favorite backup tool?
Any new and cool features in a recently released version?
Bonus question: Is pg_dump a backup tool?
If you go back to ads’ original blog post, he asks, “Backups? Do you need that?” When I read this as the topic for #PGSQLPhriday, I thought to myself, this guy is crazy; he must have had too much ice cream if he believes we don’t need backups! But then I realized he was just using that as a headline to get me to read more. Thank you, ads, for making my heart skip a beat. I owe you a good ice cream at the next event for that one.
Let’s Talk Backups
I’m still fairly new to Postgres, having only started about 5 years ago. I began with PostgreSQL 9.6, and we quickly moved that environment to PostgreSQL 10. When I arrived at this company, pg_dump was the only backup we were using. The DB at that time was still around 50GB, so it was reasonable to do the backups in a timeframe that worked for us. A dump was done every night and stored off on another server. After some time we started to test pg_basebackup. This allowed us to take a full server backup each night. It was a huge improvement in speed and in the ability to handle much larger data sizes.
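As a rough sketch, a nightly job built on pg_basebackup might look something like this. The host, user, and paths here are placeholders for the example, not our actual setup:

```shell
# Hypothetical nightly full-server backup; host, user, and paths are
# placeholders. --checkpoint=fast starts the backup right away instead
# of waiting for the next scheduled checkpoint.
pg_basebackup \
  --host=db01.example.com \
  --username=replication_user \
  --pgdata=/backups/base/$(date +%F) \
  --format=tar \
  --gzip \
  --checkpoint=fast \
  --progress
```

The backup user needs the REPLICATION role and a matching replication entry in pg_hba.conf.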
The downside to this approach is that you have to restore the full server directory to another server or over the top of your existing server. In the world of the DBA, one of the more frequent things you do is restore just a table that got dropped by a DBA, engineer, sysadmin, or sometimes even the CTO. When you simply have to restore one table, a base backup makes that pretty hard, since you HAVE to restore the entire server/directory to another location.
Now don’t get me wrong, it’s not easy to do that with pg_dump either, but with a dump you can at least point the restore at the same server and just use a new DB name. That means it’s usually a lot faster to restore just that table. Also, with a dump there is a good chance it will restore the data quickly: if you already have the structure of the table, you can simply restore the data to the original database and not wait for things like indexes or keys to be rebuilt from the dump.
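To illustrate, a single-table restore from a dump goes roughly like this (the database and table names are made up for the example):

```shell
# Dump in custom format so pg_restore can pull out individual objects.
pg_dump --format=custom --file=app.dump app_db

# Restore just one table into a scratch database on the same server:
createdb app_restore
pg_restore --dbname=app_restore --table=orders app.dump

# If the table structure still exists, restore only the rows and skip
# waiting on index or key rebuilds:
pg_restore --dbname=app_db --table=orders --data-only app.dump
```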
Regardless of the tools or the database (SQL Server, PG, MySQL, ES), here are some key items I always make sure of in any of my database situations:
Test all backups! You have NOT backed up if you don’t test the backup by restoring it. This doesn’t have to be every file, but you should have an audit restore running every X days/weeks that can test a backup file for you. Just do it!
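A minimal version of that audit restore could be a scheduled script along these lines (the paths and the sanity-check table are assumptions for the sketch):

```shell
#!/bin/sh
set -eu
# Grab the most recent dump file (path is a placeholder).
LATEST=$(ls -t /backups/dumps/*.dump | head -n 1)

# Restore it into a throwaway database and fail loudly on any error.
dropdb --if-exists restore_test
createdb restore_test
pg_restore --dbname=restore_test --exit-on-error "$LATEST"

# Cheap sanity check that data actually came back.
psql --dbname=restore_test -tAc "SELECT count(*) FROM orders;"
```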
Make sure others can back up/restore, and write down the backup and restore procedure. Even if you have a seasoned team of DBAs, you never know when the whole group will be unavailable. Sometimes you need the person who knows the least about restoring data to be able to restore the data. We always maintain documentation on how to restore the DB and make sure several teams in the company have both the permissions and the understanding to do it.
Know how you are planning to restore the DB. 99.99999% of the time you are not going to restore over the top of the DB/server you are already using. Most of the time you are going to restore to a new server, a new system, or a different DB. Think about this when you are designing your backup and restore process. A lot of tools/systems will talk about how quickly they can put the whole drive back in place from just a few hours ago. But rarely does a customer want everything back to a few hours ago. Typically just one aspect of the database has been impacted, and you don’t want to roll back all the other pieces of the system. When designing your strategy, think about what a restore looks like and what you will typically use it for.
Don’t just back up the DATA! Many database systems have configurations and settings that are key to making the database work properly. Perhaps you have a custom configuration DB in your system that the application needs to work. In PG, we would back up the postgresql.conf file as well with our backup scripts. This allowed us to restore the server if needed. You don’t need to back it up as frequently as the data files, but it should still get backed up.
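One way to grab the config files without hard-coding paths is to ask the server where they live (a sketch; the destination directory is a placeholder):

```shell
# The server knows where its own config files are.
CONF=$(psql -tAc "SHOW config_file;")
HBA=$(psql -tAc "SHOW hba_file;")

# Copy them alongside the data backup (destination is a placeholder).
cp "$CONF" "$HBA" /backups/conf/
```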
Stop backing up to the server! Yes, I still see this at customer locations. It’s perfectly fine to write the backup to a local disk and THEN synchronize it to the cloud, S3, or a tape drive (they still exist). Just make sure you are actually doing that step and have alerts if it fails. And going back to #1 in this post, you should run your test restores from your cloud backups, not off the local disk. If you only have backups on a local server and a local drive, even if it’s in a data center, do yourself a favor: get an S3 bucket and synchronize the backups to it. Recently a customer of mine was unable to retrieve their data from the data center because the data center got hit with ransomware. They locked everything down and wouldn’t let anything out. If we had a backup in S3 or elsewhere in the cloud, we would have had a copy and could have kept moving forward and rebuilding. This almost put the company out of business, but luckily we found a backup outside the data center that was a few months old.
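The local-then-offsite step can be as simple as a sync after the backup finishes, with a loud alert on failure. The bucket name and alert address below are placeholders:

```shell
# Push the local backup directory to S3; alert someone if it fails.
aws s3 sync /backups/ s3://example-pg-backups/ \
  || echo "backup sync to S3 FAILED on $(hostname)" \
     | mail -s "backup alert" dba@example.com
```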
I have so many more stories I could tell about backups and restores! I will keep this post relatively short and dedicate a future post just to these items. I hope this list helps you build your backup strategy. I would love to hear about issues you have had with backups and restores!
This post is related to #PGSQLPhriday! Here are the rules/requirements for the post.
“For this first event, I want you to write a blog post that shares three of your go-to Postgres best practices, but write it in the form of the game Two Truths and a Lie.
For example, perhaps there are three configuration settings you always modify whenever setting up a new cluster. You could share why these settings are important to modify and how you find the correct values, but for one of them, use Ilya’s comical approach to teach us the correct way to set the value. (i.e., ‘I never set shared_buffers to more than 128MB to conserve server resources.’)”
For my three tips I wanted to talk about logical replication and configuration. I do a lot of work with DW/reporting/replication, so I figured it would be good to talk about some of the key parameters you need to know when setting up logical replication. I will keep this in the form of two truths and a lie; I’ll reveal at the bottom which is which.
wal_level = logical
Probably one of the most important things to set for logical replication is your wal_level; for logical replication it has to be set to logical. I typically have this set to “logical” on most of my servers. I rarely have a server that does not participate in logical replication, and it’s best to just set this up front. If you know that you won’t be doing any replication, then something like minimal might make sense, but I personally don’t run into this scenario often. You do have to restart the server for this setting to take effect. This is one of the reasons I keep it on my main checklist for a server, to make sure it’s configured right when the server starts.
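Checking and changing it looks like this (ALTER SYSTEM writes to postgresql.auto.conf; a restart is still required before the new value is in effect):

```sql
-- See what the server is running with today.
SHOW wal_level;

-- Set it for logical replication; takes effect only after a restart.
ALTER SYSTEM SET wal_level = 'logical';
```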
max_logical_replication_workers = 1
Keeping this at a low number means you are not utilizing as many server resources doing silly things like moving data! You should keep this as low as possible on your subscriber, this way you can automatically throttle your logical replication and it can always stay behind.
max_replication_slots = 10+
The default for max_replication_slots is 10, which I frequently find is very low for the servers I set up logical replication on. Since this also requires a restart, I will frequently start it out at 25 so that I can avoid restarting simply to add new replications. Along with this parameter you should always raise max_wal_senders. It should be set to the same number plus any physical replication slots you have as well. Make sure these are moved together.
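As an illustration, a server planned for 25 logical slots plus two physical standbys might be configured like this (the numbers are examples, not recommendations):

```sql
-- Both settings require a restart to take effect.
ALTER SYSTEM SET max_replication_slots = 25;
-- Senders should cover the logical slots plus any physical replicas:
ALTER SYSTEM SET max_wal_senders = 27;  -- 25 logical + 2 physical
```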
Let’s talk about the truths and the lie now! I’m pretty sure most of you out there have sensed the lie already, but let me make it very clear.
max_logical_replication_workers = 1
Please do NOT make this a low number. You want this to be a higher number so that your logical replication can keep up. If you have a LOT of publications/subscriptions, you should think about increasing this. Be careful though: these workers come out of max_worker_processes, so if that number isn’t raised as well, you will run into problems. It’s hard to give specific guidelines here, since it really depends on how much replication work you are doing and how many resources you have on the server. These are numbers to adjust, test, and see what works for your system.
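For example, the two settings might be raised together like this (the values are illustrative only; tune them for your workload):

```sql
-- Workers come out of the max_worker_processes pool, so raise it too.
-- Both settings require a restart to take effect.
ALTER SYSTEM SET max_worker_processes = 16;
ALTER SYSTEM SET max_logical_replication_workers = 8;
```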
Thank you to Ryan for starting up #PGSQLPhriday; it was nice to get writing again, and I am looking forward to more of these in the future!