15 April 2006 @ 12:51 pm
DRAFT: The Culprits: Current list of privs

This list is intended to track privs by merit, rather than privs as recorded by the system. Please let me know if I missed anybody or got any wrong, which I'm sure I did. (I have blanket-decided that all admins are I2s in Web by merit if they weren't SHes before, since honestly, if any of you want to work on Web I'm not making you start out at screened. I mean, seriously, that would be screwed up. If any of you guys decide to get reviews, your next step will be merit I3.)

I1

I2

I3

SH

Admin

15 April 2006 @ 09:44 am
DRAFT: Memcache, and other Blacke Magick

What is a memcache? Is it like a cache that tells you what X-man you are?

If you're not familiar with web caching in general, hit up the web-cache training post before reading this one. All set? Good.

Memcache is a system created by the LiveJournal development team to reduce load on their database servers. (It has since been adopted by other sites, but its primary users and maintainers are still affiliated with LiveJournal.) To explain what it's for, we're going to have to explain a little bit about how LJ (and lots of web apps) work. The way a page on LiveJournal is generally loaded is this:

[Diagram of a LiveJournal page load]

Trust me, it's less complicated than it looks. The lines of this color represent an actual connection, the lines of this color represent paths the connection could have taken, and lines of this color represent the database's internal syncing mechanisms. So here's what's happening:

  1. You enter a LiveJournal URL and hit return.
  2. Your browser talks to the load balancer.
  3. Unbeknownst to your browser, the load balancer goes out to find a web server to answer the request, and puts you in touch with that server.
  4. For (mostly) unchanging content like images, this is as far as we need to go. The web server sends the content back to your browser. But most pages on LJ aren't unchanging; they're dynamic. So what happens next is that the web server asks the database server for the content that belongs on the page.
  5. The database server digs out your journal entry (or any other dynamically-generated stuff) from its internal storage, and passes it to the web server, which then integrates it into the page and sends it back to you.
  6. If you changed anything in the database when you hit the page (by updating your journal, for instance), the database server that you were talking to passes those changes to the other database servers. This process is called replication.

This picture gets more complicated when we talk about clusters, and I'm eliding some subtleties about master/slave database relationships, but this is close enough for government work.

The problem with this system is that the connection to the database server is expensive, because that's where all the data really is, and getting at it often involves talking to a bunch of disks. (When dealing with computers, disks are generally the slowest part of the system.) Ideally, as few connections to those servers as possible should happen. So what memcache does is store the results of your database query on one of the web servers when you make it, just like your browser cache does. Next time someone asks for that same piece of data, the web server figures out which server should be caching it, and checks there first before getting in touch with the database server. When you change that piece of data, the web server erases it from the cache when the database server is updated, so that the next query will see the real data instead of whatever was there last time someone looked at the page. This cache is stored entirely in the memory of the machine (hence the name memcache), which makes it very fast.
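If it helps to see that as code, here's a minimal sketch in Python of the read-then-cache and erase-on-write behavior described above. This is not LiveJournal's actual code: `FakeDatabase`, `CacheCluster`, `load`, `save`, and the server names are all made up for illustration, and the way a key gets mapped to a server (hashing the key and taking it modulo the number of servers) is just a simple stand-in for whatever the real thing does.

```python
import hashlib


class FakeDatabase:
    """Hypothetical stand-in for the (slow) database servers; counts reads."""

    def __init__(self):
        self.rows = {}
        self.reads = 0

    def read(self, key):
        self.reads += 1
        return self.rows.get(key)

    def write(self, key, value):
        self.rows[key] = value


class CacheCluster:
    """Hypothetical in-memory cache spread across several web servers."""

    def __init__(self, server_names):
        self.server_names = list(server_names)
        self.servers = {name: {} for name in self.server_names}

    def server_for(self, key):
        # The same key always hashes to the same server, so every web
        # server knows where to look without asking anyone.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return self.server_names[int(digest, 16) % len(self.server_names)]

    def get(self, key):
        return self.servers[self.server_for(key)].get(key)

    def set(self, key, value):
        self.servers[self.server_for(key)][key] = value

    def delete(self, key):
        self.servers[self.server_for(key)].pop(key, None)


def load(key, cache, db):
    """Check the cache first; only bother the database on a miss."""
    value = cache.get(key)
    if value is None:
        value = db.read(key)
        if value is not None:
            cache.set(key, value)
    return value


def save(key, value, cache, db):
    """Update the database, then erase the cached copy so it can't go stale."""
    db.write(key, value)
    cache.delete(key)
```

With this sketch, calling `load` twice in a row for the same key touches the database only once; calling `save` in between forces the next `load` to go back to the database for the fresh value.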

This all sounds great, or at least it did once my head stopped hurting. So what's the problem?

There are a lot of things that can go wrong here, but there are two that are of primary concern to us:

Replag

Replag is short for "replication lag". This occurs when the database server that you sent your data to doesn't immediately update the other database servers about it, usually because it's a lot busier than normal for some reason. If this happens, you might connect to the wrong database server and see the old version of the data instead of the new one, and if it's a piece of data that gets memcached, that old data might get put into the cache. If it's not memcached, this can result in someone seeing different versions of the page every time they reload it (since which web and database server you get sent to is, from your point of view, totally random).
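Here's a toy Python sketch of how replag can plant old data in the cache. Everything in it is an assumption for illustration: `ReplicatedDatabase` models just two servers (called `primary` and `replica` here), replication only happens when `replicate()` is called, and the cache is a plain dictionary.

```python
class ReplicatedDatabase:
    """Hypothetical two-server database where replication can lag."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.pending = []  # changes the replica hasn't heard about yet

    def write(self, key, value):
        self.primary[key] = value
        self.pending.append((key, value))

    def replicate(self):
        # Deliver the backlog of changes to the replica.
        for key, value in self.pending:
            self.replica[key] = value
        self.pending.clear()

    def read(self, key, from_replica):
        source = self.replica if from_replica else self.primary
        return source.get(key)


def cached_read(key, cache, db, from_replica):
    """Serve from the cache if possible; otherwise read and cache the result."""
    if key in cache:
        return cache[key]
    value = db.read(key, from_replica)
    if value is not None:
        cache[key] = value
    return value


db = ReplicatedDatabase()
cache = {}

db.write("bio", "old")
db.replicate()            # both servers agree on "old"

db.write("bio", "new")    # replica is now lagging behind
cache.pop("bio", None)    # the write erases the cached copy, as it should

# An unlucky read hits the lagging replica and re-caches the old value.
stale = cached_read("bio", cache, db, from_replica=True)   # "old"

db.replicate()            # the replica catches up...
still_stale = cached_read("bio", cache, db, from_replica=False)  # ...but the cache still says "old"
```

Note that the cache was cleared correctly on the write; the old value got back in afterward, purely because one read landed on a server that hadn't caught up yet.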

Stale Memcache

If the memcached version of a page isn't cleared when the page is updated, the old version will still appear to be there (even if the new version has been properly stored in the database). This looks very much like a browser caching problem, except clearing the cache doesn't help with it.

So what can be done?

Most problems like this can be solved by time. And because they initially resemble caching problems, this means we'll never see them; we tell the user to clear their browser cache, and in the time between when we say this and when the user reads the answer, the blockage has been cleared and all is right with the world again. If the page gets updated by the user, that will also generally clear things out. Some problems of this kind are more persistent, though.

Web SHes have access to a tool that lets them clear the memcache for a particular user. In cases where a page is stuck in an old state, we're 100% sure that the page should be updated, and the old state is visible to the volunteers as well as the user (hence not browser cache), this tool can be used on the user that owns that page to force it to update. This is not a step that should be taken lightly, because it puts a lot of strain on LiveJournal's system to repopulate all that data, which is why use of it is restricted to SHes. However, if you're an interim, and you think a problem may be due to memcache, please IC and explain why you think this; just because you can't run the tool yourself doesn't mean you can't help out, and SHes aren't all-knowing. This tool also isn't perfect; it purges most things, but not every single conceivable memcached thing attached to a user. So if you don't notice any change in the user's journal after purging (and you've cleared your browser cache), it didn't do what you wanted.
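To make the "it purges most things, but not everything" point concrete, here's a rough Python sketch of what a per-user purge could look like. This is emphatically not the real tool: the key names in `USER_KEY_PATTERNS`, the `purge_user` function, and the dictionary standing in for the cache are all invented for illustration.

```python
# Hypothetical list of per-user cache keys the purge knows about.
USER_KEY_PATTERNS = ["userid:{user}", "bio:{user}", "userpic:{user}"]


def purge_user(cache, user):
    """Delete every known cache key for one user; return what was removed."""
    removed = []
    for pattern in USER_KEY_PATTERNS:
        key = pattern.format(user=user)
        if key in cache:
            del cache[key]
            removed.append(key)
    return removed


cache = {
    "bio:frank": "old bio",
    "userpic:frank": "old picture",
    "friends:frank": "old friends list",  # not in the purge's list of patterns
    "bio:someone_else": "untouched",
}
removed = purge_user(cache, "frank")
```

In this sketch the purge only knows about the keys on its list, so `friends:frank` survives and would keep serving old data. Everything it did remove has to be rebuilt from the database on the next request, which is where the strain comes from.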