Brad Fitzpatrick ([info]bradfitz) wrote in [info]lj_dev,
@ 2003-05-27 02:29:00
memcached
You might've noticed things are faster lately.

You might've also noticed a lot of CVS activity dealing with "memcached".

You might've put the two together. :-)

Yes, we're now running a bunch of memory cache daemons. In the LJ code, we always try to get things first from memory, then go to the database only as a last resort. (and when we do, we put it in memory)
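A minimal sketch of that "memory first, database as a last resort" read path, in Python rather than the Perl the LJ code actually uses. The two dicts are hypothetical stand-ins for a memory cache daemon and a database, and the names `get_cached` and `load_from_db` are made up for illustration.

```python
# Hypothetical stand-ins: one dict plays the memory cache, one plays the database.
cache = {}
database = {"logtext:1:42": "hello world"}
db_reads = 0  # counts how often we had to fall back to the database


def load_from_db(key):
    """Pretend database read (the expensive path)."""
    global db_reads
    db_reads += 1
    return database.get(key)


def get_cached(key):
    """Try memory first; on a miss, read the database and populate memory."""
    value = cache.get(key)
    if value is not None:
        return value
    value = load_from_db(key)
    if value is not None:
        cache[key] = value  # "and when we do, we put it in memory"
    return value


first = get_cached("logtext:1:42")   # miss: goes to the database
second = get_cached("logtext:1:42")  # hit: served from memory
```

The second call never touches the database, which is the whole point.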

The API is essentially a dictionary: set and get. (there's some stuff like add/replace/delete/get_multi/etc...)

The servers have bounded memory usage, and discard infrequently used items when memory is needed.

Any memory server can go up or down at any time, and the LJ code automatically adjusts.

I wrote the initial protocol, client and server, but the Perl server was way too inefficient. [info]avva rewrote the server in C, using a 2.4 backport of Linux 2.5's epoll() system call, Judy, and my constant nagging about nitpicky performance things. End result: it consumes hardly any CPU at all.

So, the server works. The APIs work. (oh, and it lets you transparently store objects in addition to regular scalars, so maybe we need more buzzwords: "distributed, fail-safe, object caching system".)
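One way "transparently store objects" can work is for the client to serialize anything that isn't a plain string and remember that it did so in a flag stored next to the value. The sketch below assumes that approach; `FLAG_PICKLED`, `encode` and `decode` are hypothetical names, and Python's `pickle` stands in for whatever serializer the Perl client really uses.

```python
import pickle

FLAG_PICKLED = 1  # hypothetical flag value, not taken from the real protocol


def encode(value):
    """Return (bytes, flags) ready to hand to the cache."""
    if isinstance(value, str):
        return value.encode("utf-8"), 0          # plain scalar, stored as-is
    return pickle.dumps(value), FLAG_PICKLED     # object, stored serialized


def decode(data, flags):
    """Reverse encode() using the flags stored alongside the value."""
    if flags & FLAG_PICKLED:
        return pickle.loads(data)
    return data.decode("utf-8")


data, flags = encode({"width": 100, "height": 80})
restored = decode(data, flags)
```

The caller only ever sees the original object; the serialization step stays inside the client.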

The next step is making more stuff use it. So far we cache:

-- logtext, talktext
-- logprops, talkprops
-- user login sessions
-- translation strings
-- userpic widths/heights/owners

This is just a start.

My end goal is to cache everything possible. I want pages to serve entirely from memory whenever possible.

We'll be buying as much memory as needed to keep cache hit rates high. (and luckily it's damn cheap.) We'll also be converting database servers we no longer need (due to this change) into memory cache servers.

server is in wcmtools:memcached/. client APIs are in livejournal:cgi-bin/LJ/MemCache.pm.

Um... questions?

Update: I spoke too soon. Our machine with 12 GB of RAM died, exposing a shortcoming in our dead-host detection, making the site worse than no memcache. Fixing.


(44 comments) - (Post a new comment)


[info]jc
2003-05-27 04:13 am UTC (link)
I'd be curious to know how much memory is currently in use, and how much more you'll think may be needed in six months' time...



[info]ntang
2003-05-27 04:50 am UTC (link)
Does the memcache server run on every webserver, or in a separate layer? If it's on every server, how do you deal with the fact that a person could jump from server to server? (Or does their cookie lock them in? I can't remember, it's too early in the morning.) If it's a separate layer, is there any auto-failover mechanism if one box dies, or do the webservers connected to it just fail?

Hey, you should post an architectural diagram of how things work nowadays, there've been so many changes over the past year or so since your last one, it'd help a lot in the visualization of things.



[info]avva
2003-05-27 05:24 am UTC (link)
memcache daemons run on separate machines, not on webservers. Webservers talk to them by TCP.

The memory cache is global in the sense that a particular item will always be allocated to the same daemon among those currently running (selected by a hash function). So even though many memcache daemons are running simultaneously (currently on the same machine, but soon on several servers), Perl code running on any webserver will ask the correct memcache daemon for information (it'll ask the same daemon which was earlier asked to store this information).
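A rough illustration of picking a daemon by hash, so that every webserver independently arrives at the same answer for the same key. The choice of CRC32 and the server addresses are assumptions for the example; the comment above only says "a hash function".

```python
import zlib


def pick_server(key, servers):
    """Map a key to one server. Same key and same server list give the same server."""
    h = zlib.crc32(key.encode("utf-8"))
    return servers[h % len(servers)]


# Hypothetical daemon addresses.
servers = ["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"]

stored_on = pick_server("logtext:1:42", servers)   # where a writer puts it
fetched_from = pick_server("logtext:1:42", servers)  # where any reader looks
```

No coordination between webservers is needed, as long as they all share the same server list.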

The memory cache is always redundant in the sense that if it doesn't have the information -- because it was never stored in the cache, because it was deleted to free memory for more recently used items, or for other reasons including crashes and whatnot -- the client-side code will just fetch it from the database. The purpose of the memory cache is to reduce reads from the database as much as possible by providing efficient caching of as much info as possible, with a high hit rate.



[info]ntang
2003-05-27 09:26 pm UTC (link)
Well, I guess my question was this - what happens if the memcache daemon dies entirely? (Or fails to respond, or the server it runs on dies, etc. etc. etc.)

Will the request fail to connect, time out after a second or two, and connect directly to the DB? Will the webserver freeze, trying to establish a connection to the memcache? How is that handled?

Also, how is data expired from the cache, to prevent stale entries?

Finally: does the memcache daemon also act as a connection pooling layer, or do the webservers still connect directly to the db's when the memcache servers don't have the data they requested?

Thanks! :)



[info]bradfitz
2003-05-27 11:21 pm UTC (link)
Web server first talks to memcached, then to db if necessary. If memcached is empty, dead, or not responding, it times out quickly, marks it as dead for a minute or so and moves on. It's entirely optional. If it's quickly available and has the results, it's used, and DB queries are avoided.

LRU cache.

No connection pooling necessary. MySQL connections are so cheap, web servers connect directly as needed, then disconnect at the end of the request. (this is a change from earlier) We now run tons of webservers (and more coming soon), all starting and ending DB connections at will, cached only within the scope of a request.
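The "times out quickly, marks it as dead for a minute or so and moves on" behavior from the top of this reply could look roughly like the sketch below. `HostTracker`, `fetch`, and the 60-second value are assumptions for illustration, and `cache_get` / `db_get` are whatever functions the caller supplies.

```python
import time

DEAD_FOR_SECONDS = 60  # "a minute or so"; the exact value is an assumption


class HostTracker:
    """Remembers which cache hosts failed recently so we skip them for a while."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.dead_until = {}

    def mark_dead(self, host):
        self.dead_until[host] = self.clock() + DEAD_FOR_SECONDS

    def is_alive(self, host):
        return self.clock() >= self.dead_until.get(host, 0.0)


def fetch(key, host, tracker, cache_get, db_get):
    """Use the cache if it is up and has the value; otherwise go to the database."""
    if tracker.is_alive(host):
        try:
            value = cache_get(host, key)
            if value is not None:
                return value
        except OSError:  # covers refused connections and socket timeouts
            tracker.mark_dead(host)
    return db_get(key)
```

Once a host is marked dead, later requests skip it without waiting on another timeout, so a dead cache box costs one slow request instead of slowing every request.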


Re:
[info]avva
2003-05-28 03:15 am UTC (link)
Will the request fail to connect, time out after a second or two, and connect directly to the DB?

That's what happens now.

Will the webserver freeze, trying to establish a connection to the memcache?

That's what happened yesterday, before it was fixed.

Also, how is data expired from the cache, to prevent stale entries?

Every time an entry is fetched from the cache, it's moved to the head of the least-recently-used linked list of all entries in the cache. Whenever memory must be freed, items are dropped off the bottom of this list, so what's freed is what hasn't been used for the longest time.
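A small Python sketch of that least-recently-used policy. It is not the C server's code: it bounds the cache by item count rather than by bytes of memory, and it leans on `collections.OrderedDict` in place of a hand-rolled linked list.

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, max_items):
        self.max_items = max_items
        self.items = OrderedDict()  # least recently used entry comes first

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # a fetch makes the entry most recently used
        return self.items[key]

    def set(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        while len(self.items) > self.max_items:
            self.items.popitem(last=False)  # drop the least recently used entry


cache = LRUCache(max_items=2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")     # "a" is now more recently used than "b"
cache.set("c", 3)  # over capacity, so "b" is evicted
```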

Finally: does the memcache daemon also act as a connection pooling layer, or do the webservers still connect directly to the db's when the memcache servers don't have the data they requested?

memory cache is self-contained and doesn't talk to db. If a webserver can't find what it needs in memory cache, it talks to the db directly. The percentage of cases where it does find what it needs in memory cache is the cache's hit rate; it's been around 70% lately.


Thanks! :)



[info]uke
2003-05-30 12:55 pm UTC (link)
I think that the original question had to do with detecting when an entry has been obsoleted by a newer version with different data, which your answer didn't address. Or do these objects never change?


Re:
[info]avva
2003-05-30 12:59 pm UTC (link)
Sure, whenever a new value for the same key comes along, it deletes the old one (there're also conditional commands, useful in some cases, like add="only store this value if there's no existing value for this key" or replace="only store this value if it replaces existing value for this key").
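To make the difference between the three store commands concrete, here is a sketch of their semantics over a plain dict. The function names and the True/False return values are illustrative only; they are not the client API or the wire protocol.

```python
def cache_set(store, key, value):
    """Unconditional store: any old value for the key is replaced."""
    store[key] = value
    return True


def cache_add(store, key, value):
    """Store only if there is no existing value for this key."""
    if key in store:
        return False
    store[key] = value
    return True


def cache_replace(store, key, value):
    """Store only if it replaces an existing value for this key."""
    if key not in store:
        return False
    store[key] = value
    return True


store = {}
replaced_missing = cache_replace(store, "k", "v0")  # nothing to replace
added = cache_add(store, "k", "v1")                 # key was free
added_again = cache_add(store, "k", "v2")           # key already taken
```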



[info]uke
2003-05-30 01:05 pm UTC (link)
Ok cool, thanks! I hope you don't mind if I press on this some more though. It sounds like you've described something in which the database has to know which data items are cached, so it can push out updated versions to the caches where such items are in fact cached. Either that, or everything is cached. (Or you have the possibilities of reading inconsistent data.)


Re:
[info]avva
2003-05-30 01:49 pm UTC (link)
The database has no knowledge of the memory cache and the memory cache has no knowledge of the database.

It is the webserver, when it wants to update some information, that first updates it on the memory cache and then writes it into the database. The memory cache is kept current because any kind of information that's kept in it is always refreshed in it near that place in the code where it's written into the database.
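In code, the write path described here amounts to the webserver touching both stores in the same place. A sketch with two dicts as hypothetical stand-ins; `save_item` and `read_item` are made-up names, and the key is only an example.

```python
cache = {}     # stand-in for the memory cache
database = {}  # stand-in for the database


def save_item(key, value):
    """Webserver-side write: the two stores never talk to each other."""
    cache[key] = value     # refresh the memory cache first...
    database[key] = value  # ...then write the database, as described above


def read_item(key):
    """Readers see the refreshed value without the database being involved."""
    if key in cache:
        return cache[key]
    return database.get(key)


save_item("userpic:7:3", {"width": 100})
save_item("userpic:7:3", {"width": 120})  # an update overwrites both copies
```

If the cached copy is later evicted, the next reader simply falls through to the database and gets the same value.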



[info]jc
2003-05-27 05:33 am UTC (link)
I've been reading through the database schemas recently, but entity relationship diagrams (ick) would probably help a lot of people's understandings.



[info]compwiz
2003-05-27 05:44 am UTC (link)
So does this have anything to do with some friends pages showing entries out of chronological order?



[info]bradfitz
2003-05-27 08:50 am UTC (link)
No.... nothing like that's been touched.

Friends pages are always sorted by date posted, not date their computer said it was. Maybe one of your friends is in a different time zone, or has their computer's clock set wrong.



[info]compwiz
2003-05-27 09:42 am UTC (link)
Nope, I've seen [info]agentlizzle's friends page update with one new entry on top, and then after a reload, a new entry shows up directly underneath.



[info]mart
2003-05-27 10:45 am UTC (link)

Among other possibilities, the entry which appeared “below” may have initially been invisible to you due to security or backdating and later had this changed so it showed up for you.



[info]compwiz
2003-05-27 10:47 am UTC (link)
I'd considered those, but it's happened multiple times over the course of the past few days with different users, none of which changed backdating or security.



[info]controversial
2003-05-27 09:39 pm UTC (link)
I have noticed this as well, on occasion.
You are not mad.



[info]damnitnicole
2003-05-27 09:51 pm UTC (link)
Same here. For me, happens mostly with community entries. Odd.



[info]equally_diverse
2003-05-27 10:48 pm UTC (link)
I also noticed, but also, comments. Yeah, comments don't always show up right away, some taking between 1 and 10 minutes. I don't get it either.



[info]henningz
2003-05-27 11:07 pm UTC (link)
you are right. this happened all the week (and still does today). the problem is more significant the more friends you have on your list. the developers either didn't notice or probably pretend not to notice :(



[info]compwiz
2003-05-27 11:18 pm UTC (link)
out of curiosity, are all the people who are reporting this bug on the chef cluster?



[info]henningz
2003-05-27 11:32 pm UTC (link)
hm... nope, I'm on the Santa cluster


reporting this bug, on the chef cluster?
[info]justgoto
2003-05-28 01:11 am UTC (link)
I am on chef and I don't notice those problems, mainly my problem is the pages will time-out.



[info]agentlizzle
2003-05-27 11:20 pm UTC (link)
They pretend not to notice and then claim backdating or security. :)



[info]henningz
2003-05-27 11:30 pm UTC (link)
that's exactly what they always do when something doesn't work ;)


Re:
[info]agentlizzle
2003-05-27 11:31 pm UTC (link)
Hehe easiest "solution" for them, I guess.



[info]henningz
2003-05-27 11:37 pm UTC (link)
No, the easiest would be just to ignore. But claiming security makes them look intelligent ;)



[info]bradfitz
2003-05-27 11:40 pm UTC (link)
Can you please stay on topic?

If you want something fixed, file an intelligent bug report instead of complaining that we suck.



[info]henningz
2003-05-27 11:51 pm UTC (link)
Sorry, nobody complains that you suck...
:(



[info]yubbie
2003-05-28 12:56 am UTC (link)
Yea, I've noticed this 4-5 times in the last week. An entry will appear a couple of items down in my friends list, sandwiched between two I've already seen. Or I'll post the first comment in a thread, and go back and look, and it's solo. And then look again a few minutes later, and there's another comment that was apparently posted 5 minutes before mine that has suddenly appeared.

I'd chalked it up to crap being committed by different machines and caches not timing out, but hadn't considered until now that it might not be supposed to happen that way.



[info]pne
2003-05-28 01:36 am UTC (link)
[info]rahaeli said elsewhere that this might be due to NTP problems on Green? This was in response to a post asking about the wrong-order problem.



[info]legolas
2003-05-27 01:20 pm UTC (link)
Excuse my slightly negative comments but for some reason the lj servers seem to be slower to me, on top of reporting this error almost constantly:

--------------
Server Error
The following error occurred:

The server closed the connection while reading the response. Contact your system administrator. (SERVER_RESPONSE_CLOSE)
Please contact the administrator.
--------------
(the error message is probably Mozilla 1.2.1's message for a server-side timeout or something, i.e. it simply says 'internal server error')

Does this possibly have anything to do with the caching? I access the servers at almost the same times every day, around 21h GMT, perhaps some process like backup or the like destroys caches at these times? In fact, while posting this, I have the error in 2 windows, and in this one if I hit the post button...

I'm really just thinking out loud here, if it seems to make no sense it's probably because it doesn't...



[info]mxfreak
2003-05-28 04:24 am UTC (link)
I've had the same behaviour over the last few days, sometimes to such an extent that I've given up on posting in the evenings.



[info]the_summer_wind
2003-05-28 05:51 pm UTC (link)
After waiting and waiting, I just give up, also.



[info]equally_diverse
2003-05-27 10:45 pm UTC (link)
I notice a really nice change of pace with the livejournal servers, but still, pages time out, but on the second refresh quickly come back to life. Any idea what the problem is?



[info]terwilliger
2003-05-30 01:48 am UTC (link)
Brad,
Did you guys try out (with tweaking) MySQL 4.0's built in query caching? Was the performance gain non-substantial or the caching not flexible enough?

- T



[info]bradfitz
2003-05-30 01:56 am UTC (link)
As part of the memcached release & docs, I'll be writing more on that topic but in short, it sucks. (it helps in some cases, but not for us, and not for lots of sites....)

You'll have to wait for the better answer later.



[info]boobay
2003-06-06 10:11 am UTC (link)
mysql is a great product but you are right, with an activated query cache I got an almost 'crash': mysql got stuck after 2 days, load avg went > 120 as mysql got slower and slower, had to restart the daemon.


ok i dont understand all that but.......
[info]xlittlejx
2003-06-01 02:16 pm UTC (link)
i do have a question...
how come i cant modify my journals layout or colors..??..please post back in my journal.thank you


All in RAM?
[info]gaal
2003-06-07 04:51 am UTC (link)
I'm guessing the size of the entire LJ database is about 300GB. How difficult would it be to bring up the following setup?

master:
- used only for writes

slaves:
- big machine with a *lot* of RAM, say half a TB
- big ramdisk
- mysql files on the ramdisk
- writes go to masters, which are disk-based and slow.

Issues:
- finding an architecture with that much RAM
- making the ramdisk use it
- failover: when a slave dies and comes back up, it'll need
to read 300gb from somewhere; ouch.

Pros: when it works, there'll be nothing as fast.


Re: All in RAM?
[info]gaal
2003-06-07 05:17 am UTC (link)
Okay, forget about that. The biggest RAM size I could find on a single box was 40GB, and that was on a monster server from IBM. (Dell had something with 32GB.)

I wonder, though, if RAM sizes (and prices!) are going to catch up with LJ data size. If so, it may be worth revisiting this scheme one day.


Re: All in RAM?
[info]bradfitz
2003-06-07 10:26 am UTC (link)
You totally missed the point of the new architecture.

Your proposal sucks for lots of reasons:

-- can only have 64 GB in an x86 box, and then it gets slower since you have to use PAE.

-- if that machine dies, all that cache is lost. better to have it distributed.

-- you're still using mysql, which blocks all reads during writes, even if the writes are fast.

You say: "when it works, there'll be nothing as fast"

But that's false. Our current setup is way faster than MySQL even on memory. MySQL has to parse queries, form an optimizer plan, seek around in b-tree indexes, keeps lots of metadata, etc... And if you do use the MySQL in-memory tables, they have to be fixed-size records and only up to 2 GB of cache.

Our setup lets us scale to any amount of memory (add more machines to the pool), and if one dies, there's no pain.. only lost a small fraction, and soon the hit rates are up again, because those requests are now evenly hashed against the remaining alive machines.
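A sketch of what "evenly hashed against the remaining alive machines" might look like on the client side. The retry-with-rehash scheme, the CRC32 hash, the `max_tries` value and the server names are all assumptions for illustration; the actual client's scheme isn't spelled out here.

```python
import zlib


def pick_alive_server(key, servers, alive, max_tries=20):
    """Pick the usual server for a key, or rehash onto another one if it's dead."""
    h = zlib.crc32(key.encode("utf-8"))
    for attempt in range(max_tries):
        server = servers[h % len(servers)]
        if server in alive:
            return server
        # Mix the attempt number into the hash so keys from the dead server
        # scatter across the others instead of all piling onto one neighbor.
        h = zlib.crc32(f"{attempt}{key}".encode("utf-8"), h)
    return None  # give up; the caller just goes to the database


servers = ["cache1", "cache2", "cache3"]  # hypothetical pool
keys = [f"logtext:1:{i}" for i in range(50)]

before = {k: pick_alive_server(k, servers, set(servers)) for k in keys}
after = {k: pick_alive_server(k, servers, {"cache1", "cache3"}) for k in keys}
```

Keys that lived on a surviving machine stay where they were; only the dead machine's share has to be refetched from the database and cached again.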

Our setup has no concurrency issues. We can be updating new versions of an object while a thousand readers are still slowly reading the old object (where slowly is 100 MBps).

I just can't see your point of view.

We have 14 GB of cache online right now and we're getting about an 85% cache hit rate. So basically the databases aren't doing anything but writes.


Re: All in RAM?
[info]gaal
2003-06-07 12:36 pm UTC (link)
Ah, you're storing objects whose construction was expensive. I have this "I see now" look on my face, thanks for the clarifications.


Some more questions though:

- does it make no sense to use 64-bit machines, if this (presumably) makes RAM much easier to access?

- the entire cache is one global namespace. How will key clashes be avoided? (essentially: how come using ":" and "." for delimiters is safe?)


Re: All in RAM?
[info]bradfitz
2003-06-07 09:02 pm UTC (link)
- does it make no sense to use 64-bit machines, if this (presumably) makes RAM much easier to access?

It makes sense to use whatever's most cost-effective. Right now I'm even considering using a bunch of VIA Mini-ITX motherboards with cheap VIA x86 clone processors. We don't need a great SCSI or IDE or RAID subsystem... just ethernet, a crappy processor, and as much memory as we can get. A little Mini-ITX all-in-one, fanless motherboard might be just the solution.

- the entire cache is one global namespace. How will key clashes be avoided? (essentially: how come using ":" and "." for delimiters is safe?)

We just select our key names with an ounce of consideration. Like:

[object_type]:[userid]:[itemid]

That's not hard at all.
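For illustration, a tiny helper that builds keys in that [object_type]:[userid]:[itemid] shape. `make_key` is a hypothetical name, and the "logtext"/"talktext" type strings are borrowed from the list of cached things in the post; the exact key strings LJ uses aren't given here.

```python
def make_key(object_type, userid, itemid):
    """Build a key shaped like [object_type]:[userid]:[itemid]."""
    if ":" in object_type:
        raise ValueError("object_type must not contain the delimiter")
    # Forcing the ids to integers means they can never smuggle in a delimiter.
    return f"{object_type}:{int(userid)}:{int(itemid)}"


entry_key = make_key("logtext", 1, 42)
comment_key = make_key("talktext", 1, 42)  # same ids, different type, no clash
```

Since the type prefix is chosen by the code and the ids are plain integers, two different kinds of object can't end up under the same key.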


