28 May 2009 @ 11:54 am
I'm posting here 'cause Ops/Eng is a little busy. This morning one of our user clusters just... fell over. This caused a plethora of error messages and some journals to be completely unavailable. The issue was resolved in less than an hour, and we've switched to the slave while Ops works on it.

Edit 12:40 p.m. PDT/19:40 GMT: there have been a few lingering issues - we'll do a web tier restart soon, which should clear those up. Update: the web tier restart is done.
 
 
Current Mood: working
 
 
Hello LJ,

  ** Edit, had to push back a week due to shipping delays :( **

  Just wanted to give our users a heads up that next week we will be performing work on our memcache tier. During this time, you may experience a little wait time while loading journals and communities. We will be monitoring our website health and response times closely to ensure that our user experience is impacted as little as possible; however, with maintenance work of this size, the chance of site slowness / intermittent downtime can increase.

These upgrades will be performed on our servers between 16:00 and 24:00 UTC on Apr 28th and 29th.

We will be upgrading 1 machine at a time to minimize problems a user may experience.


 
 
Current Location: Hiding Under Your Bed!!
Current Mood: optimistic
Current Music: Lagwagon - Smile
 
 
EDIT@6:12AM UTC

We're done with our work. No downtime or network drops were detected! Happy! All your positive thoughts definitely helped out! Please tell us that the site seems faster (even if it doesn't), cuz you know, I get needy when I'm away from home. :D Seriously, if you see any problems that are out of the ordinary, support is the best way to contact us.

----

It's that time again! In a little over 12 hours, I'll be heading to the airport, leaving on a jet plane, not quoting the rest of that John Denver song (bless his heart) and arriving in our frosty cold data center in Billings, Montana to do some NETWORK MAINTENANCE.

This maintenance window will be from April 4, 04:00 - 07:00 UTC. For the rest of us not in a zone that is even remotely UTC'ish, please check out this link and choose your city/timezone.
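If you'd rather work out the window yourself than hunt for your city, here is a minimal Python sketch of the conversion. The year (2009) and the example zone `America/Los_Angeles` are assumptions for illustration only; the post gives just "April 4", and `window_in_zone` is a made-up helper name.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # standard library in Python 3.9+


def window_in_zone(start_utc, hours, tz):
    """Return (start, end) of a maintenance window converted to the zone tz."""
    end_utc = start_utc + timedelta(hours=hours)
    return start_utc.astimezone(tz), end_utc.astimezone(tz)


if __name__ == "__main__":
    # Assumption: the year is 2009; the announcement only says "April 4".
    start = datetime(2009, 4, 4, 4, 0, tzinfo=timezone.utc)
    # Assumption: a US West Coast reader; swap in your own zone name.
    local_start, local_end = window_in_zone(start, 3, ZoneInfo("America/Los_Angeles"))
    print(local_start.strftime("%a %b %d %H:%M %Z"), "to",
          local_end.strftime("%a %b %d %H:%M %Z"))
```

Note that for zones west of UTC the local start can land on the previous calendar day, so check the date as well as the hour.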

Though the window is 3 hours, I do not expect livejournal.com to be down for anywhere near that long. Of course we'll be working during that time, and we will see network connectivity "blips" as well as potential slowness, but the site should be up and functional.

 
 
Current Mood: i tend to repeat myself
 
 
EDIT@07:34 UTC/GMT.
We're 1 important step closer to our goal! Tomorrow we have to break db replication briefly, push new code, and test it; then, if it goes well, we'll push it to production and enjoy a slightly snazzier LJ by night time. There won't be a separate lj_maintenance post for that work though, as it should (hopefully) be transparent to us.

Remember to check status.livejournal.org just in case things don't go according to plan; we'll make sure we update that.

---

This isn't a repeat of our previous post, rather an update. I got tired of doing all that "EDIT1", "EDIT2" hoopla. New posts all around!

The main focus originally was to get the network fixed. That changed, we changed, *I've* changed. We're going to try the network changes but AFTER we try the database changes. The focus is trying to get the db changes in place for basically one community, one very large community, one very hyper community. And also to get my friend to stop IM'ing about it. :D

This means that there *will* be complete downtime on our website, probably an hour or more. The time window has stayed the same.

 
 
This isn't related to ONTD, that saga is still continuing.

Time converter here. Find your city and watch what day it is, since for those of us in the USA the work will actually occur on Tuesday night.

Even though the window is going to be 2 hours in length, LiveJournal will not be down that entire time. Just be aware that *during* this 2 hour window, connectivity may be slow or time out completely.

Details are in this lj_maintenance entry, see point #1 which we had to postpone until we got the extra parts in.
