“Re-mirroring Storm”

You may have heard about problems with Amazon’s server farm earlier this week and, I think, last weekend. I just ran across a technical description of what happened:

When this network connectivity issue occurred, a large number of EBS nodes in a single EBS cluster lost connection to their replicas. When the incorrect traffic shift was rolled back and network connectivity was restored, these nodes rapidly began searching the EBS cluster for available server space where they could re-mirror data. Once again, in a normally functioning cluster, this occurs in milliseconds. In this case, because the issue affected such a large number of volumes concurrently, the free capacity of the EBS cluster was quickly exhausted, leaving many of the nodes “stuck” in a loop, continuously searching the cluster for free space. This quickly led to a “re-mirroring storm,” where a large number of volumes were effectively “stuck” while the nodes searched the cluster for the storage space it needed for its new replica. At this point, about 13% of the volumes in the affected Availability Zone were in this “stuck” state.

After the initial sequence of events described above, the degraded EBS cluster had an immediate impact on the EBS control plane. When the EBS cluster in the affected Availability Zone entered the re-mirroring storm and exhausted its available capacity, the cluster became unable to service “create volume” API requests. Because the EBS control plane (and the create volume API in particular) was configured with a long time-out period, these slow API calls began to back up and resulted in thread starvation in the EBS control plane. The EBS control plane has a regional pool of available threads it can use to service requests. When these threads were completely filled up by the large number of queued requests, the EBS control plane had no ability to service API requests and began to fail API requests for other Availability Zones in that Region as well. At 2:40 AM PDT on April 21st, the team deployed a change that disabled all new Create Volume requests in the affected Availability Zone, and by 2:50 AM PDT, latencies and error rates for all other EBS related APIs recovered.
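The thread starvation part is easy to reproduce in miniature. Here is a toy sketch, assuming a small shared worker pool and a made-up `create_volume` call; the function names, zone names, pool size, and timings are all mine for illustration, and none of it is Amazon's actual implementation. Requests for the degraded zone hang until their timeout fires, so a request for a healthy zone has to wait in line behind them.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def create_volume(zone, degraded_zone, timeout_s):
    """Hypothetical stand-in for a control-plane API call."""
    if zone == degraded_zone:
        time.sleep(timeout_s)  # the call hangs until its timeout fires
        return "timeout"
    time.sleep(0.01)  # healthy zones answer quickly
    return "ok"


def healthy_latency(timeout_s, pool_size=4, stuck_requests=8):
    """Queue stuck requests first, then time one healthy-zone request.

    Returns (result, seconds from submission to completion).
    """
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        # Fill the shared pool (and its queue) with calls to the degraded zone.
        for _ in range(stuck_requests):
            pool.submit(create_volume, "zone-a", "zone-a", timeout_s)
        # A request for a healthy zone must wait for a free worker thread.
        start = time.monotonic()
        future = pool.submit(create_volume, "zone-b", "zone-a", timeout_s)
        result = future.result()
        return result, time.monotonic() - start


if __name__ == "__main__":
    for timeout_s in (0.3, 0.02):
        result, waited = healthy_latency(timeout_s)
        print(f"timeout={timeout_s}s: healthy request {result} after {waited:.2f}s")
```

With the long timeout, the healthy request spends most of its time waiting for a worker rather than doing work. Shortening the timeout, or refusing requests for the degraded zone outright as the team eventually did, frees the shared threads for everyone else.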

I don’t think any gaming companies were affected, though Zynga might have been. But I don’t care. I love failure analysis in general, and in computer systems in particular. I’m such a geek. A 3000-year-old, fabulously scarlet-coiffed geek, to be sure, but a geek, through and through.

UPDATE: The outage started just after midnight April 21 PDT.

Crunch and Fluff

Brian “Psychochild” Green asks the question “Why all this number focus?” in MMOs today. To clarify, he’s talking about number focus on the part of the game designers.

I played a lot of D&D in college, along with other RPGs like Paranoia, White Wolf’s Vampire and Werewolf, etc. I met my GF of 16 years playing D&D. Our group played together for nearly 4 years, and some of us played virtual tabletop RPGs and MMOs together later. Now, we weren’t originally the most story-focused group. We often played Vampire: the Masquerade in the “fanged superheroes” mode rather than in the purely goth “oh my tortured soul” way. Personally, I liked Werewolf: the Apocalypse better for this style of play, but we did some good role-playing as well.

Playing DDO for a bit has put this into stark contrast. Even though we were a fairly power-gaming-heavy group, I don’t think I focused on stats nearly as much as I have in DDO. For example, in DDO, you have items that give bonuses to your individual stats. I might find some Ogre Power gloves that give +1 to +6 Strength. These items are pretty common in the game and increase in power with levels. In my pen and paper games playing 3rd edition for several months, we saw zero. That’s right, none. Even back in university playing power-gaming-focused 2nd edition, we had fewer than a half dozen items that increased stats after years of regular play and multiple parties.

Lest you think this is essential to the genre, I offer a counterexample: Legend of Zelda. Link has no stats at all. There is no “build”.

As a player, I often think that high stats are the refuge of the unimaginative. The obvious thing to do for a fighter is push your AC and hit points to the maximum. The obvious thing for a DPS-focused role is to maximize damage output – for a melee type that means STR and weapons.

But why do game designers focus on numbers? I’m not sure. Maybe they think that the endless quest for improving gear is what motivates players to keep playing. And for a long time, it did. I think there are other things that will motivate people, though: things like autonomy, mastery and purpose.

But designers face the same dilemma as players. If a player strays from the defined path, they risk failure. And they will fail on their own. If you can’t tank the raid boss when you have the best HP and AC on the server, then it wasn’t your fault. But if you don’t have the best numbers, then it was your fault.

For a designer, the success criteria are different. If a designer gets rid of the gear/numbers grind and players then stop playing after a month, that designer will be second-guessed. And the bigger the budget of a game, the bigger the repercussions if the game flops.

Psychochild’s commenter Stabs has a really interesting comment:

I feel diku MMO is a soap bubble about to burst into dozens of different types of games. The only thing holding it together is the feeling you have to add every feature to attract every type of player (eg WAR crowbaring in a rudimentary and ridiculous form of crafting, mocked in the latest Kiasacast, when they suddenly worried about not appealing to the crafting sub-market). There are many game elements that just don’t work together any more – open world gameplay conflicts with instancing, battlegrounds kill rvr (the losing side just queues for honour on a plate rather continue with asymmetrical conflict).

It’s a good time for games.

[The link to a definition of "diku" is mine, -toldain]

They Really Do Have More Fun, No Really


I’ve decided I’ll spend the next three million years as a blonde. I think it looks fabulous, don’t you? I will set the trend for the Wilhelm Arcturus Decade. I’d start a Facebook page, but I don’t want to look like a copycat, a silly bird.

Speaking of silly birds, maybe I’ll be able to get one as a pet, now that CCP announced that they will be introducing mounts and pets to their popular MMO, EVE Online. I’m really looking forward to earning some of those achievements.

And it will let me stand out when I go to Psychochild’s campaign rallies. He announced today that he is running for president, on a promise to fix all the bugs. I say, we need to refocus our military spending and put a lot more enchanters into the military. I mean, come on. We all know that enchanters are a big force multiplier. Instead of fighting 1000 enemy soldiers all at once, we could kill them off one at a time! Think how much less mana our medics would have to burn! And with Clarity making our gasoline and jet fuel last longer, we’d level up our military that much faster for the endgame!

I haven’t heard from Tipa yet today, but knowing how much she loves Wizard101, she’s probably playing with her new pet.