Monday, March 18, 2013

Sometimes, Running a Business Stinks

The past 72 hours have been a pretty wild ride. And it's highlighted some of the less glamorous aspects of running an online business. I think things are starting to stabilize this afternoon, but I'm pretty exhausted from the experience.

Things Were Running Smoothly When...

Friday was a pretty normal day. Quiet, even. Most of the day was spent working on plot development for NEO Scavenger. The day didn't go without its hitches, but as days go, it was nothing unusual. I was happy to finish the day with some plot knots to untie over the weekend, so I wrapped up, had some dinner, and Rochelle and I went bowling with some friends.

The next morning, I was pretty excited to see a sudden spike in traffic in my logs. NEO Scavenger was mentioned in a post-apocalyptic survival reddit post, and dozens of folks were popping by the site to check out the game. Cool!

Site Maintenance

Except, my ISP was doing scheduled maintenance early Saturday morning. It wasn't until I tried loading the site that I realized it was down.

5-day Site Visitor Graph: Hourly

Well crap. That stinks. Just as a bunch of new folks are drawn to check out my game, the site goes dark. And worse, dark for close to 12 hours.

"Oh well," I said. It's a bummer, but nobody's fault, really. I mean, I guess I could blame myself for not having a redundant server or something, but I'm not that high tech (or deep-pocketed). I decided to just live with it. Besides, friendly Reddit server take-downs were nothing new. Folks probably would just figure my little server was flooded temporarily.

Site Flakiness

Later that day, the site was back up and running. I did my usual email and forum check, to verify nobody was reporting any issues. And all seemed clear.

Except for one thing: every other page load seemed to turn up blank. No error message, no content, nothing. Just a blank white screen. Refreshing the page seemed to fix it, so I figured it was a temporary thing. "It'll sort itself out."

No, it won't. The following day, I was replying to a customer having issues with NEO Scavenger on their Mac, and it was still happening. Happening everywhere. Sometimes it was a content page on the site. Other times it was a forum. Even some of the site admin pages were failing. And as before, usually a reload would fix it. But the reloading was becoming less and less reliable. Sometimes, I had to reload the page 4-5 times to see anything. And if my customers were seeing the same thing, then that was unacceptable.

The White Screen of Death

I decided to dig into the issue a bit more. I started searching for related issues on the web, and was initially happy to see others reporting the same issue. Blank screens in Drupal (the content management system I use, v6.28) were pretty common. Maybe finding a solution will be easy?

However, upon further investigation, I was less happy to discover I had the same problem. This problem, as it was known to the Drupal community, was the "White Screen of Death."

The WSOD is a common issue with Drupal, but it doesn't have a common solution. In fact, there are almost as many causes of the WSOD as there are Drupal installations, and finding the right one for you can be a real quest.

The link above is probably the biggest authority on the issue, and even there, there are no fewer than 28 different causes listed in the article, and some 80+ comments detailing other issues customers have had. Basically, Drupal's WSOD is a symptom of practically every Drupal disease. I'm having trouble thinking of a human analogue. Headache? Fever? Common cold?

I spent hours on Sunday trying to make rhyme or reason of it. None of the remedies I saw seemed to help. I couldn't even get error or log messages. And worse, it was an intermittent problem, so I couldn't even reliably cause it to happen.
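
For an intermittent failure like this, one way to put numbers on it is to request the same page many times from a script and count how often the response comes back blank. The sketch below is illustrative rather than what was actually used here: the function names are made up, and it assumes that a "blank" page is one with an empty response or no <body> tag. It also only sees what the server sends, so a problem that shows up in just one browser might never trip it, which would itself hint that the server's raw output is fine.

```python
import urllib.request
from typing import Callable


def looks_blank(html: str) -> bool:
    """Assumption: a white screen is an empty response or one with no <body> tag."""
    return "<body" not in html.lower()


def fetch_page(url: str) -> str:
    """Hypothetical helper: fetch a page over HTTP and return its text."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def count_blank_loads(url: str, attempts: int,
                      fetch: Callable[[str], str] = fetch_page) -> int:
    """Request the same URL repeatedly and count the blank responses."""
    blank = 0
    for _ in range(attempts):
        try:
            html = fetch(url)
        except OSError:
            # Network-level failures count as blank: the visitor sees nothing.
            blank += 1
            continue
        if looks_blank(html):
            blank += 1
    return blank
```

Calling count_blank_loads("http://localhost/", 50) against a local copy and then against the live server would give two numbers to compare. The URL is a placeholder.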

The only things I could verify about it were:

  1. I only get the WSOD on my live server. Migrating the db from live to my localhost didn't duplicate the WSOD issues.
  2. I only started getting the WSOD since my ISP's scheduled maintenance, when the server was (apparently) upgraded/restarted.
  3. I only get a WSOD in Chrome.
  4. Additionally, Chrome seemed to exhibit stylesheet issues when the page did load. Textareas would be too narrow to fit their <div>, or the page would nudge upward when I clicked a link (seemingly to adjust alignment with the Admin Menu module's topnav bar). Something was stalling the css until I clicked a link, upon which it would fix itself, then load a new page with stalled css (or js, or WSOD).
  5. Firefox never exhibits any issues.
  6. IE seems to work too, except for one page partially loading (later determined to be a known issue with YouTube embedding)
  7. No errors appeared on the page, in Drupal's logs, or in the server logs, even with error reporting hard-coded to be on in index.php.
  8. Using Chrome's "Inspect Element" context menu option revealed that the page was entirely missing the <body> tag and contents. It was just an <html> and <head>, and the <head> seemed to be missing some elements. Also, Chrome usually complained of "Failed to load resource" on the page itself, but all css/js/images were loaded ok.
  9. Using "View Source" on Chrome apparently reloaded the page, and showed the correct, full content source.
  10. The WSOD appeared more frequently when js and css optimization was turned on (but still occurred when all Drupal optimization/compression/caching was disabled).
  11. Flushing all caches had no effect.
  12. Running update.php had no effect.
  13. Manually truncating Drupal db tables had no effect.
  14. Using Backup & Restore on Drupal's db had no effect.
  15. Disabling all modules outside of Core and Core Optional had no effect.
  16. Switching from BBG's custom theme to the Garland theme had no effect.
  17. Rebuilding permissions had no effect.

By the time Sunday evening rolled around, the site was stripped down to core modules, no theme, no caching, and had a rebuilt database. And it was still flaking out.

Worse, the node access modules I disabled caused the permissions table to get out-of-date, which caused all site content to disappear for all users, all the time. Basically, when the WSOD wasn't happening, all of the site's pages were empty blue shells, and the forums all had 0 posts in them.

Even worse still, to rebuild the permissions and fix the empty content, I had to run a script via Drupal. And that script failed with a WSOD whenever I attempted it.

I had totally messed up my site. The "Site Crash" label in the above image refers to the time when I took the site offline to avoid any more users seeing the empty shell of a site.

Fortunately, Firefox was able to run the permission rebuild script without any WSOD. And I was able to at least get the site showing content again.

But as my efforts continued past 10pm, I decided it wasn't going to be solved anytime soon. I started making the changes necessary to return the site to the formerly flaky intermittent WSOD. It wasn't ideal, but the occasional user reload was far better than no site at all. If nothing else, I wanted the forums and contact page online for users to report issues, if needed.

I posted a news item to the homepage alerting customers to the WSOD issue, and apologized for the inconvenience of the downtime. Then, I went to bed.

Come Monday, It'll Be Alright

I was back at the computer at 6:15am. And unsurprisingly, the site hadn't fixed itself. I fired up the usual websites, checked messages, looked for forum posts. Some users reported seeing similar WSOD issues. And, bless them, they blamed their internet connection instead of me.

I decided to try a different approach this time. Instead of grasping at straws offered by forums on the net, I decided to debug Drupal. I added print statements to Drupal's index.php, to see if I could trace the value of the content. And when that failed to reveal anything, I started adding traces to Drupal's core code (*.inc files).

I don't like doing this sort of thing, as I'm nervous about screwing things up worse than they are. Plus, doing it in a way that doesn't affect the live site's users is tricky. But in retrospect, it's the only way to really know what's going on.

I found a function in Drupal's core code which loads various Drupal bits in phases: drupal_bootstrap($phase). Every page on a Drupal site calls this function first, doing a full bootstrap (all phases). I added a trace inside the while loop that executes for each phase as it loads, and I printed the ID of the phase. Testing on my local site, I could see that my site loads phases 0-8.

When I uploaded the modified file to the live server, I saw traces for all phases. Reload. All phases again. Repeat this another dozen times, on different pages where I most often encountered the WSOD. Everything loads normally.

Was the WSOD gone?

Tentatively, I backed out the changes, so it was back to normal. Still, the site seemed to be loading ok. I reenabled caching to normal. Still ok. Turned on page compression. Still ok. I stopped short of turning on optimized js and css. Maybe I'll muster the courage for that tomorrow.

A Wizard Did It

So what happened? Uploading that file seemed to stop the WSOD, and leave it fixed even after that file is restored. What dark magic is this?

That's actually my first theory: dark magic. But if you pin me down for a more mundane answer, I'm going to guess it was some sort of behind-the-scenes cache. I'm not sure what else would explain a site's complete performance alteration when a single file is uploaded and then un-uploaded again.

That was at about 11am today. As of 5pm, I haven't seen a WSOD. Mercifully, no players have posted in the White Screen tech support thread on my forums, either. I'm hopeful this issue is resolved.

But What About That Mac User?

Oh yeah, remember me mentioning way up there that this whole investigation started when trying to help a Mac user with NEO Scavenger? Yeah, Mac compatibility is an issue in its own right, concurrent with the site debacle.

I'm not going to detail that issue here, as it's a pretty lengthy topic of its own. The forum thread linked above has all the details. And what's more, I've partially touched on it in the past.

The short version is this: Flash is rapidly becoming as much a burden as a boon. For someone trying to develop a stand-alone application, I'm at the point where I am highly reluctant to recommend Flash as a viable option. Issues include:

  1. "Create projector" no longer supported on Linux, as of v11.2.
  2. Flash CS6 no longer supported on OSX 10.8.
  3. Any projectors one does create are going to trip security on modern OSes. And in Mac's case, Gatekeeper is a sticky issue for OSX 10.7.3 and 10.7.4.
  4. Digitally signing Flash projectors appears to have spotty support, unless one uses 3rd party wrappers (which are, in themselves, reportedly unreliable).
  5. Adobe's recommended solution, building AIR apps, is unsupported on Linux. Also, AIR's installation process is flawed, at best. Also, AIR has the periodic "Update AIR" nagging dialog.
  6. Flash is no longer officially supported on Android or iOS.

I was a longtime adherent of Flash. It served me well. NEO Scavenger wouldn't be if it weren't for Flash's ease of use, and then-universal deployment options.

So it's unfortunate that this era appears to be in its winter.

All's Well That Ends Well

The good news? At least we're back to normal. The site seems to be running again. Upgrading OSX to 10.7.5 seems to fix Flash projector Gatekeeper woes until I can find another way to certify projectors. I think I may actually be able to return to plot development tomorrow.

Let's just hope that stretch of actual game dev continues for a while!

Monday, March 4, 2013


Recently, I was contacted by a couple of digital games vendors (ShinyLoot and Amazon Digital Games) about the possibility of selling NEO Scavenger on their services.

First of all, that's great news! It's encouraging to have folks asking to host NEO Scavenger. And those services each represent good-but-different opportunities to serve a greater market. I look forward to exploring those, and other, options further as NEO Scavenger approaches completion.

The point of this post, however, is more about the big picture, and how to properly balance NEO Scavenger's sale across multiple vendors. Specifically, I want to muse on the topic of "cross-play."

What Is Cross-Play?

When I say "cross-play," what I'm referring to is the ability to buy a game from one service, and to have that unlock access on other services. I realize there are a few definitions of "cross-play" out there, but for the purposes of this post, it's about buying the game once, and playing anywhere.

In the case of NEO Scavenger, buying beta access at my site automatically grants access on Desura. Similarly, buying a copy of the game on Desura allows the user to unlock a copy at my site, using the Desura Connect feature.

When NEO Scavenger was sold in the Be Mine 5 bundle, Desura keys were given out, so those customers were also eligible for cross-play. Though in retrospect, that was less than fully effective: many customers didn't realize they had Desura keys, and weren't aware how to update, or even that they could. I should probably see about doing keys-only, rather than binaries plus keys, in the future to reduce confusion.

Why Provide Cross-Play?

One of the biggest reasons to provide cross-play is simply because it makes the customer happy. Customers feel like they're part of something bigger, and dealing with a business that values them more than their dollar. It lets them know that no matter which store they bought your product in, they're your customer, and you appreciate them just the same. And customers who feel appreciated are more likely to return for future business.

It also helps keep the audience fragmentation to a minimum. If communities were to develop at each storefront, then those communities may diverge over time, making it difficult to satisfy all tastes as they become incompatible. Smoothing out inroads between communities helps everyone stay on the same page, and helps keep the audience united.

There are also marketing advantages to providing cross-play. Customers who are pleased with the arrangement are more likely to recommend it to friends. And the relative rarity of cross-play may mean the press finds it a newsworthy business practice.

It can also accelerate sales, in the case of a gradual product roll-out (as is the case with NEO Scavenger). Knowing that one's preferred service will be supported in the future sometimes causes hesitation in customers, as they'd prefer to wait until the game comes out there. However, if purchasing the game now entitles them to a free unlock on their service of choice later, then there's no reason to wait. This was particularly the case with Steam users, as noted by the comments in NEO Scavenger's Greenlight page.

Lastly, there's a neat technical benefit to having cross-play enabled for one's game: redundancy. If something fails on one service, there's a way to get affected customers up and running again on a different cross-play service.

A recent example of this was when a new build was launched on Friday to both my site and Desura. The new version went live immediately on my site, but the Desura version was awaiting authorization over the weekend. Customers who wanted access right away simply used the connect feature to get the latest version at my site instead.

Why Not?

One of the first reasons most will think of against providing cross-play is that there's a potential hit to sales. This is probably true. Anecdotal evidence abounds for customers buying and re-buying the same game across different services as they become available. Some are accidental (e.g. losing a CD or key), others are intentional (e.g. wanting the game in their Steam library for convenience). In some cases, they're even points of pride for the customer ("I love this game so much I bought it five times!").

In the first two cases, the customer is repurchasing the game for their own benefit. In the third case, it's primarily benefiting the developer. With cross-play, the customer can still repurchase the game if they want to, so the third situation is unchanged.

The important thing to remember about the first two cases is that customers who don't want to repurchase the game no longer have to. It might cost the developer the price of a new (or discounted) copy, but it gains the developer some good will. And I'd rather have an existing customer's good will than their additional $10 or less.

There is the possibility of customers giving cross-play keys to friends, so I suppose that's a consideration. I'm not really one to police pirates, though. And I figure that if they're going to pirate the game, cross-play doesn't greatly change their ability over the DRM-free version I sell now.

Technical feasibility may be a bigger reason. So far, cross-play between Desura and my site has been pretty easy to implement. Desura has a workflow already set up for this, and adding a custom one to my site was entirely up to me. I want to do the same with Steam, though I may only be able to provide one-way unlocks (e.g. customers get Steam keys, but not the other way around) depending on whether Steam has any mechanisms in place for cross-play.

Furthermore, each new case creates an increasing amount of integration work, and it would likely grow out of hand pretty quickly. There are a lot of services out there. E.g. ShinyLoot, Amazon, Good Old Games, Impulse, Origin, Google, iTunes, Ubuntu Software Center, etc. They all likely have different capabilities.

As such, it may be the case that my cross-play plans are impossible across 100% of vendors, even if I had the manpower.

Selective Cross-Play

In all likelihood, 100% reciprocal cross-play will not be possible as I offer NEO Scavenger in more places. So I'll need to come up with an acceptable compromise. The alternative would be to ignore sales channels that didn't offer cross-play tools.

I already suspect Steam is a one-way street in this regard, and cutting out Steam from sales channels would be pretty dumb. In fact, I think most online storefronts lack Desura's "connect" feature, so I may be living in a fantasy land right now.

One thing I can still do, however, is enable cross-play from some other services. I.e. if they purchase at my site, they can get free keys to Steam, Desura, and maybe another service or two. More than likely, Steam and Desura are all most players care about anyway.

Plus, offering the one-way cross-play radiating out from my site makes it somewhat more attractive to buy it there. The customer could still get a version on the service they want, plus elsewhere in the network. And more of the purchase money goes into developing new games. That seems like a nice win-win.
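
To make the one-way idea concrete, here is a rough sketch of how such unlock rules could be written down as a simple lookup table. Everything in it is hypothetical: the store names, the table contents, and the function names are illustrations of the arrangement described above, not a real storefront API, and the Steam row assumes Steam turns out to be a one-way street.

```python
from typing import Dict, Set

# Hypothetical table: where the game was bought -> services that get a free key.
UNLOCKS: Dict[str, Set[str]] = {
    "own_site": {"desura", "steam"},  # purchases at the developer's site radiate out
    "desura": {"own_site"},           # e.g. via a connect-style feature
    "steam": set(),                   # assumed one-way: nothing flows back out
}


def services_available(purchased_at: str) -> Set[str]:
    """Return every service a customer can play on after one purchase.

    Only direct unlocks are counted; chaining unlocks is not modeled here.
    Unknown stores are treated as offering no cross-play at all.
    """
    return {purchased_at} | UNLOCKS.get(purchased_at, set())


def is_reciprocal(store_a: str, store_b: str) -> bool:
    """True when a purchase at either store unlocks the other."""
    return (store_b in UNLOCKS.get(store_a, set())
            and store_a in UNLOCKS.get(store_b, set()))
```

With this table, services_available("own_site") covers all three services, while services_available("steam") contains only Steam, which is the asymmetry that makes buying direct the most flexible option.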

I'll have to see how this plays out, though. So far, I've been pretty lucky in that my only other vendor, Desura, makes this cross-play easy to do. So I've been treating it as my preferred business practice. But as I start working with new vendors, I'm seeing that the cross-play option is going to be complex to support.

Hopefully, I can find a way that is both sustainable and appreciated by my customers!