Moving Away from AWS

When Amazon began their project to act as the world’s online all-in-one shop, they knew they’d need to build one hell of a data center operation to cope with the demand. And they did it. Quite brilliantly, in fact.

Then they realized that by building worldwide data centers sufficient to cope with the worst of the peak demand (think: Christmas), they’d inevitably be overbuilding, leaving 95% of the capacity free most of the time. Why not do something with all that excess data center capacity…like, rent it out to other folks?

That, so the legend goes, was the genesis for what became known as Amazon Web Services (AWS), which has now grown to encompass countless services and computers spread across numerous connected data centers around the world. Their services now power everything from e-commerce to Dropbox to the Department of Defense. Indeed, if AWS ever does suffer one of their very-rare outages (the last I recall was a brief outage affecting their Virginia data center a year or so ago), it brings down significant parts of the internet.

We became a customer of AWS almost a decade ago, to help us serve up the installer images and picture disks in ComicBase over their “S3” (“Simple Storage Service”) platform. Then, when I made the decision to move my family to Nashville and we had to split the IT operations in our California office, we decided to move our rack of web, database, and email servers up to Amazon’s cloud. AWS promised to let us spin up virtual servers and databases–essentially renting time on their hardware–and assign as many or as few resources as it took to get the job done.

It took us about a month to get the move done, and it was terrifying when we turned off the power to our local server rack (it felt like we were shutting down the business). But to our great relief, we were able to walk over to a computer in our office outside our now-silent server room, fire up a web browser, go to www.comicbase.com, and see everything working just the way it should, hosted by Amazon’s extraordinary EC2 (“Elastic Compute Cloud”) and RDS (“Relational Database Service”). After a few weeks of making sure all was well, my family and I packed ourselves into a car, drove to Nashville, and the business carried on the entire time. We were living in the future.

So why, 3 years later, did I just spend the better part of a month moving all our infrastructure back down to our own servers again? Basically, it came down to cost, speed, and the ability to grow.

Bandwidth and Storage Costs
S3 — storing files up on Amazon’s virtual drives — is pretty cheap; what isn’t cheap is the bandwidth required to serve them up. If you download a full set of Archive Edition installers, for instance, it costs us a couple of bucks in bandwidth alone. Multiply by thousands, and things start adding up. The real killer, however, was the massive amount of web traffic caused by the combination of cover downloading and serving up image requests to image-heavy websites like ComicBase.com and AtomicAvenue.com. In a typical month, our data transfer is measured in the Terabytes–and the bandwidth portion of our Amazon bill definitely had moved into “ouch!” territory.
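To make the arithmetic concrete, here is a rough sketch of how a per-download bandwidth charge adds up. Every figure in it (the per-GB price, the size of an installer set, the number of downloads) is a made-up assumption for illustration, not a number from our bill or from Amazon's price list.

```python
def monthly_egress_cost(gb_transferred: float, price_per_gb: float) -> float:
    """Return the data-transfer charge in dollars for the given volume."""
    return gb_transferred * price_per_gb


# Hypothetical figures, chosen only to illustrate the calculation.
ASSUMED_PRICE_PER_GB = 0.09      # dollars per GB transferred out
ASSUMED_INSTALLER_SET_GB = 25    # size of one full set of installers
ASSUMED_DOWNLOADS = 2000         # full-set downloads in a month

if __name__ == "__main__":
    per_download = monthly_egress_cost(ASSUMED_INSTALLER_SET_GB, ASSUMED_PRICE_PER_GB)
    total = per_download * ASSUMED_DOWNLOADS
    print(f"One full download: ${per_download:.2f}")
    print(f"{ASSUMED_DOWNLOADS} downloads: ${total:,.2f}")
```

Swap in your own transfer volume and your provider's current rate to see where your own "ouch!" threshold sits.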

We were also paying the price for the promise we’d made to give each of our customers 2GB of allocated cloud storage to store database backups. When we were buying the hard drives ourselves, this wasn’t a super expensive proposition. But when we were now renting the space on a monthly basis from Amazon, we wound up effectively paying the price of the physical hardware many times over during the course of a year.
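The rent-versus-buy comparison is a one-line calculation. This sketch uses hypothetical prices (both the drive cost and the monthly rental figure are assumptions) to show how many times over a year of rent pays for the hardware.

```python
def times_paid_per_year(purchase_price: float, monthly_rent: float) -> float:
    """Return how many times a year of rent covers the one-time purchase price."""
    if purchase_price <= 0:
        raise ValueError("purchase_price must be positive")
    return (monthly_rent * 12) / purchase_price


if __name__ == "__main__":
    # Hypothetical: a $100 drive versus $25 a month to rent the same capacity.
    print(times_paid_per_year(purchase_price=100, monthly_rent=25))
```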

The Need for Speed
Our situation got tougher when we decided to add the ability to have ComicBase Pro and Archive Edition automatically generate reports for mobile use each time users saved a backup to the cloud. This let us give customers the ability to always have their data ready when they viewed their collection on their mobile devices, without needing to remember to save their reports ahead of time. It’s a cool feature–one which I use all the time to view my own collection–but it required a whole new set of constantly-running infrastructure to pull off.

Specifically, we had to create a back-end reporting process (“Jimmy”, after Jimmy Olsen, the intrepid reporter of Superman fame). Jimmy’s job is to watch for new databases that have been backed up, look through them, and generate any requested reports–many for users with tens of thousands of comics in their collections. Just getting all the picture references together to embed into one of these massive reports could take 20 minutes on the virtualized Amazon systems.
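For anyone curious what a watcher process like this looks like in outline, here is a minimal sketch. It is not Jimmy's actual code: the folder layout, the ".bak" extension, and the function names are all assumptions, and the report step is a placeholder for the real (slow) work.

```python
import tempfile
from pathlib import Path


def find_unprocessed(backup_dir: Path, done: set[str]) -> list[Path]:
    """Return backup files that have no report yet, oldest first."""
    pending = [p for p in backup_dir.glob("*.bak") if p.name not in done]
    return sorted(pending, key=lambda p: p.stat().st_mtime)


def generate_report(backup: Path) -> str:
    """Placeholder for the real report generation step (hypothetical)."""
    return f"report for {backup.stem}"


def run_once(backup_dir: Path, done: set[str]) -> list[str]:
    """Process every pending backup once and remember which ones are finished."""
    reports = []
    for backup in find_unprocessed(backup_dir, done):
        reports.append(generate_report(backup))
        done.add(backup.name)
    return reports


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        (folder / "alice.bak").write_text("fake backup")
        finished: set[str] = set()
        print(run_once(folder, finished))  # first pass picks up the new backup
        print(run_once(folder, finished))  # second pass finds nothing new
```

A real service would call run_once on a timer (or react to upload events) and would persist the "done" set somewhere sturdier than memory.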

Even with the “c4 large” compute-oriented server instances we wound up upgrading our Amazon account to, this was a terribly long time, and often left us with dozens of reports backed up awaiting processing. We could of course upgrade to more powerful computing instances, faster IO throughput allocations, etc., but only at an alarming increase in our already considerable monthly spend.

With terabytes of stored data, an escalating bandwidth bill, and all our plans for the future requiring far more resources than we were already using, it was time to start looking for alternatives.

Do it Yourself
When we launched ComicBase 2020 just before this past Halloween, we tried a very brief experiment in at least moving the new download images off Amazon and hosting them on a Dropbox share to save on the bandwidth bill.

The first attempt at this ended less than a day after it was begun, when I awakened to numerous complaints that our download site was offline, and a note from Dropbox letting us know that we’d (very quickly) exceeded a 200 GB/day bandwidth limit we hadn’t ever realized was part of the Dropbox service rules. (I could definitely see their point: they were also paying for S3 storage and AWS bandwidth to power their service–albeit at much lesser rates than us, thanks to bulk discounts they get on the astonishing amount of data they move on a daily basis). Unfortunately, there was no way to buy more bandwidth from Dropbox, so after one more day of, “maybe it’s just a fluke since we just launched” thinking–followed a day later by getting cut off by Dropbox again–we abandoned that experiment.

After a couple of days of moving the download images back up to S3 (and gulping as we contemplated the bandwidth bill implications), we wound up installing a new dedicated internet connection without any data caps, and quickly moved a web server to it whose sole purpose was to distribute disk image downloads.

Very quickly, however, we started the work to build custom data servers, based off the fastest hardware on the market, and stuffed full of ultra-fast NVMe SSDs (in RAID configuration, no less), as well as redundant deep storage, on-premise storage arrays, and off-premise emergency backup storage. All the money for this hardware wound up going on my Amazon Visa card, and ironically, I wound up with a ton of Amazon Rewards points to spend at Christmas time, courtesy of the huge hardware spend.

After that began the work of moving first the database, then the email, web, and FTP servers down to the new hardware. I’ll spare you the horrific details here, but if anyone’s undergoing a similar move and wants tips and/or war stories, feel free to reach out. The whole thing from start to end took about 3 solid weeks, including a set of all-nighters and late-nighters over this past long weekend to do the final switch-over.

As of this morning at 2AM, we’d moved the last of the servers off of Amazon’s cloud, and are doing all our business once again, on our own hardware. Just before sitting down to write this, I scared myself silly once more as I shut down the remote computer which had been hosting ComicBase.com and AtomicAvenue.com on Amazon’s cloud. And once again, I started to breathe normally again when I was able to successfully fire up a web browser in the office and see that the sites–and the business–were still running: once again on our own hardware.

So far, things seem like they’re going pretty well. The new hardware is tearing through the reporting tasks in a fraction of the time it used to take; sites are loading dramatically faster; and the only real technical issues we encountered were a few minor permission and site configuration glitches that so far have been quickly resolved.

Unless it all goes horribly pear-shaped in the next few days, I’ll be deleting our Amazon server instances entirely. While I’m definitely appreciating the new speed and flexibility the new servers are giving us (and I’m looking forward to not writing what had become our business’ biggest single check of each month), I still have to hand it to the folks at AWS: you guys do a heck of a job, and you provided a world class service when we needed you most. I also love that a little Mom-n-Pop shop like ourselves could access a data center operation that would be the envy of the largest corporate environments I’ve ever worked in. With the incredible array of services you now provide, it wouldn’t surprise me in the least if we wound up doing business again in the future.

Attack of the Script Kiddies

For the past few weeks, we’ve been engaged in a big move of our servers back down from the Amazon cloud to on-premises servers. While Amazon runs an amazing service, the bandwidth bill for ComicBase is a killer, and we can afford to throw way more processing power and disk storage at it if we simply buy the hardware than if we rent it from Amazon. By using on-premise hardware, we get to go way faster, way cheaper, and keep more control of our data.

Although I’m quite looking forward to not writing my largest single check of each month to Amazon, running your own gear means running your own data center–with all that entails. Namely, you’re completely responsible for everything from backups to firewalls to even power. (I used to keep a generator and set of power cords at the ready back in California for when our infamous “rolling blackouts” would hit, in order to minimize server downtime).

On the backup front, we’re actually improving our position, using multiple layers of RAID, traditional disk backups, and off-site cloud storage. Basically, even if the place burns to the ground, we should be able to pick up the pieces and carry on pretty quickly.

What really gets old, however, is dealing with the network security foo. Unless you’ve run a site yourself, it’s hard to believe how fast and frequent the attacks come on every part of your system, courtesy of our friend the internet.

Mind you, these are not, for the most part, targeted attacks by the sort of ace hackers you see on TV and movies. Instead, it’s a constant barrage of “script kiddies” — drones and bored teens using automated “hacking” tools to assault virtually every surface of a publicly facing server using the computer-equivalent of auto-dialers and brute-force guessing.

Whether it’s the front-facing firewall, web sites, email servers, or what have you, looking at the logs shows that mere hours after the servers went live, they were being perpetually pounded with password-guessing attacks, attempts to relay spam, port scans, etc. None of these stood a chance in hell of succeeding (sorry, kiddies, the password to our admin account is not “password”) but it was amazing to see how quickly “virgin” servers, on new IP addresses, started getting pounded on. In one case, we started seeing automated probes of a server before it had even gone live to our own production team!

All this is to say that it’s a jungle out there, folks. For heaven’s sake, use decent passwords (a good start: don’t let your password be any word that’s in a dictionary); change the default account passwords and user names for all your various networking hardware; don’t re-use passwords from system to system; and look for a good password manager to keep them all straight for yourself (I’m personally partial to 1Password, although I got hip to that program before they switched to a monthly billing model).
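If you want to see what “decent” looks like in practice, here is a small sketch using Python's standard secrets module to generate a random password, plus a check that rejects plain dictionary words. The 12-character minimum and the tiny word list are my own assumptions for the example; a password manager will do all this for you.

```python
import secrets
import string


def make_password(length: int = 20) -> str:
    """Return a random password drawn from letters, digits, and punctuation."""
    if length < 12:
        raise ValueError("use at least 12 characters")
    alphabet = string.ascii_letters + string.digits + string.punctuation
    return "".join(secrets.choice(alphabet) for _ in range(length))


def looks_like_dictionary_word(password: str, words: set[str]) -> bool:
    """Return True if the password is just a word from the given word list."""
    return password.lower() in words


if __name__ == "__main__":
    sample_words = {"password", "admin", "letmein"}  # stand-in for a real word list
    print(looks_like_dictionary_word("Password", sample_words))
    print(make_password())
```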

And yeah, watch those server logs. Most of the script-kiddie attacks are about as effective as the robocalls which start with a synthesized voice claiming, “HELLO, THIS IS IRS CALLING. YOU ARE LATE IN MAKING PAYMENT.” But we’ve also seen some more sophisticated attacks employing publicly known email addresses, names of company officers and more. Bottom line: watch yourself when you’re on the internet, and realize the scumbags are always looking for targets. Don’t make it easy on them.
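Watching the logs doesn't have to mean reading them line by line. Here is a rough sketch that counts failed logins per address and flags the noisy ones. The log line format is invented for the example (real firewalls, mail servers, and web servers each have their own), and the addresses come from the ranges reserved for documentation.

```python
import re
from collections import Counter

# Assumed log format: "... FAILED LOGIN user=<name> ip=<address>"
FAILED_LOGIN = re.compile(r"FAILED LOGIN user=(\S+) ip=(\S+)")


def count_failed_logins(lines):
    """Return a Counter mapping each source address to its failed-login count."""
    counts = Counter()
    for line in lines:
        match = FAILED_LOGIN.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts


def noisy_addresses(counts, threshold=3):
    """Return the addresses with at least `threshold` failures, sorted."""
    return sorted(ip for ip, n in counts.items() if n >= threshold)


if __name__ == "__main__":
    sample = [
        "2020-01-20 02:13:05 FAILED LOGIN user=admin ip=203.0.113.7",
        "2020-01-20 02:13:06 FAILED LOGIN user=root ip=203.0.113.7",
        "2020-01-20 02:13:07 FAILED LOGIN user=admin ip=203.0.113.7",
        "2020-01-20 02:14:00 LOGIN OK user=pete ip=198.51.100.4",
    ]
    print(noisy_addresses(count_failed_logins(sample)))
```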

TV Man: My Favorite Weird News Story (Art Project?) of the Year

The Funniest Thing I’ve Ever Seen From Congress

Possibly the first time I feel like I’ve truly gotten my tax dollar’s worth in pure entertainment:

Not only is this just an amazingly funny takedown of a breathtakingly stupid piece of proposed legislation–but it also introduced me to what is apparently a whole line of quite–umm–striking work from artist Jason Heuser, whose modern-day masterworks include:

George Washington wielding a mini-gun!
Bill Clinton, Lady-Killer!
Teddy Roosevelt Taking Down Bigfoot!
And George W. Bush with Twin Revolvers, Riding a Shark!

And here, of course, is the patriotic image which started it all:

Ronald Reagan, Riding a Velociraptor, firing a machine-gun, with a rocket launcher on his back.

Check out Jason’s Etsy store here:

Breaking HTTPPostAsync When Debugging in IIS Express, or “Wasting 5 Hours in Programming’s Version of a Really Crummy Escape Room”

It’s 3 AM, the day after Daylight Savings Time threw everyone’s internal sleep clocks into absolute chaos. (I say “chaos” based on both my own personal feelings, as well as the flood of fire service calls we’ve had today, including an overdose, a suicide attempt, and numerous other ways that our local residents have signaled their general lack of fervor at the idea of getting up tomorrow).

Worse yet, had it not been for the time change, I could have started this blog post with “It’s 2 AM, and the fear is gone” — and my opening would have been much cooler. Now I’m blaming Daylight Savings Time for writer’s block too. Way to go, DST.

But nevertheless, here I am, writing a pretty darned geeky blog with the hopes that some poor schmoe might stumble upon it in a session of mad Googling and save themselves some of the five hours I’ve just blown on one of the more painful programming pitfalls I’ve managed to stumble into in recent memory.

As part of a general modernization of ComicBase’s web APIs, we’re testing out a new set of calls to our servers which locate all the items you’ve sold on Atomic Avenue and let you deduct them from your inventory–as well as (minor spoilers here) finding all the comics you’ve scanned with the app while you’re out in the real world and which you now want to add to your desktop database.

Since it’s incredibly helpful to be able to watch the action on both the client and the server side of things when you’re doing work like this (and since it’s considered presumptuous for the programmer to set breakpoints on the production server which would stop the site cold), I’ve been working with a local copy of the ComicBase.com and AtomicAvenue.com sites, running under a development version of the web server software called “IIS Express”. Things had been going well, and I was watching the program carefully validate the user’s credentials, look up their databases, get the right data and post it back to the user–all the while checking for all the jillions of things that could go wrong in terms of bad passwords, invalid user accounts, lost network connections, and just about any other simulated problem you can imagine–trying to make sure we handled them all as gracefully as possible.

It’d been a long weekend on this project, but as I sat down around 10 to finish things up, I was feeling pretty good about my chances to knock off early, grab a beer, and maybe even check out that crazy Polish cyberpunk video game I’d started a while back (Observer). All I really had to do was step through the different cases in the debugger, make sure they were being handled right, then remove the breakpoints and watch the whole thing run at speed to get a sense for how the system would feel in real use.

Everything was going well, but as I started tidying up and removing my breakpoints, suddenly I started getting bad data back from the web requests which were rock solid mere moments earlier.

So I put the breakpoints back and started single-stepping through them, puzzled as all the server calls came back exactly as expected–only to give 404 errors moments later when I let them run at speed.

That’s when the night started to blur into one long slog which resembled nothing so much as an escape room whose puzzles had been planned by a madman. I’d check the code, it would behave. I’d set a breakpoint for a couple of lines after the call completed, and it’d work. But if there was ever a case where two web calls in a row fired off, the second one would always fail.

“OK”, I thought… It’s probably some sort of thread issue, which seemed all the more plausible given that any call I waited even a couple of seconds on before proceeding in the debugger would run normally. Unfortunately, problems like this–whether they’re thread deadlocks or inadvertent calls to non-thread-safe libraries–are a royal pain in the tucchus to track down.

The hours went by as I double-checked that all my async calls were properly awaited, that I hadn’t accidentally blocked them by calling “.Result” at the end of any methods, and so on and so on with all manner of obscure programming lore. This was followed by endless googling on StackOverflow to see if anyone else had a similar problem or could suggest answers.

I tried removing the asynchronous calls; I tried marking all the relevant async calls with ConfigureAwait(false) to help them keep their context straight; I even tried rewriting all the HttpClient calls in the old-style WebClient mode which allowed me to get rid of the mere idea of anything being asynchronous at all. Sure it’d mess up system performance and make the app seem slower to users, but as the clock edged past 2 AM and all the Fiddler packet traces in the world showed nothing useful, I was willing to try darn near anything to make some progress.

But even rewriting the whole set of web calls to be fully synchronous using the ancient WebClient routines was getting me nowhere. They ran great in the debugger, but immediately returned 404 errors when run without breakpoints set. What the living heck was going on?

So then–as much to make my Fiddler traces make more sense if I had to post the whole thing up on StackOverflow in the hopes that someone smarter than me could figure it out as for any other reason–I decided to move the new routines up to our production server and get a trace of them running there.

And they worked.

Perfectly.

With no debug points set.

Over the next several minutes, many curses were muttered as I leaned on the Ctrl-Z (Undo) key and watched the last several hours of my typing undone, block by block, until I was basically back where I was when I sat down to work tonight. The only real difference was that the code I was using to call the routines was pointing to the real server, running the real version of IIS instead of the IIS Express running on my development system.

And the whole darn thing was working right.

Sooo… what did we learn here? Well, there’s apparently a strange glitch in the behavior of the various web pieces of the Microsoft web client framework which keeps repeated calls to those routines from resolving properly when used on a Microsoft Visual Studio 2017 session on IIS Express. Basically, if you’re going to use the local server to debug, something may not resolve quite as fast as it should when it comes to the web calls, and if your calls start stacking up, you might want to try either slowing down your debugging, or moving some of the critical pieces to their final homes and testing there before you give up.
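One cheap way to test the “it just needs a moment” theory before tearing your code apart is to wrap the call in a retry with a short pause. The sketch below is only an illustration of that idea, written in Python rather than the .NET code in question; it is not the fix I arrived at, and the function name and defaults are made up. If a call that 404s on the first try succeeds after a half-second wait, the timing of the local server is a better suspect than your code.

```python
import time


def call_with_retry(request, attempts=3, pause_seconds=0.5, sleep=time.sleep):
    """Call request() until it returns a status other than 404.

    request is any zero-argument callable returning an HTTP status code.
    Pauses between tries and returns the last status code seen.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    status = None
    for attempt in range(attempts):
        status = request()
        if status != 404:
            return status
        if attempt < attempts - 1:
            sleep(pause_seconds)  # give the server a moment before retrying
    return status


if __name__ == "__main__":
    # Simulated server: fails once, then answers normally.
    responses = iter([404, 200])
    print(call_with_retry(lambda: next(responses), pause_seconds=0.1))
```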

I also learned a lot of ways not to solve this problem, which has its own sort of value to programmers. And I wound up learning about 4 entirely different techniques for making web post calls–all of which blew up in exactly the same way when run at speed on the development system. In a way, that’s what made me suspect that the problem may not have been purely code-related at all.

I also learned that I truly detest Daylight Savings Time. And now at 3:55 am, I am absolutely going to bed.

Aww! My first bomb threat–Spammers are soooo cute!

From my email yesterday:

From: Riley Mitchell <Marcel@virtualfirefox.com>
Sent: Thursday, December 13, 2018 11:59 AM
To: sales@comicbase.com
Subject: Rescue service will complicate the situation

Hello.  My recruited person carried a bomb (Hexogen) into the building where your business is located. My man assembled the bomb according to my instructions. It  is small and it is hidden very carefully, it can not damage the building structure, but in case of its explosion there will be many wounded people.

My man is watching the situation around the building. Ifany strange behavior, panic or policeman is noticed the bomb will be exploded.

I can call off my recruited person if you make a transfer. 20’000 usd is the price for your life. Pay it to me in BTC and I guarantee that I will withdraw my recruited person and the device will not detonate. But do not try to cheat- my guarantee will become valid only after 3 confirmations in blockchain.

My payment details (btc address)-1CF9VQhwjJutPxwVq5QLFA7j7baq4RDb3w

You must pay me by the end of the workday. If the workday is over and people start leaving the building the device will detonate.

This is just a business, if you don’t transfer me the bitcoin and the bomb explodes, next time other commercial enterprises will send me more bitcoins, because it is not an isolated incident.

To stay anonimous I will no longer enter this email account. I check my  wallet every 25 min and if I see the money I will order my man to get away.

If an explosion occurred and the authorities read this letter: We arent the terrorist society and dont assume any  liability for acts of terrorism in other places.

Standing strong with heroic resolve in the face of this terrifying threat, I refused to negotiate with this international criminal mastermind. It was a tough decision, but these are the times where it’s critical to show these thugs what Americans are made of. 

I’m sure you’ll all be relieved to know that the building (my house) still stands, although frighteningly, we did lose Bob the Minion, one of our beloved inflatable Christmas decorations, whose lifeless body was found in the center of our lawn this morning.

I’m sure many in the sleeping outer world must think Bob’s death was a mere blower malfunction of a four-year-old $30 Christmas decoration. But we can now reveal that his demise was almost certainly an orchestrated hit job to let us know that These Men Were Serious.

But even as we mourn the loss of Bob–a faithful employee whose warmth and goofy appearance always brought a smile to all who knew him–we owe it to his memory to remain steadfast, and to never give in to shadowy forces such as these which would terrorize innocents in search of financial gain.

The Surveillance State of the Web (And A Better Alternative to Chrome?)

I don’t know about the rest of you, but I’m getting increasingly creeped out by the constant surveillance that we’re all under, particularly in the Google/FaceBook environment. To summarize but a few of the highlights:

  • Your physical movements are being constantly noted and aggregated via your phone’s GPS and location services–even when you seemingly opt out of those services.
  • Even if you’re not logged in to Google or FaceBook (or Amazon, or Microsoft), the tech giants are actively tracking your movements around the web by the use of omnipresent ad tracking cookies which their ad networks serve up on almost any major site you visit.
  • Every time you sign in to a site using social sign-on (that “Log in with FaceBook” or “Log in with Google” code that so many sites–and yes, even ComicBase and Atomic Avenue–use to allow you to avoid adding yet another password to your no-doubt huge list) your login is noted on FaceBook or Google’s servers–and the IP and tracking cookie information allows them to know that you’re the same person who visited any site where one of their ads appears–and now that formerly anonymous site usage is tied to a verifiable identity.
  • Everything you say within range of your Smart TV or Alexa speaker can be recorded and saved on their servers when they’re activated.
  • Every search you type in a search engine or browser is recorded, logged, and aggregated–along with your IP and device information.
  • Every time you call out “Hey Google” or trigger Siri, Cortana, or Bixby, your voice and search are recorded and stored.

And as if this wasn’t enough to complete the panopticon of your life, Google Chrome, the dominant web browser in the world today, recently released a change which automatically logs you in to help “sync” your information between devices.  Of course, keeping all your bookmarks current is the visible benefit to you–but firmly establishing your identity and correlating everything you do while using a web browser or mobile device is the true benefit to Google.

Unless you’re a noted crime figure, it’s likely you’ve never been under anything like the level of surveillance that you’re under today, courtesy of our web browsers and our smartphones. Note also that it’s enormously difficult to escape any of it, even using tools like VPNs, since there are so many redundant mechanisms and tripwires scattered around the web–all with the common mission of aggregating who you are, where you are, and recording as much of what you do and where you go as possible.

And it’s worth noting that all this is just the stuff that we personally signed up for by installing our various tech gadgets and apps, and by clicking through the “Agree” buttons on all those end-user licenses we never read. At the same time, actual law enforcement (and even private companies) are participating on a more global scale to monitor our every movement using everything from fixed and roaming license-plate scanners and facial recognition to easy-pass toll devices, car GPS transponders, and wholesale processing of entire networks of internet and cellular data–no search warrants required, so long as a particular individual isn’t being targeted. All the information just sits there until it’s needed, which ever-cheaper storage ensures can be a long time indeed.

A Few Countermeasures

To start at the end: I don’t believe there’s a practical way to keep any real sort of privacy in today’s world, but there’s much you can do to at least staunch the river of information you’re constantly sending to the tech companies. Call it pure cussedness on my own part, but even if the battle to escape a surveillance state is a losing one, I see no reason to staff up my own Stasi command post with the personal mission of spying on myself.

It’s been months since I uninstalled the constantly-snooping FaceBook app from my mobile devices, and I try to make sure I “log out” whenever I (more and more rarely) visit the FaceBook web site. But this is merely Orwell’s equivalent of blocking the hidden camera in the television when Big Brother has dozens of other listening devices hidden around your house, as well as thousands dangling from street lamps.

Also, do yourself a favor and take a trip over to the Google Privacy Settings and FaceBook Privacy Settings–particularly the deeper “Profile” and “History” sections. After you get over the shock of seeing that both sites can meticulously trace that road trip you took in 2015 down to pictures of the lunch you had at that out-of-the-way cafe, do yourself a favor and delete it all, and turn off as much of the tracking as you can. Then come back every month or so and do it all over again, as you’ll discover that any number of things you did–as simple as putting in a direction request in Google Maps, or buying concert tickets to a show–will continue to add new information to your personal dossier. I’d personally never assume that anything deleted is gone forever–backups often exist, after all–but it’s a start.

Log out of Google, Bing, and other search engines whenever possible, and turn off their “sync” options. Yes, it’s less convenient to check your favorite news sites this way, but remember that–as currently implemented–anything you sync, like your web history, is also synced to Google’s servers.

(And speaking of Google’s servers: all those saved passwords are being backed up too–and the passwords to your local wifi networks are apparently saved as clear-text on Google’s servers. Even when data is encrypted, however, it’s a safe bet to assume that the people doing the encrypting have a copy of the keys.)

 

Giving the Brave Browser a try

Since the recent Chrome sign-in fiasco (which Google is currently backing away from slightly), I’ve decided to see what else I can do to stem the flood of personal information to the Silicon Valley tech giant. On a recommendation, I recently gave the Brave browser a shot, and I like what I’m seeing so far.

Built by a crew led by ex-Mozilla chief and JavaScript inventor Brendan Eich, it’s a browser that embraces the clean design of early Chrome, while combining it with very smooth and user-controllable privacy settings which seem to do an excellent job of blocking intrusive ads, tracking cookies, and the like. Best of all, by getting rid of all this surveillance foo, it seems to load and display pages noticeably faster than any of the more established alternatives like Chrome, Safari, and Edge.

Eich and the crew over at Brave also seem to be rethinking the whole online ad ecosystem. Since the ability to block ads also threatens to undercut the financial basis that supports the sites you use, they’re trying to rebalance the financial incentives by letting you directly support sites you visit using cryptocurrency-based “Basic Attention Tokens” or BATs, which act to funnel your voluntary donations to the sites you view the most. I’m not sure how I feel about the whole scheme at this point (and I’m more than a little skeptical of cryptocurrencies in general), but I do appreciate that the Brave crew is thinking about the overall problem, and I applaud their view that we ought to be moving beyond the place where we, the web’s users, must effectively become the product to be sold in order to provide all the great “free” news and information the web provides.

For now, however, I’m giving Brave a spin, and so far I’ve been impressed enough to make it my default browser on both my desktop and mobile devices. Here’s a decent video review of the whole thing by ThioJoe.