January 20, 2016 · draft · infrastructure

Bare Metal vs. The Cloud

Audience: devops

Only you can decide if the cloud is flipping you the bird.

Most of us start our fledgling apps in the cloud, on EC2 or Linode or DigitalOcean or ASmallOrange or whatever. With Membean, we’ve bounced around on all of those. But we eventually ended up in real data centers, for a lot of reasons.

TL;DR

I’ve had servers spread around several VPS providers for the last eight years. For the last four, we’ve mostly been colocated on hardware I’ve purchased, and that hardware has served us well. Granted, a lot of my time (and thus salary) goes into taking care of machines, but it’s not much more (if any) than we’d be spending if we had stuck entirely with the cloud. I actually think it’s a lot less.

Cases for the VPS Cloud

  • You’ll need multiple servers before long, for scale and redundancy. So, early on, two little VPSs are much cheaper than two physical servers.

  • It’s really easy to create nodes through a web UI.

  • You don’t need to know much about data centers.

  • In some cases it’s theoretically cheaper, if you spin servers down and your off-peak hours are long. I believe the spin-down-to-save model is unique to EC2; a Droplet costs the same whether it’s running or powered off. (See the rough arithmetic after this list.)

  • Backups are simple. Just check a box when you create your Droplet, and you’re probably safe enough.
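
To make the spin-down point concrete, here’s the kind of back-of-the-envelope arithmetic I mean. Every number in it is a made-up placeholder, not anyone’s real pricing:

    # Does spinning servers down off-peak actually save money?
    # All numbers are hypothetical placeholders, not real provider pricing.

    HOURLY_RATE = 0.10       # $/hour for one on-demand instance (assumed)
    PEAK_HOURS_PER_DAY = 14  # hours/day the instance actually needs to run
    DAYS_PER_MONTH = 30

    always_on = HOURLY_RATE * 24 * DAYS_PER_MONTH
    spin_down = HOURLY_RATE * PEAK_HOURS_PER_DAY * DAYS_PER_MONTH

    print(f"Always on: ${always_on:.2f}/month")
    print(f"Spin down: ${spin_down:.2f}/month")
    print(f"Savings:   {100 * (1 - spin_down / always_on):.0f}%")

    # The savings only show up on hourly billing (EC2-style); a powered-off
    # Droplet still bills at its full rate.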

War Stories (Beware the Cloud)

Linode was down for days.

I have seen more than a few emails from DO and Linode saying that a node has been rebooted due to an unknown issue. If these had been production systems, OUCH!

There have been times when CPU usage spiked for no apparent reason. On a shared VPS host, CPU availability can be unpredictable; you’re at the mercy of whoever else is on the same physical box.

VMware Lock-in

I’ve stayed away from virtualization, particularly proprietary offerings. Running servers is expensive enough as it is.

I don’t see big gains from virtualization for SMBs. If the hardware fails, it takes out the whole machine and every virtualized node on it anyway.

It’s yet another thing to understand and monitor. Plus, there are potentially some performance costs.

Cost

I’m not convinced that paying 10x for a RHEL-licensed, Megacorp Certified system gets you much more than a standard off-brand setup.

You can find beefy servers for as low as $1500.
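
For a rough sense of how that stacks up against renting, here’s a sketch. The $1500 figure is the one above; the lifespan, colo fee, and VPS price are assumptions you’d swap for your own numbers:

    # Amortized colo hardware vs. renting a comparable VPS.
    # Only the $1500 server price comes from the text; the rest is assumed.

    server_price = 1500.0   # one beefy off-brand server
    lifespan_months = 36    # assume three years of useful life
    colo_per_month = 75.0   # assumed rack space + power + bandwidth
    vps_per_month = 160.0   # assumed rent for a VPS with comparable specs

    bare_metal = server_price / lifespan_months + colo_per_month
    print(f"Bare metal: ~${bare_metal:.0f}/month amortized")
    print(f"VPS:        ~${vps_per_month:.0f}/month")

    # Sysadmin time isn't in here, which is exactly the caveat from the TL;DR.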

Performance

Latency

In the cloud, your nodes won’t necessarily be anywhere near each other on the network. Maybe there are tactics to control this, but I haven’t found many yet.
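
One thing you can do is measure it. A rough sketch that times TCP connects to your peer nodes (the addresses and port are placeholders for your own machines):

    # Crude round-trip check between nodes: time a TCP connect to each peer.
    # Hostnames and port are placeholders.
    import socket, time

    PEERS = [("10.0.1.5", 22), ("10.0.2.9", 22)]  # hypothetical peer nodes

    for host, port in PEERS:
        start = time.monotonic()
        try:
            with socket.create_connection((host, port), timeout=2):
                rtt_ms = (time.monotonic() - start) * 1000
                print(f"{host}: ~{rtt_ms:.1f} ms connect time")
        except OSError as exc:
            print(f"{host}: unreachable ({exc})")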

Risks

DDoS is always a concern, whether you’re in the cloud or a data center. After moving to a DC, we got hit by a DDoS big enough to take down the whole data center. That was the worst outage we’ve had: eight hours of downtime. We never found out why we were targeted, but we learned some good lessons from it.

I actually think we’re less likely to be impacted by a DDoS in a DC. It seems to be the bigger players, like Linode and Amazon, who get targeted; I hope DO is prepared for what happened to Linode. The DCs we’ve been in have few high-profile customers, so a whole-network outage seems less likely.

Common Objections

“The cloud gives me snapshots so I can just build up one server and then duplicate them from the single good snapshot.”

You’ll still want to get your setup into a configuration management system, like Ansible or Chef or Puppet. A snapshot captures the result but not the steps, so it drifts over time and isn’t easy to reproduce or tweak.
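
What those tools buy you is a description of the desired state that you can apply over and over until the machine converges to it. A toy sketch of that idempotent pattern in Python (the package and sysctl line are arbitrary examples; in practice you’d write this as an Ansible playbook or a Chef recipe):

    # Toy illustration of the "ensure desired state" pattern behind
    # Ansible/Chef/Puppet. Not a replacement for those tools.
    import subprocess
    from pathlib import Path

    def ensure_package(name):
        """Install a package only if it's missing (idempotent)."""
        if subprocess.run(["dpkg", "-s", name], capture_output=True).returncode != 0:
            subprocess.run(["apt-get", "install", "-y", name], check=True)

    def ensure_line(path, line):
        """Append a config line only if it isn't already present (idempotent)."""
        p = Path(path)
        text = p.read_text() if p.exists() else ""
        if line not in text.splitlines():
            p.write_text(text + line + "\n")

    ensure_package("nginx")                              # arbitrary example
    ensure_line("/etc/sysctl.conf", "vm.swappiness=10")  # arbitrary example

Run it twice and the second run does nothing, which is the whole point.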

“The cloud takes care of monitoring for me.”

You’ll still need to come up with your own monitoring, with tools like monit, monitorix, nagios, etc. The provider watches its own infrastructure, not your app.
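
Rolling a basic check yourself isn’t hard, either. A minimal Nagios-style plugin that exits 0/1/2 for OK/WARNING/CRITICAL (the path and thresholds are arbitrary):

    # Minimal Nagios-style disk check: exit 0=OK, 1=WARNING, 2=CRITICAL.
    # Path and thresholds are arbitrary examples.
    import shutil, sys

    PATH, WARN, CRIT = "/", 80, 90  # warn at 80% used, critical at 90%

    usage = shutil.disk_usage(PATH)
    pct = 100 * usage.used / usage.total

    if pct >= CRIT:
        print(f"CRITICAL - {PATH} is {pct:.0f}% full")
        sys.exit(2)
    if pct >= WARN:
        print(f"WARNING - {PATH} is {pct:.0f}% full")
        sys.exit(1)
    print(f"OK - {PATH} is {pct:.0f}% full")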

“I’m not comfortable running servers myself.”

That’s a valid concern, but you need to learn a lot about servers anyway, even if you’re in the cloud. The cloud does kind of save you from dealing with HW failures, but they still certainly happen in the cloud.

And failures are often not something you have to handle all on your own. The DC folks are very knowledgeable about networking issues, and swapping out failed disks is something they do a lot of.

“But I don’t live anywhere near a good data center.”

We started out in a nearby DC, figuring we’d want to be able to get inside when horror struck. That lasted a year before we looked further afield and found a better DC thousands of miles away. It’s possible to do everything remotely. Remote hands are pretty cheap, usually around $25/hr.

I’ve had them replace disks, rebuild machines, install OSes, reconfigure RAID arrays, untangle my networking blunders, and much more.

They’ve even tuned network routes so that our cross-DC transfers are much improved.

Downsides