It’s hardly a secret that the sleeper story for big data this year is the cloud. The latest Ovum survey of cloud spending among enterprises globally shows, conservatively, that nearly half are boosting their cloud spend this year. And for big data and Hadoop, we estimate that roughly a third of all implementations are currently heading for the cloud.
For data platform providers, the cloud is becoming the big leveler in the market. You can lift and shift workloads and run the cloud like it’s an extension of your organization’s data center. But for many organizations, the cloud provides the opportunity for them to rethink their platform strategies.
In the cloud, elasticity changes not only how you manage workloads, but what workloads to run, because cost replaces capacity as your criterion. The cloud is also conducive to rethinking how to manage applications, as containers supersede VMs. And when it comes to databases, the scale-out nature of cloud infrastructure provides the opportunity to rethink how ACID transactions are managed, how data gets replicated, and how databases are recovered.
The cloud provides the ideal environment for frenemies. Amazon and Microsoft have extensive ecosystems of third parties who run their applications and databases in the cloud; but both also have their own native platform offerings that compete with many of the third parties that they host. And by the way, the same goes for Google Cloud.
For enterprises, the cloud was originally a tactical choice for running test/dev workloads or enabling line of business organizations to bypass the data center for discretionary or greenfield applications. But increasingly, the cloud is becoming the destination for heartbeat workloads, and that raises the competitive stakes.
And when enterprises look at migrating the applications (and databases) that are core to the business, it prompts the “lift and shift” vs. “lift and transform” debate. Should you choose the latter route, the question then is whether to move your existing environment or look to the native services of the cloud platform provider itself. In other words, if you decide to move to Amazon, should you go with their database stack as the default because of the native tie-ins with AWS’s storage, networking, messaging, and so on?
No wonder there’s an escalating rivalry between Amazon and Oracle. Amazon reimagines PostgreSQL for the cloud as a “good enough” replacement for Oracle. Then Oracle counters with pricing strategies aimed at keeping Oracle databases within the Oracle Public Cloud, which it pitches as a more advanced, reinvented cloud.
And so for partners who host their cloud services on AWS, Azure, or Google Cloud, the challenge is proving the value of a familiar strategy: best of breed. More to the point, what unique advantage does your database, tool, application, or analytics service provide that the native offerings of AWS, Azure, or Google Cloud don’t?
Snowflake is one such data warehouse provider that runs on Amazon and competes with Redshift. It’s a bit of an irony that Snowflake’s data warehouse one-ups Redshift in one key respect: elasticity. It uses Amazon S3, Amazon’s cloud object storage that is optimized for storing large volumes of data. Snowflake, which has received its share of love in this space, gets around the balky performance of S3 (which was not designed for interactive query) with a caching layer.
By comparison, Redshift relies on local, costlier EBS block storage, which is designed for low latencies. Now, Amazon does have an answer for ad hoc query and analytics on S3: Athena. But Athena at this point is meant for isolated queries, not as a production engine.
So, if you have a means for buffering against S3, there’s another major advantage to using it: You are working off a single source of the truth, not a replica. So if your organization is already using S3 for running Spark or machine learning jobs, it can use the same body of data for Snowflake production analytics jobs. No need to worry about whether you’re working off separate versions of the truth.
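The idea behind such a caching layer can be sketched in a few lines of Python. This is a hypothetical read-through cache, not Snowflake’s actual (proprietary) implementation: a fast local store fronts a slow, durable object store, so repeated reads of hot data skip the high-latency fetch while the object store remains the single source of truth.

```python
# Minimal read-through cache sketch (hypothetical illustration, not
# Snowflake's implementation). A local dict fronts a slow object store,
# so repeated reads of hot data avoid the high-latency path.

class SlowObjectStore:
    """Stand-in for an object store like S3: durable but high-latency."""
    def __init__(self, objects):
        self._objects = objects
        self.fetch_count = 0  # track how often the slow path is taken

    def get(self, key):
        self.fetch_count += 1
        return self._objects[key]


class ReadThroughCache:
    """Serve reads locally, falling back to the backing store on a miss."""
    def __init__(self, store):
        self._store = store
        self._cache = {}

    def get(self, key):
        if key not in self._cache:            # miss: fetch once from the store
            self._cache[key] = self._store.get(key)
        return self._cache[key]               # hit: local, low-latency read


store = SlowObjectStore({"orders/2017-01.parquet": b"...bytes..."})
cache = ReadThroughCache(store)
cache.get("orders/2017-01.parquet")  # first read goes to the object store
cache.get("orders/2017-01.parquet")  # second read is served from the cache
```

The design point is that the cache never becomes the system of record: every object still lives in the backing store, so other engines (a Spark job, say) read the same data with no second copy to reconcile.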
Snowflake’s other differentiators include its ability to query JSON document data without resorting to the usual practice of SQL relational data warehouses: flattening these complex documents into single wide columns. In so doing, Snowflake can introspect JSON documents, as long as their nested structures are reasonably strongly typed. And a more recent enhancement, as reported by Big on Data brother Andrew Brust, is a multi-clustering capability that further exploits the scale of the cloud to support higher concurrency.
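To see what the traditional flattening step involves, here is an illustrative Python sketch (not Snowflake syntax): a nested JSON document is collapsed into one level of dotted-path columns before loading, which is roughly the preprocessing Snowflake lets you skip by querying the nested document in place.

```python
# Sketch of the "flattening" traditional SQL warehouses need before
# loading nested JSON. Illustrative Python only; Snowflake instead
# queries the nested document directly via its path notation.

def flatten(doc, prefix=""):
    """Collapse a nested document into one level of dotted-path columns."""
    flat = {}
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))   # recurse into nested objects
        else:
            flat[path] = value                  # each leaf becomes a column
    return flat


order = {"id": 42, "customer": {"name": "Acme", "address": {"country": "DE"}}}

# Traditional route: pre-flatten, then load fixed columns.
columns = flatten(order)
# Snowflake-style route: leave the document nested and reference the
# path at query time instead of reshaping the data up front.
```

Note the brittleness of the flattening route: any new nested field means new columns and a reload, whereas a path query over the intact document simply ignores fields it does not ask for.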
Snowflake recently closed a fourth venture round that doubled its financing to over $200 million. With the release of the multi-cluster feature last year, Snowflake’s elastic cloud data warehouse platform is pretty well fleshed out. With the new round, the focus will be expanding its market reach outside North America. Snowflake has already carved an initial European foothold in AWS’s Frankfurt, Germany cloud region, and from there will add new regions in EMEA, followed later by planting its flag across Asia/Pacific.
Snowflake’s challenge is very much akin to that of providers like Informatica and Tibco, both of whom withstood the consolidation of enterprise software markets that saw many of their rivals absorbed by the likes of IBM, Microsoft, Oracle, and SAP. For cloud data warehouses, that will mean showing a value proposition to counter the “nobody gets fired for buying Amazon” mindset. At least Snowflake is entering the battle with much deeper pockets.