Monday, May 19, 2014

Is the Big Data Backlash Real?

(First published on CMSWire)

Earth be still.  Big data has lost its luster.

Could it be that analyzing terabytes, exabytes and zettabytes of information won’t make us smarter … or, even worse, could it make us wrong?

We’re beginning to see headlines like “Google and the flu: how big data will help us make gigantic mistakes” in the Guardian, “Eight (No, Nine!) Problems With Big Data” in the New York Times and “Big Data: Are we making a big mistake?” on the site of author and Financial Times columnist Tim Harford.

If you believe what you read, then big data isn’t the ticket that we once thought it was.

Or, maybe it still is, say a whole host of others. They’ll likely point out that “big data” is simply resting in Gartner’s “trough of disillusionment” at the moment, because, as with most new technologies, failures outnumber successes early on.

So, if you buy Gartner’s theory, we’ll slowly but surely learn to do big data better, climb out of the trough and onto the Slope of Enlightenment, where big data will be increasingly embraced by the mainstream.

Big data will present us with tremendous new insights, just not quite yet.

Is the Trough an Illusion?

If the big data naysayers and Gartner’s theory are right, then you’d think that vendors and industry leaders would be experiencing less demand. But that doesn’t seem to be happening.

Could the media be looking at a few isolated cases versus the market as a whole?

We asked managers at Alpine Data Labs, Alteryx, Birst, Cloudera, Datameer, ElasticSearch, GoGrid, Metric Insights and Zettaset whether sales are down and whether, based on their interactions with customers, they thought big data was a bust, oversold or stuck in that trough of disillusionment. Here’s what they told us.

Stefan Groschupf, CEO of Datameer: "I believe we've already made it through the Trough of Disillusionment not only because we're seeing huge demand and adoption, but because we're seeing it from traditionally late-adopting verticals like life sciences, healthcare and manufacturing." 

Jim Vogt, CEO of Zettaset: "Contrary to media reports that big data is dead, we have seen a definite upswing in demand for our Hadoop enterprise security solution since the beginning of the year. While 2013 was a time for many organizations to run Hadoop pilot projects and kick the tires, 2014 is shaping up as a year when companies move from pilot into production. This also drives demand, as organizations scale up their deployments and begin to integrate analytics/BI applications and Hadoop together into a working ecosystem. It also doesn’t hurt that we provide enterprise customers with a sophisticated solution to protect their data, as security is becoming a top-of-mind concern for users of Hadoop."

George K. Mathew, President and COO of Alteryx: "We're seeing a steady growth of firms that want to take advantage of all their data. Hundreds of our customers are getting the value out of traditional and big data sources being blended together to perform advanced analytics — ultimately to drive the next best decision. The Alteryx product roadmap [is driven] by the user experience of data analysts connecting against these sources, achieving scale in data processing, and unlocking the legacy trapped in SAS or SPSS. We believe that the variety of data sources that are blended and analyzed is becoming more important than the size of data."

Alan Saldich, vice president of Marketing at Cloudera: "The hype-reality gap around big data has narrowed considerably. Most of our enterprise customers spent the last few years investigating Hadoop and related technologies to find out what was possible. Those early investigators were really excited, as were the vendors, but the solutions just weren't ready in many cases to solve broad data management problems, so there was some disillusionment.

"At the same time, the technology has changed, and continues to change very quickly. The early adopters who were excited by the possibilities but ran into practical realities that made Hadoop difficult to deploy are now seeing innovation that overcomes those issues.

"Over the past 12 months, big data products built on Hadoop have changed radically. Across our customer base, the ones who pressed forward are actually deploying pretty amazing systems that have many of the capabilities that were missing a year or two ago. The press and analysts in many cases are behind the curve as compared with the large enterprises who are successfully deploying Hadoop at very large scale, and displacing investment in legacy systems because of the lingering skepticism that built up in 2012-2013."

Bruno Aziza, CMO at Alpine Data Labs: "The disillusionment you are referring to is related to the fact that companies have spent millions of dollars on big data infrastructure but are still struggling to gain business value from those investments. According to recent reports, only 4 percent of companies can attribute better decision making to the use of big data.

"That’s not acceptable and they can do better. The companies we see succeed are the ones that have understood that analytics is 'big data’s killer app.'

"I don’t mean that they have armies of analysts building more pretty front-end and visualizations on top of large datasets. I mean that they have understood that strong data science at scale is the only way to gain value from ever-growing data.

"To win here, companies need to drive their attention to the 'big data' analytical applications that are grounded in reliable and scalable math. Doing so, they will create shortcuts and gain competitive advantage by further embedding this math into their processes and pushing it to the edge of their organization."

Jen Grant, CMO of ElasticSearch: "We see some customers that have invested in 'big data' and are frustrated with the results. But big data was defined to them as storage solutions that scaled. I think a lot of the frustration in the current hype cycle sets in because merely storing all of your data does not achieve what a typical business leader wants.

"They want to create a competitive advantage by being smarter than their competition. Capacity is only the first step.

"The ability to search, analyze and visualize that data — in real time — and then put it in the hands of people that can take action is the true promise of 'big data.' We have actually seen an acceleration in interest in the ElasticSearch ELK stack because while people are growing a bit tired of the hype around Big Data, they quickly realize extracting value from environments like Hadoop is key."

Brad Peters, Chairman and Chief Product Officer of Birst: "Although big data was once viewed as the golden child of tech, its bloom is fading in terms of the value that it is able to deliver all on its own.

"Not that long ago, the focus was on finding, capturing and storing data. Today the shift in focus is to unlocking the value from each and every piece of data we can uncover. When we combine 'big data' with the new generation of analytical tools, we move beyond simply capturing mountains of data and instead create actionable information that can be used every day to drive business value.

"Big data in and of itself doesn't drive value to your bottom line — it’s what you do with that data that makes the difference.

"Shifting focus from capturing to refining and presenting data in new ways so that companies can then leverage it to maximize productivity, boost revenue, drive sales, etc., is ultimately going to be the driver of competitive advantage and will separate the market leaders from those that can't keep the pace."

Marius Moscovici, founder and CEO of Metric Insights: "It's not a big data slowdown per se; rather, the market has hit a transition point in how companies derive value from data.

"We've hit the next stage in the business intelligence maturation cycle. Big data has become in essence a victim of its own success to the point where the everyday executive is completely bombarded with dashboards. This is why there will no doubt be further industry consolidation and the companies that emerge from the pack will have a renewed emphasis on pushing out the data that impacts companies most in the context of what is happening with a business versus focusing on visually appealing dashboards that don't get opened. Welcome to the era of BI 3.0."

Tony Barbagallo, vice president of Product Management and Marketing at ClustrixDB: "There are always early adopters for new technologies, and Big Data Analytics is no exception. From our vantage point as a SQL database vendor we have specifically noticed an initial flock of momentum to the NoSQL movement because it has garnered appeal from developers who like the freedom to develop in their language of choice.

"Unfortunately, the very freedom they enjoy is creating undue complexities as they must essentially rewrite the SQL language to accomplish the more complex analytic queries that are needed to realize the information from their data warehouses. From our standpoint, while NoSQL technologies are finding their way in the big data landscape, scale-out SQL technologies like ClustrixDB are actually starting to see an uptick in demand.

"In the end, the desire to analyze and act on large amounts of data will never wane, although the technologies required to accomplish those analytics easily and efficiently may change, and in the end, I think they will all come back around to SQL."

John Keagy, CEO of GoGrid: "Trough of disillusionment? We’re actually seeing a huge uptick and interest in Big Data solutions — especially if you can eliminate the steep learning curve and time-consuming task of deploying those solutions through automated orchestration. At GoGrid, the biggest Big Data demand is from our customers in the digital advertising, e-commerce, and healthcare verticals."

Is the Big Data Backlash Real?

Needless to say, it depends on whom you ask, but from where we sit and from the vendors’ perspective, interest in big data isn’t waning. What is waning is the hype; it’s being replaced by big data analytics and the enterprise’s insistence that we make big data discoveries that will have impact.


 
Tuesday, Feb 18, 2014

Who Says 'Big Data Needs to Shrink to Grow'?

While most people were busy nursing their New Year’s Eve hangovers or getting busy with their resolutions on January 1, the New York Times ran a rather interesting headline: "Big Data Shrinks to Grow." We looked at it and said, really? That’s not our experience, but continued to read, anyway.

The article states that interest in big data is waning; the author bases his claim on the fact that Google searches for the term “big data” are no longer rising. We think that may be a pretty lousy basis for an argument; after all, it could be that searches for “big data” are no longer rising because many people already know what it is. (As I stated in my year-end big data wrap-up: “If 2012 was the year your grandmother instigated big data conversations at the dinner table (yes, the 'buzz' around it actually was that big) then 2013 will go down in history as the year the enterprise began to make serious plans around it.”)

We also hope that enterprises don’t craft their big data strategies via Google search. But enough about that.

The article also points out that Kaggle (a site that hosts data scientist competitions) has changed its business model from one that spans the marketplace to one that specializes in specific industries, starting with Oil and Gas.

Interesting? Yes, but a trend? We're not so sure. Hopefully data scientists, statisticians or even high school students with a little common sense will point out that one or two companies changing their business strategies does not a market trend make.

The article does bring up more interesting questions like: do data scientists and other professionals who work with big data need to have deep industry insight to deliver discoveries that warrant the costs of wrestling with big data in the enterprise?

It quotes Kaggle founder Anthony Goldbloom saying:

“We liked to say ‘It’s all about the data,’ but the reality is that you have to understand enough about the domain in order to make a business. What a pharmaceutical company thinks a prediction about a chemical’s toxicity is worth is very different from what Clorox thinks shelf space is worth. There is a lot to learn in each area.”

Goldbloom makes a good point, but does this mean that data scientists who don’t have industry specific training can’t yield the returns that big data hype suggests?

If so, then the big data business may be in serious trouble, because data scientists are rare enough already. Add another skill to their heavy list of “must have” requirements and we’ll have to wait not only for them to finish their postgraduate training but also for them to get jobs and work for five years before they can add value.

The Experts Weigh In

Enough about what we think. We asked four firms that provide products and services to enterprises to comment on questions like: Does big data need to shrink to grow? and Do data scientists need to have industry specific experience to be worthy of their hefty price tags?

Here’s what they said:

On the question of whether data scientists need to have domain knowledge to make cost-justifiable contributions, Sandy Steier, CEO and co-founder of 1010data, said:

"To add value to any industry, a person would presumably need a certain amount of both analytical expertise and domain knowledge. An interesting question is, is it better to start with domain knowledge and learn big data analysis from there, or is it better to start with analytical experience and then apply it to a new domain?

"I believe the latter is the easier path to success. I'm sure some domain experts would want to emphasize domain knowledge, but in my 35 years of analytical experience and 14 years of growing 1010data, that's what I have seen pretty consistently. Certainly the normal path is to be schooled in analysis or programming, get a job in a specific industry, and then perhaps even change industries."

Byron Banks, Vice President, database & technology at SAP offered a different point of view:

"We believe industry and domain expertise is essential for big data initiatives to succeed. Big data, like any new technology or IT trend, will only prosper if there is a direct link to quantifiable business results. Unfortunately there have been a number of cases where the technology is driving the project — organizations collecting lots of data because they now can, rather than first focusing on the needs of the business and then looking for the best approach, big data or otherwise, to support those goals. This has led to some of the hype-cycle criticisms regarding big data.

"At SAP we are focused on helping customers to 'big data enable' their existing business processes and to go after new business opportunities. We believe the best way to get started in the right direction is to engage data scientists who know the customer’s industry, and can make the link between the business goals, potential data sources, and the IT organization and technologies. SAP has a team of data scientists, hired from industry, not IT, and we also rely on our business partners to ensure deep industry-specific knowledge is brought to every big data project we undertake."

On the question of whether big data (the industry) must “shrink to grow,” Stefan Andreasen, CTO and co-founder at Kapow Software, a Kofax company, said:

“The answer to this all depends on how you define big data. If you mean that big data is only about data processing and analytics, yes, you might be right that the initial 'hype' is over and there will be a 'shrink' before we see the growth again.

“However, if you look at big data as a whole new way to work with data, then it will only grow, not shrink.

“For me the essence of big data is the need to work with more and more data sources, each spitting out ever changing data in ever changing formats, to find the right answer. For example, when shopping for an airline ticket, you go to more and more places, over many days, to secure a better price. That way of defining big data will only grow.

“In other words, thinking of the 3 V’s of big data — Volume, Velocity and Variety — yes, it might be true that the need to process more volume will shrink, but the need to get real time data from more sources will only grow.”

Michael Collins, head of product marketing at LucidWorks, offered a different answer:

"The concern with big data isn’t the growth, but that organizations are focusing on the size and not working to rationalize how their data-driven applications bring forward to users the relevant information that is needed for useful decision making. Verticalization will not address the many various types of data such as financial data (as an example) that is developed daily. As more organizations realize that the first instinctual human behavior is to search, they will step back and re-examine their big data stack and understand how to capitalize on one option while addressing the main issue that has no capability of shrinking. Analytics and BI are rendered of less value when the data they are working with isn’t relevant — enter search."

Something Else to Consider

In my mind there’s still something else to consider. Exactly who are the “big data” experts and data scientists bringing expertise into the industry? As a headhunter, I know that for every nine posers there is only one who actually has the experience needed to deliver results.

To enterprise project owners and C-level executives I beg the following: look at the resumes of the experts you’re bringing in to work on your big data initiatives (even if they work for consulting firms and went to good schools). What were they doing five years ago? When did they have the time and the opportunity to gain their big data and data scientist expertise? Are you handing them their first opportunity to analyze big data, to work with stats and algorithms since their undergrad days? (Fine if you are, but you should know that that’s what you are doing.)

Just because “experts” have trained themselves (or been trained) to talk a good game doesn’t mean they can play one. Sites like LinkedIn are full of discussions on “Answers to questions you’ll be asked on Hadoop interviews.”

Is There Really Gold in Big Data? Is It Worth Going After?

CEOs who have developed solid relationships with great data scientists and other professionals skilled in working with big data sets will answer those questions “Yes” without hesitating; but the key here may be that they’ve enjoyed solid business-to-scientist-to-big-data-pro interactions. It’s worth noting that what may be “interesting” to the scientist may not be worthwhile for the business and vice versa. We need these experts to talk.

What do you think?

Title image by fluidworkshop (Shutterstock)

Monday, Jul 29, 2013

Data Warehouse = Relic? What Cloudera, Microsoft, Pivotal, SAP, Teradata, WANdisco Have to Say

Originally posted on CMSWire

When there’s disruption in an industry, bold statements are made. And in an era of Big Data, we seem to be hearing them all the time.

Last week at the Cloudera Summit, we heard another one. Cloudera’s chief executive, Mike Olson, called on enterprises to “unaccept the status quo” as it relates to data management. He preached the gospel of Hadoop:

“For more than thirty years, the data management industry has relied on relational databases, dedicated expensive storage and other very expensive special purpose legacy systems,” he said. “While that approach was very powerful for decades, the accelerating tsunami of data arriving every minute, every hour, every day has begun to overwhelm those systems. Advancements in business analytics have been hindered by the relatively small fraction of the total data made available, and the cost to store, move and process it. Today, the existing standard approach is changing. The center-of-gravity of the enterprise data center is shifting — it’s moving toward Hadoop.”

And while few doubt that Hadoop can/will make a profound impact on how decisions will be made in the future, just how, how big and how soon is open to question. Vendors like EMC say they are “all in” on Hadoop — in fact, they are so far in that they've spawned Pivotal, a company whose mission is to build a new data fabric whose starting point is Hadoop.

Other vendors like Teradata, SAP and Microsoft say that Hadoop is important, but not so important that it overshadows other important technologies.

One of the great joys of reporting on technology is that we get to look at the big picture and watch things unfold; we ask questions and make generalized conclusions far more often than we take sides.

In this case, we've asked data management leaders about Hadoop, its significance and its future. Here’s what each of them has to say about Hadoop and other data management/data warehousing technologies (comments appear in alphabetical order by vendor name):

Notes: 1) We contacted a few other vendors for their opinions, but they did not respond. 2) We’ll be looking at Big Data databases in a future article which is why they aren't featured here.

Cloudera’s CEO, Mike Olson (from a prepared statement):

“Storing and analyzing all (of) one’s data with old guard legacy systems doesn’t make sense economically or technically. Now organizations have a choice. They no longer need to make undereducated, on-the-fly decisions about what data to keep and what to offload, or business decisions with insufficient information.

“Hadoop has fundamentally transformed the economics of data management, making it possible to choose to keep all (of) one’s data, without an exorbitant, ongoing investment in a cumbersome technology that can’t keep pace with the growth of data or the evolving needs of a business.

“Cloudera (and its Hadoop-related products) is making it possible to store and manage all data — today — so organizations can leverage it whenever and however they see fit. This opens up new opportunities for data discovery and insights that were never before possible under the old paradigm. Welcome to the new era of data management.”

Microsoft’s Director of Product Marketing, Server and Tools Business, Herain Oberoi:

“Technology will continue to improve, making big data more accessible to more users on the platforms of their choice. These improvements will help more people get actionable insights quickly, conveniently and more economically from their data.

“Hadoop is both a compelling solution for analyzing unstructured data at low cost and a critical part of the big data ecosystem, and Microsoft has been working with Apache Hadoop project founding member and most active committer, Hortonworks, to deliver Hadoop based solutions (HDInsight) both on Windows and in the cloud on Windows Azure. The portability, security and simplified deployment of these solutions, as well as their interoperability with Microsoft’s award-winning business intelligence tools, create unique and differentiated value for customers.”

SAP’s Director of Big Data, David Jonker:

“The 21st century demands new approaches to managing data, including the enterprise's data warehouse environment. Increasingly, enterprises will build logical data warehouses that virtualize data access from specialized data stores. At SAP we believe in-memory platforms, such as SAP HANA, will be at the center of the logical data warehouse, with relational databases, Hadoop and other NoSQL data stores acting as repositories and staging environments. While the traditional data warehouse needs modern technology, such as in-memory and columnar databases, rigorous data warehousing practices must continue to ensure the quality of mission-critical data, such as an enterprise's financials. Hadoop is complementary technology especially suited to supporting the work of data scientists.”

Teradata’s VP, Unified Data Architecture Marketing, Steve Wooledge:

"Teradata agrees that companies should 'unaccept the status quo,' and this applies across the whole enterprise data architecture, not just Hadoop, and not just data warehousing. This is the focus of our Unified Data Architecture vision, products, and services — integrating a best-of-breed architecture with Hadoop as a key component.

"Hadoop is changing how data management is deployed and it is understandable why companies are excited. As with any new technology, the promise and 'hope' of change is alluring, but to feed the frenzy without providing holistic, strategic guidance to customers is dangerous. If Hadoop is a hammer, then every problem looks like a nail and companies will walk away with imprecise architectures, which cannot meet stringent business service level agreements.

"At Teradata, we work with customers to incorporate Hadoop along with other workload-specific platforms (Teradata and partner technologies) in a seamless analytic environment across data storage, transformation, preparation, analytics, and operationalization in the business. It can be described with the metaphor of 1+1 = 3. No one technology can be optimized for every type of workload or customer use case. It is Teradata’s goal to help its customers to leverage all their data, by the effective deployment of transformational technologies that drive tangible business results."

WANdisco’s CTO and VP Engineering of Big Data, Jagane Sundar:

“The cost of storing data on Hadoop is orders of magnitude cheaper than any of the alternatives. The only thing preventing Hadoop from becoming the de-facto storage solution for all data is the lack of enterprise grade high availability and disaster recovery solutions. Companies such as WANdisco are focused on addressing these deficiencies. Once WAN-scope continuous availability and disaster recovery solutions are available for Hadoop, its widespread adoption for storage of all data is inevitable.”

It’s clear that all of our contributors agree that Hadoop will play an important role in the future; but how significant, in exactly what way, and how soon remain open to question, and we’re not sure when those questions will be answered.

After all, for every Netflix whose business and success is practically powered by Hadoop, there’s a consumer products company that went through the price and the pains of implementing it without gaining any insights it was willing to take action on. Those companies will be hesitant to go “all in” on Hadoop, at least for right now.

“We’re at the point where the rubber meets the road … the point where Big Data delivers or doesn’t,” says Chris Taylor in a recent blog post on Wired. He adds that “Patience with experimentation will wear thin over the next year or so and there need to be more ‘everyday’ companies taking advantage of Big Data and talking about their successes. Big Data is headed for the Trough (of disillusionment) as long as there are more people trying than succeeding, and that’s where we are right now.”

If Taylor’s last point is true, the “status quo” won’t move forward in a Big Data way all that quickly despite what anyone says. CIOs will have more of a “wait and see” attitude and their only compelling reason to move to Hadoop will be to save money. Though that’s a compelling reason, it may not outweigh the real or perceived risks of moving to a new technology which many don’t yet see as being “enterprise ready.” And, perhaps more importantly, it doesn't deliver on the transformative promises of Big Data.

Pivotal’s Chief Scientist, Milind Bhandarkar:

“We're seeing rapid adoption with the Hadoop business growing 60-70% a year and we believe it will continue to grow in popularity. Apache Hadoop provides a great foundation for the next-gen data platform and we’ve leveraged it to add a proven, interactive standard SQL query layer: our innovative HAWQ technology. HAWQ is essentially a fully functional, high-performance relational database that runs in Hadoop and speaks SQL natively to deliver performance improvements of 50X to 500X as it helps customers gain insight from different types of data spread across multiple systems.”

Monday, Jul 29, 2013

Will Oracle Become a Bit Player? DataStax's Apache Cassandra Takes the Main Stage

Once upon a time in the late 1970s, before Microsoft Windows was invented, before IBM introduced the first PC, and more than a decade before Tim Berners-Lee introduced the World Wide Web, the first version of Oracle database software went to market.

The version was called version 2 because Larry Ellison and partners feared no one would want to risk using the first version of a product.

Since that time, Oracle has sold its products and services to 20 of the 20 top airlines, 20 of the 20 top automotive companies, 20 of the 20 top banks, 20 of the 20 top governments, 20 of the 20 top high tech companies, 20 of the 20 top insurers, 20 of the 20 top manufacturers, 20 of the 20 top oil and gas companies, 20 of the 20 top pharmas, 20 of the 20 top retailers, 20 of the 20 top telcos, 20 of the 20 top utilities …

We could go on, but the point is made: Oracle, and particularly its database management offerings that took root more than three decades ago, rules many of today’s leading enterprises.

That Was Then, This is Now

“That’s problematic,” says Robin Schumacher, Vice President of Product Management at DataStax, a company that sells and supports an enterprise version of the Big Data database Apache Cassandra.

“Oracle wasn’t built for today’s modern, data-driven applications,” he says. Some companies who continue to use it for their strategic applications may lose not only market share, but also the shirts off of their backs.

Consider that at the time Oracle was designed, information wasn’t coming in at today’s high rate of speed; that it was coming into only one location, primarily from one location; that it was structured and well-organized; and that when volume grew too large, data was purged because storing it was expensive.

That is how it was yesterday; today it is no longer the case. Consider today’s rate of data velocity, the many different kinds of sensors that render data exhaust; the many different locations (including mobile) that data flows into and out from, the fact that the data doesn’t tend to be organized, and that global businesses need to be able to write and read everywhere at any time.

Traditional relational database management systems, like Oracle, can’t deal with the latter.

“Today’s Enterprises need to work with databases that are flexible, fluid and that can keep up with the pace of business,” says Schumacher.

And if a company’s database strategy is insufficient, the consequences can be quite costly. They range from losing competitive advantage to actually losing business; consider that Netflix went down for more than 48 hours when its Oracle database failed.

“We couldn’t risk that happening again,” says Christos Kalantzis, cloud database engineering manager, Netflix. “Oracle wasn’t built for the cloud and it doesn’t work in the cloud at the level we need it to,” he adds.

So what was Netflix, or any of the many other companies that find themselves in similar positions, to do but search for alternatives? Kalantzis says that his company considered databases that were built for a Big Data world (MongoDB, Riak, HBase and Cassandra) and also considered building its own sharded solution.

“Cassandra was our clear choice,” says Kalantzis. “It’s more elegant and easier to manage and scale,” he adds.

Today, Netflix stores 95 percent of its data in Cassandra, from customer account information to movie ratings, bookmarks, metadata and even logs. With more than 200TB of data in its production clusters, Netflix operates one of the largest cloud platforms in the world.

Has Oracle been totally eliminated at Netflix, we asked. Kalantzis says “no”; like many Big Data solutions, Cassandra has yet to be recognized by the government for being safe and secure. Certain financial data, according to Kalantzis, continues to be stored in Oracle.

Growing Pains in the Big Data World

Netflix is hardly the only company at which DataStax’s Apache Cassandra solution has replaced (or is replacing) Oracle. Quickly growing startup Ooyala, which powers multi-device video for the world's biggest media companies and brands, brought in Cassandra because it could predictably scale to handle the company’s massive and highly variable influx of data, which now reaches two billion data points each day.

Cassandra enables their database to run smoothly and scale without surprises because they can simply stand up a new cluster whenever they need more capacity.

“We quickly outgrew Oracle’s MySQL solution, and it became clear that relational database technologies were not an option because they could not support our analytics and broader big data initiatives,” said Sean Knapp, CTO and co-founder, Ooyala. “NoSQL big data solutions are becoming the default tools, and we chose DataStax because the Cassandra community was by far the most focused, had the most unified vision and demonstrated incredible execution.”

Why Go for an Open Source Solution, Anyway?

Sure, Open Source software is typically less expensive to obtain (its basic versions are usually free) and to use, but that’s not the only reason companies like it.

“Open Source tends to attract higher caliber engineers; code tends to be reviewed by others, and it’s a magnet for attracting the world’s finest talent,” says Kalantzis.

And when you’re pioneering a brand new field, the latter is a must, as is using software that was built for the age of Big Data rather than retrofitted.

Is sticking with legacy databases a business killer, we asked Schumacher. “If all you need to do is run CRM and ERP applications in a single location, maybe not,” he says. “But as soon as you start looking at global, multi-location, mobile and high volume, you may have problems,” he adds. “And while you can try to shove a square peg in a round hole, why would you want to?” he asks. “Especially when you have a square peg at your disposal?”

Monday
Jul292013

Just In Time & Perfect or Too Late & Just Plain Wrong: Oracle Adds to Big Data Appliance Family

Larry Ellison has a dream. And at the moment it might be less about winning new customers than keeping the ones he already has.

It seems that every time the overlord of the database blinks, there’s someone new talking to his Enterprise customers about faster, cheaper and better ways to gain insights from their information and to run their businesses.

While in the past, Oracle customers might have turned a deaf ear to such talk, these days they are more likely to be listening and listening closely.

Why?

Because in a world where hardware is becoming more and more commoditized and new, Enterprise-grade software is built via Open Source, Oracle wants to charge a pretty penny. Add to that the fact that Oracle was in Cloud-denial for much too long, and being late to the game often suggests a lack of leadership.

And though Oracle is eventually likely to lose market share for these reasons, there’s a bigger one: technology vendors who used to build their solutions with Oracle in mind are doing so less and less often because they have other good choices.

SAP, for example, offers its super-fast HANA database to its Enterprise customers, and it’s becoming widely accepted as a formidable foe; many believe it’s a giant-killer.

Though it’s yet to be confirmed, there’s reason to believe that Documentum’s Next Generation Information Server, which may very well be cloud-based, will sit atop its own database, and so on.

So, in an IT universe disrupted by Big Data and the Cloud, where does Oracle sit?

“On top,” that’s what Larry Ellison would likely say. Maybe he’d be radio-ing in from one of the planes he paid for when he purchased Island Air.

Or maybe he’d tell you about Oracle’s latest Big Data news.

Catching Up with Big Data

Today the company announced two new additions to its Big Data Appliance family. They were created to help customers get into the Big Data game more quickly and to easily and cost-effectively scale their footprint as their data grows.

For companies who are timid about getting started with Big Data, it’s a good solution in a land of many good solutions.

According to Oracle’s press release, the two additions are the Oracle Big Data Appliance X3-2 Starter Rack and the Oracle Big Data Appliance X3-2 In-Rack Expansion. The Starter Rack enables customers to jumpstart their first Big Data projects with an optimally sized appliance. The In-Rack Expansion helps customers scale as their data grows.

The specifics, taken directly from Oracle’s press release, are below.

"The new configurations include Oracle Big Data Appliance X3-2 Starter Rack, containing six Oracle Sun servers within a full-sized rack with redundant Infiniband switches and power distribution units; as well as Oracle Big Data Appliance X3-2 In-Rack Expansion, which includes a pack of six additional servers to expand the above configuration to 12 nodes and then to a full rack of 18 nodes.

Both new systems include the existing software stack for Oracle Big Data Appliance X3-2: Oracle Linux, Oracle Hotspot Java Virtual Machine, Cloudera's Distribution Including Apache Hadoop (CDH), Cloudera Manager and Oracle NoSQL Database.

Integrating Hadoop into existing enterprises, Oracle also provides powerful SQL access to HDFS data. This enables organizations to leverage their existing Oracle SQL skillsets and tools to seamlessly query and analyze data stored in Hadoop.

Additionally, Oracle Big Data Appliance X3-2 (full rack configuration) is now available through Oracle Infrastructure as a Service (IaaS), where organizations can obtain Oracle Big Data Appliance X3-2 on-premise, behind their firewall, for a monthly fee. With Oracle IaaS for Oracle Big Data Appliance, organizations can now eliminate upfront capital expenditures for their Big Data needs."

The development and introduction of these appliances was a “must do” for Oracle. The company’s customers have probably been chomping at the bit to dip their toes into Big Data waters, and those who have come to rely on Oracle can now rely on it some more. Conversely, if Oracle had no easy, sugar-coated approach to offer, its customers would, no doubt, go elsewhere.

Oracle’s decision to include Cloudera’s Hadoop distribution as part of the appliance is also a wise choice. Cloudera’s Chief Scientist Jeff Hammerbacher was recently heard describing what it takes to get started with Cloudera, and it sounded like a 15-30 minute plug ‘n play. It should be noted that Hammerbacher was probably talking about what it would take for him to get it done (we suspect it would take much longer for the uninitiated).

If Oracle can indeed make Big Data easy, then the company’s future may very well be bright.

What’s the possibility of that happening?

Speaking from his own private island, Ellison would, no doubt, say it’s a stupid question.