Author Archives: Matt Asay

The gold in the Big Data hills is fueled by open-source software

Who knew that there could be so much money in analyzing data?  According to IDC, the analytics market will be worth $50.7 billion by 2016.  And that’s without breaking a sweat.

The value in analytics scales with the mountains of data we amass.  It’s akin to something I wrote about years ago for CNET: 21st Century businesses thrive by driving abundance and then selling ways to minimize the complexity inherent in that abundance.  It’s what Red Hat does with open source, what Google does with search, and what Facebook does with social.  It’s also what companies like Nodeable do with all the data your marketing/IT operations/sales/etc. systems throw off.

There are a few interesting stories buried in the growth of analytics, but perhaps the biggest is Hadoop.  While the overall analytics market grew by 14 percent in 2011, IDC has the Hadoop market growing by 60 percent each year through 2016.  Admittedly, that’s off a small base, but at that pace the Hadoop ecosystem, which Forrester already sizes at $1 billion per year, will be very, very big.
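To get a rough feel for what "very, very big" means, here is a back-of-the-envelope sketch of compound growth.  It assumes (and this is my assumption, not IDC's or Forrester's) that Forrester's $1 billion figure is the 2012 base and that IDC's 60 percent rate applies to it directly.  The two numbers come from different firms, so treat the output as illustrative only; the function name `project_market` is made up for this example.

```python
def project_market(base_billion, annual_growth, years):
    """Return the projected market size (in $ billions) after each year of compound growth."""
    sizes = []
    size = base_billion
    for _ in range(years):
        size *= 1 + annual_growth  # grow the running total by the annual rate
        sizes.append(round(size, 2))  # round only for display, not for the running total
    return sizes


# Assumption: $1 billion base in 2012, 60 percent growth per year through 2016.
projection = project_market(1.0, 0.60, 4)
for year, size in zip(range(2013, 2017), projection):
    print(f"{year}: ${size:.2f} billion")
```

Under those assumptions the ecosystem would pass $6 billion by 2016, which is small next to the overall analytics market but a big jump from where it starts.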

Big money for Big Data.

And for such a comparative pittance.  Hadoop, as I’ve argued before, has democratized data.  Big Data analytics used to be the province of expensive data warehousing systems, complete with proprietary software, expensive, proprietary hardware, and a smiling salesperson with their palm out, waiting for you to mortgage your house.  Not anymore.  Hadoop is open source and runs on commodity hardware.  It’s a game even cash-strapped organizations can play.

It’s not perfect.  Hadoop can’t do real-time, for example, which is why Nodeable buttresses its fantastic batch-processing capabilities with the real-time computation heroics of Storm.  Storm, of course, is also open source.

Which is what is so fascinating in this Big Data gold rush: it’s being driven by free and open-source software.  No wonder the market is growing in such dramatic fashion: it’s not being gated by vendors anymore.  That’s good news for all of us…including vendors.


Are we trying to fit square Hadoop pegs into round real-time holes?

Big Data is a big market, projected to top $16.9 billion by 2015, according to IDC.  The Hadoop ecosystem alone is worth $1 billion per year, according to Forrester, and is set to explode by most accounts.  What is less clear is for whom Big Data is big, and whether the workloads they’re currently running through Hadoop might be better complemented (or, in some cases, replaced) by real-time analytics tools like Storm.

After all, given that 32 percent of Karmasphere’s survey participants are running Hadoop clusters smaller than 2 terabytes, and 55 percent are running clusters of 10 terabytes or less, the “Big” in “Big Data” really isn’t.  Not yet, anyway.  That will likely change as enterprises move from toe-dipping to diving into Hadoop and Big Data in a big way, but for now the workloads aren’t huge, and real-time tools like Storm might be ideal for managing them.

These workloads are also not necessarily being run behind the firewall.  While both Cloudera and Hortonworks are booming due to enterprises keeping their Hadoop jobs running primarily in their data centers, Amazon is already managing in excess of one million Hadoop clusters with its Elastic MapReduce service.  This is perhaps not surprising given that the majority of Big Data users tend to be business users, not hard-core IT people, according to Karmasphere’s survey.  These people are apparently very comfortable having their data processed in the cloud.

Interestingly, while there are numerous great applications for Hadoop, the majority of users seem to be using it for marketing-related functions.

All of which brings me back to the point I made earlier this week: some data are best analyzed in real time, not batch.  For many things, you’ll actually want both: a real-time view into what’s happening with your website, HR systems, etc., as well as a deeper, Hadoop-based analysis that is done in batch, after the fact.

Real-time analytics tools like Nodeable (based on the open-source Storm project) are not a replacement for Hadoop.  They’re complements.

Given that so much of the data being analyzed with Hadoop are still relatively small and marketing-focused, not to mention being analyzed in Amazon’s cloud, I’d argue that more of today’s data, not less, should be run through real-time analytics systems, and particularly hosted systems.  After all, while it’s useful to know how aspects of your online retail site are working hours or days or months after the fact, you actually want the “next click” to reflect real-time analysis, as Yahoo CTO Raymie Stata argues:

With the paths that go through Hadoop [at Yahoo!], the latency is about fifteen minutes.…[I]t will never be true real-time.  It will never be what we call “next click,” where I click and by the time the page loads, the semantic implication of my decision is reflected in the page.

Again, this isn’t to denigrate the importance of Hadoop.  At all.  It’s simply to suggest that for many applications, relying on batch-oriented Hadoop alone is an incomplete strategy.  Real-time is required for many applications, particularly those where Hadoop is being used today, and that real-time capability is delivered through Storm-based Nodeable or other real-time analytics systems.

It’s not Storm or Hadoop.  It’s Storm and Hadoop.
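One way to picture "Storm and Hadoop" is as two views that get merged at query time: a batch view that is complete but stale, and a real-time view that covers only what has happened since the last batch run.  The sketch below is a minimal illustration of that idea, not how Nodeable or Yahoo actually does it.  Plain dictionaries stand in for the output of a Hadoop job and a Storm topology, and the names (`merged_count`, `batch_view`, `realtime_view`) and numbers are hypothetical.

```python
def merged_count(batch_view, realtime_view, key):
    """Total for `key`: the last batch result plus events seen since that batch ran."""
    return batch_view.get(key, 0) + realtime_view.get(key, 0)


# Hypothetical page-view counts from the most recent batch job.
batch_view = {"/checkout": 1200, "/home": 5400}
# Hypothetical counts accumulated in-stream since that job finished.
realtime_view = {"/checkout": 37, "/cart": 4}

for page in ("/checkout", "/home", "/cart"):
    print(page, merged_count(batch_view, realtime_view, page))
```

The batch side can take its fifteen minutes (or fifteen hours) because the real-time side fills the gap in between.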


When Hadoop isn’t fast enough: The Argument for Storm

Big Data is a Big Deal, and Hadoop is arguably the driving force in Big Data.  But as awesome as Hadoop is – and it is quite awesome – it’s incomplete.  For many things, Hadoop’s batch workflow is just too slow.  You wouldn’t calculate trending topics for Twitter using Hadoop, nor would a hedge fund look for stock trends in real-time using Hadoop.  Because Hadoop doesn’t do real-time.  So the trick is to marry the powerful batch processing capabilities of Hadoop with a front-end preprocessing engine that works in real-time.

Like Storm, the project Twitter inherited when it acquired BackType in 2011.

At Nodeable we use Storm to surface real-time insights from system data, whether that system is GitHub or AWS or Salesforce.com or Twitter or any number of other data sources.  We operate under the assumption that users need real-time insights (Storm) and timely information (Hadoop).  It would be impossible to crunch all of a business’ data in real-time, and frankly not necessarily all that useful.  So Hadoop’s batch approach to data mining is great for a wide variety of jobs.

But not when you need to know something right now.  As Metamarkets CEO Michael Driscoll noted at a recent Churchill Club event, “Hadoop is not like having a conversation with your data.  Instead it’s like having a pen pal that you write from time to time.”  For things like clickstream analysis, IT early-warning systems, security and fraud detection, etc., that’s not fast enough.  So Storm is a great complement.

There are alternatives to Storm, of course.  HStreaming, StreamBase, and Yahoo S4 each offer real-time complements for Hadoop, though S4 is arguably the most like Storm.  We opted for Storm for many of the reasons Dan Lynn highlights in a presentation he gave at Gluecon.  It’s open source.  It works really well.  And it gives our engineering team a great deal of flexibility.

But whether you use Storm or something else, you likely do need to figure out how to complement Hadoop with real-time.

Nodeable can help.  Nodeable provides real-time data streaming for Hadoop, which means that we provide front-end processing (summaries, counts, anomalies, status, trends) before data hit Hadoop, and turn those data into useful information.  In other words, we not only give you real-time insight into your systems, but also normalize and enhance your data to make your Hadoop batch processing much more efficient.
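To make "front-end processing" a bit more concrete, here is a minimal sketch of the general idea in plain Python.  It is not Nodeable's code and it does not use the Storm API.  It simply keeps running counts per source and flags a value as an anomaly when it is well above the average of the last few values, emitting a summary record that a batch system could store.  The class name, the event fields (`source`, `value`), and the window and threshold settings are all assumptions made for the example.

```python
from collections import Counter, deque


class StreamSummarizer:
    """Keep running counts per source and flag unusually large values."""

    def __init__(self, window=5, threshold=3.0):
        self.counts = Counter()              # events seen per source
        self.recent = deque(maxlen=window)   # last few values, used as a baseline
        self.threshold = threshold           # multiple of the baseline that counts as anomalous

    def process(self, event):
        """Consume one event dict and return a summary record for batch storage."""
        self.counts[event["source"]] += 1
        value = event["value"]
        # Compare against the mean of the recent window, before adding this value.
        baseline = sum(self.recent) / len(self.recent) if self.recent else None
        anomaly = (
            baseline is not None
            and baseline > 0
            and value > self.threshold * baseline
        )
        self.recent.append(value)
        return {"source": event["source"], "value": value, "anomaly": anomaly}


# Hypothetical events, as they might arrive from different systems.
events = [
    {"source": "github", "value": 10},
    {"source": "aws", "value": 12},
    {"source": "aws", "value": 11},
    {"source": "aws", "value": 90},
]
summarizer = StreamSummarizer()
records = [summarizer.process(e) for e in events]
for record in records:
    print(record)
print(dict(summarizer.counts))
```

Each event is handled as it arrives, so the spike shows up immediately, while the enriched records are still available for a deeper batch pass later.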

Please sign up for beta access and tell us what you think.


Technology trends: Buy into realities, not hype

Call it irrational exuberance.  Call it hype.  Call it whatever you want, but understand that just because a big technology trend dominates the media doesn’t mean it’s The Right Thing for you or your company to embrace.  Not wholesale, anyway.

For example, consider just a few of the trends sweeping the industry, particularly those that are finding their way into job specifications.  HTML5, NoSQL (e.g., MongoDB), cloud computing, Hadoop, etc.  Spend a few minutes on TechCrunch and you’ll start to feel that YOU MUST BUY INTO THESE RIGHT NOW!

And, in some cases, you should.

But not always.  If you’re building a cloud application, for example, you’re likely going to want the scale-out capabilities of a NoSQL database like MongoDB.  But Oracle has a good point in arguing that you wouldn’t want to build a checking application for a bank with NoSQL technology.  SQL has its place, and NoSQL has its place.  NoSQL just happens to be dominant for new-school applications, which may not be the kind you’re developing.

Or consider Hadoop, which has dramatically lowered the (economic) bar to data mining/analytics.  Hadoop is fantastic technology, but it’s batch-oriented.  If you need real-time analytics, you’re likely going to want to couple Storm with Hadoop, as we do here at Nodeable.  Or maybe you should embrace Red Hat’s JBoss Data Grid 6, which “as an in-memory, key-value store…is much more optimized to handle the operations that Hadoop simply can’t: transactions like the kind found in e-commerce and financial trading systems.”

Does this mean you dump Hadoop and swap it for Data Grid or Storm?  Of course not.  But it does mean that developers need to look beyond the hype to determine what is the best tool for a particular job.

The same thing holds true in mobile, where HTML5 promised to be the end to mobile’s fragmentation problem.  Instead, it turns out native apps dominate the smartphone space, while content-friendly tablets are much more likely to be friendly to HTML5.

The list goes on.  Some swear by Node.js, but it’s best for server-side app development, and not a panacea, according to Nodeable CEO Dave Rosenberg.  Cloud?  You probably shouldn’t choose private or public, but both, declares Citrix’s Peder Ulander.  Etc. etc.

Each of these technologies is trending because it solves real needs in novel, useful ways.  But this doesn’t mean they’re right for you.  Of course, one of the great things about cloud and open source and other megatrends like these is that they tend to skew open, such that you can try before you buy.  For a vendor, this is a huge boon: I’d much rather customers buy what they need and be happy than get duped into buying hype that doesn’t fit their needs.

That’s the old world: buy into the hype and regret it later.  The new world lets you regret your decision right away. :-)

Hadoop and Storm are shifting the industry toward Big Data-enabled cloud applications

Dave and I were fortunate to attend a Churchill Club event on Hadoop Tuesday night.  Hadoop sits at the center of the burgeoning Big Data universe, and so one might be tempted to conclude that it’s basically a finished product.  Not so, said the esteemed panel, which included representatives from Cloudera, Facebook, Metamarkets, MapR, and Oracle.  In fact, arguably the biggest opportunity in Hadoop isn’t Hadoop at all: it’s the cloud applications built on top of Hadoop.

Dave summarized the panel discussion on CNET, and highlighted Cloudera CEO Mike Olson’s call to arms for Hadoop-based applications.  It’s something Olson has said before, including here on this blog, but it was particularly pertinent against the backdrop of a deep, engaging discussion about Hadoop’s pros (powerful, open source) and cons (batch-oriented, complex, somewhat inefficient).

And it’s why I think Nodeable is a sign of the times.

We’re an application that depends upon Hadoop.  But we’re also a technology that improves Hadoop by front-ending it with Storm.  Hadoop is powerful but limited to batch-oriented processing of data.  Storm actually crunches data in real-time, in the stream.  The combination of the two is potent, and something that we only discovered while building out our application to ingest systems data and extrapolate insights via in-stream data analytics.

In the near future the back-end data processing via Storm and Hadoop will be managed behind the scenes by cloud applications, as Workday co-CEO Aneel Bhusri tweeted from the Churchill Club event.  For now, companies like Nodeable are helping to bridge the divide between complex infrastructure and simplified applications.


Public vs private clouds: A matter of technology, politics, or culture

Are “hybrid cloud” and “private cloud” simply ways of saying that a company isn’t ready to fully embrace the “real” cloud? Cloudscaling co-founder Randy Bias argues that cloud computing requires a fundamentally different approach to sourcing and managing infrastructure, a point echoed by Amazon, which questions the very possibility of private cloud computing. There are surely different ways to embrace the cloud, some more advanced than others.

But the real question is whether an organization is culturally ready to embrace the cloud. If so, the necessary infrastructure follows and, importantly, it’s not necessarily always going to look like Amazon. As Mark Thiele writes:

For a legacy IT organization to adopt cloud solutions without significant organizational realignment and improved business participation, the benefits would largely be wasted. It’s akin to thinking you can put a modern 500-horse power engine in a 1970’s economy car and get all the same performance and protection characteristics you would enjoy in a 2012 model year luxury sedan.

In fact, the introduction of cloud without organizational improvements would likely increase enterprise risk and potentially cost. The real opportunity of a cloud operating model comes from the alignment of technical solutions, people, and process.

In other words, cloud isn’t something for IT to hatch in seclusion from the business side of the enterprise. Cloud is, in an ideal world, truly a function of what the business needs.

We’re starting to see this play out in the rise of DevOps, but ultimately the integration of the enterprise across functions will go even deeper. Cloud computing should demolish the walls IT has put up to protect its turf (and sanity). IT will need to work hand-in-hand with the business to build out the right cloud tools for a particular job, whether public, private, or a hybrid of the two.

Andy Jassy, senior vice president of Amazon Web Services (AWS), dismisses the notion of private clouds altogether:

If you look deep into what [private cloud vendors] are offering, you will see that it’s basically an internal data center that is virtualized and has some management tools. Organizations that have private cloud systems will have missed out on all the advantages and benefits of going into the cloud.

But this is easily said by the vendor best-positioned to capitalize on public cloud computing. Amazon doesn’t need to worry about a potpourri of hardware and software choices, built up over years. Bias argues that this is one of the great strengths of AWS and, indeed, of all big web companies like Google and Facebook that have been able to build their clouds from the ground up.

Within the average enterprise, however, years or decades of legacy hardware and software choices must be balanced against the new imperative of the cloud. And so they consider the cloud for resource bursting or carve out a private cloud for new applications. Will they run as efficiently as Amazon? Almost certainly not. But that’s not really the goal, is it?

IT can play an essential role in helping the business to rationalize its existing assets and complement them with cloud resources. I suspect one area that can help bridge the gap between IT and the business is better tools to express what is happening in cloud systems, and what this means for the business.

At Nodeable, we’re trying to build monitoring tools that go beyond mere reporting of what’s happening to express why things are happening, and how these cloud systems impact the business. One of the key reasons for our Twitter-like interface is that we want the system to be approachable to non-technical users. Because, frankly, cloud systems shouldn’t be isolated to IT folks.

The cloud has the potential to democratize IT, and bring the business into the IT conversation. Part of this cultural shift can be complemented by the right tools, tools that don’t drown users in arcane minutiae but instead present insights into how things are working, and why. This is the recipe for cloud success, and it’s something we as an industry are just now starting to figure out.


IBM embraces the DevOps counterculture

When The Man embraces a counterculture movement like DevOps, does that mean it no longer counts as counterculture? We’re about to see because IBM is serious about DevOps, and not as some cheap way to co-opt a hippie-tech buzzword and make itself cool.  Yes, IBM, that enterprise technology vendor that in many ways epitomizes everything that the DevOps movement was set up to escape.

Until this week, I hadn’t realized just how serious IBM was about DevOps.  But on Tuesday I was fortunate to spend some time talking with IBM’s Bill Higgins.  As we talked, I started searching the web for more information on IBM’s involvement, and found it…everywhere.  Yes, there were the obligatory conference talks, but there was also smart discussion about how to pull off DevOps within traditional enterprise IT.

And a whole IBM blog devoted to the topic of DevOps.  Imagine that!

Yes, it just started.  But it’s great to see such a trusted enterprise brand do more than merely slap the DevOps label on a tired, old product line, hoping that customers won’t notice.  I’m sure IBM is doing that, too, but not Higgins and his team.  It sounds like there’s a very real, concerted effort to embrace DevOps within IBM and within its customer base.

I don’t know where this leaves the counterculture.  But I think it means DevOps is more than some passing fad.  And that maybe, just maybe, IBM is shrewdly embracing the counterculture again, just as it did with Linux and open source years ago.  That bet paid off big time for Big Blue.  I imagine its bet on DevOps will do the same.

UPDATE: Donnie Berkholz, part of Redmonk’s awesome analyst team, attended IBM’s Pulse conference earlier this year, and praised IBM for being out in front of its stodgy peers in terms of DevOps and other important trends:

IBM’s people really get it. They understand trends that are happening at the frontlines of tech today in startups and in open-source development. IBM is way out in front on enabling DevOps in big enterprises, and the teams working on DevOps inside Tivoli as well as Rational (which builds tools for developers) are outstanding. A lot of my experience with enterprises is that they’re slow-moving and often lagging trends by years, to the point where it’s nearly laughable, but in this case IBM is definitely a front-runner.


Cloud computing may be the final nail in traditional IT’s coffin

The gods could be crazy, or they simply may not like IT very much.  Over the past decade, trend after trend has arisen to give power back to developers to get their work done, and away from the bureaucratic IT staff that want to manage that work.  IT, however, has the potential to claim a very valuable role in the changing enterprise, but doing so requires a new mindset and mission.

In a new research report (executive summary available for free), Gartner articulates this shift away from IT-as-king to IT-as-facilitator, and how the cloud fosters the trend:

In 2010, organizations were compelled to consume IT services from external cloud providers to achieve their business, budget, and IT goals. But organizations realize that external cloud computing is not a panacea. They still need internal data centers to house critical applications and data. However, the use of external cloud providers has conditioned organizations to expect IT resources — whether internal or external — that are offered in an on-demand, self-service manner. Therefore, IT organizations are forced to offer IT services by using the same consumption model or otherwise risk extinction.

Poor IT.  First it had to deal with the rise of open source, and now cloud.  Both trends have forced IT to loosen its stranglehold on the software, hardware, and services used within the enterprise.  All of the arguments that failed against open source are also failing to stem the tide of cloud computing.

And rightly so.

But none of this means that IT is dead.  It just means, as Gartner points out, that IT’s role needs to change.  For example, DevOps is a very real phenomenon, but I suspect few organizations will have the know-how or brazen courage to take the Netflix route and embrace DevOps completely.  The trick is to give developers more flexibility without imposing on them the burden of setting up and managing all infrastructure themselves.  Yes, some will want precisely this.  But most just want greater influence over the tools they use.

They’re not going to get this from OpenView or any of the legacy tools from legacy vendors.  They’re just not going to blend private and public cloud resources, host their code on GitHub, and be bothered with clunky old management tools.  It’s not going to happen.

So IT needs to redefine its role.  And it needs to get new tools for doing so.  Or it’s going to evaporate into obsolescence.

The problem with treating people like data: Learning from Autonomy’s mistakes

As much as we tout the importance of data in today’s fast-paced markets, Autonomy CEO Mike Lynch is a poignant reminder that people matter, too.  A lot.

HP bought Autonomy in late 2011 for $10 billion.  Autonomy was one of the UK’s brightest tech stars, but its CEO, Mike Lynch, was known to be something of a difficult personality.  How difficult?  So difficult that Autonomy employees gave Lynch a measly 20 percent approval rating. If the pundits think President Obama has a tough road ahead of him with a nearly 50 percent approval rating, imagine Lynch’s likelihood of getting elected.

No.  Way.

In fact, as Wired reports, the only way HP could maximize the value of its $10 billion acquisition was to fire Lynch.  This is ironic, given that Autonomy’s business is to “make sense of and process unstructured, ‘human information,’ and draw real business value from that meaning.”  The company that enables others to glean meaningful information from unstructured data was at pains to treat its employees as anything more than cogs in a machine, to be tightened and tweaked to force them to perform.

In other words, as much as we may want to boil business down to 1s and 0s, ultimately all business is about meeting human needs, not only as customers but also as employees.  Even Nodeable, which ingests machine data, processes it in real-time, and outputs useful insights, is ultimately in the business of serving people, not machines.

Autonomy built a good business based on serving customer needs.  But it started to decline as its employees struggled under apparently tyrannical working conditions.  By showing Lynch the door, HP has taken the first step toward treating both its customers and its employees with respect, which turns out to be very good business.


Survey: CIOs are confused on prioritizing IT projects, and especially on how to pay for them

CIOs are an optimistic lot these days.  According to recent survey data from InformationWeek, 61 percent of those IT executives surveyed indicate that their IT budgets will remain the same or shrink.  Yet the vast majority claim that important new projects for cloud, Big Data, security, and more will come from “new money” rather than “savings.”

How does that math work?

It’s not as if the proposed projects aren’t useful.  According to the survey, CIOs seem to have a good handle on where money should be spent.

What they lack, of course, is a grasp on reality in terms of funding all these projects.

And while we at Nodeable don’t have the be-all, end-all answer for how to fund these projects, we can suggest one: optimize efficiency of existing resources.  This actually fits IT priorities, generally.  After all, four of the top-five projects identified are “block-and-tackle” projects that improve existing systems rather than introducing a gee-whiz new line of business system.


The difference, of course, is that one can introduce a system like Nodeable and not only bring down costs (by tuning cloud systems based on our trending data, anomalies we flag in how resources are being used, etc.), but also drive one’s business by analyzing how resources are being used at a macro level.

I can see, for example, who is most active in handling JIRA tickets.  I can see which of my developers show up most often in the GitHub activity stream.  And while I can of course track waste in my use of AWS, I can also benchmark how my company manages its storage and compute resources against how others do.

Ultimately, what needs to be done is to bring IT into better alignment with business goals.  The DevOps trend does this by reducing bureaucracy, allowing developers to get work done with a minimum of overhead.  This is the crowd Nodeable hopes to enable.

Otherwise, we end up with a mismatch of resources with goals, as InformationWeek points out:

What about hybrid clouds and cloud bursting, an activity that promises to dramatically change the face of IT spending and human resourcing as we know it? Marquee names like Zynga and DreamWorks are just two pioneers that have managed to optimize their internal infrastructure spend by balancing private and public cloud. Yet only 10% of our survey respondents identify private cloud as a top priority.

Worse, the project that came in at No.12 of 12, with a whopping 2%—launching or upgrading an enterprise social networking platform—is one that has the attention of non-IT partners….We guarantee you that if we had surveyed CMOs and their direct reports instead of CIOs and their reports, social would have been near the top.

Enterprises need to figure out how to do more with less, and that “less” means getting to more productivity with “less money,” which often will necessitate less cumbersome and costly bureaucracy.  Nodeable offers one way to accomplish this, and no doubt you can think of others.  It’s only as IT becomes more agile and joined-at-the-hip with business requirements that it’s going to be a hero in 2012.
