Are we trying to fit square Hadoop pegs into round real-time holes?

Big Data is a big market, projected to top $16.9 billion by 2015, according to IDC.  The Hadoop ecosystem alone is worth $1 billion per year, according to Forrester, and is set to explode by most accounts.  What is less clear is for whom Big Data is big, and whether the workloads they’re currently running through Hadoop might be better complemented (or, in some cases, replaced) with real-time analytics tools like Storm.

After all, given that 32 percent of Karmasphere’s survey participants are running Hadoop clusters smaller than 2 terabytes, and 55 percent are running clusters of 10 terabytes or less, the “Big” in “Big Data” really isn’t.  Not yet, anyway.  That will likely change as enterprises move from toe-dipping to diving into Hadoop and Big Data in a big way, but for now the workloads aren’t huge, and real-time tools like Storm might be ideal for managing them.

These workloads are also not necessarily being run behind the firewall.  While both Cloudera and Hortonworks are booming due to enterprises keeping their Hadoop jobs running primarily in their data centers, Amazon is already managing in excess of one million Hadoop clusters with its Elastic MapReduce service.  This is perhaps not surprising given that the majority of Big Data users tend to be business users, not hard-core IT people, according to Karmasphere’s survey.  These people are apparently very comfortable having their data processed in the cloud.

Interestingly, while there are numerous great applications for Hadoop, the majority seem to be using it for marketing-related functions:

All of which brings me back to the point I made earlier this week: some data are best analyzed in real time, not batch.  For many things, you’ll actually want both: a real-time view into what’s happening with your website, HR systems, etc., as well as a deeper, Hadoop-based analysis that is done in batch, after the fact.

Real-time analytics tools like Nodeable (based on the open-source Storm project) are not a replacement for Hadoop.  They’re complements.

Given that so much of the data being analyzed with Hadoop are still relatively small and marketing-focused, not to mention being analyzed in Amazon’s cloud, I’d argue that more of today’s data, not less, should be run through real-time analytics systems, and particularly hosted systems.  After all, while it’s useful to know how aspects of your online retail site are working hours or days or months after the fact, you actually want the “next click” to reflect real-time analysis, as Yahoo CTO Raymie Stata argues:

With the paths that go through Hadoop [at Yahoo!], the latency is about fifteen minutes.…[I]t will never be true real-time.  It will never be what we call “next click,” where I click and by the time the page loads, the semantic implication of my decision is reflected in the page.

Again, this isn’t to denigrate the importance of Hadoop.  At all.  It’s simply to suggest that for many applications, relying on batch-oriented Hadoop alone is an incomplete strategy.  Real-time is required for many applications, particularly those where Hadoop is being used today, and that real-time capability is delivered through Storm-based Nodeable or other real-time analytics systems.

It’s not Storm or Hadoop.  It’s Storm and Hadoop.

Hadoop and Storm are shifting the industry toward Big Data-enabled cloud applications

Dave and I were fortunate to attend a Churchill Club event on Hadoop Tuesday night.  Hadoop sits at the center of the burgeoning Big Data universe, and so one might be tempted to conclude that it’s basically a finished product.  Not so, said the esteemed panel, which included representatives from Cloudera, Facebook, Metamarkets, MapR, and Oracle.  In fact, arguably the biggest opportunity in Hadoop isn’t Hadoop at all: it’s the cloud applications built on top of Hadoop.

Dave summarized the panel discussion on CNET, and highlighted Cloudera CEO Mike Olson’s call to arms for Hadoop-based applications.  It’s something Olson has said before, including here on this blog, but it was particularly poignant against the backdrop of a deep, engaging discussion about Hadoop’s pros (powerful, open source) and cons (batch-oriented, complex, somewhat inefficient).

And it’s why I think Nodeable is a sign of the times.

We’re an application that depends upon Hadoop.  But we’re also a technology that improves Hadoop by front-ending it with Storm.  Hadoop is powerful but limited to batch-oriented processing of data.  Storm actually crunches data in real-time, in the stream.  The combination of the two is potent, and something that we only discovered while building out our application to ingest systems data and extrapolate insights via in-stream data analytics.
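To make the batch-versus-stream distinction concrete, here is a minimal sketch in plain Python.  It is not Storm or Hadoop code, and it is not how Nodeable is built; the names `StreamingCounter` and `batch_counts` and the sample click data are made up for illustration.  The streaming side keeps a running answer over the most recent events as each one arrives, while the batch side recomputes over the whole log after the fact.

```python
from collections import Counter, deque


class StreamingCounter:
    """Hypothetical real-time view: per-key counts over the last `window` events."""

    def __init__(self, window):
        self.window = window
        self.events = deque()
        self.counts = Counter()

    def add(self, key):
        # Update the answer incrementally as each event arrives.
        self.events.append(key)
        self.counts[key] += 1
        if len(self.events) > self.window:
            old = self.events.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]

    def top(self, n=1):
        return self.counts.most_common(n)


def batch_counts(log):
    """Hypothetical batch view: recompute over the entire log after the fact."""
    return Counter(log)


# Made-up click stream for illustration only.
clicks = ["home", "cart", "home", "checkout", "cart", "cart"]

stream = StreamingCounter(window=3)
for page in clicks:
    stream.add(page)

recent = stream.top(1)         # what is hot right now
totals = batch_counts(clicks)  # the full historical picture
```

The two views answer different questions from the same events, which is why it makes sense to run both rather than choose one.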

In the near future the back-end data processing via Storm and Hadoop will be managed behind the scenes by cloud applications, as Workday co-CEO Aneel Bhusri tweeted from the Churchill Club event.  For now, companies like Nodeable are helping to bridge the divide between complex infrastructure and simplified applications.

Survey: CIOs are confused on prioritizing IT projects, and especially on how to pay for them

CIOs are an optimistic lot these days.  According to recent survey data from InformationWeek, 61 percent of those IT executives surveyed indicate that their IT budgets will remain the same or shrink.  Yet the vast majority claim that important new projects for cloud, Big Data, security, and more will come from “new money” rather than “savings.”

How does that math work?

It’s not as if the proposed projects are useless.  As shown below, CIOs seem to have a good handle on where money should be spent:

What they lack, of course, is a grasp on reality in terms of funding all these projects, as shown here:

And while we at Nodeable don’t have the be-all, end-all answer for how to fund these projects, we can suggest one: optimize efficiency of existing resources.  This actually fits IT priorities, generally.  After all, four of the top-five projects identified are “block-and-tackle” projects that improve existing systems rather than introducing a gee-whiz new line of business system.


The difference, of course, is that one can introduce a system like Nodeable and not only bring down costs (by tuning cloud systems based on our trending data, anomalies we flag in how resources are being used, etc.), but also drive one’s business by analyzing how resources are being used at a macro level.

I can see, for example, who is most active in handling JIRA tickets.  I can see which of my developers show up most often in the GitHub activity stream.  And while I can of course track waste in my use of AWS, I can also benchmark how my company manages its storage and compute resources against how others do.

Ultimately, what needs to be done is to bring IT into better alignment with business goals.  The DevOps trend does this by reducing bureaucracy, allowing developers to get work done with a minimum of overhead.  This is the crowd Nodeable hopes to enable.

Otherwise, we end up with a mismatch of resources with goals, as InformationWeek points out:

What about hybrid clouds and cloud bursting, an activity that promises to dramatically change the face of IT spending and human resourcing as we know it? Marquee names like Zynga and DreamWorks are just two pioneers that have managed to optimize their internal infrastructure spend by balancing private and public cloud. Yet only 10% of our survey respondents identify private cloud as a top priority.

Worse, the project that came in at No.12 of 12, with a whopping 2%—launching or upgrading an enterprise social networking platform—is one that has the attention of non-IT partners….We guarantee you that if we had surveyed CMOs and their direct reports instead of CIOs and their reports, social would have been near the top.

Enterprises need to figure out how to do more with less, and that “less” means getting to more productivity with “less money,” which often will necessitate less cumbersome and costly bureaucracy.  Nodeable offers one way to accomplish this, and no doubt you can think of others.  It’s only as IT becomes more agile and joined-at-the-hip with business requirements that it’s going to be a hero in 2012.

Even developers like simplicity

How is it that we can manage to follow an average of 245 friends on Facebook and 350 people on Twitter, yet we struggle to effectively manage a handful of cloud resources?  Some will argue that it’s because social information is less important and hence requires less vigilance.  We can manage more because we actually manage less.  After all, we’re unlikely to be fired if we miss our uncle’s post about how many miles he ran today, but we could be if we allow the website to crash and burn.  But the problem may also derive from the user experience.

Systems management tools get a bad rap, and for good reason.  The user experience can be less than appealing.

There’s a belief – a false one, in my experience – that technical IT folks must necessarily love complex ways in which to manage their systems.  I’m sure there are über-geeks for whom complexity is an end in itself.  But they’re the exception, not the rule.

In a conversation with an IT team at a Fortune 100 company earlier this year, one of the system administrators said that he’d buy a tool that “did whatever John would do.”  John was their expert on Nagios, a system that no one else on the team could decipher.  As the sysadmin lamented, however, John sometimes goes on vacation or is unavailable while he (gasp!) spends time with family, etc.  So he wanted to receive alerts on his phone when things went awry, with one button: “Do what John would do.”

He’d click that button early and often.

Nodeable isn’t yet at the point where we learn John’s behavior in given situations and make it easily replicable by others in his absence, but as an industry I suspect we’re not terribly far off from being able to approximate this.  What we can do is simplify IT management by surfacing trending issues/anomalies/etc. so that the heavy lifting of managing cloud systems is done by Nodeable, not the developer or her operations team.  It’s not exactly “what would John do?” management, but it’s a head start on seriously reducing complexity so that IT can focus on tailoring systems to improve business, rather than performing root cause analysis.

And, no, it’s not necessarily any easier in the cloud, even though the magic of the cloud can be the hiding of infrastructure complexity.  But the real complexity is in deciphering what’s happening in real time as apps are updated, systems are tweaked, etc.  Any changes are made on a granular level, not on a “system” level, as Enstratus’ James Urquhart argues:

If something goes wrong with an application, developers are on the hook to fix it, change it or kill it….However, developers and engineers can only make those changes one, or a few, components at a time. Nobody can configure the “system” to work an expected way. All you can do is constantly monitor the success and effectiveness of the technologies you deploy into the cloud, and constantly tweak them to make them as useful as they can be in that environment.

For developers to be most effective, they need to spend most of their time writing and optimizing their applications, not deciphering archaic error messages, constructing search queries in Splunk to search out root problems, or other traditional IT tasks.  A good system will surface insights into trending issues in real-time, based on continuous tracking of machine data that gives clues as to whether the changes to the system are helping or hurting.
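As a rough illustration of what “surfacing” an issue from continuously tracked machine data can look like, here is a minimal sketch in Python.  It is an assumption for illustration, not a description of how Nodeable or any other product works: it uses a simple rolling z-score, flagging a sample that sits far from the mean of the trailing window.  The function name `flag_anomalies`, the window size, the threshold, and the CPU readings are all made up.

```python
from collections import deque
from statistics import mean, pstdev


def flag_anomalies(samples, window=5, threshold=3.0):
    """Return (index, value) pairs that sit far from the trailing-window mean.

    Hypothetical helper: a rolling z-score check, chosen for simplicity.
    """
    history = deque(maxlen=window)
    flagged = []
    for i, value in enumerate(samples):
        # Only judge a sample once a full window of history is available.
        if len(history) == window:
            mu = mean(history)
            sigma = pstdev(history)
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                flagged.append((i, value))
        history.append(value)
    return flagged


# Made-up CPU utilization readings (percent), with one obvious spike.
cpu = [20, 22, 21, 19, 20, 21, 95, 22, 20]
anomalies = flag_anomalies(cpu)
```

Here only the spike at index 6 is flagged, so the developer sees one item worth a look instead of nine readings to scan.  A real system would need to cope with seasonality, noisy baselines, and many metrics at once.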

In sum, a system is powerful not only in the various features it claims to have, but also in how well it obscures the complexity behind-the-scenes to let developers focus on writing apps.  It’s not yet “what John would do,” but it’s getting close.

The cloud shifts the CIO’s role to “Chief Data Officer”

The longer I’m in tech, the more inclined I am to accept the truth of William Gibson’s quote: “The future is already here — it’s just not very evenly distributed.”  I saw this firsthand with a wealthy friend, who could afford to “see the future” by buying essentially unlimited broadband, powerful servers/computers, and more, and figuring out what the world would look like when average consumers could afford the same.

Sometimes, however, cost isn’t the gatekeeper to the future, but a willingness to risk is.  Such is the case with the cloud.

Google CIO Ben Fried thinks we’re nearing the tipping point for cloud computing, when CIOs determine that cloud computing’s cost and simplicity advantages outweigh other concerns like a lack of customizability, and jump in with both feet.  Sure, enterprises are already using cloud services: 86 percent according to Cloudability data; 81 percent by KPMG’s survey count; and 48 percent for SMBs, according to a recent survey.  But few big enterprises are using the cloud to handle the majority of their workloads.

In the future, according to Fried’s thinking, that will change.

Amazon is destined to displace big iron vendors like IBM and HP as the cloud becomes the preferred destination for enterprise computing, including mission critical workloads.  Just as Linux used to be relegated to the fringe of computing but came to dominate the heart of the data center so, too, will the cloud wreak havoc on the traditional data center business.

By taking technology out of the IT equation, Fried argues that cloud computing changes the way businesses structure themselves and do business, and may even force them to change the business they’re in altogether.  In many ways, cloud computing lets enterprises focus on the data that results from IT, rather than the IT itself.  This is a huge shift.

This isn’t to suggest that enterprises will completely forget about servers and such, but it does mean that they’ll think about compute resources differently, and will almost certainly have to think of new ways to keep tabs on them.  Companies like Boundary and Nodeable were built in the cloud for cloud resources, and focus more on surfacing actionable insights than on giving administrators or developers the tools to “spelunk” for themselves, which is inefficient and largely unnecessary in a world of semi-structured data accessed through APIs.

All of which would be a really bad idea if the cloud were just a fad.  But it’s not.  It’s how IT gets done going forward.  And it means that the Chief Information Officer is going to need to recalibrate the way she thinks about “Information.”  Namely, more a matter of “data” and less a matter of “technology.”

Channeling Dr. Seuss to explain real-time system analytics

Most companies have a UI/UX person.  Very few companies have a UI/UX person who doubles as a cartoonist and satirist.  Well, Nodeable does: Mike Evans.

On the downside, Mike rides a hipster bike and wears Bono-like sunglasses when he rides. He also has terrible recommendations on where to find good hot chocolate in San Francisco.

On the upside, he’s an award-winning film producer.  Not that Nodeable is in the habit of making movies.  But if we SERIOUSLY pivot, he’s the guy to make our My Own Private Utah movie.

Here’s one of his recent graphics for a presentation Dave is due to give in a few weeks.  Yes, it looks like it comes from a Dr. Seuss book.  But how many Dr. Seuss books do you know that deal with the hot topic of chewing through massive piles of system data to deliver actionable insights into how to optimize your infrastructure, and helping you resolve issues before they become problems?

Yes, that was a short infomercial.  But no, there are no such Dr. Seuss books.  Thankfully.

Vendor lock-in may well be the least of a CIO’s concerns: Defining openness in the cloud

There are many great reasons to use open-source software. Avoiding vendor lock-in is perhaps one of the weakest. It’s not that lock-in isn’t a real concern, but it’s generally not a CIO’s first consideration. The first consideration is getting stuff done.

Which is why Rackspace CEO Lanham Napier is almost certainly wrong to castigate Amazon over proprietary lock-in.  Not wrong because he’s incorrect.  Wrong because it’s an ineffective strategy.

It’s also why Red Hat’s Gordon Haff is likely wrong to take on VMware using the same argument.  VMware’s Matthew Lodge tweaked his open-source peers over the “I’m the most open” discussion, arguing that “While the ugly sisters were squabbling, customers were getting on with business and choosing their Cinderella as VMware.”  What irked Haff most, however, was Lodge’s follow-on comment: “Openness is not about how you write software, it’s about what you allow your customers to be able to do.”

Haff responds:

He has a very good point, but again, it doesn’t really go very far, which is why Red Hat for years has emphasized value, not fluffy intangibles in its field marketing.  Yes, the company will talk about vendor lock-in for its high-level marketing messages, but the salespeople walking in to talk with a CIO?  They’re talking about performance-to-cost ratios over competitors like IBM and HP.  When it comes to talking business, Red Hat is all business.

And rightly so.

When a CIO reports to the CEO, she can’t point to “but look at all the freedom I gave us.”  She knows she’s going to have to deliver tangible results.  Which is why ex-JBoss veteran (and Cloudbees board member) Bob Bickel is right to point to alternative ways to be open in the cloud:

Again, this isn’t to fully deprecate the value of open source.  But it is to suggest that there are various ways to define openness in cloud computing, and source code is just one aspect among several, and perhaps not even the most important one.

At Nodeable, we feel that a nuanced approach to openness is the right one.  We use some great open-source software at the heart of our cloud analytics service, including Storm and Hadoop, but we don’t yet see how it would make much sense (or be of any real use) to anyone to open source our platform.  We do, however, see some value in open-sourcing our agent technology, and are exploring this.  Most important to us, however, is that our users can easily get their data into and out of our analytics service.  And they can.

Data, source code, APIs, etc.  All factor into the new world of open.

If the cloud is all about simplicity, why is it so complex?

Software now leads hardware in terms of overall tech spending, and the biggest growth in software spending is actually not software at all, according to Forrester.  No, the biggest growth drivers in tech today are SaaS applications, general business intelligence products, and specialized analytical tools.  Sure, some of these run behind the firewall, but increasingly businesses are following consumers to the cloud.  Where businesses don’t seem content to follow consumers, however, is in the simplicity of their cloud products.

Some enterprise vendors get this.  Take Box, for example, whose cardinal rule for its content collaboration services is simplicity.  If real human beings don’t want to use the software/services, then why should enterprises waste money trying to force them to do so?

But far too many cloud vendors mire themselves and their users in complexity.  As Charles Babcock writes, “I am struck over and over again how easy it is to discuss cloud computing in high sounding terms, while those plunging into the cloud are thrust into a welter of new technology processes and complex responsibilities.”  Bingo.

Cloudscaling’s Randy Bias goes even deeper, picking apart the problems Infrastructure as a Service vendors have had in making it super-easy to run systems in the cloud.  As he notes, “[M]any engineers see understanding and developing complex systems as a rite of passage.  In reality, the true test of a great engineer is their ability to make things simpler, not more complex.”  As he goes on to say, complex systems tend to fail, but simplicity enables scalability.

Just ask Amazon, whose public cloud is by far the most widely used in large part because it’s comparatively easy for developers to use.

I see this in the systems management world, which actually tends to compound the problem of complex cloud systems with complex tooling that exacerbates the very problem it could help to solve.  Bias talks about the problems inherent in presenting users with too many choices (e.g., multi-hypervisor IaaS offerings), but the same is true with the language used to describe systems.

For example, we talk a lot about real-time, but the best description may actually be this one that I found: human real-time.  Ultimately, a user really doesn’t care how the vendor goes about processing and delivering insights into systems so long as it’s happening as fast as they need.  Nodeable uses continuous computation.  Other vendors may prefer an alternative.  But, again, I suspect very few users actually care.  They just care that the analytics serve up insights that are timely and germane to their jobs.

Cloud was supposed to make life easier for the enterprise, and I think it’s accomplishing that goal, on balance.  But we have a long way to go toward simplifying the cloud for users, and not just in how we explain it.  Recent history suggests that the companies that do best at simplifying complex systems – think Apple, Microsoft, Amazon, and others – are those that win big in terms of revenue.  It turns out simplicity pays.

Big Data turns farming into data science

You know Big Data has gone mainstream when Middle America starts buying into it. As Ashlee Vance writes, two ex-Googlers are putting data to work for farmers to help them figure out how to plan for adverse weather and optimize yields.

In other words, Big Data is no longer just a Silicon Valley or New York finance thing. It’s becoming an Everyman sort of trend.

Staffed by an army of data scientists, the company is bringing data analytics to rural America and helping farmers reap more consistent profits from their fields. It’s an example of how cloud computing, modeling, and other technologies that have reshaped the Web and business are now revolutionizing more traditional industries.

Just as this company, Climate Corp., is doing for rural farmers, the next big trend is not merely aggregating data but rather interpreting it. Making it small, in other words.

At Nodeable we do this for systems data by making a real-time information network that app developers use. But there will be many Nodeable-like companies for different data sets. That’s where the industry is heading, and fast.

Why Walmart should mimic Amazon.com and ‘open source’ its supply chain

Amazon.com didn’t get into the business of selling cloud computing services to make money from excess capacity.  That’s a myth.  Even so, Amazon.com did get into the cloud computing business because it knew quite a bit about how to manage infrastructure at scale, and made a bold bet that it could become the center of cloud developers’ universe just as it was increasingly the center of the retail universe.

So why isn’t Walmart, master of a hyper-efficient supply chain, peddling access to its supply chain expertise?

Walmart still needs to come up with a credible answer to Amazon’s dominance online, but there’s a great deal of retail business that will persist offline in brick-and-mortar stores.  Even Amazon recognizes this, and has been experimenting with offline retail.

Walmart, for its part, has been scrambling to innovate online, and most recently has been talking up an open-source Big Data strategy.  This strategy involves open-sourcing tools that Walmart is building to move data from legacy tools into Hadoop clusters, and should be a boon to the countless others that will have similar data management needs.

It sounds like smart strategy, and it is.  But it doesn’t get to the heart of what Walmart could, and perhaps should, be doing.

Back to Amazon.  Amazon Web Services was never about selling Amazon.com’s excess capacity.  That’s a myth, and one that Amazon CTO Werner Vogels rejects:

The excess capacity story is a myth. It was never a matter of selling excess capacity, actually within 2 months after launch AWS would have already burned through the excess Amazon.com capacity.  Amazon Web Services was always considered a business by itself, with the expectation that it could even grow as big as the Amazon.com retail operation.

What isn’t mythical, however, is that Amazon understood web applications at scale, and knew how to build the necessary infrastructure to drive them.  As Amazon CEO Jeff Bezos explains, Amazon both had the knowledge and the need to create this infrastructure for its own use, and figured it could then turn this into a serious business:

Approximately nine years ago we were wasting a lot of time internally because, to do their jobs, our applications engineers had to have daily detailed conversations with our networking infrastructure engineers. Instead of having this fine-grained coordination about every detail, we wanted the data-center guys to give the apps guys a set of dependable tools, a reliable infrastructure that they could build products on top of.

The problem was obvious. We didn’t have that infrastructure. So we started building it for our own internal use. Then we realized, “Whoa, everybody who wants to build web-scale applications is going to need this.” We figured with a little bit of extra work we could make it available to everybody. We’re going to make it anyway—let’s sell it.

Now think about Walmart.  While Walmart’s tagline is (or was) “Everyday low prices,” with an emphasis on delivering reasonable quality for market-beating prices, the way Walmart achieves this is through its legendary supply chain.  No one beats Walmart in terms of managing the process of filling shelves.  As one commentator notes:

Basically Wal-Mart runs on an entirely different road than everyone else, a sort of information data superhighway. Wal-Mart knows literally everything that any retailer could ever want to know about one of its products.  It’s been said that if the US was ever in World War III, the first thing to be taken over by the government would be Wal-Mart’s supply chain. It has THAT kind of performance power.

So if the supply chain is so good, why doesn’t Walmart “open source” it?  Amazon recognized early that it could build a significant business by outsourcing its expertise in infrastructure.  Why can’t Walmart do the same with its supply chain?  And just as web application developers have crowded into the shadow of Amazon Web Services, I suspect we’d see an army of brick-and-mortar retailers of all sizes happy to tap into Walmart Supply Chain Services.

Ultimately, retailers aren’t in the business of supply chains any more than application developers are in the business of infrastructure.  Walmart should be thinking of how to leverage its supply chain expertise to become the center of an ecosystem, and not simply the center of its own P&L statement.

And while I’m dispensing all this free advice, I’ll just add: Nodeable would be happy to track and analyze all the systems that feed into the supply chain, giving users a single pane of glass to see what’s happening throughout the supply chain.  Because, hey!  We’re generous like that.  :-)
