PwC’s Technology Forecast recently addressed the topic of data lakes. The coverage included research and interviews on data lakes and how they can help enterprises remove integration barriers and clear a path for more timely and informed business decisions.
To continue the discussion and look at some of the challenges enterprises can face in implementing a shift to data lakes, we are sharing an excerpt of a conversation between Technology Forecast’s Alan Morrison and Terry Retter. Retter is president of small business consultancy BrightZone in Reno, Nevada, a former VP/CIO of Grubb & Ellis, and a PwC alumnus.
AM: Terry, you were a CIO. Some companies say they’ve created a data lake. In reality, they’ve built a single-purpose sandbox. How can CIOs get their organizations to commit to the strategic, long-term vision of a true data lake?
TR: By dealing with real problems and real users. They should focus on a service or a perception problem among customers they must resolve to avoid losing profits or market share. They should start small, but think big, in data lake terms. They shouldn’t collect data just around a single process. Instead, they should gather everything they can think of while using the lake at first to solve a particular problem.
They could hire the smartest data scientists they can find and let them loose. The CIO probably wouldn’t keep the project a secret, but he wouldn’t make it a mainstream initiative to begin with either. Likely, he would start in one department, with some sort of collaboration or cooperation with IT.
AM: A lot seems to hinge on the perspective of the teams they’ve hired.
TR: And a lot depends on sponsorship. Back in the day, when we started an experiment, we didn’t have carte blanche, but we did have the blessing of the CEO. If you’re doing something nobody’s done before, or something that might rub some senior executive the wrong way, you need a sponsor.
AM: Data lakes are unique because they can retain the most complete context of files in their original formats and make it possible to capture, share and preserve the semantics of new contexts. How can the CIO translate that capability into an essential business mission the C-suite and the rest of the enterprise will care about?
TR: The mission of a data lake is to find clues to help the company answer high priority questions. The questions we have today are not the same questions we had to answer yesterday. And the questions we’re going to get tomorrow are not the same questions we have to answer today.
Traditional IT says to identify what data should be collected to answer those questions. The data lake concept says you don’t know what to collect because you don’t even know what the questions are, now or in the future. To begin with, you need the right questions to surface clues to the answers.
AM: What’s the biggest technology roadblock companies are confronting with the lake?
TR: The technology emerging from web-based enterprises is very different from what Fortune 500 companies perceive to be useful. Theorists, tech futurists and tech researchers are creating and envisioning how some of this stuff is going to work. They hope it will arrive in the not-too-distant future. In reality, it’ll probably be five to ten years. Maybe 80 percent of companies are focused on what they need to do today to solve the problems that their board of directors and their users are asking them to solve. They can’t look at tomorrow.
AM: What if the CIO gets a data lake project dumped in their lap?
TR: The risk is CIOs will do the minimum without really committing to the vision. Maybe they’ll consider it a side project, or treat it as an experiment by not putting their best talent on the project.
They could say something like, “We have lots of data available already. Is this another data warehouse-like activity? How will this tie into what we already deliver when it comes to inventory, HR, accounting and the other core applications? The data is already there, why not just use it better?”
In many respects, that’s very similar to what transpired with data warehouses and data marts. A lot of companies implemented the data warehouse/data mart either as an adjunct to core IT or outside of it. Why? The data warehousing advocates could get data architects and technical gurus who knew all these details, or thought they knew them, put them in a department, and let them go to work.
At the time, IT was working on core applications and ERP. IT didn’t have time for another major project. As a result, the data warehouse became disconnected from the business, and IT couldn’t respond to business questions quickly enough or change fast enough.
Data marts became segregated, narrow solution sets for one user or two and very expensive with minimal or no return on investment. All of them were trashed. I know companies that spent $100 million on data warehouses and got what they perceived was $0 return. Not that they broke even, but they spent $100 million and didn’t get $1 back on their investment.
So the question many CIOs ask is, “Are we headed there again?” That’s a concern. I don’t necessarily share that concern, but that’s another discussion.
For more details on the benefits and challenges of data lakes, see “Data lakes and the promise of unsiloed data”.