I’ve seen the progression of big data over time, with each wave advancing further up the beach, turning into an eventual flood of raw information. While there have been many small but crucial steps along the way, I generally think of the evolution of big data as occurring in three major phases. The first phase reflects the initial wave, when big data really became, well, big; the second is where we are now; and the third is what we’re on the edge of: the near future.
First Things First – Defining the Term
When I use the term big data, I’m referring to high-volume, variable, rapidly generated and gathered pieces of information that, once processed, lead the way to insights, spectacularly detailed analysis and focused decision making. For example, when you open a browser on your computer, perform searches, visit a few sites, check email, grab an address for your lunch date and then call the restaurant on your phone, all of that creates granules of information that, when compiled, make a massive sand pile of data.
- Pushing the Limits: Storage
The physical limitations of storage were once the bane of data collection. You can only fit so many boxes in a room, and initially the rooms were small. Soon the rooms got bigger and the boxes became more efficient. As faster, higher-capacity and physically smaller storage became available, the world of big data also became possible. The capacity of tech storage has reportedly grown at a rate of 175% annually, and we create masses of information in much shorter chunks of time (although just how this compares to our past is a moving target). The effect has been a positive feedback loop, with the capacity for data collection and information storage stepping forward in tandem. Physical limitations will always be a factor in the efficacy of big data, but they’re no longer the main challenge.
- Where We Are Now: Processing
While the term data processing doesn’t typically set hearts a-flutter, I think it should. Database building is an exciting field. We currently have an ocean of data, but the pipes we’ve been using to process it were built for a different system, and the flow has only increased. Software companies have answered the call, but we are just now entering a realistic shift to enterprise systems for big data analysis.
If you’ve dealt with big data, you’ve likely dealt with Apache Hadoop, the leader in open-source, large-volume information processing. Hadoop allows businesses to gather and process mounds of data with no formatting requirements, relative stability, and wide flexibility. Hadoop has been key in moving the industry forward. Yet it’s kind of like a giant hammer. As Maslow’s law of the instrument has it, when all you have is a hammer, every issue starts to look like a nail. But data processing is more nuanced than that: it may be different kinds of nails or screws. It may be painting or detail work. That’s the point: it’s all of those things and more. Hadoop is a great hammer and securely knocks in the nails, but we need more sophisticated tools to handle those other pieces.
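To make the hammer a little more concrete, here is a minimal sketch of the map-and-reduce pattern that Hadoop's MapReduce engine is built around, written in plain Python. It does not use Hadoop itself, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample records are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def map_phase(record):
    """Emit a (word, 1) pair for each word in one raw, unformatted record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, as a framework would between the two phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(records):
    """Run map, shuffle and reduce in sequence over an iterable of records."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


# Hypothetical sample records standing in for raw activity logs.
records = ["search lunch restaurant", "email search", "call restaurant"]
counts = word_count(records)
print(counts)
```

The pattern works well when a job really is a nail, meaning it can be split into independent pieces and then tallied. Work that doesn't decompose that way is where the other tools come in.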
- Near-Future Challenge: Accessibility
I remember a world before Windows was widespread and before Mac OS had taken hold, when the cursor on the PC screen would wait patiently, blinking, until I told it to do something. In those days, you had to know MS-DOS commands to make programs run. Unless you know the commands and understand what you want to do with the big data systems that are currently available, it can be a lot like facing that blinking cursor. The shift from a DOS system to a framework like Windows or Mac OS changed the world; similarly, I believe access to new levels of functionality and user-facing frameworks could provide a bridge to big data for non-specialists, the way operating systems did for the home computer user.
When Steve Jobs would speak about the importance of the personal computer and what it would mean for humankind, he often used a bicycle analogy. The PC, he said, was an exponential lift in terms of powering humans. The interesting thing about the big data space is that we are there again, grasping the handlebars of technology. It’s difficult to fathom what comes next because we’re in this really early stage, but the nexus of big data generation, processing and, now, accessibility is moving us toward it.