Eleanor Roosevelt: Big Data Guru
Eleanor Roosevelt once said, “The future belongs to those who know the beauty of their data.”
We are taught that Eleanor Roosevelt was a feminist, but so few know that she was a futurist, having completed an advanced degree in mathematics.
She also said, “Great minds discuss analysis; average minds discuss data; small minds discuss SQL.”
And to me, this is her greatest quote: “Relational databases and ANSI SQL are based on sets, not objects, and relations in a database are based on matching values, not pointers. This is Object-Relational Impedance Mismatch; SQL should be scorned and seen as legacy!”
Ok, you get the point. Complaining about SQL is silly, as silly as my ability to quote great leaders, because, in the end, it will not matter. We, technical data people, those who should be guiding the Big Data way, get caught up in debating the wonky, and we shut off accessibility to so many who need it.
I want to focus my career on making data access simple and flexible for non-technical people and leave the ideological imperatives to the geeks.
I know that a wall exists between the Marketer and his customer data, and it has nothing to do with programming languages, and everything to do with simplicity and attitude.
I want to tear down the wall so that the future will belong to Marketers who know the beauty of their data.
Programming languages widen the gap!
Every programming language is designed to do something with data: structure, visualize, manipulate, parse, or persist it. One of the easiest languages for people to use is Structured Query Language, or SQL. SQL was originally designed so that non-technical people could ask questions (queries) of their relational data and get back answers. The only problem is that as your questions get more complex, so does your SQL query. If your data does not conform to the relational structure and is more hierarchical, or poly-structured, the query becomes dramatically more difficult to author. Let’s get something straight! None of this is resolved by using Java, C++, Python, or any other programming language. Even a human language, spoken or written, would not suffice, due to subjectivity and other data governance matters.
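To make the jump in complexity concrete, here is a minimal sketch using Python's built-in sqlite3 module. The category and sale tables, their columns, and the sample rows are all made up for illustration. The flat question is a one-line GROUP BY; the hierarchical question (total sales for a category and everything beneath it) already needs a recursive query.

```python
import sqlite3

# Hypothetical schema and rows, invented only for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT);
CREATE TABLE sale (id INTEGER PRIMARY KEY, category_id INTEGER, amount REAL);
INSERT INTO category VALUES
    (1, NULL, 'Apparel'), (2, 1, 'Shoes'), (3, 2, 'Running Shoes'), (4, NULL, 'Toys');
INSERT INTO sale VALUES (1, 1, 10.0), (2, 2, 20.0), (3, 3, 30.0), (4, 4, 5.0);
""")

# Flat, relational question: sales per category. One short statement.
flat_sql = """
SELECT category_id, SUM(amount)
FROM sale
GROUP BY category_id
ORDER BY category_id
"""
flat = conn.execute(flat_sql).fetchall()

# Hierarchical question: sales for one category plus all of its descendants.
# The query first has to walk the tree with a recursive common table expression.
tree_sql = """
WITH RECURSIVE subtree(id) AS (
    SELECT id FROM category WHERE name = ?
    UNION ALL
    SELECT c.id FROM category c JOIN subtree s ON c.parent_id = s.id
)
SELECT SUM(amount) FROM sale WHERE category_id IN (SELECT id FROM subtree)
"""
apparel_total = conn.execute(tree_sql, ("Apparel",)).fetchone()[0]

print(flat)
print(apparel_total)
```

The second query answers a question a marketer would consider only slightly harder than the first, yet it is far harder to write by hand.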
The questions are getting more difficult to answer!
IT departments have accomplished great feats with respect to Business Intelligence and Dashboards. Dashboards contain ‘Key Performance Indicators’, which are defined to measure what happened up until the last data load. SQL-generating tools like MicroStrategy, Cognos, and Tableau were developed to house dashboards and canned reports. BI tools also hide the complexity of the languages described above. People use these tools to find out what happened: Gross Sales, Churn Count, Profit/Loss, etc. The business, however, was not satisfied. It wants to know ‘why’ things happen (factors/features/attrition) and, more importantly, what is going to happen ‘next’ (predictions). Our appetite for learning more about our customers grew, and with it came the birth of Modern Data Science. Data Science is here to stay, but it has only widened the gap. People are struggling to understand the new vernacular, tools, and processes of advanced analytics. There are incredible ethical concerns! Artificial intelligence is taking on more cognitive and decision-making roles that were once thought to be people-only domains. Sometimes humans are reluctant to accept that fact, thus widening the gap.
More than just transactions!
Customer transactions are a small percentage of the overall data landscape. The real volume comes into play when we start to look at interactions. Interactions include behavioral data such as clicks on a website, calls to a call center, voice and video, social media, mobile apps, email, and various other forms. Unlike transactions, interactions don’t fit nicely into rows and columns without some form of tricks and transformations. Interactions exist whether the transaction happens or not. This isn’t just ‘Big Data’; it is ‘HUGE DATA’, and it also widens that gap. It is almost cliché to quote the predicted volumes of data in the next 5-10 years, so I won’t. Just understand that in the future almost everything will have some data coming off of it that could be stored and related to a customer. The point is: how do we find the needles when we have many haystacks in our customer data ecosystem?
My rules to tear down the wall:
In the past we would have added this data into a data warehouse and waited for queries to come back. Sometimes they didn’t come back at all. This led to many open source initiatives like Hadoop, Spark, and other big data tools. Given all of the gaps described above, we must consider alternative approaches to use and delivery. Data innovation will be the primary means to stay competitive, and below we will review some ways to think about data differently:
- A Graphical User Interface: Users shouldn’t have to use a language, human or computing, to construct the questions in order to get answers. The Graphical User Interface should define 90% of the required questions and it should be simple and intuitive. Using a programming interface language should be the last choice of the user. We want to empower all users regardless of technical capabilities.
- Answers in Seconds: Have you thought about this question: What is the time value of data within your customer enterprise? Users shouldn’t have to wait more than seconds to receive the answer to their data questions. The rate at which answers from a query come back directly impacts which business challenges you can solve. Never again should you say: “The database won’t come back fast enough for us to solve that problem.” Run-time velocity also allows the user to change directions faster and iterate more frequently.
- Reduce Extract, Transform, and Load (ETL): Data should be leveraged in its most natural and human-usable form with as little transformation as possible. The most expensive part of working with data is all the transformation we have to do to make it usable. Data transformation should be done at the time of the question or need. Legacy technologies required Star/Snowflake schemas in order to make up for performance issues.
- Define what data you need just in time (JIT): Legacy solutions took as much of the data as possible and transformed it into usable form. Data was landed, transformed, and competed for resources even if it was never used. This is a misuse of resources. Semantic layers that describe data should be within the control of the business users of that data. We should use only what we need and discard it when it is no longer needed.
- Data cannot be monolithic: We need to reduce our dependence upon rigid data models. As new data sources become available they should be put in the hands of the user as quickly as possible. A rigid data model greatly reduces that agility. New sources of data are going to enter and exit the enterprise rapidly. Giving the business the ability to exploit data quickly is a competitive advantage.
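As a rough illustration of the "transform at the time of the question" and "just in time" ideas, here is a small Python sketch. Everything in it is an assumption made for the example: the raw event lines, their field names, and the ask helper. It is not how any particular product works. The raw interactions stay in their landed form, and only the fields a question needs are parsed when that question is asked.

```python
import json
from collections import Counter

# Hypothetical raw interaction events, kept exactly as they landed (one JSON string each).
# Note that different event types carry different fields; there is no fixed schema.
RAW_EVENTS = [
    '{"customer": "c1", "type": "click", "page": "/shoes"}',
    '{"customer": "c1", "type": "call", "duration_sec": 240}',
    '{"customer": "c2", "type": "click", "page": "/toys"}',
    '{"customer": "c1", "type": "click", "page": "/checkout"}',
]


def ask(raw_lines, event_type, field):
    """Count raw events of one type, grouped by a field chosen at question time."""
    counts = Counter()
    for line in raw_lines:
        event = json.loads(line)  # parsing happens only when a question is asked
        if event.get("type") == event_type and field in event:
            counts[event[field]] += 1
    return dict(counts)


# Two different questions against the same untouched raw data.
clicks_by_customer = ask(RAW_EVENTS, "click", "customer")
clicks_by_page = ask(RAW_EVENTS, "click", "page")

print(clicks_by_customer)
print(clicks_by_page)
```

A new event type with new fields can be appended to the raw data without changing a model first, which is the agility the last bullet argues for. The trade-off is that parsing work is repeated per question, so a real system would need to make that step fast.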
I chose to work for ActionIQ because I am surrounded by people and customers who share similar beliefs about the future of data.
Forgive me, Eleanor Roosevelt!
These are the actual quotes, which are truly inspiring: “The future belongs to those who believe in the beauty of their dreams.” and “Great minds discuss ideas; average minds discuss events; small minds discuss people.”