I read the other day that “data is like water” and thought, what a great analogy. Most living things cannot survive long without water. Thank goodness, for most of us, water is abundant. It falls from the sky. If not treated right, water can get contaminated and harm those who consume it. I concluded that data is a lot like water. Every organization needs data to survive, and if it’s not treated right, it can definitely cause harm.
Now I’d like to share a few fireside stories about data and the importance of treating your data with love.
ETL and the SLA:
7:52AM used to have a special place in my mind. It haunted me! Imagine a hockey-mask-wearing, machete-wielding lunatic chasing you through the woods. That is how I felt six days a week. It was a first-world problem, and I have since moved on from 7:52AM, but here is the rest of the story.
Seven years ago I ran a for-profit reporting and data warehouse system for one of the world’s largest banking and financial services organizations. It was my first week on the job. There was no ramp-up time and no honeymoon, just the pressure of 200+ retail banks and credit unions and poorly written Service Level Agreements. Data and process owned me. Six days a week I was owned by nasty data and its SLAs. One SLA in particular carried tens of thousands of dollars’ worth of fines if it was broken three times in a given month. One fine and we weren’t profitable.
The SLA was simple: “data loads must finish by 8AM Monday through Saturday for all clients.” I will never forget that first week, because the SLA alarm went off. It was Tuesday, 7:52AM, and the data loads weren’t finished. We weren’t going to meet that day’s SLA, and the phones were ringing. It was time to deliver the news, but first I had 5 minutes to figure out why it happened and what I was going to do to prevent it from happening in the future. I got three strikes a month, and I may have just burned one. This data was nasty; no, this data was scary. The 8AM bell tolled and time marched on; bring on the challenge.
It turned out to be an upstream internal system delay that held up the original data drops. I was off the hook this time, but I was not going to live this way. ETL was not going to own me! So we strapped on our work boots, cut 3 hours out of the entire load time, and placed monitoring and notification on every major stage of the load process. In the end, this real-life scenario turned out to be a nice data story.
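To make “monitoring and notification on every major stage” a bit more concrete, here is a minimal Python sketch of the idea, not the system we actually built. It assumes each stage reports when it finishes, along with an estimate of how long the remaining stages will take. If the projected finish lands past the SLA deadline, it raises an alert right away instead of waiting for the alarm at 7:52AM. The `LoadMonitor` class, the stage names, the dates, and the time estimates are all hypothetical. `notify` is a stand-in for whatever email, pager, or chat alert you use.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Callable, List


@dataclass
class LoadMonitor:
    """Checks each finished load stage against a fixed SLA deadline.

    Hypothetical sketch: stage names, the deadline, and the per-stage
    time estimates are illustrative assumptions, not a real system.
    """
    sla_deadline: datetime              # e.g. 8:00 AM on the run date
    notify: Callable[[str], None]       # stand-in for email/pager/chat alerts
    completed: List[str] = field(default_factory=list)

    def stage_finished(self, stage: str, finished_at: datetime,
                       remaining_estimate: timedelta) -> bool:
        """Record a finished stage; alert if the stages still to run are
        projected to end after the SLA deadline. Returns True if on track."""
        self.completed.append(stage)
        projected_end = finished_at + remaining_estimate
        if projected_end <= self.sla_deadline:
            return True
        late_by = projected_end - self.sla_deadline
        self.notify(f"SLA at risk after '{stage}': projected finish "
                    f"{projected_end:%H:%M}, {late_by} past deadline")
        return False


# Example run: staging finishes later than planned, so the alert fires
# at 6:45 instead of the miss being discovered at 7:52.
alerts: List[str] = []
monitor = LoadMonitor(sla_deadline=datetime(2024, 1, 2, 8, 0), notify=alerts.append)
monitor.stage_finished("upstream_drop_received", datetime(2024, 1, 2, 5, 30),
                       timedelta(hours=2))
monitor.stage_finished("staging_loaded", datetime(2024, 1, 2, 6, 45),
                       timedelta(hours=1, minutes=30))
print(alerts)
```

The key design choice is alerting on the *projected* finish after every stage, not only on the final deadline. That projection is what buys you time to react, whether the cause is an upstream delay or a slow stage of your own.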
I made sure it never happened again, and it didn’t, but it still haunted me six days a week. ETL can be complex, expensive, and frustrating. We need to think differently about ETL, ELT, or whatever acronym you want to use!
Governance is a four-letter word:
Many years ago I worked for an insurance claims company that tried to move patients from ‘out of network’ care to ‘in-network’ care. Health insurance claims data is nasty enough, but couple that with a source system that was 25+ years old and had no sheriff watching over how it was used, and you have a very naughty set of data. Over those years, many business, personnel, and program changes had been injected into the data. Almost every element of the data was inconsistent in some way.
The technology used to house the data was file-system based and allowed the data definitions and the application data to get out of sync. The system was not type safe, so an integer today could easily become a varchar tomorrow. Sounds like Hadoop, but it wasn’t. I was asked to put together a data warehouse based on this system, and it was nasty. This project taught me a lot about the need for data governance and just how important it is.
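When the source won’t enforce types, someone has to do it before the data lands in a structured warehouse. Here is a minimal Python sketch of that kind of gatekeeping. It assumes rows arrive as dictionaries and that you know which type each column *should* have. The column names in `EXPECTED_SCHEMA` and the `validate_rows` helper are hypothetical, chosen only to illustrate the idea. The check simply tries each type’s constructor, so it is deliberately loose: for example, `int(12.9)` would quietly truncate to 12.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical expected schema for a claims extract; names are illustrative only.
EXPECTED_SCHEMA: Dict[str, type] = {
    "claim_id": int,
    "member_id": str,
    "billed_amount": float,
}


def validate_rows(rows: List[Dict[str, Any]],
                  schema: Dict[str, type]) -> Tuple[List[Dict[str, Any]], List[str]]:
    """Split rows into (clean, errors). A row is clean only if every schema
    column is present and can be coerced with that type's constructor."""
    clean: List[Dict[str, Any]] = []
    errors: List[str] = []
    for i, row in enumerate(rows):
        typed: Dict[str, Any] = {}
        problems: List[str] = []
        for column, expected in schema.items():
            if column not in row:
                problems.append(f"missing '{column}'")
                continue
            try:
                typed[column] = expected(row[column])
            except (TypeError, ValueError):
                problems.append(f"'{column}'={row[column]!r} is not {expected.__name__}")
        if problems:
            errors.append(f"row {i}: " + "; ".join(problems))
        else:
            clean.append(typed)
    return clean, errors


rows = [
    {"claim_id": "101", "member_id": "M-1", "billed_amount": "250.00"},
    {"claim_id": "A-17", "member_id": "M-2", "billed_amount": "99.5"},  # type drift
    {"claim_id": "102", "member_id": "M-3"},                            # missing column
]
clean, errors = validate_rows(rows, EXPECTED_SCHEMA)
print(len(clean), errors)
```

Rejected rows are reported instead of silently dropped. That way the drift becomes visible to whoever owns the source, which is the governance conversation this story is really about.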
You thought this part was going to be about implementing governance, which is definitely scary. Sorry, this part is about the opposite: not implementing governance. There was one person in the organization who held the keys to that company’s data kingdom. She was in full control, she knew it, and so did I! We did everything on her schedule and at her whim. The project took a lot longer than it should have and was very expensive. Getting all of that data into a useful format was tedious and frustrating.
Nasty data rules the day, especially when you have to load inconsistent data into a very structured data model. Governance is a four-letter word to some, but proceed with caution, as the pendulum swings violently in both directions.
It seems like 80% of any data project’s time is spent preparing data and only 20% is spent developing analytics or otherwise doing something valuable for the business. It doesn’t always have to be this way. Times are changing. Tomorrow’s systems will allow for self-service all the way from transaction sources to analytics. This is exactly why I come to work every day. It all comes down to how naughty or nice your data is. The difference is significant and expensive, and it could cost you your customers. Think about this for a moment: ‘If your data were water, would you take a drink?’