Big data is here and unavoidable
For years, we’ve written about big data and showcased the progression of business intelligence now available to brands of every size. In fact, most businesses already have a feel for this type of data: open a spreadsheet of your sales data and you already know it’s just a bunch of numbers until it is analyzed and filtered. Today, I want to review what big data is, how it is currently being used, what this means for the future, and most importantly, how it can be cherry-picked and why it can upset entire industries.
Let’s talk about how BIG this really is
Let me illustrate. The University of Nebraska physics department has 1.6 petabytes of data – that’s 1.6 million gigabytes in one department at one school. Boeing jet engines can produce 10 terabytes of operational information for every 30 minutes they turn. As of 2012, the average smartphone user had 736 pieces of personal data collected every day, stored for one to five years by service providers.
Big data is already being used today in a big way
Big data is a big deal and it’s not just because there’s a lot of it. In fact, today alone, SumAll raised $4 million and DataSift raised a whopping $42 million to help businesses make sense of their data as it relates to social media.
Predicting the future with big data
But it’s not just that data is having a tremendous impact on life today; it is still a young sector, with many startups yet to pop up to solve the data conundrums. SiftScience fights fraud with machine learning that recognizes patterns of fraudulent behavior based on past examples. Hadoop helps companies analyze the massive amounts of data they are generating about user behavior and their own operations. Recorded Future uses algorithms that unlock predictive signals from web chatter to help a brand anticipate risks and capitalize on opportunities.
But big data has some really big problems
First, and least upsetting, is that there are big problems with demographics, leaving brands with a lot of data that doesn’t yet mean much. Why? Incomplete self-reporting is a huge issue. Brands are still focused on using social networking profile data to gather intelligence on their site users, fans, and the like, but the people behind those profiles may not be completely truthful (they may say they are 32, but they’re 12, and so forth). Additionally, privacy does protect users to a certain extent, blocking intelligence gathering by brands. Lastly, data is still largely inconsistent and unconnected – you may have a Twitter account and a Facebook account, but a third party doesn’t know that unless (a) you use the same username consistently or (b) you grant access to both accounts through that third party.
While other problems exist (like how will we ever store all of this data, disseminate it, and make sense of it, and does it all really matter?), the biggest one we see is the potential for cherry picking, because when you look at a data set, it still takes a human to actually determine what is important to garner from that data set.
Taleb addresses something that could lead one to think that big data is faulty and bad, but perhaps Taleb is really pointing out the human element that is still required in some instances of analyzing big data. Most people would not typically question a researcher or their methods, which leaves analysis, in this early phase, subjective.
Chris Treadaway, CEO and founder of Polygraph Media, which is famous for data-driven analytics, said, “To analyze big data, you have to know when you have enough data, know that you’re looking at the right data, and know how and when to draw conclusions from the data using methods developed from statistics theory and data science. That’s the great irony of ‘big data’ – it’s as much of an art as a science, which is why the best efforts are multidisciplinary.”
“Big data can find tremendous hidden relationships,” Treadaway continued, “but you have to make sure your bias isn’t to find conclusions that don’t exist. Bias can cause the situation Taleb describes, and will cause disinformation as he says. If you’re cautious, discerning, and careful, you can make the most of big data. But there are pitfalls for the careless.”
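To make the cherry-picking risk concrete, here is a minimal Python sketch. Everything in it is made up for illustration: the "sales" figures and the 1,000 candidate metrics are pure random noise, and the function names are hypothetical. The point is that if you test enough unrelated metrics against a short series, at least one will look impressively correlated by chance alone.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def cherry_pick(num_metrics=1000, num_weeks=10, seed=0):
    """Correlate random 'sales' with many random metrics.

    Returns the strongest correlation found and the median absolute
    correlation, so the cherry-picked result can be compared with a
    typical one.
    """
    rng = random.Random(seed)
    # Ten weeks of fake sales data: random noise, no real signal.
    sales = [rng.gauss(0, 1) for _ in range(num_weeks)]
    correlations = []
    for _ in range(num_metrics):
        # Each metric is also random noise, unrelated to sales.
        metric = [rng.gauss(0, 1) for _ in range(num_weeks)]
        correlations.append(pearson(sales, metric))
    best = max(correlations, key=abs)
    typical = sorted(abs(c) for c in correlations)[num_metrics // 2]
    return best, typical


if __name__ == "__main__":
    best, typical = cherry_pick()
    print(f"strongest correlation: {best:+.2f}")
    print(f"typical |correlation|: {typical:.2f}")
```

An analyst who reports only the strongest result, without mentioning the other 999 metrics that were tried, is cherry-picking, whether they mean to or not.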
And the coup de grâce
Several industries are seeing data about their individual members, their performance, their companies, and their finances analyzed and repackaged for public consumption or monetization.
Imagine a site launches tomorrow based on publicly available data and you’re a social media consultant. Let’s say that this new site looks at who has recommended you on LinkedIn, Yelp, Angie’s List and so on, and has determined that the people recommending you are clients of yours, based on the assumption that it is the only reason they’d recommend you or review you. The new site also analyzes words and pictures used in your online bios to determine characteristics about you.
Then, they take those reviews and characteristics and quantify you into a score, giving you more points if someone from Coca-Cola reviewed you than if the local dentist reviewed you, implying that you’re a higher quality consultant if you’ve worked with a major brand like Coca-Cola than if you worked with a local dentist (God forbid you specialize in social media for independent medical professionals).
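As a rough sketch of how such a score might behave, consider the Python below. The site is imaginary, so every detail here is an assumption: the weights, the `Review` fields, and the `consultant_score` function are hypothetical and exist only to show how a brand-weighted formula plays out.

```python
from dataclasses import dataclass

# Hypothetical weights: the imagined site values a big-brand reviewer
# ten times more than a local business.
REVIEWER_WEIGHTS = {"major_brand": 10, "local_business": 1}


@dataclass
class Review:
    reviewer: str
    reviewer_type: str  # "major_brand" or "local_business"
    rating: int  # 1 to 5 stars


def consultant_score(reviews):
    """Sum each rating multiplied by the weight of the reviewer's type."""
    return sum(REVIEWER_WEIGHTS[r.reviewer_type] * r.rating for r in reviews)


# A consultant who specializes in independent medical professionals,
# with three glowing reviews from local practices.
specialist = [
    Review("Dentist A", "local_business", 5),
    Review("Dentist B", "local_business", 5),
    Review("Orthodontist C", "local_business", 5),
]

# A consultant with a single lukewarm review from a major brand.
generalist = [Review("Coca-Cola", "major_brand", 3)]

if __name__ == "__main__":
    print("specialist:", consultant_score(specialist))
    print("generalist:", consultant_score(generalist))
```

Under these assumed weights, the specialist with three five-star reviews scores 15, while the generalist with one three-star review from a major brand scores 30. The formula rewards who reviewed you, not how well you did the work.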
Then, Google gets interested in this new site and invests, and later, it wants to use that data to populate your Google+ profile, so now you, the social media consultant, have a score next to your face to determine how good you are at your job.
What’s wrong with that?
This scenario is fake. For now. But with every human generating billions of data points every year, evaluations are just the first of many steps in what is to come with big data – the data is now generated, and it is a race to see what can be displayed about you and your business so that companies can sell to you or repackage your data and sell it to someone else. Even your brand will be using big data to gain insights into your customers so you can better serve them.