The problem with being data-driven

I first entered the tech industry right around the time that Facebook started taking over. I'm sure the idea of being "data-driven" existed before them, but they seemed to really popularize the concept, and I 100% bought into it.

The data-driven philosophy goes something like this: businesses tend to make bad decisions because they aren't rigorous about validating whether the decisions are actually correct. Instead of just trusting intuition, decisions should be evaluated scientifically. Just like with the scientific process, businesses should (1) create a hypothesis, (2) set up an experiment where they can test the hypothesis, and (3) do whatever the data says is "correct".

For example, maybe a company wants to redesign their web page, logo, etc. The old model might have been: our brand marketing team thinks our design is giving customers the impression we're outdated, so we need to modernize our brand. The new data-driven model would be something like: let's create five different variations of our home page, show them randomly to different visitors, and measure which one converts the most people.

This sounds like a no-brainer. Of course you should use data to validate that your decisions are correct. Back in ~2007, when I first started learning about data-driven decision making, it seemed like the one absolute truth. There was nothing to argue with.

So of course when we were building Less Annoying CRM there was no question that we'd want to be data-driven. One of our very first hires was a data scientist (before we even had a full-time developer). We wanted to measure everything, build sophisticated reporting, and use data to drive every decision we made as a company.

Fast-forward to today: we hardly make any truly data-driven decisions.

Why?

Problem #1: It's easy to measure something. It's hard to measure the right thing

One of the core ideas behind the data-driven philosophy is that you can identify metrics that correlate with the outcome you want. Like in the example above, if you're redesigning your home page, it seems natural that you can evaluate the results by looking at the conversion rate.

This works in some cases, but most important decisions have nebulous and far-reaching effects, most of which are difficult to measure.

For example, it's common in the startup world to hear people suggest that you should test your pricing. Should your product cost $10/month, $25/month, or $50/month? Should you charge by user? Have tiers? Require annual contracts? Just test it and find out! Randomly show different pricing models to different visitors, measure how many people sign up for each, multiply that number by the prices, and you can easily identify the "optimal" price.
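To make that arithmetic concrete, here's a minimal sketch of the naive calculation. Every number in it is invented for illustration, and the function names are hypothetical; it isn't taken from any real test.

```python
# Hypothetical results from a pricing-page test. All numbers are made up.
# Keys are monthly prices in dollars.
results = {
    10: {"visitors": 1000, "signups": 50},
    25: {"visitors": 1000, "signups": 30},
    50: {"visitors": 1000, "signups": 12},
}


def revenue_per_visitor(price, visitors, signups):
    """First-month revenue per visitor: conversion rate times price."""
    return price * signups / visitors


def naive_best_price(results):
    """Pick the price with the highest first-month revenue per visitor."""
    return max(
        results,
        key=lambda price: revenue_per_visitor(price, **results[price]),
    )


for price, counts in results.items():
    print(price, revenue_per_visitor(price, **counts))

print("naive 'optimal' price:", naive_best_price(results))
```

With these made-up numbers the $25/month plan "wins". Notice that the only inputs are what happened on the pricing page: visitors, signups, and price.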

Sure, you can do that, and you'll get an answer, but is it the right answer? Does pricing really only matter during this immediate "pay or don't pay" decision a prospective user is making while on the pricing page?

Of course pricing impacts so many other things. Maybe a customer isn't ready to buy no matter what, but if your price is acceptable they'll plan on coming back later to sign up. Maybe they're on the fence about whether your product is worth it and they'll cancel after a few months if it's too expensive. Maybe they like your product but they're somewhat embarrassed by how expensive it is and so they won't suggest it to any of their friends.

Your pricing model is fundamentally connected to a customer's long-term perception of your brand, product, and company. If you think you can measure all of that with an A/B test on the pricing page, you have a very shallow view of UX.

Problem #2: Little wins add up to big losses

I mentioned above that I've always viewed Facebook as the data-driven gold standard. I think they rode it to huge success, but I also think that after they lose their market dominance, they'll be the perfect case study of data gone wrong.

There are countless examples of this just with Facebook, but here's one: back in the day, Facebook had this handy "notification" section which would show you things you need to know about like someone posting on your wall or replying to one of your comments. Because these things are really critical to one's use of Facebook, I would check notifications religiously.

But then Facebook's growth team realized that people wouldn't ignore notifications, and so they ran a test: what if they added more types of notifications that weren't as essential? "A friend posted for the first time in a while" or "12 of your friends liked this post" or "there's an event happening near you". It's clear that over the last decade Facebook ran countless experiments like this, and the results were presumably clear: sending more notifications increases user engagement which in turn increases revenue. The data proves that this is the right decision!

Each of those experiments in isolation was correct. Any individual notification would increase my engagement. But what they didn't measure is that I was becoming increasingly frustrated with the distraction. Eventually I got one notification too many and I snapped. I deleted the app from my phone and turned off all other forms of notifications. This was the beginning of the end for me. Since I wasn't getting any notifications, I basically stopped checking Facebook, and eventually closed my account entirely.

Facebook probably didn't make a single bad decision in this entire story. Everything they did was optimal according to the data. But the experiments weren't able to evaluate the impact these decisions would have in aggregate. At some point, a human needs to say, "I don't care what engagement numbers look like, this is messed up" (see also: YouTube's algorithms suggesting white supremacy videos all the time because of "engagement").

Problem #3: Data helps you find the local maximum. It's up to you to find the global maximum

If you have something that's already good and you want to make it a little better, running a data-driven experiment might be the way to go. It probably can answer questions like "which of these two headlines works better" or "should our onboarding process use a video or text tutorial?"

But those decisions normally aren't the most important ones at a business, especially in the early days when you don't have enough customers to collect a statistically significant amount of data. If you're just getting started and you're fighting for every customer, chances are (a) you can't run a good experiment about most things and (b) even if you do, it will lead to such a small incremental change that it won't matter.
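For a rough sense of what "enough customers" means, here's a back-of-the-envelope sketch using the standard normal-approximation formula for comparing two conversion rates. The 2% baseline, the 10% relative lift, and the function name are all assumptions picked for illustration, not numbers from our business.

```python
from math import ceil
from statistics import NormalDist


def visitors_per_variant(baseline, target, alpha=0.05, power=0.8):
    """Approximate visitors needed in EACH variant to tell `baseline`
    apart from `target` (both are conversion rates between 0 and 1).

    Textbook two-proportion approximation; treat the result as a
    ballpark figure, not an exact requirement.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_power = NormalDist().inv_cdf(power)          # desired statistical power
    variance = baseline * (1 - baseline) + target * (1 - target)
    return ceil((z_alpha + z_power) ** 2 * variance / (target - baseline) ** 2)


# Assumed scenario: a 2% conversion rate, hoping to detect a lift to 2.2%.
print(visitors_per_variant(0.02, 0.022))
```

Under those assumptions the answer comes out to roughly 80,000 visitors per variant, which is a lot of traffic to spend on one headline when you're still fighting for every customer.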

The decisions that matter are things like "should we be selling to small businesses or enterprises" or "who's the first type of employee we should hire". Good luck answering those questions purely with data. You can probably set up some kind of experiment, but as problem #1 describes, it almost certainly won't measure the full impact of the decision, and you'd be better off just using your judgement.

I learned this lesson the hard way. As I mentioned above, one of our very first hires was a data scientist. He was a brilliant guy who helped collect data and generate reports so that we had all kinds of insights at our fingertips.

After about 2.5 years he decided to leave the company, which posed a big problem. No one knew how to maintain his code or infrastructure. We didn't have the resources at the time to ask someone else to figure it out, so we made the tough decision to just shut down everything he built. And then...nothing changed. Seriously. I'm sure that with the help of the data we were making minor iterative improvements, but they were completely drowned out by everything else going on at the company.

I'm sure that this stops being true as a company grows. A 0.1% improvement doesn't mean much to us, but it probably means a lot to Facebook. So at least while a company is small, the improvements you can get from being data-driven might matter, but they almost certainly matter less than other things you could be doing with your time.

That doesn't mean you should never look at data...

I don't want to come off as anti-science. If you're facing a decision that really can be informed by easily-measured metrics and you have a statistically significant amount of data to work with, then by all means, go for it. But at some point data stopped being a science and started being a religion. Some entrepreneurs have completely abdicated their responsibility to set the vision by letting data do it for them. But a machine learning algorithm isn't going to create your vision or strategy. That's your job.

Have thoughts on this post? I'd love to hear from you! I'm @TylerMKing on Twitter.