The Data Adversary

By | Published: November 27, 2012

“Each film is only as good as its villain. Since the heroes and the gimmicks tend to repeat from film to film, only a great villain can transform a good try into a triumph.”

–  Roger Ebert

Data scientists are really on a roll. Their name has changed from “analyst” in the vernacular, and at the same time they have gone from sniffling egghead to white-hatted hero played by Gary Cooper. Or maybe 007, extolled as the Sexiest of the Century. Or an even better archetype might be The Girl with the Dragon Tattoo, with a dark side and revenge on her mind.

The movement, of course, is truly significant at the grass roots level, where thousands of data scientists work without fame or fortune. Every big trend has its famous faces, though, and a few data scientists and their adherents have emerged from obscurity (baseball’s Billy Beane was one of the first to become a household name, especially when he went from book form to Brad Pitt incarnation in Moneyball). By far the hottest name in data lately, and maybe ever, has been Nate Silver, who took his predictive analytic talents from Beane’s world of baseball to politics, and started predicting election outcomes with exceptional success.

For all the recent success of the data scientist née analyst, the plot only gets interesting with the entrance of a capable foil. Enter the “data adversary.” The favorite quote of the data adversary is Mark Twain’s “There are three kinds of lies: lies, damned lies, and statistics.” He knows what he sees with his own two eyes; he knows what his years of experience tell him, and your numbers don’t sway him. The same trends that have brought fame to some of our protagonists have also drawn high profile data adversaries to the stage. Many baseball fans know of a recent drama that played out pitting the data nerds vs. the Luddite naysayers—the American League MVP vote.

For you growing legions of non-baseball fans, allow me to provide a synopsis. Baseball fans familiar with the story can skip down to the paragraph starting with “Outside of Angels and Tigers…”

Rookie phenom Mike Trout, centerfielder of the LA Angels, and veteran 3rd baseman Miguel Cabrera of the Detroit Tigers were clearly the best two candidates for the Most Valuable Player award this year. That these two stood out above the rest was beyond debate. Which of them was more deserving, on the other hand, was a matter that generated hot controversy.

Cabrera was the first player in over forty years to win the American League Triple Crown, meaning he led the league in the three traditional batting statistics of batting average, runs batted in (RBIs), and home runs. Such a rare and high-profile feat would normally be a virtual lock for MVP (Boston old-timers would bring up Ted Williams and Joe DiMaggio in 1947, but that’s another story entirely). Baseball’s dataphiles, however, had a different take.

The statistical nature of baseball has made it a magnet for data junkies for a long time. So much so that baseball “data science” has its own name, “sabermetrics,” and its practitioners are “sabermetricians.” They even have a founding father, the venerable Bill James, a man whom any data scientist—baseball fan or not—should get to know.

One of the many data-driven insights brought to the fore by sabermetricians is that the Triple Crown components, especially RBIs and batting average, are overrated and a poor representation of a player’s value to the team. Rather, a slew of additional statistics have been identified that correlate much better with a team’s wins and losses. That’s where Trout comes in. While he trailed Cabrera in the Triple Crown stats, Trout dominated Cabrera in most of the other statistics (the kind that instantly roll the data adversary’s eyes) that truly predict a player’s contributions to team wins and losses.

Outside of Angels and Tigers fans, the support for Cabrera vs. Trout broke down fairly cleanly between traditionalists (data adversaries) vs. “stats geeks” (data scientists), respectively. Cabrera won easily. A good representative of the data adversaries in this case came from the plume and inkwell of Mitch Albom of the Detroit Free Press, in his column “Miguel Cabrera’s award a win for fans, defeat for stats geeks.” A sampling of his hands-over-ears screaming:

Which, by the way, speaks to a larger issue about baseball. It is simply being saturated with situational statistics. What other sport keeps coming up with new categories to watch the same game? A box score now reads like an annual report. And this WAR statistic — which measures the number of wins a player gives his team versus a replacement player of minor league/bench talent (honestly, who comes up with this stuff?) — is another way of declaring, “Nerds win!”

We need to slow down the shoveling of raw data into the “what can we come up with next?” machine. It is actually creating a divide between those who like to watch the game of baseball and those who want to reduce it to binary code.

Apparently Mitch never bothered to ask any of these “nerds” if they like to watch the game of baseball in addition to paying attention to the statistics. If he had, he would have heard them all say that they love both.

This little battle about awards to men playing a boys’ game is truly instructive for all of us analysts/data scientists. Data adversaries like Mitch Albom are everywhere, of course. Every organization has them, and they aren’t dumb. They can make good points, and they are in positions of power. They’re persuasive, and they know the power of a story. Being the world’s best number cruncher who can predict outcomes with the highest percentage accuracy does not alone make one an effective data scientist, particularly in the real world, where situations are organic and outcomes are gray.

The best data scientist remembers that predictive analysis is only as good as the decisions made and actions taken based on its findings. Decisions are made by humans. Humans, unlike numbers and algorithms, are political and irrational, and they are persuaded by stories much more than by numbers. There is usually common ground between the data scientist and the data adversary, and it is in both their interests (and in the interest of whoever signs both their paychecks) to find it.

Incidentally, Miguel Cabrera was a gracious winner with much better perspective than many who voted for him, including Albom when he quoted Cabrera at the end of his article:

“I think they can use both,” Cabrera said when asked about computer stats versus old-time performance. “In the end, it’s gonna be the same. You gotta play baseball.”


LiveLogic Makes 2010 Inc. 5000 List

By | Published: September 9, 2010

NEW YORK, August 24, 2010 – Inc. magazine ranked LiveLogic number 2184 on its fourth annual Inc. 5000, an exclusive ranking of the nation’s fastest-growing private companies. The list represents the most comprehensive look at the most important segment of the economy: America’s independent-minded entrepreneurs. Music website Pandora, convenience store chain 7-Eleven, Brooklyn Brewery, and Radio Flyer, maker of the iconic children’s red wagon, are among the prominent brands featured on this year’s list.

LiveLogic makes Inc. 5000 list two years in a row

“The leaders of the companies on this year’s Inc. 5000 have figured out how to grow their businesses during the longest recession since the Great Depression,” said Inc. president Bob LaPointe. “The 2010 Inc. 5000 showcases a particularly hardy group of entrepreneurs.”

LiveLogic Has Delivered Advanced Analytics For Over a Decade

LiveLogic has been building dashboards and custom analytical applications for companies like EDS, Reddy Ice, HP, LSG Sky Chefs, Greyhound, Sally Beauty, and the Dallas Mavericks for the past 15 years.

LiveLogic dashboards are delivered either through the cloud (Software as a Service, or SaaS), spreading the cost of hardware, software, IT headcount, and development across all subscribers, or more traditionally, within a corporation’s data center, leveraging existing infrastructure and maintenance skills.

This is proven technology that makes a dramatic difference. Whether in the cloud or in a data center, LiveLogic dashboards make it easy to see the overall trends affecting a business with full interactivity, allowing drill-down and slicing-and-dicing to get to the details behind the trends.

“Companies need to know how they are performing, need to know what their customers and suppliers are doing, and need the information fast enough to be able to act decisively. Our dashboards tame the data beast, putting companies in charge of the massive amounts of data they are collecting,” said LiveLogic President Jon Crowell.

From the official Inc. press release:

The 2010 Inc. 5000, unveiled today on, serves as a unique illustration of the profound changes taking place in the U.S. economy.

Despite the fact that most of this year’s measuring period of 2006-2009 took place during the latest recession, aggregate revenue among the companies on the list actually increased to $321.6 billion, up more than 50 percent from last year. The effects of the recession are seen, however, in the median three-year growth rate, which dropped to 96 percent from last year’s 126 percent. This year’s Inc. 5000 employ a record 1.4 million people, up from one million on last year’s list. With unemployment remaining stubbornly high, policymakers and business leaders will do well to look to the Inc. 5000 companies for fresh ideas on achieving growth and creating jobs.

Complete results of the Inc. 5000, including company profiles and an interactive database that can be sorted by industry, region, and other criteria, can be found on the 2010 Inc. 5000 List. (LiveLogic’s Inc. 5000 Profile)

Inc. 5000 Methodology

The 2010 Inc. 500|5000 is ranked according to percentage revenue growth when comparing 2006 to 2009. To qualify, companies must have been founded and generating revenue by June 30, 2006. Additionally, they had to be based in the United States, privately held, for profit, and independent—not subsidiaries or divisions of other companies—as of December 31, 2009. (Since then, a number of companies on the list have gone public or been acquired.) The minimum revenue required for 2006 is $80,000; the minimum for 2009 is $2 million. As always, Inc. reserves the right to decline applicants for subjective reasons. The top 10 percent of companies on the list constitute the Inc. 500, now in its 29th year.
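
The ranking rule above boils down to simple arithmetic. The following is a minimal sketch, assuming that "percentage revenue growth" means the plain percentage change from 2006 revenue to 2009 revenue, and checking only the two revenue minimums quoted above. The function names and the sample company figures are hypothetical illustrations, not Inc.'s actual tooling or data.

```python
MIN_REVENUE_2006 = 80_000      # minimum 2006 revenue quoted in the methodology
MIN_REVENUE_2009 = 2_000_000   # minimum 2009 revenue quoted in the methodology


def three_year_growth_pct(revenue_2006: float, revenue_2009: float) -> float:
    """Percentage revenue growth when comparing 2006 to 2009 (assumed formula)."""
    return (revenue_2009 - revenue_2006) / revenue_2006 * 100.0


def meets_revenue_minimums(revenue_2006: float, revenue_2009: float) -> bool:
    """Check only the revenue thresholds; the other eligibility rules are not modeled."""
    return revenue_2006 >= MIN_REVENUE_2006 and revenue_2009 >= MIN_REVENUE_2009


# Hypothetical company: $1.0M in 2006 growing to $2.5M in 2009.
growth = three_year_growth_pct(1_000_000, 2_500_000)
eligible = meets_revenue_minimums(1_000_000, 2_500_000)
print(f"growth: {growth:.0f}%, meets minimums: {eligible}")
```

Under this reading, a company that merely doubled its revenue over the period would show 100 percent growth, close to the median figure quoted in the press release.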


The 4-Hour Customer Retention Workweek

By | Published: August 17, 2010

Timothy Ferriss, in his runaway bestseller The 4-Hour Workweek, describes how he transformed himself from a miserable, unhealthy workaholic into a globetrotting, cage-fighting, tango-dancing, best-selling author. While most of us will never get to work 4 hours or less each week, we can learn a lot from some of the steps Ferriss took, specifically with regard to how we manage our customers.

Ferriss is the founder of BrainQuicken, which according to its website is the “leading developer and distributor of bioactive and pharmaceutical-grade neural acceleration products.” Before his short-week transformation, Ferriss worked around the clock, waking up at all hours to call on overseas customers, sacrificing his health and personal life in an effort to make his business succeed, until one day he realized he was miserable and depressed.

Being abused by your customers isn’t worth it

Begging bad customers to stay is a bad strategy that many companies follow.

Once he reached rock bottom, he decided he had to change how he did business. Success simply wasn’t worth the price he was paying. He started by analyzing where his business was actually coming from, and realized that over 90% of his orders came from about 5% of his customers, and that the customers in the top 5% required very little maintenance. He was killing himself by trying to milk every last dime out of the bottom-feeders and complainers, customers that weren’t adding much to the bottom line. He quit calling on them, and most of them continued to order, but he didn’t care if they left. They weren’t contributing enough to count anyway.

Of the top 5%, a few of the customers were extremely squeaky wheels. Ferriss describes having “taken their browbeating, insults, time-consuming arguments, and tirades as a cost of doing business.” He let these customers know that if they wanted to fax in their orders, he would be happy to fill them, but that he would no longer tolerate any abuse. About half of these customers left, but the other half played by his new rules. He describes immediately feeling 10 times happier with minimal revenue loss.

Ferriss had about 120 wholesale distributors that he was dealing with, so he was able to analyze his data without sophisticated tools. The steps he followed:

  1. He organized his customers’ data
  2. He segmented his customers
  3. He decided how to treat each segment
  4. He acted on his decisions
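
With a customer list this small, the "where is my business actually coming from" question can be answered in a few lines. The sketch below is one hedged way to do it: it computes the share of total order value that comes from the top fraction of customers. The function name `top_share` and the distributor figures are hypothetical and are not Ferriss's real data.

```python
def top_share(orders_by_customer: dict[str, float], top_fraction: float) -> float:
    """Share of total order value coming from the top `top_fraction` of customers."""
    if not orders_by_customer:
        return 0.0
    totals = sorted(orders_by_customer.values(), reverse=True)
    # Always look at least at the single largest customer.
    top_n = max(1, round(len(totals) * top_fraction))
    grand_total = sum(totals)
    return sum(totals[:top_n]) / grand_total if grand_total else 0.0


# Hypothetical order totals for 20 distributors: 19 small accounts and one large one.
orders = {f"distributor_{i:02d}": 100.0 for i in range(1, 20)}
orders["distributor_20"] = 17_100.0

share = top_share(orders, top_fraction=0.05)
print(f"Top 5% of customers account for {share:.0%} of orders")
```

Running the same calculation for several values of `top_fraction` gives a quick picture of how concentrated the business is before deciding how to treat each segment.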

Customer retention for customer files with millions of customers

If you have hundreds of thousands, or millions of customers, you can follow the same basic steps he took, but you will in fact need the help of data warehousing, analytical tools, and predictive models. Even so, the steps themselves are simple, and very similar to the steps Ferriss followed:

  1. Organize your customers’ data

    You will need to identify the different sources for customer data, including marketing, sales, and support databases (frequently several of each). If you have different divisions or have recently acquired another company, you’ll need to merge and deduplicate the records in these databases.

    Trying to run reports against your operational systems won’t work. You’ll bog down the systems that are needed to run your business, and you will be frustrated and confused by data that is designed for computers and not for humans.

  2. Segment your customers

    Once you have the data moved into an analytical application, you’ll need to group your customers into meaningful segments. RFM Analysis is a great way to do this, and involves assigning a score to each customer for how recently (R) they have purchased, how frequently (F) they purchase, and how much they spend, or monetary value (M) of their transactions. Learn more about RFM Analysis for customer segmentation. Note that the segment a particular customer is in will change over time, so this isn’t a one-time exercise.

  3. Decide how to treat each segment

    Most companies treat all of their customers the same, meaning they spend way too much time, energy, and money on bad customers, and not nearly enough on their good customers. If you have properly segmented your customers, you can do much better.

  4. Identify churn-ready plum customers

    Predictive analytics, when applied to your customer segments, can help you understand which of your best customers are going to leave before they leave. This is critical. If you reach out to these customers before they leave, you have a much better chance of addressing whatever issues they have. If you reach them after they have already gone, they will probably have a new supplier, and are probably too irritated with you to go back anyway.

  5. Act

    Execute the strategies you created in steps 3 and 4, communicating appropriately to each segment. Make sure that you A/B test as you go, trying different approaches on subsets of each segment, identifying which of your actions has the greatest positive effect, and continuously refining your approach.

  6. Rinse and Repeat

The steps described here are not a one-shot fix. They are a process, and as each cycle of interactions takes place with your customers, if you continuously measure what is happening, customer satisfaction, and consequently your profits and employee morale, will continuously improve.
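
The A/B testing mentioned in the steps above needs some way to tell a real difference from noise. One common choice, used here purely as an illustration, is a two-proportion z-test on the response rates of two subsets of a segment. The function name and the sample counts are hypothetical, and the normal approximation assumes reasonably large groups.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for the difference in response rates B - A."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the assumption that both approaches perform the same.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        return 0.0, 1.0  # no responses (or all responses) in both groups
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical test: approach A retained 100 of 1,000 customers, approach B 150 of 1,000.
z, p_value = two_proportion_z_test(100, 1000, 150, 1000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the better-performing approach is worth rolling out to the rest of the segment; a large one suggests gathering more data before changing course.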

I’ve got 2 bonus steps for you that aren’t customer retention steps but are low-hanging fruit if you are following the 6 steps above:

  1. Acquire more customers that look like your best customers

    Once you have properly segmented your customer base, profile the kinds of customers that make up your best segments, and target similar prospects for your customer acquisition efforts.

  2. Migrate unprofitable customers into your top segments

    By applying the scores you’ve assigned to your customers historically, you can see what your best customers looked like years or months before they became your best customers. Find customers that you have today that match what your best customers used to look like, and cultivate these into your better segments.
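
The "what did my best customers used to look like" idea can be sketched with stored score history. The example below assumes each customer has a chronological list of (R, F, M) scores and treats an exact score match as "looks like"; both are simplifying assumptions, and the customer names and scores are hypothetical.

```python
from collections import Counter

RFM = tuple[int, int, int]  # (recency, frequency, monetary) scores, each 1-5


def early_profiles(history: dict[str, list[RFM]], best_ids: set[str]) -> Counter:
    """Count the earliest recorded score of each customer who is a best customer today."""
    return Counter(history[cid][0] for cid in best_ids if history.get(cid))


def lookalikes(current: dict[str, RFM], profiles: Counter, exclude: set[str]) -> list[str]:
    """Customers whose current score matches an early score of a best customer."""
    return sorted(cid for cid, score in current.items()
                  if score in profiles and cid not in exclude)


# Hypothetical score history, oldest snapshot first.
history = {
    "acme": [(3, 2, 2), (5, 5, 5)],
    "bolt": [(3, 2, 2), (5, 4, 5)],
    "cog":  [(2, 1, 1), (2, 1, 1)],
    "dyn":  [(1, 1, 1), (3, 2, 2)],
}
best_today = {"acme", "bolt"}
current = {cid: scores[-1] for cid, scores in history.items()}

candidates = lookalikes(current, early_profiles(history, best_today), exclude=best_today)
print(candidates)
```

In practice a looser match (for example, scores within one point) would probably surface more candidates than exact equality does.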


There Are Members of the Flat Earth Society on Your Management Team

By | Published: August 10, 2010

Do you want to join Amazon and Netflix, or would you rather align yourself with Borders Books and Blockbuster?

Clive Thompson, in his Wired Magazine article in May, describes what happened after the massive snowstorm hit Washington D.C. this past February. Unbelievers pounced on the opportunity to ridicule the idea of Global Warming. “How can Global Warming be real when there’s so much snow? Hearing that question — repeatedly — this past February drove Joseph Romm nuts.”

“Joseph Romm, a physicist and climate expert with the Center for American Progress, [explained that] Climate change is all about trend lines. You don’t observe it by looking out the window but by analyzing decades’ worth of data.”

Whether or not you believe that Global Warming is a problem, I’m sure you agree with Mr. Thompson that looking out the window is not an accurate way of assessing whether it is or not. And yet many companies run their businesses by figuratively looking out the window. After all, it’s the way they have always done business, and it has worked so far.

Does a blizzard in Dallas mean Global Warming isn’t happening?

Global Warming, Flat Earth, and Sweaty Underwear

The Global Warming example is a controversial one, with many people on either side of the argument, so let me go back a little further in time to an idea that used to be universally accepted and now is rejected by everyone, with the possible exception of a few members of the Flat Earth Society.

Spontaneous Generation was a theory that suggested living things could be spontaneously created from non-living objects. Frogs were spontaneously created by the mud on the banks of the Nile. Sewage created rats. Meat hanging in butcher shops created flies.

There were even recipes that would produce certain creatures. Mice were spontaneously created by combining sweaty underwear and husks of wheat in a jar. Over the next 21 days, the sweat from the underwear combined with the husks, and voila! Mice appeared.

How to Make Bees

My favorite recipe was probably the one for bees. You could kill a bull, bury it upright with its horns sticking out of the ground, and in a month a swarm of bees would emerge from the carcass of the bull.

It’s fun to look at those examples and laugh at how silly they seem, yet they were believed wholeheartedly for hundreds of years. Spontaneous Generation wasn’t definitively refuted until a Louis Pasteur experiment in 1859.

Don’t use mud to make frogs. Use your data to make decisions.

Companies that are run based on gut feel and intuition are just like the people who think that sticking a basil leaf between two bricks in the sunlight will create scorpions. Back when nobody was able to harness their massive amounts of data, intuition was the best you could do. Now that we have modern, scientific ways of collecting your data, of slicing and dicing it, you are able to see what is happening now within the context of millions of data points gathered over the past several years. You simply cannot survive if you insist on doing business the way you always have. Just because it has worked for the past 30 years doesn’t mean it will continue to work. Why? Because your competitors are adopting modern, scientific approaches to decision-making, and they will destroy you if you don’t follow suit.


Talk to different customers differently

By | Published: July 2, 2010

Don Peppers of Peppers & Rogers spoke during an awards session at the Gartner Customer 360 Conference last week, and one of the things he talked about was the concept of speaking to different customers differently.

Use RFM Analysis to talk to your customers intelligently

This man swears loudly in front of children and takes advantage of your free wi-fi

Doesn’t Everybody Do This?

At face value, it sounds like a pretty silly comment. If you owned an ice cream shop, you would be very friendly to the polite, well-behaved, ice-cream loving, 7-kid family, but you would probably be a little less pleasant to the loud, inconsiderate man that swears into his cell phone at full volume, regardless of who is in the shop, and then buys a 50-cent soft drink so that he can use your free Wi-Fi all afternoon.

The problem is that you don’t own an ice cream shop, or if you do, you own hundreds or thousands of them, and the only way you can tell your customers apart is by looking at how they behave towards you through their interactions. These interactions might include what you sell them, how you market to them, and how they interact with you when they call a support line to complain about something.

Your best bet at knowing the appropriate way to interact with your customers is by combining these interactions with any other information you can gather about them, including where they live, their age, their birthday, as well as any other factors that may influence what you should be saying to them.

This first step of getting all the data together is a big one, and if you have a data warehouse, you may be well on your way to having some or all of it.
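
Getting the data together usually means merging records about the same customer from several systems. The sketch below is a minimal illustration, assuming a normalized email address is enough to identify a customer; real matching typically needs fuzzier rules (names, addresses, account numbers). The field names and sample records are hypothetical.

```python
def normalize_email(email: str) -> str:
    """Trim whitespace and lowercase so the same address compares equal."""
    return email.strip().lower()


def merge_customer_records(*sources: list[dict]) -> dict[str, dict]:
    """Merge records from several systems into one profile per customer."""
    merged: dict[str, dict] = {}
    for source in sources:
        for record in source:
            key = normalize_email(record["email"])
            profile = merged.setdefault(key, {"email": key})
            for field, value in record.items():
                # Keep the first non-empty value seen for each field.
                if field != "email" and value not in (None, "") and field not in profile:
                    profile[field] = value
    return merged


# Hypothetical extracts from three systems.
sales = [{"email": "Pat@Example.com", "last_order": "2010-06-01"}]
support = [{"email": " pat@example.com", "open_tickets": 2}]
marketing = [{"email": "lee@example.com", "city": "Dallas"}]

profiles = merge_customer_records(sales, support, marketing)
print(len(profiles), "unique customers")
```

The "first non-empty value wins" rule is only one possible policy; ordering the sources from most to least trusted makes it behave sensibly.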

Once you know all that you can about your customers, you need to segment them, allowing you to talk to them in the right way at the right time. You probably won’t have a swearing, Wi-Fi-hogging segment, but you will be able to identify undesirables and desirables if you go about things in the right way.

Behavioral Segmentation

RFM Analysis is a powerful tool that can help. RFM stands for Recency, Frequency, and Monetary value. It is sometimes referred to as RFI, where the “I” stands for intensity, which is the term Ralph Kimball prefers. (Ralph Kimball is responsible for designing and evangelizing a special database structure that allows fast, understandable access to vast amounts of data.)


Recency

You assign each of your customers a score, typically from 1 to 5, based on how recently they have purchased from you. A score of 1 indicates somebody who has never purchased, and a 5 indicates somebody who purchased very recently.


Frequency

You do the same thing here, assigning a score based on how often a given customer buys. If they buy every day, or week, or year, depending on your business, they will be a 5. If they purchase once or twice in an unpredictable way, they might be a 2.

Monetary Value

Finally, give each customer a score based on how much they spend.
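
The three scores can be computed from a plain purchase log. The sketch below is one hedged way to do it: it ranks customers on each measure and assigns 1 to 5 by quintile, which is a common convention but an assumption here, since the text does not prescribe a scoring rule. Only customers with at least one purchase appear, ties are broken arbitrarily by sort order, and the purchase data is hypothetical.

```python
from datetime import date


def quintile_scores(values: dict[str, float], higher_is_better: bool = True) -> dict[str, int]:
    """Rank customers and assign 1-5 so the best fifth gets 5 and the worst fifth gets 1."""
    # Sort worst-first so that rank 0 maps to a score of 1.
    ordered = sorted(values, key=values.get, reverse=not higher_is_better)
    n = len(ordered)
    return {cid: 1 + (rank * 5) // n for rank, cid in enumerate(ordered)}


def rfm_scores(purchases: list[tuple[str, date, float]], today: date) -> dict[str, tuple[int, int, int]]:
    """Return {customer: (R, F, M)} from (customer, purchase date, amount) rows."""
    last, count, spend = {}, {}, {}
    for cid, when, amount in purchases:
        last[cid] = max(last.get(cid, when), when)
        count[cid] = count.get(cid, 0) + 1
        spend[cid] = spend.get(cid, 0.0) + amount
    days_since = {cid: (today - d).days for cid, d in last.items()}
    r = quintile_scores(days_since, higher_is_better=False)  # fewer days is better
    f = quintile_scores(count)
    m = quintile_scores(spend)
    return {cid: (r[cid], f[cid], m[cid]) for cid in last}


# Hypothetical purchase log; repeated rows stand in for separate orders.
purchases = (
    [("a", date(2010, 6, 28), 100.0)] * 5
    + [("b", date(2010, 6, 1), 100.0)] * 4
    + [("c", date(2010, 4, 1), 100.0)] * 3
    + [("d", date(2010, 1, 15), 100.0)] * 2
    + [("e", date(2009, 7, 1), 1000.0)]
)
scores = rfm_scores(purchases, today=date(2010, 7, 2))
print(scores)
```

Customer "e" illustrates why the three scores are kept separate: a single large order long ago earns a high monetary score but low recency and frequency scores.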

This RFM segmentation, or customer behavioral segmentation, gives you 125 segments. Here are a few of the incredibly valuable things you can do with these scores:

  1. Customer Retention: Identify which of your “plum” customers are at risk of leaving. You can feed the segments that contain your best customers into predictive models that can use patterns of good customers that have already left to identify others who are exhibiting similar behavior while there is still time to save them.

  2. Customer Acquisition: Look back in time at your good customers and identify what they used to look like. Go after prospects that fit the mold.

  3. Customer Migration: Using a similar approach to step 2, look back in time at what your best customers looked like when they weren’t all that great. Find customers who fit that mold and identify what works best to move them quickly into “plum” status.

  4. Customer Pruning: You don’t want that swearing Wi-Fi hog. He’s taking advantage of your excellent service and damaging employee morale, while costing you money. If you know who he is, you can change the rules so that he either becomes a good customer or leaves.

All of these are examples of how you can talk differently to different customers. There are many other examples, but in all cases you have to know who they are and what makes them different before you can determine what you are going to say to each of them.
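
As a rough illustration of the retention use case, the sketch below flags formerly top-scoring customers whose recency score has slipped between two scoring runs. It is a simple rule-based stand-in, not the predictive model described above, and the thresholds, names, and scores are all hypothetical assumptions.

```python
RFM = tuple[int, int, int]  # (recency, frequency, monetary) scores, each 1-5


def at_risk_plums(previous: dict[str, RFM], current: dict[str, RFM],
                  plum_min: int = 4, recency_drop: int = 2) -> list[str]:
    """Customers who scored at least `plum_min` on R, F, and M last time
    and whose recency score has since fallen by `recency_drop` or more."""
    flagged = []
    for cid, (r_now, _f_now, _m_now) in current.items():
        if cid not in previous:
            continue  # no earlier snapshot to compare against
        r_before, f_before, m_before = previous[cid]
        was_plum = min(r_before, f_before, m_before) >= plum_min
        if was_plum and r_before - r_now >= recency_drop:
            flagged.append(cid)
    return sorted(flagged)


# Hypothetical snapshots from two consecutive scoring runs.
previous = {"acme": (5, 5, 5), "bolt": (5, 4, 4), "cog": (2, 1, 1)}
current = {"acme": (5, 5, 5), "bolt": (2, 4, 4), "cog": (1, 1, 1)}

print(at_risk_plums(previous, current))
```

A list like this is a starting point for outreach; a trained model that learns from customers who have already left would replace the fixed thresholds.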