One way of thinking about the Internet is as a giant
matching machine. You have a question, it finds you an answer. You want a
flight, it finds you a good deal. You want a date, it can find that for you
too. Lots of them in fact.
But is this the whole story? Not exactly.
A fairly simple problem/solution scenario is how things
worked in the days of Web 1.0, a pre-data collection web that hadn’t yet
developed, let alone mastered, micro-targeting by such attributes as
demographics, psychographics, and location. And before you cry “surveillance!”, bear in mind that it is the advertising-supported, data-slicing-and-dicing web that brings so much to all of us each day in the form of news, entertainment, and productivity tools. Not to mention that the systems that optimize online marketing also help filter out what could be called ‘noise’: for example, if I don’t have kids I won’t get daycare ads on Facebook, and if I don’t have a dog or a cat I won’t get kibble coupons popping up alongside the YouTube videos I watch.
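A minimal sketch of that filtering step might look like the following; the profile fields, ad names, and matching rule are all invented for illustration.

```python
# Toy ad-targeting filter (all fields hypothetical): an ad is shown only
# when every one of its targeting attributes matches the viewer's profile.
profile = {"has_kids": False, "has_pet": False, "city": "Toronto"}

ads = [
    {"name": "daycare promo", "requires": {"has_kids": True}},
    {"name": "kibble coupon", "requires": {"has_pet": True}},
    {"name": "transit app",   "requires": {"city": "Toronto"}},
]

eligible = [ad["name"] for ad in ads
            if all(profile.get(k) == v for k, v in ad["requires"].items())]
print(eligible)  # ['transit app'] -- the daycare and kibble ads are 'noise', filtered out
```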
Is this all for the better? As with many things, it depends on how you look at it, and it depends on who you ask. If you ask mathematician Cathy O'Neil, author of Weapons of Math Destruction, the answer would be no.
At a recent talk held at Microsoft Research, O'Neil began by describing what an algorithm is. “It’s something we build to make a prediction about the future…and it assumes that things that happened in the past will happen again in the future.” She explained that algorithms often take the form of decision trees, built from if/then and yes/no statements, and use historical information, pattern matching, and machine learning to build models that can make thousands to millions of predictions in a fraction of the time it would take a human being with a calculator and a scratch pad.
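To make that concrete, here is a minimal, hypothetical sketch of the pattern she describes: a decision tree fit to past outcomes and asked to predict new ones. The loan-repayment framing, features, and data are all invented for illustration.

```python
# A decision tree "assumes that things that happened in the past will
# happen again in the future": it is fit to historical records, then
# applied wholesale to new cases. (All data here is made up.)
from sklearn.tree import DecisionTreeClassifier

# Historical records: [annual_income_in_thousands, years_at_current_job]
X_past = [[30, 1], [85, 6], [45, 2], [120, 10], [28, 0], [95, 8]]
y_past = [0, 1, 0, 1, 0, 1]  # 1 = repaid the loan, 0 = defaulted

model = DecisionTreeClassifier(max_depth=2)
model.fit(X_past, y_past)  # learn if/then splits from the past

# The learned rule is projected onto people the model has never seen:
print(model.predict([[90, 4], [32, 1]]))  # -> [1 0]
```

In this toy data the tree simply learns an income threshold; thousands of such if/then splits, applied to millions of rows, is the scaled-up version of the same idea.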
So what's not to like? The problem, according to O'Neil, is that the agenda of the algorithm is decided by the builder of the algorithm. What goes into the algorithm is necessarily ‘curated’, and when some variables are selected while others are left out, a value system is embedded in the algorithm.
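A toy sketch of that point, with every variable name and weight invented: the same record, scored by two builders who curated different inputs, produces two very different verdicts.

```python
# Two hypothetical scoring models over the same applicant. Neither is
# "neutral": each builder's choice of variables encodes a value system.
applicant = {
    "test_score": 82,       # included by both builders
    "zip_code_risk": 0.5,   # a proxy variable builder A chose to include
    "years_experience": 7,  # a variable builder B weighted instead
}

def model_a(rec):
    # Builder A decided zip-code risk matters -- a value judgment.
    return rec["test_score"] * (1 - rec["zip_code_risk"])

def model_b(rec):
    # Builder B left zip code out and rewarded experience instead.
    return rec["test_score"] + 2 * rec["years_experience"]

print(model_a(applicant), model_b(applicant))  # 41.0 vs 96: same person, two verdicts
```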
These value systems in turn can affect decisions that used to be made by humans and are now made by machines, such as hiring, creditworthiness, professional evaluation, and insurance eligibility. Researchers, including O'Neil herself, have attempted to uncover the rules inside some of these algorithms using Freedom of Information requests, but according to O'Neil many such requests have not been successful. Furthermore, many of the data-driven systems responsible for making millions of decisions are built on proprietary, or ‘black box’, software architectures that are extremely difficult to reverse engineer.
But let's bring things back to how data interfaces with you in your daily life. If you've ever wondered, for example, why you often spend half an hour on hold when you call customer support while your friends say they get through right away, the explanation may be more than “we’re experiencing larger than normal call volumes.” Maybe they are, but maybe, as O'Neil points out, it's something else. She cites the common practice among customer service lines of pre-determining whether you're a high-value or a low-value customer, based on purchase and credit information cross-referenced with your phone number. And, well, you can figure out who gets put through to a real live human operator and who has to listen to extended musical accompaniments of flutes and vibraphones.
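A hedged sketch of what such routing could look like; the scoring formula, credit tiers, and threshold below are invented, since real systems are proprietary.

```python
# Hypothetical caller triage: score the caller from purchase history and
# a credit tier looked up by phone number, then route by a cutoff.
def lifetime_value_score(purchases, credit_tier):
    tier_multiplier = {"A": 1.5, "B": 1.0, "C": 0.5}
    return sum(purchases) * tier_multiplier[credit_tier]

def route_call(score, threshold=500):
    return "human operator" if score >= threshold else "hold queue (cue the flutes)"

# Two callers, identified by phone number and cross-referenced records:
print(route_call(lifetime_value_score([120, 340, 90], "A")))  # human operator
print(route_call(lifetime_value_score([25, 40], "C")))        # hold queue (cue the flutes)
```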
O'Neil calls such processes “systematic filtering”, and is concerned that machine learning, a key component of artificial intelligence -- which is said to be the next revolution in computing -- “automates the status quo” and in turn creates “pernicious feedback loops”. These loops not only trap people in the biases of the past but also magnify them: when a model's decisions shape the very data it is later retrained on, the initial skew compounds with each cycle.
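A toy simulation of that compounding, in the spirit of the predictive-policing examples in O'Neil's book; every number here is invented. Underlying crime is identical in both neighborhoods, but patrols follow recorded arrests, and arrests can only be recorded where patrols are.

```python
# Pernicious feedback loop in miniature: a slight initial skew in the
# data hardens into a total imbalance, because the model's output
# (patrol allocation) generates its own next round of input (arrests).
arrests = {"north": 55, "south": 45}   # an arbitrary initial skew
patrols = {"north": 50, "south": 50}

for year in range(1, 6):
    # "retrain": shift patrols toward wherever more arrests were recorded
    hot = max(arrests, key=arrests.get)
    cold = min(arrests, key=arrests.get)
    shift = min(10, patrols[cold])
    patrols[hot] += shift
    patrols[cold] -= shift
    # new arrests track patrol presence, not actual (equal) crime rates
    arrests = dict(patrols)
    print(year, arrests)   # the 55/45 skew marches toward 100/0
```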
This was not, and is not, the intention of any such system, of course. The point of deploying data at scale is to build models at a speed and complexity that far exceed human capability. But as with any technological innovation there are unintended consequences, and the decisions made by data-driven systems are no exception.
For an overview of these ideas, see Cathy O'Neil's book Weapons of Math Destruction (Crown, 2016).
This post also appears, in a slightly revised version, on the blog of the American Marketing Association, Toronto Chapter.