Australia's very own Centrelink has caught some heat of late.
The gist appears to be: Centrelink automated a piece of their fraud detection process, and the false positive rate is unacceptably high.
There's a deeply unfortunate human cost: there are claims that vulnerable people are being sent debt collection notices and that Centrelink is behaving bureaucratically, causing crippling financial stress and desperation.
Today – when we talk about machine learning, data science, big data, artificial intelligence or even passé terms like data mining or expert systems – we think of astonishing possibility. You've probably had a conversation in which somebody (likely of influence) posits: "Surely we can use our data to answer the question?"
And they're likely right – we can use our data to answer the question*. That asterisk is the most critical part of the previous sentence. We hide all of the complexity of analytics in that asterisk and trust that the very confusing ocean of data we have, plus the remarkably clever people we hand it to, will be able to meditate on the problem and gift us a neat, complete and robust solution.
Such data wizardry is being woven into our collective thinking. We've begun to treat analysts as clergy, data as manna and algorithms as some kind of gospel, but written in prolix Latin to ensure it remains out of the common grasp. Online courses promise ascendance through Python; vendors assure us that their machine learning solutions are so simple and robust that a single sprint will transform the organisation into a data-driven utopia.
We've been at this for a long time – a still-influential algorithm like k-means clustering is around fifty years old. And sure, we're pressing forward at a tremendous pace. But today, right now, at the start of 2017, we're still in the shallow end of the pool and building Weak AI: AI that necessitates human involvement, be that through an operator receiving algorithmic support, as with a medical diagnosis system, or through a human's judgement during setup, such as picking the k in k-means.
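To make the "picking the k" point concrete, here is a minimal sketch of k-means on one-dimensional data, written in plain Python. The function name `kmeans_1d`, the evenly spread starting centroids and the `incomes` figures are all assumptions made up for illustration; they are not drawn from any real data set. The same numbers produce a different story depending on the k a person chooses.

```python
def kmeans_1d(values, k, iterations=20):
    """Lloyd-style k-means on a list of numbers; returns the sorted centroids."""
    ordered = sorted(values)
    # Deterministic start (an illustrative choice): spread the initial
    # centroids evenly across the sorted data.
    if k == 1:
        centroids = [ordered[len(ordered) // 2]]
    else:
        centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]

    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster,
        # keeping the old centroid if the cluster ended up empty.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return sorted(centroids)


# Hypothetical fortnightly incomes, invented purely for this example.
incomes = [200, 220, 250, 900, 950, 1000, 2400, 2500]

for k in (2, 3):
    print(k, kmeans_1d(incomes, k))
```

With k set to 2 the low and middle incomes are averaged into a single group; with k set to 3 they separate. Nothing in the algorithm says which grouping is "right". That call belongs to the person who sets k.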
Machine learning is still entirely conjoined with humans. People are at the centre of these creations and are the most critical part. The asterisk above represents people, with all their ambiguity and fallibility and creativity and bias.
When we construct a data set and seek to exploit it, we have a responsibility to remember the asterisk. The data can give us an answer, but without human interpretation and caution we can end up doing more harm than good, such as asking a welfare recipient to pay down a debt that doesn't exist. If we aren't using data to advance outcomes for real people, then what's the point?
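A little arithmetic shows how quickly false positives can swamp an automated check. The sketch below is hypothetical throughout: the function `flag_outcomes` and every number passed to it are invented for illustration and are not Centrelink's figures. It assumes a detector that catches most real debts and wrongly flags only a small share of people who owe nothing.

```python
def flag_outcomes(population, true_debt_rate, sensitivity, false_positive_rate):
    """Return (correct flags, wrongful flags, share of flags that are correct)."""
    with_debt = population * true_debt_rate
    without_debt = population - with_debt
    # People who owe a debt and are flagged.
    true_flags = with_debt * sensitivity
    # People who owe nothing but are flagged anyway.
    false_flags = without_debt * false_positive_rate
    precision = true_flags / (true_flags + false_flags)
    return true_flags, false_flags, precision


# All inputs are made-up values chosen only to illustrate the base-rate effect.
true_flags, false_flags, precision = flag_outcomes(
    population=100_000,
    true_debt_rate=0.02,       # assume 2% of people really owe a debt
    sensitivity=0.95,          # assume 95% of real debts are caught
    false_positive_rate=0.05,  # assume 5% of debt-free people are flagged
)
print(f"correct flags: {true_flags:.0f}")
print(f"wrongful flags: {false_flags:.0f}")
print(f"share of flags that are correct: {precision:.1%}")
```

Under these assumed rates, wrongful notices outnumber correct ones, because the people who owe nothing vastly outnumber those who do. Whether that cost is manageable is a human judgement, not something the algorithm can decide.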
All of us in the data-driven economy have an opportunity to take the asterisk and explain it to the influencer who sees only upside. This is not to downplay the power and reach of algorithms, but to remind us that the best "Data Scientist" isn't the one who knows the most R libraries or has a particular flair for Bayesian statistics – it is the person who can teach an organisation where the possibilities and the limits of its data lie, and how best to use that resource for the betterment of humans.
*Assuming the data is accurate and timely; that past behaviour is indicative of future behaviour; that the cost of false positives or negatives is manageable; that we have sufficient processing power and/or time to generate results before they're needed; that we possess the right set of hard skills in the organisation, and that the people who hold them have been granted sufficient access to the required data; and that the results will be accepted without interference from well-meaning stakeholders who are sure it would be better to add a 5% buffer at the end.