How to test statistical models

Posted by Steven on January 29, 2017

Among the signs carried by the millions of people offering advice to the incoming American administration last week, I saw a photo of a child who had paraphrased a classic:

What do we want?
Evidence Based Decision Making

When do we want it?
After Peer Review

I’ve also just finished reading Cathy O’Neil’s excellent and timely book, Weapons of Math Destruction, which is a survey of the harms that big data brings to society. Of course, big data is just a tool, and like any tool it can be used for good or evil. O’Neil describes the combination of opacity, destructiveness, and scale as the hallmarks of a bad model, or in her words a “weapon of math destruction”. In her concluding recommendations, O’Neil references The Financial Modelers’ Manifesto, which includes the following oath.

The Modelers’ Hippocratic Oath
~ I will remember that I didn’t make the world, and it doesn’t satisfy my equations.
~ Though I will use models boldly to estimate value, I will not be overly impressed by mathematics.
~ I will never sacrifice reality for elegance without explaining why I have done so.
~ Nor will I give the people who use my model false comfort about its accuracy.
Instead, I will make explicit its assumptions and oversights.
~ I understand that my work may have enormous effects on society and the economy,
many of them beyond my comprehension.

Testing a statistical model is either immensely simple or almost impossible, depending on what you mean by testing. The models result from tools that test themselves: the input data is split into training and evaluation data sets, and the winning model is picked from a set of computer-generated candidates, all with little or no input from the data scientist. That’s not the testing process that we need to improve.
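To make that self-testing loop concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the synthetic data, the two candidate models (`fit_mean` and `fit_line`), the 70/30 split, and the use of mean squared error as the score. Real tooling generates far more candidates, but the shape is the same.

```python
import random


def fit_mean(train):
    """Hypothetical candidate 1: always predict the mean training target."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y


def fit_line(train):
    """Hypothetical candidate 2: least-squares line y = intercept + slope * x."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def mean_squared_error(model, rows):
    """Average squared prediction error of a fitted model over (x, y) rows."""
    return sum((model(x) - y) ** 2 for x, y in rows) / len(rows)


def pick_winner(rows, candidates, eval_fraction=0.3, seed=0):
    """Split rows into training and evaluation sets, fit every candidate on
    the training set, and return the name with the lowest evaluation error."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - eval_fraction))
    train, evaluation = shuffled[:cut], shuffled[cut:]
    scores = {
        name: mean_squared_error(fit(train), evaluation)
        for name, fit in candidates.items()
    }
    return min(scores, key=scores.get), scores


# Synthetic data (an assumption): a noisy straight line.
noise = random.Random(42)
rows = [(x, 2.0 * x + 1.0 + noise.gauss(0, 0.5)) for x in range(50)]

winner, scores = pick_winner(rows, {"mean": fit_mean, "line": fit_line})
print(winner, scores)
```

Notice what the loop never asks: whether the rows were the right inputs, or whether mean squared error was the right thing to minimise. It only ranks the candidates it was handed against the score it was given.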

I’m a data professional, and people often tell me that their model is too complicated for me to understand. Some modelers first advise against examining their work, then insist that nobody should, then complain when someone tries anyway. That is why so many organisations have key person dependencies on their modelers: they can’t run their operations after the key person leaves because they let that person hoard knowledge and gave them free rein. Who makes the decisions in your organisation? The executives, or the wonk who has a key algorithm named after them because all that anyone else knows about it is who made it? I’m not making that up, nor is it an isolated example in my experience. O’Neil repeats example after example of models that are destructive because they are opaque. If you’re using measurement to shine a light on your operation, why would you wield your measurement tool in a way that limits your ability to understand and improve it? What process is stopping your models from going rogue and becoming the puppet masters who make decisions your executives wouldn’t agree with if they understood them?

The sort of testing that we need to expand is peer review. Do not make decisions based on recommendations of models that are the work of a lone individual. Do not tolerate people who are not willing to explain their work. Ensure that you have a peer review process for a model’s inputs, its fitness function, its choice of algorithm, and every other aspect of its operation. For if you allow garbage in, you will certainly join the legions of people who get garbage out.

Posted in: Book Review, Theory

Tags: big data, peer review, Testing
