Whobbes.com/blog

Creative Scientist

Weapons of Math Destruction

A very good read about the application of data science and its potentially huge impact on our societies. The examples are easy to understand and the stories are interesting. There is some repetition of concepts, but the many different examples help show that these systems are present everywhere. The tone is kept light and it gives motivation to study the field.

87%


Summary

Book by Cathy O’Neil, who holds a PhD in mathematics (from Harvard; she studied at Berkeley as an undergraduate) and worked first as an academic and then in finance. She has a deep knowledge of statistics and data science and reviews how algorithms are currently used in many domains.

  • Three principal characteristics of WMD (Weapon of Math Destruction)

    • Opacity → the model is invisible, which means no feedback. E.g. recruiters use tools to scan CVs automatically and consider only 30% of applications in their search; the rest are discarded, and the applicants never learn why their CV did not lead to a call: content, style, or simply a font that the machine can’t scan?

    • Scale → thousands, possibly millions of people affected. This is especially true with the ubiquity of the Internet.

    • Damage → potential for a big impact on people’s lives. Damage can happen because models use a definition of success that is very loosely made and not based on detailed study. Often the starting point has dramatic implications for the outcome of the algorithm: the initial goal is set on a hunch, and the data then corroborate it as people try to fit the arbitrary model.

  • E-scores are becoming prevalent in many aspects of our lives: jobs, loans, insurance…

    • Regulation used to limit what information companies could require and use to build their models.

    • But now we see these limits being eroded, e.g. peer-to-peer lending startups now work very closely with banks. They can circumvent heavy regulations and create new loan products based on any data they can get hold of.

    • There is an idea in China to build a credit rating model based on social media, banking info, and any other available info in order to rate people’s worthiness. Data on individuals’ friends and their behaviours would also be included. More on this in an excellent article by Charlie Stross.

  • The main idea of models is to group people who act in the same ways, even if the correlation is purely a matter of luck. We are “placed in a thousand tribes”.

    • Corollary issue: because of past actions, the characteristics used by models carry connotations and create self-fulfilling prophecies, e.g. using zip codes for loan checks.
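
The zip code example can be made concrete with a toy simulation. This is only a minimal sketch of the feedback loop, not a model taken from the book: the function name `simulate_feedback`, the parameters `base_default` and `sensitivity`, and all the numbers are assumptions made up for illustration. Both zip codes have the same underlying default rate, but one starts with a slightly worse historical score, gets charged a premium, and the premium itself pushes observed defaults up at each retraining round.

```python
def simulate_feedback(initial_rates, base_default=0.05, sensitivity=0.5, rounds=5):
    """Return the model's estimated default rate per zip code after each round.

    Hypothetical toy model: every zip code defaults at `base_default` when
    charged a fair price. The lender charges a premium equal to the gap
    between the model's estimate and `base_default`, and that premium
    (amplified by `sensitivity`) raises the default rate actually observed.
    """
    history = [dict(initial_rates)]
    for _ in range(rounds):
        current = history[-1]
        updated = {}
        for zip_code, estimate in current.items():
            # Extra cost charged to borrowers the model considers risky.
            premium = max(0.0, estimate - base_default)
            # Costlier loans are harder to repay, so observed defaults rise.
            observed = min(1.0, base_default + premium * (1 + sensitivity))
            # The model is "retrained" on what it observed.
            updated[zip_code] = observed
        history.append(updated)
    return history


# Two zip codes with identical borrowers but different historical scores.
history = simulate_feedback({"zip_A": 0.05, "zip_B": 0.08})
for round_number, rates in enumerate(history):
    print(round_number, rates)
```

In this sketch the estimate for `zip_A` never moves, while the estimate for `zip_B` grows every round even though nothing about its residents differs: the model ends up confirming its own starting assumption.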

“Big data processes codify the past. They do not invent the future. Doing that requires moral imagination, and that is something only humans can provide. We have to explicitly embed better values into our algorithms.”

  • The conclusion covers how models can be used for good, for example to detect people in society who need assistance.
    • Models should ideally be open about their inputs, outputs and scoring.
    • We are inverting the U.S. motto of “out of many, one” and customizing our lives down to the individual, reducing empathy to technically zero!