March 12, 2013
A study shows that what you ‘like’ on Facebook can predict, with remarkable accuracy, everything from your race to your sexual orientation, political affiliation and personality type.
Researchers studied more than 58,000 people who had volunteered to participate in the “myPersonality” application on Facebook, in which subscribers allowed access to their list of ‘likes,’ as well as the results of online personality tests that the scientists asked the participants to take. The researchers wanted to see whether such information, which is publicly available on many Facebook pages, could predict a number of aspects about Facebook users’ lives that they presumably kept to themselves, such as sexual orientation, ethnic origin, political views, religion, personality traits, substance use (including cigarettes, alcohol and drugs), and intelligence level.
When the researchers fed people’s “likes” into an algorithm, information hidden in the lists of favorites predicted whether someone was white or African American with 95% accuracy, whether a male participant was gay with 88% accuracy, and whether a participant was a Democrat or Republican with 85% accuracy. The ‘likes’ list predicted gender with 93% accuracy and age could be reliably determined 75% of the time. The pattern of online liking predicted drug use with 65% accuracy and whether someone was likely to drink alcohol with 70% accuracy.
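The article does not say which algorithm the researchers used, so the following is only a minimal sketch of the general idea: each user's "likes" become a row of yes/no features, and a classifier learns which pages tilt the odds toward a given trait. The choice of logistic regression, the page names, the labels and every function name here are assumptions made for illustration, not details taken from the study.

```python
import math

# Hypothetical toy data: (set of liked pages, binary trait label).
# Page names and labels are invented; they do not come from the study.
USERS = [
    ({"page_a", "page_b"}, 1),
    ({"page_a", "page_c"}, 1),
    ({"page_a"}, 1),
    ({"page_b", "page_d"}, 0),
    ({"page_d"}, 0),
    ({"page_c", "page_d"}, 0),
]


def build_vocab(users):
    """Return a sorted list of every page liked by at least one user."""
    return sorted({page for likes, _ in users for page in likes})


def to_vector(likes, vocab):
    """Encode a set of likes as a 0/1 vector; unknown pages are ignored."""
    return [1.0 if page in likes else 0.0 for page in vocab]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(x, weights, bias):
    """Weighted sum of the like-features plus a bias term."""
    return bias + sum(w * xi for w, xi in zip(weights, x))


def train(users, vocab, epochs=500, lr=0.5):
    """Fit logistic regression with plain stochastic gradient descent."""
    weights = [0.0] * len(vocab)
    bias = 0.0
    for _ in range(epochs):
        for likes, label in users:
            x = to_vector(likes, vocab)
            err = sigmoid(score(x, weights, bias)) - label
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def predict_proba(likes, vocab, weights, bias):
    """Estimated probability that a user with these likes has the trait."""
    return sigmoid(score(to_vector(likes, vocab), weights, bias))


vocab = build_vocab(USERS)
weights, bias = train(USERS, vocab)

# A user who only likes page_a leans toward the trait; page_d leans away.
print(predict_proba({"page_a"}, vocab, weights, bias) > 0.5)
print(predict_proba({"page_d"}, vocab, weights, bias) > 0.5)
```

In this toy data, "page_a" appears only among users with the trait and "page_d" only among those without it, so the model learns a positive weight for the first and a negative weight for the second. With tens of thousands of real users and pages, the same mechanism could pick up far subtler patterns, which is how a seemingly unrelated "like" can end up carrying predictive weight.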
“The most important thing that we found is that you can predict a very wide variety of individual traits and preferences based on seemingly simple and generic types of records of online behavior like Facebook ‘likes,’” says Michal Kosinski, director of operations of the Psychometrics Centre at Cambridge University in England, a consultant for Microsoft on machine learning and the lead author of the study published in the Proceedings of the National Academy of Sciences.
Some predictors were obvious: gay people were more likely to “like” anti-homophobia campaigns and Democrats liked Obama. Others matched common stereotypes: for example, gay men tended to like “Wicked The Musical” and MAC cosmetics, and smart people were fans of “Science.”
But other connections were more puzzling and may require deeper analysis to understand fully: liking “curly fries” or “thunderstorms,” for example, was strongly linked with high intelligence, while being a fan of the make-up store Sephora, the “I love being a mom” page or the Harley Davidson brand was linked with low intelligence. Being a heterosexual male was oddly linked with liking “being confused after waking from naps.”
The links could be completely random or, alternatively, related to interactions that aren’t yet obvious, says Kosinski. But the benefit of such unbiased data crunching is that the associations may reveal relationships between preferences and behaviors that don’t always seem logical. “The thing about a computer is that it might be completely politically incorrect,” says Kosinski. “It wasn’t handpicked, it was done by an unsupervised computer and it does not have any stereotypes.” (He says that he himself doesn’t like curly fries but does like Harley Davidsons.) For example, the curly fries connection, he says, “might be a cultural issue. Maybe there was a joke online [that circulated among] some kind of community of people of higher intelligence for some reason,” he suggests.
The study also found that ‘likes’ were strongly connected with some personality attributes, a link the researchers checked against the personality tests. Openness to experience, which involves being excited by variety and newness and being intellectually curious, was predicted almost as well by the ‘likes’ as by measuring it directly with a psychological test supported by prior research.
As intriguing as the revelations are, however, the research also reveals troubling implications for privacy. In the study, Kosinski cited a situation in which retailers used such big-data-based information to predict which consumers were pregnant, and then provided them with discounts and coupons on baby-related products. But in one case, an expectant teen who belonged to a culture in which pre-marital pregnancy isn’t accepted, and who had not told her family, found the flood of incentives intrusive and an invasion of privacy.
“Our results [also] show that these predictions could be potentially very intrusive,” says Kosinski, adding that he supports measures to avoid such problems. “There are technical ways to make sure that individuals have full control over their data, and technology can be designed in such a way that data cannot be abused.”
On the other hand, when it comes to research projects, the trove of data about human behavior now available online could spur new advances, even suggesting new areas of research.
That seemingly odd link between being a straight man and post-nap confusion, for example, could possibly result from some biological distinction between gay and straight men that was previously unknown. “This is just a hypothesis,” says Kosinski, noting that it could just as easily be a random result. Without this kind of data, however, it’s unlikely that anyone would even have thought to investigate such a possibility.
For now, the study clearly suggests that people should be aware of how companies and entities with whom they share “like” information may ultimately use this data — and more needs to be done to protect consumers from some of the more unscrupulous applications of such information. But understanding all the hidden meanings behind our desires could also end up teaching us a lot about ourselves.