
Facebook’s recent initial public offering of stock and the subsequent dramatic loss in the value of its shares (from an initial $38 in May to just over $21 in early August) have re-energized the discussion about the prospects of making money from the compilation and processing of vast amounts of personal data.

A good overview of what is at stake is available in the July/August edition of Technology Review; see especially the feature article by Tom Simonite and the review article by Michael Wolff.

Facebook, like Google, relies on advertising for revenue, but it lacks Google’s control of “the space where a buyer searches for a thing and where a seller hawks that thing” (Wolff, p. 71). Moreover, the growing use of Facebook on mobile devices has further decreased its value as an advertising platform.

What Facebook hopes, and with it those who have bought its shares, is that it will instead be able to use the partly anonymized information it holds on its users to develop applications tailored to the needs of advertising agencies. Another possibility is to capitalize on the social psychology of friendship to nudge specific behaviors, such as organ-donor registration, this time for money.

The ethics of both options are, at best, problematic, but before even addressing those, Facebook might stumble over much more mundane issues related to making sense of social statistics, which is far more difficult than is generally thought.

At the most basic level, what data miners are looking for are statistically significant correlations, whether between socio-demographic characteristics, behavioral patterns, or both. In this respect, the gigantic number of Facebook users is considered a major plus. One big problem of traditional social surveys is their limited size, which poses barriers to identifying and exploring two- and three-way interactions between variables once standard characteristics have been controlled for.
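A rough back-of-the-envelope sketch of why survey size bites here: a full cross-tabulation grows multiplicatively with each variable added, so even a fairly large survey quickly runs out of respondents per cell. The variables and category counts below are hypothetical, chosen only for illustration.

```python
# Hypothetical survey: how cross-tabulation cells multiply as
# variables are added, and how thin the sample gets per cell.
sample_size = 2_000  # a reasonably large traditional survey
levels = {"gender": 2, "age_band": 6, "education": 5,
          "income_band": 5, "region": 10}

cells = 1
for var, k in levels.items():
    cells *= k  # each new variable multiplies the number of cells
    print(f"+ {var:<12} -> {cells:>5} cells, "
          f"~{sample_size / cells:6.1f} respondents per cell")
```

With all five variables crossed there are 3,000 cells for 2,000 respondents, i.e. fewer than one respondent per cell on average, which is why interactions beyond two or three variables are effectively out of reach for a survey of this size.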

Overcoming the sample size problem, however, does not automatically lead to aha-like enlightenment. Often the opposite is the case. Once you have solved the sample size problem, you will discover that basically everything correlates with everything else to some degree, and that the size of a correlation is, in itself, not very informative.
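The point can be made concrete with the standard t-test for a Pearson correlation: the smallest correlation that counts as “statistically significant” shrinks toward zero as the sample grows. A minimal sketch, using the usual large-sample normal approximation (z ≈ 1.96 for two-sided p < .05):

```python
import math

def critical_r(n, z=1.96):
    """Smallest |Pearson r| that is 'significant' at ~p < .05 (two-sided).

    From the t-test t = r * sqrt(n - 2) / sqrt(1 - r**2), setting t = z
    and solving for r gives r = z / sqrt(n - 2 + z**2).
    """
    return z / math.sqrt(n - 2 + z * z)

for n in (100, 10_000, 1_000_000, 100_000_000):
    print(f"n = {n:>11,}: any |r| above {critical_r(n):.5f} is 'significant'")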

This situation resembles that of moving from a stage where one wears opaque glasses to that of pulling the shades and suddenly being blended by the sun. It is the challenge currently also faced by emerging S&T fields such as systems biology and genetics.

An important lesson in social statistics is that of formulating questions. Data, once you have it, is great. But what does it tell you? Well, nothing, unless you ask it something. In other words, there are millions of interconnections, yet the majority are meaningless, a sort of white cloud noise. Making out what is relevant and what is not, and doing so at the generic, hence, transferable level, is the difficult part of scientific analysis.