Hell is other people: Social networks and the impossibility of informational self-determination
In this highly thought-provoking talk, Ulrik Brandes, Professor of Social Networks at the ETH Department of Humanities, Social and Political Sciences, discussed how data modelling and network-based inference can make it difficult for individuals to protect their privacy, even if they choose to disclose as little personal data as possible.
by Najmeh Karimian-Marnani & Florian Dorner
Privacy has received increasing attention over the course of the 20th century, as states and companies have become able to collect and process ever greater amounts of data. While such data can inform better policies or help companies understand demand and consumer attitudes, it also lends itself to nefarious uses, such as the surveillance of individual citizens or targeted propaganda. Against the backdrop of these tensions, the German Federal Constitutional Court coined the term 'informational self-determination' in a 1983 ruling on censuses; it refers to each individual's authority to decide on the disclosure and use of their own personal data. More recently, the advent of the internet and social media has made it easier than ever to collect and distribute data, which has led the European Union to establish the more stringent General Data Protection Regulation (GDPR).
The GDPR explicitly addresses the processing of information, which includes data modelling and inference from data. Data modelling often exploits empirically observed individual, social and cultural regularities in the available data, extrapolating from them to estimate data points that would otherwise be hard or expensive to obtain. For example, voters from rural areas are more likely to vote Republican in US elections, and support for a recent Swiss referendum on keeping housing affordable was stronger in areas with high housing costs. Regularities like these make it easy to predict individual political preferences from demographic data.
A particularly salient example of data modelling is the scandal around Cambridge Analytica. In 2018, it was revealed that the British firm had used data collected through a seemingly academic mobile application to access users' Facebook profiles, and even those of their connections. The data was used to predict users' personalities for more effective microtargeting, including political messaging linked to the 2016 US presidential campaigns of Donald Trump and Ted Cruz and the 2016 Brexit referendum. The scandal received worldwide media attention, further fuelled by the fact that the app had access to the profiles of its users' Facebook 'friends'; even non-users thus unknowingly had their data shared.
The Cambridge Analytica scandal already hints at how inferences about people can be made without their consent in the context of social networks. Surprisingly, this often does not require information that individuals are unwilling to share; personal data about their connections can already provide a lot of information due to commonly observed regularities in network structures.
To explain what is meant by regularities in the data, Professor Brandes introduced the concept of homophily: those who have strong connections in real or virtual social networks are likely to share many characteristics and behaviours. This is due to two complementary processes, social selection and social influence. Social selection describes an individual's propensity to pick social partners based on their attributes, often choosing to connect with those who are similar to them. Conversely, our individual attributes and opinions are shaped by our social connections, often making us more similar to the people we are closely connected to over time (social influence).
While the 'strength' or closeness of social connections is usually not recorded directly in social network data, it can often be inferred using methods based on the work of German sociologist Georg Simmel: the combination of social selection and influence also makes us more likely to share a larger number of contacts with people we are closely connected to, so that strong ties between two persons usually correspond to a dense network around them. Thus, the more contacts two people share, the more confidently we can infer one person's characteristics from the other's information.
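The mechanism above can be sketched in a few lines of code. This is an illustrative toy example, not a method presented in the talk: the network, the names and the attribute values are all invented, and shared-contact counting stands in for the Simmelian notion of tie strength.

```python
# Toy sketch: infer tie strength from shared contacts, then predict a
# hidden attribute of one person from the (weighted) attributes of
# their neighbours. All data below is made up for illustration.
from collections import Counter

# Undirected friendship network as adjacency sets.
friends = {
    "alice": {"bob", "carol", "dave"},
    "bob":   {"alice", "carol", "dave"},
    "carol": {"alice", "bob", "eve"},
    "dave":  {"alice", "bob"},
    "eve":   {"carol"},
}

# Known political leanings; Alice has disclosed nothing about herself.
leaning = {"bob": "red", "carol": "red", "dave": "blue", "eve": "blue"}

def tie_strength(a, b):
    """Simmelian proxy: strong ties are embedded in many shared contacts."""
    return len(friends[a] & friends[b])

def predict(person):
    """Weighted neighbour vote: each neighbour with a known attribute
    contributes 1 plus the number of contacts shared with `person`."""
    votes = Counter()
    for neighbour in friends[person]:
        if neighbour in leaning:
            votes[leaning[neighbour]] += 1 + tie_strength(person, neighbour)
    return votes.most_common(1)[0][0] if votes else None

print(predict("alice"))  # prints "red": her strongest ties lean red
```

Even though Alice shares no data of her own, her contacts' disclosures and the density of shared connections around her suffice for a prediction, which is precisely the point of the talk.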
Digital self-defence is a group effort

Now, what if the information is not disclosed by Facebook or other social media sites to third parties? Well, these sites still have access to it themselves and could, in principle, misuse it. If, however, you decided to quit social media to hide your information completely, the same social regularities unfortunately still hold: from past contact lists, a plethora of valuable data can still be extrapolated to infer friendship ties, and thus an individual's characteristics, even in the absence of explicit information.
Let's consider what we have learnt: complete protection of personal data cannot be achieved by the individual alone; it requires broader societal awareness and the individual's ability to contest the use of inferred data. Each of us needs to take responsibility for our own privacy and that of our network to limit the exploitation of the regularities described above. While the best mechanisms for protecting our collective privacy in this increasingly digital world are debatable, one thing is clear: digital self-defence is a group effort.
To get a broader sense of the ISTP, our topics of interest and past seminars, visit our Colloquia page.