Differential Privacy
A few years back, companies like Facebook were advancing the notion that we live in a post-privacy age, but they seem to have back-pedaled a bit since then. Then the GDPR came along, posing some difficult challenges for companies doing business in the European Union, but that’s a subject for another blog post.
Either way, privacy remains an important consideration whenever data is released as downloads or reports, whether to protect the commercial interests of the data provider or the identities of individuals represented in the data set. After many exploits, it has become clear that simply anonymizing the identifiers in the data still leaves it open to what are called linkage attacks: identifying individuals by joining the released data with other data sets. One promising approach is differential privacy, which adds carefully calibrated random noise so that what is released reveals almost nothing about whether any one individual is in the data. The TWiML & AI podcast features a conversation with Aaron Roth which provides an accessible introduction to the subject.
Analysing the privacy guarantees of noised data exports remains challenging. The trick is to find a balance: enough noise to conceal identities effectively, yet little enough that the data still supports useful analysis. Claro Data Science is working to make that a reality for some of our clients.
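To make the balance between concealment and usefulness a little more concrete, here is a minimal sketch of the Laplace mechanism, one standard way of adding noise under differential privacy. It is an illustration under stated assumptions, not a description of any particular production system: the function names, the synthetic age records, and the choice of a simple counting query (which has sensitivity 1) are all hypothetical and chosen only for this example.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a zero-mean Laplace distribution with the given scale."""
    # The difference of two independent exponential variables, each with
    # mean `scale`, follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng):
    """Return a noisy count of the records that satisfy `predicate`.

    Assumption: adding or removing one person changes a count by at most 1
    (sensitivity 1), so Laplace noise with scale 1/epsilon is used.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


def mean_abs_error(epsilon, trials=2000, seed=0):
    """Average absolute error of the noisy count over repeated releases."""
    rng = random.Random(seed)
    # Synthetic, made-up records: one person for each age from 18 to 89.
    records = [{"age": age} for age in range(18, 90)]

    def is_senior(record):
        return record["age"] >= 65

    true_count = sum(1 for record in records if is_senior(record))
    total_error = 0.0
    for _ in range(trials):
        noisy = private_count(records, is_senior, epsilon, rng)
        total_error += abs(noisy - true_count)
    return total_error / trials


if __name__ == "__main__":
    # Smaller epsilon means stronger privacy but a noisier, less useful answer.
    for eps in (0.1, 1.0, 10.0):
        print(f"epsilon={eps}: mean absolute error ~ {mean_abs_error(eps):.2f}")
```

Running the script shows the trade-off directly: the error in the released count grows as epsilon shrinks. Choosing epsilon, and accounting for it across many queries or a whole export, is where the real difficulty lies, and this sketch does not attempt either.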