33 Bits of Entropy

Article about Netflix paper in law journal

David Molnar pointed me to an article in the Shidler Journal of Law that prominently cites the Netflix dataset de-anonymization paper. I’m very happy to see this; when we wrote our paper, we were hoping to see the legal community analyze the implications of our work for privacy laws. As the article notes:

Re-identification of anonymized data with individual consumers may expose companies to increased liability. If data is re-identified, this may be due to the failure of companies to take reasonable precautions to protect consumer data. In addition, companies may violate their own privacy policies by releasing anonymous information to third parties that can be easily re-identified with individual users.

New lines will need to be drawn defining what is acceptable data-release policy, and in a way that takes into account the actual re-identification risk instead of relying on syntactic crutches such as removing “personally identifiable” information. Perhaps there will need to be a constant process of evaluating and responding to continuing improvements in re-identification algorithms.

Perhaps the ability of third parties to discover information about an individual’s movie rankings is not too disturbing, as movie rankings are not generally considered to be sensitive information. But because these same techniques can lead to the re-identification of data, far greater privacy concerns are implicated.

Indeed, since we wrote our paper, there have been several high-profile cases in the news or in the courts where our re-identification techniques could be used to cause much more sensitive privacy breaches, including the Google-Viacom lawsuit involving YouTube viewer logs and the targeted advertising companies Phorm and NebuAd. While the lessons of our paper have begun to propagate “downstream” to the realms of law, advocacy and policy, they have come too late to make a difference in the above examples.

Part of the reason why I started this blog is in the hope of accelerating this process by reaching out to people outside the computer science community. While our papers might be couched in technical language, the results of our research are general enough to be easily accessible to a broad audience, and I hope that this blog will become a central point for disseminating information more broadly.

September 30, 2008 at 9:46 pm

What this blog is about

From the About page:

This is a blog about my research on privacy and anonymity. The title refers to the fact that there are only 6.6 billion people in the world, so you only need 33 bits (more precisely, 32.6 bits) of information about a person to determine who they are.
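As a quick check of that arithmetic, here is a minimal sketch in Python. The function name `bits_to_identify` is made up for illustration, and the only input is the 6.6 billion population figure quoted above.

```python
import math

def bits_to_identify(population: int) -> float:
    """Bits of information needed to single out one member of a population."""
    return math.log2(population)

world = 6_600_000_000  # population figure used in the text above

print(round(bits_to_identify(world), 1))   # 32.6
print(math.ceil(bits_to_identify(world)))  # 33
```

Rounding up gives 33 because 2^32 (about 4.3 billion) is too few distinct values to cover everyone, while 2^33 (about 8.6 billion) is enough.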

This fact has two related consequences. First, a lot of traditional thinking about anonymous data relied on the fact that you can hide in a crowd that’s too big to search through. That notion completely breaks down given today’s computing power: as long as the bad guy has enough information about his target, he can simply examine every possible entry in the database and select the best match.
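To make the "examine every entry and select the best match" idea concrete, here is a toy sketch in Python. It is not the algorithm from our paper: the scoring rule (count of exactly matching attributes), the function name `best_match`, and the sample records are all assumptions chosen for illustration.

```python
def best_match(database: dict, known: dict) -> str:
    """Scan every record and return the ID whose attributes agree most with `known`."""
    def score(record: dict) -> int:
        # Count how many of the attacker's known facts this record agrees with.
        return sum(1 for key, value in known.items() if record.get(key) == value)

    # A plain linear scan over the whole database; no clever indexing needed.
    return max(database, key=lambda record_id: score(database[record_id]))

# Hypothetical "anonymized" records: pseudonymous ID -> {item: rating}
database = {
    "user_1": {"Movie A": 5, "Movie B": 2, "Movie C": 4},
    "user_2": {"Movie A": 5, "Movie B": 1, "Movie D": 3},
    "user_3": {"Movie B": 2, "Movie C": 1, "Movie D": 3},
}

# What the attacker happens to know about the target
known = {"Movie A": 5, "Movie B": 1}

print(best_match(database, known))  # user_2
```

The point is only that the scan is linear in the size of the database, so a crowd of millions or even billions of records offers no protection by size alone.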

The second consequence is that 33 bits is not really a lot. If your hometown has 100,000 people, then knowing your hometown narrows the field from 6.6 billion candidates to 100,000, which gives me about 16 bits of information about you, and only about 17 bits remain. But the real danger is that information about a person’s behavior, which was traditionally not considered personally identifying, can be used to cause serious privacy breaches in a variety of different contexts.
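The hometown example can be worked through the same way. This is a minimal sketch, assuming the 6.6 billion and 100,000 figures from the text; the function names are mine.

```python
import math

def bits_learned(population: int, group_size: int) -> float:
    """Information gained by learning the target belongs to a group of `group_size`."""
    return math.log2(population / group_size)

def bits_remaining(group_size: int) -> float:
    """Information still needed to single out one person within the group."""
    return math.log2(group_size)

world = 6_600_000_000
hometown = 100_000

print(round(bits_learned(world, hometown), 1))  # 16.0
print(round(bits_remaining(hometown), 1))       # 16.6
```

The two quantities always sum to the total (about 32.6 bits here); the "17 bits remain" figure is what you get from the rounded numbers, 33 minus 16.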

This blog will announce, explain and elaborate on my research as it relates to the above theme. I will also use it as an outlet for my opinions on the broader technical, policy, business and social issues related to my work.

Serious content coming soon. In the meantime, grab the RSS feed.

September 29, 2008 at 11:41 pm
