Before I dive into this final post on Internet security and privacy, I’d like to point you to a U.S. government website I discovered only this week, OnGuard Online, that contains a lot of useful information about Internet safety. If you’re interested, you might investigate and bookmark it for later exploration.
I want to conclude this series by talking a bit about anonymity and something known as “reidentification.”
Promises of anonymity can be misleading and are anything but absolute guarantees. In a 2000 study, Latanya Sweeney showed that 87 percent of Americans could be uniquely identified using only three pieces of demographic data: sex, ZIP code, and birth date. Because public voter lists contain those same three attributes alongside names, anyone with some technical skills could correlate the two datasets and link “anonymized” medical data to a particular name. The term for this linking is reidentification.
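At its core, this kind of linkage attack is just a database join on the shared demographic fields. Here’s a minimal Python sketch using made-up records (not Sweeney’s actual data or method) to show how removing names alone fails when those three quasi-identifiers remain:

```python
# Illustrative reidentification via a join on quasi-identifiers.
# All records below are invented for this example.

# "Anonymized" medical data: names removed, but sex, ZIP, and
# date of birth remain.
anonymized_medical = [
    {"sex": "F", "zip": "02138", "dob": "1945-07-31", "diagnosis": "hypertension"},
    {"sex": "M", "zip": "02139", "dob": "1960-01-15", "diagnosis": "asthma"},
]

# A public voter list: the same three attributes, plus names.
voter_list = [
    {"name": "Jane Doe", "sex": "F", "zip": "02138", "dob": "1945-07-31"},
    {"name": "John Roe", "sex": "M", "zip": "02139", "dob": "1960-01-15"},
]

def reidentify(medical, voters):
    """Join the two datasets on (sex, ZIP, date of birth)."""
    index = {(v["sex"], v["zip"], v["dob"]): v["name"] for v in voters}
    matches = []
    for record in medical:
        key = (record["sex"], record["zip"], record["dob"])
        if key in index:
            # A match ties a name back to a supposedly anonymous diagnosis.
            matches.append((index[key], record["diagnosis"]))
    return matches

print(reidentify(anonymized_medical, voter_list))
```

When the three attributes uniquely identify a person, as they do for most Americans, every match attaches a name to a “de-identified” medical record.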
The Electronic Privacy Information Center (EPIC) defines reidentification as
…the process by which anonymized personal data is matched with its true owner. In order to protect the privacy interests of consumers, personal identifiers, such as name and social security number, are often removed from databases containing sensitive information. This anonymized, or de-identified, data safeguards the privacy of consumers while still making useful information available to marketers or datamining companies. Recently, however, computer scientists have revealed that this “anonymized” data can easily be re-identified, such that the sensitive information may be linked back to an individual. The re-identification process implicates privacy rights, because organizations will say that privacy obligations do not apply to information that is anonymized, but if the data is in fact personally identifiable, then privacy obligations should apply.
At Tech.Pinions, Steve Wildstrom writes,
For the past several years, a highly technical but very important debate has raged among privacy experts: How easy is it to identify an individual from a collection of data that supposedly lacks personally identifiable information?
A centerpiece of the debate is a 1997 incident in which Latanya Sweeney, then an MIT graduate student and now a computer scientist at Harvard, identified the medical records of Massachusetts Governor William Weld from information publicly available in a state insurance database. The incident led to important changes in privacy rules for medical information, especially under the Health Insurance Portability and Accountability Act (HIPAA), and 15 years later it is still influencing the debate over data privacy.
By default, browser and mobile software don’t protect against data collection. Only a small fraction of Internet users install simple but powerful browser add-ons such as DoNotTrackMe or Ghostery to block tracking via cookies on personal computers. Even those add-ons can’t prevent the many other forms of tracking, and mobile devices don’t allow their installation in any case.
There is no regulatory infrastructure set up to monitor collection, aggregation and trading of consumer information. Privacy laws are no guarantee of anonymity. For example, despite HIPAA, it isn’t too difficult to determine a lot about an individual’s health and medical history just by looking at his or her routine purchases and activities. If the amount is large enough, collected and aggregated non-confidential information can violate privacy every bit as much as disclosure of confidential information does. Resistance to aggregation of our information has been mostly temporary — and mostly focused on a particular instance du jour that makes headlines.
Back in 2007, Facebook launched Beacon, which placed an invisible “bug” on the websites of more than 40 “partners” (among them Sony Pictures, eBay, Epicurious, the New York Times, and Travelocity), letting Facebook see everything its users did on the partner sites and associate that activity with their Facebook accounts, whether or not they were logged in. When someone purchased an item from Overstock.com, for example, that purchase would appear on the person’s Facebook wall, and in the News Feed of that person’s friends. Facebook users were opted in to Beacon without being asked, and had to manually turn it off. After an outcry from Facebook users, Beacon was shut down in October 2009, and Facebook subsequently settled a class-action lawsuit in 2012 for $9.5M that alleged Beacon breached federal wiretap and video-rental privacy laws.
But Facebook didn’t abandon Beacon’s goals. Through “like” buttons, requirements at online publications to register with your Facebook ID before commenting, and third-party cookies, Facebook can still monitor much of the online activity that Beacon was supposed to capture. And we consumers still mostly aren’t aware of this monitoring.
On Facebook, things are more available by default than people may think. But even beyond specifically public settings, actions and photos that were once lost in the “sands of Timeline” are now more easily discoverable by strangers with loose ties, forcing us to reassess what we actually think is private and what is not.
There are many more examples, but I think you have the idea, so I won’t belabor it. Reidentification and collection of our personal information happen every time we go online. I urge you to be careful online, to install tracking blockers, and to adjust your Facebook privacy settings and then review them often. A good guide is here. Sites like the Electronic Frontier Foundation’s provide a wealth of information on staying safe online.
As always, please feel free to discuss this, or any other topic, in the comments. It’s Friday Free for All!
Photo by Kevin Dooley, licensed under the Creative Commons Attribution 2.0 Generic license.