Data breaches and stolen credit cards are two of the ways people experience identity theft, and keeping personal information private is a key part of guarding against it. However, personal information isn't the only thing that can identify us. A study by MIT researchers, recently published in the journal Science, found that our private information may not be all that private: just a few vague data points about you may be enough to identify who you are.
The researchers looked at three months of credit card transactions made by 1.1 million users and found that the dates and locations of just four purchases were enough to identify 90 percent of the people in the data set. When the researchers included rough information about purchase prices, the identification rate rose to 94 percent. Someone with a receipt, a picture of you enjoying a purchase and a social media post about a different purchase would likely be able to pick your credit card record out of a million others, even without any of your personal information, such as your name, address or credit card number.
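The idea behind that 90 percent figure can be sketched in a few lines of Python. This is only an illustration, not the researchers' code: the function name `unicity`, the `(user_id, day, shop)` record layout and the toy data are all assumptions made for the example. For each user, the sketch draws a handful of that user's own purchases at random and checks whether any other user's record contains the same combination.

```python
import random
from collections import defaultdict


def unicity(transactions, points=4, seed=0):
    """Estimate the share of users singled out by `points` known purchases.

    `transactions` is an iterable of (user_id, day, shop) tuples. This layout
    is a hypothetical stand-in for the study's data, not its real format.
    """
    rng = random.Random(seed)

    # Group each pseudonymous user's purchases into a set of (day, shop) points.
    traces = defaultdict(set)
    for user_id, day, shop in transactions:
        traces[user_id].add((day, shop))

    # Only users with at least `points` distinct purchases can be tested.
    eligible = [u for u, trace in traces.items() if len(trace) >= points]
    if not eligible:
        return 0.0

    unique = 0
    for user in eligible:
        # Pretend an outsider knows `points` of this user's purchases.
        known = set(rng.sample(sorted(traces[user]), points))
        # Every user whose trace contains all of the known points is a candidate.
        candidates = [u for u, trace in traces.items() if known <= trace]
        if candidates == [user]:
            unique += 1
    return unique / len(eligible)


# Made-up toy data: users "a" and "c" shopped at the same places on the same days.
toy = [
    ("a", 1, "bakery"), ("a", 2, "gym"),
    ("b", 1, "bakery"), ("b", 2, "cafe"),
    ("c", 1, "bakery"), ("c", 2, "gym"),
]
print(unicity(toy, points=2))
```

In the toy data only user "b" has a combination of purchases that nobody else shares, so one user in three is uniquely identifiable. On a real data set the same check would be far more expensive, and the study's own procedure may differ in its details.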
Yves-Alexandre de Montjoye, an MIT graduate student in media arts and sciences, is the first author of this study and of an earlier study that examined cell phone records and found similar results. "If we show it with a couple of data sets, then it's more likely to be true in general," said de Montjoye in a statement. "Honestly, I could imagine reasons why credit-card metadata would differ or would be equivalent to mobility data."
Three other researchers joined in on the credit card study. In their analysis, they looked at when purchases were made, the name and location of the store where the item was purchased and how much it cost. All purchases made with the same credit card were given the same identification number, which stood in for the customer.
The research team even stripped the variables down to the dates and the types of stores the credit card holder visited. They found that if they could tie an identification number to a couple of purchases, and no one else went to those same two stores on the same days, they could identify all of that user's other purchases. With 60 percent of purchases in the US being made with a credit card, it can quickly become easy to figure out a lot about a person.
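The step from "a couple of known purchases" to "all of the user's other purchases" amounts to a simple lookup, sketched below in Python. The function name `reidentify`, the `(pseudonym, day, shop, price)` layout and the sample records are hypothetical and invented for illustration; the study's actual data and tooling are not described in this article.

```python
from collections import defaultdict


def reidentify(transactions, known_points):
    """Return every pseudonym whose record contains all of `known_points`.

    `transactions` holds (pseudonym, day, shop, price) tuples and
    `known_points` holds (day, shop) pairs learned from outside sources.
    Both layouts are assumptions made for this sketch.
    """
    # Collect the full purchase history behind each identification number.
    history = defaultdict(list)
    for pseudonym, day, shop, price in transactions:
        history[pseudonym].append((day, shop, price))

    known = set(known_points)
    # Keep only the pseudonyms whose (day, shop) pairs cover every known point.
    return {
        pseudonym: rows
        for pseudonym, rows in history.items()
        if known <= {(day, shop) for day, shop, _ in rows}
    }


# Made-up records for two pseudonymous card holders.
records = [
    ("u1", "2014-01-03", "bakery", 4.50),
    ("u1", "2014-01-05", "restaurant", 32.00),
    ("u1", "2014-01-09", "pharmacy", 12.00),
    ("u2", "2014-01-03", "bakery", 6.00),
    ("u2", "2014-01-05", "gym", 20.00),
]

# Two outside observations: the bakery on the 3rd and the restaurant on the 5th.
matches = reidentify(records, [("2014-01-03", "bakery"), ("2014-01-05", "restaurant")])
for pseudonym, rows in matches.items():
    print(pseudonym, rows)
```

With both observations only "u1" remains, and the pharmacy purchase that the observer never saw comes along with the match. With the bakery visit alone, both pseudonyms would still be candidates, which is why a second point matters.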
The team also studied which types of credit card users were easier to reidentify through their metadata. They had an easier time identifying women, as well as higher income earners. Across income levels the effect was graduated: higher income earners were easier to pick out than mid-level income earners, who were in turn easier to pick out than low-level income earners.
For credit card users, these results mean they could easily be targeted by what de Montjoye calls "correlation attacks," in which purchase metadata is combined with outside information, such as posts from social media sites.
John Bohannon, a contributing correspondent for Science, wrote in the journal about a correlation attack in New York, when the city's Taxi and Limousine Commission released data containing times, routes and cab fares for 173 million rides. Zealous celebrity hounds were able to take that data and match it to pictures of celebrities taking taxis, and they were able to figure out when a celebrity paid a fare.
These studies sound an alarm about the potential effects of large data sets. While data can be useful for research purposes, it can also compromise a person's privacy, since anyone can piece together a purchasing history to pinpoint who that person is.