
Bank Puts a Billion Transaction Records Behind Analytics Site

schliz writes "Australia's UBank has put a billion real-world transaction records behind a website that allows users to compare their spending habits with others of the same gender, in the same age/income range, neighborhood and living situation. The 'PeopleLikeU' tool surfaces favorite shops and restaurants surprisingly accurately — because it's based on real customers' transactions, it lists places like good takeout joints that wouldn't normally come to mind when you think of a favorite place to eat. The bank says all data was 'deidentified' and it consulted with privacy authorities."
This discussion has been archived. No new comments can be posted.

  • Privacy (Score:3, Insightful)

    by girlintraining ( 1395911 ) on Thursday November 08, 2012 @08:14PM (#41926563)

    The problem with 'anonymizing' the data is that while it might seem safe today, tomorrow a separate database showing a different subset of the same data source, or trace information, etc., may turn up which, when combined with it, can re-pair and de-anonymize it.

  • by Beryllium Sphere(tm) ( 193358 ) on Thursday November 08, 2012 @08:15PM (#41926579) Journal

    Especially in small samples, like the size of a neighborhood.

  • de-identified (Score:5, Insightful)

    by whois ( 27479 ) on Thursday November 08, 2012 @08:28PM (#41926665) Homepage

    Remember when it was discovered that the plugins you have installed in your browser, and which browser you were using, could almost identify who you were? That's how I felt as I answered questions on the site and saw the number of matches dwindle. I'm not even an AU resident; I just answered truthfully up until it asked for the city, and by then it had narrowed down to ~20000 matches for "people like me."

    If you assume that one of those 20000 is me, and that I live in a small town then the number might get even closer to just 1. And once you factor in any other data that might correlate behind the scenes it's not hard to figure out who's who.

    Remember the anonymous Netflix data that they figured out how to de-anonymize? Same deal. If you're an AU resident, the data is there to uniquely identify you; they have just made a bet with the internet that people won't be able to do so.
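The dwindling match count described above can be illustrated with a rough sketch. Everything here is an assumption made for illustration: the population is synthetic, the field names and value lists are made up, and nothing reflects how the PeopleLikeU site or the bank's data actually works. It only shows how quickly a handful of ordinary attributes shrinks a large crowd.

```python
import random

random.seed(0)

# Hypothetical synthetic population; every field and value below is made up.
GENDERS = ["f", "m"]
AGE_BANDS = ["18-24", "25-34", "35-44", "45-54", "55-64", "65+"]
INCOME_BANDS = ["low", "mid", "high"]
LIVING = ["alone", "couple", "family", "shared"]
TOWNS = [f"town{i}" for i in range(500)]

population = [
    {
        "gender": random.choice(GENDERS),
        "age": random.choice(AGE_BANDS),
        "income": random.choice(INCOME_BANDS),
        "living": random.choice(LIVING),
        "town": random.choice(TOWNS),
    }
    for _ in range(200_000)
]


def narrowing(people, answers):
    """Return the number of remaining matches after each answer is applied."""
    counts = []
    for field, value in answers:
        people = [p for p in people if p[field] == value]
        counts.append(len(people))
    return counts


# Pick one record and answer the questions as that person would.
me = population[0]
answers = [(k, me[k]) for k in ("gender", "age", "income", "living", "town")]
counts = narrowing(population, answers)

for (field, _), n in zip(answers, counts):
    print(f"after {field:>7}: {n} matches")
```

With uniformly random attributes the last step usually leaves only a handful of records, and real data is lumpier than this, so small towns would tend to do worse.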

  • Re:Privacy (Score:5, Insightful)

    by fatphil ( 181876 ) on Thursday November 08, 2012 @09:50PM (#41927417) Homepage
    Not necessarily true, assuming they anonymised *correctly*. I believe Helger Lipmaa (University of Tartu, he of the world's fastest software AES implementation) has at least one paper on anonymising large data sets. Basically, you randomise the data: perturb every datum by a delta drawn from a symmetric, and not too wild, distribution. The law of large numbers tells you that the mean perturbation taken over the whole set tends to 0, and the standard deviation of the noise left in that mean shrinks in inverse proportion to the square root of the sample size (by the central limit theorem, the averaged noise is approximately normally distributed). So if you're averaging over 10000 gay democrat-voting degree-educated males, the average computed from the anonymised data will typically be off from the real value by about 1% of the noise's standard deviation (i.e. sqrt(1/10000)). Average just over "humans", and the error could be so small it's below the noise floor. The process is, if you do it correctly, irreversible, as the true data isn't even in the system, so can't be extracted no matter how many different queries you perform.
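A minimal sketch of the perturb-then-average idea from the comment above, under stated assumptions: the noise is Gaussian with a made-up standard deviation `NOISE_SD`, the "true" values are random numbers, and the function names are hypothetical. This is not taken from any paper or from the bank's process; it only checks the claim that the error in the mean shrinks like one over the square root of the sample size.

```python
import random
import statistics

random.seed(1)

NOISE_SD = 50.0  # assumed spread of the perturbation, in the data's units


def perturb(values, sd=NOISE_SD):
    """Add independent zero-mean Gaussian noise to every value."""
    return [v + random.gauss(0.0, sd) for v in values]


def mean_error(n):
    """Absolute difference between the true mean and the perturbed mean."""
    true_values = [random.uniform(0.0, 1000.0) for _ in range(n)]
    noisy = perturb(true_values)
    return abs(statistics.fmean(noisy) - statistics.fmean(true_values))


for n in (100, 10_000):
    expected_scale = NOISE_SD / n ** 0.5  # standard error of the averaged noise
    print(f"n={n}: error {mean_error(n):.3f}, expected scale {expected_scale:.3f}")
```

For n = 10000 the expected scale is NOISE_SD / 100, which is the 1% figure in the comment. Individual perturbed records stay noisy, while the group average stays close to the truth.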
