Data Miners Scraping Away Our Privacy
Presto Vivace writes "Twig, writing for Corrente, reports on data scrapers. They are not looking for passwords and such; scrapers are looking at blogs and forums searching for material relevant to their corporate clients. We are assured that the information is 'anonymized' to protect the identities of forum participants. However, a tool called PeekYou permits users to connect online names with real world identities. No worries, though — if you have a week to spare, you can opt-out of some of the larger data banks."
It's not privacy, it's obscurity (Score:5, Insightful)
If it's accessible via public records - it's not private
If it occurs in a public forum - it's not private
If, for legal reasons, it must be disclosed in public - it's not private
If it's posted in a public space - it's not private ... and so on.
If someone were to compile that set of information in an easy-to-read form, complete with a table of contents and nice index, that is also not an invasion of privacy.
Using a computer to do the heavy lifting and reducing the time required to match everything together is also not an invasion of privacy.
Listen, if you're talking about the privacy of your public information, and you're threatened by search engines, you are relying on security through obscurity. At least the people here on Slashdot should recognize the folly of that.
"Opt out" (Score:5, Insightful)
Yes, I know, "Don't post data about yourself online!" That is not really the answer when most people think that Facebook is the way to be social. I do not have a Facebook profile, and I stay off of other social networking websites too; I am not going to pretend for a moment, though, that I am even close to representative of the norm. It is easy to make fun of all those "fools" out there who are undermining their own privacy, but in the end, that is not going to solve the problem, and eventually even people who want to have privacy will find that it is not possible to do so.
Re:It's not privacy, it's obscurity (Score:5, Insightful)
looks like the same crap as Pipl (Score:3, Insightful)
nothing more than what anyone can find about someone else online. one time a contractor ripped off my in-laws for $15,000 and it took my wife and me 3-4 hours to find his home, phone number, the fact that everything was in his wife's name, etc. cost $40 or so.
Opt-IN should ALWAYS be the default (Score:4, Insightful)
What are the consequences to opting out? (Score:4, Insightful)
Banks, insurance companies, etc. may end up using this kind of data to inform their risk management decisions. Eventually, that may mean that if they don't have this kind of data, you are risky by default. Look at what's happened with the credit bureaus. Technically they are opt-out. But if you actually opt out, you put yourself at such a tremendous disadvantage that you can't really do it. You are forced to let these people have all sorts of detailed personal information if you just want to live your life.
Perhaps we need some sort of data mining fifth amendment, where refusing to provide information cannot be used against you. But that's wishful thinking. In reality, people who just want to be left alone are probably going to be better off not opting out, as that would draw more attention than just blending into the crowd.
Re:Opt-IN should ALWAYS be the default (Score:4, Insightful)
Re:if you want it to be private (Score:4, Insightful)
if you don't want it to get in a database, DON'T PUT IT ON THE FREAKING INTERNET
If only it were that simple. How do you stop other people from posting it on some website somewhere, potentially without your knowledge? What about all those people who use Facebook and think that their privacy settings are equivalent to not posting information online, or that what they post on Facebook is only accessible to people they "friend"? Just saying, "Well you posted it online so it is your own fault," is not really an answer to the question.
(For the record, I do not use any social networking websites, I do not blog, and so forth. I still have to deal with everyone around me who does, though.)
Public exposure (Score:5, Insightful)
If I flash my privates in my house but have the curtains open so that anyone from the street can see, I cannot complain about people looking and might indeed be arrested myself.
If I do the same in a house separated from the road by a high fence and you put a ladder on the street and use night-vision goggles to look at my dangler, YOU are going to be arrested.
What is privacy? Is it the absolute letter of the law OR does EXPECTATION of privacy come into play?
You can follow me night and day. BUT that is very expensive and so you don't. So my actions in public are private simply because logging them would be far too costly. So I have come to expect that my actions in public are not constantly logged. Should this now change just because it has become possible to log them all? Should it be legal to record my every movement just because total CCTV surveillance has become feasible?
I do NOT know the answer to this question. On the one hand, I think that if you misbehave in public you should not have the right to complain "but I didn't expect anyone to catch me, so I should be free" BUT I also think that private companies being able to trace everyone constantly would be a REALLY bad idea.
If I ask on a forum about a health issue, should my insurance company be able to use this? I think not. Sure, if I am breaking the law, making false claims. But to deny people access because they think they might have a problem? No, that is going way too far.
Privacy is about more than things being recorded, it is about the idea that NOT everyone should constantly want to check up on everyone else. Just because I wrote a poem to a girl does NOT mean it has to be recorded by every private company in the world and be sold to the highest bidder.
Re:It's not privacy, it's obscurity (Score:5, Insightful)
What you're suggesting is that just because corporations now have the affordable tools necessary to spy on us constantly that we should deal with it and they should be allowed to do it. Which is complete bullshit.
The real answer is requiring companies to ask permission and barring them from trying to compel people to give that permission. It's one thing to require a drug test and background check for a job, but it's quite another to include in that background check data scraped off the net.
Re:"Opt out" (Score:4, Insightful)
YES. I personally hate opt-out schemes. I don't mind that my public data is public, but I hate being signed up for all sorts of BS and then being told it's my responsibility to go to a billion different "services" to tell them no.
Re:It's not privacy, it's obscurity (Score:3, Insightful)
The problem is that people have not yet awoken to the idea that old notions of privacy no longer apply. Until the majority of people realize that the game has changed, there will not be any meaningful regulation (why would anyone vote for it if they do not perceive a problem that needs to be solved?), nor will people switch to systems with stronger privacy guarantees.
Re:if you want it to be private (Score:5, Insightful)
Re:Hope you don't have a common name (Score:3, Insightful)
If "Bob Smith" is a registered sex offender in a large urban area, another Bob Smith in the same area might have some difficulty getting hired for a job. Perhaps the scrapers might see some revenue in selling "whitelist" services.
Don't even go there. How long before someone has the bright idea of creating suspect names just to be able to charge for an opt-out?
Re:Hope you don't have a common name (Score:4, Insightful)
If "Bob Smith" is a registered sex offender in a large urban area, another Bob Smith in the same area might have some difficulty getting hired for a job. Perhaps the scrapers might see some revenue in selling "whitelist" services.
You have that backwards. Hope you don't have an uncommon name. Almost no one has a unique name, but people tend to think any uncommon name is unique. Even worse are locally uncommon names that are common elsewhere.
Re:It's not privacy, it's obscurity (Score:2, Insightful)
Happened to a friend of mine (Score:3, Insightful)
He has a middle-of-the-road name - not exactly common, but not wildly inventive.
Just so happens that a man convicted of indecent assault against a minor has the same name and comes from the same county.
The worst thing to happen (so far) was that my friend's FB account was deleted, and he had to create a new one and fire a "WTF?" email at FB. It was all rather amusing and it didn't cause any lasting damage, but I haven't had the heart to take him to one side and say, "Dude, seriously, you were *lucky* that's all that happened..."
People are dumb, and computers are dumb, yet the two sets seem to trust each other far more than is warranted. *That's* where the problem lies.
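To make that concrete, here is a minimal sketch in Python, assuming a hypothetical matcher that links two records on name and county alone. The names, the county, the birth years and both functions are invented for illustration; nothing here reflects how Facebook or any real data broker actually matches people.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    # Hypothetical record layout; real databases carry far more fields.
    name: str
    county: str
    birth_year: int


def naive_match(a: Person, b: Person) -> bool:
    """Treat two records as the same person if name and county agree."""
    return (a.name.lower() == b.name.lower()
            and a.county.lower() == b.county.lower())


def stricter_match(a: Person, b: Person) -> bool:
    """Same as naive_match, but also require one more agreeing field."""
    return naive_match(a, b) and a.birth_year == b.birth_year


# Two different people who happen to share a name and a county.
offender = Person("John Example", "Exampleshire", 1961)
friend = Person("John Example", "Exampleshire", 1984)

print("naive match:   ", naive_match(offender, friend))
print("stricter match:", stricter_match(offender, friend))
```

The naive rule flags the two records as one person; adding a single extra field is enough to separate them in this toy case, though a real system would still get it wrong whenever that field also happened to collide.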
Re:It's not privacy, it's obscurity (Score:4, Insightful)
The companies are already regulated. I regulate Facebook by not using it. I don't twitter. The blogs I join that require an address have me listed as "Bill Clinton, 1600 Pennsylvania Avenue, Washington DC 20050". I don't do these things because I don't want the world knowing my business (not that the world would care).
If you choose to give them all your info and tell them all about what you like and where you go, then that is your business. But at what point do you start wondering "How do they pay their staff and keep the servers on?".
If you don't want them to do something nefarious with your info, don't give it to them. There is no need for some government entity to impose rules to protect you.
Don't opt out (Score:5, Insightful)
I have no doubt that 'opting out' causes the problem to get dramatically worse, as the companies use the additional details (you have to fax your driver's license to the first one on the list) to increase the value of your portfolio and sell it off to a bunch of other databases while they are 'removing' you from their own. They probably don't even bother removing you from theirs, because honestly what consequences are they going to suffer?
I think someone misunderstands data scraping (Score:3, Insightful)
"They are not looking for passwords and such; scrapers are looking at blogs and forums searching for material relevant to their corporate clients."
Web scraping for passwords? Why would anyone have thought this in the first place? It's a bad comparison. If your passwords are already on a website to be scraped, your problem isn't data scrapers.
Re:if you want it to be private (Score:1, Insightful)
don't ask why but i'm reminded of the following:
during WW1 (or WW2 i forget) it was noticed that nearly any man could be trained to accurately shoot a rifle. yet when it came to battle people would miss. they boiled it down to about 90% of people just being innately and subconsciously built NOT to kill people. of the 10% that could, it roughly came down to half having no social awareness/connection whatsoever and the other half being so socially aware/connected that they couldn't let their friends die, ergo they had to kill...
to bring that back to you, yes, there are wolves, but i know which i would want on my side.
Re:It's not privacy, it's obscurity (Score:1, Insightful)
My only issue is that it is far too easy to masquerade as someone else and post false information about them, sign them up for NAMBLA or some hate group. If this data is to be used in ways that impact a person's life then it needs to be verified which would require the cooperation of ISPs at a minimum. Until such a time that the information can be verified it needs to be treated by all parties as gossip.
Post some chaff (Score:2, Insightful)
Surely the way round this is for those that feel strongly about their privacy to post meaningless drivel that has no relationship to themselves or anyone else at regular intervals. The datascrapers will be unable to tell the difference between truth and fiction and their business model will fail.
There's a use for Twitter after all!
Re:It's not privacy, it's obscurity (Score:3, Insightful)
Fast forward a few years and what happens when everyone with an internet connection has access to that data for free from Google Stalk or whatever it appears as on their labs page?
Information wants to be free works both ways...
Could make job interviews quite interesting when you've gStalked your interviewer, know what websites they all liked and all the past candidates and use this in your bargaining process.
Re:It's not privacy, it's obscurity (Score:1, Insightful)
Where's the DRM for individual identities? Data miners have proven that facts about individuals have value. Since privacy is gone for most people not living a life of seclusion, compensate them for it.
pollute the data stream (Score:5, Insightful)
There are only two ways to fight this - one is to push for data privacy laws, and the other is to pollute the data stream. When you're asked for a name, address, phone number or birthdate on a web site or form, lie. Just flat out lie. If you live in a town that borders another state (I'm originally from Kansas City, MO), say on forms you live on the other side of the border. Mixing states REALLY confuses data aggregators. The more information you get into the data stream that is fucked up, the harder it is to put it back together in an accurate way.
Make throwaway email addresses at gmail or wherever on a regular basis to use for all this, btw. And keep using DIFFERENT fake data, too, otherwise it will still be a consistent identity of sorts, and will probably eventually be tracked back to you. And don't ever put any real data in Facebook, etc., or put a link between your Facebook account and anything else. Social networking sites are by far the biggest leakers of personal data.
I have a mailbox at a local UPS store where I have everything sent.
Re:It's not privacy, it's obscurity (Score:4, Insightful)
The real answer is requiring companies to ask permission and barring them from trying to compel people to give that permission. It's one thing to require a drug test and background check for a job, but it's quite another to include in that background check data scraped off the net.
It's called the right to informational self-determination [wikimedia.org]
And BTW, pre-employment drug tests are bullshit 999 times out of a thousand - they are the result of the intersection between moralists and insurance liability, since actual continued testing to maintain employment is illegal except in the most limited of safety-critical situations - might as well test for STDs for all the good it does.
Re:It's not privacy, it's obscurity (Score:3, Insightful)
Only so long as your camera never photographs a law enforcement officer, theoretically a servant of the public, in the line of duty. Because then, you know, you might have evidence of them doing something untoward.
Re:It's not privacy, it's obscurity (Score:3, Insightful)
You can claim to have a right to privacy, but that does not mean that others are forbidden to watch you. You do not have a right to invisibility. At some point you have to accept that people you don't know will know things about you from watching you. Don't call up the privacy police to regulate an entity that uses voluntarily supplied information.
That's a complete side-step of the question. For one thing - it is no longer voluntarily supplied information when it is impossible to live a regular life without disclosing significant amounts of information. Nobody sane can think that will make for a healthy society, which is why I think you avoided answering that question.
Remember, this whole discussion was about businesses scraping blogs and social sites, NOT about the forced extraction of information. You're getting off to a different (but related) topic.
Another side-step. The point here is to take the belief that underlies your rationalization in one area and apply the same rationalization to another area and see if it still makes sense. I think you see that it does not make sense, which means the initial premise is flawed or at least overly simplified.