

Data Mining Briefly Explained
handy_vandal writes "Time.com has published an interesting article on data mining." Note the prominent sticker ;)
"Most of us, when all is said and done, like what we like and make up reasons for it afterwards." -- Soren F. Petersen
Uhhh... (Score:2)
Doesn't he mean "snicker"?
No.. (Score:1, Informative)
Re:No.. (Score:1)
Re:No.. (Score:1)
Tony Soprano in the mines (Score:1)
I missed the episode with T in the server room.
The Real Key is People.... (Score:4, Insightful)
Chao
Open Source Data Mining! (Score:4, Interesting)
Power to the people!
Planet P Blog [planetp.cc] - Liberty with Technology.
The Beast (Score:3, Interesting)
Re:The Beast (Score:1, Flamebait)
you'd be amazed... (Score:4, Funny)
Of course, the local supermarket cannot accept my personal check for groceries without their "discount card", never mind that it was *their* database admins who lost my account after a few weeks...
(er, yeah right, and my driver's licence and birth certificate aren't worth as much as their card ??)
Ggrrrrrrr......
Re:you'd be amazed... (Score:1)
Yeah, but you're a slashdot reader so you probably have a man-bust. I know I do (!@#$%! New Year's Resolution)...
Re:you'd be amazed... (Score:2)
Then I take it your wife is getting the emails to increase the size of her penis!
thank you, I'm here all week!
Prominent sticker (Score:1, Funny)
Here's an example of the Microsoft Tax at work. This company most likely paid for Windows licenses on those machines even though they aren't using Windows.
Data Mining Briefly Explained (Score:4, Funny)
1. Collect data
2. Do some mining
3. ???
4. Profit!
The data gnomes are stealing my data! (Score:2)
*sigh* (Score:3, Funny)
Imagine a beowulf cluster of these things!....mining...data... yeah.
In Soviet Russia, data mines YOU!
It's official, Data Mining is DEAD. You don't have to be Kreskin to figure it out.
Hey! I just found this site all about data mining here [goatse.cx]!!!!!
Come on, really, is this News for Nerds or Stuff That Matters?
You could probably use data mining to determine how many hot grits Natalie Portman actually eats.
Alright. That should do it. Carry on with the discussion.
Re:Data Mining Briefly Explained (Score:2)
The classic data mining result is diapers and beer. People who buy beer at convenience stores are also likely to buy diapers. Great. Given that bit of intelligence, do we:
Re:Data Mining Briefly Explained (Score:2)
Re:Data Mining Briefly Explained (Score:2)
3. Sell derivative information to people who want it, i.e. the people you *DON'T* want to have it.
This includes, as others said, life insurance companies teaming up with grocery stores to find out what you eat, thus raising rates for people who eat "bad" stuff.
Or phone spam companies buying info from phone companies: consumer A contacts consumer B, and A bought our stuff, therefore you should call B.
Or, perhaps radio stations selling the numbers of people who request songs to the Wherehouse, so the Wherehouse can call you and say that you can buy the CD.
Or, maybe the police decide to track where you go by reading license plates off of each of the cameras that they have up to detect speeders or light runners.
Just some thoughts. This isn't a joke: they know exactly how to make money from mining. Who you can sell it to depends on what data you have. No one buys data for no reason, and the only two reasons to buy data are to target people for selling other stuff, or to "find people who don't want to be found", whether that means terrorists, criminals, or, theoretically, people who make x hundred thousand or x million a year, so that they can rob you.
Of course, most of this stuff happens every day, and no one realizes.
Re:Data Mining Briefly Explained (Score:1)
In fact, we have a license-plate-reading system like this in
Video cameras record your license plate when you pass a portal, then record it again when you pass the next portal, say after 1 km. The images are stored and processed electronically.
Your average speed is calculated and you're fined if you were speeding.
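The average-speed check described above comes down to section length divided by the time between two sightings of the same plate. A minimal sketch, assuming both portals share one clock, timestamps in seconds, and a 1 km section; the `Sighting` class, the plate string, and the 100 km/h limit are hypothetical, not details of the real system:

```python
from dataclasses import dataclass


@dataclass
class Sighting:
    plate: str
    timestamp_s: float  # seconds on a clock shared by both portals (assumption)


def average_speed_kmh(entry: Sighting, exit_: Sighting, section_km: float) -> float:
    """Average speed between two portal sightings of the same plate."""
    if entry.plate != exit_.plate:
        raise ValueError("sightings are for different plates")
    elapsed_s = exit_.timestamp_s - entry.timestamp_s
    if elapsed_s <= 0:
        raise ValueError("exit sighting must come after entry sighting")
    # km per second converted to km per hour
    return section_km * 3600.0 / elapsed_s


def is_speeding(entry: Sighting, exit_: Sighting, section_km: float, limit_kmh: float) -> bool:
    return average_speed_kmh(entry, exit_, section_km) > limit_kmh


# 1 km covered in 30 s is an average of 120 km/h.
entry = Sighting("AB-12-CD", 0.0)
exit_ = Sighting("AB-12-CD", 30.0)
print(average_speed_kmh(entry, exit_, 1.0))  # 120.0
print(is_speeding(entry, exit_, 1.0, 100.0))  # True
```

Note that the check cannot be computed without holding on to the entry sighting until the exit sighting arrives, which is exactly the record-keeping the privacy objection is about.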
Some argue that this system is fairer than using speedtrap cameras that record only 'an incident', not 'your general behaviour'.
Others argue that "traject-controle", as the system is called here, is a clear invasion of privacy (since they necessarily need to keep a record of your license plate during the 1 km you're driving).
The same system can be used to check for people without valid insurance, who 'forgot' the mandatory APK car checkup or those who neglected to pay their road taxes.
The possibilities are endless... In other words, where will this end?
Well.. so? (Score:5, Interesting)
Technology being put to use to seek out enemies of the state for the world governments is nothing new.
At least it is a good thing that companies are making good money in the process. Your privacy? That was lost long ago.
It was only a matter of time before this happened. At least be glad that we've not yet reached the stage where they'd bother having your entire genome sequenced to create solutions and replacements for you.
Perhaps the author of the article has just read Cryptonomicon or something.
Get over it, companies will track you, governments will monitor it. And there will be people who will beat both, and people who will be susceptible to both. Unfortunate, but hey, paranoia does not help either.
And oh, first post?
Re:Well.. so? (Score:3)
Oh, the irony.
They call themselves patriotic, and yet they're supplying the very means that are slowly turning the U.S. into a police state. Sorry, but I seriously doubt that this is what the U.S. founders had in mind, and it's certainly not the reason that U.S. war veterans both risked and sacrificed their lives. Patriots aren't sheep that blindly follow the government; they are the ones who fight to maintain the fundamental (constitutional) precepts upon which the United States was built.
Reminds me of... (Score:5, Interesting)
If the false-positive rate of this system for finding terrorists is as low as that of the filters that identify spam, is it really unreasonable to consider a match probable cause for an investigation? At least until the 0.000001% slips by and causes a lawsuit for wrongful arrest.
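The worry in this exchange is the base-rate problem: when real targets are extremely rare, even a small false-positive rate can flag more innocent people than guilty ones. A minimal sketch of that arithmetic, assuming each person is classified independently; the population, the 100 real targets, and the 90% detection rate are hypothetical inputs, and only the 0.000001% figure comes from the comment:

```python
def screening_outcome(population, actual_positives, true_positive_rate, false_positive_rate):
    """Expected true flags, false flags, and precision (the share of flagged
    people who are real targets), assuming independent classification."""
    innocents = population - actual_positives
    true_flags = actual_positives * true_positive_rate
    false_flags = innocents * false_positive_rate
    total = true_flags + false_flags
    precision = true_flags / total if total else 0.0
    return true_flags, false_flags, precision


# Hypothetical: 290 million people, 100 real targets, 90% detection,
# and a 0.000001% false-positive rate (1e-8 as a fraction).
t, f, p = screening_outcome(290_000_000, 100, 0.9, 0.000001 / 100)
print(f"{t:.0f} real, {f:.1f} innocent, precision {p:.2f}")
```

At the rate quoted in the comment the innocents flagged number only a handful, but with the same inputs and a 0.1% false-positive rate the expected count jumps to roughly 290,000 innocent people per 90 real hits, which is why the penalty for a false positive matters so much.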
Re:Reminds me of... (Score:2, Interesting)
With a terrorist classification filter, a false positive could cost some innocent person days or weeks in prison and thousands of dollars in lost wages and legal fees. And that's assuming they are a US citizen. A non-citizen could be held indefinitely, completely destroying any career they might have.
Re:Reminds me of... (Score:3, Interesting)
The issue isn't whether or not we should use data mining to profile individuals or groups. Profiling will occur no matter what. What these methods do is help find parameters that identify candidates more accurately, rather than just assuming all Middle Easterners are automatically guilty until proven otherwise.
to what end? (Score:2, Interesting)
profiteering? (Score:5, Interesting)
While I won't argue against the government's inability to do anything but waste money, I do think that these "anti-terrorism" dealies are going too far. We know that they are spending $53 billion on information technology. When they spend it on a hammer or a toilet seat, I know that something is getting done, but "information technology" makes me suspicious.
Granted my opinion is largely a result of window flags selling in excess of twenty dollars and not hearing the results of such spending. In fact, I haven't heard of a single terrorist act averted since 9/11. It couldn't hurt to inform us when the spending pays off; could it?
Is this information actually getting results, or is it just profiteering of the corporations that we so love to slander and libel?
Re:profiteering? (Score:2, Informative)
With the current sensationalized state of mass media, would one hear of a terrorist act if it was avoided?
don't forget NIH (Score:3, Interesting)
The cops had profiled a white male Christian terrorist, and that's all they were looking for. You may have missed the article, but the real perps were stopped **10** times at roadblocks; they were in custody that many times.
And they were let go because their skin color contradicted what the data mining told them. They weren't caught until a Maryland state trooper leaked the license plate number and a trucker at a rest stop made the collar.
Data mining won't solve the stupidity of leaders like Chief Moose.
I don't believe (Score:2)
mod parent [slashdot.org] up
Re:I don't believe (Score:2)
No, of course not. Because we all know that most wars have not been fought in the name of opposing religious beliefs...
Sauron commands you (Score:3, Funny)
Re:Sauron commands you (Score:2)
Damn man, I don't need Sauron or anyone to command me to do that...
Re:don't forget NIH (Score:2)
Plots that have been averted... (Score:5, Insightful)
Frankly, the problem is that attacks such as the Twin Towers are always going to stick in your mind more than a brief news report that Abu Zubaydah was captured. There is also always more skepticism that capturing some guy actually averted a plot; see Jose Padilla. We will never know whether he would actually have done something. There will always be second-guessing about whether a plot was really averted.
Re:profiteering? (Score:2, Insightful)
It couldn't hurt to inform us when the spending pays off; could it?
But would you believe it if your government told you "23 terrorist plots foiled this month"? They probably couldn't be more specific than that, and without any details or corroboration, who's to say? I'm all for openness and accountability, but if it's unlikely that one would get them here (there are better areas for this, like public health care), then I can do without monthly statistics that one would have to take on faith.
In Soviet Russia, official statistics were made up all the time, and dismissed just as often or more.
Wow! What an eye-opener! (Score:1)
Not that it helps (Score:2)
Data is useless by itself unless it can be used appropriately.
Sort of like the list on the conservative site NewsMax that finds that the vast majority of truly corrupt politicians in the past year were Democrats [newsmax.com]. What a coincidence!
What are the odds of finding out more things like this, like at the office of Total Information Awareness? Or the Transportation Security Administration's list of people who cannot fly [interventionmag.com]?
Print Link (Score:4, Informative)
Print Link [time.com]
Re:Print Link (Score:2)
Already used in mineral exploration (Score:4, Informative)
In related news: Seeking Sperm, Not Sex, Online [xnewswire.com]
Before You Jeer... (Score:3, Informative)
You may want to read this book [aaai.org] and see for yourself whether data mining will make a breakthrough in the future.
Re:Before You Jeer... (Score:2, Interesting)
It is perhaps not the most simple book around, but it covers a lot of important issues. Furthermore it doesn't ignore the role of computer science, as two of the authors have a CS background.
You won't find explicit instructions about how to build your own Google, but it surely does wonders for your insight.
Data mining for consumers? (Score:1, Interesting)
Couldn't the consumer use such information to get a better deal? Also, of course, there are the "abuse" aspects for the businesses and governments that use this.
Good excuse for porn! (Score:1)
A major hotel chain discovered that guests who opted for X-rated flicks spent more money and were less likely to make demands on the hotel staff, according to privacy consultant Larry Ponemon. These low-maintenance customers were rewarded with special frequent-traveler promotions.
Cool. Next time I go on a trip I can order some in-room porn and justify it because I'll get better deals in the future!
*shaking head* (Score:1)
Well, I suppose it's nice to know that the handbasket we're going to hell in is at least free.
Makes me think of Bowling For Columbine (Score:2, Interesting)
In the movie 'Bowling For Columbine' Michael Moore speculates that one of the root causes of gun violence in the US is the type of fearmongering the US media engages in in an effort to keep their sales/ratings up.
It looks like Time.com's gratuitous exploitation of US fears of crime and terrorism might be an example of this.
Re:Makes me think of Bowling For Columbine (Score:2)
Sure, fear sells lots of stuff: MREs, guns, ammo, radiation pills (iodine), bomb shelters, etc. The thing people should realize about data mining software, though, is that its application to terrorism and consumer tracking is new, but the technology is not. In fact, it has been used in remote sensing from space to prospect for gold and oil, among other things; it has been used since the late '70s to interpret satellite images for the CIA and NRO; it has been used for psychological research, and so on, and I use a form of it for retinal research. What should not happen amid the fearmongering is for the technology to be given a bad name by those who want to abuse it. Like many technologies, data mining is a tool that can be misused, but its application can also do tremendous good.
Data Mining as used by Colombian Drug Cartels ... (Score:4, Interesting)
http://www.business2.com/articles/mag/0,1640,41
KnowledgeMiner 5.0 software for Mac OS 9. (Score:2, Informative)
http://www.knowledgeminer.net/
I've thought about using this software to analyze stocks to purchase, but never got around to looking at the information required for the software to give me an edge in the market. Looks promising though.
obligatory dilbert strip* (Score:1)
Dogbert Consults
My data mining software has found another message from God.
Panel Two
It says you've been stealing lunches from the refrigerator in the break room.
Panel Three
Then it says "Ha, Ha that wasn't pudding!"
btw, that was January 3rd on the Dilbert calendar this year..
Objection to the numbers (Score:4, Informative)
huh??? (Score:1)
PR (Score:1, Informative)
Let our Data Mining Products make your life Better!
To save everyone time and annoying popups, consider visiting the sites of some of the products mentioned. These pages are every bit as insightful and critical as the article:
http://www.autonomy.com/
http://www.currentana
http://www.srdnet.com/
http://www.dig
http://www.unisys.co.uk/public-uk/j
Posting anonymously to dodge accusations of karma whoring.
Data mining companies (Score:2, Interesting)
As for data mining itself: more power to them. The government gaining the ability to mine the data it already has should mean that we don't need more organizations, more intrusive investigations, etc. Every report or credible news item about post-9/11 studies indicates that we already had enough information, so there should be no need to create new laws that allow for more information to be collected. Just use what you have already, kthx.
What would be nice is if this data mining allowed Muslims living in the U.S. to stop having to worry whenever they go outside. Look at the publicly available information, which may provide patterns of "nonobvious" connections, and let people live their lives in peace, regardless of background.
As a consumer, everything I do in public I consider public information. If a business uses this to better serve me, all the better. Maybe this will mean I don't have to watch feminine ads on TV, or the phone gets answered faster when I call. Maybe it just means that the customer rep knows my name and what I bought already.
Re:WHAT?!? (Score:2)
KFG
Digging For Autism Correlations (Score:2, Interesting)
So, I decided to mine almost 200 by-State demographic variables for correlates to autism by running through every combination of 2 variables via multiplication or division under a polynomial, exponential or null transformation -- then sorted them by their correlation to autism in the year 2000 [clanarchy.com].
This is a case where what was "mined" was not just the raw data but various arithmetic combinations of statistical variables derived from the data. There needs to be some additional work to make the figure of merit not just correlation but statistical significance. I couldn't find Perl modules that provide "alpha" (more precisely, a p-value: the probability of seeing a correlation at least this strong if the null hypothesis were true) for correlations.
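The pairwise search described above, plus the missing significance step, can be sketched in Python rather than Perl. This is a minimal illustration, not the poster's code: it handles only products and one direction of ratio (no polynomial or exponential transforms), and it swaps in a permutation test for a closed-form p-value because that needs nothing beyond the standard library. All function names are hypothetical.

```python
import math
import random
from itertools import combinations


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(xs, ys, n_perm=10_000, seed=0):
    """Two-sided permutation test for a correlation: the share of random
    shufflings of ys whose |r| is at least the observed |r|. The +1 terms
    keep the estimate from ever being exactly zero."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def mine_pairs(variables, target):
    """variables maps a name to one value per state; target holds one autism
    rate per state. Scores the product and ratio of every pair of variables
    by |r| against the target, highest first."""
    results = []
    for (name_a, a), (name_b, b) in combinations(variables.items(), 2):
        candidates = {f"{name_a}*{name_b}": [x * y for x, y in zip(a, b)]}
        if all(y != 0 for y in b):
            candidates[f"{name_a}/{name_b}"] = [x / y for x, y in zip(a, b)]
        for label, values in candidates.items():
            try:
                results.append((abs(pearson_r(values, target)), label))
            except ValueError:
                continue  # constant combination, nothing to correlate
    results.sort(reverse=True)
    return results
```

One way to use it: run `mine_pairs` over the by-state table, then call `permutation_p_value` on the top candidates. Because thousands of combinations are tried, some will correlate strongly by pure chance; dividing the significance threshold by the number of combinations tested (a Bonferroni correction) is one conservative way to account for that.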
Uber Loyalty Card in the UK (Nectar) (Score:5, Insightful)
Nectar has been set-up by Sainsbury's (a supermarket), Barclays (a financial services company) and BP (a petrol filling station company).
I didn't mind Sainsbury's knowing that I eat junk, but now that they're telling Barclays what junk I eat I end up with Barclays putting my life insurance premiums up.
Interesting stuff.
Re:Uber Loyalty Card in the UK (Nectar) (Score:1)
Fayyad (Score:1)
On the last page, this Fayyad of digiMine claims that he doesn't want to work with the govt because the 'Bush administration' hasn't articulated its vision of what it wants clearly enough.
I hope he was misquoted. There may be some legit reasons not to work with the US Govt. on anti-terrorism technology, but Mr. Fayyad is being either overly dismissive or just immune to opportunity by saying what he's quoted as saying. It sure is nice when the client comes to you with a fully articulated vision for the solution he needs, but most just start out with stated or even just perceived needs and leave it to the, ahem, vendors to provide the solution/vision.
On another note, it would be interesting to read an article with some technical detail beyond a generic reference to XML. Maybe someone can post a link.
In a nutshell... (Score:1)
Data Mining is the wrong term (Score:2, Interesting)
If you want valuable information and you know what you're looking for, you just query. Find X in a pile of data. That's mining. I know it's a semantic comment, but mining's not what we're talking about doing here.
Data mining is more like what geneticists searching for a genetic cause for a cancer are doing. Finding usable correlations and meaningful precursors. We don't call cancer-fighting biologists 'gene miners'. I think the term mining belittles a more complicated activity.
A better term? Data Correlating? Mining also just sounds brutish.
Re:Data Mining is the wrong term (Score:2)
If I have a cube of data, I can find things outside of how the data is organized.
Data mining is not finding X in data; it's finding X in data when X isn't necessarily a hard value.
The problem with automatic identification (Score:2, Insightful)
Re:The problem with automatic identification (Score:2, Interesting)
I'm not sure the goal is to have the miner spit out names of confirmable terrorists with that kind of accuracy. Your comment is fair if you're looking for that kind of entirely automated solution, but that's not the goal. It doesn't need to be 100% accurate in order to mitigate risk and pay for itself. Neither does the J Crew web site product predictor.
The goal is definitely to help single out people that are worth further investigation. By motivated, thinking, observant humans. That's all.
I also think you might be a little bit reductionist in your estimate of 100 terrorists. It's quite possible that there are many more, though I suppose it doesn't matter because even if you're looking for just one person, it's still worth doing.
Given that you're looking for a reasonably good filter to find qualifiers for a round of investigation, a better metric to use might be the number of people you're willing to investigate as a ratio against those you hope to positively ID. You might argue that you'd be happy to investigate 5,000 people just to find one 'terrorist'. If so, and you're looking for an estimated 100 terrorists, you can multiply to get 500,000 'persons of interest', or .19% of the US population. This % is much more achievable, and besides, you then use a different algo to decide which of these you should interview or research further first.
It seems pretty manageable to me. I also think your assessment of the 50% false negative rate is too rosy. It seems to me that the risks of even one getting away would be serious enough (as in scanning baggage, for instance) that you'd want to cast the widest net possible and then narrow it carefully. False negatives may be more costly than you are suggesting.
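The persons-of-interest multiplication in this comment is easy to parameterize. A minimal sketch: the 5,000-per-terrorist ratio and the 100-terrorist estimate are the commenter's numbers, while the 290 million population is an assumed input, and the resulting percentage depends on it. The function names are hypothetical.

```python
def persons_of_interest(expected_targets: int, investigations_per_target: int) -> int:
    """People you must be willing to investigate to catch every expected target."""
    return expected_targets * investigations_per_target


def share_of_population(count: int, population: int) -> float:
    """count as a percentage of population."""
    return 100.0 * count / population


poi = persons_of_interest(100, 5_000)
print(poi)  # 500000
print(round(share_of_population(poi, 290_000_000), 2))  # 0.17 with the assumed 290 million
```

With 290 million people the share comes out near 0.17%; the .19% figure corresponds to a population of roughly 263 million. Either way the order of magnitude, a fraction of a percent, is what the argument rests on.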
an advertisement for privatization of security? (Score:1)
I also take issue with the statement
a customer whose IT ineptitude is matched only by its means
which is clearly a jab at the hard-working professionals of the US government and an effort to push privatization of IT functions. I have worked with IT professionals in academic, industrial, commercial, and government settings. I will tell you that IT professionals in all these settings range from incompetent to brilliant. The difference is that, until recently, US government employees have not had to live with the fear of random layoffs or arbitrary insurance reductions. I often wonder why it is unpatriotic to insult policemen, firemen, or military officers, but when it comes to the professionals who allow these people to work, no insult is severe enough.
Re:an advertisement for privatization of security? (Score:2)
Nice story but (Score:1)
If everyone's going to go out and be paranoid, might as well know what we're being paranoid about.
How it works (Score:1)
The trick is to reduce the vast amount of data to something that can be scanned at a glance.
Typically you produce a list of relevant items (e.g. by grabbing the doc IDs based on keywords from the source data) and sort it by relevance (the scoring system). So if three keywords match in a single doc, score it high. If those three keywords appear in another doc, score both high and set the "both" flag. The sorted list, from high score to low, is then scanned. Experience soon tells you whether your scoring system is working. The list you now have (electronically, hopefully) has links to the original docs; the analyst then clicks and reads. If relevant, act. If not, go to the next item.
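The keyword-scoring workflow described above can be sketched as follows. This is a minimal illustration under stated assumptions: documents are plain strings keyed by hypothetical doc IDs, a document's score is simply the number of distinct keywords it contains, and the "both" flag is interpreted here as marking documents whose full set of matched keywords (two or more) also appears in at least one other document.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercased set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def rank_documents(docs, keywords):
    """docs maps doc_id -> text. Returns (score, shared, doc_id, matched)
    tuples sorted by score (high first), shared documents first on ties.
    Documents matching no keyword are dropped."""
    keywords = {k.lower() for k in keywords}
    matches = {}
    for doc_id, text in docs.items():
        hit = keywords & tokenize(text)
        if hit:
            matches[doc_id] = frozenset(hit)

    # Group documents by the exact set of keywords they matched.
    groups = defaultdict(list)
    for doc_id, hit in matches.items():
        groups[hit].append(doc_id)

    ranked = []
    for doc_id, hit in matches.items():
        shared = len(hit) > 1 and len(groups[hit]) > 1  # the "both" flag
        ranked.append((len(hit), shared, doc_id, sorted(hit)))
    ranked.sort(key=lambda r: (-r[0], not r[1], r[2]))
    return ranked
```

The analyst then walks the returned list from the top, opening each `doc_id`. A real system would likely weight rarer keywords more heavily (TF-IDF is the usual starting point); the plain count keeps the sketch close to the description.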
White Paper How to Catch a Thief (Score:1)
Define "Data mining" (Score:1)
The most concrete evidence of success that is presented is that Victoria's Secret realized it sold tons of size X bras in New York and 10x as many white as black items in Miami. Um, I really hope they didn't have to hire a firm to tell them that. Don't they have spreadsheets? Does anyone look around the store and notice what sells?
Which moves me on to another point. Companies seem to have very little faith in their employees and ask very little of them these days. (Gets out his pipe and rocking chair.) I remember when my sister got her first job at an ice skating rink. They sold ice skating outfits to (mostly) mothers of young girls taking private ice skating lessons. My sister could tell you at a glance which outfits would sell first. (As I recall, it was the most garish ones; she used to specifically ask for "ugly" or "anything that looks like it was designed by the color blind".)
Nowadays, when I have to ask for help finding something in a store and I suggest a different location for it (real-life example: why don't you stock the phone connectors with your phones?), I get blank stares and comments along the lines of "ya, like my manager would listen".
yep, me.. (Score:3, Funny)
those 1s and 0s can be tricky..
That's not data mining! (Score:2)
That's not exactly a task for data miners - it's just bad communication! They could have done exactly the same thing just by making sure the directors were paying attention...there seems to be a big market for telling people the perfectly obvious.
FOLDOC (Score:1)
First of all, if you don't know what data mining is, read its entry in FOLDOC (Free On-Line Dictionary Of Computing) [ic.ac.uk].
Re:Scoring 4 points (Score:1)