
Data Mining Briefly Explained 119

handy_vandal writes "Time.com has published an interesting article on data mining." Note the prominent sticker ;)
  • Note the prominent sticker ;)

    Doesn't he mean "snicker"? ;)
    • No.. (Score:1, Informative)

      by Anonymous Coward
      There is a redhat sticker in the top-left corner of the picture.
    • This guy in the photo looks like Tony Soprano. Maybe the Mob uses RedHat Linux for data mining.

      I missed the episode with T in the server room.

      "Hey, Jackie, whatta these computahs for?"
  • by airrage ( 514164 ) on Saturday January 04, 2003 @04:52PM (#5015516) Homepage Journal
    I think every major corporation has some sort of data-mining, and I find that there is a gap between the data (even scrubbed) and the person who needs to make the decisions. Also, the article suggests that CRM is a subset of data-mining. In reality, it's the other way around, or completely unrelated, or both, unless I read that sentence wrong.

    Chao
  • by inode_buddha ( 576844 ) on Saturday January 04, 2003 @04:56PM (#5015533) Journal
    at how powerful data mining techniques can be. Why, just today I have received 3 more "Nigerian" mails, an offer to increase my bust size (I'm a guy), and an excellent credit report from 5 different, unheard-of companies...

    Of course, the local supermarket cannot accept my personal check for groceries without their "discount card", never mind that it was *their* database admins who lost my account after a few weeks...

    (er, yeah right, and my driver's licence and birth certificate aren't worth as much as their card ??)

    Ggrrrrrrr......
    • >an offer to increase my bust size (I'm a guy),
      Yeah, but you're a slashdot reader so you probably have a man-bust. I know I do (!@#$%! New Year's Resolution)...
    • " an offer to increase my bust size (I'm a guy)"
      Then I take it your wife is getting the emails to increase the size of her penis!

      thank you, I'm here all week!
  • by Anonymous Coward
    Yup, on a Dell from probably 1998-1999. Most of the other Dells in the photo look like they are of the same vintage.

    Here's an example of the Microsoft Tax at work. This company most likely paid for Windows licenses on those machines even though they aren't using Windows.
  • by hdparm ( 575302 ) on Saturday January 04, 2003 @04:58PM (#5015546) Homepage
    Briefly? This would be briefly:

    1. Collect data

    2. Do some mining

    3. ???

    4. Profit!

    • Why doesn't anyone else see them!?
    • *sigh* (Score:3, Funny)

      by Chester K ( 145560 )
      Ok let's get this out of our system now:


      Imagine a beowulf cluster of these things!....mining...data... yeah.

      In Soviet Russia, data mines YOU!

      It's official, Data Mining is DEAD. You don't have to be Kreskin to figure it out.

      Hey! I just found this site all about data mining here [goatse.cx]!!!!!

      Come on, really, is this News for Nerds or Stuff That Matters?

      You could probably use data mining to determine how many hot grits Natalie Portman actually eats.


      Alright. That should do it. Carry on with the discussion.
    • You have hit the nail on the head. The ??? is the problem. The link or leap between knowledge and action is the hard part. Data mining can 'identify' 'profitable' and 'unprofitable' customers, but it can't tell you if your expense and profit allocations are right or if you should want to 'get rid' of 'unprofitable' customers or should want to try to turn them into profitable customers.

      The classic data mining result is diapers and beer. People who buy beer at convenience stores are also likely to buy diapers. Great. Given that bit of intelligence, do we:

      1. Put diapers and beer in close proximity so that people who buy diapers can easily pick up beer and vice versa, or
      2. Put diapers and beer at opposite ends of the store so that people who buy both diapers and beer must travel through the store and have a chance to buy everything else?
      The data seldom tell you what to do. Taking the data too seriously leads to treating customers like numbers, predictable statistical entities to be manipulated for profit's sake. This is not healthy for most businesses. Most of the important things that the data tell you, you could learn better by simply listening to customers respectfully.
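
The "diapers and beer" finding is the kind of result association-rule mining produces. A minimal sketch of the three usual measures (support, confidence, lift) follows. The baskets are invented for illustration and `pair_stats` is a hypothetical helper, not anything from the article. Note that the numbers only say the items co-occur; they say nothing about which shelf layout to choose.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy baskets, invented purely for illustration.
BASKETS = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"milk", "bread"},
    {"beer", "diapers", "milk"},
]


def pair_stats(baskets):
    """Return {(x, y): (support, confidence of x -> y, lift)} for item pairs."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    stats = {}
    for (a, b), together in pair_counts.items():
        # Record the rule in both directions: a -> b and b -> a.
        for x, y in ((a, b), (b, a)):
            support = together / n                    # share of baskets with both
            confidence = together / item_counts[x]    # P(y in basket | x in basket)
            lift = confidence / (item_counts[y] / n)  # > 1 means more than chance
            stats[(x, y)] = (support, confidence, lift)
    return stats


if __name__ == "__main__":
    support, confidence, lift = pair_stats(BASKETS)[("beer", "diapers")]
    print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.3f}")
```

A lift above 1 for beer -> diapers supports either of the two store layouts above equally well, which is the commenter's point.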
    • Actually, step three could be explicated as:
      3. Sell derivative information to people who want it, i.e. the people you *DON'T* want to have it.

      This includes, as others said, life insurance companies teaming up with grocery stores to find out what you eat, thus raising rates for people who eat "bad" stuff.

      Or phone spam companies buying info from phone companies-- Consumer A contacts consumer B, and A bought our stuff, therefore you should call B.

      Or, perhaps radio stations selling the numbers of people who request songs to the Wherehouse, so the Wherehouse can call you and say that you can buy the cd.

      Or, maybe the police decide to track where you go by reading license plates off of each of the cameras that they have up to detect speeders or light runners.

      Just some thoughts. This isn't a joke-- They know exactly how to get money from mining-- Who you can sell it to depends on what data you have. No one buys data for no reason-- And the only two reasons to buy data are to target for selling other stuff, or to "find people who don't want to be found"-- Whether it be to find terrorists, criminals, or theoretically people that make x hundred thousand/million a year, so that they can rob you.

      Of course, most of this stuff happens every day, and no one realizes. /ex.
      • Or, maybe the police decide to track where you go by reading license plates off of each of the cameras that they have up to detect speeders or light runners.

        In fact, we have a licenseplate-reading system like this in .nl

        Video cameras record your license plate when you pass a portal, then record it again when you pass the next portal, say after 1 km. The images are stored and processed electronically.

        Your average speed is calculated and you're fined if you were speeding.
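
The arithmetic behind such a section check is just distance over elapsed time. Here is a small sketch; the function names and the speed limit in the usage example are assumptions for illustration, not details of the Dutch system.

```python
from datetime import datetime


def average_speed_kmh(distance_km, t_first, t_second):
    """Average speed between two portals, from the two sighting timestamps."""
    elapsed_hours = (t_second - t_first).total_seconds() / 3600
    if elapsed_hours <= 0:
        raise ValueError("second sighting must be later than the first")
    return distance_km / elapsed_hours


def is_speeding(distance_km, t_first, t_second, limit_kmh):
    """True when the average speed over the section exceeds the limit."""
    return average_speed_kmh(distance_km, t_first, t_second) > limit_kmh


if __name__ == "__main__":
    first = datetime(2003, 1, 4, 12, 0, 0)
    second = datetime(2003, 1, 4, 12, 0, 30)  # 30 seconds later, 1 km on
    print(average_speed_kmh(1.0, first, second))  # roughly 120 km/h
    print(is_speeding(1.0, first, second, limit_kmh=100))
```

Both sightings have to be kept at least until the second portal is passed, which is exactly the record-keeping the privacy objection is about.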

        Some argue that this system is fairer than using speedtrap cameras that record only 'an incident', not 'your general behaviour'.

        Others argue that "traject-controle" as the system is called here is a clear invasion of privacy (since they necessarily need to keep a record of your license plate during the 1km you're driving).

        The same system can be used to check for people without valid insurance, who 'forgot' the mandatory APK car checkup or those who neglected to pay their road taxes.

        The possibilities are endless... In other words, where will this end?
  • Well.. so? (Score:5, Interesting)

    by metlin ( 258108 ) on Saturday January 04, 2003 @04:58PM (#5015547) Journal
    Interesting article, but this is something that has been happening and will continue to.

    Technology being put to use to seek out enemies of the state for the world governments is nothing new.

    At least it is a good thing that companies are making good money in the process. Your privacy? That was lost long ago.

    It was only a matter of time before this happened. At least be glad that we've not yet reached the stage where they'd bother having your entire genome sequence to create solutions and replacements for you :-)

    Perhaps the author of the article has just read Cryptonomicon or something.

    Get over it, companies will track you, governments will monitor it. And there will be people who will beat both, and people who will be susceptible to both. Unfortunate, but hey, paranoia does not help either.

    And oh, first post?
    • At least it is a good thing that companies are making good money in the process. Your privacy? That was lost long ago.

      Oh, the irony.

      They call themselves patriotic, and yet they're supplying the very means that are slowly turning the U.S. into a police state. Sorry, but I seriously doubt that this is what the U.S. founders had in mind, and it's certainly not the reason that U.S. war veterans both risked and sacrificed their lives. Patriots aren't sheep that blindly follow the government, they are the ones who fight to maintain the fundamental (constitutional) precepts upon which the United States were built.
  • Reminds me of... (Score:5, Interesting)

    by gpinzone ( 531794 ) on Saturday January 04, 2003 @04:58PM (#5015548) Homepage Journal
    ...how the Bayesian spam filters operate (on a much smaller scale). They find predictors of "spam" like these guys find predictors of "terrorists."

    If the false positives of this system finding terrorists are as low as the ones that identify spam, is it really unreasonable to consider that probable cause for an investigation? At least, until the 0.000001% slips by and causes a lawsuit for wrongful arrest.
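
For readers who have not seen one, here is a rough sketch of how a Bayesian text filter finds "predictors". It is a plain multinomial naive Bayes with add-one smoothing, which is one common formulation and not necessarily what any particular spam filter uses; the class name and training sentences are made up for illustration.

```python
import math
from collections import Counter


class NaiveBayesFilter:
    """Tiny two-class word-count classifier (labels: 'spam' and 'ham')."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()

    def train(self, text, label):
        self.doc_counts[label] += 1
        self.word_counts[label].update(text.lower().split())

    def spam_probability(self, text):
        # Assumes at least one training example of each label.
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            score = math.log(self.doc_counts[label] / total_docs)  # prior
            for word in text.lower().split():
                # Add-one smoothing so unseen words do not zero the product.
                score += math.log((counts[word] + 1) / (total_words + len(vocab)))
            log_scores[label] = score
        # Turn the two log scores into a posterior probability of spam.
        top = max(log_scores.values())
        weights = {k: math.exp(v - top) for k, v in log_scores.items()}
        return weights["spam"] / (weights["spam"] + weights["ham"])


if __name__ == "__main__":
    f = NaiveBayesFilter()
    f.train("win money now", "spam")
    f.train("cheap money offer", "spam")
    f.train("meeting notes attached", "ham")
    f.train("lunch at noon", "ham")
    print(f.spam_probability("money offer"))
    print(f.spam_probability("meeting at noon"))
```

The same machinery works on any labelled examples, which is why the comparison to profiling is apt; what differs is how rare the target class is and what a wrong answer costs.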
    • Re:Reminds me of... (Score:2, Interesting)

      by Anonymous Coward
      With a spam filter, the penalty for false positive is perhaps a lost sale or an annoyed friend/coworker.

      With a terrorist classification filter, a false positive could cost some innocent person days/weeks in prison and thousands of dollars in lost wages and legal fees. And that's assuming they are a US citizen. A non-citizen could be held indefinitely, completely destroying any career they might have.
      • Re:Reminds me of... (Score:3, Interesting)

        by gpinzone ( 531794 )
        Yes, but remember that the current methods aren't much better. I mean, right now there's lots of complaints about how the USA is racially profiling Middle Eastern men. Whether or not this profiling is justified could be based on a report of such a filter.

        The issue isn't whether or not we should use data mining to profile individuals or groups. Profiling will occur no matter what. What these methods do is help find parameters that more accurately identify candidates rather than just assume all Middle Easterners are automatically guilty until proven otherwise.
  • to what end? (Score:2, Interesting)

    the more i read about data mining, the more it seems to provide a connectivity and interaction leap, a step we are really due, in a technological sense. when the internet was new and all (shortly after Al Gore invented it), there was much talk of how Big Brother would swoop in and turn us into ones and zeros, monitor our every move, and control us through the new portal. that hasn't happened yet (though Ashcroft is trying). does it seem that data mining is more harmful (making us all into terrorists for buying fireworks and seeing born on the fourth of july in the same day) than good (allowing better prediction of supply and demand to lower costs and raise productivity)?
  • profiteering? (Score:5, Interesting)

    by SHEENmaster ( 581283 ) <travis@u[ ]edu ['tk.' in gap]> on Saturday January 04, 2003 @04:59PM (#5015553) Homepage Journal
    Today, however, companies that excel in connecting the data dots are finding a lifeline in a customer whose IT ineptitude is matched only by its means: the U.S. government, which will spend $53 billion on information technology this year. The Federal Government's inability to share and analyze information became clear in the months after the 9/11 attacks.

    While I won't argue against the government's inability to do anything but waste money, I do think that these "anti-terrorism" dealies are going too far. We know that they are spending $53 billion on information technology. When they spend it on a hammer or a toilet seat I know that something is getting done, but "information technology" makes me suspicious.

    Granted my opinion is largely a result of window flags selling in excess of twenty dollars and not hearing the results of such spending. In fact, I haven't heard of a single terrorist act averted since 9/11. It couldn't hurt to inform us when the spending pays off; could it?

    Is this information actually getting results, or is it just profiteering of the corporations that we so love to slander and libel?
    • Re:profiteering? (Score:2, Informative)

      by acidfast7 ( 551610 )
      In fact, I haven't heard of a single terrorist act averted since 9/11.

      With the current sensationalized state of mass media, would one hear of a terrorist act if it was avoided?

    • don't forget NIH (Score:3, Interesting)

      by Anonymous Coward
      At the end of the article, it mentions data mining helping to catch the DC snipers. Whoooooooa.

      The cops had profiled a white male Christian terrorist, and that's all they were looking for. You didn't catch the article, but the real perps were stopped **10** times at roadblocks, they were in custody that many times.

      And they were let go, their skin color contradicted what the data mining told them. They weren't caught until a Maryland state trooper leaked the license plate, then a trucker at a rest stop made the collar.

      Data mining won't solve the stupidity of leaders like Chief Moose.
      • that any true Christian could do this anymore than I believe a true Jew or a true Muslim could have done it.,

        mod parent [slashdot.org] up
      • Did you read (as opposed to glance over) the article? Data mining was *NOT* used during the DC sniper case, only after the fact:
        The system was set up in Montgomery County, Md., only a day before the arrests were made, so it did not play a role in solving the shootings. Working through the hundreds of thousands of leads that were entered into various police computer systems, however, Coplink noted that witnesses reported seeing John Muhammad's blue Chevrolet Caprice near two of the Washington-area shootings, and local police ran computer checks on his license plate at least three times during the killing spree.
        The profiling was done entirely by humans, with no computer assistance.
    • by MyNameIsFred ( 543994 ) on Saturday January 04, 2003 @06:15PM (#5015864)
      ...I haven't heard of a single terrorist act averted since 9/11...
      You haven't been paying much attention to the news, have you? Let's see, we had the plot to attack ships in the Strait of Gibraltar that was averted, the possibly overblown Jose Padilla - Dirty Bomb case, and the capture of key operatives such as Abu Zubaydah, which surely put a dent in al-Qaida's plans.

      Frankly the problem is attacks such as the Twin Towers are always going to stick in your mind more than a brief news report that Abu Zubaydah was captured. Also there is always more skepticism that capturing some guy actually averted a plot -- see Jose Padilla. We will never know whether he would have actually done something. There will always be second guessing on whether a plot was really averted.

    • Re:profiteering? (Score:2, Insightful)

      by RDPIII ( 586736 )

      It couldn't hurt to inform us when the spending pays off; could it?

      But would you believe it if your government told you "23 terrorist plots foiled this month"? They probably couldn't be more specific than that, and without any details or corroboration, who's to say. I'm all for openness and accountability, but if it's unlikely that one would get these here (there are better areas for this, like public health care), then I can do without monthly statistics that one would have to take on faith.

      In Soviet Russia official statistics were made up all the time, and dismissed just as often or more.

  • And here I thought 'data miners' were seven really short geeks, holed up in a server closet with some hot chick that's hiding from her evil step-mother. Well, you learn something new every day! =)
  • Noting all of the ways certain monopolies have acted illegally has not helped in getting appropriate penalties for them in court.

    data is useless by itself unless it can be used appropriately.

    sort of like the list on conservative site NewsMax that finds that the vast majority of truly corrupt politicians in the past year were democrats [newsmax.com]. What a coincidence!

    What are the odds of finding out more things like this, like at the office of Total Information Awareness? Or the Transportation Security Administration's list of people who cannot fly [interventionmag.com]

  • Print Link (Score:4, Informative)

    by VargrX ( 104404 ) on Saturday January 04, 2003 @05:04PM (#5015572) Journal
    dunno 'bout any one else, but I don't care for all the ads...
    Print Link [time.com]

  • by core plexus ( 599119 ) on Saturday January 04, 2003 @05:06PM (#5015578) Homepage
    We've been using data mining in mineral exploration for quite some time now, and it really helps given the tremendous volumes of data generated from modern geophysical, geochemical, and geological exploration.

    In related news: Seeking Sperm, Not Sex, Online [xnewswire.com]

  • Before You Jeer... (Score:3, Informative)

    by robbyjo ( 315601 ) on Saturday January 04, 2003 @05:06PM (#5015580) Homepage

    You may want to read this book [aaai.org] and see for yourself whether data mining would make a breakthrough in the future.

    • by arasinen ( 22038 )
      Another good book that explains the basics of data mining is Principles of Data Mining by Hand et al.

      It is perhaps not the most simple book around, but it covers a lot of important issues. Furthermore it doesn't ignore the role of computer science, as two of the authors have a CS background.

      You won't find explicit instructions about how to build your own Google, but it surely does wonders for your insight.
  • by Anonymous Coward
    "Throughout the '90s, data mining spread from one industry to the next, enabling companies to know more about customers' needs and to zero in on the characteristics that distinguish the customers they want from those they do not. A credit-card company using a system designed by Teradata, a division of NCR, found that customers who fill out applications in pencil rather than pen are more likely to default. A major hotel chain discovered that guests who opted for X-rated flicks spent more money and were less likely to make demands on the hotel staff, according to privacy consultant Larry Ponemon. These low-maintenance customers were rewarded with special frequent-traveler promotions. Victoria's Secret stopped uniformly stocking its stores once MicroStrategy showed that the chain sold 20 times as many size-32 bras in New York City as in other cities and that in Miami ivory was 10 times as popular as black. Aspect Communications, based in San Jose, Calif., sells a program that identifies callers by purchase history. The bigger the spender, the quicker the call gets picked up. So if you think your call is being answered in the order in which it was received, think again."

    Couldn't the consumer use such information to get a better deal? Also of course there's the "abuse" aspects for the businesses, and governments that use this.
    • A major hotel chain discovered that guests who opted for X-rated flicks spent more money and were less likely to make demands on the hotel staff, according to privacy consultant Larry Ponemon. These low-maintenance customers were rewarded with special frequent-traveler promotions.

      Cool. Next time I go on a trip I can order some in room porn and justify it because I'll get better deals in the future!

  • After 9/11, many tech companies saw opportunities for both patriotism and profit. Oracle offered to donate the software to create a federal identity database.


    Well, I suppose it's nice to know that the handbasket we're going to hell in is at least free.
  • I couldn't help noticing the Time.com article made reference to crime and terrorism, particularly the September 11 WTC/Pentagon attacks (which happened over a year ago), and to the recent Washington Sniper killings (which ended months ago), in spite of the fact that this article would have been just as fascinating if they had simply used the business examples as illustration.

    In the movie 'Bowling For Columbine' Michael Moore speculates that one of the root causes of gun violence in the US is the type of fearmongering the US media engages in in an effort to keep their sales/ratings up.

    It looks like Time.com's gratuitous exploitation of US fears of crime and terrorism might be an example of this.

    • I couldn't help noticing the Time.com article made reference to crime and terrorism, ....in spite of the fact that this article would have been just as fascinating if they had simply used the business examples as illustration.

      Sure, fear sells lots of stuff. MRE's, guns, ammo, radiation pills (iodine), bomb shelters etc.... The thing that people should realize with data mining software though is that its application to terrorism and consumer tracking is new but the technology is not. In fact, people have been using it in remote sensing to prospect for gold and oil among other things from space, it has been used since the late 70's to interpret satellite images for the CIA and NRO, it has been used for psychological research etc...etc...etc... and I use a form of it for retinal research. What should not happen with the fear mongering is that the technology be given a bad name from those who want to abuse the technology. Like many technologies, data mining is a tool that can be mis-used, but its application can also do tremendous good.

  • by Anonymous Coward on Saturday January 04, 2003 @05:21PM (#5015651)
    Here is a real life story about data mining and its potential for brutal consequences. This was a very early application. Those who were fingered were killed. Of course, they adopted our new (lack of) due process rules a decade ago...

    http://www.business2.com/articles/mag/0,1640,41206,00.html
  • can be located here:

    http://www.knowledgeminer.net/

    I've thought about using this software to analyze stocks to purchase, but never got around to looking at the information required for the software to give me an edge in the market. Looks promising though.
  • Panel One:
    Dogbert Consults
    My data mining software has found another message from God.
    Panel Two
    It says you've been stealing lunches from the refrigerator in the break room.
    Panel Three
    Then it says "Ha, Ha that wasn't pudding!"
    btw, that was January 3rd on the Dilbert calendar this year..
  • by rootmonkey ( 457887 ) on Saturday January 04, 2003 @05:31PM (#5015691)
    The article uses NASDAQ as an example of having to process terabytes of data on a daily basis, where data mining software can help filter things out. The software may be useful, but NASDAQ does not process terabytes per day of incoming data. I work in the market data industry and we take exchange feeds from around the world, including NASDAQ, and we don't process close to that much. OPRA (options) has the most data per day, and that is only on the order of tens of GB.
  • i don't get it. what's that red hat thingy mean??
  • PR (Score:1, Informative)

    by Anonymous Coward
    This article seemed to me more like a concatenation of a few press releases, especially the ones noting data mining successes, than "news." Then again, most news is simply rehashed PR (as a lecturer on NPR noted the other night).

    Let our Data Mining Products make your life Better!

    To save everyone time and annoying popups, consider visiting the sites of some of the products mentioned. These pages are every bit as insightful and critical as the article:

    http://www.autonomy.com/
    http://www.currentanalysis.com/
    http://www.srdnet.com/
    http://www.digimine.com/ (this didn't load for me, but I have Javascript disabled...)
    http://www.unisys.co.uk/public-uk/justice/police/default.asp?cn=pa

    Posting anonymously to dodge accusations of karma whoring.
  • by MrWa ( 144753 )
    So "Data-mining companies have been among the hardest hit in recent years" is claimed by Time.com, which goes on to use MicroStrategy as a prime example of a company that skyrocketed in value and plummeted in the "tech crash" later. Oh, and by the way, they also overstated earnings. What these articles about the "tech crash" need to do is normalize the comparisons, because these companies that ballooned in value so much, then crashed, probably just experienced a slight correction due to the stupid values they attained to begin with!

    As for datamining itself: more power to them. The government gaining the ability to mine the data it already has should mean that we don't need more organizations, more intrusive investigations, etc. Every report or credible news item about post-9/11 studies indicates that we already had enough information, so there should be no need to create new laws that allow for more information to be collected. Just use what you have already, kthx.
    What would be nice is if this data-mining allowed Muslims living in the U.S. to stop having to worry whenever they go outside. Look at the information publicly available, that may provide patterns of "nonobvious" connections, and let people live their lives in peace, regardless of background.

    As a consumer, everything I do in public I consider public information. If a business uses this to better serve me, all the better. Maybe this will mean I don't have to watch feminine ads on TV, or the phone gets answered faster when I call. Maybe it just means that the customer rep knows my name and what I bought already.

  • If you look closely at autism statistics, you'll notice it has a lower average correlation with all other statistics than 95% of the variables normally available to epidemiologists [clanarchy.com].

    So, I decided to mine almost 200 by-State demographic variables for correlates to autism by running through every combination of 2 variables via multiplication or division under a polynomial, exponential or null transformation -- then sorted them by their correlation to autism in the year 2000 [clanarchy.com].

    This is a case where what was "mined" was not just the raw data but various arithmetic combinations of statistical variables derived from the data. Some additional work is needed to make the figure of merit statistical significance rather than just correlation. I couldn't find Perl modules that provide a p-value (the probability of seeing a correlation at least this strong if the null hypothesis were true) for correlations.
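
One way to put a significance figure on a correlation without a statistics library is a permutation test. The sketch below is in Python rather than Perl, and the function names are assumptions for illustration, not taken from the commenter's scripts. It also includes a Bonferroni-style threshold, because scanning every pairwise combination of roughly 200 variables under several transformations amounts to a very large number of tests, and some will look strong by chance alone.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient (inputs must not be constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def permutation_p_value(xs, ys, trials=10_000, seed=0):
    """Two-sided p-value: how often shuffled data correlates at least as strongly."""
    rng = random.Random(seed)
    observed = abs(pearson(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if abs(pearson(xs, shuffled)) >= observed:
            hits += 1
    # The +1 terms keep the estimate away from an impossible p-value of zero.
    return (hits + 1) / (trials + 1)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level when n_tests hypotheses are scanned."""
    return alpha / n_tests


if __name__ == "__main__":
    xs = list(range(20))
    ys = [2 * x + 1 for x in xs]
    print(pearson(xs, ys), permutation_p_value(xs, ys, trials=2000))
    print(bonferroni_threshold(0.05, 100_000))
```

A candidate correlation would then count as interesting only if its p-value falls below the corrected threshold, which is a conservative choice; less strict corrections exist.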

  • by Boss, Pointy Haired ( 537010 ) on Saturday January 04, 2003 @05:58PM (#5015802)
    Three large British retail companies have recently created a joint loyalty card.

    Nectar has been set-up by Sainsbury's (a supermarket), Barclays (a financial services company) and BP (a petrol filling station company).

    I didn't mind Sainsbury's knowing that I eat junk, but now that they're telling Barclays what junk I eat I end up with Barclays putting my life insurance premiums up.

    Interesting stuff.

  • In the last page, this Fayyad of digiMine claims that he doesn't want to work with the govt because the 'Bush administration' hasn't clearly enough articulated its vision of what it wants.

    I hope he was misquoted. There may be some legit reasons not to work with the US Govt. on anti-terrorism technology, but Mr. Fayyad is being either overly dismissive or just immune to opportunity by saying what he's quoted as saying. It sure is nice when the client comes to you with a fully articulated vision for the solution he needs, but most just start out with stated or even just perceived needs and leave it to the, ahem, vendors to provide the solution/vision.

    On another note, it would be interesting to read an article with some technical detail beyond a generic reference to XML. Maybe someone can post a link.

  • You can mine data to look for hidden business trends. If you mine the data really hard, you can see messages from GOD.
  • The term data mining is misleading. Mining is more a matter of sifting through lots of junk to get at the valuable material. That's not exactly what 'data mining' is about.

    If you want valuable information and you know what you're looking for, you just query. Find X in pile of data. That's mining. I know it's a semantic comment, but mining's not what we're talking about doing here.

    Data mining is more like what geneticists searching for a genetic cause for a cancer are doing. Finding usable correlations and meaningful precursors. We don't call cancer-fighting biologists 'gene miners'. I think the term mining belittles a more complicated activity.

    A better term? Data Correlating? Mining also just sounds brutish.

    • No, it's mining, not correlating.
      If I have a cube of data, I can find things outside of how the data is organized.
      Data mining is not finding X in data, it's finding X in data when X isn't necessarily a hard value.
  • The problem with automatic identification of any specific type of person within a large group (say, the entire U.S. population, or, hey, the entire world! Why not?) is the obscenely low false positive rate you must have. I mean to identify 100 terrorists in 270 million people, sure, a 50% false negative rate is fine (catching 50 terrorists is better than catching none, right?), but to not get those real terrorists swamped by innocent people who happen to match a profile, then the false positive rate must be lower than about 0.000037% ... that's almost impossible to achieve. And that is why automated terrorist (or anything) identification is still a long way off.
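
The base-rate arithmetic in this comment can be checked in a few lines. This is a sketch with hypothetical function names. The 0.000037% figure appears to correspond to tolerating about 100 false alarms in the whole population, which is an inference from the numbers rather than something the commenter states.

```python
def screening_outcome(population, targets, false_negative_rate, false_positive_rate):
    """Expected hits, false alarms, and the share of flagged people who are real targets."""
    true_positives = targets * (1 - false_negative_rate)
    false_positives = (population - targets) * false_positive_rate
    precision = true_positives / (true_positives + false_positives)
    return true_positives, false_positives, precision


def max_false_positive_rate(population, targets, allowed_false_positives):
    """Highest false positive rate that keeps false alarms within the given count."""
    return allowed_false_positives / (population - targets)


if __name__ == "__main__":
    # Figures from the comment: 270 million people, 100 targets, half of them missed.
    rate = max_false_positive_rate(270_000_000, 100, allowed_false_positives=100)
    print(f"required false positive rate: {rate * 100:.6f}%")
    # Even a seemingly excellent 1% false positive rate swamps the real hits.
    print(screening_outcome(270_000_000, 100, 0.5, 0.01))
```

At the quoted rate roughly one flagged person in three is a real target; at a 1% false positive rate it is on the order of one in fifty thousand.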
    • I'm not sure the goal is to have the miner spit out names of confirmable terrorists with that kind of accuracy. Your comment is fair if you're looking for that kind of entirely automated solution, but that's not the goal. It doesn't need to be 100% accurate in order to mitigate risk and pay for itself. Neither does the J Crew web site product predictor.

      The goal is definitely to help single out people that are worth further investigation. By motivated, thinking, observant humans. That's all.

      I also think you might be a little bit reductionist in your estimate of 100 terrorists. It's quite possible that there are many more, though I suppose it doesn't matter because even if you're looking for just one person, it's still worth doing.

      Given that you're looking for a reasonably good filter to find qualifiers for a round of investigation, a better metric to use might be the number of people you're willing to investigate as a ratio against those you hope to positively I.D. You might argue that you'd be happy to investigate 5,000 people just to find one 'terrorist'. If so, and you're looking for an estimated 100 terrorists, you can multiply to get the number of 'persons of interest' of 500,000 or .19% of the USA population. This % is much more achievable, and besides, then you use a different algo to ID which of these you should interview first or do MORE research on first.

      It seems pretty manageable to me. I also think your assessment of the 50% false negative rate is too rosy. It seems to me that the risks would be serious enough of even 1 getting away (as in scanning baggage for instance) that you'd want to cast the widest net possible and then narrow those carefully. False negatives may be more costly than you are suggesting.

  • This article seems to explain very little of data mining, and is far from concise. The real gist of the article seems to be that data mining companies, which may be guilty of fraud and certainly seem to lack a viable business plan, are once again suckling off the teat of mother U.S.A. instead of finding the private customers that they all would claim are the basis of capitalism. Likewise, the military contractors are desperately trying to get into the data mining game to maintain relevance.

    I also take issue with the statement
    a customer whose IT ineptitude is matched only by its means
    which is clearly a jab at the hard working professionals of the US government and an effort to push privatization of IT functions. I have worked with IT professionals in Academic, Industrial, Commercial, and Government settings. I will tell you that IT professionals in all these settings range from incompetent to brilliant. The difference is that, until recently, US employees have not had to live with the fear of random layoffs or arbitrary insurance reductions. I often wonder why it is unpatriotic to insult policemen, firemen, or military officers, but when it comes to the professionals that allow these people to work, no insult is severe enough.

    • Not that I advocate insulting "policemen, firemen, or military officers"...

      ...but I'd say that the difference is that these people are on the "front lines", so to speak. I'd rather have an IT job where I can surf /. in my spare time than have to investigate shootings, put out fires, or make strategy decisions that could potentially cost millions of lives.

  • *how* does data mining work? (beyond "it makes connections between various data.") I don't recall it ever coming up in any of my classes. It seems like it would be an AI problem.

    If everyone's going to go out and be paranoid, might as well know what we're being paranoid about.
    • The best device I know of for turning data into information is the human visual cortex. Forget AI; use HI (Human Intelligence).

      The trick is to reduce the vast amount of data to something that can be scanned at a glance.

      Typically you produce a list of relevant items (e.g. by grabbing the doc ids based on keywords from the source data) and sort it by most relevant (the scoring system). So if three keywords match in a single doc, score it high. If those three keywords appear in another doc, score both high and set the both flag. The sorted list, from high score to low, is then scanned. Experience soon tells you if your scoring system is working. The list you now have (electronically, hopefully) has links to the original docs; the analyst then clicks and reads. If relevant, act. If not, go to the next item.
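      A minimal sketch of that kind of keyword scoring, just to make it concrete. The doc ids, texts, and keywords are invented toy data, `score_documents` is a made-up name, and the "both" flag is left out because the description doesn't pin it down.

```python
import re


def score_documents(docs, keywords):
    """Score each doc by how many distinct keywords it contains.

    Returns (doc_id, score) pairs, highest score first; docs with no
    keyword hits are dropped so the analyst only scans candidates.
    """
    wanted = {k.lower() for k in keywords}
    scored = []
    for doc_id, text in docs.items():
        # Crude tokenizer: lowercase runs of letters, digits, apostrophes.
        words = set(re.findall(r"[a-z0-9']+", text.lower()))
        score = len(wanted & words)
        if score:
            scored.append((doc_id, score))
    # Highest score first; ties broken by doc id for a stable listing.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored


# Hypothetical toy corpus, not real data.
docs = {
    "a": "Wire transfer to offshore account",
    "b": "Lunch menu for Friday",
    "c": "Offshore account opened last week",
}
ranked = score_documents(docs, ["wire", "offshore", "account"])
for doc_id, score in ranked:
    print(doc_id, score)
```

      The human still does the reading; all the code does is shrink the pile and put the likeliest docs at the top.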

  • I have not read all of this, but some of you with questions on how the actual Data Mining process works might get something out of it. Some of it is over my head, but that is not saying much. Check it out. http://sales.visualanalytics.com/whitepaper/index2.cfm?Template=HowToCatchAThief
  • I always think of artificial intelligence when I hear data mining, and I kind of assumed that was what would be clarified (at least) by this article. However, I was wrong.

    The most concrete evidence of success that is presented is that Victoria's Secret realized it sold tons of size X bras in New York and 10x as many white as black items in Miami. Um, I really hope they didn't have to hire a firm to tell them that. Don't they have spreadsheets? Does anyone look around the store and notice what sells?

    Which moves me on to another point. Companies seem to have very little faith in their employees and ask very little of them these days. (Gets out his pipe and rocking chair.) I remember when my sister got her first job at an ice skating rink. They sold ice skating outfits to (mostly) mothers of young girls taking private ice skating lessons. My sister could tell you at a glance what outfits would sell first. (As I recall it was the most garish ones - she used to specifically ask for "ugly" or "anything that looks like it was designed by the color blind").

    Nowadays, when I have to ask for help finding something in a store and I suggest a different location for it (real life example: Why don't you stock the phone connectors with your phones?) I get blank stares and comments along the lines of "ya, like my manager would listen".

  • yep, me.. (Score:3, Funny)

    by geekoid ( 135745 ) <dadinportland AT yahoo DOT com> on Sunday January 05, 2003 @04:26AM (#5018618) Homepage Journal
    ..and six other dwarfs grab our pickaxes, and lanterns, and go to the data mines.
    those 1's and 0's can be tricky..

  • Software developed by Autonomy, based in Cambridge, England, connected BAE's research databases and alerted civilian aircraft engineers to the fact that the wing-construction problem they were working on was also being addressed by the company's military division.

    That's not exactly a task for data miners - it's just bad communication! They could have done exactly the same thing just by making sure the directors were paying attention...there seems to be a big market for telling people the perfectly obvious.

