Web Scanning Technology for Copyright Violations (54 comments)

Posted by CowboyNeal on Friday April 06, 2007 @12:42AM from the finding-a-good-movie dept.

eldavojohn writes "I've heard a lot of talk about software being used to detect pirated media anywhere on the web, but haven't seen a lot of details. PhysOrg has a good article on one of the tools out there. Automatic Copyright Infringement Detection (ACID) boasts a patented technology dubbed 'meaning-based computing' that is reportedly capable of finding relationships among 1,000 different types of files. The important thing is that this is not tagging-based searching. 'Autonomy's search technology uses automatic hyperlinking and link clustering that the company claims isn't built into keyword search engines. According to the company, this technology allows computers to perform searches with greater context, so it finds a wider range of related documents or research citations than is possible from keyword searches.' For more details on how this magic works, check out Autonomy's patent and the many patents by its subdivision, Virage."

This discussion has been archived. No new comments can be posted.


Encryption? (Score:2)

by Phil Karn ( 14620 ) writes:

And how well does this work if people encrypt their files and send the keys separately?
- Re: (Score:3, Informative)
  
  by updog ( 608318 ) writes:
  
  And, torrents and newsgroups?
  It really seems to be targeting your typical TV episode uploaded to YouTube...
- Re: (Score:1)
  
  by Dunbal ( 464142 ) writes:
  
  And how well does this work if people encrypt their files and send the keys separately?
  
  Or take a large file and break it into smaller files. Or reverse the file... or ...
- It doesn't (Score:5, Informative)
  
  by blorg ( 726186 ) writes: on Friday April 06, 2007 @04:10AM (#18631879)
  
  ...just like it doesn't catch you burning a CD and giving it to your friend physically. Or the Scouts singing "Happy Birthday."
  
  However it may well do what it is designed to do, finding copyright infringement on the web. Autonomy [wikipedia.org] are a serious company working on pattern recognition, not some fly-by-night cowboys. This copyright-finding thing would just be a side application of their core technology.
  
  - Re: (Score:2)
    
    by Elektroschock ( 659467 ) writes:
    
    Okay, but you won't let a bot onto your site, hmm? Could be real fun to make honeypots for that autonomy bot. --- Autonomy are a serious company working on pattern recognition, not some fly-by-night cowboys. Serious companies don't patent software or at least are not proud of it.
    - Serious companies don't patent software? (Score:2)
      
      by blorg ( 726186 ) writes:
      
      I guess you don't consider Google, IBM, Apple or Microsoft to be serious companies then.
      
      Interesting tidbit though - this Autonomy patent is a US one, they wouldn't get a patent on this in their own home country of the UK, where software patents are (currently) not allowed.
Thank God for Darknets... (Score:4, Insightful)

by MostAwesomeDude ( 980382 ) writes: on Friday April 06, 2007 @01:02AM (#18631253) Homepage

This technology sounds like it's stuck behind the buzzword "meaning-based media," which seems to just be an abstract notion of finding and sorting media without profiling, hashing, fingerprinting, tagging, watermarking, sourcing, or naming (in other words, by going on bullshit notions and intuition. "Oh, it looks copyrighted.")

More importantly, it looks like it can't do anything unless the target is somewhere on the Web and is reasonably active. The darknets and private trackers are still safe.

- Re: (Score:3, Interesting)
  
  by donaldm ( 919619 ) writes:
  
  The actual patent reads like a maths paper with lots of buzzwords. Sorry, I try not to read too much of the patent since the Legal Jargon actually gives me a headache. Maybe that is intentional for all patents. What annoys me is this patent is not really an invention since it defines how their software does something which is not even physical. I suppose the physical aspect occurs when someone is taken to court.
  
  Please note I am against software patents in general although I am not against closed source or
- Re:Thank God for Darknets... (Score:4, Interesting)
  
  by P3NIS_CLEAVER ( 860022 ) writes: on Friday April 06, 2007 @02:58AM (#18631647) Journal
  
  These jokers were trying to get us to sell their desktop search engine to our clients about 5 years ago. IMO they were pretty overstuffed and FOS. (how is that for buzzwords)
  I am surprised they survived the internet bubble (or lack of)
  
Like a patent means anything (Score:3, Insightful)

by jhfry ( 829244 ) writes: on Friday April 06, 2007 @01:08AM (#18631275)

Sure, they have a patent, and if they actually implement what's in the patent it's meaningful to look at... but more often than not, the patent is much broader than the actual application, or the patent isn't even being used.

If I looked at patents to determine what a business was capable of, I would be driving a car that gets 100's of miles to the gallon!

- Re: (Score:1)
  
  by ADRenalyn ( 598918 ) writes:
  
  Patents mean everything to an investor looking to dump funds into a startup company. If the company has a couple of patents that make it harder for the competition to come in and steal their idea, they are more likely to receive capital.
AI (Score:5, Interesting)

by alphamugwump ( 918799 ) writes: on Friday April 06, 2007 @01:12AM (#18631297)

I find it ironic how stuff like this ends up being among the more practical applications for AI. I mean, science fiction is usually about robots taking over. Instead, we end up with an internet full of bots trying to sell viagra, bots trying to block viagra, bots trying to break captchas, bots trying to detect copyright infringement, p2p systems to insure privacy, and so on.

I don't think this sort of searching for pirated content is going to be terribly effective, though. I mean, it might be able to catch the blatant stuff like youtube, but ultimately, they're never going to kill p2p, especially once private trackers become more common.

- Re: (Score:2)
  
  by Dunbal ( 464142 ) writes:
  
  Instead, we end up with an internet full of bots trying to sell viagra, bots trying to block viagra, bots trying to break captchas, bots trying to detect copyright infringement, p2p systems to insure privacy, and so on.
  
  And you wonder why eventually all these bots get fed up and try to wipe out the human race?
  - Re: (Score:1)
    
    by alphamugwump ( 918799 ) writes:
    
    Oh, they'd keep us around. It's pretty hard to sell porn to dead people.
    
    Of course, they might start selling porn to each other. That's when we'd be really screwed. Imagine spending half your time as image-recognition hardware, and the other half making crappy porn movies.
    
    Yeah, sounds like a pretty dark future to me.
    - Re: (Score:2)
      
      by Dunbal ( 464142 ) writes:
      
      Of course, they might start selling porn to each other.
      
      Surely even bots aren't dumb enough to PAY for porn ;)
      - Re: (Score:2)
        
        by ScrewMaster ( 602015 ) writes:
        
        Which is why the bots will never let alt.binaries die out completely.
Fully buzzword compliant (Score:5, Informative)

by Animats ( 122034 ) writes: on Friday April 06, 2007 @01:13AM (#18631311) Homepage

All those buzzwords. Apparently somebody has a system that can characterize and match images and video. That's reasonable enough, it's been done before, and the question is how good the new one is. The article gives zero help in that direction.
From the same source: "Nanogenerator provides continuous power by harvesting energy from the environment". It's a variation on the piezoelectric generator concept, like a piezo fire starter.

IP Freely? (Score:1)

by dotslashdot ( 694478 ) writes:

Sure, but did they Trademark the Patent that looks for Copyright by IP Freely? (apologies to Bart Simpson).
- Re: (Score:1)
  
  by aproposofwhat ( 1019098 ) writes:
  
  IP Freely?
  Surely, if he lives up to his name and believes in the freedom of IP, he wouldn't enforce his copyright?
Huh? (Score:5, Insightful)

by Anonymous Coward writes: on Friday April 06, 2007 @01:19AM (#18631351)

Not to complain about the article too much, but is there anyone out there who didn't find it completely contradictory and useless?

As far as I can tell, the article starts off by saying that they have a wonderful system to inspect and compare the video content of a clip against a HUGE database (eg. tens of thousands of hours of copyrighted movies, TV series, music). And, that they know how to read _any_ media format (eg. an AVI using some particular codec embedded into a Word document which is zipped....) The suggestion is that the software could "read" a Youtube video clip, and recognize that it contains a few minutes of a Jay Leno monologue. Needless to say, they don't explain how they might possibly do this - because, as far as I can tell, they can't. Not even close.

If you look at the patents, they're pretty much all about text or metadata searching. For example, they seem to have found an innovative way to find keywords to categorize a document....by scanning for words in the document! Or of categorizing a video file...by looking at metadata (eg. comments) embedded in the file. The only amazing thing about these algorithms is that some dimbulb in the patent office decided to give them a 20 year monopoly on something people have been doing for decades.

- Re: (Score:2)
  
  by zappepcs ( 820751 ) writes:
  
  Exactly! None of this is new as far as I can figure out.
  If it was truly an innovation or AI, it would scan the video/audio clips and recognize Jay Leno's voice and have that trigger a flag for infringement. Unfortunately I don't think that they have managed to catalog a database of copyrighted works based on such things.
  
  With any luck at all, the **AA will spend billions on this patented claptrap only to find out 2 years from now that there is no way to make it work without landing themselves in even deeper
- Standard Machine Learning... (Score:5, Informative)
  
  by kripkenstein ( 913150 ) writes: on Friday April 06, 2007 @03:25AM (#18631735) Homepage
  If you look at the patents, they're pretty much all about text or metadata searching.
  
  Indeed, yes. Furthermore, they seem to be a simple list of standard machine learning (text categorization/information retrieval) methods. I won't bother to go through the entire patent, it is mind-numbingly boring, but here are some details for the beginning of it: (I refer to the claim #'s)
  
  1,2: This is the standard TFIDF method. TF means 'term frequency': you give each word a weight equal to its frequency in the document. IDF means 'inverse document frequency': if a word is rare across the collection, you give it more weight. Typically this is done with the logarithm, btw.
  
  4,5,6: This is extremely general. But it sounds like any of a myriad of methods to generate 'higher-order-features'. For example, by using a nonlinear kernel function.
  
  7&9: Sounds like a way to measure the importance of a feature. Many such methods are already in use, for example, mutual information (MI).
  
  8: In other words, a 'stoplist'. Nice way to make it sound really complicated and useful, though.
  
  Skimming the rest of the patent, I don't see much substance. But I admit I didn't go through all of it. Perhaps someone else will have more patience.
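For readers who haven't met the techniques named in the comment above, here is a minimal Python sketch of TFIDF weighting with a stoplist. It is only an illustration of the standard textbook method, not Autonomy's implementation or anything taken from the patent. The `STOPLIST` contents, the `tokenize` and `tf_idf` helper names, and the sample documents are all made up for the example.

```python
import math
from collections import Counter

# Hypothetical stoplist for illustration; real systems use much longer lists.
STOPLIST = {"the", "a", "of", "and", "is", "to", "in"}


def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, drop stoplisted words."""
    words = (w.strip(".,!?\"'()").lower() for w in text.split())
    return [w for w in words if w and w not in STOPLIST]


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    TF  = occurrences of the term in the document / terms in the document
    IDF = log(number of documents / number of documents containing the term)
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Count, for each term, how many documents contain it at least once.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid dividing by zero on empty documents
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


docs = [
    "The cat sat on the mat.",
    "The dog sat on the log.",
    "A cat and a dog.",
]
for doc_weights in tf_idf(docs):
    # Show the two highest-weighted terms of each document.
    print(sorted(doc_weights.items(), key=lambda kv: -kv[1])[:2])
```

Words that appear in every document get a weight of zero, and the stoplist drops common words before any weighting happens, which is all that claim 8 seems to describe.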
  - Re: (Score:2)
    
    by ScrewMaster ( 602015 ) writes:
    
    Yes, but how much bandwidth is this thing going to consume spidering the Web downloading videos looking for infringement? Are sites like Youtube going to permit it? How about all the stuff it's going to download that doesn't infringe anyone's IP? By simply scanning the Internet this way, they are using other people's resources in their quest to nail copyright infringers. The Googlebot does much the same thing, of course, but most of us don't mind Google hitting our servers periodically because we all derive
Hold on, my company has a patent on this (Score:2, Funny)

by Anonymous Coward writes:

Did their software detect the patent that it is infringing upon? Bastards!
It's a hopeless pursuit (Score:4, Insightful)

by heretic108 ( 454817 ) writes: on Friday April 06, 2007 @01:45AM (#18631443)

Back in the early days of cars, most folks thought the red flag act [wikipedia.org] was entirely justified.

Sorry, but we've hit a new age of abundance. With the overwhelming percentage of internet users using LimeWire, BitTorrent etc, attempts to sustain a manufactured scarcity in the face of this abundance will similarly fade away into obsolescence.

The copyright enforcement versus piracy arms race will make for interesting history courses in future decades. I can see the courses now - "The Rise And Fall Of Intellectual Property".

I'm looking forward to blowing my grandkids' minds when I tell them about the era when information wasn't free.

- Re: (Score:1)
  
  by Anonymous Coward writes:
  
  Seen another way, the IP advocates are pessimists. They are the bears in market who think everything has peaked,
  that there is a shortage of good new ideas so they must conserve and constrain the ideas that exist. They fear abundance
  because their profits depend on an absence of it. The more bullish optimists in the world laugh at IP because they know nothing
  is new under the sun, that there will always be brilliant thinkers changing the world with astonishing new ideas,
  regardless of whether they are paid by a
  - Re: (Score:2)
    
    by Catiline ( 186878 ) writes:
    
    The only question I have is this: How long can people fool themselves by clinging to these 20th century ideas?
    I don't know... how long will people fool themselves with belief in Intelligent Design theory, ghosts, ESP, "flat earth", geocentrism, the healing power of crystals/magnets, bad luck from black cats, or profitable chain letters/pyramid schemes?
Hey, at least it's patented! (Score:1)

by Darkforge ( 28199 ) writes:

Hopefully, that means no one will be foolish enough to pay to use it.
No better than a dowsing rod (Score:3, Insightful)

by Black Art ( 3335 ) writes: on Friday April 06, 2007 @02:09AM (#18631517)

Seems every week some company comes up with a way to detect copyright violations or terrorists or naughty pictures or some other buzzworthy topic that will get them paid suitcases full of money.

Until I see some sort of evidence that they can do it, I rank the claims along with those who claim that they can tell what people are thinking by where they scratch.

- Re: (Score:2)
  
  by thegrassyknowl ( 762218 ) writes:
  
  Automatic Copyright Infringement Detection (ACID) boasts a patented technology
  
  ACID best describes what these people are on when they go out doing this kind of crap. It's technically infeasible. We've seen that all they really do is keyword search in filenames, even though several groups have claimed to do more. Name the files differently and for the most part you'll fly under the radar.
  - Re: (Score:2)
    
    by ScrewMaster ( 602015 ) writes:
    
    ACID ... programmed by a team of crack developers.
Won't work (Score:1)

by ameyer17 ( 935373 ) writes:

None of their previous ventures into web spidering's worked very well. It's likely all that will be needed to create a false negative in this case is a little name obfuscation, and there will be an unacceptable rate of false positives...
- Re:Won't work (Score:5, Informative)
  
  by PiEpster ( 906195 ) writes: on Friday April 06, 2007 @03:15AM (#18631707) Homepage
  
  Actually, their technology works exceptionally well, provided you use it in the way it is meant to be used. To use Autonomy for internet spidering is obviously not one of those ways, since its 'meaning-based computing' (read: pattern-recognition) algorithms will turn up text on cats when you were searching for 'dogs' (since they are related terms). People are so used to Google's keyword search that this confuses them utterly.
  
  However, in a corporate intranet environment, this could be VERY useful for 'knowledge workers' like those working in R&D departments. I've managed an Autonomy system for a large multinational and they were using it for search on their internet and intranet sites. The average internet John Doe was complaining like hell, while the employees in R&D and similar functions were loving it.
  
  In this case, using it for detecting copyright infringement could actually work, since the pattern-recognition abilities of Autonomy are in fact very good.
  
- Re: (Score:1)
  
  by Bastard of Subhumani ( 827601 ) writes:
  
  a little name obfuscation
  I h4v3 n0 1d34 wh4t u r ta7k1|\|6 ab0ut, but d0 u w4nt 2 c 8r1tn3yz pu55y?
What is mine, isn't yours. (Score:2)

by geoff lane ( 93738 ) writes:

So how does it determine the direction in which the copying took place?
Finding reverse plagiarism (Score:4, Interesting)

by G4from128k ( 686170 ) writes: on Friday April 06, 2007 @08:09AM (#18632659)

Publishers using this tool will presume that any found copies are infringing examples of copyright violation. But what happens when a work "created" and copyrighted in 2006 turns out to be "infringed" by something created in 2000? If the publisher's "original" copyrighted work turns out to not be so original after all, then things could get sticky. I wonder how many cases of plagiarism will be uncovered in which the publisher/copyright holder becomes the defendant.

"Meaning-based computing" (Score:3, Insightful)

by Venik ( 915777 ) writes: on Friday April 06, 2007 @09:01AM (#18632997)

Whenever I see words "intelligence", "meaning", or "understanding" used to describe software, that's how I know it's a bunch of baloney.

I have a program that detects plagiarism (Score:1, Funny)

by Anonymous Coward writes:

It's called Google [google.com].

-mcgrew
robots.txt? (Score:2)

by Sparr0 ( 451780 ) writes:

It claims to follow hyperlinks. Does it obey robots.txt on the destination site? I sense possible legal disputes.
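
Whether this particular crawler honours robots.txt isn't stated anywhere in the article, but for reference, here is a rough sketch of what a well-behaved crawler would check before fetching a URL, using Python's standard `urllib.robotparser`. The `ExampleScanner` user agent, the robots.txt contents, the `example.com` URLs, and the `build_parser` / `may_fetch` helper names are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a site might publish to keep one scanner
# away from its videos while leaving other crawlers mostly unrestricted.
ROBOTS_TXT = """\
User-agent: ExampleScanner
Disallow: /videos/

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt):
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser, user_agent, url):
    """Return True if the rules allow this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)
checks = [
    ("ExampleScanner", "http://example.com/videos/clip.avi"),
    ("OtherBot", "http://example.com/videos/clip.avi"),
    ("OtherBot", "http://example.com/private/notes.html"),
]
for agent, url in checks:
    print(agent, url, may_fetch(parser, agent, url))
```

Nothing forces a crawler to run a check like this, of course; robots.txt is a convention, which is exactly why the legal question comes up.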
Let me see if I get it... (Score:2)

by real gumby ( 11516 ) writes:

Err, so "meaning-based media" means it maintains "ACID semantics..."?
Autonomy (Score:1)

by BigBadBus ( 653823 ) writes:

I used to work for Autonomy. They were a bunch of shits. Here's an article
they didn't like very much:

Life in the Autonomy sweatshop
Or:
Stress Is More Fun

Following a successful interview at Autonomy Headquarters in Cambridge
on March 24th, I was offered employment and agreed to start work on
May 22nd. Despite this being a huge upheaval involving a large outlay
of money (since no relocation fee was offered), I decided to make the
move from Woking to the Cambridge area.

At first, every
- Re: (Score:1)
  
  by moldor ( 985453 ) writes:
  
  I had a similar experience (the bullying without the money) from a University (a CATHOLIC one) here in Australia.
  
  Should your former employer ever attempt to persuade you to remove the article again, PM me and I'll host it for you.
  
  Jon
