
Copyright Tool Scans Web For Violations

Posted by Zonk
from the he-knows-when-you've-been-bad-or-good dept.
The Wall Street Journal is reporting on a tech start-up that proposes to offer the ultimate in assurance for content owners. Attributor Corporation is going to offer clients the ability to scan the web for their own intellectual property. The article touches on previous use of techniques like DRM and in-house staff searches, and the limited usefulness of both. It specifically cites the pending legal actions against companies like YouTube, and wonders what those companies' attitude will be towards initiatives like this. From the article: "Attributor analyzes the content of clients, who could range from individuals to big media companies, using a technique known as 'digital fingerprinting,' which determines unique and identifying characteristics of content. It uses these digital fingerprints to search its index of the Web for the content. The company claims to be able to spot a customer's content based on the appearance of as little as a few sentences of text or a few seconds of audio or video. It will provide customers with alerts and a dashboard of identified uses of their content on the Web and the context in which it is used. The content owners can then try to negotiate revenue from whoever is using it or request that it be taken down. In some cases, they may decide the content is being used fairly or to acceptable promotional ends. Attributor plans to help automate the interaction between content owners and those using their content on the Web, though it declines to specify how."
  • Wager (Score:4, Insightful)

    by Baricom (763970) on Tuesday December 19, 2006 @12:33PM (#17300868)
    Anybody care to place a friendly wager that they're not going to honor robots.txt?
    • Raise. (Score:4, Funny)

      by Tackhead (54550) on Tuesday December 19, 2006 @12:44PM (#17301006)
      > Anybody care to place a friendly wager that they're not going to honor robots.txt?

      127.0.0.1: $ cat robots.txt
      # robots.txt for 127.0.0.1
      # This file is copyright 2006 by me.
      User-agent: AttributorCorporationDMCABot
      Disallow: /

      And if they do honor robots.txt, I'll be able to sue the fuckers for infringing on my copyright, because they must have read it in order to honor it.

      • Good luck with that.

        Unless you also sell a few companies and put together a few billion as a stake to hand over to attorneys I suspect you'll fare as poorly as everyone else does.

      • Re: (Score:2, Insightful)

        by rhartness (993048)
        You know, I've actually had a thought along those lines when trying to explain to individuals who aren't technologically savvy why digital-rights laws are screwed up and why handling digital content on the web is a grey area. Consider the following.

        Most web sites have a copyright statement on them somewhere (even this one!). Technically speaking, if I go to that web site, my browser copies the page along with all its media content and caches it. Since many of those sites do not have a terms of service posted
      • Reading something does not violate its copyright. If they distribute copies of robots.txt you might have a case of some sort.
        • Reading something does not violate its copyright. If they distribute copies of robots.txt you might have a case of some sort.

          how can you read it on the web then without having made a copy of it somewhere on your computer... you've pulled in a copy of it using your browser, there is now a copy of it in ram and also maybe in the cache... so you've made at least two unauthorised copies.

          • Perhaps it doesn't matter because you aren't distributing the copies?
          • by Da_Weasel (458921)
            It's called fair use. Using one of the "offline browsing" options in browsers might step over the fair-use line, but the cached copy and the in-memory copy don't.
      • Re:Raise. (Score:5, Funny)

        by Mayhem178 (920970) on Tuesday December 19, 2006 @01:12PM (#17301304)
        127.0.0.1: $ cat robots.txt
        # robots.txt for 127.0.0.1
        # This file is copyright 2006 by me.
        User-agent: AttributorCorporationDMCABot
        Disallow: /


        Hahaha! You screwed up! I have your IP address now! I will send 127.0.0.1 to every company that uses the sniffer and tell them the person at that IP is an evil, evil person who exploits innocent people for their own profit and power!
        • by Anonymous Coward
          and whenever I go out, the FBI begins to shout Title 17 U.S.C...
      • Re:Raise. (Score:4, Interesting)

        by FooAtWFU (699187) on Tuesday December 19, 2006 @01:32PM (#17301496) Homepage
        You joke, of course, but there are tools out there to detect when a bot is abusing your site and not following robots.txt. The usual technique is to hide a few links in your page, and also have these links blocked by robots.txt. When a user visits the link, they're banned from viewing the site. (Sometimes, a CAPTCHA-like utility for unblocking yourself is presented along with the 403 page, in the event that a particularly curious user manages to find the link and activate it manually.)
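The trap-link technique described above is straightforward to automate on the server side. A minimal sketch in Python, assuming common-log-format access logs and a made-up trap path `/trap/` (nothing here is specific to any real bot or product):

```python
# Sketch: flag clients that fetched a hidden link which robots.txt disallows.
# A compliant crawler never sees the trap path; anything that requests it
# ignored robots.txt and is a candidate for banning.

def find_trap_hits(log_lines, trap_path="/trap/"):
    """Return the set of client IPs that requested the trap URL."""
    banned = set()
    for line in log_lines:
        try:
            request = line.split('"')[1]     # e.g. 'GET /trap/x.html HTTP/1.0'
            path = request.split()[1]
            ip = line.split()[0]
        except IndexError:
            continue                         # skip malformed lines
        if path.startswith(trap_path):
            banned.add(ip)
    return banned

log = [
    '66.77.88.99 - - [19/Dec/2006:12:33:00 -0500] "GET /index.html HTTP/1.0" 200 512',
    '10.0.0.5 - - [19/Dec/2006:12:35:10 -0500] "GET /trap/secret.html HTTP/1.0" 200 64',
]
print(find_trap_hits(log))  # only the client that requested the trap path
```

The resulting set could then feed an Apache deny list or a firewall rule.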
        • Re: (Score:3, Insightful)

          by Kamiza Ikioi (893310)
          True, but there's a way around that as well. Any robot service worth its weight in fiber has more than one IP, and can have multiple subnets. Best way is to dump robots.txt links to a separate subnet, have it check later in the day. If the IP gets banned, it can check by trying to access the main page, see if it starts getting errors. It can then mark "booby-trap" sites on a list, and route around either the specific triggers or actually honor the robots.txt.

          You have to have more links than they have IP
    • Re: (Score:3, Informative)

      Another company "Cyveillance" already does this for major corporations and the government. I've used htaccess rules to disallow all from their assigned netblocks after they racked up almost 20,000 hits to my personal site in one day. As you mentioned, they didn't follow robots.txt and attempted to index parts of my site that are password protected as well as content names that did not exist (music and videos and such), all the while identifying their bot as a variant of IE.

      Here's how to block two subnets
      • Re: (Score:3, Informative)

        by BrynM (217883) *
        There's an easier way. You can hand mod_access netblocks and more [apache.org]. This method will avoid eating cycles with mod_rewrite. If you can put it in your conf instead of .htaccess, you'll save even more time/processing. Just put it in for your doc root. From my httpd.conf:

        <Directory "/var/www/htdocs/">
        # BRYN'S DENIALS
        # allresearch.com
        deny from 209.73.228.160/28
        # branddimensions.com user-agent: BDFetch
        deny from 204.92.59.0/24
        # cyveillance.com
        deny from 63.148.99.224/27
        deny from 65.118.41.192/27
        # www.mar

    • If they don't honor them, I will bet that the new startup's IP address blocks will be filtered at most routers, though.
    • Anybody care to place a friendly wager that they're not going to honor robots.txt?

      I had a similar thought. How much extra bandwidth is this going to suck from completely legitimate sites while it hunts for copyrighted material? Particularly sites which might have a lot of large media content.

      If I put up a terms of service forbidding the crawling of my site, can I then sue them for bandwidth costs? Seems reasonable to me; why should I be presumed to be guilty?
  • by LiquidCoooled (634315) on Tuesday December 19, 2006 @12:33PM (#17300870) Homepage Journal
    Can't they just use google or torrent sites?
    If users can find items they want, presumably the copyright holders could use the same methods...
    • Re: (Score:3, Funny)

      by owlnation (858981)
      And the opposite situation shows why this tool is a waste of time.

      Imagine a tool where you could reliably return accurate search results for images and video. Does this exist yet? No. As one who searches the web daily for pics and video for my own sordid uses, let me assure you that it most certainly does not yet exist.

      And what an horrific waste to have such a tool - if it works - for policing content for copyright violations. Bearing in mind also that such "violations" are no such thing in some
      • As always, and tell your family and friends, only buy music directly from the artist or secondhand. It's the only way to win.

        or else make it yourself... but then again you've got to pay the nickel for the bl00dy sheet music or tabs... and they don't half try to rip you off there as well... it's that or write your own... and then try and stop them from ripping you off...

  • buh (Score:5, Insightful)

    by lucky130 (267588) on Tuesday December 19, 2006 @12:36PM (#17300910)
    "as little as a few sentences of text or a few seconds of audio or video"

    Like quotations in a paper, or video snippets in an educational presentation?
  • Seems like spam obfuscation techniques will be useful against this sort of scan, too, if someone really wanted to infringe on copyright.
  • The Wall Street Journal is reporting on a tech start-up that proposes to offer the ultimate in assurance for content owners.
    This almost had me going until the second half of the sentence. When has anyone ever offered any product as the "ultimate" anything that ultimately proved to actually ultimately be the ultimate whatever it was?
  • by TheWoozle (984500) on Tuesday December 19, 2006 @12:42PM (#17300994)
    Doesn't this merely serve to point out the absurdity of "Intellectual Property"?
    • TheWoozle,

      Today's world of copy protection is voluntary. You have the right to produce content that people want and to waive copyright on it. That's your free choice. Are you doing that? If not, then why not?
      • by TheWoozle (984500)
        At least in the U.S. (where I'm from), copyright is an "opt out" form of copy protection. I'd rather it was "opt in".

        Early physical and psychological development in humans is spurred by, and social behavior is learned through, imitation. We are, it appears, hard-wired [washington.edu] to imitate other humans. Art and self-expression are rooted in imitation of others and almost all art forms are taught by imitation (called "technique") and most art is derivative of earlier expression.

        In light of all this, it seems abs
  • Yeah (Score:4, Interesting)

    by Hijacked Public (999535) on Tuesday December 19, 2006 @12:45PM (#17301020)
    FTFA:

    If it works, it's a fantastic invention


    Its purpose aside, yes, it would be a fantastic thing to be able to scan the entire web and reliably identify the context and content of any specific media file type. Video, audio, image, etc. Particularly if it could identify purposely obfuscated content.

    I'm in what is almost certainly a tiny minority of Slashdotters in that I actually create copyrightable material rather than only consume it. I'm again in the minority in that I think copyrights are a good thing and again in the minority in that I can separate out the purpose of copyrights and the evil actions of the legal arms of **AA companies.

    Regardless, while scanning the internet for improperly used material sounds great on paper this will probably end up being as effective as finding water with a divining rod. The current tactic of locking down things at the hardware and OS levels will get more support from the media companies, not that they seem all that good at choosing tactics when the internet is involved.

    • Re:Yeah (Score:4, Insightful)

      by jedidiah (1196) on Tuesday December 19, 2006 @12:59PM (#17301166) Homepage
      There's a wide gulf between copyright being a good idea in concept and being sensibly implemented in its current form.

      Not everyone that creates content thinks that draconian enforcement attempts are a good idea, or even in the best interests of those that create content.

      If your work can't survive in the marketplace, which includes the prospect of everyone on the planet getting to use it for free, then perhaps you should get some sort of more conventional day job.

      The difference between a game that sells 50K and one that sells 5 Million has nothing to do with DRM.
    • Re:Yeah (Score:4, Interesting)

      by AdamKG (1004604) <slashdot@adEULER ... m minus math_god> on Tuesday December 19, 2006 @01:17PM (#17301346) Homepage
      and again in the minority in that I can separate out the purpose of copyrights and the evil actions of the legal arms of **AA companies.
      Let's make one thing clear: the RIAA/MPAA lawsuits are not, in any way, shape, or form, an abuse, a negative side, a misapplication, or a malicious use of copyright. They fulfill the role of copyright in the first place; they are the logical end result of a system that says citizens are allowed to distribute ideas (or expressions of ideas) and then stop any further distribution of them.

      The **AA lawsuits are ridiculous, yes. But the ridiculous part is not the litigation itself; it's the laws under which the lawsuits are brought.
      • by DeadChobi (740395)
        Just one little niggle, but citizens are most certainly not required to stop distributing an idea once the implementation of that idea is copyrighted. Otherwise there would be no more crappy songs about high school relationships on the radio after the first, as the idea of obsessively romantic love will have been copyrighted. The idea is to expressly prohibit the copying of a specific expression of an idea while still maintaining everyone's right to love each other like idiots, for example.
    • Re: (Score:3, Interesting)

      by kanweg (771128)
      I'm a patent attorney and no stranger to IP. Having said that, any IP law is, or at least should be, a balance between, on the one hand, freedom to operate (both for IP users and for IP creators) and, on the other, a means of compensation for IP creators. For patents on software, that balance is not there. At least patents last for 20 years max. For copyright, that balance is not there. And I'm curious to hear whether you think it is a good thing that whatever you create is still und
      • by fatman22 (574039)
        Copyrights and patents are there to protect the ownership of, and distribution/licensing rights to, original works created or invented by people. They should belong solely to the creator(s) or inventor(s) of the works or ideas and be nontransferable and non-inheritable.
      • And I'm curious to hear whether you think it is a good thing that whatever you create is still under copyright more than 40 years after you die

        No I do not think that life+40 years is a good thing. Any length of time is likely to be some arbitrary guess, but anything more than the life of the creator is too long in my estimation.

        These repeated attempts by media companies to extend the time periods for both their copyright and sometimes mine make a lot of news here and are often held up as examples of the way copyrights have been bent against the public. When compared with the reality of file sharing they matter very little though. A look at

      • by grcumb (781340)

        I'm a patent attorney and no stranger to IP.

        Are you indeed? Then you should know better than to use the term 'Intellectual Property'.

        You of all people should know that no such thing exists - certainly not under the laws of any country I've ever had the leisure to study. A lawyer of all people should know better than to bandy inaccurate, misleading terms about. I believe the reason is that unwise talk such as that can come back to... what's the legal term again? Ah yes: bite you in the ass. 8^)

    • by Laur (673497)
      I'm in what is almost certainly a tiny minority of Slashdotters in that I actually create copyrightable material rather than only consume it.
      Everyone who posts on Slashdot creates copyrighted material.
    • Re: (Score:2, Insightful)

      by DamnStupidElf (649844)
      I'm in what is almost certainly a tiny minority of Slashdotters in that I actually create copyrightable material rather than only consume it. I'm again in the minority in that I think copyrights are a good thing and again in the minority in that I can separate out the purpose of copyrights and the evil actions of the legal arms of **AA companies.

      Tiny minority? Everyone who posts to slashdot is creating copyrighted material. Everyone who sends an email or writes on a post-it note is creating copyrighted m
    • "I'm in what is almost certainly a tiny minority of Slashdotters in that I actually create copyrightable material"

      Well aren't we all high-and-mighty. Forget something though?

      "All trademarks and copyrights on this page are owned by their respective owners. Comments are owned by the Poster."

      (Virtually) EVERY expression of an idea is copyrightable; including every lame post made to /.. You've fallen for the same trap as so many others (artists, politicians, even everyday people) of believing that it only

  • roughly equal to the entire volume of the publicly available internet..

    think about it, to do what they say, they have to request ALL the data they can lay their hands on,
    and then chuck it.. and for comparative purposes, they'll have to do it again.

    so Sony hires 'jfm copyright trackers'
    and microsoft hires 'sco copyright trackers'
    and mgm hires yo momma

    and each of these 'ip owners' representatives have to scour the entire net, bit by byte by megabyte, for their clients.

    holy crap! think about the potential
  • by Weaselmancer (533834) on Tuesday December 19, 2006 @12:46PM (#17301030)

    Attributor plans to help automate the interaction between content owners and those using their content on the Web, though it declines to specify how.

    And apparently being written by underpants gnomes.

  • by PingSpike (947548) on Tuesday December 19, 2006 @12:46PM (#17301032)
    Great, now all the torrent sites will require captcha verification too! ;P

    Actually, can they even scan torrents without downloading the entire file? And what's to stop everyone from just blocking them from accessing their websites? Are they going to go in covertly, pretending to be actual users? I can see every legit website blocking their access as well; why pay for bandwidth to supply that?

    Sure, youtube can be more efficiently attacked...but youtube has been dancing in front of the cannons since its inception, we all knew it was going to get shot eventually.
    • Here's another thought: what if your copyright license expressly forbids this kind of downloading? Can you then sue whoever downloaded your home grown musical, fanfic or picture of your cat via that tool?

      Then again, this entire counter-suing point is completely moot. Very few individuals have the money to slug it out in court with large media publishers, and not too many businesses can either.
  • This must be really essential business software. It has a Dashboard! Wanna bet the next version is SOA-enabled?
  • search by hash? (Score:4, Interesting)

    by straponego (521991) on Tuesday December 19, 2006 @12:47PM (#17301040)
    Does Google allow searching by md5sum or equivalent? I'm sure they have the capability. While not as impressive as what this company claims, it'd also be more reliable for unaltered media files.

    But it looks like the real "innovation" these guys are pushing toward is fully automated filing of lawsuits. I think that was in Accelerando, which is fantastic, and which you can download free. [accelerando.org]
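A hash-based lookup like the parent suggests is trivial for byte-identical copies. A sketch of the idea with Python's hashlib (the dict here is a stand-in for a search engine's digest index; MD5 is used only because the poster mentions md5sum):

```python
# Sketch: "search by checksum" -- an unaltered copy of a file hashes to the
# same digest, so an index keyed by digest finds exact duplicates instantly.
import hashlib

def fingerprint(data: bytes) -> str:
    """Digest of the raw bytes; identical bytes give identical digests."""
    return hashlib.md5(data).hexdigest()

original = b"some copyrighted work"
verbatim_copy = bytes(original)                      # an exact mirror
index = {fingerprint(original): "mysite.example/work"}

print(fingerprint(verbatim_copy) in index)  # exact copies match the index
```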

    • Re:search by hash? (Score:4, Informative)

      by Johann Lau (1040920) on Tuesday December 19, 2006 @01:45PM (#17301650) Homepage Journal
      "Unaltered media files" are the exception, not the rule. Changing even a bit of metadata (stripping exif from an image, changing an mp3 tag) would change the checksum, not to mention things like putting things into an archive, resizing images, (re)recompressing music.

      But yeah, it might make sense for Google to become "aware" of unique content and variations of it.. but I doubt they'd ever use that openly for (aiding in) hunting down copyright infringement, simply for PR reasons.
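The fragility the parent describes is easy to demonstrate: one changed byte of metadata flips the whole-file digest even though the payload is untouched. A tiny sketch (the "tags" are stand-in bytes, not a real tag format):

```python
# Changing one byte of metadata changes the whole-file checksum,
# so naive hash matching misses even trivially retagged copies.
import hashlib

audio = b"identical audio payload"
tagged_a = b"TITLE=Song A;" + audio
tagged_b = b"TITLE=Song B;" + audio   # same audio, one byte of tag changed

print(hashlib.md5(tagged_a).hexdigest() == hashlib.md5(tagged_b).hexdigest())  # False
```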
      • by sootman (158191)
        They're still common enough to make md5-based searching a very useful tool. And, in fact, a lot of stuff does just get downloaded from here and posted there with no change.
      • Re: (Score:3, Interesting)

        by stivi (534158)
        Hm, what about computing the checksum of the actual media contents? For example, compute the checksum only for the sound data in an MP3 or the image data in image files, and ignore all other data/metadata. Usually media files are containers for smaller objects or data streams... Resampled or modified contents would not be detected, though.
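The parent's idea can be sketched for ID3v2-tagged MP3s: the tag starts with a 10-byte header ("ID3", two version bytes, one flag byte, then a four-byte "synchsafe" size with 7 bits per byte), so skipping the tag before hashing makes retagged copies match. The sample bytes below are hand-built stand-ins, not real MP3 data:

```python
# Sketch: hash only the audio payload of an MP3, skipping an ID3v2 tag
# so that copies which differ only in their tags still fingerprint alike.
import hashlib

def audio_fingerprint(data: bytes) -> str:
    if data[:3] == b"ID3" and len(data) >= 10:
        # ID3v2 body size is 4 "synchsafe" bytes (7 bits each) at offset 6
        size = 0
        for b in data[6:10]:
            size = (size << 7) | (b & 0x7F)
        data = data[10 + size:]              # drop header + tag body
    return hashlib.md5(data).hexdigest()

audio = b"\xff\xfb" * 100                          # stand-in for MP3 frames
tag_a = b"ID3\x03\x00\x00\x00\x00\x00\x05AAAAA"    # 5-byte tag body
tag_b = b"ID3\x03\x00\x00\x00\x00\x00\x05BBBBB"    # different tag, same audio

print(audio_fingerprint(tag_a + audio) == audio_fingerprint(tag_b + audio))  # True
```

As the parent notes, this only defeats metadata edits; resampled or re-encoded audio would still need fuzzier matching.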
    • by sootman (158191)
      I wrote to google and yahoo a year or two ago suggesting they implement this but never heard back from either and have not seen it implemented. (Please, everyone WRITE!*) It would be the COOLEST THING EVER for a number of reasons. Say you downloaded a picture off the web. A year later, you stumble across it and decide you want to see if the site has any more similar pics. You could just md5 the image and search for that. (Of course this could be made very easy for non-technical users: Google could have a li
      • I'd mod you up if I could. Another benefit of this would be the network effect on hashing tools. Yeah, any linux/osx/unix user has them already, and they're easy to get for Windows as well. But if google started exposing this, tool makers would follow. This would really boost infrastructure and standards for things like p2p apps, desktop search, backup tools, Internet-hosted storage, etc. The ??AA would also want to use it, of course, and this might even be a reason google has refrained from making it
  • After all they just copied http://copyscape.com/ [copyscape.com] 's idea.

      That's the first thing that came to mind when I saw the article. It's been around for years. I've used it a few times and was amazed to find one of my random website texts in other people's work (it was properly cited, so I didn't complain).
  • Why the fuck does everyone want to be paid for every little thing these days? Sure, wholesale piracy is one thing. I disagree with the idea that people should be trading movies and music online with no restrictions at all. If you want an album, buy it. If you want software that costs something, buy it or learn to use free/open software. If you want to see a movie, pay to watch it in the theater or rent the DVD when it comes out. But, where this all falls apart is when someone quotes someone else onlin
    • Re: (Score:2, Funny)

      by FireFury03 (653718)
      If the industry had their way, rap music would have never happened

      I don't understand... your post seems to imply this is a Bad Thing?
    • If you want an album, buy it. If you want software that costs something, buy it or learn to use free/open software.

      So where's the free/open alternative to an album?

      Or... someone uses a popular song as the music bed in their Youtube video and the entire video clip is only 25 seconds long

      A ringtone is 25 seconds long, as that's how long it takes for the call to be routed to voice mail.

      or the quality is so poor that no one in their right mind would consider keeping it as something to put on their iPod.

      Over a mobile phone's ringer, quality matters little.

      Whatever happened to the concept of fair use and encouraging people to build upon the works of others?

      Sonny Bono happened [pineight.com].

  • It's just a tool (Score:2, Insightful)

    by 91degrees (207121)
    As long as it respects basic internet rules of conduct (including respecting robots.txt), then this is ethically neutral.

    It all depends on how it's used. Many companies would prefer to avoid copyright-infringing material, and will take it down if its existence is pointed out to them. Many companies will simply be asking others to remove material which clearly and flagrantly breaches their copyright. This is perfectly reasonable behaviour.
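Honoring robots.txt, as the parent asks for, is exactly what Python's standard-library `urllib.robotparser` implements. A sketch of how a compliant crawler would check before fetching (the rules are parsed from a string so no network access is needed; the bot name is borrowed from the joke upthread):

```python
# Sketch: a well-behaved crawler consults robots.txt before every fetch.
from urllib.robotparser import RobotFileParser

rules = """
User-agent: AttributorCorporationDMCABot
Disallow: /
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("AttributorCorporationDMCABot", "/page.html"))  # False
print(rp.can_fetch("SomeOtherBot", "/page.html"))                  # True
```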
  • Of course, "a few sentences of text or a few seconds of video" most likely are being used within legal fair use boundaries. So what's going to happen is that the corporate law firm will grab this program, then send out auto-takedown notices without a human being (to the extent anyone working in the legal department meets that criteria) ever looking to see if the use is even arguably a violation of copyright. Then you'll get the backlash where at least one such auto-generated letter makes its way to someone
  • by Anonymous Coward
    This may be much less helpful than its promoters claim.

    First of all, what's their probability of a false alarm? Even if they false-alarm fairly infrequently, the vast amount of content on the Web means they could easily have a flood of false alarms, in addition to whatever actual copies are found. The user of the system is then going to have to have human beings sift through that flood to identify A) what's really a copy, B) whether that copy is infringing or not, and C) if so, whether it's worth taking actio
      First of all, what's their probability of a false alarm? Even if they false-alarm fairly infrequently, the vast amount of content on the Web means they could easily have a flood of false alarms, in addition to whatever actual copies are found. The user of the system is then going to have to have human beings sift through that flood to identify A) what's really a copy, B) whether that copy is infringing or not, and C) if so, whether it's worth taking action against the infringer?

      You must be new here. The sol
  • Ok, it's supposed to be unlawful to access copyrighted information on the Internet without the copyright holder's permission, right? I mean, that's the gist of the *AA's arguments right -- we hold the rights, you can't access this material unless we say so. So if the tool has to access the information to determine the copyright, wouldn't it be violating that principle? Nitpicking I know, but an interesting thought. They'd have to get dispensation from the *AAs to do it, wouldn't they?

  • by Anonymous Coward

    ...then do not put it on the Internet.

    In fact, burn it to a DVD and lock it in a safe, and never talk about it. That way nobody else will ever have access to your "intellectual property".

  • Proactive firewalls (IDS) properly configured should shut the "scan" down relatively quickly, no? Besides, if the service is provided by a specific location (IP block), then IP blocking is trivial.

    On another note, so now they are going to throw more traffic over the Internet? :P
  • This is the tool Micros - um, I mean - SCO has been waiting for. They can now just scan all those millions of Linux Servers on the intraweb and see their copyrighted code right there in the open....

    ...or maybe not.

  • Like there's any copyright infringement on The Interweb. I don't see how a whole book could fit in those tubes...
    • by glenstar (569572)
      From what I understand, They (the Illuminati) have secretly made the tubes bigger. You can even put whole CDs and DVDs through them.
  • The problem: your services as a content mitigator have been rendered useless by the appearance of a medium which is so cheap as to appear free, so fast as to appear instant, and so easy as to appear effortless.

    The cure, corrosive, caustic and highly dangerous responses flooded into the arteries of your survival - a general failing of the organs of service, and an increasingly gruesome appearance as you stamp on the consumer and turn on your distributors looking for signs of theft and duplicity.

    Prognosis -


  • I find my stuff copied and plagiarized all the time, and it's nearly impossible to enforce without a large budget for lawyers. From inventions to source code to writing.
    More than I could ever possibly list here, but I have come to realize it's in the nature of things.

    So now big corporate America is going to get even better at chasing stuff down and coming after everyone that even borrows a paragraph, using their intimidation tactics.

    The place where i
    • by Reziac (43301) *
      "This system can tell when you copy from then, but not when they copy from you....."

      That's the best point anyone's made here today. How does the tool know if the person doing the scanning is the actual originator of the content? It can't. It can only go by the subscriber's say-so.

  • They're going to be COPYING stuff from websites into their index so they can perform paid searches on it. Why isn't that copyright infringement all by itself?

    If somebody were to sue them, they would have to claim that theirs is a fair use. But, many large copyright holders (i.e. their potential customers) would vehemently disagree with such a position. That's an interesting position to be in.
  • It'll save me the time I spend doing 'vanity' web searches.
  • Anyone else ever had their site visited by the Turnitin [turnitin.com] bot?

    And the article mentioned Copyscape [copyscape.com], which is more aimed at finding dupes of web pages (you enter a website, and it looks for similar pages in their index).
  • Of course, some nice things about fair use are that
    a) the creator of the copyrighted content does not get to decide whether the use is or is not fair;
    b) although the amount being used is one of the factors used to evaluate fair use, it is by no means the only factor, and in some situations using more than a limited amount is fair.
    No technology can make that evaluation, and copyright holders don't get to, either.
  • /sbin/ifconfig -a

    Voilà!
  • So now, as a countermeasure, someone will produce a tool to scramble the lowest-order/frequency information in the file. For example, randomize the lowest-order bit in an image, randomly exchanging black [#020202] and black [#020302]. For videos and music, randomize content below the threshold of perception. It will take horsepower to re-encode the files, but it only has to be done once. You only need to change one bit for a fingerprint to fail.

    And the arms race goes on...
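The countermeasure above can be sketched on raw bytes. A real tool would operate on decoded pixel or sample data; `pixels` here is just a stand-in buffer, and the point is only that jittering the least-significant bits leaves every value within 1 of the original while destroying any bit-exact fingerprint:

```python
# Sketch: randomize each byte's lowest-order bit so a naive bit-exact
# fingerprint (e.g. an md5 of the raw data) no longer matches.
import hashlib
import random

def jitter_lsbs(data: bytes, seed: int = 0) -> bytes:
    """XOR each byte's LSB with a pseudo-random bit (deterministic per seed)."""
    rng = random.Random(seed)
    return bytes(b ^ rng.randint(0, 1) for b in data)

pixels = bytes(range(256))            # stand-in for raw image data
copy = jitter_lsbs(pixels, seed=42)

print(hashlib.md5(pixels).hexdigest() == hashlib.md5(copy).hexdigest())  # digests differ
print(all(abs(a - b) <= 1 for a, b in zip(pixels, copy)))                # values barely change
```

Of course, as the reply below it notes, any fingerprinting scheme with a similarity threshold rather than exact matching shrugs this off.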

    • by Da_Weasel (458921)
      I don't claim to know how they have built or might develop this system, but it seems to me that if they plan on dealing with a file being encoded by different people in different formats, with different quality levels, then your "low order bit" theory isn't going to do jack to stop them. It seems to me like a pretty trivial thing to add thresholds to these checks to allow slight to moderate variations in the fingerprint.

      Remember they don't have to 100% identify content as unauthorized copyrighted material with t
      • These people are as retarded as you might think.

        I wonder what Dr. Freud would think 'bout that slip...
  • Nice venture-capital-boosting announcement there, but CopyScape [copyscape.com] has already been doing this for years, albeit for text only.
  • What concerns me: (Score:2, Interesting)

    by botlrokit (244504)
    I'm bothered by this type of scenario:

    "Dear [webmaster]:

    It has come to our attention that your website, [sh*touttaluck.com], does not meet compliance in terms of a variety of copyright laws of the United States and other countries. Infractions indicated by our software include, but are not limited to:

    Images created with an unregistered copy of Adobe Photoshop
    Flash files created with an unregistered copy of Macromedia Studio MX 2004
    PDFs created with an unregistered copy of Adobe Acro
  • In some cases, they may decide the content is being used fairly or to acceptable promotional ends.
    Riiiiiiiiiiight.....!
  • by bcrowell (177657) on Tuesday December 19, 2006 @05:20PM (#17305144) Homepage

    I've experienced this from both sides.

    I have a bunch of my books on the web, and every once in a while I do a search on some text from my own books to see who else is mirroring them. The books happen to be copylefted (dual-licensed GFDL/CC-BY-SA), but I'd like to know who's mirroring them, and check whether they're violating the license. A lot of people just seem to be hoarding the PDF files on their university servers, maybe because they're afraid my web site will disappear; that's flattering. One guy was selling them on CDs on e-bay, violating my license (claimed they were PD, didn't propagate the license). Another guy translated them to html, with lots of errors, changed the license to a more restrictive one, and put his own ads up; he fixed the licensing violation when I complained, and in a way it was a good thing, because it motivated me to make my own html versions (which are now bringing me a significant amount of money from adsense every month). One kind of annoying thing about mirroring is that the people who are mirroring never bother to update their mirrors, but in general I just figure there's no such thing as bad publicity :-)

    From the other side, I once received an e-mail from a museum in the UK that was complaining that I was using a 17th century oil painting of Isaac Newton. I guess they own the original, and they may also have been the ones who did the scan that I found in a google image search, but under U.S. law (Bridgeman Art Library, Ltd. v. Corel Corp.), a realistic reproduction of a PD two-dimensional art work is not copyrightable. What really surprised me was that they came across it at all, because at that time I think my book was only in PDF format, and hadn't been indexed by google because the file size was too big.

    The whole thing doesn't seem negative to me in general. It makes just as much sense as people doing a vanity search in Google before they apply for a job, or authors watching their amazon.com sales rankings obsessively. I guess the most obvious potential for abuse would be if they send a nastygram to your webhost, and your webhost is a low-end one that figures it's not worth their time to keep your account, so they just shut off your account.

  • It wouldn't be too hard to make this software by looking up key phrases of a web site in Google. If there is an exact hit, then there may be a copyright violation.

    How hard would it be to intelligently grab chunks of YOUR web site and then Google those parts? Then grep the results. If there are positive hits (not from your domain), light up the dashboard. If you wanted to be extra picky, query Yahoo, MSN, Google, and whoever else you like to search with.
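The DIY scan described above can be sketched as follows. The actual search-engine call is omitted (API details vary and aren't specified here), so this shows only the phrase-selection step: pull a few long, distinctive sentences from your own page to use as quoted queries.

```python
# Sketch: pick distinctive sentences from your own page to use as quoted
# search-engine queries; long sentences rarely match unrelated pages by chance.
import re

def distinctive_phrases(text: str, count: int = 3, min_words: int = 6):
    sentences = re.split(r"[.!?]+\s*", text)
    candidates = [s.strip() for s in sentences if len(s.split()) >= min_words]
    candidates.sort(key=len, reverse=True)    # longest first = most distinctive
    return ['"%s"' % s for s in candidates[:count]]

page = ("Welcome. This paragraph contains a fairly long and unusual "
        "sentence that should be unique to this particular website. "
        "Short bit. Another reasonably distinctive sentence to query for.")

for q in distinctive_phrases(page, count=2):
    print(q)   # feed each quoted phrase to your search engine of choice
```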
  • Mmmmm, eight year old duplicate [slashdot.org]. This one really takes me back to my high school days.
  • See: www.google.com

    Searching for +mp3 intitle:index.of +[insert your favourite artist here] would be enough to keep these jerks busy for a while.
