Open Source Piracy Social Networks Science

Redditors Aim to 'Free Science' From For-Profit Publishers (interestingengineering.com)

A group of Redditors came together in a bid to archive over 85 million scientific papers from the website Sci-Hub and make an open-source library that cannot be taken down. Interesting Engineering reports: Over the last decade or so, Sci-Hub, often referred to as "The Pirate Bay of Science," has been giving free access to a huge database of scientific papers that would otherwise be locked behind a paywall. Unsurprisingly, the website has been the target of multiple lawsuits, as well as an investigation from the United States Department of Justice. The site's Twitter account was also recently suspended under Twitter's counterfeit policy, and its founder, Alexandra Elbakyan, reported that the FBI gained access to her Apple accounts.

Now, Redditors from a subreddit called DataHoarder, which is aimed at archiving knowledge in the digital space, have come together to try to save the numerous papers available on the website. In a post on May 13, the moderators of r/DataHoarder stated that "it's time we sent Elsevier and the USDOJ a clearer message about the fate of Sci-Hub and open science. We are the library, we do not get silenced, we do not shut down our computers, and we are many." This will be no easy task. Sci-Hub is home to over 85 million papers, totaling a staggering 77TB of data. The group of Redditors is currently recruiting for its archiving efforts, and its stated goal is to have approximately 8,500 individuals torrenting the papers in order to download the entire library. Once that task is complete, the Redditors aim to release all of the downloaded data via a new "uncensorable" open-source website.
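
As a rough sanity check on the figures quoted above, the average paper size and each volunteer's share can be worked out directly. This is only a sketch: it assumes decimal terabytes (10^12 bytes) and an even split of the library with no overlap between volunteers. Neither assumption comes from the article, and the function names are made up for illustration.

```python
# Figures quoted in the summary above.
TOTAL_BYTES = 77 * 10**12   # 77 TB, assuming decimal terabytes (an assumption)
PAPERS = 85_000_000         # "over 85 million papers"
VOLUNTEERS = 8_500          # stated recruiting goal


def average_paper_mb(total_bytes: int = TOTAL_BYTES, papers: int = PAPERS) -> float:
    """Mean size of one paper in megabytes (10**6 bytes)."""
    return total_bytes / papers / 10**6


def share_per_volunteer_gb(total_bytes: int = TOTAL_BYTES,
                           volunteers: int = VOLUNTEERS) -> float:
    """Data per volunteer in gigabytes, if the library were split evenly with no overlap."""
    return total_bytes / volunteers / 10**9


if __name__ == "__main__":
    print(f"average paper: {average_paper_mb():.2f} MB")
    print(f"per volunteer: {share_per_volunteer_gb():.2f} GB")
```

Under these assumptions the average comes out a little under 1 MB per paper, and an even split would be roughly 9 GB per volunteer. Any redundancy between volunteers would raise the per-person figure.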

This discussion has been archived. No new comments can be posted.

  • Staggering? (Score:4, Interesting)

    by snookerdoodle ( 123851 ) on Tuesday May 25, 2021 @05:56PM (#61421810)

    Really? Is 77 TB still considered "staggering"?

    • Re: (Score:3, Informative)

      by fleeped ( 1945926 )

      If a scientific paper is 3-10MB, yes, it's a staggering amount. We're not counting Blu-ray rips of the extended LoTR trilogy.

      • I think what was meant is that 77TB is no big deal in terms of storage space. You can buy some hard drives to store 80TB for under $1000. Staggering would be data expressed in exabytes or at least hundreds of petabytes.

        • Poor phrasing in the story, I guess. What's staggering is not the storage space but the number of papers available there.

    • Of text, yes.

      • by rtb61 ( 674572 )

        It's not the text; it's the graphics and how well they have been compressed, which is probably not much at all. You could probably write a program to go through all those files, pick out the graphics, and convert them to more compact formats; the data volume would likely shrink enormously. Text takes up very little storage. Graphics files are huge, though not so much any more (not like the olden days, when employees would jam modems with massive uncompressed image files). Find every image file put them into bett

        • Since most of it will be as PDFs, and most people generate PDFs with images down-sampled to 300 or even 150 dpi, the graphics aren't likely to be that much of an issue.

          If you read a lot of science papers (as I do - see my submissions record) you'll often see a page render pause for a minute or two as the data for drawing a graph is parsed, and the graph is drawn block by block. No "image file" to re-process.

          Feel free to run your own experiment on a TB or so (my 20-year habit of storing every paper I read

    • I agree. I have 7 TB of storage on my computer and almost *all* my friends have 3 TB now.

      So that's only about 12-15 PCs' worth of storage. And could easily be in the 10 PC range.

    • It is for redditors who only have $100 to their name.
    • 85 million papers: even if you read one every minute, it would take two lifetimes to read them all, and that's without eating or sleeping.
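
The "two lifetimes" estimate above is easy to check. A minimal sketch, assuming non-stop reading at one paper per minute and a 365-day year; the function name is illustrative only.

```python
PAPERS = 85_000_000               # paper count quoted in the story
MINUTES_PER_YEAR = 60 * 24 * 365  # ignoring leap years (an assumption)


def years_to_read(papers: int = PAPERS, minutes_per_paper: float = 1.0) -> float:
    """Years of non-stop reading needed to get through every paper."""
    return papers * minutes_per_paper / MINUTES_PER_YEAR


if __name__ == "__main__":
    print(f"{years_to_read():.1f} years of continuous reading")
```

That works out to a bit over 160 years, which matches "two lifetimes" if a lifetime is taken as roughly 80 years.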

  • I have about 50TB of space on my home server

    How is 77TB "staggering"?

  • Great! (Score:5, Interesting)

    by locater16 ( 2326718 ) on Tuesday May 25, 2021 @06:02PM (#61421832)
    Sounds great. There needs to be a way to keep track of and search everything, as well as to upload more stuff. But hells, I'll volunteer.

    Sci-hub is used on the regular by a ton of researchers. Who knows what advances have been made, and how many lives have been saved, because of it. It's great to know central authorities would rather see the advancement of humanity curtailed and to have dead bodies pile up so the profits of a handful of functionally useless bastards can keep rolling in.
    • by jafac ( 1449 )

      Yes; and the problem is that this is made VERY clear in Article I, Section 8 of the US Constitution concerning copyright, whose purpose is spelled out clearly: "To promote the Progress of Science and useful Arts".

      If it does not promote science and the useful arts, then copyright interdiction of material is actually unconstitutional (no matter what some paid-off judge says on behalf of the media monopolies).

  • by Coius ( 743781 ) on Tuesday May 25, 2021 @06:03PM (#61421840)

    Just picture this. You are sitting there eating your dinner, and a tear gas canister comes through the window. All of a sudden the FBI and a SWAT team come in, ordering you and your family to the ground. They check the PC in the corner. Chief says "Yeah, it's pure!" after clicking the mouse a few times and tapping some keys.
    You are arrested and marched to some unknown detainee center.
    Inside the interview room, the DA flops down a packet of papers.
    On top you notice a potato calculator with a bunch of words and diagrams.
    "Recognize this?" the DA says.
    You stare harder.
    "It's a potato calculator?" you ask.
    DA: "Yes, a potato calculator. Or rather, a study paper on potato calculators in space. You sickos creep me out. You peddle info like this, that we paid to have made and then another company claims to own. You allowed THIS smut to be open to anyone. How dare you!"
    You stare blankly.
    DA: "Close the door," as he nods to the security officer. "I'm going to make him regret making open information available to anyone without paying anything."

    Seems familiar? The MPAA did it, the RIAA did it. I'm talking about using the government to do the dirty work of fucking with people and then profiting while others suffer.

    This is how you get 1984. This is how you make people dumb. God I wish there was a purge. The people who own these companies would be turned on by a public full of rabid hate towards them after all this time.

    It's so sad, half the papers might even be crap. We paid for them, and yes they are crap, but *WE* paid for them. Not the assholes that keep the papers behind lock and key.

    If you want to see massive oppression going on in a country, look no further than how information is kept from the public. I see a similarity between this and churches of the past, when the churches even kept recipes of beer from the public to prevent them from helping themselves and growing. We need a new renaissance.

    • The Renaissance artist or engineer had a patron: the cleric or the merchant prince, notoriously unforgiving and dangerous to cross.

      I remain skeptical of the geek's eternal quest for the "immortal" pirate website.

    • I see a similarity between this and churches of the past, when the churches even kept recipes of beer from the public to prevent them from helping themselves and growing.

      Citation needed. I know it's easy to hate on religion, but c'mon ...

    • Re: (Score:3, Interesting)

      Slight correction because you got your dystopias wrong. That isn't a 1984 scenario. 1984 is the end state of an all-encompassing uber state. Big Brother does not tolerate private enterprise. There is only the state.

      This scenario, where corporate interests control the government and get it to do their bidding, is far more cyberpunk in feel. That's helped along by the fact that it's a technological subject, which separates it from previous historical examples of mercantilism where "private" firms gained

  • Yeah.. Uh huh... On a new "uncensorable" hosting service. Protected by a moat, with snakes and alligators.

    • There are several ways that can be done. If you haven't noticed, untraceable and unstoppable piracy has long been a done deal.

      • I suspect the NSA has more important things to do than chase down a bunch of teenagers with their "untraceable and unstoppable" piracy.

        • The NSA doesn't need to chase anyone down; it can track your phone and listen to your calls anyway, if it wanted to bother. I doubt the NSA cares about science papers; it's mainly money-grubbing lawyers and publishers who would.

  • Somebody has to pay (Score:2, Interesting)

    by lfp98 ( 740073 )
    The fact is, it takes a lot of effort and big money to review, format, and enforce the quality of scientific papers and to maintain and secure an online database of articles. When I went to graduate school in 1973, they were already talking about publication methods that bypassed profiteering publishers, where authors would just be charged a nominal fee to archive their work in a form freely accessible to all. But in practice the "nominal" fee turns out to be thousands of dollars. The Public Library of Sc
    • by jaa101 ( 627731 ) on Tuesday May 25, 2021 @08:22PM (#61422136)

      The fact is, it takes a lot of effort and big money to review, format, and enforce the quality of scientific papers and to maintain and secure an online database of articles.

      The reviewing and enforcing of quality is not done or paid for by the publishers; researchers review each other's work in a peer-review process. Sci-hub seems to maintain a secure online database of articles without charging authors or readers. Which leaves just formatting from your list.

      One issue I can see with this new project is that it may strain Sci-hub's resources if people try to scrape their whole repository. Hopefully there's some communication so that they can work out a minimally disruptive approach.

      • Re: (Score:3, Insightful)

        by pz ( 113803 )

        The reviewing and enforcing of quality is not done or paid for by the publishers; researchers review each other's work in a peer-review process. Sci-hub seems to maintain a secure online database of articles without charging authors or readers. Which leaves just formatting from your list.

        You forgot about running the reviewing process, and the staff necessary to do an initial screening of submissions. Remember, not every submission gets sent out for peer review. Moreover, typesetting is a non-trivial cost, especially if you want it done with a modicum of finesse. And, since we're talking about businesses, they do need to make a profit. As stated in an earlier post, these motivations have been the reason many societies decide to publish their own journals, and they end up realizing that t

        • Re: (Score:2, Informative)

          You forgot about running the reviewing process, and the staff necessary to do an initial screening of submissions.

          Please stop pretending you actually know anything about this topic. You can't logic your way into a different reality.

          In almost all journals the editors are unpaid academics too.

          Don't believe me? Pick some journals, look up the editors, and you'll find they have full time academic jobs. I actually know people who do journal editing. I've done area chair work at computer science conferences (which

          • by pz ( 113803 ) on Wednesday May 26, 2021 @07:57AM (#61423546) Journal

            I'm sorry you appear to be having a bad day and feel compelled to respond in such an insulting way, rather than engaging in civil discourse.

            Not only do I have many publications -- both in computer science and in biology -- I run a small (very small) journal. In all cases of non-society journals that I'm familiar with (and, frankly, computer science is a tiny field compared to biology where the big players that everyone loves to hate operate), there is a paid, professional staff. There are offices. There are associate editors that quickly review each submission and make an initial decision for a hard reject or not. That's their full-time job, reading submissions and keeping up-to-date in their assigned specialties. Most submissions get immediately rejected; the trickier submissions get discussed with upper staff before a go/no-go decision is made; and the rest are sent out to an editorial board member to start the peer review process. For many journals, the editorial board is quite large, and those positions are typically unpaid. But there is still a permanent, paid, professional staff. But, hey, what do I know, I've only been on the editorial board of two large journals!

            Yes, Knuth invented TeX (to typeset his *books*, let's remember), and I love it. I use it whenever possible. But let me give you statistics on its usage for the small journal that I run: we offer templates in DOC and TeX form, and less than 5% of straight biology papers use TeX, while about 50% of computational biology submissions do. Most biologists don't know what TeX is, and if you're going to criticize the publication process, you should start with the largest segment, which would be biology. Since my journal includes computational biology, our statistics are skewed: straight biologists, which is where the vast majority of research is currently done, both in terms of funding and publications, don't know TeX. If, as a publisher, you are willing to accept variations in presentation from paper to paper that invariably creep in no matter how strict and stringent your Guide to Authors (again, I write from experience as a publisher here) then, sure, let your authors do the formatting. You'll end up with a crappy looking journal that won't get much attention.

            I have many papers, too. Only the IEEE BME papers were submitted with TeX, the rest were in DOC, because that's what biology uses. Biologists are not good typesetters. (Heck, professional typesetters aren't always good at their job either, as my most recent paper shows.) Some biologists are terrible typesetters.

            Again, looking at the publication process from the author's view is not understanding the full picture of how a journal actually runs. The proof is in the pudding: academic societies that start their own journals in part to escape the large publication costs of the big players end up charging not that much less. And the reason is clear if you stop to think about it for a minute: just because as an author you are willing to do the incremental work necessary for your paper for free, there's a lot more that happens, and people who do that work as their job need to get paid to do it. Even if they're doing it only part time, they need to get paid. The biggest cost in both academic publication and research? Salary support. But my experience is from running a lab and a small publishing house for a couple of decades now, and I recognize that I don't know everything. I haven't run a big journal, for example, only talked to people who do.

          • As it currently stands in practice, I have a hard time believing that peer review is actually used on any real scale. As you stated, it's the norm that the peer review staff of a given journal are not paid and are unable to dedicate the time or resources to properly do it. And we have an ever-growing pile of publishing scandals where either fake papers were intentionally pushed as a test of rigor or actual bad scholarship with made-up, fake or simply unreproducible results was published. I personally witnesse

            • Re: (Score:2, Informative)

              As you stated, it's the norm that the peer review staff of a given journal are not paid and are unable to dedicate the time or resources to properly do it.

              I made no such claim. Please don't just assign your own opinions and assumptions to me.

              • I made no such claim. Please don't just assign your own opinions and assumptions to me.

                Wat

                Don't believe me? Pick some journals, look up the editors, and you'll find they have full time academic jobs. I actually know people who do journal editing. I've done area chair work at computer science conferences (which is more or less the same). The money goes on the venue and stuff. None of the editors, chairs, area chairs or reviewers are paid.

                If that's not what you meant then please be more precise with your speech. You explicitly said they weren't paid and implied heavily that they have limited time available for it.

                • Stop being a dickhead, k?

                  I at no point claimed they didn't have time to do a good job, nor did I imply that.

                  • I'm not being a dickhead, my speech is nothing but polite. I would advise you to avoid applying emotion to text that has none.

                    I at no point claimed they didn't have time to do a good job, nor did I imply that.

                    Whether you intended it or not, that is how it was perceived. Instead of being aggressive and hostile, you can just say, "Hey, I didn't mean to imply that." And then I would say something like, "Sorry, I guess I misinterpreted what you said."

                    Politeness is met in kind and so is rudeness and hostility.

                    • I'm not being a dickhead, my speech is nothing but polite.

                      Claiming I said something that I did not say is the height of rudeness whether you say it with a curse or a smile.

                    • I'm sure you're fun at parties. You don't correct misunderstandings by biting someone's head off, asshole. Have a nice day.

                    • Yep still a dick.

                      I corrected your misunderstanding and you replied "wat" and then doubled down. It's bad enough making a claim that I said something I didn't (you didn't say I implied it, you said I said it), then doubled down when you were corrected. Now you're acting all high and mighty over some supposed standard of politeness which somehow ignores you being a total cockwomble.

      • by vix86 ( 592763 )

        One issue I can see with this new project is that it may strain Sci-hub's resources if people try to scrape their whole repository.

        The smart decision would be to try and get in contact with the Sci-Hub crew in Russia and either ship them some HDDs to back up onto or send over some archival tapes, assuming both sides have compatible tape drives on hand. You can get like a pack of 6 LTO-7 tapes (6 to 15TB/each) for just under $500 which is about the price of a single large HDD.
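
For a sense of scale, here is a quick count of how many tapes that would take. This is a sketch only: it reads the "6 to 15TB" range above as 6 TB native and 15 TB with drive compression, uses the 77 TB figure from the story, and assumes tapes are bought in packs of six. The helper names are illustrative.

```python
import math

LIBRARY_TB = 77        # library size quoted in the story
TAPE_NATIVE_TB = 6     # lower bound quoted above, read as native capacity
TAPE_COMPRESSED_TB = 15  # upper bound quoted above, read as compressed capacity
TAPES_PER_PACK = 6     # pack size mentioned above


def tapes_needed(total_tb: float, tape_tb: float) -> int:
    """Whole tapes required to hold total_tb at tape_tb per tape."""
    return math.ceil(total_tb / tape_tb)


def packs_needed(tapes: int, per_pack: int = TAPES_PER_PACK) -> int:
    """Whole packs required to supply the given number of tapes."""
    return math.ceil(tapes / per_pack)


if __name__ == "__main__":
    for label, capacity in (("native", TAPE_NATIVE_TB), ("compressed", TAPE_COMPRESSED_TB)):
        tapes = tapes_needed(LIBRARY_TB, capacity)
        print(f"{label}: {tapes} tapes, {packs_needed(tapes)} pack(s)")
```

Since PDFs are usually compressed already, the native figure is probably the safer one to plan around.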

      • by bh_doc ( 930270 ) <brendon@nosPaM.quantumfurball.net> on Tuesday May 25, 2021 @10:12PM (#61422380) Homepage

        Publishers rarely even do much formatting, putting a lot of that burden also on the authors by way of thorough (and strict) submission format requirements. And after jumping through those hoops, you have to check closely to ensure that the copy editor hasn't ballsed something important up (pro-tip: they have).

        I'm a published scientist. It's a racket. There are two reasons why these "high-tier" journals are sought after: because it's hard to get accepted (thereby attributing renown by proxy), and because funding and promotion is predicated on publishing in them (based on the assumption of renown by proxy). They provide no real value.

    • by pz ( 113803 )

      I've been trying to make the same points here for yonks. The publishers, while they do make a profit, are not bringing zero value to the proposition. They form a filter which, in an ideally virtuous cycle, means that they build a reputation on the quality of work associated with their name. Thus if you read a paper published in one of the Frontiers journals, you know it is probably scientifically sound, but probably not answering a very interesting question, but if you read a paper in Nature, it probably

    • by oldbox ( 415265 )

      Academics peer review articles for free. That does not cost the publishers anything.

         

    • by Uecker ( 1842596 )

      The publishers merely do the typesetting and provide infrastructure. The reviewing is done by academics for free.

      While $2500 (usually a bit less) is some noticeable cost, it is not a serious burden for a lab. The typical cost of research that goes into a paper is more like $100000. This number is somewhat of a rough guess and may vary a lot across fields and location, so take it with a grain of salt, but it should give you an idea. The cost could roughly reflect an (underpaid) scientist working for one year

    • by Pimpy ( 143938 )

      The role of the publisher is little more than issuing a print-on-demand order, trying to gamify their impact factor, and figuring out how much the market is willing to pay for any given publication. There are ~500 page books that sell for 50 EUR and 150 page ones that sell for 300 EUR, it's clearly not an effort thing. Formatting and typesetting? Done almost entirely by the authors, the journals just provide them with templates, most of which are derivative. Review of submissions? Done by a program committe

    • I agree with your position: creation consumes resources, SOMEBODY has to pay/invest in order for us to "have nice things". Think of this as parallel to music. Musicians should be compensated for their work -- including the costs they incur for renting a studio, etc. Consumers should pay something, even if a pittance. Why? Because paying nothing makes you a parasite. Show some respect for the original creator by at least paying something, e.g., standard rate = 0.1% of initial. But, as with all food cha
  • by labnet ( 457441 ) on Tuesday May 25, 2021 @08:44PM (#61422174)

    I wish someone would do this for tens of thousands of standards used around the world, especially the mandatory ones hidden behind paywalls.
    Seems like they charge $100 for 20 pages.

    • by Pimpy ( 143938 )

      That's the really strange part, you'd figure that standards bodies would be all about people adopting their work and building compliant solutions, particularly as a significant amount of their revenue stream comes from certification. Charging people thousands to e.g. look at your REST API pretty much encourages people to go their own way.

  • I got about 10 TB spare on my home NAS.

    Maybe I should contribute the space for hosting some of the files / torrents, through appropriate VPNs, etc.

  • All of the SciHub mirror torrents have a minimum of 5 seeds now. Some have over 500 seeds, making them better-seeded than many older Hollywood movies.

    Next phase, get every torrent up to over 10 seeds. The list of torrents under 10 is here [phillm.net].
