The EPA Plans To Sunset Its Online Archive (theverge.com)
Come July, the EPA plans to retire the archive containing old news releases, policy changes, regulatory actions, and more. The Verge reports: The archive was never built to be a permanent repository of content, and maintaining the outdated site was no longer "cost effective," the EPA said to The Verge in an emailed statement. The EPA announced the retirement early this year, after finishing an overhaul of its main website in 2021, but says that the decision was years in the making. The agency maintains that it's abiding by federal rules for records management and that not all webpages qualify as official records that need to be preserved.
The EPA says it plans to migrate much of the information to other places. Old news releases will go to the current EPA website's page for press releases. When it comes to the rest of the content, the EPA has a process for making case-by-case decisions on what content can be deleted -- and what is relevant enough to move to the modern website. Some content might be deemed important enough to join the National Archives. The public will be able to request that content through the Freedom of Information Act.
The archive is the only comprehensive way that public information about agency policies, like fact sheets breaking down the impact of environmental legislation, and actions, like how the agency implements those laws, have been preserved, [says Gretchen Gehrke, one of the cofounders of a group called Environmental Data and Governance Initiative (EDGI) that's fighting for public access to resources like the EPA's online archives]. That makes the archive vital for understanding how regulation and enforcement have changed over the years. It also shows how the agency's understanding of an issue, like climate change, has evolved. And when the Trump administration deleted information about climate change on the EPA's website, much of it could still be found on the archive. Besides that, Gehrke says the content should just be available on principle because it's public information, paid for by taxpayer dollars.
Archivists in every field feel this. (Score:4, Insightful)
The problem is everywhere, and nothing new or unique to the EPA website.
While the web has made many resources more available, they are also more fragile than ever. Published legal documents get silently modified or vanish. Research papers get amended, or vanish.
Long ago you could go to an archive to find an ancient newspaper for a specific date, a magazine, or a university collection of government publications for an obscure data collection. This has been dying off for the past 30 years, in some cases more harmfully than in others.
If they won't agree to put it on archive.org.... (Score:2)
...then they want the past records to disappear.
Re: (Score:3)
Bingo!
Had a situation where the local magistrate's online listing of a law had seemingly been changed. I recalled the previous version, so I asked when the law had been revised.
No record of that, and after much hemming and hawing they put it down to a typo. A typo that ended up costing several thousand dollars and threats of lawsuits.
Memory holing of official documents. Especially now, storage is cheap. There is no reason for this.
Re: (Score:3)
Those who control the past, control the future.
Re: (Score:2)
Re: (Score:2)
I would think this collection is something that should be shipped to the Library of Congress.
Re: (Score:2)
We have a National Archives. The Library of Congress is for a different sort of material.
Re: (Score:2)
The best argument against anthropomorphic global w (Score:1)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Cost effective? (Score:2)
If not that many people are using the archive, it shouldn't cost that much to keep a couple terabytes sitting on disk.
If it's about paying for bandwidth, if that many are hitting that much data, it needs to be retained and upgraded.
If it's about hiding and deleting information, those responsible should be replaced with a competent staff.
Re: (Score:2)
If not that many people are using the archive, it shouldn't cost that much to keep a couple terabytes sitting on disk. If it's about paying for bandwidth, if that many are hitting that much data, it needs to be retained and upgraded. If it's about hiding and deleting information, those responsible should be replaced with a competent staff.
It's not just about the data. It's about the interface for your access to the data. You need the website which manages the interface, the servers which process requests for the website, and other servers managing things like caching and data storage. Those all run software at a lower level which needs to be kept up to date, which costs time and money and most notably someone's attention - especially across major software updates. I would guess they are most likely making this change to try and consolidate t
Re: (Score:2)
No they haven't. Budget line items don't go to public referendum, if this was even a line item. And no, voting for such and such candidate isn't the same thing either, since *representatives* (not true democracy) must stand in one's stead for hundreds of issues and therefore are never a perfect match for one's positions (one's lucky if they're even a passable match for many).
Re: (Score:2)
it's a Technology problem (Score:2)
Fancy CMS-based systems are the problem. They can keep static HTML pages forever and make some kind of index of them for an archive CHEAPLY, at almost no cost. The problem comes when you build a complex CMS with a database to power the website and then have to maintain that database and CMS software forever, even when it's no longer supported - that is a long-term COST that goes on forever. Sure, you could run a VM with it forever... but it gets hacked forever etc. Still costs long term.
Everything they do
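To make the "static HTML pages plus some kind of index" idea above concrete, here is a rough sketch, not anything the EPA actually runs. It assumes the archive has already been exported to a directory of plain .html files; the function name build_index and the directory name archive_site are made up for illustration.

```python
import html
from pathlib import Path
from urllib.parse import quote


def build_index(archive_dir: str, title: str = "Archive index") -> str:
    """Return an HTML page linking to every .html file under archive_dir."""
    root = Path(archive_dir)
    top_index = root / "index.html"
    # Collect relative paths, skipping the index page we are about to generate.
    pages = sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*.html")
        if p != top_index
    )
    # Percent-encode the href, HTML-escape the visible link text.
    items = "\n".join(
        f'<li><a href="{quote(rel)}">{html.escape(rel)}</a></li>' for rel in pages
    )
    safe_title = html.escape(title)
    return (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="utf-8"><title>{safe_title}</title></head>\n'
        f"<body><h1>{safe_title}</h1>\n<ul>\n{items}\n</ul>\n</body></html>\n"
    )


if __name__ == "__main__":
    # "archive_site" is a hypothetical export directory.
    site = Path("archive_site")
    if site.is_dir():
        (site / "index.html").write_text(build_index(str(site)), encoding="utf-8")
```

The point is only that nothing here needs a database or a running application: any web server that can serve files can host the result.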
Time for FDRA The Federal Data Retention Agency (Score:1)
Is it in the internet archive? (Score:2)
If they made the site correctly then it is already in the internet archive.
If they didn't, that's the real crime here.
This time will be known as the lost digital age. (Score:2)
Well, unless there are still monks somewhere transcribing important documents and Twitter posts...
Sunset Its Online Archive (Score:2)
Will that be Standard or DST? :-)
Re: (Score:2)
IPFS, IPFS, where are you when we need you (Score:2)
Technologies to counter the memory hole (Score:4, Interesting)
The central problem is, er, centralization. One authority maintains a copy of the content, so they control the content's lifetime. Archive.org is great, but it's still a centralized repository. There are other tools that can be used to keep content available to everyone.
With IPFS [ipfs.io], content is distributed as people access it. As long as one node has a copy, it can be retrieved.
Arweave [arweave.org] is enabling the permaweb, allowing data to be stored "permanently", in a distributed fashion. There's a browser plugin; just click "Archive this page", and it'll be stored forever.
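For anyone wondering why "as long as one node has a copy, it can be retrieved" works, the underlying idea is content addressing: the key for a piece of data is a hash of the data itself, so any node can serve it and the requester can verify it. The toy sketch below illustrates only that idea. It is not the IPFS API and does not use IPFS's real identifier format; ContentStore and fetch are invented names, and plain SHA-256 hex digests stand in for real content identifiers.

```python
import hashlib
from typing import Dict, Iterable


class ContentStore:
    """A toy node that stores blobs keyed by the SHA-256 of their content."""

    def __init__(self) -> None:
        self._blocks: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        # The key is derived from the content, not chosen by the publisher.
        key = hashlib.sha256(data).hexdigest()
        self._blocks[key] = data
        return key

    def has(self, key: str) -> bool:
        return key in self._blocks

    def get(self, key: str) -> bytes:
        data = self._blocks[key]
        # Re-hash so a corrupted or tampered copy is rejected.
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError("stored content does not match its key")
        return data


def fetch(key: str, nodes: Iterable[ContentStore]) -> bytes:
    """Return the content for key from the first node that holds it."""
    for node in nodes:
        if node.has(key):
            return node.get(key)
    raise KeyError(key)
```

Because the key depends only on the bytes, it does not matter which node answers, and a silently edited copy would no longer match its key.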
Memory hole (Score:2)
Anything that does not comport with current doctrine goes down the memory hole. Nothing to see here, folks.
Article confuses archive of web vs env data (Score:1)