Adobe Pushing For Flash and PDF In Open Government Initiative

Posted by Soulskill
from the open-is-relative dept.
angryrice tips news that Adobe seems to be campaigning for the inclusion of Flash and PDF in the Obama administration's efforts at increasing government transparency and openness. A post from the Sunlight Labs blog is critical of Adobe's undertaking, in part since PDF is often "non-parsable by software, unfindable by search engines, and unreliable if text is extracted." They also say government's priority should be to publish datasets and the APIs to interact with them, rather than choosing how they're displayed in fancy graphs and charts.

  • by vaporland (713337) on Saturday October 31, 2009 @09:25AM (#29933945) Homepage

    "non-parsable by software, unfindable by search engines, and unreliable if text is extracted."

    I don't believe this is true - I find PDF documents in search results all the time. The consistency and reliability of PDF for forms creation has no real competition. If you hate Adobe, ok, but don't hate PDF 'cause it's beautiful...

  • by quantic_oscillation7 (973678) on Saturday October 31, 2009 @09:35AM (#29934007)

    The future is ODF (a truly open XML format) and of course PDF, but especially HTML5 + JS + canvas + SVG + Ogg Vorbis/Theora for rich web content.

    With the kind of technology that the new browsers are bringing to the arena, Adobe is getting scared!

  • by Antique Geekmeister (740220) on Saturday October 31, 2009 @09:39AM (#29934027)

    PDF remains difficult to manage. As with MS Word documents, an incredible amount of resources is wasted on display information rather than actual text or graphical content. Unlike MS Word documents, PDFs are parseable; but unfortunately, like MS Word, the commercial vendor-sold document creation tool (Adobe Acrobat) generates unstable and unreliable content that interacts very badly with other tools. Oddly, Ghostscript-created PDF remains very stable and legible, and tools like "PDFCreator", which use Ghostscript, create long-term viable PDF printouts of other document formats. I use it for complex MS Word documents that cannot be handled by other software, or even by different versions of MS Word.

    Adobe can actually do better with this, and I hope that they will in the future. But it's not stable enough to be reliably indexed or viewable even 5 years in the future, much less 10 or 20 or 100 such as may be needed for legal or historical documents.

    As for Flash, you're quite right. Unless they open up the source, it has no business as yet another document format.

  • by Crudely_Indecent (739699) on Saturday October 31, 2009 @09:57AM (#29934149) Journal

    Many implementations of PDF converters merely print a document to images and then embed the images into a PDF. Those are non-searchable, and no text can be extracted from them with the existing tools. I once created a documentation website which relied on this embedded-image type of PDF document. I had to implement an OCR solution in order to extract the text and make my client's documentation searchable. It was ugly and a real pain in the ass.

    Certainly, PDF can be beautiful, but it is often not implemented that way. Personally, I'm a big fan of PDF. If not implemented properly, I try to avoid it.

  • by Anonymous Coward on Saturday October 31, 2009 @10:07AM (#29934227)

    - It is proprietary

    FAIL.

    PDF is an ISO standard. See: ISO 32000-1, Document management – Portable document format – Part 1: PDF 1.7

    This doesn't change the fact that it is a portable typesetting document format though. It's good for read only documents from your word processor but it shouldn't be (ab)used to store tables or graphs or whatever other crap people use it for.

    ---
    As for Flash, let's not even go there. Flash is passable as a streaming video container, for animated cartoons like Homestar Runner, or as a platform for small web games, but outside those use cases, you're using it wrong.

  • by TheRaven64 (641858) on Saturday October 31, 2009 @10:13AM (#29934283) Journal
    What are you talking about? The PDF specification has been available as a free download from Adobe with no royalties payable by implementors since PDF was first created. More recently, the PDF/X family of specifications was approved by ISO. These define subsets of the PDF 1.4 specification for different uses (see ISO 15930). There are at least three open source PDF readers that I know of as well as several commercial viewers (Adobe Reader, FoxIt, Apple's Preview, and so on) and numerous tools can generate PDFs.
  • by Anonymous Coward on Saturday October 31, 2009 @10:23AM (#29934361)

    Unlike micros~1 word documents, there are freely available specifications and a reasonable number of quite reasonable third party implementations that can either display or generate PDF, or even both. That is to say, you can very well ``do PDF'' without ever using adobe software. Part of its success is that it's a dumbed-down version of PostScript, also open and arguably the right way to talk to printers. That's a whole sight better than micros~1's ooxml abomination, that once standardized turned out to have not even one conformant working implementation. Agree on the flash, but there's more.

    PDF is pretty good on storing bound-for-paper documents (and when doing that, use metric paper, dammit) though for scans you're probably better off with DJVU. Flash is basically pure concentrated dancing rodents, and has very little to offer beyond gimmicks. Unless it opens, and opens soon, it will have no staying power and flash data will be rendered useless in a decade or two. That's bad for archiving.

    The core goal should be content: Content, interop, accessibility for the disabled, accessibility for non-wintendo machines regardless of marketshare, archiving, being able to re-use, and still being able to access centuries down the road. PDF may qualify, flash certainly does not.

  • by SgtChaireBourne (457691) on Saturday October 31, 2009 @10:30AM (#29934417) Homepage

    The summary does not do a good job of reflecting the original blog post's point. The point was that the government should make data available in a machine-parseable and generic format. PDF is a great format for storing typeset pages, but it is a terrible format for publishing data. It's easy to generate beautiful PDFs from well-structured data but it's much harder to go the other way. Would you rather have budget figures (for example) as a CSV file in a well-defined format or as a PDF of tables and graphs? If the data is available in the former format, it's easy for you or a third party to produce the latter format. If it's only available in the PDF form then it's much harder to create the CSV.

    If the goal is to make the data available, then even CSV would be a better option than PDF. PDF, while pretty, is a terminal format and is the digital equivalent of a mayfly. It's paper that hasn't happened yet and when it does it will exist for a few short hours before finding its way to the circular file.

    Much of the government data consists of tables and tables of data. Gzipped CSV would be readable by anyone, and so would ODF. Adobe appears to be looking for a handout at the expense of creating a useful and open data system.

    Put it in context: open government requires data formats that are independent of campaign donors.
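
To make the CSV-versus-PDF point above concrete, here is a minimal sketch using only the Python standard library. The agency names, amounts, and column names are made up for illustration and are not from any real dataset. It writes a few budget rows as gzipped CSV, reads them back, totals them, and renders a plain-text table, which is the easy direction from structured data to presentation.

```python
import csv
import gzip
import io

# Hypothetical budget figures; agencies, years, and amounts are invented.
FIELDS = ["agency", "year", "amount_usd"]
ROWS = [
    {"agency": "Parks", "year": "2009", "amount_usd": "1200000"},
    {"agency": "Parks", "year": "2010", "amount_usd": "1350000"},
    {"agency": "Roads", "year": "2009", "amount_usd": "4800000"},
]


def to_gzipped_csv(rows):
    """Serialize rows to gzip-compressed CSV bytes."""
    text = io.StringIO(newline="")
    writer = csv.DictWriter(text, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return gzip.compress(text.getvalue().encode("utf-8"))


def from_gzipped_csv(blob):
    """Parse gzip-compressed CSV bytes back into a list of dicts."""
    text = gzip.decompress(blob).decode("utf-8")
    return list(csv.DictReader(io.StringIO(text, newline="")))


def totals_by_agency(rows):
    """Sum the amount_usd column per agency."""
    totals = {}
    for row in rows:
        agency = row["agency"]
        totals[agency] = totals.get(agency, 0) + int(row["amount_usd"])
    return totals


def render_table(totals):
    """Format totals as a plain-text table (the presentation step)."""
    lines = [f"{'Agency':<10}{'Total (USD)':>15}"]
    for agency in sorted(totals):
        lines.append(f"{agency:<10}{totals[agency]:>15,}")
    return "\n".join(lines)


blob = to_gzipped_csv(ROWS)
parsed = from_gzipped_csv(blob)
print(render_table(totals_by_agency(parsed)))
```

Going the other way, from a rendered table back to rows, has no comparably short equivalent, which is the asymmetry the comment describes.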

  • by Ilgaz (86384) on Saturday October 31, 2009 @11:17AM (#29934741) Homepage

    I work with PDFs a lot, especially on OS X. I am telling you this from an OS on which a 1080p screenshot saved as PDF can be as small as 60 KB in some circumstances: Whoever did that "text as image" trick, he is a complete moron.

    One of the reasons that PDF took off is precisely that it embeds the fonts used in a document, so the document appears pixel-perfect on client machines.

    As a last resort (and a good practice), you can embed the unformatted plain text of the entire document in your PDF file. PDF, like QuickTime MOV, is one of those formats where people don't use the features and then bitch about the size of the client, etc.

  • by russotto (537200) on Saturday October 31, 2009 @11:50AM (#29934951) Journal

    I think what we need is an OPEN DEFINITION for PDF files, probably a subset of Adobe's definition, that any OSS viewer can follow and get the proper results - and ask the user what to do with files that don't follow it.

    There is such; Adobe publishes it and makes it freely available on its web site. It's possible your file didn't follow it, but it's more likely your reader wasn't 100% compliant; it's a very complicated specification.

  • by Anonymous Coward on Saturday October 31, 2009 @12:52PM (#29935323)

    PDF is also not really an open standard. It's mostly open - but some very interesting features, like "Allow commenting in Reader" and "Allow Reader to save filled-in forms" cannot be implemented using published standards information.

    I suppose it's ok if the website offers an option to return data in multiple forms (e.g. here's a link to the original Word file, or here's a PDF if you can't read Word), but it doesn't quite seem appropriate as _the_ way to present information.

  • by jeremyp (130771) on Saturday October 31, 2009 @06:37PM (#29937545) Homepage Journal

    A PDF file produced by the LiveCycle suite is actually an XML document with a thin PDF wrapper. The XML conforms to the XFA standard which is owned by Adobe but is a published standard (http://partners.adobe.com/public/developer/en/xml/xfa_spec_2_4.pdf).

  • by 99BottlesOfBeerInMyF (813746) on Sunday November 01, 2009 @02:56PM (#29943100)

    Whoever did that "text as image" trick, he is a complete moron.

    Generally, text as images in PDFs is the result of people scanning in paper documents without having, or without using, OCR programs to convert the raw image coming in from the scanner into text.
