Adobe Pushing For Flash and PDF In Open Government Initiative

Posted by Soulskill on Saturday October 31, 2009 @09:20AM from the open-is-relative dept.

angryrice tips news that Adobe seems to be campaigning for the inclusion of Flash and PDF in the Obama administration's efforts at increasing government transparency and openness. A post from the Sunlight Labs blog is critical of Adobe's undertaking, in part since PDF is often "non-parsable by software, unfindable by search engines, and unreliable if text is extracted." They also say government's priority should be to publish datasets and the APIs to interact with them, rather than choosing how they're displayed in fancy graphs and charts.

This discussion has been archived. No new comments can be posted.

don't hate PDF 'cause it's beautiful (Score:2, Informative)

by vaporland ( 713337 ) writes:

"non-parsable by software, unfindable by search engines, and unreliable if text is extracted."
I don't believe this is true - I find PDF documents in search results all the time. The consistency and reliability of PDF for forms creation has no real competition. If you hate Adobe, ok, but don't hate PDF 'cause it's beautiful...
- Re:don't hate PDF 'cause it's beautiful (Score:5, Insightful)
  
  by hedwards ( 940851 ) writes: on Saturday October 31, 2009 @09:29AM (#29933975)
  
  I have no problem with PDFs, there are a number of free and commercial applications out there that can work with them.
  
  Flash on the other hand is absolutely an abomination that must be wiped from the net. They still haven't released a proper version for *BSD and they commonly don't bother with less popular OSes. If they want it to be used for this sort of purpose then they need to get their act together and make it available for all operating environments on an equal basis. Which I don't think they have the resources to do.
  
  - Re:don't hate PDF 'cause it's beautiful (Score:5, Informative)
    
    by Antique Geekmeister ( 740220 ) writes: on Saturday October 31, 2009 @09:39AM (#29934027)
    
    PDF remains difficult to manage. Like MS Word documents, an incredible amount of resources is wasted on display information rather than actual text or graphical content. Unlike MS Word, they're parseable: but unfortunately like MS Word, the commercial vendor-sold document creation tool (Adobe Acrobat) generates unstable and unreliable content that interacts very badly with other tools. Oddly, Ghostscript-created PDF remains very stable and legible, and tools like "PDFCreator", which uses Ghostscript, create long-term viable PDF printouts of other document formats. I use it for complex MS Word documents that cannot be handled by other software, even different versions of MS Word.
    Adobe can actually do better with this, and I hope that they will in the future. But it's not stable enough to be reliably indexed or viewable even 5 years in the future, much less 10 or 20 or 100 such as may be needed for legal or historical documents.
    Flash, you're quite right. Unless they open up the source, it has no business as yet another document format.
    
    - Re: (Score:2)
      
      by petermgreen ( 876956 ) writes:
      
      but unfortunately like MS Word, the commercial vendor-sold document creation tool (Adobe Acrobat) generates unstable and unreliable content that interacts very badly with other tools
      Can you be more specific as to what problems you have had using files from acrobat in other tools?
      - Re:don't hate PDF 'cause it's beautiful (Score:4, Interesting)
        
        by Antique Geekmeister ( 740220 ) writes: on Saturday October 31, 2009 @10:19AM (#29934331)
        
        Printing documents created in other language versions of Acrobat. In particular, the Adobe Acrobat for German created documents that were not only unviewable in a normal Acrobat viewer, but when used to "print PDF" for MS Word documents, created documents that actually crashed Windows computers. The Acrobat for Hebrew didn't crash Windows with the printed documents, but was filled with layout errors when rendered even by Acrobat Reader, errors that didn't show up in the Adobe Acrobat tool. Much of this may have been fixed with the latest release, but I'm not spending nor suggesting that my peers overseas spend all the money needed to upgrade.
        Getting our colleagues to stop using Acrobat and use _anything else_ to generate their documents, and use PDFCreator to print them as PDF, stabilized the situation enough for us to generate the documents we needed. It didn't provide PDF forms for people to fill out, which was its only flaw.
        
      - Re:don't hate PDF 'cause it's beautiful (Score:4, Interesting)
        
        by xjimhb ( 234034 ) writes: on Saturday October 31, 2009 @11:01AM (#29934647) Homepage
        
        Just recently I had to look at, and print a few pages from, a PDF document. Knowing where it came from, a corporation that is only very slowly dipping a toe in the water of software other than the big names, I'm sure it was done with Adobe.
        Now I don't even have the Adobe Acrobat reader on my system; when I try to install it, the install crashes. But Fedora comes with several other PDF readers, and the default is set to "Evince" which works fine MOST of the time.
        But I got this PDF, and one page was a picture of a tax form, and when I tried to print it, the tax form came out as a big black blob - man, does that waste ink! Obviously I killed the print job to try something else. (Just VIEWING this tax form was fine, only printing messed up.)
        I remembered using "Xpdf" a while ago, so I tried that, and voila, the tax form printed perfectly. Since I knew there were more tax forms in there, I used Xpdf for the rest of the job.
        So here is a case where two different PDF viewers reacted differently to the same PDF file. I think what we need is an OPEN DEFINITION for PDF files, probably a subset of Adobe's definition, that any OSS viewer can follow and get the proper results - and ask the user what to do with files that don't follow it.
        And tell Adobe they can either follow the open definition, or stuff it where the sun don't shine!
        
        
        Re:don't hate PDF 'cause it's beautiful (Score:4, Informative)
        
        by russotto ( 537200 ) writes: on Saturday October 31, 2009 @11:50AM (#29934951) Journal
        
        I think what we need is an OPEN DEFINITION for PDF files, probably a subset of Adobe's definition, that any OSS viewer can follow and get the proper results - and ask the user what to do with files that don't follow it.
        There is such; Adobe publishes it and makes it freely available on its web site. It's possible your file didn't follow it, but it's more likely your reader wasn't 100% compliant; it's a very complicated specification.
        
    - Re: (Score:2, Informative)
      
      by Anonymous Coward writes:
      
      Unlike micros~1 word documents, there are freely available specifications and a reasonable number of quite reasonable third party implementations that can either display or generate PDF, or even both. That is to say, you can very well ``do PDF'' without ever using adobe software. Part of its success is that it's a dumbed-down version of PostScript, also open and arguably the right way to talk to printers. That's a whole sight better than micros~1's ooxml abomination, that once standardized turned out to hav
    - - Re: (Score:2)
        
        by Antique Geekmeister ( 740220 ) writes:
        
        Yes, detailed display format is a critical feature of PDF. This is key to why it's not appropriate for indexing and stable long-term storage: the visual detail actively interferes with its stability and reliability.
  - Re: (Score:2)
    
    by cryfreedomlove ( 929828 ) writes:
    
    I agree that open government docs should stay away from Flash. I don't agree that Flash is an abomination because Adobe does not bother with less popular OSes. Why should they implement Flash on less popular OSes? That costs Adobe real money and then only a handful of users would benefit. If you were in charge of the engineering budget at Adobe, would you spend $ on a feature for Mac and Windows that 100 million people would use or would you use that same $ to port Flash to a less popular OS with 10,000
    - PDF and Flash are massively multiplatform (Score:3, Interesting)
      
      by Ilgaz ( 86384 ) writes:
      
      Adobe ships Flash/PDF readers/plugins to: Windows, OS X, Symbian (in some form), Linux, *BSD and various, uncountable tiny platforms. iPhone/iPod does not count because of obvious reasons.
      Let's see what MS Silverlight ships to: Windows/Intel Mac. Damn thing is so tied to Windows that they couldn't even convert/ship the V2 for PPC Macs or they simply abandoned them. (like we cared!)
      MS XPS format and viewer is the answer to PDF, which some people who didn't use Windows have never, ever heard of. It is that Win
      - Re: (Score:2)
        
        by tolan-b ( 230077 ) writes:
        
        > What we need is, something combines ODF and PDF. You can add binary file to PDF document like some layer.
        Already exists:
        http://www.oooninja.com/2008/06/pdf-import-hybrid-odf-pdfs-extension-30.html [oooninja.com]
        (scroll down a little)
    - Re: (Score:3, Insightful)
      
      by Darkness404 ( 1287218 ) writes:
      
      Because Flash is now a crucial part of the internet. Until HTML 5 comes out with video standards and the like, Flash is about the only way you can embed videos in sites without ruining the layout of the site with a third-party media player and without your users searching for codecs.
      
      If Adobe would simply release the source to the Flash player, they could -save- money, have full platform compatibility and perhaps make more money with the Flash creation products. Think of it this way, if there was a fast
      - Re: (Score:2)
        
        by cryfreedomlove ( 929828 ) writes:
        
        Why, as you say, is it good business for Adobe to port it to every single OS, even those with only a handful of users?
        
        Re: (Score:2)
        
        by Darkness404 ( 1287218 ) writes:
        
        Because it would allow for a language and framework that works on every single OS. Look at Java, even though it has numerous faults, the fact that it is now open source and ported to just about every single device means that it is used for lots of cross-platform programs. Flash could be the same way if they ported the player to every OS and device. By open sourcing the Flash player they would A) save money in development B) allow the porting of it to various platforms and C) Improve sales for their developm
  - Free programs only work with some govt PDFs (Score:2)
    
    by bigtrike ( 904535 ) writes:
    
    A number of government forms don't work with the free PDF readers.
    This is because Adobe broke its own published spec with its LiveCycle product, and by default it saves files that aren't compatible with anything else. It does a great job of forcing you to buy LiveCycle/Acrobat instead of using free tools. The Adobe people will tell you that it speeds up rendering of downloaded data, which I find hard to believe as the files are between 2x and 3x the size of a regular PDF.
    The current use of Adobe products
    - Re: (Score:3, Informative)
      
      by jeremyp ( 130771 ) writes:
      
      A PDF file produced by the LiveCycle suite is actually an XML document with a thin PDF wrapper. The XML conforms to the XFA standard which is owned by Adobe but is a published standard (http://partners.adobe.com/public/developer/en/xml/xfa_spec_2_4.pdf).
      - Re: (Score:2)
        
        by bigtrike ( 904535 ) writes:
        
        Acrobat Pro can't even edit XFA forms (beyond filling in values), why should 3rd party tools do so? I'm aware that you can save it as a hybrid "compatible" form, but it's not actually editable in Acrobat without stripping out the xfa data with a non-Adobe tool such as pdftk. The spec is subject to change at any time and has quite a few ambiguities, making it much more difficult to work with. How many more extra "open" specs and additions would we see if PDF was the official format of the government?
        XFA/
  - - Re: (Score:2)
      
      by Tynin ( 634655 ) writes:
      
      With the way CSS is developing, won't flash be redundant soon anyway? I certainly hope so!
      I haven't been paying attention to CSS and Flash development for a while, so please help fill me in. How do CSS and Flash relate? CSS is used for easy and consistent page formatting across a site, whereas Flash is used on a specific page to render something, be it an app/movie/effect. Please explain, for the uninformed among us, how Flash will be made redundant in favor of CSS. Thanks.
- Re: (Score:3, Interesting)
  
  by Bacon Bits ( 926911 ) writes:
  
  PDFs are only searchable if the document contains text. Half the time PDFs contain text-as-image, which is about as useful to a search engine as a captcha image. Google doesn't run OCR on PDFs, AFAIK. Although, come to think of it, that sounds like something they'd get sued by a random company for doing for "violating copyright proprietary information".
  - What do you want? (Score:2)
    
    by FranTaylor ( 164577 ) writes:
    
    Perhaps you know of a document format where the text in images IS searchable?
    - Re: (Score:3, Interesting)
      
      by Bacon Bits ( 926911 ) writes:
      
      A document format shouldn't store text as an image. That's why it's called text.
      - Re: (Score:3, Insightful)
        
        by petermgreen ( 876956 ) writes:
        
        That is not really a format issue though, in any format that supports images I can insert an image containing text.
      - Which idiot managed to do it? (Score:3, Informative)
        
        by Ilgaz ( 86384 ) writes:
        
        I work with PDFs a lot, especially on OS X. I am telling you from an OS on which you can have 60 KB 1080p screenshots in PDF in some circumstances: Whoever did that "text as image" trick, he is a complete moron.
        One of the reasons that PDF took off is exactly embedding fonts used in a document so it will appear as pixel perfect on client machines.
        As a last resort (and a good practice), you can embed unformatted pure text of the entire PDF in your PDF file. PDF, like Quicktime Mov is one of the formats where peopl
        
        Re: (Score:3, Informative)
        
        by 99BottlesOfBeerInMyF ( 813746 ) writes:
        
        Whoever did that "text as image" trick, he is a complete moron.
        Generally text as images in PDFs are the result of people who scan in paper documents but don't have access to or don't use OCR programs to convert the raw image coming in from the scanner into text.
      - Re: (Score:2)
        
        by 99BottlesOfBeerInMyF ( 813746 ) writes:
        
        A document format shouldn't store text as an image. That's why it's called text.
        A document shouldn't store text as images. A document format can be misused and should not be trying to interpret images and reject them if they contain text. Heck, I can misuse the standard text files by storing the text as ASCII images output by "banner", making them difficult to copy and paste and near impossible to search. That's not the fault of the format, but me for misusing it.
        It's more of a problem with PDF than in the example I give with .txt files because of how the formats are used. Docu
    - Re: (Score:3, Interesting)
      
      by TheRaven64 ( 641858 ) writes:
      
      You're missing the point. PDFs do not store text. Text is a stream of characters. PDFs store glyphs and their locations. It is more or less possible to convert glyphs into characters, although things like ligatures and the fact that spaces are not really represented make this difficult. In the metadata, some PDFs also store the text of the document, allowing it to be extracted. Given that the PDF is created automatically from the text in most cases, the text is more useful. You can create the PDF fro
      - Re: (Score:2)
        
        by TheRaven64 ( 641858 ) writes:
        
        It depends on how the PDF was created. If the PDF had the source text embedded in the metadata then it will work fine. Now try it with a PDF that's generated by printing to PostScript and then distilling to PDF (as a lot of PDFs are). It won't work.
        
        Re: (Score:2)
        
        by 99BottlesOfBeerInMyF ( 813746 ) writes:
        
        It depends on how the PDF was created. If the PDF had the source text embedded in the metadata then it will work fine. Now try it with a PDF that's generated by printing to PostScript and then distilling to PDF (as a lot of PDFs are). It won't work.
        I use a lot of PDFs from a lot of sources. I can copy and paste text from pretty much all of them with the rare exception of documents that are clearly scanned in versions of printed documents, complete with artifacts left over by the scanner. Now there are issues using multi-column PDFs in some readers that aren't smart enough to recognize the columns when the copy paste is performed, and different readers handle this with different amounts of ease. But that does not indicate you can't get text out of a P
- Re:don't hate PDF 'cause it's beautiful (Score:5, Insightful)
  
  by TheRaven64 ( 641858 ) writes: on Saturday October 31, 2009 @09:40AM (#29934043) Journal
  
  The summary does not do a good job of reflecting the original blog post's point. The point was that the government should make data available in a machine-parseable and generic format. PDF is a great format for storing typeset pages, but it is a terrible format for publishing data. It's easy to generate beautiful PDFs from well-structured data but it's much harder to go the other way. Would you rather have budget figures (for example) as a CSV file in a well-defined format or as a PDF of tables and graphs? If the data is available in the former format, it's easy for you or a third party to produce the latter format. If it's only available in the PDF form then it's much harder to create the CSV.
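A minimal sketch of the asymmetry described above, assuming made-up budget figures (the column names and numbers in `CSV_SOURCE` are invented for illustration, not taken from any government dataset). Going from CSV to a readable table, or straight to totals, takes a few lines of standard-library Python with no layout guessing:

```python
import csv
import io

# Hypothetical budget figures, invented purely for illustration.
CSV_SOURCE = """agency,year,budget_usd
Parks,2008,1200000
Parks,2009,1350000
Transit,2009,9800000
"""


def render_table(csv_text: str) -> str:
    """Render CSV text as a fixed-width table (the easy, 'pretty' direction)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    # Column width = longest cell in that column, header included.
    widths = [max(len(row[i]) for row in rows) for i in range(len(rows[0]))]
    lines = [
        "  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
        for row in rows
    ]
    return "\n".join(lines)


def total_by_agency(csv_text: str) -> dict:
    """Aggregate the same data directly from its named columns."""
    totals = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["agency"]] = totals.get(row["agency"], 0) + int(row["budget_usd"])
    return totals


if __name__ == "__main__":
    print(render_table(CSV_SOURCE))
    print(total_by_agency(CSV_SOURCE))
```

The reverse trip would have to start from positioned text on a page and guess where the columns and rows were, which is the harder direction the comment is talking about.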
  
  - data formats independent of campaign donors (Score:3, Informative)
    
    by SgtChaireBourne ( 457691 ) writes:
    
    The summary does not do a good job of reflecting the original blog post's point. The point was that the government should make data available in a machine-parseable and generic format. PDF is a great format for storing typeset pages, but it is a terrible format for publishing data. It's easy to generate beautiful PDFs from well-structured data but it's much harder to go the other way. Would you rather have budget figures (for example) as a CSV file in a well-defined format or as a PDF of tables and graphs? If the data is available in the former format, it's easy for you or a third party to produce the latter format. If it's only available in the PDF form then it's much harder to create the CSV.
    If the goal is to make the data available, then even CSV would be a better option than PDF. PDF, while pretty, is a terminal format and is the digital equivalent of a mayfly. It's paper that hasn't happened yet and when it does it will exist for a few short hours before finding its way to the circular file.
    Much of the government data consists of tables and tables of data. gzipped csv would be readable by anyone, so would ODF. Adobe appears to be looking for a handout at the expense of creating a usef
  - Re: (Score:2)
    
    by John Whitley ( 6067 ) writes:
    
    It's easy to generate beautiful PDFs from well-structured data but it's much harder to go the other way. Would you rather have budget figures (for example) as a CSV file in a well-defined format or as a PDF of tables and graphs?
    More importantly, it's then easy to import that data for visualization and analysis purposes. Data presented as a PDF file is effectively so inaccessible that it will rarely be extracted for further analysis, meaning that some gov't functionary becomes responsible for the presentation and analysis instead of members of the public. Then a panoply of tools become available for finding out things from that data that no one ever knew were there. Something like Tableau Desktop [tableausoftware.com] can slurp in CSV data (or data i
    - Re: (Score:2)
      
      by John Hasler ( 414242 ) writes:
      
      > More importantly, it's then easy to import that data for visualization and
      > analysis purposes. Data presented as a PDF file is effectively so
      > inaccessible that it will rarely be extracted for further analysis, meaning
      > that some gov't functionary becomes responsible for the presentation and
      > analysis instead of members of the public.
      Which is exactly why PDF is what you are going to get (or something even more inaccessible).
  - - Re: (Score:3, Interesting)
      
      by John Whitley ( 6067 ) writes:
      
      CSV is kinda evil (see my post above), but it's better for tabular data than JSON or XML. Again, a tabular serialization format such as Avro, Thrift, or Protocol Buffers might well be far better than CSV for tabular data. JSON has quite a bit of format bloat, and would need some standardized way to explain the data's schema for further analysis. XML is the king of format bloat, but at least has standard schema representations. XML is far better for semi-structured or unstructured data than tables.
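A rough way to look at the "format bloat" claim, as a sketch only: the rows below are invented, and the JSON and XML encodings are just one naive choice each (one object or element per row, field names repeated every time). Other encodings would give different numbers, and binary formats such as Avro or Protocol Buffers are not covered here.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical rows, invented for illustration only.
FIELDS = ["agency", "year", "budget_usd"]
ROWS = [["Parks", 2008, 1200000], ["Parks", 2009, 1350000], ["Transit", 2009, 9800000]]


def as_csv(fields, rows) -> str:
    """Header once, then one comma-separated line per row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(fields)
    writer.writerows(rows)
    return buf.getvalue()


def as_json(fields, rows) -> str:
    """One object per row, so every field name is repeated on every row."""
    return json.dumps([dict(zip(fields, row)) for row in rows])


def as_xml(fields, rows) -> str:
    """One element per cell, so every field name appears twice per row."""
    root = ET.Element("rows")
    for row in rows:
        item = ET.SubElement(root, "row")
        for name, value in zip(fields, row):
            ET.SubElement(item, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


def sizes(fields, rows) -> dict:
    """Character counts of the three serializations of the same table."""
    return {
        "csv": len(as_csv(fields, rows)),
        "json": len(as_json(fields, rows)),
        "xml": len(as_xml(fields, rows)),
    }


if __name__ == "__main__":
    print(sizes(FIELDS, ROWS))
```

For this small table the naive encodings come out in the order CSV, JSON, XML from smallest to largest, which matches the comment's ranking; compression would narrow the gap considerably.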
  - - Re: (Score:2)
      
      by TheRaven64 ( 641858 ) writes:
      
      That bug was also present in Flash 9 on Mac/PowerPC. I had a lot of Flash videos turn into slide shows (one frame every few seconds) with a 1.5GHz PowerPC G4. Restarting the browser fixed it, but for some videos it would reappear after about ten minutes of playback. Upgrading to Flash 10 fixed it for me, so I suspect that Adobe just chased it from one part of their code to another if it's still present for other people.
- Re: (Score:3, Informative)
  
  by Crudely_Indecent ( 739699 ) writes:
  
  Many implementations of PDF converters merely print a document to images and then embed the images into a PDF. Those are non-searchable and no text can be extracted with the existing tools. I once created a documentation website which relied on these embedded image types of PDF documents. I had to implement an OCR solution in order to extract the text to make my client's documentation searchable. It was ugly and a real pain in the ass.
  Certainly, PDF can be beautiful, but it is often not implemented that
  - Re: (Score:2)
    
    by Cochonou ( 576531 ) writes:
    
    Which PDF converters do that? Because they must be really crap.
    Most PDF converters I have used rely on Ghostscript in one way or another (after all, it's free!), and Ghostscript definitely doesn't do this.
    Most images-embedded-as-PDF files come from Xerox printers. Which, of course, have trouble knowing whatever was typed in the document in the first place.
- Re: (Score:2)
  
  by Joce640k ( 829181 ) writes:
  
  Can I hate all the multimedia/hyperlink/scripting/vulnerabilities they've added to PDF?
  I'll back this so long as it's PDF light - text and graphics only (OK, maybe I'll allow hyperlinks...).
  - Re: (Score:2)
    
    by TheRaven64 ( 641858 ) writes:
    
    PDF/A is the term you are looking for. It is the ISO-defined subset of PDF that prohibits encryption, JavaScript, sound and video.
    - Right you are, sir... (Score:2)
      
      by Joce640k ( 829181 ) writes:
      
      I'd vote for that as a standard.
Nobody likes flash (Score:5, Insightful)

by bcmm ( 768152 ) writes: on Saturday October 31, 2009 @09:31AM (#29933993)

Nobody likes Flash, and they probably shouldn't use it for anything. But there's not much wrong with PDF, if it's done right. When publishing something, one could offer "source" (some sane, machine-readable format) and PDF (autogenerated from the source, and prettified for easier reading).

PDF shouldn't be used as a way to encapsulate scanned JPEGs and pretend they're a real electronic document.

I would also note that many of the complaints about PDF as a format in TFA are really complaints about Adobe's abysmal PDF reading software. For example, the concern about the visually impaired: KDE's Okular does speech synthesis and has a high-contrast mode.

- Re: (Score:3, Insightful)
  
  by NotBorg ( 829820 ) writes:
  
  But there's not much wrong with PDF, if it's done right.
  
  I'm sure they won't fuck this up, after all it is the US government.
- Re: (Score:2)
  
  by robogun ( 466062 ) writes:
  
  Additional reasons are they're closed, and are malware vectors that need to be constantly updated. As if that wasn't enough reason not to use them, there are even phishing scams to update your flash or pdf installs... with the scammers' horrible malware.
  Unless you play Flash games or view Youtube all day there is no need to run Flash, all it does is deliver ugly ads or someone's horribly botched schoolboy attempt at an edgy webpage. Flash, far from enabling web usage is often used to RESTRICT usage, go to We
- Re: (Score:2)
  
  by mrmeval ( 662166 ) writes:
  
  Yes PDF is marvelous "No you can't print this document scumbag" "No you can't save a copy cretin" "No you can't extract the pretty pictures asshole". I've run into it and it's a show stopper.
  KDE is not windows and windows is here to stay for the foreseeable future. That is partly microsoft at work but it's also some serious usability and other problems with distributions using either a Linux or BSD kernel.
- Re: (Score:2)
  
  by bill_mcgonigle ( 4333 ) * writes:
  
  KDE's Okular does speech synthesis and has a high-contrast mode.
  yet can't output PS that my Brother BRScript can handle (Evince does OK).
  All of the open source PDF readers have come a long way in recent years - my only point is that PDF appears to be *hard* to implement. I don't know why somebody would need to, but my imagination is limited. Should a file format essential to government be such a hurdle to potential users?
- - Re: (Score:2)
    
    by lahvak ( 69490 ) writes:
    
    If you are going to suggest presenting word document, then complaining about pdf not being open standard is somewhat hypocritical. The problem actually has nothing to do with openness of the pdf format, but rather with the fact that Adobe Reader is closed. Anybody is free to implement a reader that will allow to save filled in forms, and will allow commenting of pdf file, there is nothing in the format that could prevent it.
    Also, creating pdf files from word documents is not the best way of doing it, in f
    - Re: (Score:2)
      
      by Blakey Rat ( 99501 ) writes:
      
      Anybody is free to implement a reader that will allow to save filled in forms
      Unless they're Microsoft. In that case, Adobe takes them to court and forces them to remove any PDF-related features. PDF is an "open format" my ass. Adobe talks the talk, but they sure don't walk the walk.
Tremor (Score:3, Funny)

by sleeponthemic ( 1253494 ) writes: on Saturday October 31, 2009 @09:48AM (#29934087) Homepage

They also say government's priority should be to publish datasets and the APIs to interact with them, rather than choosing how they're displayed in fancy graphs and charts.
I felt a great disturbance in the Force, as if millions of IT workers suddenly cried out in terror, and were suddenly silenced.

PDF bad. Work on microformats please. (Score:4, Interesting)

by mattr ( 78516 ) writes: on Saturday October 31, 2009 @09:51AM (#29934101) Homepage Journal

GP is right. Government should focus on doing what government is actually needed for, such as determining standards for formats that everyone can use, with input from academia and industry. For example, a human-readable, parsable format that one could embed in a web page for semantic metadata. Or funding open source software to make it easy (cross platform) to input such data (I am thinking of information about cited papers or books). Typeset information is nice but we already are drowning in information - how many pages of Google results do you usually look at? And we need help before generating 10 times as much.
Why PDF is bad:
- It is a potable typeset document package. Not a data sharing package that could be pulled apart easily with tools automatically.
- PDF is extremely hard to parse, and using current free software does not always give good results.
- You destroy useful document structure, or in the case of ASCII text parsability and small size, when you convert to PDF. You can't just convert back to the original.
- It takes significant processing power and commercial software to display well and reliably, as far as I can see. Having just gotten the latest Mac I feel like I'm in a dauntless battleship, but I have had much trouble with different Unix tools in the past.
- Scientists publish PDF too but then also use other formats for data. For example on arxiv, one scientist recently published animations inside a zip but it was hard to find the link.
- It is difficult to manage bibliographic information automatically.
- It is proprietary
- It requires a huge amount of data, and arcane knowledge, just to build a parser that works most of the time (such as for Asian languages especially).
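One way to picture why text extraction from a typeset page can be unreliable, as a toy sketch only: the model below is not the PDF format and not any real library's API. It simply assumes a page that records where each visible character is drawn and nothing about word boundaries; `layout`, `extract`, and all the widths are hypothetical names and numbers chosen for illustration.

```python
from typing import List, Tuple

Glyph = Tuple[float, str]  # (x position on the line, character drawn there)


def layout(text: str, char_width: float = 6.0, space_width: float = 3.0) -> List[Glyph]:
    """Place each visible character at an x position.

    Spaces only move the cursor; no glyph is recorded for them.
    """
    glyphs = []
    x = 0.0
    for ch in text:
        if ch == " ":
            x += space_width
        else:
            glyphs.append((x, ch))
            x += char_width
    return glyphs


def extract(glyphs: List[Glyph], char_width: float = 6.0, gap_threshold: float = 2.0) -> str:
    """Guess the text back from positions alone.

    A space is inserted wherever the gap after a glyph exceeds the threshold.
    """
    out = []
    prev_end = None
    for x, ch in sorted(glyphs):
        if prev_end is not None and x - prev_end > gap_threshold:
            out.append(" ")
        out.append(ch)
        prev_end = x + char_width
    return "".join(out)


if __name__ == "__main__":
    print(extract(layout("open data")))                   # gap is wide enough
    print(extract(layout("open data", space_width=1.5)))  # tight spacing loses the space
```

With the default spacing the round trip recovers the words, but with tighter spacing the same heuristic runs them together, so the result depends on how the page was typeset rather than on what was written.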

- Re:PDF bad. Work on microformats please. (Score:5, Informative)
  
  by Anonymous Coward writes: on Saturday October 31, 2009 @10:07AM (#29934227)
  
  - It is proprietary
  FAIL.
  PDF is an ISO standard. See: ISO 32000-1, Document management – Portable document format – Part 1: PDF 1.7
  This doesn't change the fact that it is a portable typesetting document format though. It's good for read only documents from your word processor but it shouldn't be (ab)used to store tables or graphs or whatever other crap people use it for.
  ---
  As for Flash, let's not even go there. Flash is passable as a streaming video container, for animated cartoons like Homestar Runner, or as a platform for small web games, but outside those use cases you're using it wrong.
  
  - Re: (Score:2)
    
    by beelsebob ( 529313 ) writes:
    
    Not only is it a standard, it's also *really* easy to parse. It's specifically structured so that any printer manufacturer can parse it and end up with *exactly* the same document as software displays.
    It still doesn't change the fact that it's not for data transfer, but for pristine document layout though.
- Re: (Score:2)
  
  by bcmm ( 768152 ) writes:
  
  The majority of those issues would be fixed by publishing LaTeX sources next to the PDFs generated from them.
- Re: (Score:2)
  
  by iris-n ( 1276146 ) writes:
  
  - Scientists publish PDF too but then also use other formats for data. For example on arxiv, one scientist recently published animations inside a zip but it was hard to find the link
  Err... also? I've never seen a scientist using pdf to publish data. We use pdf (and ps and dvi) to publish typeset papers. The actual data is in a lot of formats, dependent on the field and application. I've seen csv, matlab's .mat, xml, jpeg, tiff, proprietary crap, etc.
- Re:PDF bad. Work on microformats please. (Score:5, Funny)
  
  by Stormwatch ( 703920 ) writes: <rodrigogirao@nosPam.hotmail.com> on Saturday October 31, 2009 @10:47AM (#29934553) Homepage
  
  It is a potable typeset document package.
  So you can drink a PDF?!
  
  - Re: (Score:2)
    
    by NotBorg ( 829820 ) writes:
    
    Yeh see matey, if yeh leave out the R no one be respecting.
    Yarrrrrrr!
  - Re: (Score:2)
    
    by rdnetto ( 955205 ) writes:
    
    Sure you can. Why wouldn't you be able to drink a Properly Distilled Fluid?
WTF? (Score:2)

by dnaumov ( 453672 ) writes:

PDF is often "non-parsable by software, unfindable by search engines, and unreliable if text is extracted."
Have these people not heard of Google? Just because YOU can't write software to parse PDF files doesn't mean that nobody else can and that it doesn't already exist.
- Forget Google, every single Apple device does it (Score:2)
  
  by Ilgaz ( 86384 ) writes:
  
  If you look around, every single Apple computer and device (iPod/iPhone) is actively indexing every single PDF thrown at it, instantly, and keeping a database of it.
  It is the famous "Spotlight" technology. They don't even need to look at Google; some of them have the same kind of indexing technology (minus relation) running on their laptops.
  One should check TFA's relations with MS. I am sure something will come up.
  - Re:Forget Google, every single Apple device does i (Score:2)
    
    by TheRaven64 ( 641858 ) writes:
    
    If you look around, every single Apple computer and device (iPod/iPhone) is actively indexing every single PDF thrown at it, instantly, and keeping a database of it.
    No it is not. It is indexing every PDF that has text in the metadata. Create a PDF by printing to PostScript and then converting to PDF (the easiest way of creating PDF on Windows or Linux machines) and watch Spotlight completely fail to index it. Spotlight does not index the text that you see when you browse the PDF, because that text is stored as a set of glyph indexes, not as streams of characters.
    And if you want some real fun, open up a PDF containing table in Preview and try to persuade it to cop
Depends on the purpose (Score:2)

by PineHall ( 206441 ) writes:

If you are publishing a document that can be printed then PDF is a good format. If you expect people to extract data from the document then you should look for a different format. It depends on the purpose of posting the document on the web.
- Re: (Score:2)
  
  by lahvak ( 69490 ) writes:
  
  Either you provide data, or you provide a document. Extracting textual data from pdf is not any harder than extracting them from a word file or an odf document.
  If you want to provide data, provide data, in CSV format or something simple like that.
  In fact, with pdf, you can do both, since you can attach the CSV (or whatever format) data to the document.
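As a rough sketch of the "provide data as data" point above, the snippet below writes a small table to CSV with Python's standard `csv` module and reads it back. The column names and figures are made up for illustration and do not come from any real dataset.

```python
import csv
import io

# Hypothetical records; the field names and values are illustrative only.
ROWS = [
    {"agency": "Example Agency A", "year": 2009, "spending_usd": 1200},
    {"agency": "Example Agency B", "year": 2009, "spending_usd": 3400},
]


def to_csv(rows):
    """Serialize a list of dicts to CSV text, header row first."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def from_csv(text):
    """Parse CSV text back into a list of dicts (all values return as strings)."""
    return list(csv.DictReader(io.StringIO(text)))


if __name__ == "__main__":
    print(to_csv(ROWS), end="")
```

Anyone can load a file like this into a spreadsheet or a script without guessing at page layout, which is the whole argument for publishing it next to (or attached to) the PDF.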
PDF Yes, Flash No (Score:5, Insightful)

by markdavis ( 642305 ) writes: on Saturday October 31, 2009 @10:11AM (#29934261)

I am OK with PDF. I would RATHER see documents in plain HTML, but there are times when formatting is important. In those cases, if it is to be read/print-only, PDF is the way to go. Otherwise, the gov should use ODF.
But Flash? Are you kidding? The last thing on earth we need is more Flash.
* Does not work on all devices
* Slow and/or consumes tons of CPU
* Consumes tons of RAM
* Consumes more bandwidth
* Makes it difficult or impossible to cut and paste
* Impossible to "search/find"
* Violates the native UI look and feel
* Fonts and font sizes are uncontrollable by the end user
* Can't scroll correctly much of the time
* Almost completely proprietary
* Rarely adjusts to screen size
* Often introduces extremely irritating animation.
* Doesn't allow text to be "seen" by the browser (or OS), making other plugins (like a screen reader) 100% useless
At least that SilverDark stuff isn't even on the radar- thank God for little favors.

- Re: (Score:3, Insightful)
  
  by gaspyy ( 514539 ) writes:
  
  Most of what you say is implementation-related rather than format-related. It's like saying that C sucks because there are so many crappy programs. I know about feeding the trolls, but for all those who don't know better, here we go:
  Nothing "just works" on all devices, and in this area Flash fares better than most other technologies; I agree it is slow, but I don't really agree on RAM usage.
  Flash uses less bandwidth than the alternatives; it's quite well optimized. Sure, someone can stuff some 10 min. mp3s encoded at 25
  - Re: (Score:3, Insightful)
    
    by Vexorian ( 959249 ) writes:
    
    I know Slashdot crowd loves to hate flash, but at least hate it for the right reasons: its lack of speed and real 3d hardware acceleration.
    Those are very lame reasons. We are talking about open government initiative here, not about "standard for web games" initiative. Flash is:
    Not portable: Many platforms lack proper support. Flash can't be legally redistributed, alternatives are poor. It is not an open format in any way.
    Bad for accessibility.
    Not a web standard or anything close to it.
    Nothing "just works" on a
    - Re: (Score:2)
      
      by Vexorian ( 959249 ) writes:
      
      Oh, and no half-assed "openish" attempts a la MS. The whole entirety of it would have to be open, including the codecs and the tools to generate them. Nothing about proprietary extensions making the standard optionally-open. Also, as a standard for open government initiative, gibberish like DRM must be completely out of the question.
    - Re: (Score:2)
      
      by agnosticnixie ( 1481609 ) writes:
      
      I have no mod points, but, seriously, that - mod up people.
    - Re: (Score:2)
      
      by nahdude812 ( 88157 ) * writes:
      
      Maybe you haven't seen this: http://www.adobe.com/devnet/swf/ [adobe.com]
      This is 278 pages of very straightforward and in-depth documentation on the SWF file format.
  - Re: (Score:2)
    
    by markdavis ( 642305 ) writes:
    
    >Most of what you say is implementation-related rather than format-related. It's like saying that C sucks because there are so many crappy programs.
    I will agree that there are better and worse ways to IMPLEMENT Flash, but even properly implemented, it doesn't address all (or most) of my issues.
    >Nothing "just works" on all devices and in this area flash fares better than most other technologies; agree is slow; not really agree on RAM usage.
    HTML works fine on all devices. 95% of the time I see Flash us
- Re: (Score:2)
  
  by Lennie ( 16154 ) writes:
  
  Not only that, but so much more is possible these days with a browser that supports proper standards.
  
  Flash became popular with the web-development community when you had to do a lot of web-programming to get things done and the performance wasn't optimised for those kinds of things.
  
  But that is ages (in internet time) ago.
- - Why blame Adobe when there are no alternatives (Score:2)
    
    by Ilgaz ( 86384 ) writes:
    
    I will ask one thing as you seem to miss why HTML is not considered a print/distro format: "When did we have an embeddable font standard for HTML webpages?" as with Flash: "Is there a way to have a single file and infrastructure to show embedded videos in HTML5 form?"
    They actually suggested that people use the abandoned VP3 format, for God's sake, and the very same people have chosen TrueType (check why freetype exists) as the font embedding format.
- - Re: (Score:3, Interesting)
    
    by markdavis ( 642305 ) writes:
    
    So there is a partial option for MS-Windows only. Great. Not exactly platform agnostic and open. I suppose it is better than nothing, though.
- - Re: (Score:2)
    
    by mini me ( 132455 ) writes:
    
    Microformats allow HTML to describe the data. But you are right, it is not the right tool for storing data.
Hate to Nitpick (Score:2)

by sehryan ( 412731 ) writes:

"...unfindable by search engines..."
That is absolutely not true. Anyone who uses Google knows that the search engine can read PDFs, identify if any of the keywords are located within, and then provide a link both directly to the PDF as well as to an HTML version.
If They Open the Formats (Score:2)

by RAMMS+EIN ( 578166 ) writes:

This could be a Good Thing, if it means that the formats will be made and remain open. IIRC, PDF is already an open standard, and supported by various programs from multiple sources. I would applaud it if the same were to happen to Flash. And if both formats are open and widely supported, the government could do a lot worse than using them.
Model View Controller (Score:2)

by foniksonik ( 573572 ) writes:

Just my 2 cents in regards to public records and data.
I'd like to say that the groups making decisions in this area really should consider a MVC architecture which will avoid the concerns iterated here on /. and by pundits for open data standards everywhere in regards to display aka View technologies.
With a Model View Controller methodology and pattern in place it really is not a concern what technology is being used to display data at any given time. If public data is *stored* (Model) and *accessed* (Contr
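A minimal sketch of that separation, assuming a tiny in-memory dataset: the model holds the records, the controller selects them, and each view is just an interchangeable renderer. All names and figures here are hypothetical.

```python
import json

# Model: the stored public data (hypothetical records for illustration).
MODEL = [
    {"state": "OH", "year": 2008, "grants": 12},
    {"state": "OH", "year": 2009, "grants": 15},
    {"state": "TX", "year": 2009, "grants": 30},
]


def controller(model, **filters):
    """Controller: return the records matching every key=value filter."""
    return [r for r in model if all(r.get(k) == v for k, v in filters.items())]


def json_view(records):
    """View 1: machine-readable JSON."""
    return json.dumps(records, sort_keys=True)


def text_view(records):
    """View 2: a plain-text table for people; same data, different display."""
    lines = ["state year grants"]
    lines += [f"{r['state']} {r['year']} {r['grants']}" for r in records]
    return "\n".join(lines)


if __name__ == "__main__":
    selected = controller(MODEL, year=2009)
    print(json_view(selected))
    print(text_view(selected))
```

Swapping in a PDF or Flash view would then be a display decision only; the stored data and the way it is queried stay the same.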
- Re:The future is ODF and html5 (Score:4, Insightful)
  
  by tepples ( 727027 ) writes: <tepples@[ ]il.com ['gma' in gap]> on Saturday October 31, 2009 @09:38AM (#29934023) Homepage Journal
  
  but specially html5+js+canvas+svg+ogg vorbis/theora for rich web content.
  Who has announced authoring tools for this stack that are anywhere near as capable as even Flash 3, let alone Flash CS4? Say I want to make an animated SVG like the Flash animations I see on Newgrounds. What package should I start with?
  
  - Re:The future is ODF and html5 (Score:4, Funny)
    
    by oldspewey ( 1303305 ) writes: on Saturday October 31, 2009 @10:13AM (#29934289)
    
    This sort of authoring is easily handled in vi - or emacs - your choice.
    
    - Re: (Score:3, Interesting)
      
      by tepples ( 727027 ) writes:
      
      Yeah, and you can hex edit an SWF file too. But change a letter, refresh, change a letter, refresh, is not the kind of editing that graphic designers prefer to do. If that's what SVG has to offer, the market will choose SWF. I can only hope your comment was sarcasm.
  - - Re: (Score:2)
      
      by agnosticnixie ( 1481609 ) writes:
      
      I don't know where and how Adobe Flash or the other cancer, ms-novell-silverlight-moonlight, could do that.
      It can't. Adobe's accessibility recommendation is to keep a separate non-Flash version of the site, and that's it. It doesn't degrade well (so financial access is problematic), it doesn't work in screen readers (and I've even seen flash shit coders who thought a site with audio without text for its menus, assuming the person is of a particular ethnic group and hearing, was a smart idea), and it's barely searchable.
      - Re: (Score:2)
        
        by agnosticnixie ( 1481609 ) writes:
        
        wow, they revised it recently. Let me review and grade if they did their homework, shall we?
        Fail, barely. Fail entirely if you add the fact that the only 100% Flash-compatible screen reading tech is on Windows.
        Of 5 categories supported, they have 4 with exceptions, 3 with killing exceptions. And only a handful of users who are smart enough to implement the solutions.
- Re: (Score:3, Insightful)
  
  by Cochonou ( 576531 ) writes:
  
  Right...
  
  In order to read a document, what I really need to replace the heavyweight Adobe Reader, is a bloated modern browser ! :D
- - Re: (Score:2)
    
    by tepples ( 727027 ) writes:
    
    On top of that, [HTML 5 video] requires the browser to implement basic navigation controls; producers are going to want to keep their own in-house player controls.
    That's still doable. JavaScript running in an HTML 5 page can disable the browser's built-in controls in a <video> element and control the video itself.
- Re:Tell Adobe to open-license PDF (Score:5, Informative)
  
  by TheRaven64 ( 641858 ) writes: on Saturday October 31, 2009 @10:13AM (#29934283) Journal
  
  What are you talking about? The PDF specification has been available as a free download from Adobe with no royalties payable by implementors since PDF was first created. More recently, the PDF/X family of specifications was approved by ISO. These define subsets of the PDF 1.4 specification for different uses (see ISO 15930). There are at least three open source PDF readers that I know of as well as several commercial viewers (Adobe Reader, FoxIt, Apple's Preview, and so on) and numerous tools can generate PDFs.
  
  - Re: (Score:3, Insightful)
    
    by Blakey Rat ( 99501 ) writes:
    
    Yes, and then they SUED Microsoft for putting PDF support in Office. It's only "open" as long as you're not big enough to compete with Acrobat. If you even get within a mile of stepping on Adobe's business, you're sued up the wazzoo.
    "Free and open" my ass.
    - - Re: (Score:3, Insightful)
        
        by Blakey Rat ( 99501 ) writes:
        
        Bullshit.
        It's either an open standard, meaning anybody can use it-- ANY BODY-- or it's not. There's no such classification as "it's an open standard, except we don't let companies we don't like use it because they have a big marketshare, but other than that it's an open standard believe me!"
        By your argument, Microsoft should also be prevented from parsing HTML files in IE because they're a monopoly. Does that make sense? No. Does your argument make sense? No.
  - - Re: (Score:2)
      
      by TheRaven64 ( 641858 ) writes:
      
      There are a huge number of free programs that can create PDFs. Anything that uses Cairo for rendering can generate PDFs natively, although without some of the nice metadata. If you're using almost any modern operating system (Windows or anything that uses CUPS for printing, including Linux and OS X) then any application that can print can also generate PDFs. I use pdflatex very often and it produces beautiful PDFs with working hyperlinks and the table of contents in the bookmarks section, and it will hap
    - Re: (Score:2)
      
      by gyrogeerloose ( 849181 ) writes:
      
      Apple uses PDF as the basis of the OS X display engine. When they adopted the NeXT OS as their next-generation to replace the "Classic" Mac OS, they switched from NeXT's Display PostScript precisely because PDF was a free and open-source specification. An OS X user can create a PDF file from pretty much any document simply by beginning a print operation then selecting "save as PDF" from the print dialog box.
      - Re: (Score:3, Interesting)
        
        by TheRaven64 ( 641858 ) writes:
        
        PostScript is also a free specification, but NeXT was using the Display PostScript implementation licensed from Adobe. They switched to something closer to PDF because, it turned out, no one actually cared about the nicer features in PS. With DPS, you could write view objects entirely in PostScript and have them run on the display server. This was quite slow and had all sorts of problems in that the PS programs could (potentially) run forever. Most people just used the drawing subset of PS, which is als
        
        Re: (Score:2)
        
        by commodore64_love ( 1445365 ) writes:
        
        "Everybody uses it" is not the same as open. PDF is like VHS or CD. All are closed standards, requiring a license from their respective owners.
        BTW why was I modded "flamebait" for expressing an opinion? Silly, silly, silly.
        
        Re: (Score:2)
        
        by TheRaven64 ( 641858 ) writes:
        
        "Everybody uses it" is not the same as open. PDF is like VHS or CD. All are closed standards, requiring a license from their respective owners.
        You're moderated flamebait for being wrong and, as you usually do, aggressively defending your incorrect position when ten seconds of fact checking would indicate that you are wrong.
        You can, as I said in the original post, download the PDF specification and implement it without paying a royalty [adobe.com] and you've been able to do this for every version of the PDF specification since version 1.0. That page is linked to from the top link that you get if you Google for 'PDF specification' and it has been for some y
        
        Re: (Score:2)
        
        by commodore64_love ( 1445365 ) writes:
        
        >>>you are wrong. You can, as I said in the original post, download the PDF specification and implement it without paying a royalty and you've been able to do this for every version of the PDF specification since version 1.0 [1993]
        >>>
        Guess what? You are wrong too. (Surprised? You shouldn't be; nobody's perfect; not me nor you.) PDF did not become an open standard until version 1.7 [2008] according to Wikipedia. That was only a year ago.
        Which is why, as others pointed out, various com
        
        Re: (Score:2)
        
        by commodore64_love ( 1445365 ) writes:
        
        the fact that you still haven't learned how to use quote tags
        You mean like that? I know how to use them just fine, but I've always preferred the old Usenet methodology. Typing >>> is a heck of a lot faster than typing 14-letter tags.
        .
        >>>your ill-informed ramblings.
        That's nice. You were still wrong when you said, "You've been able to do this for every version of the PDF specification since version 1.0." Adobe had the patents until 2008. That means it was closed. No one could legally publish a PDF Creator program prior to that year, as Micro
    - Re: (Score:2)
      
      by lahvak ( 69490 ) writes:
      
      How many free programs do you know of that create .pdf's?
      Too lazy to count right now, but just what I use on a more or less daily basis, about 20. Plus hundreds of others that I don't use.
- Digital Stewardship : PDF vs PDF/A (Score:3, Insightful)
  
  by SgtChaireBourne ( 457691 ) writes:
  
  PDF/A is already open. However, that doesn't mean that anyone knows how to produce it, especially some R.O.A.D. staffer or random hourly GS1.
  Open or not, PDF/A is a display format and, in most cases, useless for information retrieval or automated data processing. PDF/A is a useful alternative to paper [digitalpreservation.gov]. However, the open government initiative is not talking about paper. It's about 'born digital [wired.com]', machine readable data.
  - - screen-scraping a PDF/A wrapper (Score:2)
      
      by SgtChaireBourne ( 457691 ) writes:
      
      Useless is the wrong word. It took 15 lines of python wrapping xpdf for me to get a working system for dumping the transactions out of the last 6 years of my credit card statements.
      It's ugly, but it works just fin
      That would be because that particular PDF happened to accidentally be wrapping ASCII or ISO-8859 or UTF-8 or UTF-16 instead of some image format. Even then, that was just screen-scraping [xml.com], as can be done with old terminal sessions. It can be done, sometimes.
      Keep the data in machine readable formats, not a terminal format like PDF or paper.
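For what it's worth, a wrapper of the kind quoted above might look roughly like this. It is a sketch, not the quoted poster's actual script: it assumes the `pdftotext` command-line tool (shipped with xpdf and Poppler) is on the PATH, and the statement layout matched by the regular expression (date, description, amount on one line) is hypothetical. It only works when the PDF actually contains extractable text rather than page images.

```python
import re
import subprocess

# Hypothetical statement line: "10/31/2009  SOME MERCHANT   1,234.56"
LINE_RE = re.compile(r"^\s*(\d{2}/\d{2}/\d{4})\s+(.+?)\s+(-?[\d,]+\.\d{2})\s*$")


def pdf_to_text(path):
    """Run pdftotext, keeping the page layout, and return its stdout as text."""
    result = subprocess.run(
        ["pdftotext", "-layout", path, "-"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def parse_transactions(text):
    """Yield (date, description, amount) for each line that looks like a transaction."""
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            date, description, amount = match.groups()
            yield date, description, float(amount.replace(",", ""))
```

Calling `parse_transactions(pdf_to_text("statement.pdf"))` (the file name is a placeholder) would yield one tuple per matching line. The regular expression is exactly the fragile part: any change in the statement's layout breaks it, which is the screen-scraping problem described above.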
- Re: (Score:2)
  
  by owlstead ( 636356 ) writes:
  
  "Further, the recent PDF specifications add DRM which shouldn't be allowed in government publications. If the govt agrees to use a PDF version that open source software can completely read, parse, and convert, then it is fine PROVIDED the raw data is available in open formats too."
  No, it's not fine because, as others have pointed out, PDF is mainly used for formatting documents. It's doing a pretty adequate job on that as well, and you can use third party software that can actually display it without the dra
  - Re: (Score:2)
    
    by lahvak ( 69490 ) writes:
    
    Hell, you can't even /select/ text normally using most PDF readers.
    People keep saying that. I never had a problem with this. I use 3 or 4 different pdf readers, including the one from Adobe, and I never had problems with selecting and cutting text from a pdf document.
    - Re: (Score:2)
      
      by owlstead ( 636356 ) writes:
      
      That's weird, because any time I cut anything around a page border, a table or more or less any other break in the page, everything gets screwed up. I won't even go into what happens when there is a watermark on the page. And with screwed up, I mean screwed up. Missing parts of text, text in wrong order, you name it. That and it crashes every so often, it doesn't live through power saving state on my computer, to name something. I won't go into the way it handles tabs, or form input or search or pop ups bec
- Re: (Score:2)
  
  by agnosticnixie ( 1481609 ) writes:
  
  Even Adobe knows it, in their "let's pay lip service to the ADA and pretend we and our users can code our way out of a paper bag" page, their sole recommendation is to keep a non-flash version linked to...
- - Re: (Score:2)
    
    by agnosticnixie ( 1481609 ) writes:
    
    The Internet doesn't belong to Adobe, moron.
