Aussie Government Gives PDF the Thumbs Down
littlekorea writes "The central IT office of the Australian Government has advised its agencies to offer alternatives to Adobe's Portable Document Format to ensure folks with impaired vision are able to consume information on the Web. A Government-funded study found that PDFs can present themselves as image-only files to screen readers, rendering the information contained within them unreadable for the vision impaired."
Re:A subset of PDF files? (Score:5, Informative)
ISO already has created the standardized PDF/X subsets [wikipedia.org] used widely in the publishing industry. They lack support for extra features like scripting and other extensions.
The main problem with PDF for document archives is that it is a presentation format and doesn't adequately preserve text structure since everything is broken down into lines of text or individually placed glyphs. Analysis of a page layout can only bring back so much. There are better ways to store data that offer more versatility.
Re:A subset of PDF files? (Score:1, Informative)
Adobe does not use the operating system's functions to render the text; I guess that's the root of the problem.
Re:So can any format (Score:4, Informative)
No, it doesn't sound like a bozo official, since that style of PDF was specifically excluded from the user study they ran.
You could of course skim the report and know that, but I guess that would mean you couldn't launch into meaningless rants.
Of course, if you did that you'd know the report is available in PDF format, which I guess would just launch you on a different meaningless rant.
Re:Throwing out the baby with the bath water (Score:5, Informative)
Not necessarily. PDF does not preserve text flow. It breaks up paragraphs into lines (or less if kerning has been altered), and places them accurately on the page. If you have a multi-column layout, then a pdf-to-text algorithm (first step in screen reading) is likely to put column-2-line-1 between column-1-lines-{1 and 2}. Best of luck sorting that out.
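To make the column problem concrete, here is a minimal sketch. The fragment coordinates and labels are made up for illustration, and y is assumed to be measured from the top of the page; a real PDF stores positions differently. It only shows what happens when an extractor sorts fragments top to bottom and then left to right without knowing about columns.

```python
# Hypothetical text fragments, roughly as a PDF might place them: (x, y, text).
# Assumption: y grows downward from the top of the page.
# Two columns with two lines each.
fragments = [
    (50, 100, "col1-line1"),
    (50, 115, "col1-line2"),
    (300, 100, "col2-line1"),
    (300, 115, "col2-line2"),
]


def naive_reading_order(frags):
    """Sort top to bottom, then left to right, ignoring columns entirely."""
    return [text for _, _, text in sorted(frags, key=lambda f: (f[1], f[0]))]


order = naive_reading_order(fragments)
print(order)
# ['col1-line1', 'col2-line1', 'col1-line2', 'col2-line2']
```

The second column's first line ends up between the first column's two lines, which is exactly the interleaving described above.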
Re:So can any format (Score:4, Informative)
You do know that in Australia the law requires a company to make its website accessible to the vision impaired if at all possible?
Re:Throwing out the baby with the bath water (Score:4, Informative)
"Not necessarily. PDF does not preserve text flow. It breaks up paragraphs into lines (or less if kerning has been altered), and places them accurately on the page."
This is not true. PDF is capable of preserving text flow if the document contains such information. See this as an example [hoboes.com]: if you open it in Acrobat Reader and move the text cursor using the down arrow, you'll see it travel correctly among columns and paragraphs.
No page description format will help if the page has been generated in a broken way: for instance, try extracting text from the tables of an HTML page generated by JavaScript.
"If you have a multi-column layout, then a pdf-to-text algorithm (first step in screen reading) is likely to put column-2-line-1 between column-1-lines-{1 and 2}. Best of luck sorting that out."
In this case it is the pdf-to-text algorithm that is broken, and it should be fixed.
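As a rough idea of what a column-aware extractor could do, here is a minimal sketch. It is not how any particular tool works: the fragment data, the function name and the `column_gap` threshold are all assumptions for illustration, and y is assumed to be measured from the top of the page. It groups fragments into columns by their left edge, then reads each column top to bottom.

```python
# Hypothetical fragments: (x, y, text), with y growing downward from the page top.
fragments = [
    (50, 100, "col1-line1"),
    (300, 100, "col2-line1"),
    (50, 115, "col1-line2"),
    (300, 115, "col2-line2"),
]


def column_aware_order(frags, column_gap=100):
    """Group fragments into columns by left edge, then read each column downward.

    column_gap is an assumed threshold: a jump in x of at least this much
    between neighbouring fragments starts a new column.
    """
    columns = []
    for frag in sorted(frags, key=lambda f: f[0]):
        # Compare against the x of the last fragment placed in the current column.
        if columns and frag[0] - columns[-1][-1][0] < column_gap:
            columns[-1].append(frag)
        else:
            columns.append([frag])

    ordered = []
    for column in columns:
        ordered.extend(text for _, _, text in sorted(column, key=lambda f: f[1]))
    return ordered


ordered = column_aware_order(fragments)
print(ordered)
# ['col1-line1', 'col1-line2', 'col2-line1', 'col2-line2']
```

A fixed gap like this breaks down on full-width headings, indented lines and tables, so a real extractor would need something considerably smarter.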
Re:A subset of PDF files? (Score:2, Informative)
Last time I checked, even the free Adobe Reader had built-in OCR and text-to-speech. The IT director was just plain lazy, or there's some lobbying.
Re:A subset of PDF files? (Score:4, Informative)
To make the documents accessible, they will need to create them in such a way that the screen reader can read the text for the blind person. Believe it or not, extracting the text contents from a PDF file is actually a far from trivial problem. Mostly the problems are caused by PDF authoring tools that render each glyph separately. The text extractor then has no idea which characters belong to each line and has to guess based on the baseline of the character. Another problem is non-ASCII characters and how the authoring tool decides to render them. The venerable free software tool pdflatex uses composite characters (basically it renders multiple glyphs on top of each other), which makes it impossible to accurately extract the text.
So no, it is not about stupidity or bad Microsoft software. PDF is just unsuitable for accessible documents.
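The baseline guessing described above can be sketched roughly like this. The glyph tuples, the function name and the `tolerance` value are assumptions for illustration, not taken from any real extractor: glyphs whose baselines are within the tolerance of each other are treated as one line, then sorted left to right.

```python
# Hypothetical individually placed glyphs: (x, baseline_y, char).
# Assumption: baseline_y grows downward from the top of the page.
glyphs = [
    (22, 112.1, "k"),
    (10, 100.0, "H"),
    (10, 112.0, "o"),
    (18, 100.5, "i"),
]


def group_glyphs_into_lines(glyphs, tolerance=2.0):
    """Guess text lines by clustering glyphs with nearly equal baselines."""
    lines = []  # each entry is [baseline_of_first_glyph, [(x, char), ...]]
    for x, y, ch in sorted(glyphs, key=lambda g: g[1]):
        if lines and abs(y - lines[-1][0]) <= tolerance:
            lines[-1][1].append((x, ch))
        else:
            lines.append([y, [(x, ch)]])
    # Within each guessed line, order the glyphs left to right.
    return ["".join(ch for _, ch in sorted(chars)) for _, chars in lines]


text_lines = group_glyphs_into_lines(glyphs)
print(text_lines)
# ['Hi', 'ok']
```

Note that this does not recover word spacing, superscripts or overlaid composite glyphs, which is part of why the guess so often goes wrong.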