Patents

W3C Seeks Feedback on VoiceXML

Janet Daly of W3C sent along a note about the VoiceXML 2.0 draft specification. As you may recall, VoiceXML is useful to make your web server speak. Daly points out that as things stand, many members have declared that they have patents related to the standard and would require royalty payments. Like the other W3C/patent issues we've posted about recently, they're seeking public feedback.

jdaly writes: "Today, W3C announced that VoiceXML 2.0 has been issued as a first public Working Draft. Press materials went across various wire services. Rather than send simply a press release here, W3C would like to give more specific information of interest to Slashdot readers. Of note is a section from the "Status of the document" section of VoiceXML 2.0 draft:

"This document seeks Member and public comment on both the technical design and the patent licensing issues arising out of the disclosure and licensing statements that have been made. Our decision to publish this first public working draft has been made to secure early comments from the community, but does not imply that all questions of patent licensing have been resolved or clarified. They must be resolved or work on this document in W3C will stop.

As things stand at the time of publication of this specification, implementations conforming to this specification may require royalty bearing licenses for essential IPR. Further information can be found in the patent disclosures page. The patent policy for W3C as a whole is under wide discussion. A set of commitments by all participants in the Voice Browser Activity to royalty free is a possibility for the future but has NOT been made at time of publication."

As IPR issues are important to Slashdot readers, we are striving to make this information available to them as soon as possible. W3C strongly encourages those with an interest in this specification to consider using the comment list, www-voice@w3.org, which is archived. There is no deadline for comments on a first public Working Draft.

Regards, Janet Daly, W3C"

  • by Bruce Perens ( 3872 ) <bruce@perens.com> on Tuesday October 23, 2001 @03:49PM (#2468538) Homepage Journal
    HP will change its declaration in this matter to royalty-free in accordance with the royalty-free-only patent policy that HP urged on W3C.

    Thanks

    Bruce

  • ...that when we are browsing /. we will actually hear "first post"?

  • I thought it was a given that to get the patent issues resolved things will have to be changed. Why then seek public comment now rather than wait until it is more stable? Is it to create pressure on potential claimants? I can see how pressure would help but why would public comment create it?
  • This is crap. W3C policy should be that a company that wants to contribute to a standard must not, under any circumstance, obtain a patent on any part of it, and that standards must avoid previous patents at all costs.

    Patents are evil, they'll ruin our ability to do free software work with all other software. Imagine if Shakespeare had patented his own findings of how to put the English language to good use. We'd either be paying royalties for speaking or we'd be using different dialects of English to avoid patent issues.

    This is the exact same case. XML/HTML/XHTML/etc. are the languages of the internet; they define the structure of our speech, its grammar. Patent them and we'll be paying royalties for speaking through this wonderful electronic medium.

    This situation just plain sucks.

    • Re:W3C policy (Score:3, Interesting)

      by Gid1 ( 23642 )
      I back the view that the code _should_ be patented, but guaranteed royalty-free. AFAICR, Oracle stick to this policy. Even better, to transfer the patent to the W3C in trust.

      The reason being that obtaining a patent is far easier than not owning it and then having to prove prior art in court when some disreputable company patents it later and sues you (the inventor)!

      Defensive patenting, basically.
      • Sorry, this is completely stupid. Only the US (and a couple of other countries) allow software patents; the other one hundred and twenty-something countries don't. This is just another place where the US needs to step into line with the rest of the world.

        • Not sure whether you were replying to my post as completely stupid or the parent.

          To clarify, I agree with _some_ (very few) software patents... ones which are not the slightest bit obvious and cost a lot in development, like some of the stuff in JPEG, MP3, etc.

          However, I most definitely do not agree with stealth patenting, such as UNISYS's behaviour over LZW => GIF.

          Even so, since software patenting is, unfortunately, rife, I can't really blame companies such as Oracle patenting defensively to protect themselves from outside attack, as opposed to protecting their profits. It'd be nice if they handed over their patents to open groups to hold in trust though. That way, everyone can benefit.
          • since software patenting is, unfortunately, rife

            Software patenting is not rife. It isn't even legal in over 98% of the countries in the world. It's vanishingly rare. The main place it is legal is (not coincidentally) the most lawyer-ridden country on the planet.

            Frankly the rest of the world does not want the United States exporting its bureaucratic idiocies to us - we have enough of our own. This is your mess. Clean it up.

    • I would only say that misuse of patents is evil (and stupid). Wasn't there a small story recently (possibly an urban myth) about someone in Australia patenting the wheel? Heh.

      Of course, we all know that the patent owner is required to police their own patents... I imagine that suing everyone who used HTML would be an interesting job for someone.

    • Patents are death for the little guys, but just poker chips to the big guys.

      They can just trade patent licences with each other, and stop any newcomer from upsetting the apple cart.
  • Joy. (Score:2, Funny)

    by Renraku ( 518261 )
    A talking webserver..now instead of having to guess if it's overheating, has a virus, or is under a denial of service attack, it can tell you. Does this have anything to do with that emotional car that was posted a while ago? I don't want a webserver under my control crying or frowning at me...a server farm is scary enough without having talking servers..
    • Dude, thanks for the mental image, google would get insane sysadmins. Imagine running around 3000 nodes of talking servers!

  • by 4of12 ( 97621 ) on Tuesday October 23, 2001 @03:59PM (#2468615) Homepage Journal

    I thought that feedback was one of the biggest problems with voices. My ears still ring from a Who concert years ago!

  • I currently hold a patent on the idea of an idea. If anybody out there has ideas, you must pay royalties to me. Money is okay, blood is preferred.
  • imagine if this weren't something as fringe-useful (yes, it is useful to hearing impaired and a token number of folks who desperately want to hear web pages or use TellMe) as VoiceXML... imagine if it were... SMTP (yeah, I know, IETF not W3C).

    what if the SMTP spec was approved and made an official "standard" with Micro$oft or $un claiming ownership? Would e-mail be the most widely used Net application? Would we be back in the days of LANs supporting 10 different e-mail standards?

    VoiceXML *is* cool (I occasionally use TellMe to get movie times/locations), but what's the point of making it a "standard" if I'll have to license my software to the firm with the highest paid lawyers?
    we should give it all up if this is going to be the wave of the future for the W3C. Why not just develop and license apps to recognise and display docs written in QuarkXPress tags.

    better yet, let's all just switch the web to PDF and wait a year for it to d/l @ 56K.

    hats off to payware "standards"...

    maybe i'll just go back to FTP and plain text unless someone manages to patent that.

    • Well, VoiceXML isn't necessarily going to be used to "voice-enable" web pages. It's also going to be used to replace proprietary speech recognition telephony systems. There are a few companies out there that are doing this already:

      VoiceGenie [voicegenie.com], Telera [telera.com], and TellMe [tellme.com].

      Browsing the web using speech for both input and output is stupid because of the limitations of human memory and the serial nature of how we perceive sound. Better alternatives are to speech enable processes, such as buying things or finding out information.

      I could see controlling a web page by voice. With a VXML enabled web site, you could conceivably make each link a voice command, which would then control the browser GUI. I mean, imagine having /. read back to you!

      On a slightly different angle, it'd be great to have a system at home that did something like Wildfire [wildfire.com].

      Todd

  • by cnkeller ( 181482 ) <cnkeller@ g m a il.com> on Tuesday October 23, 2001 @04:05PM (#2468656) Homepage
    "I'm being slashdotted!" was the cry from web servers everywhere....

    Sorry couldn't resist.

  • by First Person ( 51018 ) on Tuesday October 23, 2001 @04:09PM (#2468683)

    OpenVXI 2.0 was released just last week. According to the message on the VXI-discuss mailing list:

    OpenVXI is a portable open source library that interprets the VoiceXML 1.0 dialog markup language. It provides a full implementation of the VoiceXML 1.0 specification, including all required features and nearly all optional features. Where the VoiceXML 1.0 specification is vague or incomplete, OpenVXI follows industry direction to fill the gaps.

    See http://www.speech.cs.cmu.edu:/openvxi/ for details and source and binary downloads.

    There is currently support for Windows (binaries are included) and Linux. Developers are currently working to add Solaris and Mac OS X.

    NOTE: This is a VoiceXML interpreter. A real system would require a full speech recognition engine and a full text-to-speech implementation. SpeechWorks International [speechworks.com] ships a commercial version [speechworks.com] which connects to their recognizer and TTS products. This is a good playground for experimentation.
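
To illustrate the split described in the note above, here is a minimal Python sketch in which the interpreter step only drives the dialog and hands recognition and synthesis to pluggable engines. Every name in it (`Synthesizer`, `Recognizer`, `PrintSynthesizer`, `CannedRecognizer`, `run_prompt`) is hypothetical and chosen for illustration; none of this is OpenVXI's actual API.

```python
from typing import Protocol


class Synthesizer(Protocol):
    def speak(self, text: str) -> None: ...


class Recognizer(Protocol):
    def listen(self, choices: list[str]) -> str: ...


class PrintSynthesizer:
    """Stand-in TTS engine (assumption): records and prints text instead of producing audio."""

    def __init__(self) -> None:
        self.spoken: list[str] = []

    def speak(self, text: str) -> None:
        self.spoken.append(text)
        print(f"[TTS] {text}")


class CannedRecognizer:
    """Stand-in recognizer (assumption): returns a fixed answer instead of decoding audio."""

    def __init__(self, answer: str) -> None:
        self.answer = answer

    def listen(self, choices: list[str]) -> str:
        # A real engine would match audio against a grammar; this only checks membership.
        return self.answer if self.answer in choices else ""


def run_prompt(prompt: str, choices: list[str], tts: Synthesizer, asr: Recognizer) -> str:
    """One interpreter step: speak a prompt, then ask the recognizer for one of the choices."""
    tts.speak(prompt)
    return asr.listen(choices)


tts = PrintSynthesizer()
result = run_prompt("Say movies or weather.", ["movies", "weather"], tts, CannedRecognizer("weather"))
```

Swapping the two stand-ins for real engines would leave `run_prompt` unchanged, which is roughly why an interpreter can ship separately from the recognizer and TTS products.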

  • The thing with VoiceXML is, we probably won't be seeing an open-sourced engine for it. VoiceXML is a standard which works over telephones and VoIP, and thus needs complicated software to run.

    Actually the price of IBM's VoiceServer (i think it's called) is around $40,000. All the ones I've found through research were aimed purely at large companies who'll likely host VoiceXML applications for others.

    In this sort of situation, I don't see any point in paying royalties to the developers of this technology. These companies are the same which'll be selling the server software. How much money could they possibly need?

    (Note: It would be really cool if somebody started developing a free-as-in-everything VoiceXML server.. I'm just not sure if anyone has that much time to devote, since the free text-to-speech technology is a little rough around the edges still)
    • Sorry, another post was posted while I was typing mine, announcing an open source VoiceXML interpreter. I suppose I spoke about 2 minutes too soon =)

      Here's the link for OpenVXI 2.0 [cmu.edu].
    • OpenVXI is an open source VoiceXML interpreter. It is available at SourceForge. It is AFAIK the browser SpeechWorks uses to help sell their ASR. The guys at CMU who are working on Sphinx are the ones managing OpenVXI, so the hope is that some connection between their ASR and OpenVXI will emerge.

      Jer,
  • As tightly bound up in patents as voice/sound is, unless W3C takes a truly RAND stance (free, no less), they may as well get out of the way. You may be looking at "non-discriminatory" license fees of $10,000 from half a dozen big companies. That's $10,000 EACH. (Or maybe more, depending on how greedy they are.)

    Each of these will negotiate a licensing agreement with the others, for free. But they won't discriminate against anybody else, oh, no!

    So why does W3C want to get their hands dirty? Let the big boys go off and negotiate it themselves; that's what they're doing now. This patent-encumbered "standard" will be rather like X was in its early days. And it will fall apart, just like X did when XFree86 started doing the real work, maintenance, and innovation.

    If there is a real RAND, free to anyone using the standard (as written, no Microsoft extensions), then the standard has a chance. That's what W3C should drive home before they promulgate a bunch of "open" (aka proprietary) standards.
  • The non-negotiable condition for inclusion of patented technology should be that the patent be provided on a royalty-free, non-exclusive basis. Any patent not available on such terms should be automatically disqualified.

    The message must be clear. Software patents do not serve the public interest. Instead, they constitute at the best roadblocks -- useful ideas off limits to the public, and at the worst, landmines -- when the patent office grants a patent on a widely used technology.

    • Software patents do not serve the public interest. Instead, they constitute at the best roadblocks -- useful ideas off limits to the public, and at the worst, landmines

      Absolutely.

      The best way to expose the faults of the software patent system is to expose the damage it does, not just talk about it. Kudos to W3C if they make a policy of "no standard for royalty burdened patents". It may take a few years but eventually the comfortable computer community will notice that the available standards suck and are missing obvious and necessary solutions.

      If W3C makes a practice of including patented technology, they become a money-making tool for opportunists and big businesses. You don't think smart execs see the $ in getting their patented stuff in a W3C standard?
  • Granted June 2001. They must be joking, or we are looking at another case of "standards group member takes notes in meetings and writes a stupid patent that is accepted by the patent office".

    I don't care how fair and square they enforce their RAND policy. A high tech company, especially one that has INNOVATE as their slogan, should be ashamed by filing such patents. Shows total lack of quality control.

    But not to worry. Fiorentina will run them to the ground with the Compaq merger, so some geek could buy the patents at the firesale, and then we could have a patent BBQ?
    • At least it seems to be apparent that, RAND or otherwise, these companies on the w3c board are coming forward and saying "Hey, we want to propose a standard spec XYZ, though we strongly believe that any implementation of it will be covered by our patent". In other words, these bodies don't appear to be submarining patents a la Rambus. They'd like to have RAND if they could, but in several cases, they appear to realize that a standard with their name on it is better publicity than a patent.

      And as Bruce Perens has pointed out, HP (a major patent holder for this particular spec) has already backed off on RAND, so it's not likely that this spec will be inaccessible due to patent licensing.

    • Patents normally take a LONG time to be granted.

      When was this applied for? That's what matters.
  • Sometimes the w3 comes out with something useful, clear and powerful. SVG [w3.org] and the original version of XML are examples of this. But they quickly forget their design goals and everything goes to hell. Example: XML is supposed to be a human readable, HTML like markup language for arbitrary data that is easy for a program to parse and understand. Then the committee does its thing and now with name spaces and the other additions, XML is about as readable as a binary file. W3's problem is that they are victims of feature creep. They take something simple and elegant and turn it into a monster. Features are good but they don't seem to know how to stop.
  • The easiest thing to do with VoiceXML would be to wait for Microsoft to appropriate it, embrace it, extend it, and make it a free download. They already have pretty decent speech recognition & synthesis (not the best, but serviceable) so chances are they will have the majority of the niche users that actually want to talk to their computers.
  • I'm sure people out there need text to speech technology. I'm sure that VXML would be used by some niche that desperately needs it.

    I'm also sure that it'll never take over the internet, because it's a different medium, and has the same drawbacks as other spoken media, both citizen band and broadcast. Audio is linear, the web is random access. If you are interested in a portion of a web page, you will skip to that portion immediately, am I right? Besides, audio is almost as intrusive as Flash and Shockwave, only with VXML, it'll be a patented standard. The last thing I like is web sites with noise on them. If I wanted a multimedia experience, I'd play a good game, not Joe Generic's lame attempt at an interactive web page. I surf for information, not for a memorable experience.

    Hmmph. Seems to me W3C should be documenting emerging standards, not creating them.
    • You're missing the point of VoiceXML. It's not about making talking web pages that you surf on your computer. It's about human-computer interaction over voice networks (like, say, a telephone). It's a step up from "Press one for foo; press two for bar".

      And the w3c didn't *create* this standard. The working group started off as the VoiceXML Forum [voicexml.org], including Lucent, Motorola, IBM and AT&T, who were interested in standardizing the API used to create voice site deployments (which was already at that time big business).

    • I think you miss the point. VXML is probably not intended for HTML sites. It was probably made as a standard to replace all those "Hello, to XXXXX press X, to YYYYYYY press Y..." type phone menus, as well as to allow phone services with a bit more power/flexibility. E.g. you could ring up slashdot and find out the latest headlines, but not necessarily post, or do other more complex things that you can do in a browser.
      Maybe ring up the cinema, and find out what time a certain movie starts while you're on the bus etc...

      And anyway. You may surf the net for information only, but I surf for both info and fun. I can't see what makes you think the two can't exist together on the net, things have worked out OK so far.

      Don't flame new technologies just because a few people don't know how to use them properly.
      If we went with that ideology, we'd still be with plain HTML, no pics, no different colours, no different size fonts...just cause someone might make their text too small or big.

      And who said this new technology would be intrusive? Don't blame the audio as a medium for being intrusive, blame fucking macromedia for such a lack of user controls in their flash player.

    • I'm also sure that it'll never take over the internet, because it's a different medium

      I don't think that most people see it as being in competition with the web but rather as a complement to it. Obviously, most of us sat at our computers all day would always prefer to read /. rather than having to listen to it. Text is just quicker to navigate; you can skip bits and re-read bits as needed.

      One application that we are working on in the PreViking [telesave.net] project is to make a telephone gateway for the web. The VoiceXML (or CallXML) can be translated into IVR commands on the PreViking telephony gateway. You can then literally map a telephone number to a website.

      For example, you would dial some 800 number to access /. PreViking fetches the VoiceXML from the /. webserver. Uses a text-to-speech engine and reads the headlines out to you. You can then select, either by DTMF or voice recognition, the story you want and have the article read to you. The opportunities for this are endless.

      I would also much prefer, when out and about, to have voice interaction with the internet than having to fiddle around with a small and slow WAP interface. Companies can make their customer information, sales information, and news stories available easily over the phone without having to deploy any expensive telephony equipment. They just have to alter the web content to generate VoiceXML along with the HTML and have a Voice Application Service Provider to provide the voice facilities.
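
The dial-in flow described above (map a phone number to a site, fetch its VoiceXML, read a menu, take a DTMF selection) can be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the phone number, URL, headlines, and the function names `fetch_headlines` and `handle_call` are all made up, and no network or telephony is involved. It is not how PreViking is implemented.

```python
# Hypothetical mapping from a dialed number to the site's VoiceXML URL.
NUMBER_TO_SITE = {"8005550100": "http://example.org/headlines.vxml"}


def fetch_headlines(url: str) -> list[str]:
    """Stand-in for fetching and parsing the site's VoiceXML; returns canned placeholder headlines."""
    return ["First headline", "Second headline"]


def handle_call(dialed: str, dtmf_digit: str) -> str:
    """Return the text a TTS engine would read for one call and one key press."""
    url = NUMBER_TO_SITE.get(dialed)
    if url is None:
        return "Sorry, that number is not in service."
    headlines = fetch_headlines(url)
    # The menu that would be read out before the caller presses a key.
    menu = [f"Press {i} for {title}." for i, title in enumerate(headlines, start=1)]
    valid = len(dtmf_digit) == 1 and dtmf_digit in "123456789"
    if valid and int(dtmf_digit) <= len(headlines):
        return f"Reading story: {headlines[int(dtmf_digit) - 1]}"
    return "Invalid selection. " + " ".join(menu)


print(handle_call("8005550100", "2"))
```

A voice-recognition path would replace the digit check with a match against the spoken headline, leaving the rest of the flow the same.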

    • What you are overlooking here is that there are MANY more phones than there are personal computers.

      Jer,
  • Holy crap (Score:3, Funny)

    by fobbman ( 131816 ) on Tuesday October 23, 2001 @04:30PM (#2468801) Homepage
    It is usually the pr0n business that implements new technology, both on the Internet and home multimedia fronts. While it could be really cool to have those nekkid pictures talking to me, the idea of all of those pop-ups literally screaming out at me a dozen at a time would really freak me out.

  • I thought feedback was something to be avoided on a sound channel...
  • At least people are disclosing their patents right now, not after a standard has become de-facto. We don't need more companies like Rambuzz.
  • Clarification (Score:2, Informative)

    by sllort ( 442574 )
    There are enough posts already claiming that "my web server should yell for help when it gets slashdotted" that it's pretty obvious no one has read the article yet.

    VXML does not make your browser "talk". It is a markup language which allows a client known as a "voice browser" to interpret this markup language and speak to you locally.

    obligatory google cache of slashdotted article here [google.com].
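
As a rough illustration of the point above, the Python sketch below treats a tiny VoiceXML-style document as plain markup and extracts the text a voice browser would speak. The document is a simplified example (no namespace, only `vxml`, `form`, `block` and `prompt` elements), and the `prompts` helper is hypothetical. A real voice browser also handles grammars, input, and call control.

```python
import xml.etree.ElementTree as ET

# Simplified VoiceXML-style document, written for this example only.
VXML = """<vxml version="1.0">
  <form>
    <block>
      <prompt>Welcome. Here are today's headlines.</prompt>
    </block>
  </form>
</vxml>"""


def prompts(document: str) -> list[str]:
    """Collect the text of every <prompt> element, in document order."""
    root = ET.fromstring(document)
    return [(p.text or "").strip() for p in root.iter("prompt")]


spoken = prompts(VXML)
for line in spoken:
    print(f"[voice browser says] {line}")
```

The server only ships markup; turning it into speech happens in whatever client interprets it.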
  • The issue here isn't that they can't release a new VXML spec. It is that the new spec will logically include ideas that have been patented by other companies.

    The big problem is that VXML is currently at 1.0 and companies are patenting extensions to that spec. Here is a prime example of how, rather than getting involved with creating the spec and helping to push out new revisions, the companies start patenting every obvious thing missing from the 1.0 specification. This is obviously going to prevent further revision implementations from emerging from any company that isn't as rich as HP or IBM or MS etc.

    As for the usefulness of VXML, whoever posted this story missed the boat. VXML isn't used to make your server speak; it is used to quickly create an IVR system. This is a really useful ability that few slashdotters have realized.
    • Erm, not (quite) necessarily
      There are two reasons why these companies may have patented extensions to VXML 1.0.
      1. (If you are an optimist) These are defensive patents to stop them being screwed. They have no intention of enforcing them, but it does mean they can't be forced into licensing these bits of the technology themselves.
      2. (If you are paranoid) The idea is to create an effective cartel between the companies owning the patents by the use of cross licensing of the relevant patents, thus raising the financial bar on any new entrants to the market. This has been done in the past with GSM.

      Best regards

      treefrog
  • by ckuhtz ( 87644 )
    One of the primary drivers behind VoiceXML or VoXML was vmail and unified messaging systems.

    *sigh*

  • Some /.ers don't seem to care so much about speech recognition - niche technology? When natural language parsers get more intelligent, speech recognition will be the internet in your car. Just think star trek.
  • I have been working in the CTI-IVR (Computer Telephony Integration - Interactive Voice Response) business for a decade, and a huge number of the patents are blatantly obvious to practitioners of the art (me). There is even prior art for some of this. For example:
    US Patent 6,035,275: Method and apparatus for executing a human-machine dialogue in the form of two-sided speech as based on a modular dialogue structure Published 2000-03-07 on behalf of inventors Brode, et al. Assignee: Philips

    Abstract:
    In a dialogue structure outputting speech items interrogating an access call while examining subsequently received human speech items for ascertaining an actual transaction instance further outputting speech in accordance with the ascertaining until either attaining a positive transaction result, or otherwise exiting the dialogue in case of failure. In particular, the dialogue is constructed from hierarchically arranged and callable subdialogues constituting respective mutually independent building blocks, which are arranged for generating a particular outcome if a positive result is attained by the subdialogue in question. The subdialogues offer interfaces for mutual coupling with a hierarchically superior subdialogue, so that the overall structure is formed as based on a selection of subdialogues and exclusively based on required partial results by each of the subdialogues in the structure.

    This completely describes any typical IVR scripting engine that has been around since the late 1980s (AT&T's Conversant IVRs on Unix systems come to mind). Visual Voice products that I used to create an IVR chat system back in the mid-1990s would do exactly the above - and since it was VB based, I could even pull up web pages for data (which I did just to provide a weather report option to feed to the TTS engine as a secret test menu option, as well as Tuxedo screen scraping of a virtual 3270 hooked to big iron). The patent quoted was applied for after that time. It's clearly bogus.
    "The prompt element includes an announcement to be read to the user. The input element includes at least one input that corresponds to a user input. A method in accordance with the present invention includes the steps of creating a markup language document having a plurality of elements, selecting a prompt element, and defining a voice communication in the prompt element to be read to the user. The method further includes the steps of selecting an input element and defining an input variable to store data inputted by the user."

    This is more of the same from a different patent. Like I said, this is all obvious and common practice for IVR script writers, and anyone that has a few brain cells going. Furthermore, "input variables" and things like that are not inventions, they are common sense. It's just not that hard.
    Damnit, how do they get away with patenting what are common practices? The patent examiners must be total f**king morons.
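
For readers unfamiliar with IVR scripting, the kind of structure the quoted abstract describes (callable subdialogues that each return a partial result to a parent, with the whole dialogue exiting on failure) can be sketched in a few lines of Python. This is only an illustration of the general idea, with hypothetical names (`ask`, `sequence`, `booking`) and caller answers supplied from a list instead of speech. It is not taken from any patent or product.

```python
from typing import Callable, Iterator, Optional

Answers = Iterator[str]
Subdialogue = Callable[[Answers], Optional[dict]]


def ask(question: str, field: str) -> Subdialogue:
    """Leaf subdialogue: play a prompt and take one caller answer; an empty answer is a failure."""
    def run(answers: Answers) -> Optional[dict]:
        print(f"[prompt] {question}")
        reply = next(answers, "")
        return {field: reply} if reply else None
    return run


def sequence(*parts: Subdialogue) -> Subdialogue:
    """Parent subdialogue: succeeds only if every child returns a partial result."""
    def run(answers: Answers) -> Optional[dict]:
        result: dict = {}
        for part in parts:
            partial = part(answers)
            if partial is None:
                return None  # exit the whole dialogue on failure
            result.update(partial)
        return result
    return run


# A dialogue built from independent, reusable blocks, one nested inside another.
booking = sequence(
    ask("Which city?", "city"),
    sequence(ask("What day?", "day"), ask("What time?", "time")),
)
outcome = booking(iter(["Boston", "Friday", "noon"]))
failed = booking(iter(["Boston", ""]))
```

Here `outcome` collects the three answers into one dictionary, while `failed` is `None` because the second prompt got no answer.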
  • I looked at this article for two seconds, went to the w3c page, and spent the next five hours fixing my html so it was html 4.01 transitional compliant.
  • I would prefer to see better content, rather than seeing current just suck up more bandwidth. Sure, voice is neat, but in most cases it would just be used to add more widgets rather than fulfilling a needed function. Yes, there are legitimate uses for this, but most uses will just be for the 'gee whiz' factor.
  • I recently had to become a bit of a VoiceXML expert for a project at work. From what I have seen, using VoiceXML for talking web sites is actually not what most people are using it for. VoiceXML is used primarily for automated phone transactions. If you have ever called your bank or credit card company to get your balance or conduct some other type of transaction, then you know the type of phone system that I am talking about. If that system was capable of voice recognition, chances are that it was programmed using VoiceXML. VoiceXML is also quite capable of making outbound calls; it is conceivable that you might start receiving completely automated telemarketer calls in the next couple of years.

    I wrote my own VoiceXML app which prompted you to say your name and the name of someone whom you wanted to get ahold of, and the system would hunt down that person on various devices (e-mail, AIM, pager, telephone, SMS, fax, etc.) and let that person know that you are looking for them. It worked unbelievably well and VoiceXML made the voice recognition part of it trivial. And if you need a VoiceXML solution, I would strongly suggest that you consider Voice Genie (www.voicegenie.com).

    • Good point, from reading and limited playing it seems like a more available version of IVR rather than a talking web-page box. Analogies could be the drop in price and increase in availability when the web took off instead of gopher or Unix boxen instead of mainframes.
  • ...less chance of me seeing XML.
  • Ok, truth first: I haven't read the patent licence, but here's a thought:

    Why the heck would I want to look over the public draft, suggest corrections and then (if my corrections are incorporated) pay a fee to use this standard?

    Isn't that a bit stupid? Like Microsoft asking you to write code for Windows, which it can sell back to you later?

    I say boycott this. W3C Patent = closed standards = no one using them = we need another free body?

    Boky
  • I can't seem to find anything already posted, so I am gonna mention it...
    Didn't anyone notice that Slashdot was singled out specifically and appealed to for comment? That's like a huge step in gaining relevance in the community. Slashdot is slowly becoming a legit political force of sorts.
  • who cares about specifications...the whole idea is completely pointless. It's technology for technology's sake...I am struggling to think of any areas where speech has the advantage over a visually represented app...

    and besides think of the amount of training you have to do to train voice recognition software...

    PLEASE FLAME ME
