Data Mining And The CIA 107
Brotha Z writes "It seems that the CIA has developed a piece of software labeled "Oasis" that can convert the audio from television and radio broadcasts into text. This software is stated to be able to determine the sex of the speaker, if the speaker is a different person than the original speaker - and if one of the speakers is named, it will continue to place the name next to the correct speaker from that point on. More information on this multi-faceted piece of software can be found here." Hmmm. Sounds like some nice speech recognition technology ("perfect demo" alert!), but as a taxpayer, something rings badly about it. If they're going to use my money to spy on me, can't they at least open source the code so I can dictate a letter?
Freedom of information (Score:1)
TV and radio my butt (Score:1)
Badly ringing things. (Score:1)
All your grammar are belong to us.
If this has been around for a while (Score:1)
Some good-ass weed. (Score:1)
Re:Human element (Score:1)
deeznutsclan wrote:
Besides, counterintelligence is the FBI's job, [not] the CIA's.
Counterintelligence is both the CIA's and the FBI's job. The CIA has the power to investigate any leaks within its walls, among its staff, etc. It's only after they find evidence that they turn matters over to the FBI, who build the case for prosecution and add to the evidence gathering efforts beyond the CIA's walls.
As for Ames and Howard, it was shown during congressional hearings that there had been plenty of warnings (i.e. alcoholism, spending beyond the officer's means, etc.) way before they were discovered (or caught in the case of Ames), and evidence of either incompetence or plain sabotage, yet their supervising officers did nothing but file a report that nobody looked at until it was too late.
Again, the most important part of CI work is the human element, not the fancy gizmos.
Cheers!
E
Re: (Score:1)
Re:who's more stupid? (Score:1)
Re:Better than automated closed captioning? (Score:1)
Credit to the CIA (Score:1)
---
Re:well... (Score:1)
That's because spy agencies are in the habit of giving everything obfuscating code names. 'Oasis' is just the code name for a system that is actually named "Horny Balrog on an Amorous Rampage", which is about as threatening as it gets.
Of course, it works both ways. The CIA code name for 'pillow' is "convulsing victim".
--
Re:There are many technologies that you pay for... (Score:1)
Yeah, when you spend a cold lonely night at a listening post you have to turn off the sound on your p0rn0 flicks so you can hear the people you are spying on, and this will let the agents follow the dialog in the movies, so they will understand the plot.
--
CIA, Virage, Pictron, and others. (Score:1)
This is absolutely nothing new.
---------------------------------
Only in America will someone order a
Big Mac, large fries, and a Diet Coke.
Re:Misleading story, but looks who is talking (Score:1)
Re:Misleading story, but looks who is talking (Score:1)
work
it
out
Re:Not terribly new or surprising (Score:1)
If anyone else is interested, you can get the paper on this here [nec.com]. Links to various formats on the top right corner. It also has many links to related documents.
Re:Actually.. (Score:1)
--
Obfuscated e-mail addresses won't stop sadistic 12-year-old ACs.
An actual transcript! (Score:1)
Transcript follows:
Captain: What happen?
Mechanic: Someone set up us the bomb
Operator: We get signal
Captain: What!
Operator: Main screen turn on.
Captain: It's you!!
Osama Bin Laden: How are you gentlemen!!
Osama Bin Laden: All your base are belong to us
Osama Bin Laden: You are on the way to destruction
Captain: What you say?
Osama Bin Laden: You have no chance to survive make your time
Osama Bin Laden: Ha ha ha
Operator: Captain!!
Captain: Take off every 'ZIG'!!
Captain: Move 'ZIG'.
Captain: For great justice.
- Twid
Re:source code (Score:1)
Re:What happens if... (Score:1)
You can't really rely on, say, most American news media for good in-depth international coverage, after all -- especially not our television. It's not like ours are state-run and therefore constrained to present, at length, the party line of a ruling government.
Re:Misleading story, but looks who is talking (Score:1)
Because that's where the CIA gets a majority of their information. As much as you would like to believe that they have thousands of secret agents running around, a good part of intel comes from Signal Intelligence. Ever see TV shows when they walk down the rows of country desks? There are hundreds of TVs tuned into "local" news. Read a Tom Clancy novel, he's really good at describing how the system works.
Re:Actually.. (Score:1)
There. As I got modded down for flamebait in the previous message, I figure I'm obligated to actually post some.
Re:Well, the technology is definitely there... (Score:1)
Even scarier yet-- (Score:1)
I fear that the transcriptions will be tampered with, and no one will double check the 'source code'. I mean, if I wanted a transcription of, say, Kevin Mitnick, I suppose I'd get a buncha "BLEEP"s. It's scary in a way, because we will depend on the data more, and double check less. It happens all the time. A slipped digit here, and granny is on the 10:00 news.
I don't find it bad that it can track conversations and recognise voices, but if Echelon were to pipe this through a foreign server, they could send the results back to Intel (the real one, duh!). One slight problem with the Echelon system: the French alleged industrial espionage after losing an airline contract under questionable circumstances.
What I demand to know: "Who the hell is watching the watchers, eh?" What are their limitations (as per constitution, illegal search and seizure, confiscation of intellectual property, etc.)...
Anyone who gives up freedom for security deserves neither.
Ben Franklin
Re:Paranoia paranoia! (Score:1)
Re:Uhm, yes. (Score:1)
I remember somebody once saying that the only difference between a highly classified intelligence report and the story about the same subject or event on CNN is that the intelligence report would have more names and more correlating data. Sounds plausible.
--
News for geeks in Austin: www.geekaustin.org [geekaustin.org]
Re:TV and radio my butt (Score:1)
Nah, pan frying is better. Have you ever noticed how close paranoia and megalomania are? I do almost every time I read /.
Kahuna Burger
Re:About coloinsts breaking the law (Score:1)
ah, but in the US, the "patriots" weren't willing to actually wait until the majority of the population agreed with them. Between the loyalists and the apathetic, revolution was not a majority position. That's why the revolutionaries had to use propaganda (such as the yellow journalism surrounding the so called "Boston Massacre" that we just celebrated the anniversary of) and lies to whip up enough support to start something, and then be violent enough to provoke a retaliation that pushed more of the apathetic into opposing the British "oppressors".
History is a funny thing. Things that everyone knows turn out not to be true sometimes.
Kahuna Burger
Re:About coloinsts breaking the law (Score:1)
well, in fairness, we don't have an "anti confederate day" celebration or anything. But if the colonists had lost, would there be hardcore yankees who called themselves "declarationists" and wanted to use the 13 star flag on license plates? And argue for the right to fly said flag over state capitols in the original colony states? And would officials of the Anglican church say that such was just "anti crown bigotry dressed up as cultural pride"?
Well, we can't take these things too far, I guess. :>
Kahuna Burger
Re: (Score:1)
this can be good (Score:1)
I don't quite like the idea that they have the ability to separate one voice from the next digitally, but I guess that in some distant future all that will be done flawlessly and we'll all be screwed as far as that goes.
And by the time that happens, they will also have a very firm grasp on digital voice production, meaning so much for the freedom of speech ... or so much against, I should say.
And open source? Hah! Let's just open source the Pentagon while we're at it... Actually, I wouldn't mind knowing what really happened to Marilyn Monroe...
Re:this can be good (Score:1)
Well, the technology is definitely there... (Score:1)
Re:Even scarier yet-- (Score:1)
who's watching the watchers?
CIA/NSA Deal? (Score:1)
Bart Simpson? (Score:1)
Bart Simpson: Male
Bart's Voice: Female (Nancy Cartwright)
Software says: Shemale?
Re:When it becomes public... (Score:1)
________
Re:That's funny... (Score:1)
as the sensor picks it up (and there's two sensitivity settings for it).
..Sounds good to me! (Score:1)
Pig Latin? (Score:1)
There must be many ways around this technology (encrypted voice over ip comes to mind) but the only ones who will take the time to do so are the supposed targets.
Open sourcing spying software. (Score:1)
do contribute our tax money towards it. But if
it is open sourced, don't you think we are
throwing hard earned technology to enemy
spies for nothing too?
who's spying on who. (Score:1)
Anyway, I work for a defense contractor on classified projects. The whole point of having the project classified is so that the other countries don't know our secrets. We do missile jammers. It becomes pretty pointless if we open source the code/firmware so you can jam the police radar, because then other countries know how our jammer works, and simply make their missiles act in a way that our jammers can't stop.
The same may go for any software the CIA designs. If they explain thoroughly to you how they're spying on other countries, then the other countries know how they are being spied on, and they take actions to prevent it.
DAve
Yeah, it even translates Volapuk ! (Score:1)
Re:TV and radio my butt (Score:1)
"Everything that can be invented has been invented."
is FLUENT double speak? (Score:1)
Copyright (Score:1)
Let's just hope the RIAA doesn't hear about this!
---
Paranoia paranoia! (Score:1)
On the other hand something like this would be great with some controls in place as to how the government uses it. But I don't believe for an instant that such a device works as well as claimed. Maybe someday in the future, but not today.
Re:About coloinsts breaking the law (Score:1)
Re:Paranoia paranoia! (Score:1)
The point is, if the majority of the population does not believe something is right, they change it. Or they put someone in government that will. That is the way democracy works (at least ideally, I know it's not that simple, but the concept holds true to some degree). If enough people wanted pot legal, you'd see people that want pot legal being elected into office. Get enough of them and you suddenly have legal pot.
Warning, my opinions do not come with a warranty.
About coloinsts breaking the law (Score:1)
Eray:Enwhay itway ecomesbay ublicpay... (Score:1)
IWAY ersonallypay ouldway OVELAY otay eesay away ugehay earchablesay, onway-inelay atabaseday ofway everythingway everway aidsay ybay anyoneway atthay asway oadcastedbray. Imagineway ethay implicationsway. IWAY'day earchsay orfay allway ofway ymay ocallay oliticianspay otay eesay ifway eythay everway aidsay anythingway upidstay inway eirthay eviouspray ifelay asway away okedcay-outway-Iamimay-elevangelisttay. IWAY'day alsoway earchsay orfay ymay ownway amenay otay eesay ifway IWAY issedmay away ongsay edicationday orway anway NPRAY onsorshipspay inway ymay amenay.
IWAY uessgay away otablenay awbackdray isway atthay ethay IACAY ouldcay ettypray easilyway anscay ellcay-onephay andwidthsbay asway ellway... ocumentingday anyway 'otablenay' ivatepray onversationscay. Erhapspay eway ouldshay allway artstay alkingtay inway igpay-atinlay otay avoidway ethay IACAY'say attentionway, alway alay Apsternay?
I wonder if it's possible ... (Score:1)
Re:Actually.. (Score:1)
I wasn't implying any sort of exotic technology, just saying that, as it doesn't currently run on anything resembling normal hardware, it most likely isn't anywhere really near a point where it would be something that would be used by, say, the CIA.
I imagine that this *could* eventually be simulated by some sort of software- can't see any reason why not, but, as you mentioned, it would probably require a lot of processing balls to simulate in realtime, which is basically what I was saying.
Re:There are many technologies that you pay for... (Score:1)
I'm sure that there are a lot of TV programs and older shows/movies which are not closed captioned. This technology could make this type of media more accessible to the hearing impaired.
Also, if embedded into a cellphone, I could READ radio chats and interviews and stuff when I don't want to create noise for other people or look goofy wearing headphones.
And of course, militarily, it could be used to monitor every voice communication in the world to search for things that the deployer might want to know about, e.g. planned acts of terrorism, organized crime activity, excessive use of free speech, etc. And you wouldn't need to save it as a bulky audio file.
O'Toole's Commentary on Murphy's Law:
What happens if... (Score:1)
And it converts speech on TV, so what happens if The Terminator plays on Fox and 2 hours later Predator plays on TBS? Does Arnold's character in Predator get named "Terminator"? And suppose a comedian does a decent impression of Arnold, does The Terminator suddenly appear in the transcript of his comedy routine?
One other thing bothers me about this. The software was developed by the CIA right? Why is the CIA interested in knowing what happens on TV and radio? Well duh, they aren't. I'd bet this is going to be used to tap phones and eliminate the cliche of the agents smoking in the dark room above the bad guy's apartment anxiously listening to his phone conversations. (Or do they already have something for that purpose?)
"// this is the most hacked, evil, bastardized thing I've ever seen. kjb"
There are many technologies that you pay for... (Score:1)
Bah. (Score:1)
Maybe the CIA improved the AI somewhat, but it sounds pretty similar. At least they're telling us more about it, though.
However, I'm sure slashdot will still link to RealAudio every time someone gives a talk, so I don't care. If they think I have anything incriminating to say, then they're wasting my money.
Re:well... (Score:1)
Oasis: brackish pool of filthy disease-ridden sludge surrounded on all sides by a desolate wasteland under an unforgiving midday sun.
Suitable metaphor for The Company?
--
Sometimes nothing is a real cool hand.-- Cool Hand Luke
sounds really cool.. (Score:1)
Re:Misleading story, but looks who is talking (Score:1)
Well, it is more like 15%, especially if you want to do it fast (this typically means 10 times slower than realtime). One can't really afford to run at 300xRT for large-scale transcription. When we had to recognise 500 hours of broadcast news we ran our system at around 15xRT.
Gunnar
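To put the real-time factors above in perspective, here is a rough back-of-the-envelope sketch. It assumes "NxRT" simply means N hours of machine time per hour of audio on a single machine; the function name is made up for illustration and nothing here comes from any real recogniser.

```python
def compute_hours(audio_hours: float, rt_factor: float) -> float:
    """Machine-hours needed to decode audio_hours of audio at rt_factor x real time."""
    return audio_hours * rt_factor


audio_hours = 500  # size of the broadcast news corpus mentioned in the comment

# Compare the "fast" setting, the setting actually used, and the slow setting.
for rt in (10, 15, 300):
    hours = compute_hours(audio_hours, rt)
    print(f"{rt:>3}xRT: {hours:>8.0f} machine-hours, about {hours / 24:.0f} machine-days")
```

At 15xRT the 500 hours come to 7,500 machine-hours, which is why 300xRT is out of the question for large-scale transcription unless you have a lot of machines to spread the work over.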
Re:Even scarier yet-- (Score:1)
It's the CIA. In theory they are not allowed to surveil US citizens on US soil. So most of that stuff doesn't apply.
Internet2 video datamining (see screenshots) (Score:1)
Internet2 [internet2.edu], a gigabit network [iu.edu] for education and research [internet2.edu] (see PDF map) [iu.edu], has a major future use as an audio-video storage library and distribution network [cmu.edu]. Video-napster? CMU's Internet2 Informedia Library project [cmu.edu] researchers are designing visual-video search software [cmu.edu] for faces, on-screen text, images and shapes. Computers finding on-screen people, text and similar programming... scary.
Check out this presentation with screen shots about Internet2 [internet2.edu], and its cool tools, uses and experiments. Slide 36 [internet2.edu] shows Facial Recognition and Optical Character Recognition (OCR) at work. It works so well, it finds text (bottom right) on the U.S. Capitol's dome columns... whoops. Slide 37 "Similar Shapes/Content" shows examples of similar content of female news anchors [internet2.edu], and soccer / football [internet2.edu].
remove the nofreakinspam. to e-mail me.
well... (Score:1)
Others are working on similar projects (Score:1)
The US government aren't the only people to have cottoned on to the power of converting spoken words into text. The THISL [shef.ac.uk] project aims to provide broadcasters and other news gathering organisations with a powerful tool using this technique.
Firstly the news archive is passed through a speech recognition system which basically produces transcripts of every news item. Then a powerful text search may be applied to the database to locate information relevant to a particular topic. If this is being done for research, it is most likely that all the information required can be gleaned from the transcript; however, the original recording may also be retrieved using the archive reference stored in the database.
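A minimal sketch of that transcribe-then-search pipeline, assuming the recogniser has already produced plain-text transcripts. The archive references and transcript text below are invented for illustration; this is not THISL's actual design, just a plain inverted index that maps words back to recordings.

```python
import re
from collections import defaultdict

# Hypothetical records: (archive reference, transcript produced by a recogniser).
transcripts = [
    ("tape-0001", "the minister announced a new budget today"),
    ("tape-0002", "flooding closed the northern rail line"),
    ("tape-0003", "budget talks resume after the rail strike"),
]


def build_index(records):
    """Map each word to the set of archive references whose transcript contains it."""
    index = defaultdict(set)
    for ref, text in records:
        for word in re.findall(r"[a-z']+", text.lower()):
            index[word].add(ref)
    return index


def search(index, query):
    """Return the archive references whose transcripts contain every query word."""
    words = re.findall(r"[a-z']+", query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)


index = build_index(transcripts)
print(search(index, "budget"))       # every item mentioning the budget
print(search(index, "rail budget"))  # only items mentioning both words
```

The returned references are what you would hand to the archive to pull the original recording when the transcript alone isn't enough.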
Re:Uhm, yes. (Score:1)
The article referenced only mentions using it on TV and radio broadcasts. While I'm sure that it was probably developed and tested with TV and radio, it could just as easily be used to monitor cellular and private radio communications. This is, after all, a spy agency that we're talking about.
Now they don't need to have people whose job is to monitor the cellphone conversations of select suspected/known terrorists. Instead, they simply collect all the data and run it through OASIS. Then do a few keyword searches (using FLUENT if need be) for words like "bomb," "Iraq," "Jihad," or "assassination." They could maybe even do more complex searches (+bomb +embassy). Now that they've got a report, an agent looks over it to determine which conversations are interesting ("Did you get the bomb making supplies?" vs. "Yo! Kobe Bryant is da bomb!!!") and starts investigating.
Isn't technology wonderful?
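For the curious, the "+bomb +embassy" style of search can be sketched in a few lines. This is only a guess at the query semantics (a "+" term must be present; of any plain terms, at least one must be present) and has nothing to do with how OASIS or FLUENT actually work; the function names are hypothetical.

```python
import re


def tokenize(text):
    """Lower-case the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def matches(query, text):
    """'+word' terms must all appear; if plain terms are given, at least one must appear."""
    words = tokenize(text)
    terms = query.split()
    required = [t[1:].lower() for t in terms if t.startswith("+")]
    optional = [t.lower() for t in terms if not t.startswith("+")]
    if not all(r in words for r in required):
        return False
    return not optional or any(o in words for o in optional)


conversations = [
    "Did you get the bomb making supplies?",
    "Yo! Kobe Bryant is da bomb!!!",
    "Meet me near the embassy, the bomb is ready.",
]

# A bare "+bomb" flags all three lines, basketball included.
for line in conversations:
    if matches("+bomb", line):
        print("FLAG:", line)
```

Which is exactly why the agent still has to read the report: keyword matching can't tell the supplies from the jump shot.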
Re:Better than automated closed captioning? (Score:1)
Re:Bah. (Score:1)
I'd like to know more about how accurate it is before I get all upset about it. Yes, people have been doing stuff like this for years, but it hasn't been all that robust so far. If the CIA got it to work like the article makes it sound and with good accuracy, then they have some pretty nice software. Otherwise, it's just more PR to make the president and congress give them a pat on the back and more money to play with.
I wouldn't be too surprised if it is really accurate though. The government, at least the part that likes to keep a low profile (CIA, NSA, etc.), always seems to be about 10 years ahead of the public on the technology curve.
I wonder how it compares with (Score:1)
This has potential, but I suspect it (Score:1)
I'd really like to see this kind of technology applied to search engines. My experience on speech recognition and computer translation is rather limited, but I think the available software needs lots of refinement before it can be applied to things like this. They (CIA, FBI, DEA) are after fool-proof evidence to be used in court, after all.
An experiment with normal translation software English-RANDOM()-English shows that sayings like 'Oh dear' may convert to the likes of 'Butter love'. And this with grammatically correct and polished English. If the language used is less correct, what do you get? Imagine all the errors typical speech-recognition software makes.
Now, if this is applied in practice:
1. Tape all cell-phone calls of a suspect terrorist or drug-smuggler. Now, these might be in grammatically less-correct heavy-dialect Spanish or Arabic or Sanskrit or whatever...
2. Let your speech-recognition software convert this to text.
3. Translate the text to English with translation software.
4. Arrest the guy and try to convince the jury he deserves a prison sentence.
5. Get (literally) laughed out of court.
Related reading (Score:1)
Gosh, and I thought OASIS... (Score:1)
Didn't echelon... (Score:1)
Re:Yeah, it even translates Volapuk ! (Score:1)
Pedantic, I know, but there you are.
It's been around... (Score:2)
The bottom line of this kind of technology is that although the speech recognition itself is relatively poor it is helped by the fact that most of the interesting words (names of people, places, etc.) occur very often in the same segment. So, it's all statistics. No accurate transcription needs to be made to achieve this kind of result. Therefore, applications such as automatic sub-titling are not possible with such systems. And I think they are still quite far away too.
As for the CIA claiming breakthroughs, well, I think other people can say wittier things about that.
Theo
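A toy illustration of the "it's all statistics" point: even when the recogniser mangles some words, the names that matter tend to recur often enough in a segment that simple counting still surfaces them. The stopword list and the noisy transcript below are invented for the example, and this is only a sketch of the idea, not how any of these systems is actually built.

```python
from collections import Counter

# Small hypothetical stopword list; a real system would use a much larger one.
STOPWORDS = {"the", "a", "of", "in", "to", "and", "is", "on", "for", "that", "said"}


def top_terms(noisy_transcript, n=3):
    """Return the n most frequent non-stopword tokens in a transcript segment."""
    words = [w for w in noisy_transcript.lower().split() if w not in STOPWORDS]
    return [word for word, _ in Counter(words).most_common(n)]


# Made-up recogniser output with errors ("prim minster") mixed in.
segment = (
    "the prime minister of norway said that norway is to hold talks in oslo "
    "the prim minster said oslo talks on fishing and norway wants the talks"
)

print(top_terms(segment, 3))
```

The misrecognised "prim minster" drops out of the ranking on its own, while the repeated place names survive. That is good enough for indexing and topic spotting, and nowhere near good enough for subtitles.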
Re:Misleading story, but looks who is talking (Score:2)
Large vocabulary (but somewhat predictable), speaker trained to overarticulate, no superposition between different speakers, slightly simpler language model (complete phrases, language close to written).
State-of-the-art recognizers have an error rate of ~10% on that test, which until last year was one of the evaluation tests at the speech group at NIST. Check http://www.nist.gov/speech/tests/index.htm [nist.gov] for details.
Since the point of diminishing returns was reached, the test is going to be replaced with a new one, an audio/video recorded meeting transcription. Much, much harder.
OG.
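For anyone wondering what an "error rate of ~10%" means in practice, the usual measure is word error rate: the word-level edit distance (substitutions, insertions, deletions) between the reference transcript and the recogniser output, divided by the number of reference words. A minimal sketch, assuming plain whitespace tokenisation and none of the text normalisation an official scoring tool would apply; the sentences are made up.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] holds the edit distance between the reference prefix seen so far
    # (before the current word) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # reference word deleted
                cur[j - 1] + 1,          # extra word inserted
                prev[j - 1] + (r != h),  # substitution, or free match
            ))
        prev = cur
    return prev[-1] / len(ref)


reference = "the president will visit moscow next week for trade talks"
hypothesis = "the president will visit moscow next weak for trade talks"
print(word_error_rate(reference, hypothesis))  # one wrong word out of ten
```

So ~10% on clean broadcast news is roughly one word in ten wrong, and that is on the easy material described above.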
Re:Uhm, yes. (Score:2)
Re:Not terribly new or surprising (Score:2)
Plus you can also find out what world leaders are thinking by reading the newspapers in a country and listening to the national radio station. I would imagine that something like a TiVo would make this much easier for them.
Human element (Score:2)
From the last sentence in the article:
Another intelligence official, on condition of anonymity, said: "If they have this kind of technology to plumb the depths of open sources, you can imagine what kind of technologies they have to track down spies."
All this technology wasn't good enough to track down Aldrich Ames, Edward Lee Howard, or the FBI's Hanssen, who together are probably the biggest moles in the history of espionage. People forget that tools are useful/automatic, but they aren't intelligent. Someone must be at the controls to interpret and act on the data. This tool sounds great, and there could be potential civilian uses beyond CI, but people must remember it's only a tool.
Cheers!
E
Re:Actually.. (Score:2)
Also without bothering to RTFA, I'll repeat Paradise_Pete's question: Do you know what a neural net is?
You see, as I assume was his point, "computer chip neurons" work differently from central processing units, but not from the "data structure neurons" that can be trivially implemented in a program running on a "regular computer" to simulate the exact same neural net. The fact that they did it in hardware is interesting in its own right to someone interested in neural net research (I'll probably go read it later), and perhaps the speed factor is so great that a software version couldn't run in real time (which I guess could be what you meant) or would require an astoundingly powerful and expensive conventional computer in order to do so, but there is nothing special about "computer chip neurons" that in principle prevents the same thing from being done in software on a "regular computer".
Maybe this truly "doesn't run on regular computers" simply because they haven't implemented such a software-based simulator, but that's very different from implying that it's based on some kind of exotic technology that a Von Neumann machine is fundamentally incapable of duplicating, which is what it sounded like you were claiming and which is probably what Paradise_Pete objected to (and was wrongly punished for).
David Gould
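To make the "data structure neurons" point concrete, here is a minimal sketch of a neural net simulated in ordinary software. The weights are hand-picked, hypothetical values that happen to compute XOR; they have nothing to do with the network in the linked article, and no claim is made that this would run a speech-sized net in real time.

```python
import math
from dataclasses import dataclass


@dataclass
class Neuron:
    """A 'data structure neuron': a list of input weights plus a bias."""
    weights: list
    bias: float

    def fire(self, inputs):
        """Weighted sum of the inputs passed through a sigmoid activation."""
        total = sum(w * x for w, x in zip(self.weights, inputs)) + self.bias
        return 1.0 / (1.0 + math.exp(-total))


def forward(layers, inputs):
    """Feed the inputs through each layer in turn and return the last layer's outputs."""
    for layer in layers:
        inputs = [neuron.fire(inputs) for neuron in layer]
    return inputs


# Hypothetical hand-set weights: hidden layer computes OR and NAND, output ANDs them.
xor_net = [
    [Neuron([20.0, 20.0], -10.0), Neuron([-20.0, -20.0], 30.0)],
    [Neuron([20.0, 20.0], -30.0)],
]

for a in (0, 1):
    for b in (0, 1):
        print(a, b, round(forward(xor_net, [a, b])[0]))
```

Same arithmetic as the chip, just done one multiply at a time on a conventional CPU, which is where the speed question comes in.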
Re:If this has been around for a while (Score:2)
My English teacher would cringe at that run-on.
Uhm, yes. (Score:2)
I mean, like, really, now, dude. Are they going to start scanning soap operas for the sake of national security? Is Jay Leno broadcasting national secrets? Someone clue me in on the intelligence application here.
I suppose it might be handy for transcribing the numbers stations, though somehow I doubt that they'll seem quite so glamorous in ASCII:
--
Open the code? Yeah, right. (Score:2)
Then again, stranger things have happened. But I would bet the proverbial farm that the guts of the software is Classified.
Not about speech recognition (Score:2)
Personally, I think this has been done before to a certain degree, the resources available to the CIA (and their counterparts) are just becoming incredibly huge. Given the increasing amount of traffic that is generated by Internet users, they're probably pretty happy about that.
On the terrorists who are being mentioned all the time in that article: they're probably using encryption technology anyway, so I'm not sure if the really dangerous people will be caught with that system.
Re:Actually.. (Score:2)
And why did the benchmark only involve a few words? Because that's all it can recognize. This thing doesn't do speech recognition, it does sound recognition; IIRC, it can only handle single-syllable words, and only four or five at that, and no sound-alikes. (I think "yes" and "no" were half its vocabulary.) It might be a breakthrough for such a small ANN, but it's not that useful as a natlang system. I suppose something similar could make a good front-end to a more complete system, though.
Re:Uhm, yes. (Score:2)
Re:..Sounds good to me! (Score:2)
What amuses the hell out of me though, is that this kind of works against them if their own theories hold true.
The way I see it, almost nobody else goes to such efforts no matter how paranoid they are, and even if some phone-listening machine was being put to use, all they're doing is ensuring that they will be listened to.
And it's not that I don't think this sort of thing goes on or anything, it's just that I don't bother fighting it anymore now that they're able to read (and control) all of our minds anyway.
"Everything you know is wrong. (And stupid.)"
okay, who will file the FOIA request? (Score:2)
Anyone got a couple of spare lawyers looking for a fun afternoon or twenty?
Oasis and Foreign Broadcast Information Service (Score:2)
I suspect the most common use of this sort of software is to monitor foreign broadcasts - something the CIA/OSS has been doing for more than 50 years. Traditionally, this has been done through a group (mentioned in the article) called the Foreign Broadcast Information Service (FBIS). FBIS monitors newspapers/broadcasts of many, many non-US media sources and makes this information available to US Government agencies.
For many years, FBIS made available to the public a daily paper copy product via the US Dept of Commerce's National Technical Information Service (NTIS [ntis.gov]) that was fedex'ed daily to hundreds of subscribers around the country/world. There were several issues, broken down by regions. For many years, it was one of the best public ways to track what was happening in the Soviet space program.
It's widely known that FBIS/CIA has been developing and using technology to aid the translation process for many years.
A few years ago, they dropped the paper product and moved to an electronic version.
The FBIS server to distribute the information to US Government users can be seen at http://199.221.15.211/ [199.221.15.211] and can be found via a simple Google search on "FBIS".
The public can access this information via NTIS's World News Connection system (http://wnc.fedworld.gov [fedworld.gov]). Yes, there is a charge to use WNC, because NTIS has to pay copyright (gasp!!!!!) to the foreign sources (just because you steal the data stream doesn't mean you own it!) as well as operate the system. It's pretty well known that foreign sources who complain loud enough also get paid by the Govt for the US govt use of the data.
Re:Not terribly new or surprising (Score:2)
It doesn't always have to be speaker-independent. Since it doesn't have to be real-time, all you need to do is identify the speaker, and then start over. If we're really talking about TV and radio sources, then there are going to be a large number of regularly-appearing speakers. Just a SWAG, but I'll bet that under a million people account for 80% of all the TV and radio minutes worldwide.
Re:well... (Score:2)
Observation At Several Interacting Scales
Operational Application of Special Intelligence Systems
Oracle Application Software Implementation Strategy
"My one oasis in the dust and drouth Of city life."--Tennyson
Re:Misleading story, but looks who is talking (Score:2)
I don't think you realize how boring and mundane most intelligence work is. Thousands of extremely junior people sit all day long translating newspapers and transcribing radio/TV broadcasts. Much of this stuff is made available through FBIS (pronounced "fibis") to further bore people slightly higher up the ladder throughout the government and contracting agencies.
However, it is useful once in a while. Especially when looking back and saying "Now how didn't we catch that?" If it could be brought online cheaper and more quickly, I can see how this would be well worth the money - without being particularly draconian (except insofar as the concentration of enough otherwise innocuous information can be quite powerful).
Sometimes, just sometimes, they mean what they say.
When it becomes public... (Score:2)
I personally would LOVE to see a huge searchable, on-line database of everything ever said by anyone that was broadcasted. Imagine the implications. I'd search for all of my local politicians to see if they ever said anything stupid in their previous life as a coked-out-Miami-televangelist. I'd also search for my own name to see if I missed a song dedication or an NPR sponsorship in my name.
I guess a notable drawback is that the CIA could pretty easily scan cell-phone bandwidths as well... documenting any 'notable' private conversations. Perhaps we should all start talking in pig-latin to avoid the CIA's attention, a la Napster?
Spy On Me? (Score:2)
I may be wrong, but doesn't the CIA's charter say that they cannot conduct operations on native soil?
That's funny... (Score:3)
Morning sarcasm. I'll get back to work.
TellMe (Score:3)
Beyond that, the TellMe service should also recognize the command "shut up" along with "stop" and "tell me more". I mean, if you're going to have a voice-activated phone portal, why not use "natural language" for commands? ("Shut the hell up you stupid bitch! I said 'stock quotes' not 'stock racing'!")
For those of you who have no idea what I'm talking about, dial 1-800-555-TELL. The service is free, for now.
Actually.. (Score:3)
and
I don't know about you, but I'm pretty damned impressed.
the article on this system [usc.edu]
Misleading story, but looks who is talking (Score:3)
I don't understand why they specifically mentioned TV and radio. If the audio is digitised before being passed to the software, it doesn't really matter where it comes from. Maybe they are trying to draw attention away from the fact that it can be used on things like making transcripts of phone calls, or normal conversations recorded with various listening devices?
About that feature that IDs the speaker, imagine a conversation that goes like this:
Speaker 1: You the Man.
Man: No, YOU the MAN.
Man: No no, you Da Bomb
Da Bomb: Hehe
Watch word: BOMB Alert! Alert!
As a final side note, I won-der... if... it... works... if... you... talk like... Cap-tain... K-irk... ;-)
====
Not terribly new or surprising (Score:4)
My guess is that it's really fairly poor speaker independent stuff. It probably does a quick, low quality word recognition algorithm - quite a few of those are around - and then some sort of Bayesian network to correct the transcription using lexical context. I know that ARPA was openly funding people doing exactly that a few years ago, and I'll bet their papers are on the web. It doesn't shock me greatly that someone has had some measure of success with it.
If it was 100% accurate transcription, then I wouldn't believe it. But as a time saving device for transcribers... that I find credible.
DARPA also funds a lot of automatic topic spotting research. One of my ex-profs received grants from them under just such a rubric and her papers are publicly available on the web. I'll bet whatever technology they are using, it was developed by a prof at an open university who publishes freely.
As for multilingual text searching and summarisation, the best technology of its kind known to me is Latent Semantic Analysis - the brain child of Thomas Landauer. It's a fairly recent, but hardly secret or obscure, indexing technique that's gaining ground commercially for data mining applications. It can certainly do the small number of things being claimed by this article. All the relevant papers are on the web.
In short, this doesn't sound like super-secret spy stuff. I'll give long odds the real work is in journals and webpages that are publicly available. Having a couple billion dollars to speed up testing and implementation probably helps, but none of this sounds revolutionary or years ahead of the curve.
Listening to public broadcasts (Score:4)
Less well known is their Foreign Broadcast Monitoring Service [oss.net], for which generations of linguists have listened to the hype output of governments worldwide. (FBIS refers to this as "open source" material.)
They've been hoping for years to automate some of this stuff, and apparently they've succeeded. It doesn't require particularly good speech recognition, since the basic goal is to pull out the interesting stuff from the endless drivel.
This sort of info is used to answer questions like "Is country X changing their policy on Y", and "Who is speaking for country X on subject Y?" This is basic political intelligence information.
Re:Uhm, yes. (Score:5)
Actually what it sounds like the CIA is working on is trying to mine data out of public sources. There's good reason to think that you can discover a lot of what governments want to keep hidden if you can just go through enough publicly available data and correlate it. For instance, you can probably get a good idea of a government's secret spending by figuring out how much money they're taking in taxes and borrowing and subtracting out expenditures, provided that you can actually track both of those things. It looks hopeless because there's so much data to go through, but with good computers it should be possible, especially if the other guys have a lot of secret spending. Or you can figure out what the inner circle of the government really thinks by looking at all of the news leaks from highly placed government officials.
This stuff scares the crap out of governments that are both required to be open and interested in hiding things from other countries. You simply can't hide everything, especially not anything big enough to be really interesting, because it has to interface with the world somehow. The CIA obviously wants to get really good at this kind of thing, and monitoring vast quantities of mundane stuff like TV news programs, budgets, and corporate annual reports is part of the process. The best part is that if you can do this effectively, you don't need spies as much, but you do need a lot of drones to go through huge piles of paper and TV to enter the raw data into the computers to process. There's probably some filtering out the interesting stuff from listening in on videoconferences, too, but it's amazing how many paper pushing drones wind up working in a sexy sounding business like spying.