Android and Linux

Saturday, October 30, 2010

Speech recognition on Linux, through cheating

Speech recognition on Linux is hard to come by. I've been looking for a long time and never found anything that is very useful.

Most of the solutions available are nothing more than libraries for developing speech recognition programs, which require you to build your own language models. They're mostly meant to be used inside other software. Many of them look very well done, but I'm not a credit card company looking to set up a call center, so incorporating them is well beyond my ability. After playing around with a few of these, I decided they were a no-go.

I was hoping there would be a solution in the cloud, like an API to Google's online voice recognition engine, but Google doesn't have one and there don't appear to be any others available either.

So I decided to cheat. All I really need is to be able to say a few simple words and have it translated to a text file. I have voice recognition on my phone that does that, so why not use it?

To avoid repeating myself: this, like everything else on my blog lately, uses ssh with keys, Tasker with the Locale Execute Plugin, and the Python script that calls Google's voice recognition API, all of which appear in my last few posts, plus a short script on the phone.

The short script we need is this, which I will call "vhome" for my examples:
#! /system/bin/sh
cat /sdcard/.voice | ssh USER@IP -i /PATH/TO/SSH/KEY 'cat > /PATH/TO/A/FILE && /PC/COMMAND/SCRIPT'
All this script does is place the text from the /sdcard/.voice file into a file on your computer. From my previous posts, it should be clear how Tasker, Python and Google work together to get your words into a file on your phone; this simply gets that file to your computer.
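If you'd rather do the same thing from Python instead of sh, a rough sketch with `subprocess` could look like this. `build_ssh_command` and `send_voice_file` are hypothetical helper names, and `USER@IP`, the key path and the remote command are the same placeholders as in vhome. The sketch puts `-i` before the host, which is the conventional option order.

```python
import subprocess
from typing import Callable, List


def build_ssh_command(user_host: str, key_path: str, remote_cmd: str) -> List[str]:
    """Build the argv for: ssh -i KEY USER@IP 'REMOTE COMMAND' (hypothetical helper)."""
    return ["ssh", "-i", key_path, user_host, remote_cmd]


def send_voice_file(voice_path: str, user_host: str, key_path: str, remote_cmd: str,
                    runner: Callable = subprocess.run) -> List[str]:
    """Pipe the recognized text to a remote command, like `cat FILE | ssh ...`."""
    argv = build_ssh_command(user_host, key_path, remote_cmd)
    with open(voice_path, "rb") as voice:
        # The file becomes ssh's stdin, so the remote `cat > FILE` receives it.
        runner(argv, stdin=voice, check=True)
    return argv
```

Passing `runner` in only exists so the function can be tried without a real ssh connection; on the phone you would leave the default.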

The "&& ..." part is optional; it runs a script on the computer that does something with the text. This example assumes you do use it, because I didn't pre-filter the text on the phone, so the file on the computer will literally contain "computer [commands to be carried out]".

The script on the computer would get rid of the first word and parse the rest for keywords linked to commands to execute. It could be as simple as one if/elif chain that looks for matches and runs commands. New commands can then be added easily by inserting another elif/then branch with the keywords to match and the commands to run.
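Here is a minimal Python sketch of that computer-side script. It swaps the if/elif chain for a dictionary lookup, so a new command is a new table entry; the only entry is the "search" example with `firefox`, and the function names are hypothetical.

```python
import subprocess
from typing import Callable, Dict, List, Optional


def search_command(rest: str) -> List[str]:
    # Same idea as the firefox "$VAR" example: pass the remaining words to Firefox.
    return ["firefox", rest]


# Keyword -> function turning the remaining words into a command line.
# A new command is a new entry here instead of a new elif branch.
COMMANDS: Dict[str, Callable[[str], List[str]]] = {
    "search": search_command,
}


def parse(text: str) -> Optional[List[str]]:
    """Drop the trigger word ("computer") and look up the next word as a keyword."""
    words = text.split()[1:]  # get rid of the first word
    if not words:
        return None
    keyword, rest = words[0].lower(), " ".join(words[1:])
    builder = COMMANDS.get(keyword)
    return builder(rest) if builder else None


def run_from_file(path: str) -> None:
    """Read the text that vhome copied over and start the matching command, if any."""
    with open(path) as f:
        argv = parse(f.read())
    if argv:
        subprocess.Popen(argv)  # start it without waiting for it to exit
```

With this, "computer search more common hades" becomes `firefox "more common hades"`, and anything without a known keyword is ignored.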

Now it's as simple as setting up the Tasker task to trigger the script on a keyword, like "computer", and saying "computer [commands to be carried out]".

An example of using this would be saying "computer search more common hades." The script on the computer would get rid of "computer", use an if/then check to recognize "search" as a keyword, read the rest of the text into a variable and run this command:
firefox "$VAR"
The biggest downside is speed. It takes about 8-9 seconds from the start of the voice API to command execution on the computer. One way to make this more useful may be to set up a task in Tasker that keeps looping until you say a keyword to stop it. A quick example would be:

1- Write File .voice (this is used to zero out the file)
2- Run Script
3- Read Paragraph file: .voice to %VOICE
4- Goto Action 3 if %VOICE matches EOF
5- Stop If %VOICE matches halt
6- Locale Execute Plugin execute vhome If %VOICE matches computer
7- Goto Action 1

This task would show the voice recognition prompt. When you say "computer [commands to carry out]", it will execute vhome then show the voice recognition prompt again, unless you said "halt", in which case it dumps out of the loop and exits. Using this, you can place the phone on your desk and have voice commands at the ready.
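To make the control flow of that loop explicit, here is a small Python model of it; it's only for reasoning about the task, not something Tasker runs. Each string stands for one pass through the voice prompt (steps 1-4). As a simplifying assumption, "halt" has to be the whole phrase and commands have to start with the word "computer".

```python
from typing import Iterable, List


def run_voice_loop(utterances: Iterable[str]) -> List[str]:
    """Model of the looping task: return the phrases that would be sent to vhome."""
    sent = []
    for voice in utterances:           # steps 1-4: one prompt, one recognized phrase
        text = voice.strip().lower()
        if text == "halt":             # step 5: stop the loop
            break
        words = text.split()
        if words and words[0] == "computer":  # step 6: execute vhome
            sent.append(voice)
        # step 7: go back to step 1 and prompt again
    return sent
```

Anything that isn't "halt" or a "computer ..." phrase simply brings the prompt back, which is what keeps the phone listening on the desk.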

So, that's about all. You now have speech recognition on Linux, thanks to cheating.

If you're also interested in speech synthesis on Linux, I suggest grabbing a good TTS engine like Festival and replacing the default diphone voice with a better one. Some of the most popular are the ARCTIC voices from Carnegie Mellon's Language Technologies Institute or the HTS version of those voices from the HTS working group at Nagoya Institute of Technology. The voice I use is "cmu_us_slt_arctic_hts". I haven't tried many TTS engines because Festival seems to do everything I want, especially taking text piped to it from a shell. This makes it pretty easy to set up a script that can be called to make the computer talk on certain events. You can add it to the examples above so your computer says "Yes sir, I'm opening the search results for More Common Hades."
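As a sketch of hooking Festival into the examples above, the function below pipes text to `festival --tts`, which reads the text to speak from standard input (the same as `echo "text" | festival --tts`). `reply_for_search` and its wording are just an illustration of the reply idea, and the `runner` parameter only exists so it can be tried without Festival installed.

```python
import shutil
import subprocess
from typing import Callable


def reply_for_search(terms: str) -> str:
    # Hypothetical reply built from the search terms, title-cased.
    return "Yes sir, I'm opening the search results for {}.".format(terms.title())


def say(text: str, runner: Callable = subprocess.run) -> bool:
    """Pipe text to `festival --tts`; return False if Festival isn't installed."""
    if runner is subprocess.run and shutil.which("festival") is None:
        return False  # stay quiet instead of crashing
    runner(["festival", "--tts"], input=text.encode(), check=True)
    return True
```

The voice itself (cmu_us_slt_arctic_hts or whatever you prefer) is picked up from Festival's own configuration, not from this script.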

Monday, October 25, 2010

Voice command/Tasker errata

More natural commands

Having played around with voice commands a little after my last post, I decided my speech wasn't natural enough. Wait, what? With voice commands and speech synthesis, isn't it usually the computer that doesn't sound natural? Well, in this case, it certainly doesn't feel natural to look at my phone and say "Forecast. Tuesday."

Based on the commands and tasks in my last post, I would say "forecast tuesday" and the word "forecast" would trigger the forecast task and the second word would be used to prepare the forecast for the correct day.

I've gravitated toward this solution for filtering my voice commands, when they need filtering. It keeps only the last word I said ($NF is awk's last field):
awk '{print $NF}' /sdcard/.voice > /sdcard/.voicetmp
Then I changed the task from this:

5- Perform Action WXDAY if %VOICE matches forecast**

to this:

5- Perform Action WXDAY if %VOICE matches **forecast**

No big difference, but where it did depend on the first and second word before, now the first word can be anywhere in the sentence and the second word only needs to be at the end. Now "What's the forecast for friday?" or "Boy, I sure do wish my phone could tell me what the forecast is supposed to be for the day of the 29th, which just happens to be a friday" will both work and sound a lot better than "Forecast. Friday."
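The same logic in a few lines of Python, as a sketch: the trigger word can appear anywhere (like the wildcard pattern around "forecast"), and the day is the last word (like `awk '{print $NF}'`). Stripping trailing punctuation is my own addition; the awk one-liner keeps it.

```python
import string
from typing import Optional


def forecast_day(voice: str) -> Optional[str]:
    """Return the day to look up if "forecast" appears anywhere in the sentence."""
    if "forecast" not in voice.lower():
        return None
    words = voice.split()
    # Unlike the awk one-liner, strip trailing punctuation such as "friday?".
    return words[-1].strip(string.punctuation).lower()
```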

I'm only using the weather task as an example since it's been posted here. This can be used with any command which needs to extract a trigger and a variable from your speech.

I haven't had the need to input multiple variables yet, but I did whip up a couple of scripts to give that flexibility:
#! /system/bin/sh
# No argument: keep only the last word. With an argument N: keep the last N words, one per line.
if test -z "$1"; then
  tr ' ' '\n' < /sdcard/.voice | tail -n1 > /sdcard/.voicetmp
else
  tr ' ' '\n' < /sdcard/.voice | tail -n"$1" > /sdcard/.voicetmp
fi
You can execute this script followed by the number of items you want to extract, or followed by nothing for extracting one item. For example, if you want to extract three items and the script is named "vfilter," you'd put "vfilter 3" in the Locale Execute Plugin then set up your task accordingly.
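For comparison, vfilter's behavior as a Python sketch (the function name just mirrors the script name). It returns a list instead of writing one word per line to /sdcard/.voicetmp; `"\n".join(...)` would give the same file contents.

```python
from typing import List, Optional


def vfilter(voice: str, count: Optional[int] = None) -> List[str]:
    """Return the last `count` words, or just the last word when no count is given."""
    n = 1 if count is None else count
    words = voice.split()
    # Like `tail -n N`, asking for more words than exist returns all of them.
    return words[-n:] if n > 0 else []
```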

I'm not sure what use this is, but I hate being constrained to only being able to use a single word so I hammered this out and put it on my sdcard in case I need it.

Another approach, if you need more flexibility, would be to say the number at the end and run either of these commands:
#! /system/bin/sh
# Option 1: the N words before the trailing number N, all on one line
awk '{ for(i=$NF;(NF-i)<NF;i--) { printf "%s%s",$(NF-i),FS } printf RS }' /sdcard/.voice > /sdcard/.voicetmp
# Option 2: the same words, one per line
awk '{ for(i=$NF;(NF-i)<NF;i--) { printf "%s%s",$(NF-i),FS } printf RS }'  /sdcard/.voice | tr ' ' '\n' > /sdcard/.voicetmp
The only difference is that the first one outputs everything on one line and the second outputs each word on its own line. Both rely on the number being recognized as a digit (for example "3"), since awk treats a word like "three" as 0.

Using this, you can say "Please google more common hades three" and, with "google" as a keyword that opens a search URL, it will input the three words before the number: "more common hades."
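The awk loop reads the last field as a count N and prints the N fields just before it. A Python sketch of the same idea, with a hypothetical function name, makes the digit requirement explicit:

```python
from typing import List


def words_before_count(voice: str) -> List[str]:
    """Return the N words just before a trailing number N, in their original order."""
    words = voice.split()
    if len(words) < 2 or not words[-1].isdigit():
        return []  # awk would read a word such as "three" as 0 and print nothing
    n = int(words[-1])
    # If N is larger than the number of words, this returns everything before it.
    return words[-1 - n:-1] if n > 0 else []
```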

Of course, this is more easily accomplished by using "google" as a variable split point and using VAR2 as the search term, but, who knows, it might come in handy for a flexible task where you need to input a different number of items on the fly without changing a handful of actions in the task.

Temp files

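You may have noticed I used .voice and .voicetmp as temp files. You can bypass temp files by putting the words directly into the system clipboard, editing the Python script to this:
import android
droid = android.Android()
# [1] is the recognized text in the (id, result, error) result
speech = droid.recognizeSpeech()[1]
# Put the text on the clipboard instead of writing it to a file
droid.setClipboard(speech)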
I toyed around with this and decided it was simpler to use files. Other tasks may set or use the clipboard and there may be something important in the clipboard that I don't want to erase by an unrelated task, and it's easier to set up tasks without worrying about backing up the clipboard every time.

Now that the human side sounds better...

Oh yeah, I haven't really mentioned speech synthesis here. The only thing worth mentioning is that the best voices I've heard are made by Svox and are available in the Market. They have many voices and a free app that gives a sample of them all. I personally like the British female voice. When using Tasker, I set the tone to 6 and the speed to 10, and ooh la la, does she sound good.

Do I have any cool voice tasks?

Probably not. I actually have 15 tasks that are controlled by voice, but some are for controlling my home computer over ssh and are probably only interesting to me. Here are a few that may be useful. I won't post the whole task, just the idea. The rest should be easy to figure out.

Saying "pic" takes a photo.

Saying "text" runs the voice script twice more, once to get the SMS recipient, again to get the SMS body, then opens the SMS app and fills out the text.

Saying "map [address]" opens the map to the address I specify.

Saying "search [phrase]" opens google search of the phrase.

I'm trying to buy a house so saying "mls search [number]" opens the browser to[number] to look up homes by their MLS listing.

I have all those in a separate task which the voice task executes. After that, it executes another task which is just for loading apps. I have a dozen set up to open on command, like saying "terminal" opens the terminal, saying "mail" opens gmail, etc. I haven't really used them, but it's another example of using voice control. Other voice apps can open apps by name, but how can they possibly interpret names like "BTEP" or "QuickSSHD?" Using your own voice control, you can open them by nicknames, which is much more powerful.

Monday, October 18, 2010

Speech recognition with Tasker

The Android Scripting Environment, now called SL4A, works with Tasker, and on the Tasker Google group a user named baudi posted an example of using a Python script to call the Google voice search API.

His example uses the camera button to execute the script. Since the Nexus One doesn't have a camera button, I just set up the task and used it with a widget instead and combined both tasks into one. I left the Python alone (probably a good idea since I don't know Python) except to rename it as and rename the output file to /sdcard/.voice because I hate typing long filenames.
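baudi's script isn't reproduced here, but a minimal sketch of the same idea, assuming the SL4A `android` module and its `recognizeSpeech()` call, might look like this. The `droid` object is passed in so the function can be tried off the phone; on the phone you would call `capture_speech(android.Android())` after `import android`.

```python
from typing import Any

VOICE_FILE = "/sdcard/.voice"  # the output file name used throughout these posts


def capture_speech(droid: Any, path: str = VOICE_FILE) -> str:
    """Show Google's voice prompt and save whatever was recognized to `path`."""
    text = droid.recognizeSpeech()[1] or ""  # [1] is the result; fall back to "" if empty
    with open(path, "w") as out:
        out.write(text + "\n")
    return text
```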

Here's a rundown of my Task:

1- Write File .voice (this is used to zero out the file)
2- Run Script
3- Read Paragraph file: .voice to %VOICE
4- Goto Action 3 if %VOICE matches EOF
5- ... whatever you want

When you run this, it executes the Python script, which pops up the Google voice search API. That translates your speech to text, the script puts the text in a file, and Tasker reads the file into a variable.
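Steps 3 and 4 are a polling loop: keep reading the file until something other than EOF comes back. A Python sketch of that wait, with a hypothetical function name, is below; the timeout is my own addition, since the Tasker task loops with no limit.

```python
import os
import time


def wait_for_text(path: str, timeout: float = 30.0, interval: float = 0.5) -> str:
    """Poll `path` until it contains text, then return that text."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.exists(path):
            with open(path) as f:
                text = f.read().strip()
            if text:  # the equivalent of %VOICE no longer matching EOF
                return text
        time.sleep(interval)
    raise TimeoutError("no speech arrived in " + path)
```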

Ho hum, what good is this?

Well for starters, you can carry out tasks if that text matches a predefined string. For example, using my last two posts about sharing the clipboard contents between a Linux computer and the phone, I can speak the words "copy" and "paste." Once Tasker recognizes them, it carries out the tasks to copy the clipboard from the computer to the phone, or send the phone's clipboard to the computer's. This can be done simply by adding this as step 5, where XXX is the task to send or pull the clipboard contents and YYY is the word to trigger it:

5- Perform Action XXX if %VOICE matches YYY

Another example: I took this simple script and changed it quite a bit to get the current weather conditions and forecast, and I use Tasker to download them every hour into a text file called "/sdcard/.metar" (the actual script also grabs METAR weather data from another site). The forecast looks like this:

Saturday...Partly sunny. Highs in the lower 70s.

Saturday Night...Mostly cloudy. Lows in the upper 40s.

Sunday...Partly sunny. Highs in the mid 70s.

Sunday Night...Mostly cloudy. Lows in the lower 50s.


Now, make this script called "wxday"
#! /system/bin/sh
# Look up the second spoken word (the day) in the forecast file
grep -i "$(awk '{print $2}' /sdcard/.voice)" /sdcard/.metar > /sdcard/.wxday
In the voice task above, step 5 can be:
5- Perform Action WXDAY if %VOICE matches forecast**

Here is the WXDAY task:

1- Locale Execute Plugin: execute @! wxday
2- Read Paragraph file: .wxday to var %WXDAY
3- Goto 2 if %WXDAY matches EOF
4- Say %WXDAY

Now I can tap the voice widget and say "forecast Thursday," and it will see that I said "forecast" and call the WXDAY action. That action extracts Thursday from my weather file and speaks the forecast for Thursday.
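What wxday does with grep can be sketched in Python using the sample forecast from this post. `forecast_for` is a hypothetical name. One deliberate difference: if no day was spoken, this returns nothing, whereas `grep -i ""` would match every line.

```python
from typing import List

SAMPLE = """\
Saturday...Partly sunny. Highs in the lower 70s.
Saturday Night...Mostly cloudy. Lows in the upper 40s.
Sunday...Partly sunny. Highs in the mid 70s.
Sunday Night...Mostly cloudy. Lows in the lower 50s.
"""


def forecast_for(voice: str, forecast: str) -> List[str]:
    """Return the forecast lines containing the second spoken word, ignoring case."""
    words = voice.split()
    if len(words) < 2:
        return []
    day = words[1].lower()
    return [line for line in forecast.splitlines() if day in line.lower()]
```

Like the grep version, asking for Sunday returns both the day and the night lines, and Tasker's Say action reads them both.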

Is Android/Tasker cool or what?

Sunday, October 3, 2010

Android to Linux clipboard, again

(Note: I did a more complete write-up on the XDA forum which may be easier to follow than this.)

Using Tasker, my previous method can be improved and, unlike pasting from the PC to phone, this one works perfectly because it's not limited by Tasker's inability to read an entire file.

Using the link above, get the computer ready with the "n1_to_comp" script, then make a little script on the phone:
#! /system/bin/sh
cat /sdcard/tohomeclip | ssh USER@IP -i /PATH/TO/SSHKEY 'n1_to_comp'
In this example, I'll name that script "2homeclip."

Here is the Task to set up with Tasker:

1- Write File > tohomeclip > text %CLIP
2- Using the Locale Execute Plugin, execute @! 2homeclip

That's it!

How are this post and the last one an improvement? If it's not clear, it's my fault for having a messy blog, but if you click the link above then click the link at the top of that post, you'll see that copying and pasting in the terminal previously needed an intermediary app that could save the clipboard in a plain text file. And doing it in a terminal wasn't my goal. I wanted to share clipboards system-wide. Since Tasker can interact with the clipboard and run scripts, we can do away with that other app.

But isn't it still using an extra app? Yes and no. Tasker is required, but it can do everything I wanted. Previously, it took running the script and opening the other app to interact with the clipboard (or vice versa). With these two copy/paste tasks set up as two widgets (or one widget that pops up both and lets you pick), copying and pasting between the computer and phone only requires a single click.