Looks like the Roomba has beaten Google in real-world indexing.
When it comes to intelligently indexing the entire web, Google's Googlebot is simply the best. If you want to index all the files and text on your desktop, then I highly recommend Google Desktop. However, if what you really need is a robot that can index the real, physical world around you, then you can count Google out.
This is certainly now a job for a robotic vacuum cleaner - the Roomba.
... well, actually a modified one.
A guy who goes by the name of Wael Chatela has hacked his Roomba, converting it into a mobile webcam; apparently for spying on his wife. Reminds me of the Rovio, except that the modified Roomba is definitely more inconspicuous. The unsuspecting subject won't know the Roomba's actually doing more than just cleaning the room.
When his Roomba is off duty from espionage work, it can be found going around, busy capturing images, converting the text in those images to machine-editable text, and indexing it. Then, when Chatela wants to find something, he simply enters the word in the browser-based user interface. The user interface is made up of just a search box and an "I'm Feeling Lucky" button, just like Google.

If the text has been successfully indexed, then the Roomba can find it. This might actually come in handy in libraries, although the shelves might have to be quite low for the Roomba's camera to focus and capture images.
I tried out the web-based interface and noticed that it could only find the words "Robot" and "of". You can try it out yourself on the GåågleBot search page.
Indexing is a task performed by search engines to make searching faster. It serves practically the same purpose as the index section at the end of a book. Need to find a word in a particular book? You can search for it manually, page by page, or you can go straight to the index. There you'll find the page(s) containing the word you're looking for, assuming of course that it has been included in the index (i.e., it has been 'indexed').
What a search engine adds is the benefit of a search box. With a search box, you don't have to go through the entire list of words that have been indexed. You simply type in the word, and the search engine takes you to the page or pages that contain it.
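To make the book analogy concrete, here is a minimal Python sketch of an index with a "search box" on top. It is only an illustration of the general idea, not the GåågleBot's actual code; the function names `build_index` and `search` and the sample pages are made up for this example.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each lowercased word to the sorted page numbers that contain it."""
    index = defaultdict(set)
    for page_number, text in pages.items():
        # Split the page into words, ignoring case and punctuation.
        for word in re.findall(r"[a-z0-9']+", text.lower()):
            index[word].add(page_number)
    return {word: sorted(numbers) for word, numbers in index.items()}


def search(index, word):
    """The 'search box': look the word up instead of scanning every page."""
    return index.get(word.lower(), [])


# Hypothetical sample "book" with three pages.
pages = {
    1: "The robot cleans the room",
    2: "A robot of many talents",
    3: "The index at the end of a book",
}
index = build_index(pages)

print(search(index, "Robot"))   # [1, 2]
print(search(index, "of"))      # [2, 3]
print(search(index, "vacuum"))  # [] (never indexed, so never found)
```

The slow part, reading every page, happens once when the index is built. After that, each search is a single dictionary lookup.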
This is what the GåågleBot does. But while Google and other search engines simply index words and phrases from text, GåågleBot has to capture images of surfaces that have text on them, extract the text, and then index it.
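The capture, extract, and index steps can be sketched roughly as follows. This is a hedged illustration rather than Chatela's implementation: the OCR step is replaced by a stand-in function (`fake_ocr`) that returns canned text, and the capture names and image bytes are invented for the example. The sketch also assumes that some captures may yield no text at all, in which case they are simply skipped.

```python
from collections import defaultdict


def index_captures(captures, ocr):
    """Run `ocr` on each captured image and map recognized words to capture ids."""
    index = defaultdict(list)
    skipped = []
    for capture_id, image in captures:
        words = ocr(image).lower().split()
        if not words:
            # Nothing was recognized in this image, so there is nothing to index.
            skipped.append(capture_id)
            continue
        for word in sorted(set(words)):
            index[word].append(capture_id)
    return dict(index), skipped


# Stand-in for a real OCR engine (hypothetical): canned text per fake image.
FAKE_OCR_RESULTS = {
    b"img-a": "Robot of the year",
    b"img-b": "",  # pretend this capture was unreadable
    b"img-c": "Book of robots",
}


def fake_ocr(image):
    return FAKE_OCR_RESULTS.get(image, "")


captures = [("shelf-1", b"img-a"), ("shelf-2", b"img-b"), ("floor-3", b"img-c")]
index, skipped = index_captures(captures, fake_ocr)

print(index["of"])  # ['shelf-1', 'floor-3']
print(skipped)      # ['shelf-2']
```

Passing the OCR step in as a function keeps the indexing logic independent of whichever OCR program ends up doing the real work.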
If you think this is an easy task, think again. Capturing images is the easy part since all the GåågleBot needs for this is its camera. What is actually difficult is the task of extracting the text from the captured image.
Let me demonstrate.
You can start by highlighting some text from this article. Now, copy and paste it into your word processor (e.g. MS Word for Windows users). You'll notice that the pasted text is even editable. Now, try to do the same thing with the text in the image below. Not possible, is it? You can't even highlight the text found in the image.
To extract letters and words from an image, you'll need an OCR (Optical Character Recognition) program. OCR programs ideally translate typewritten, printed, or, in some extremely advanced programs, handwritten text into machine-editable text.
For Latin-script typewritten text, the translation accuracy is now quite high, as long as the image is very clear and upright.
In the case of the GåågleBot, the inventor reported encountering problems due to blurry images, background issues, and skew/rotation issues. As a result, there were instances when no text was translated.
Wael Chatela was generous enough to share his source code. Apparently, it's written in Java. However, he mentioned how he regretted not using C instead. The main problem lay in setting up Java rather than writing the code itself.

The main components of the GåågleBot include a Gumstix, a C328 digital camera, a Roomba, his homebrewed circuit which lets all three communicate with one another, and, of course, the OCR software module.
Gumstix is a company with products ranging from computer-on-modules and expansion boards to packs and accessories. Its first product was a single-board computer the size of a chewing gum stick; hence the name. Chatela's unit is equipped with 802.11b Wi-Fi, allowing him to communicate with it remotely.
The C328 is a line of video/JPEG compressed still camera modules. Most of the lenses are mounted on standard 14x14mm lens holders. Chatela drilled a 1 cm hole in the front bumper of the Roomba, and fastened the camera from the inside.
If you want more information about GåågleBot, you can visit www.gaaglebot.com. There you'll find links to the downloadable source code, the OCR software used, the components, and other relevant info.
Comments
Stalker...
Submitted by bhylak on Wed, 03/10/2010 - 6:09pm.