foo.be Diary (Blog)

2010-06-19 Searching Google From The Command Line

Bookseller in Rennes

(shortest path for) Searching Google From the Command Line

Google recently announced their "Google Command Line Tool". It is nice, but it lacks an obvious feature: searching Google… I found various programs that do it, but they always rely on external software or libraries rather than the core Unix tools. So can we do it using only standard Unix tools (besides curl, and even that could be replaced by telnet sending a raw HTTP request if required)?
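The idea of replacing curl with a raw HTTP request can be sketched as follows, assuming bash. Instead of telnet, this swaps in bash's `/dev/tcp/HOST/PORT` redirection, which is a bash feature (it will not work in a plain POSIX sh). The function names `http_request`, `strip_headers` and `raw_get` are hypothetical, and only plain HTTP is handled (no TLS), matching the http:// URL used in this post.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a raw HTTP/1.0 GET request for host $1 and path $2.
# HTTP/1.0 asks the server to close the connection after the response,
# so reading the socket until EOF is enough.
http_request() {
  printf 'GET %s HTTP/1.0\r\nHost: %s\r\n\r\n' "$2" "$1"
}

# Hypothetical helper: drop carriage returns, then delete the status line
# and headers (everything up to and including the first empty line).
strip_headers() {
  tr -d '\r' | sed '1,/^$/d'
}

# Hypothetical curl replacement: bash opens a TCP socket to port 80 on fd 3
# through its /dev/tcp redirection (not a real file on disk).
raw_get() {
  {
    http_request "$1" "$2" >&3
    strip_headers <&3
  } 3<>"/dev/tcp/$1/80"
}
```

For example, `raw_get ajax.googleapis.com '/ajax/services/search/web?v=1.0&start=0&rsz=large&q=foo'` could stand in for the `curl -s` call at the head of the pipeline.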

To search Google through an API, you can use the AJAX Search interface (as the old Google Search API is now defunct). The interface is documented, but its output is JSON. JSON is nice for a browser but awkward to parse on the command line without external tools like jsawk. Still, it is text output, so it can be parsed by the wonderful awk (made in 1977, a good year)… Once the braces are stripped, it is just a list of comma-separated "key":"value" pairs. Then you throw away the key and display the value.

curl -s "http://ajax.googleapis.com/ajax/services/search/web?v=1.0&start=0&rsz=large&q=foo" | sed -e 's/[{}]//g' | awk '{n=split($0,s,","); for (i=1; i<=n; i++) print s[i]}' | grep -E '"(url|titleNoFormatting)"' | sed -e 's/"//g' | cut -d: -f2-

And the output:

http://en.wikipedia.org/wiki/Foobar
Foobar - Wikipedia
http://en.wikipedia.org/wiki/Foo_Camp
Foo Camp - Wikipedia
http://www.foofighters.com/
Foo Fighters Music
...
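To make each step of the one-liner visible, here is a minimal sketch that wraps the same parsing stages in a commented function reading the JSON on stdin, so it can be tried offline on a saved response. The name `gparse` is hypothetical. The grep matches the quoted key names ("url", "titleNoFormatting") rather than any occurrence of those words.

```shell
#!/usr/bin/env bash
# Hypothetical function: the one-liner's parsing stages, reading the AJAX
# search JSON on stdin and printing each result's URL and title.
gparse() {
  sed -e 's/[{}]//g' |                    # 1. drop the JSON braces
    awk '{ n = split($0, s, ","); for (i = 1; i <= n; i++) print s[i] }' |  # 2. one "key":"value" per line
    grep -E '"(url|titleNoFormatting)"' |  # 3. keep only the two quoted keys
    sed -e 's/"//g' |                     # 4. remove the double quotes
    cut -d: -f2-                          # 5. throw away the key (text up to the first colon)
}
```

Usage: `curl -s "http://ajax.googleapis.com/ajax/services/search/web?v=1.0&start=0&rsz=large&q=foo" | gparse`. Because step 2 splits on every comma, a title that itself contains a comma is cut at that comma.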

Now you can wrap the search in a bash function, replacing foo with $1 (a plain alias will not do here, since bash aliases do not take arguments). Do we need more? I don't think so, besides a Leica M9…
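A possible bash function, sketched under a few assumptions: the names `gsearch` and `urlencode` are hypothetical, all arguments are joined into one query so that `gsearch foo bar` works, and the query is percent-encoded in pure bash so spaces or `&` do not break the URL. The encoder is written with ASCII queries in mind.

```shell
#!/usr/bin/env bash
# Hypothetical helper: percent-encode a query string using only bash builtins.
urlencode() {
  local LC_ALL=C          # C locale: ${#s} and ${s:i:1} count bytes
  local s=$1 out= c i
  for (( i = 0; i < ${#s}; i++ )); do
    c=${s:i:1}
    case $c in
      [A-Za-z0-9.~_-]) out+=$c ;;                # unreserved: keep as-is
      *) printf -v c '%%%02X' "'$c"; out+=$c ;;  # anything else: %XX
    esac
  done
  printf '%s\n' "$out"
}

# Hypothetical search function: the one-liner with the query taken from
# the arguments instead of the hard-coded "foo".
gsearch() {
  local q
  q=$(urlencode "$*")
  curl -s "http://ajax.googleapis.com/ajax/services/search/web?v=1.0&start=0&rsz=large&q=$q" |
    sed -e 's/[{}]//g' |
    awk '{ n = split($0, s, ","); for (i = 1; i <= n; i++) print s[i] }' |
    grep -E '"(url|titleNoFormatting)"' |
    sed -e 's/"//g' |
    cut -d: -f2-
}
```

Put both functions in `~/.bashrc` and call, for instance, `gsearch leica m9`.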

Update Saturday 19 June 2010: Philippe Teuwen found an interesting limitation (or a bug, if you prefer) in how Unicode and HTML encoding are handled in titles. Sometimes you get garbage in the title, especially when the ampersand of an HTML entity is itself Unicode-escaped (as \u0026). To solve the issue, Philippe piped the curl output through json_xs and then through recode html. This solves the issue, but my main goal is to avoid external tools. You can strip the garbage violently with an ugly "tr" or "grep [[:alpha:]]". I'm still digging for a "pure core" Unix alternative…
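One direction for a "pure core" alternative, sketched here with sed only: decode the Unicode-escaped ampersand (`\u0026`) back to `&`, then turn a handful of common HTML entities into their characters. The function name `unhtml` is hypothetical, this is not Philippe's json_xs/recode approach, and it covers only the few entities listed, not the full HTML entity table.

```shell
#!/usr/bin/env bash
# Hypothetical filter: decode \u0026 and a few common HTML entities with sed.
# In a sed replacement "&" means "the matched text", so a literal ampersand
# is written as "\&".
unhtml() {
  sed -e 's/\\u0026/\&/g' \
      -e 's/&quot;/"/g' \
      -e "s/&#39;/'/g" \
      -e 's/&lt;/</g' \
      -e 's/&gt;/>/g' \
      -e 's/&amp;/\&/g'   # last, so "&amp;lt;" becomes "&lt;" and stops there
}
```

Appended to the end of the pipeline (`… | cut -d: -f2- | unhtml`), a title such as `Foo\u0026#39;s \u0026amp; Bar` comes out as `Foo's & Bar`.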
