Sunday, January 12, 2014

BYG (Bing, Yahoo, Google) Search Wrapper

One small part of my Aria project will be interfacing with the current search engines. To do this I need a module that gives me a consistent interface to the three main providers: Bing, Yahoo! and Google (and any future ones I may want to add). This is a basic example of that module.

The first thing required is to set up accounts, projects and the like with the relevant providers.
I won't describe this process, as each of them documents it pretty well.

Bing Developer Center
Yahoo Developer Network
Google Developers Console


A couple tips for the above sites.

  • Bing: Set up both the web and synonym searches.
  • Yahoo: In the BOSS console, under manage account, put in a daily limit $ amount (or turn off the limit), as they only allow 1 free query a day, so otherwise only the first request works.
  • Google: It doesn't seem that you can set it up to search the whole web, but after creating your custom search engine you can select "Search the entire web but emphasize included sites", so don't worry about that.

All these providers allow for many options while searching (e.g. images, location, news, video etc.); however, in this initial example I have limited it to a plain web search.

All the code will be available in my blog Github repository.

Going through the main points:
There is a BasicWebSearch interface that takes the search term and returns SearchResults.
SearchResults contains results in a map keyed on a result type enum.
The implementations of BasicWebSearch (BingSearch, GoogleSearch and YahooSearch) call the relevant search engine with the search term and then convert the results into a SearchResult. In the case of Yahoo and Bing, I map the JSON result to the SearchResult myself. Google, however, does that in its search client, which is included in the dependencies.
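
To make the shape of these types concrete, here is a minimal sketch, not the actual code from the repository. The ResultType values, the SearchResult fields and the StubSearch class are all assumptions made up for illustration; only the names BasicWebSearch, SearchResults and SearchResult come from the description above. It assumes Java 8 or later.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical result categories; the real enum values are not shown in this post.
enum ResultType { WEB, IMAGE, NEWS, VIDEO }

// One hit from a search engine. The three fields are assumed for illustration.
class SearchResult {
    final String title;
    final String url;
    final String description;

    SearchResult(String title, String url, String description) {
        this.title = title;
        this.url = url;
        this.description = description;
    }
}

// Results grouped in a map keyed on the result type enum.
class SearchResults {
    private final Map<ResultType, List<SearchResult>> results = new EnumMap<>(ResultType.class);

    void add(ResultType type, SearchResult result) {
        results.computeIfAbsent(type, k -> new ArrayList<>()).add(result);
    }

    // Returns a read-only view; an empty list if nothing was stored for the type.
    List<SearchResult> get(ResultType type) {
        List<SearchResult> empty = Collections.emptyList();
        return Collections.unmodifiableList(results.getOrDefault(type, empty));
    }
}

// The common interface each provider would implement.
interface BasicWebSearch {
    SearchResults search(String searchTerm) throws Exception;
}

// Stand-in provider (hypothetical) so the sketch runs without any API keys.
class StubSearch implements BasicWebSearch {
    @Override
    public SearchResults search(String searchTerm) {
        SearchResults found = new SearchResults();
        found.add(ResultType.WEB,
                new SearchResult("Result for " + searchTerm, "http://example.com/", "stub"));
        return found;
    }
}

class BasicWebSearchDemo {
    public static void main(String[] args) {
        SearchResults found = new StubSearch().search("aria");
        for (SearchResult hit : found.get(ResultType.WEB)) {
            System.out.println(hit.title + " -> " + hit.url);
        }
    }
}
```

The point of the interface is that calling code only ever sees BasicWebSearch and SearchResults, so adding another provider later should not touch the callers.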

Now for the main code bits:

SearchSettings
As this is just an example, I have included the search settings in the following class. Be sure to replace them with your own values.

UrlConnectionHandler
As both Bing and Yahoo use an HttpURLConnection, I figured I would centralise the handling of that. The only difference between the two is that Bing uses basic authentication, while for Yahoo I went with the OAuth implementation.
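
As a rough illustration of the basic authentication half, the sketch below builds an Authorization header and attaches it to an HttpURLConnection. It is not the UrlConnectionHandler from the repository: the class name BasicAuthSketch, its method names and the timeout values are assumptions, and java.util.Base64 requires Java 8 or later. The OAuth signing used for Yahoo is not covered here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, not the original UrlConnectionHandler.
class BasicAuthSketch {

    // Builds the value of the Authorization header for HTTP basic authentication.
    static String basicAuthHeader(String user, String password) {
        String credentials = user + ":" + password;
        byte[] raw = credentials.getBytes(StandardCharsets.UTF_8);
        return "Basic " + Base64.getEncoder().encodeToString(raw);
    }

    // Prepares a GET request carrying the header; nothing is sent until the response is read.
    static HttpURLConnection open(String url, String authHeader) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("Authorization", authHeader);
        conn.setConnectTimeout(10_000); // assumed timeouts, in milliseconds
        conn.setReadTimeout(10_000);
        return conn;
    }

    // Reads the whole response body as UTF-8 text and closes the connection.
    static String readBody(HttpURLConnection conn) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            StringBuilder body = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                body.append(line).append('\n');
            }
            return body.toString();
        } finally {
            conn.disconnect();
        }
    }
}
```

A caller would chain these as `readBody(open(url, basicAuthHeader(user, key)))`, with the URL and credentials coming from the search settings.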

BingSearch

BingResultParser

YahooSearch

YahooResultParser

GoogleSearch

GoogleSearchResult
Google returns a whole bunch of extra information, so I extended the base SearchResult to add all of it, just in case I ever need it.
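
The extension idea can be sketched as below. This is only an assumption about the shape, not the real GoogleSearchResult: the base class fields and the generic extras map are invented for illustration, and the base class is repeated here in minimal form so the block compiles on its own.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal base type (fields are assumed for illustration).
class SearchResult {
    final String title;
    final String url;
    final String description;

    SearchResult(String title, String url, String description) {
        this.title = title;
        this.url = url;
        this.description = description;
    }
}

// Subclass that carries provider-specific extras alongside the common fields.
class GoogleSearchResult extends SearchResult {
    private final Map<String, Object> extras;

    GoogleSearchResult(String title, String url, String description,
                       Map<String, Object> extras) {
        super(title, url, description);
        // Defensive copy so later changes to the caller's map do not leak in.
        Map<String, Object> copy = new LinkedHashMap<>(extras);
        this.extras = Collections.unmodifiableMap(copy);
    }

    // Read-only view of whatever extra data was stored.
    Map<String, Object> getExtras() {
        return extras;
    }
}
```

Code that only knows about SearchResult keeps working with a GoogleSearchResult, while code that cares about the extra data can check the type and read it.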

Maven Dependencies


4 comments:

  1. Very interesting. Thanks for the post.

    So, only Google returns the extra meta (pagemap) data? That is unfortunate, as their service is ridiculously expensive, while the others are more reasonable.

  2. It is the best post on making a customized own search engine and I think that it will be worth to try all of them. Thanks for sharing the good and worthy post.

  3. It's cool that you report such things.

