This actually took me longer than I'd like to admit to get working, but in the end the solution is quite neat and simple, so it was probably worth it. Hopefully this saves other people some time.
The Amazon Dockerfile looks like this:
AWS Elastic Beanstalk Dockerfile - Github
This installs the contents of the root folder's requirements.txt before running your Dockerfile, so for my application the basic "non-sci" packages could be installed simply enough.
Root Folder: requirements.txt:
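The original file isn't shown here, so purely as an illustration, a root requirements.txt for the "non-sci" packages might look something like the following (the package names are placeholders, not the actual list used):

flask
requests
boto
gunicorn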
Then, to install the sci-related packages (numpy, scipy, pandas, scikit-learn and nltk), I created another requirements.txt in an aws-post-install folder. This is run once the Amazon Linux OS has been updated and all the required OS dependencies have been installed.
Post Docker requirements.txt:
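Based on the packages named above, the post-install requirements.txt contains something like this (pinned versions omitted; add them as needed):

numpy
scipy
pandas
scikit-learn
nltk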
My custom Dockerfile, which builds on top of the Amazon image, looked as follows:
Docker File:
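The original Dockerfile isn't included in this post, so the following is only a rough sketch of the approach described above; the base image name, the yum packages and the pip invocation are all assumptions, not the file that was actually used:

# Sketch only: base image and OS package names are assumptions.
FROM amazonlinux:latest

# Update the OS and install the build dependencies the scientific stack needs.
RUN yum update -y && \
    yum install -y gcc gcc-c++ gcc-gfortran python-devel python-pip \
    atlas-devel blas-devel lapack-devel

# Install the sci-related packages once the OS dependencies are in place.
COPY aws-post-install/requirements.txt /tmp/aws-post-install/requirements.txt
RUN pip install -r /tmp/aws-post-install/requirements.txt

EXPOSE 80
CMD ["python", "application.py"]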
The next step is to get my Docker image used directly, so that the Elastic Beanstalk app doesn't have to do all the downloads and installs every time; this should be simple enough according to the AWS YouTube channel:
https://www.youtube.com/watch?v=pLw6MLqwmew
Tuesday, December 23, 2014
Tuesday, August 26, 2014
Why Jython when you can microservice with Flask
Over the last little while I have been working on Sibbly, my little pet project to try to summarize, group, filter and target software development information on the web. All in all a rather ambitious task, but the worst thing that could happen is that I learn something, so there is really no risk.
It is still currently in a very closed beta, only occasionally showing it to fellow work colleagues and getting some input.
After initially starting development for Sibbly on Ubuntu (I was always planning on deploying on Ubuntu), I migrated back to Windows, and after a couple of weeks of work, when finally deploying to Ubuntu... surprise! It obviously didn't work right off the bat.
The issue I ended up with seems to be a classpath conflict between Spring Boot, its embedded Tomcat instance and Jython. The reason I use Jython is an awesome library called Pygments.
So after much dismay, and after checking all the Java alternatives and attempted Pygments ports (jygments, jgments), I started thinking of alternative solutions.
Having recently read Microservices, I decided to look at a way of interacting with Python more indirectly.
This led me to: Flask
Within a couple of minutes, thanks to: Awesome Flask Example
I had the following up and running:
from flask import Flask, jsonify
from flask import abort
from flask import make_response
from flask import request
from pygments import highlight
from pygments.formatters.html import HtmlFormatter
from pygments.lexers import get_lexer_by_name
from pygments.lexers import guess_lexer

app = Flask(__name__)


@app.route('/pygmentCode', methods=['POST'])
def pygment_code():
    if not request.json:
        abort(400)
    lexer = get_lexer_by_name(request.json['lexer'], stripall=True)
    formatter = HtmlFormatter(linenos=False)
    result = highlight(request.json['code'], lexer, formatter)
    return jsonify({'result': result}), 201


@app.route('/guessCode', methods=['POST'])
def guess_code():
    if not request.json:
        abort(400)
    lexer = guess_lexer(request.json['code'])
    print(lexer.name)
    return jsonify({'result': lexer.name}), 201


@app.errorhandler(404)
def not_found(error):
    return make_response(jsonify({'error': 'Not found'}), 404)


if __name__ == '__main__':
    app.run(debug=True)
What this little bit of Python does is wrap and expose the highlight and guess functionality from Pygments via a RESTful service that accepts and produces JSON.
I deploy Sibbly on DigitalOcean
To install Python on my droplet, I followed the process below:
sudo apt-get install python-dev build-essential
sudo apt-get install zlib1g-dev
sudo apt-get install libssl-dev openssl
sudo apt-get install python-pip
sudo pip install virtualenv
sudo pip install virtualenvwrapper
export WORKON_HOME="$HOME/.virtualenvs"
source /usr/local/bin/virtualenvwrapper.sh
sudo mkdir /opt/python3.4.1
wget http://python.org/ftp/python/3.4.1/Python-3.4.1.tgz
tar xvfz Python-3.4.1.tgz
cd Python-3.4.1
./configure --prefix=/opt/python3.4.1
make
sudo make install
mkvirtualenv --python /opt/python3.4.1/bin/python3 py-3.4.1
workon py-3.4.1
pip install flask
pip install pygments

Once that was done, to run the Flask app:
python app.py & disown
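Once the service is running, calling it is straightforward. To give an idea, a minimal sketch of a client call (the localhost URL, port and payload are assumptions for a local run, not Sibbly's actual setup):

import json

import requests

# Assumes the Flask app above is running locally on its default port 5000.
payload = {"lexer": "java", "code": "public class Hello {}"}
response = requests.post("http://localhost:5000/pygmentCode",
                         data=json.dumps(payload),
                         headers={"Content-Type": "application/json"})
print(response.json()["result"])  # the Pygments-generated HTML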
Sunday, August 10, 2014
Upgrading Spring 3.x and Hibernate 3.x to Spring Platform 1.0.1 (Spring + hibernate 4.x)
I recently volunteered to upgrade our newest project to the latest version of Spring Platform. What Spring Platform gives you is dependency and plugin management across the whole Spring framework's set of libraries.
Since we had fallen behind a little, the upgrade did raise some funnies. Here are the things I ran into:
Maven:
Our pom files were still referencing:
hibernate.jar
ehcache.jar
These artefacts don't exist in the latest versions, so I replaced them with:
hibernate-core.jar and ehcache-core.jar
We also still use the Hibernate Tools + Maven run plugin to reverse engineer our DB objects.
This I needed to update to a release candidate:
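The actual snippet isn't shown in the post; as a sketch, the plugin's hibernate-tools dependency ended up looking something along these lines (the version below is a placeholder for whichever release candidate was current, not the exact one used):

<dependency>
    <groupId>org.hibernate</groupId>
    <artifactId>hibernate-tools</artifactId>
    <!-- placeholder: use the current release candidate version -->
    <version>4.3.1.CR1</version>
</dependency>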
Hibernate:
The code "Hibernate.createBlob"... no longer exists, replaced with:
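The replacement code isn't shown in the post; a minimal sketch of the usual Hibernate 4 equivalent, going through the session's LobHelper:

import java.sql.Blob;

import org.hibernate.Session;

public class BlobUtil {

    /**
     * Hibernate 3.x allowed Hibernate.createBlob(bytes); in Hibernate 4.x the
     * equivalent is created through the current session's LobHelper.
     */
    public static Blob toBlob(final Session session, final byte[] bytes) {
        return session.getLobHelper().createBlob(bytes);
    }
}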
On the HibernateTemplate the return types are now List, not the element type, so I needed to add casts for the lists being returned.
import org.hibernate.classic.Session;
replaced with:
import org.hibernate.Session;
Reverse engineering works a little differently now: it assigns Long to numeric columns... replaced with:
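The mapping snippet itself isn't included; as an illustration, a type-mapping entry in hibernate.reveng.xml overrides what the reverse engineering assigns to numeric columns (the jdbc-type, precision, scale and target type below are assumptions, not the project's actual values):

<type-mapping>
    <!-- illustration: force NUMERIC columns to a specific Java type -->
    <sql-type jdbc-type="NUMERIC" precision="19" scale="2" hibernate-type="java.math.BigDecimal"/>
</type-mapping>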
Added:
Possible Errors:
- Caused by: org.hibernate.service.UnknownUnwrapTypeException: Cannot unwrap to requested type [javax.sql.DataSource]
And configure the settings in the cfg.xml for it:
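The actual settings aren't shown; as a sketch, the usual fix is to pull in the hibernate-c3p0 module and point Hibernate at the c3p0 connection provider in the configuration (the provider class name differs slightly between 4.x versions, so treat these values as assumptions):

<!-- illustration only: c3p0 provider and pool settings in the Hibernate configuration -->
<property name="hibernate.connection.provider_class">org.hibernate.service.jdbc.connections.internal.C3P0ConnectionProvider</property>
<property name="hibernate.c3p0.min_size">5</property>
<property name="hibernate.c3p0.max_size">20</property>
<property name="hibernate.c3p0.timeout">300</property>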
- Caused by: java.lang.ClassNotFoundException: org.hibernate.engine.FilterDefinition
Probably still using a reference to a hibernate3 factory / bean somewhere; change it to hibernate4:
org.springframework.orm.hibernate3.LocalSessionFactoryBean becomes org.springframework.orm.hibernate4.LocalSessionFactoryBean
org.springframework.orm.hibernate3.HibernateTransactionManager becomes org.springframework.orm.hibernate4.HibernateTransactionManager
- Caused by: java.lang.ClassNotFoundException: Could not load requested class : org.hibernate.hql.classic.ClassicQueryTranslatorFactory
There is a minor change in the new API, so this can be resolved by replacing the property value with:
org.hibernate.hql.internal.classic.ClassicQueryTranslatorFactory.
Spring:
Amazingly some of our application context files still referenced the Spring DTD ... replaced with XSD
In Spring configs added for c3p0:
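The bean definition isn't shown in the post; a typical c3p0 DataSource bean looks something like this (the id and property placeholders are assumptions):

<bean id="dataSource" class="com.mchange.v2.c3p0.ComboPooledDataSource" destroy-method="close">
    <property name="driverClass" value="${jdbc.driverClassName}"/>
    <property name="jdbcUrl" value="${jdbc.url}"/>
    <property name="user" value="${jdbc.username}"/>
    <property name="password" value="${jdbc.password}"/>
</bean>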
Spring removed "local"=, so I needed to just change that to "ref"=.
Spring's HibernateDaoSupport no longer has "releaseSession(session);", which is a good thing, as it forced us to update the code to work within a transaction.
Possible Errors:
- getFlushMode is not valid without active transaction; nested exception is org.hibernate.HibernateException: getFlushMode is not valid without active transaction
Removed from hibernate properties:
<prop key="hibernate.current_session_context_class">thread</prop>
Supply a custom strategy for the scoping of the "current" Session. See Section 2.5, "Contextual sessions" for more information about the built-in strategies.
- org.springframework.dao.InvalidDataAccessApiUsageException: Write operations are not allowed in read-only mode (FlushMode.MANUAL): Turn your Session into FlushMode.COMMIT/AUTO or remove 'readOnly' marker from transaction definition.
Another option is:
<bean id ="productHibernateTemplate" class="org.springframework.orm.hibernate4.HibernateTemplate">
<property name="sessionFactory" ref="productSessionFactory"/>
<property name="checkWriteOperations" value="false"/>
</bean>
- java.lang.NoClassDefFoundError: javax/servlet/SessionCookieConfig
Servlet version update:
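The dependency change isn't shown; SessionCookieConfig only exists from the Servlet 3.0 API onwards, so the upgrade boils down to something like this (the exact artifact version and scope are assumptions):

<dependency>
    <groupId>javax.servlet</groupId>
    <artifactId>javax.servlet-api</artifactId>
    <version>3.1.0</version>
    <scope>provided</scope>
</dependency>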
- Then, deploying on WebLogic: javassist: $$_javassist_ cannot be cast to javassist.util.proxy.Proxy
The issue here was that there were different versions of javassist being brought into the ear. I removed all references from all our poms, so that the correct version gets pulled in from Spring/Hibernate,
and then configured WebLogic to prefer our version:
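The descriptor itself isn't included in the post; the general idea in weblogic-application.xml is to prefer the application's own javassist packages, along these lines (surrounding elements and namespace declarations omitted; treat it as a sketch):

<wls:prefer-application-packages>
    <wls:package-name>javassist.*</wls:package-name>
</wls:prefer-application-packages>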
Saturday, July 19, 2014
TDD, Hamcrest, Shazamcrest
Recently we have started trying to get a more TDD culture going at work; having always believed in thorough testing and decent code coverage, it shouldn't have been too hard. However... teaching an old dog new tricks can sometimes require quite a bit of patience. It turns out that breaking coding habits formed over more than a decade of keyboard bashing is harder than it seems.
So with generating an enormous amount of test code comes the usual task of code and test maintenance and reuse.
One of the tools / libraries we have included is Hamcrest, which not only improves the readability of assertion failures, but allows you to create and extend custom matchers, which you can then reuse across multiple test scenarios.
I am not going to go into too much detail on Hamcrest here; there are a bunch of great resources / blogs / tutorials out there. Just a few:
http://www.baeldung.com/hamcrest-collections-arrays
https://weblogs.java.net/blog/johnsmart/archive/2011/12/12/some-useful-new-hamcrest-matchers-collections
http://edgibbs.com/junit-4-with-hamcrest/
http://www.planetgeek.ch/2012/03/07/create-your-own-matcher/
While creating a custom type-safe matcher for one of our domain objects, I realised that it was insane... really... this.getA == that.getA... mmmm, no.
So I went searching for something that could help, and after a bit I found Shazamcrest (bonus points for the name).
What Shazamcrest does is:
Serialize the objects to compare.
Compare them and, on failure, throw a ComparisonFailure, which the major IDEs can display using their built-in diff viewers.
Great... no manual bean compares.
So I add the maven dependency, try it out on our complex domain object....
StackOverflowError... It was a known limitation at the time: the JSON provider Shazamcrest was using, GSON, does not cater for circular reference serialization.
As both Shazamcrest and GSON are open source, I decided to have a look and see if I could contribute; anything is better than writing a manual bean matcher. After some investigation I found that the guys on the GSON project had created a fix, GraphAdapterBuilder, it is just not distributed with the actual library.
So after forking the Shazamcrest GitHub project, a little bit of code and submitting a pull request:
The guys on the Shazamcrest project very quickly merged my changes in and published a new version to the maven repo (Thanks for that).
So be sure to use the 0.8 version if you are struggling with circular references.
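For reference, a minimal sketch of what an assertion with Shazamcrest looks like (the Customer class here is made up purely for illustration):

import static com.shazam.shazamcrest.MatcherAssert.assertThat;
import static com.shazam.shazamcrest.matcher.Matchers.sameBeanAs;

import org.junit.Test;

public class ShazamcrestExampleTest {

    static class Customer {
        String name;
        Customer referredBy; // circular references are handled from 0.8 onwards

        Customer(final String name) {
            this.name = name;
        }
    }

    @Test
    public void shouldMatchTheWholeObjectGraph() {
        final Customer expected = new Customer("Brian");
        final Customer actual = new Customer("Brian");

        // Serialises both objects and, on a mismatch, throws a ComparisonFailure
        // that the major IDEs render in their diff viewer.
        assertThat(actual, sameBeanAs(expected));
    }
}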
Monday, May 26, 2014
Playing with Java 8 - Lambdas, Paths and Files
I needed to read a whole bunch of files recently, and instead of just grabbing my old FileUtils.java, which I (and probably most developers) have copied from project to project, I decided to have a quick look at how else to do it...
Yes, I know there are Commons IO and Google IO; why would I even bother? They probably do it better, but I wanted to check out the NIO JDK classes and play with lambdas as well... and to be honest, I think this actually ended up being a very neat bit of code.
So I had a specific use case:
I wanted to read all the source files from a whole directory tree, line by line.
What this code does: it uses Files.walk to recursively get all the paths from the starting point and creates a stream, which I then filter to only the files that end with the required extension. For each of those files, I use Files.lines to create a stream of Strings, one per line. I trim each line, filter out the empty ones and add them to the returned collection.
All very concise thanks to the new constructs.
package net.briandupreez.blog.java8.io;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.FileVisitOption;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

/**
 * RecursiveFileLineReader
 * Created by Brian on 2014-05-26.
 */
public class RecursiveFileLineReader {

    private transient static final Log LOG = LogFactory.getLog(RecursiveFileLineReader.class);

    /**
     * Get all the non empty lines from all the files with the specific extension, recursively.
     *
     * @param path      the path to start recursion
     * @param extension the file extension
     * @return list of lines
     */
    public static List<String> readAllLineFromAllFilesRecursively(final String path, final String extension) {
        final List<String> lines = new ArrayList<>();
        try (final Stream<Path> pathStream = Files.walk(Paths.get(path), FileVisitOption.FOLLOW_LINKS)) {
            pathStream
                    .filter((p) -> !p.toFile().isDirectory() && p.toFile().getAbsolutePath().endsWith(extension))
                    .forEach(p -> fileLinesToList(p, lines));
        } catch (final IOException e) {
            LOG.error(e.getMessage(), e);
        }
        return lines;
    }

    private static void fileLinesToList(final Path file, final List<String> lines) {
        try (Stream<String> stream = Files.lines(file, Charset.defaultCharset())) {
            stream
                    .map(String::trim)
                    .filter(s -> !s.isEmpty())
                    .forEach(lines::add);
        } catch (final IOException e) {
            LOG.error(e.getMessage(), e);
        }
    }
}
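A quick usage sketch of the class above (the directory and extension are placeholders, not paths from the original post):

import java.util.List;

public class RecursiveFileLineReaderExample {

    public static void main(final String[] args) {
        // Read every non-empty line from all .java files under the given root.
        final List<String> lines =
                RecursiveFileLineReader.readAllLineFromAllFilesRecursively("/Development/src", ".java");
        lines.forEach(System.out::println);
    }
}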
Saturday, April 26, 2014
Playing with Java 8 - Lambdas and Concurrency
So Java 8 was released a while back, with a ton of features and changes. All of us Java zealots have been waiting for this for ages, all the way back to when they originally announced all the great features that would be in Java 7, which ended up being pulled.
I have just recently had the time to actually start giving it a real look. I updated my home projects to 8, and I have to say I am generally quite happy with what we got. The java.time API that "mimics" Joda-Time is a big improvement, the java.util.stream package is going to be useful, and lambdas are going to change our coding style, which might take a bit of getting used to. With those changes, the quote "With great power comes great responsibility" rings true; I sense there may be some interesting times in our future, as it is quite easy to write some hard-to-decipher code. As an example, debugging the code I wrote below would be "fun"...
The file example is on my Github blog repo
What this example does is simple: run a couple of threads, do some work concurrently, then wait for them all to complete. I figured while I am playing with Java 8, let me go for it fully...
Here's what I came up with:
package net.briandupreez.blog.java8.futures;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

import java.util.Collection;
import java.util.List;
import java.util.concurrent.*;
import java.util.stream.Collectors;

/**
 * Generified future running and completion
 *
 * @param <T> the result type
 * @param <S> the task input
 */
public class WaitingFuturesRunner<T, S> {

    private transient static final Log logger = LogFactory.getLog(WaitingFuturesRunner.class);
    private final Collection<Task<T, S>> tasks;
    private final long timeOut;
    private final TimeUnit timeUnit;
    private final ExecutorService executor;

    /**
     * Constructor, used to initialise with the required tasks
     *
     * @param tasks    the list of tasks to execute
     * @param timeOut  max length of time to wait
     * @param timeUnit time out timeUnit
     */
    public WaitingFuturesRunner(final Collection<Task<T, S>> tasks, final long timeOut, final TimeUnit timeUnit) {
        this.tasks = tasks;
        this.timeOut = timeOut;
        this.timeUnit = timeUnit;
        this.executor = Executors.newFixedThreadPool(tasks.size());
    }

    /**
     * Go!
     *
     * @param taskInput          The input to the task
     * @param consolidatedResult a container of all the completed results
     */
    public void go(final S taskInput, final ConsolidatedResult<T> consolidatedResult) {
        final CountDownLatch latch = new CountDownLatch(tasks.size());
        final List<CompletableFuture<T>> theFutures = tasks.stream()
                .map(aSearch -> CompletableFuture.supplyAsync(() -> processTask(aSearch, taskInput, latch), executor))
                .collect(Collectors.<CompletableFuture<T>>toList());

        final CompletableFuture<List<T>> allDone = collectTasks(theFutures);
        try {
            latch.await(timeOut, timeUnit);
            logger.debug("complete... adding results");
            allDone.get().forEach(consolidatedResult::addResult);
        } catch (final InterruptedException | ExecutionException e) {
            logger.error("Thread Error", e);
            throw new RuntimeException("Thread Error, could not complete processing", e);
        }
    }

    private <E> CompletableFuture<List<E>> collectTasks(final List<CompletableFuture<E>> futures) {
        final CompletableFuture<Void> allDoneFuture = CompletableFuture.allOf(futures.toArray(new CompletableFuture[futures.size()]));
        return allDoneFuture.thenApply(v -> futures.stream()
                .map(CompletableFuture<E>::join)
                .collect(Collectors.<E>toList())
        );
    }

    private T processTask(final Task<T, S> task, final S searchTerm, final CountDownLatch latch) {
        logger.debug("Starting: " + task);
        T searchResults = null;
        try {
            searchResults = task.process(searchTerm, latch);
        } catch (final Exception e) {
            e.printStackTrace();
        }
        return searchResults;
    }
}
Test:
package net.briandupreez.blog.java8.futures;

import net.briandupreez.blog.java8.futures.example.StringInputTask;
import net.briandupreez.blog.java8.futures.example.StringResults;
import org.apache.log4j.BasicConfigurator;
import org.junit.Assert;
import org.junit.BeforeClass;
import org.junit.Test;

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

/**
 * Test
 * Created by brian on 4/26/14.
 */
public class CompletableFuturesRunnerTest {

    @BeforeClass
    public static void init() {
        BasicConfigurator.configure();
    }

    /**
     * 5 tasks at 3000ms concurrently should not be more than 3100
     *
     * @throws Exception error
     */
    @Test(timeout = 3100)
    public void testGo() throws Exception {
        final List<Task<String, String>> taskList = setupTasks();
        final WaitingFuturesRunner<String, String> completableFuturesRunner = new WaitingFuturesRunner<>(taskList, 4, TimeUnit.SECONDS);
        final StringResults consolidatedResults = new StringResults();
        completableFuturesRunner.go("Something To Process", consolidatedResults);
        Assert.assertEquals(5, consolidatedResults.getResults().size());
        for (final String s : consolidatedResults.getResults()) {
            Assert.assertTrue(s.contains("complete"));
            Assert.assertTrue(s.contains("Something To Process"));
        }
    }

    private List<Task<String, String>> setupTasks() {
        final List<Task<String, String>> taskList = new ArrayList<>();
        final StringInputTask stringInputTask = new StringInputTask("Task 1");
        final StringInputTask stringInputTask2 = new StringInputTask("Task 2");
        final StringInputTask stringInputTask3 = new StringInputTask("Task 3");
        final StringInputTask stringInputTask4 = new StringInputTask("Task 4");
        final StringInputTask stringInputTask5 = new StringInputTask("Task 5");
        taskList.add(stringInputTask);
        taskList.add(stringInputTask2);
        taskList.add(stringInputTask3);
        taskList.add(stringInputTask4);
        taskList.add(stringInputTask5);
        return taskList;
    }
}
Output:
0 [pool-1-thread-1] Starting: StringInputTask{taskName='Task 1'}
0 [pool-1-thread-5] Starting: StringInputTask{taskName='Task 5'}
0 [pool-1-thread-2] Starting: StringInputTask{taskName='Task 2'}
2 [pool-1-thread-4] Starting: StringInputTask{taskName='Task 4'}
2 [pool-1-thread-3] Starting: StringInputTask{taskName='Task 3'}
3003 [pool-1-thread-5] Done: Task 5
3004 [pool-1-thread-3] Done: Task 3
3003 [pool-1-thread-1] Done: Task 1
3003 [pool-1-thread-4] Done: Task 4
3003 [pool-1-thread-2] Done: Task 2
3007 [Thread-0] WaitingFuturesRunner - complete... adding results
Some of the useful articles / links I found and read while doing this:
Oracle: Lambda Tutorial
IBM: Java 8 Concurrency
Tomasz Nurkiewicz : Definitive Guide to CompletableFuture
Sunday, February 16, 2014
Local Wikipedia with Solr and Spring Data
Continuing with my little AI / machine learning research project... I wanted a decent-sized repo of English text that was not in a complete mess like a large percentage of the data on the internet. I figured I would try Wikipedia, but what to do with about 40Gb of XML, and how do I work with / query all that data? Based on a recent work implementation where we load something like 200 000 000 records into a Solr cache, I figured Solr would be the way to go, so this is an example of my basic implementation.
Required for this example:
Wikipedia download (warning it is a 9.9Gb file, extracts to about 42Gb)
Solr
Spring Data (Great Blog / Examples on Spring Data: Petri Kainulainen's blog)
All the code and unit test for this post is on my blog GitHub Repo
When setting up Solr from scratch you can have a look at Solr's wiki or documentation, which is pretty good. There is also an example of importing Wikipedia here; I started with that and made some minor modifications.
For this specific example the Solr config needed (/conf):
For this example (and in the below config files),
Solr home: /Development/Solr
Index / Data: /Development/Data/solr_data/wikipedia
Import File: /Development/Data/enwiki-latest-pages-articles.xml
The full import into Solr took about 48 hours on my old 2011 i5 iMac and the index on my current setup is about 52Gb.
The code for this ended up being quite clean: Spring Data Solr gives you 2 main interfaces, SolrIndexService and SolrCrudRepository. You simply extend / implement these 2, wrap that in a single interface, autowire it from a Spring Java config and you are good to go. The config and code below cover the data import config, the schema and the Solr config, followed by the repository, the index service, the Solr service and the Spring context. After that, the next thing for me to look at for sourcing data is Spring Social.

Data Config for the import:
<dataConfig>
    <dataSource type="FileDataSource" encoding="UTF-8" />
    <document>
        <entity name="page"
                processor="XPathEntityProcessor"
                stream="true"
                forEach="/mediawiki/page/"
                url="/Development/Data/enwiki-latest-pages-articles.xml"
                transformer="RegexTransformer,DateFormatTransformer"
                >
            <field column="id" xpath="/mediawiki/page/id" />
            <field column="title" xpath="/mediawiki/page/title" />
            <field column="revision" xpath="/mediawiki/page/revision/id" />
            <field column="user" xpath="/mediawiki/page/revision/contributor/username" />
            <field column="userId" xpath="/mediawiki/page/revision/contributor/id" />
            <field column="text" xpath="/mediawiki/page/revision/text" />
            <field column="timestamp" xpath="/mediawiki/page/revision/timestamp" dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss'Z'" />
            <field column="$skipDoc" regex="^#REDIRECT .*" replaceWith="true" sourceColName="text"/>
        </entity>
    </document>
</dataConfig>
Schema:
<?xml version="1.0" ?>
<schema name="wikipediaCore" version="1.1">
    <types>
        <fieldtype name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
        <fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
        <fieldType name="pint" class="solr.IntField"/>
        <fieldType name="text_en" class="solr.TextField" positionIncrementGap="100"/>
        <fieldType name="date" class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/>
    </types>
    <fields>
        <field name="id" type="string" indexed="true" stored="true" required="true"/>
        <field name="title" type="string" indexed="true" stored="true"/>
        <field name="revision" type="pint" indexed="false" stored="false"/>
        <field name="user" type="string" indexed="false" stored="true"/>
        <field name="userId" type="pint" indexed="false" stored="true"/>
        <field name="text" type="text_en" indexed="true" stored="true"/>
        <field name="timestamp" type="date" indexed="false" stored="true"/>
        <field name="_version_" type="long" indexed="true" stored="true"/>
    </fields>
    <uniqueKey>id</uniqueKey>
    <defaultSearchField>title</defaultSearchField>
    <solrQueryParser defaultOperator="OR"/>
</schema>
Solr Config:
<?xml version="1.0" encoding="UTF-8" ?>
<config>
    <luceneMatchVersion>4.6</luceneMatchVersion>
    <lib dir="/Development/Solr/lib" regex="solr-dataimporthandler-.*\.jar" />
    <directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>
    <dataDir>${solr.wikipedia.data.dir:/Development/Data/solr_data/wikipedia}</dataDir>
    <schemaFactory class="ClassicIndexSchemaFactory"/>
    <updateHandler class="solr.DirectUpdateHandler2">
        <updateLog>
            <str name="dir">${solr.wikipedia.data.dir:}</str>
        </updateLog>
    </updateHandler>
    <requestHandler name="/get" class="solr.RealTimeGetHandler">
        <lst name="defaults">
            <str name="omitHeader">true</str>
        </lst>
    </requestHandler>
    <requestHandler name="/replication" class="solr.ReplicationHandler" startup="lazy" />
    <requestDispatcher handleSelect="true" >
        <requestParsers enableRemoteStreaming="false" multipartUploadLimitInKB="2048" formdataUploadLimitInKB="2048" />
    </requestDispatcher>
    <requestHandler name="standard" class="solr.StandardRequestHandler" default="true" />
    <requestHandler name="/analysis/field" startup="lazy" class="solr.FieldAnalysisRequestHandler" />
    <requestHandler name="/update" class="solr.UpdateRequestHandler" />
    <requestHandler name="/admin/" class="org.apache.solr.handler.admin.AdminHandlers" />
    <requestHandler name="/admin/ping" class="solr.PingRequestHandler">
        <lst name="invariants">
            <str name="q">solrpingquery</str>
        </lst>
        <lst name="defaults">
            <str name="echoParams">all</str>
        </lst>
    </requestHandler>
    <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
        <lst name="defaults">
            <str name="config">data-config.xml</str>
        </lst>
    </requestHandler>
    <admin>
        <defaultQuery>*:*</defaultQuery>
    </admin>
    <unlockOnStartup>true</unlockOnStartup>
</config>
Repository:
package net.briandupreez.solr.wikipedia;

import net.briandupreez.solr.documents.WikipediaDocument;
import org.springframework.data.solr.repository.Query;
import org.springframework.data.solr.repository.SolrCrudRepository;
import org.springframework.stereotype.Repository;

import java.util.Collection;

/**
 * Wikipedia repo.
 * Created by Brian on 2014/01/26.
 */
@Repository
public interface WikipediaDocumentRepository extends SolrCrudRepository<WikipediaDocument, String> {

    @Query("title:*?0*")
    Collection<WikipediaDocument> findByTitleContains(final String title);

    @Query("text:?0*")
    Collection<WikipediaDocument> findByTextContains(final String text);

    @Query("title:*?0* OR text:?0*")
    Collection<WikipediaDocument> findByAllContains(final String text);
}
IndexService:
package net.briandupreez.solr.wikipedia;

import net.briandupreez.solr.SolrIndexService;
import net.briandupreez.solr.documents.WikipediaDocument;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

import javax.annotation.Resource;

/**
 * Wikipedia index
 * Created by Brian on 2014/01/26.
 */
@Service
public class WikipediaIndexService implements SolrIndexService<WikipediaDocument, String> {

    private transient final Log logger = LogFactory.getLog(this.getClass());

    @Resource
    private WikipediaDocumentRepository repository;

    @Transactional
    @Override
    public WikipediaDocument add(final WikipediaDocument entry) {
        final WikipediaDocument saved = repository.save(entry);
        logger.debug("Saved: " + saved);
        return saved;
    }

    @Transactional
    @Override
    public void delete(final String id) {
        repository.delete(id);
        logger.debug("Deleted ID: " + id);
    }
}
SolrService:
package net.briandupreez.solr.wikipedia;

import net.briandupreez.solr.documents.WikipediaDocument;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

import javax.annotation.Resource;
import java.util.Collection;
import java.util.Date;

/**
 * Solr Service.
 * Created by Brian on 2014/01/26.
 */
@Service
public class WikipediaSolrServiceImpl implements WikipediaSolrService {

    private transient final Log logger = LogFactory.getLog(this.getClass());

    @Resource
    private WikipediaIndexService indexService;

    @Resource
    private WikipediaDocumentRepository repository;

    @Transactional
    @Override
    public WikipediaDocument add(final String id, final String title, final String user, final String userId, final String text, final Date timestamp) {
        final WikipediaDocument wikipediaDocument = new WikipediaDocument();
        wikipediaDocument.setId(id);
        wikipediaDocument.setTitle(title);
        wikipediaDocument.setText(text);
        wikipediaDocument.setUserId(userId);
        wikipediaDocument.setUser(user);
        wikipediaDocument.setTimestamp(timestamp);
        wikipediaDocument.setAll(wikipediaDocument.toString());
        return indexService.add(wikipediaDocument);
    }

    @Transactional
    @Override
    public void deleteById(final String id) {
        indexService.delete(id);
    }

    @Transactional(readOnly = true)
    @Override
    public WikipediaDocument findById(final String id) {
        final WikipediaDocument wikipediaDocument = repository.findOne(id);
        logger.debug("FOUND: " + wikipediaDocument);
        return wikipediaDocument;
    }

    @Transactional(readOnly = true)
    @Override
    public Collection<WikipediaDocument> findByTitleContains(final String title) {
        return repository.findByTitleContains(title);
    }

    @Transactional(readOnly = true)
    @Override
    public Collection<WikipediaDocument> findByTextContains(final String text) {
        return repository.findByTextContains(text);
    }

    @Transactional
    @Override
    public Collection<WikipediaDocument> findByAllContains(final String text) {
        return repository.findByAllContains(text);
    }
}
SpringContext:
package net.briandupreez.solr;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.ComponentScan;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.PropertySource;
import org.springframework.core.env.Environment;
import org.springframework.data.solr.core.SolrTemplate;
import org.springframework.data.solr.repository.config.EnableSolrRepositories;
import org.springframework.data.solr.server.support.HttpSolrServerFactoryBean;
import org.springframework.transaction.PlatformTransactionManager;
import org.springframework.transaction.jta.JtaTransactionManager;

import javax.annotation.Resource;

/**
 * Solr Context
 * Created by Brian on 2014/01/26.
 */
@Configuration
@EnableSolrRepositories(basePackages = "net.briandupreez.solr.wikipedia")
@ComponentScan(basePackages = "net.briandupreez.solr")
@PropertySource("classpath:solr.properties")
public class SolrContext {

    @Resource
    private Environment environment;

    /**
     * Solr Factory bean
     *
     * @return factory bean
     */
    @Bean
    public HttpSolrServerFactoryBean solrServerFactoryBean() {
        final HttpSolrServerFactoryBean factory = new HttpSolrServerFactoryBean();
        factory.setUrl(environment.getRequiredProperty("solr.server.url.wiki"));
        return factory;
    }

    /**
     * The Solr Template... used in WikipediaDocumentRepository.
     *
     * @return created template
     * @throws Exception error.
     */
    @Bean
    public SolrTemplate solrTemplate() throws Exception {
        return new SolrTemplate(solrServerFactoryBean().getObject());
    }

    @Bean
    public PlatformTransactionManager transactionManager() throws Exception {
        return new JtaTransactionManager();
    }
}
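To round the example off, a small usage sketch of the service (it assumes solr.properties points at a running, populated Solr core, and that the classes above are on the classpath; nothing here is from the original post):

import org.springframework.context.annotation.AnnotationConfigApplicationContext;

import net.briandupreez.solr.SolrContext;
import net.briandupreez.solr.wikipedia.WikipediaSolrService;

public class WikipediaSearchExample {

    public static void main(final String[] args) {
        final AnnotationConfigApplicationContext context =
                new AnnotationConfigApplicationContext(SolrContext.class);
        final WikipediaSolrService service = context.getBean(WikipediaSolrService.class);

        // Simple title search against the imported Wikipedia index.
        service.findByTitleContains("Machine learning").forEach(System.out::println);

        context.close();
    }
}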
Sunday, January 12, 2014
BYG (Bing, Yahoo, Google) Search Wrapper
One small section of my Aria project will be to interface with the current search engines out there. To do this I will require a module that gives me a consistent interface to work with the 3 main providers: Bing, Yahoo! and Google (and any future ones I may want to add). This is a basic example of that module.
First thing required is to set up accounts / projects and the like with the relevant providers.
I won't describe this process as they were all pretty well documented.
Bing Developer Center
Yahoo Developer Network
Google Developers Console
A couple tips for the above sites.
- Bing: Setup both the web and synonym searches.
- Yahoo: In the BOSS console, under manage account, put in a daily limit $ amount (or turn off the limit), as they only allow 1 free query a day... so otherwise only the first request works.
- Google: It doesn't seem that you can set it up to search the whole web, but after creating your custom search engine, you can select "Search the entire web but emphasize included sites", so don't worry about that.
All these providers allow for many options while searching (e.g. images, location, news, video, etc.); however, in this initial example I have limited it to just a pure and simple web search.
All the code will be available in my blog Github repository.
Going through the main points.
There is a BasicWebSearch interface, that takes the search term and returns SearchResults.
SearchResults contains results in a map based on a result type enum.
The implementations of BasicWebSearch, namely BingSearch, GoogleSearch and YahooSearch, call the relevant search engine with the search term and then convert the results into a SearchResult. In the case of Yahoo and Bing, I map the JSON result to the SearchResult; Google, however, does that in their search client included in the dependencies.
Now for the main code bits:
SearchSettings
As this is just an example, I have included the search settings in the following class; be sure to replace them with the relevant values.
package net.briandupreez.search;

/**
 * All search settings
 * Created by Brian on 2014/01/05.
 */
public class SearchSettings {

    public static final String YAHOO_BASE = "http://yboss.yahooapis.com/ysearch";
    public static final String YAHOO_CONSUMER_KEY = "REPLACE ME - consumer key";
    public static final String YAHOO_CONSUMER_SECRET = "REPLACE ME - secret";

    public static final String GOOGLE_API_KEY = "REPLACE ME - google api key";
    public static final String GOOGLE_CX = "REPLACE ME - numbers : alphanumeric";

    //"SearchWeb" , "Search".... using both can give you 10 000 free queries...
    public static final String BING_SEARCH_BASE = "https://api.datamarket.azure.com/Bing/Search/v1/Web";
    public static final String BING_WEB_BASE = "https://api.datamarket.azure.com/Bing/SearchWeb/v1/Web";
    public static final String BING_SYNONYM_BASE = "https://api.datamarket.azure.com/Bing/Synonyms/v1/GetSynonyms";
    public static final String BING_API_KEY = "REPLACE ME - Bing API key";

    public static final String ENCODE_FORMAT = "UTF-8";
    public static final int HTTP_STATUS_OK = 200;
}
UrlConnectionHandler
As both Bing and Yahoo use an HttpURLConnection, I figured I would centralise the handling of that; the only difference between the 2 is that Bing uses basic authentication while for Yahoo I went with the OAuth implementation.
package net.briandupreez.search;

import oauth.signpost.OAuthConsumer;
import oauth.signpost.exception.OAuthCommunicationException;
import oauth.signpost.exception.OAuthExpectationFailedException;
import oauth.signpost.exception.OAuthMessageSignerException;
import org.apache.commons.codec.binary.Base64;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

/**
 * Handle a URL Connection.
 * Created by Brian on 2014/01/08.
 */
public class UrlConnectionHandler {

    private transient final Log logger = LogFactory.getLog(this.getClass());

    /**
     * Base Auth, used with Bing search
     *
     * @param url    the url
     * @param apiKey the API Key
     * @return http url connection
     */
    public HttpURLConnection createBasicConnection(final String url, final String apiKey) {
        final HttpURLConnection connection = createConnection(url);
        final byte[] accountKeyBytes = Base64.encodeBase64((apiKey + ":" + apiKey).getBytes());
        final String accountKeyEnc = new String(accountKeyBytes);
        final String s1 = "Basic " + accountKeyEnc;
        connection.setRequestProperty("Authorization", s1);
        return connection;
    }

    /**
     * A Signed OAuth Connection, used with yahoo
     *
     * @param url      the url
     * @param consumer the oauth consumer
     * @return http url connection
     */
    public HttpURLConnection createOAuthConnection(final String url, final OAuthConsumer consumer) {
        final HttpURLConnection connection = createConnection(url);
        if (consumer != null) {
            try {
                logger.info("Signing the oAuth consumer");
                consumer.sign(connection);
                connection.connect();
                return connection;
            } catch (OAuthMessageSignerException | OAuthExpectationFailedException | OAuthCommunicationException e) {
                logger.error("OAuth Error signing the consumer", e);
                throw new RuntimeException("OAuth Error", e);
            } catch (final IOException e) {
                logger.error("Connection Error", e);
                throw new RuntimeException("Connection Error", e);
            }
        }
        return null;
    }

    private HttpURLConnection createConnection(final String url) {
        try {
            final URL u = new URL(url);
            final HttpURLConnection uc = (HttpURLConnection) u.openConnection();
            return uc;
        } catch (final Exception e) {
            logger.error("Create Connection Exception.", e);
            throw new RuntimeException("Connection Error", e);
        }
    }

    /**
     * Process connection
     *
     * @param connection the connection
     * @return the result
     */
    public RequestResult processConnection(final HttpURLConnection connection) {
        RequestResult result = null;
        try {
            final int responseCode = connection.getResponseCode();
            if (200 == responseCode || 401 == responseCode || 404 == responseCode) {
                BufferedReader rd = null;
                try {
                    rd = new BufferedReader(new InputStreamReader(responseCode == 200 ? connection.getInputStream() : connection.getErrorStream()));
                    final StringBuilder sb = new StringBuilder();
                    String line;
                    while ((line = rd.readLine()) != null) {
                        sb.append(line);
                    }
                    result = new RequestResult(responseCode, sb.toString());
                } catch (final IOException e) {
                    logger.error("Stream Error", e);
                    throw new RuntimeException("Stream Error", e);
                } finally {
                    if (rd != null) {
                        rd.close();
                    }
                }
            }
        } catch (final IOException e) {
            logger.error("Connection Exception", e);
            throw new RuntimeException("Connection Exception", e);
        }
        return result;
    }

    public static class RequestResult {

        private final int responseCode;
        private final String response;

        public RequestResult(final int responseCode, final String response) {
            this.responseCode = responseCode;
            this.response = response;
        }

        public int getResponseCode() {
            return responseCode;
        }

        public String getResponse() {
            return response;
        }
    }
}
BingSearch
package net.briandupreez.search.bing;

import net.briandupreez.search.BasicWebSearch;
import net.briandupreez.search.SearchResults;
import net.briandupreez.search.UrlConnectionHandler;
import org.apache.commons.httpclient.util.URIUtil;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

import java.net.HttpURLConnection;

import static net.briandupreez.search.SearchSettings.*;

/**
 * Bing search api integration
 * Created by Brian on 2014/01/02.
 */
public class BingSearch implements BasicWebSearch {

    private static transient final Log log = LogFactory.getLog(BingSearch.class);

    @Override
    public SearchResults search(final String searchTerm) throws Exception {
        final String bingUrl = String.format("%s?Query=%%27%s%%27&$format=JSON", BING_WEB_BASE, URIUtil.encode(searchTerm, null, ENCODE_FORMAT));
        SearchResults searchResults = new SearchResults(searchTerm);
        try {
            final UrlConnectionHandler urlConnectionHandler = new UrlConnectionHandler();
            final HttpURLConnection basicConnection = urlConnectionHandler.createBasicConnection(bingUrl, BING_API_KEY);
            final UrlConnectionHandler.RequestResult result = urlConnectionHandler.processConnection(basicConnection);
            if (result.getResponseCode() == HTTP_STATUS_OK) {
                final BingResultParser bingResultParser = new BingResultParser();
                searchResults = bingResultParser.parseWeb(searchTerm, result.getResponse());
            } else {
                searchResults.setFailed(true);
                log.error("Error in response due to status code = " + result.getResponseCode() + "Response:\n" + result.getResponse());
            }
        } catch (final Exception e) {
            searchResults.setFailed(true);
            log.error("Search Error", e);
        }
        return searchResults;
    }
}
BingResultParser
package net.briandupreez.search.bing;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import net.briandupreez.search.SearchResult;
import net.briandupreez.search.SearchResultParser;
import net.briandupreez.search.SearchResults;
import net.briandupreez.search.SearchSynonymResults;
import org.apache.log4j.Logger;

import java.io.IOException;

/**
 * Parse the results
 * Created by Brian on 2014/01/04.
 */
public class BingResultParser implements SearchResultParser {

    private static final Logger log = Logger.getLogger(BingResultParser.class);

    @Override
    public SearchResults parseWeb(final String searchTerm, final String searchResults) {
        final ObjectMapper mapper = new ObjectMapper();
        final SearchResults response = new SearchResults(searchTerm);
        final JsonNode input;
        try {
            input = mapper.readTree(searchResults);
            final JsonNode webResults = input.get("d").get("results");
            for (final JsonNode element : webResults) {
                final SearchResult result = new SearchResult();
                result.setUrl(element.get("Url").asText());
                result.setDisplay(element.get("DisplayUrl").asText());
                result.setDescription(element.get("Description").asText());
                result.setTitle(element.get("Title").asText());
                response.addResult(SearchResults.ResultType.WEB, result);
            }
        } catch (final IOException e) {
            log.error("Parser Error", e);
            throw new RuntimeException("Result Parser Failure", e);
        }
        return response;
    }

    public SearchSynonymResults parseSynonym(final String searchTerm, final String synonymResults) {
        final ObjectMapper mapper = new ObjectMapper();
        final SearchSynonymResults response = new SearchSynonymResults(searchTerm);
        final JsonNode input;
        try {
            input = mapper.readTree(synonymResults);
            final JsonNode webResults = input.get("d").get("results");
            for (final JsonNode element : webResults) {
                response.addSynonym(element.get("Synonym").asText());
            }
        } catch (final IOException e) {
            log.error("Parser Error", e);
            throw new RuntimeException("Result Parser Failure", e);
        }
        return response;
    }
}
YahooSearch
package net.briandupreez.search.yahoo;

import net.briandupreez.search.BasicWebSearch;
import net.briandupreez.search.SearchResults;
import net.briandupreez.search.UrlConnectionHandler;
import oauth.signpost.OAuthConsumer;
import oauth.signpost.basic.DefaultOAuthConsumer;
import org.apache.commons.httpclient.util.URIUtil;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

import java.net.HttpURLConnection;

import static net.briandupreez.search.SearchSettings.*;

/**
 * Yahoo! Search BOSS
 */
public class YahooSearch implements BasicWebSearch {

    private transient final Log log = LogFactory.getLog(this.getClass());

    /**
     * Search
     *
     * @return results
     */
    @Override
    public SearchResults search(final String searchTerm) throws Exception {
        SearchResults searchResults = new SearchResults(searchTerm);
        // replace the + with %20... it seems OAuth doesn't like it
        final String url = String.format("%s/web?q=%s", YAHOO_BASE, URIUtil.encode(searchTerm, null, ENCODE_FORMAT)).replace("+", "%20");
        final OAuthConsumer consumer = new DefaultOAuthConsumer(YAHOO_CONSUMER_KEY, YAHOO_CONSUMER_SECRET);
        final String responseBody;
        try {
            final UrlConnectionHandler connectionHandler = new UrlConnectionHandler();
            final HttpURLConnection oAuthConnection = connectionHandler.createOAuthConnection(url, consumer);
            log.info("sending get request to: " + url + " Decoded: " + URIUtil.decode(url));
            final UrlConnectionHandler.RequestResult result = connectionHandler.processConnection(oAuthConnection);
            if (result.getResponseCode() == HTTP_STATUS_OK) {
                responseBody = result.getResponse();
                log.info("Response: " + responseBody);
                if (!responseBody.contains("yahoo:error")) {
                    final YahooResultParser yahooResultParser = new YahooResultParser();
                    searchResults = yahooResultParser.parseWeb(searchTerm, responseBody);
                }
            } else {
                searchResults.setFailed(true);
                log.error("Error in response due to status code = " + result.getResponseCode() + " Response:\n" + result.getResponse());
            }
        } catch (final Exception e) {
            searchResults.setFailed(true);
            log.error("Search Error", e);
        }
        return searchResults;
    }
}
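The UrlConnectionHandler class isn't included in this post. For context, the OAuth step boils down to letting Signpost sign a plain java.net.HttpURLConnection before the request is sent; the sketch below is an assumption of what createOAuthConnection might look like, not the actual implementation.

import java.net.HttpURLConnection;
import java.net.URL;

import oauth.signpost.OAuthConsumer;

/**
 * A rough sketch only: Signpost's consumer can sign a java.net.HttpURLConnection
 * directly, which is all the two-legged OAuth call to Yahoo! BOSS needs.
 */
public final class OAuthConnectionSketch {

    public static HttpURLConnection createOAuthConnection(final String url, final OAuthConsumer consumer) throws Exception {
        final HttpURLConnection connection = (HttpURLConnection) new URL(url).openConnection();
        connection.setRequestMethod("GET");
        consumer.sign(connection); // adds the OAuth Authorization header
        return connection;
    }
}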
YahooResultParser
package net.briandupreez.search.yahoo;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import net.briandupreez.search.SearchResult;
import net.briandupreez.search.SearchResultParser;
import net.briandupreez.search.SearchResults;
import org.apache.log4j.Logger;

import java.io.IOException;

/**
 * Parse the results
 * Created by Brian on 2014/01/04.
 */
public class YahooResultParser implements SearchResultParser {

    private static final Logger log = Logger.getLogger(YahooResultParser.class);

    @Override
    public SearchResults parseWeb(final String searchTerm, final String searchResults) {
        final ObjectMapper mapper = new ObjectMapper();
        final SearchResults response = new SearchResults(searchTerm);
        final JsonNode input;
        try {
            input = mapper.readTree(searchResults);
            final JsonNode webResults = input.get("bossresponse").get("web").get("results");
            for (final JsonNode element : webResults) {
                final SearchResult result = new SearchResult();
                result.setDescription(element.get("abstract").asText());
                result.setTitle(element.get("title").asText());
                result.setDisplay(element.get("dispurl").asText());
                result.setUrl(element.get("url").asText());
                response.addResult(SearchResults.ResultType.WEB, result);
            }
        } catch (final IOException e) {
            log.error("Parser Error", e);
            throw new RuntimeException("Result Parser Failure", e);
        }
        return response;
    }
}
GoogleSearch
package net.briandupreez.search.google;

import com.google.api.client.http.HttpRequest;
import com.google.api.client.http.HttpRequestInitializer;
import com.google.api.client.http.javanet.NetHttpTransport;
import com.google.api.client.json.jackson.JacksonFactory;
import com.google.api.services.customsearch.Customsearch;
import com.google.api.services.customsearch.model.Result;
import com.google.api.services.customsearch.model.Search;
import net.briandupreez.search.BasicWebSearch;
import net.briandupreez.search.SearchResults;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

import java.io.IOException;
import java.util.List;

import static net.briandupreez.search.SearchSettings.*;

/**
 * Created by Brian on 2014/01/04.
 */
public class GoogleSearch implements BasicWebSearch {

    private transient final Log logger = LogFactory.getLog(this.getClass());

    @Override
    public SearchResults search(final String query) throws Exception {
        final Customsearch customsearch = new Customsearch(new NetHttpTransport(), new JacksonFactory(), new DisableTimeoutRequest());
        final SearchResults searchResults = new SearchResults(query);
        try {
            final Customsearch.Cse.List list = customsearch.cse().list(query);
            list.setKey(GOOGLE_API_KEY);
            list.setCx(GOOGLE_CX);
            final Search results = list.execute();
            final List<Result> items = results.getItems();
            for (final Result result : items) {
                final GoogleSearchResult searchResult = new GoogleSearchResult();
                searchResult.setTitle(result.getTitle());
                searchResult.setDisplay(result.getDisplayLink());
                searchResult.setUrl(result.getFormattedUrl());
                searchResult.setDescription(result.getSnippet());
                searchResult.setPagemap(result.getPagemap());
                searchResult.setMime(result.getMime());
                searchResult.setLink(result.getLink());
                searchResult.setKind(result.getKind());
                searchResult.setHtmlTitle(result.getHtmlTitle());
                searchResult.setHtmlSnippet(result.getHtmlSnippet());
                searchResult.setFormattedUrl(result.getFormattedUrl());
                searchResult.setFileFormat(result.getFileFormat());
                searchResults.addResult(SearchResults.ResultType.WEB, searchResult);
            }
        } catch (final IOException e) {
            searchResults.setFailed(true);
            logger.error("Google Search Error", e);
        }
        return searchResults;
    }

    public class DisableTimeoutRequest implements HttpRequestInitializer {
        public void initialize(final HttpRequest request) {
            request.setConnectTimeout(0);
            request.setReadTimeout(0);
        }
    }
}
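All the engines implement the same BasicWebSearch interface, so the calling code can loop over them without caring which is which. Below is a minimal sketch of that; the BingSearch class name and the SearchResults.isFailed() accessor are assumptions on my part to keep the sketch self-contained, mirroring the setFailed(true) calls above.

import java.util.Arrays;
import java.util.List;

import net.briandupreez.search.BasicWebSearch;
import net.briandupreez.search.SearchResults;
import net.briandupreez.search.bing.BingSearch;
import net.briandupreez.search.google.GoogleSearch;
import net.briandupreez.search.yahoo.YahooSearch;

public final class SearchAllEngines {

    public static void main(final String[] args) throws Exception {
        // BingSearch and isFailed() are assumed class/accessor names, not spelled out above.
        final List<BasicWebSearch> engines = Arrays.<BasicWebSearch>asList(
                new BingSearch(), new YahooSearch(), new GoogleSearch());

        for (final BasicWebSearch engine : engines) {
            final SearchResults results = engine.search("java microservices");
            System.out.println(engine.getClass().getSimpleName() + " failed: " + results.isFailed());
        }
    }
}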
GoogleSearchResult
Google returns a whole bunch of extra information, so I extended the base SearchResult to add it all, just in case I ever need it.
package net.briandupreez.search.google;

import net.briandupreez.search.SearchResult;

import java.util.List;
import java.util.Map;

/**
 * Google specific return
 * Created by Brian on 2014/01/05.
 */
public class GoogleSearchResult extends SearchResult {

    private String fileFormat;
    private String formattedUrl;
    private String htmlSnippet;
    private String htmlTitle;
    //private Image image;
    private String kind;
    private String link;
    private String mime;
    private Map<String, List<Map<String, Object>>> pagemap;

    public Map<String, List<Map<String, Object>>> getPagemap() {
        return pagemap;
    }

    public void setPagemap(final Map<String, List<Map<String, Object>>> pagemap) {
        this.pagemap = pagemap;
    }

    public String getMime() {
        return mime;
    }

    public void setMime(final String mime) {
        this.mime = mime;
    }

    public String getLink() {
        return link;
    }

    public void setLink(final String link) {
        this.link = link;
    }

    public String getKind() {
        return kind;
    }

    public void setKind(final String kind) {
        this.kind = kind;
    }

    public String getHtmlTitle() {
        return htmlTitle;
    }

    public void setHtmlTitle(final String htmlTitle) {
        this.htmlTitle = htmlTitle;
    }

    public String getHtmlSnippet() {
        return htmlSnippet;
    }

    public void setHtmlSnippet(final String htmlSnippet) {
        this.htmlSnippet = htmlSnippet;
    }

    public String getFormattedUrl() {
        return formattedUrl;
    }

    public void setFormattedUrl(final String formattedUrl) {
        this.formattedUrl = formattedUrl;
    }

    public String getFileFormat() {
        return fileFormat;
    }

    public void setFileFormat(final String fileFormat) {
        this.fileFormat = fileFormat;
    }

    @Override
    public String toString() {
        return "GoogleSearchResult{" +
                "fileFormat='" + fileFormat + '\'' +
                ", formattedUrl='" + formattedUrl + '\'' +
                ", htmlSnippet='" + htmlSnippet + '\'' +
                ", htmlTitle='" + htmlTitle + '\'' +
                ", kind='" + kind + '\'' +
                ", link='" + link + '\'' +
                ", mime='" + mime + '\'' +
                ", pagemap=" + pagemap +
                '}';
    }
}
Maven Dependencies
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <parent>
        <artifactId>Blog</artifactId>
        <groupId>Blog</groupId>
        <version>1.0-SNAPSHOT</version>
    </parent>
    <modelVersion>4.0.0</modelVersion>
    <artifactId>BYGSearch</artifactId>

    <dependencies>
        <dependency>
            <groupId>com.google.guava</groupId>
            <artifactId>guava</artifactId>
            <version>15.0</version>
        </dependency>
        <dependency>
            <groupId>commons-codec</groupId>
            <artifactId>commons-codec</artifactId>
            <version>1.8</version>
        </dependency>
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-core</artifactId>
            <version>2.3.0</version>
        </dependency>
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-databind</artifactId>
            <version>2.3.0</version>
        </dependency>
        <dependency>
            <groupId>commons-httpclient</groupId>
            <artifactId>commons-httpclient</artifactId>
            <version>3.1</version>
        </dependency>
        <dependency>
            <groupId>oauth.signpost</groupId>
            <artifactId>signpost-core</artifactId>
            <version>1.2</version>
        </dependency>
        <dependency>
            <groupId>com.google.apis</groupId>
            <artifactId>google-api-services-customsearch</artifactId>
            <version>v1-rev32-1.17.0-rc</version>
        </dependency>
        <dependency>
            <groupId>com.google.http-client</groupId>
            <artifactId>google-http-client-jackson</artifactId>
            <version>1.17.0-rc</version>
        </dependency>
        <dependency>
            <groupId>com.google.oauth-client</groupId>
            <artifactId>google-oauth-client-java6</artifactId>
            <version>1.17.0-rc</version>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-api</artifactId>
            <version>1.6.1</version>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-log4j12</artifactId>
            <version>1.6.1</version>
        </dependency>
        <dependency>
            <groupId>log4j</groupId>
            <artifactId>log4j</artifactId>
            <version>1.2.16</version>
        </dependency>
        <dependency>
            <groupId>commons-logging</groupId>
            <artifactId>commons-logging</artifactId>
            <version>1.1.1</version>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.10</version>
            <scope>test</scope>
        </dependency>
    </dependencies>
</project>