Continuing with Programming Collection Intelligence (PCI) the next exercise was using the distance scores to pigeonhole a list of blogs based on the words used within the relevant blog.
I had already found Encog as the framework for the AI / Machine learning algorithms, for this exercise I needed an RSS reader and a HTML parser.
The 2 libraries I ended up using were:
ROME
JSoup
For general other utilities and collection manipulations I used:
Google Guava
I kept the list of blogs short, included some of the software bloggers I follow, just to make testing quick, had to alter the %'s a little from the implementation in (PCI), but still got the desired result.
Blogs Used:
http://blog.guykawasaki.com/index.rdf
http://blog.outer-court.com/rss.xml
http://flagrantdisregard.com/index.php/feed/
http://gizmodo.com/index.xml
http://googleblog.blogspot.com/rss.xml
http://radar.oreilly.com/index.rdf
http://www.wired.com/rss/index.xml
http://feeds.feedburner.com/codinghorror
http://feeds.feedburner.com/joelonsoftware
http://martinfowler.com/feed.atom
http://www.briandupreez.net/feeds/posts/default
For the implementation I just went with a main class and a reader class:
Main:
The Results:
*** Cluster 1 ***
[http://www.briandupreez.net/feeds/posts/default]
*** Cluster 2 ***
[http://blog.guykawasaki.com/index.rdf]
[http://radar.oreilly.com/index.rdf]
[http://googleblog.blogspot.com/rss.xml]
[http://blog.outer-court.com/rss.xml]
[http://gizmodo.com/index.xml]
[http://flagrantdisregard.com/index.php/feed/]
[http://www.wired.com/rss/index.xml]
*** Cluster 3 ***
[http://feeds.feedburner.com/joelonsoftware]
[http://feeds.feedburner.com/codinghorror]
[http://martinfowler.com/feed.atom]
Sunday, June 16, 2013
Blog Categorisation using Encog, ROME, JSoup and Google Guava
Labels:
AI,
Encog,
Google,
Java,
Machine Learning
Subscribe to:
Post Comments (Atom)
Popular Posts
-
I have recently been slacking on content on my blog, between long stressful hours at work and to the wonderful toy that is an iPhone, I have...
-
I make no claim to be a "computer scientist" or a software "engineer", those titles alone can spark some debate, I regar...
-
I saw an article (well more of a rant) the other day, by Rob Williams Brain Drain in enterprise Dev . I have to say, I do agree with some o...
-
This series of posts will be about me getting to grips with JBoss Drools . The reasoning behind it is: SAP bought out my company's curre...
-
Update: Check out my updated re-certification on the new 2019 exam... here Let me start by saying, for this certification I studied and...
This message is perfect.
ReplyDeleteMua vé tại đại lý vé máy bay Aivivu, tham khảo
ReplyDeletevé máy bay đi Mỹ bao nhiêu
giá vé máy bay từ mỹ về vn
vé máy bay khứ hồi từ đức về việt nam
giá vé máy bay nga về việt nam
các chuyến bay từ anh về việt nam
chuyến bay từ paris về hà nội
chuyến bay chuyên gia trung quốc
Lovely ppost
ReplyDelete