02 5 / 2010

CSS Selector for HTTPBuilder

If you use HTTPBuilder to crawl web pages and extract information, you will have noticed that it uses Groovy’s XML support for parsing HTML. Groovy’s GPath is powerful, but HTML has something more powerful (not to mention simple, easy and intuitive) for selection: CSS selectors. jQuery has proved that CSS selectors are indeed the best way to do DOM manipulation.

CSS selectors are available for Java through this library:

http://github.com/chrsan/css-selectors

I wrote a small facade class, CSSSelector, to expose CSS selectors the Groovy way:

http://code.google.com/p/css-selector-httpbuilder/

Here’s an example:
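To be clear, the sketch below is not the CSSSelector facade, nor the css-selectors library API. It is a minimal, hypothetical illustration in plain Java (JDK only) of what selecting DOM nodes by CSS selector means. The class name TinyCss and its methods parse and select are made up for this post. It assumes well-formed XHTML input and handles only a tiny subset of selectors: tag, #id, .class, compounds such as a.ext, and the descendant combinator.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration only: NOT the CSSSelector facade or the css-selectors API.
// Supported subset: "tag", "#id", ".class", compounds like "a.ext", and the
// descendant combinator (whitespace), e.g. "div.post a".
class TinyCss {

    // Parse well-formed XHTML from a string (real-world HTML needs a tolerant parser).
    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
    }

    // Return every element matching the selector, in document order.
    static List<Element> select(Document doc, String selector) {
        String[] parts = selector.trim().split("\\s+");
        List<Element> out = new ArrayList<>();
        NodeList all = doc.getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            if (matchesChain(e, parts, parts.length - 1)) out.add(e);
        }
        return out;
    }

    // The element must match parts[idx]; some ancestor chain must match the earlier parts.
    private static boolean matchesChain(Element e, String[] parts, int idx) {
        if (!matchesCompound(e, parts[idx])) return false;
        if (idx == 0) return true;
        for (Node p = e.getParentNode(); p instanceof Element; p = p.getParentNode()) {
            if (matchesChain((Element) p, parts, idx - 1)) return true;
        }
        return false;
    }

    // Match one compound selector such as "a", "#main", ".ext" or "a.ext".
    private static boolean matchesCompound(Element e, String compound) {
        // Split before each '#' or '.', keeping the marker attached to its token.
        for (String tok : compound.split("(?=[#.])")) {
            if (tok.isEmpty()) continue;
            if (tok.startsWith("#")) {
                if (!tok.substring(1).equals(e.getAttribute("id"))) return false;
            } else if (tok.startsWith(".")) {
                List<String> classes =
                        Arrays.asList(e.getAttribute("class").trim().split("\\s+"));
                if (!classes.contains(tok.substring(1))) return false;
            } else if (!tok.equals("*") && !tok.equalsIgnoreCase(e.getTagName())) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Document doc = parse("<html><body><div class='post'>"
                + "<a class='ext' href='/a'>A</a></div><a href='/b'>B</a></body></html>");
        for (Element a : select(doc, "div.post a.ext")) {
            System.out.println(a.getAttribute("href")); // prints /a
        }
    }
}
```

Compare "div.post a.ext" with walking the tree by hand and filtering on attributes at each level: the selector says the same thing in one short string, which is the whole appeal. A real selector engine such as css-selectors covers far more of the selector syntax than this toy does.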

The CSSSelector class has no dependencies on HTTPBuilder, so it can be used with any library in Groovy. If you want CSS selectors in a Java library, you can use css-selectors from GitHub directly.