Blogs by topic Programming and user alois
Programming
Here is a little code challenge!
I'm currently working on a text-mining/semantic web application, written in Java and focused (for the moment) on biomedical information. We use external tools for text-mining analysis, and unfortunately these tools don't handle HTML very well: if we send raw HTML to the text-mining service, it simply breaks. So we must convert the HTML to plain text before processing it, and because the tools report each identified word by its position, we must translate these positions (or indexes) back to find the corresponding word in the original HTML.
I created a simple implementation and posted it on gist.github.com ... can you make it better?
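To make the problem concrete, here is a minimal sketch of the index-translation idea. It is not the implementation from the gist; the class name `HtmlOffsetMapper` and its methods are hypothetical names chosen for illustration. It assumes simple, well-formed markup: it does not decode entities such as `&amp;`, and it does not skip `<script>`, `<style>`, or comments that contain a `>`.

```java
import java.util.Arrays;

/**
 * Hypothetical helper (illustration only): strips tags from HTML and
 * remembers, for every plain-text character, its offset in the HTML.
 */
final class HtmlOffsetMapper {
    private final String plainText;
    private final int[] offsets; // offsets[i] = HTML index of plain-text char i

    HtmlOffsetMapper(String html) {
        StringBuilder text = new StringBuilder();
        int[] map = new int[html.length()];
        boolean inTag = false;
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (c == '<') {
                inTag = true;               // assumption: every '<' opens a tag
            } else if (c == '>' && inTag) {
                inTag = false;              // end of the current tag
            } else if (!inTag) {
                map[text.length()] = i;     // record where this char came from
                text.append(c);
            }
        }
        this.plainText = text.toString();
        this.offsets = Arrays.copyOf(map, text.length());
    }

    /** The text to send to the text-mining service. */
    String plainText() {
        return plainText;
    }

    /** Translates a [start, end) span of the plain text into a [start, end) span of the HTML. */
    int[] toHtmlSpan(int start, int end) {
        if (start < 0 || end > offsets.length || start >= end) {
            throw new IllegalArgumentException("invalid span: " + start + ".." + end);
        }
        return new int[] { offsets[start], offsets[end - 1] + 1 };
    }
}
```

For example, with the HTML `<p>Aspirin <b>reduces</b> fever</p>`, the plain text is `Aspirin reduces fever`, and the plain-text span 8..15 ("reduces") maps back to the HTML span 14..21. A span that crosses a tag boundary maps to an HTML range that includes the tag in between, so the caller has to decide how to highlight it.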
I've just published an integration module for using GridGain with Spring Batch. With this module you can distribute Spring Batch processing across a GridGain grid through an implementation of remote chunking.



