ClickStream [13], from OpenSymphony [12], is a user tracking component for Java web applications. It lets you inspect and analyze the traffic paths, that is, the sequences of pages that users request as they browse your site. Such a path is called a clickstream: the logical grouping of an HTTP session identifier and the requests associated with it, up to the end of that session. The good news is that you can easily add this feature to your application by embedding ClickStream, and so take advantage of this site usage information.
We'll look first at how ClickStream works and what information it collects. Then we'll proceed to configure your application to use ClickStream. Finally we'll log the ClickStream-generated information to a database and exploit it with standard database queries.
ClickStream starts tracking the user's activity as soon as the
web container creates an HTTP session. As you might know, the J2EE
specification defines a listener model. Session listeners are one
of the Servlet-2.3-specified listeners and are notified each time
an HTTP session is created or destroyed. ClickStream's session
listener is called ClickstreamListener, and it and ClickstreamFilter are the
fundamental components of ClickStream.
ClickStream's main element is a servlet filter called
ClickstreamFilter, which intercepts all the requests
to a defined web resource (a single page) or resource sets (a set
of pages) designated by a page pattern. Both components are
configured inside your application's web.xml file, but
don't worry about this yet; we'll look into the configuration of
the filter and the session listener in a later section.
For now, we'll take a look at the information gathered from each clickstream.
ClickStream logs specific information from each request and
accumulates it in the corresponding ClickStream
object. This object is the one sent to the
ClickStreamLogger for logging when the session ends.
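To make the accumulation step concrete, here is a minimal plain-Java sketch of a per-session object that collects requests in arrival order. The class name ClickstreamSketch and its methods are hypothetical stand-ins chosen for illustration; they are not ClickStream's actual classes, and the servlet API is deliberately left out so the example runs on its own.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the per-session tracking object described above.
// It is NOT the library's real class; names and fields are assumptions.
final class ClickstreamSketch {
    private final String sessionId;   // the HTTP session identifier
    private final String remoteHost;  // header data captured once per stream
    private final List<String> requests = new ArrayList<>();

    ClickstreamSketch(String sessionId, String remoteHost) {
        this.sessionId = sessionId;
        this.remoteHost = remoteHost;
    }

    // Called once per tracked request; arrival order is preserved.
    void addRequest(String uri, String queryString) {
        requests.add(queryString == null ? uri : uri + "?" + queryString);
    }

    String getSessionId() {
        return sessionId;
    }

    String getRemoteHost() {
        return remoteHost;
    }

    // Read-only view, so callers cannot alter the recorded sequence.
    List<String> getRequests() {
        return Collections.unmodifiableList(requests);
    }
}
```

A caller would create one such object per session, call addRequest for every tracked hit, and hand the whole object to a logger when the session ends.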
ClickStream uses the following information as the ClickStream
header data:
The remote host, obtained with request.getRemoteHost(). Remember that if your application runs behind a reverse proxy (a common scenario for firewalled applications), only the proxy's IP address is logged.
The request parameters, which follow the ? in the URL and are separated by &. ClickStream also logs the parameter data submitted via the HTTP POST method.
The remote user, obtained with the request.getRemoteUser() method.
The first step is to download the ClickStream distribution from the OpenSymphony site [12]. Then, to embed ClickStream into your application, add clickstream.jar and commons-logging.jar (if your project doesn't already include it) to the WEB-INF/lib directory of your WAR application.
Then edit your application's web.xml descriptor
with any text editor. You must add ClickStream's session listener
and filter. The filter is mapped to each resource wildcard you
want to track with ClickStream. For example, if you want to track
every page hit, you must define the /* wildcard. On the
other hand, if you want to record only the hits directed to the
/MyServlet path, use a /MyServlet/*
wildcard. See the servlet
specification [14] for more wildcard examples.
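As a rough illustration of how such wildcards select requests, the following sketch implements a simplified version of the servlet url-pattern rules: prefix patterns ending in /*, extension patterns starting with *., and exact matches. The class UrlPatternSketch is hypothetical, and the code assumes the path has already been stripped of the context path and query string; a real container does more than this.

```java
// Simplified, hypothetical model of servlet url-pattern matching.
// Assumes 'path' is relative to the web application and has no query string.
final class UrlPatternSketch {
    private UrlPatternSketch() {
    }

    static boolean matches(String pattern, String path) {
        if (pattern.endsWith("/*")) {
            // Prefix mapping: "/MyServlet/*" matches "/MyServlet" and anything below it.
            // For "/*" the prefix is empty, so every path matches.
            String prefix = pattern.substring(0, pattern.length() - 2);
            return path.equals(prefix) || path.startsWith(prefix + "/");
        }
        if (pattern.startsWith("*.")) {
            // Extension mapping: "*.jsp" matches any path ending in ".jsp".
            return path.endsWith(pattern.substring(1));
        }
        // Otherwise only an exact match counts.
        return pattern.equals(path);
    }
}
```

With these rules, /* selects every hit, /MyServlet/* selects /MyServlet and everything under it, and *.jsp selects JSP pages wherever they live.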
The following web.xml fragment maps the filter so that it records only the hits directed to JSP and HTML pages.
<filter>
  <filter-name>clickstream</filter-name>
  <filter-class>com.opensymphony.clickstream.ClickstreamFilter</filter-class>
</filter>
<filter-mapping>
  <filter-name>clickstream</filter-name>
  <url-pattern>*.jsp</url-pattern>
</filter-mapping>
<filter-mapping>
  <filter-name>clickstream</filter-name>
  <url-pattern>*.html</url-pattern>
</filter-mapping>
<listener>
  <listener-class>com.opensymphony.clickstream.ClickstreamListener</listener-class>
</listener>
The <filter-mapping> element associates the
ClickStream filter with both extensions.
After adding the listener and filter, you can put the included ClickStream JSPs into your web application's root directory. Both clickstreams.jsp and viewstream.jsp are needed to browse the ClickStream information online. Figure 1 illustrates clickstreams.jsp, which shows all the active clickstreams:
Figure 1. ClickStream clickstreams.jsp page
The clickstreams.jsp file lists all the active clickstreams of the application ordered by the remote host IP. When you click one of the host IPs, ClickStream's viewstream.jsp appears, as shown in Figure 2:
Figure 2. ClickStream viewstream.jsp page
These two pages let you browse the clickstreams that are still active and have not yet been stored in the database, which is very useful for a quick look. The next section shows how to set up ClickStream to log the user traffic data into a database for further processing and analysis.
By default, ClickStream uses the Commons Logging component to
write the tracking information to the console or to log files.
In this example, we'll use a custom ClickStreamLogger
to save the information to a database. First we'll configure
ClickStream to use our logger and then we'll create the
corresponding database schema.
ClickStream offers the ability to change the logging strategy by
creating a new logging class, which implements the
ClickStreamLogger interface, and configuring its use
in the clickstream.xml file located in the
WEB-INF/classes folder. You can find the
DatabaseClickStreamLogger custom database logger and
the sample clickstream.xml configuration file in the
included source code [11]. Our
clickstream.xml will look like this:
<clickstream>
  <logger class="net.java.cs.DatabaseClickStreamLogger"/>
  <bot-host name="inktomi.com"/>
  <!-- ... thousands of bots' names skipped for brevity -->
</clickstream>
The configuration of the logger is done through a database.properties file. This property file is also included in the sample code, and looks like this:
jdbc.driver.class=org.postgresql.Driver
jdbc.url=jdbc:postgresql://localhost/clickstream-db
jdbc.user=jdoe
jdbc.pass=secret
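For illustration, a custom logger could read these four keys with java.util.Properties before opening its JDBC connection. The sketch below is an assumption about how that might be done, not the article's DatabaseClickStreamLogger: the class DatabaseSettingsSketch and its parse method are hypothetical, and the text is parsed from a string so the example runs without a file or a database.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical holder for the four settings shown above.
final class DatabaseSettingsSketch {
    final String driverClass;
    final String url;
    final String user;
    final String password;

    private DatabaseSettingsSketch(String driverClass, String url, String user, String password) {
        this.driverClass = driverClass;
        this.url = url;
        this.user = user;
        this.password = password;
    }

    // Parses properties-file text and fails fast when a key is missing.
    static DatabaseSettingsSketch parse(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new DatabaseSettingsSketch(
                require(props, "jdbc.driver.class"),
                require(props, "jdbc.url"),
                require(props, "jdbc.user"),
                require(props, "jdbc.pass"));
    }

    private static String require(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException("Missing property: " + key);
        }
        return value.trim();
    }
}
```

Failing fast on a missing key surfaces configuration mistakes at startup rather than when the first session ends and the logger tries to connect.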
Just replace the JDBC driver class, URL, user, and password with the appropriate values for your database. With the configuration ready, let's create ClickStream's database schema. The database model is made up of only two tables: one with the clickstream header data, and the other with the detailed request information of each clickstream. Figure 3 shows the structure.
Figure 3. ClickStream DB schema [15]
Execute the included SQL script, clickstream.sql, to create the tables in your favorite database.
We are all set up; now when your application starts, it'll begin to log the clickstream information to your database. The following section shows how to exploit the tracking information using some very useful metrics.
The fact that we've stored the user tracking information inside a database server means that we can classify, measure, and manipulate it at will. Some metrics you'll find very useful are:
The requests made during a single session (select * from clickstream_requests where sessionid = 'nlggs2ccbeb2').
You can visualize the interactions by looking at the sequence diagram in Figure 4, which depicts the complete lifecycle of ClickStream inside your web application.
Figure 4. ClickStream lifecycle sequence diagram [16]
As the UML sequence diagram shows,
ClickStream activity starts when an HTTP request arrives. If no
HTTP session is associated with the request, the web
container creates one and calls the session listeners; in this case
ClickstreamListener is notified. ClickstreamListener generates a new
ClickStream object to collect the user's page trail and
stores it in the newly created session.
Then, if the request matches one of the resources defined by the
ClickstreamFilter wildcard inside the web.xml
file, the web container calls the ClickstreamFilter.
This filter adds the request information to the session's
ClickStream object. This cycle continues until the
session is explicitly invalidated or the session expires due to
user inactivity. Each page or resource the user requests is logged
into the same ClickStream object.
When ClickstreamListener is notified about the
end of a user session, it logs the ClickStream by
calling ClickStreamLogger. ClickStream configures
this component with a Jakarta Commons Logging Logger
by default, but this can be overridden with a custom
ClickStreamLogger, as we saw earlier.
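The whole cycle can be condensed into a small plain-Java simulation: a session starts, requests accumulate, and the stream goes to a pluggable logger when the session ends. Everything here is a hypothetical model (StreamLoggerSketch, LifecycleSketch, and their methods are names invented for this sketch); the real listener and filter are driven by the web container, which this code only imitates.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical logging strategy, standing in for the pluggable logger.
interface StreamLoggerSketch {
    void log(String sessionId, List<String> requests);
}

// Hypothetical model of the lifecycle; the container callbacks are simulated
// by plain method calls.
final class LifecycleSketch {
    private final Map<String, List<String>> active = new HashMap<>();
    private final StreamLoggerSketch logger;

    LifecycleSketch(StreamLoggerSketch logger) {
        this.logger = logger;
    }

    // Listener role: a new session gets an empty stream.
    void sessionCreated(String sessionId) {
        active.put(sessionId, new ArrayList<>());
    }

    // Filter role: each matching request is appended to the session's stream.
    void requestReceived(String sessionId, String uri) {
        List<String> stream = active.get(sessionId);
        if (stream != null) {
            stream.add(uri);
        }
    }

    // Listener role: on invalidation or timeout the stream is handed to the logger.
    void sessionDestroyed(String sessionId) {
        List<String> stream = active.remove(sessionId);
        if (stream != null) {
            logger.log(sessionId, stream);
        }
    }

    int activeCount() {
        return active.size();
    }
}
```

Because the logger is passed in, swapping console output for database storage changes only the object given to the constructor, which mirrors the idea of replacing the default logger with a custom one.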
Of course, you don't need to wait until the session expires to see the clickstream information gathered while the application is running. You can list your users' active clickstreams with the provided clickstreams.jsp page and browse their page hits with viewstream.jsp.
In this article we have covered the embedding of ClickStream into your web application, and we've seen how to exploit this information once stored in a database. Be aware that even slight user activity can generate a massive amount of tracking information, so it's highly recommended to do some pruning of this information every two or three days, depending on your users' activity. You'll probably encounter more uses for this information: finding unused pages and bottlenecks, spike predictions, etc.
Links:
[1] http://www.java.net/author/diego-naya
[2] http://www.java.net/article/2007/09/03/instant-user-tracking-clickstream
[3] http://www.java.net/article/2007/09/03/instant-user-tracking-clickstream#introducing-clickstream
[4] http://www.java.net/article/2007/09/03/instant-user-tracking-clickstream#understanding-the-clickstream-life-cycle
[5] http://www.java.net/article/2007/09/03/instant-user-tracking-clickstream#the-logged-clickstream-data
[6] http://www.java.net/article/2007/09/03/instant-user-tracking-clickstream#embedding-clickstream-into-your-application
[7] http://www.java.net/article/2007/09/03/instant-user-tracking-clickstream#logging-the-user-tracking-information-to-a-database
[8] http://www.java.net/article/2007/09/03/instant-user-tracking-clickstream#exploiting-the-user-tracking-information
[9] http://www.java.net/article/2007/09/03/instant-user-tracking-clickstream#under-the-hood
[10] http://www.java.net/article/2007/09/03/instant-user-tracking-clickstream#where-to-go-from-here
[11] http://www.java.net/article/2007/09/03/instant-user-tracking-clickstream#resources
[12] https://opensymphony.dev.java.net/
[13] https://clickstream.dev.java.net/
[14] http://java.sun.com/j2ee/tutorial/1_3-fcs/doc/Servlets8.html
[15] http://www.java.net/images/2007/09/figure4.PNG
[16] http://www.java.net/images/2007/09/figure3.png
[17] http://www.java.net/today/2007/09/06/clickstream-article-source-code.zip
[18] http://www.opensymphony.com/clickstream/