XQuery For Java, An Enabler For SOA
Portable data is a main concern in service-oriented architecture (SOA), but it is no longer rocket science, since XML has been doing the job perfectly well. Of more concern are the overall engineering steps involved in retrieving data from some persistent store (we do need a data store, as we cannot live with volatile data), massaging it, and then transforming it into a portable format (XML) that adheres to a schema agreed upon by both the consumer and the producer. Any number of combinations of steps can do this job, but in some cases we need to deal with questions like:
In this article we are going to talk about XQuery and its derivatives, including the XQJ (XQuery API for Java) specification, which is under development as JSR-225: XQuery API for Java. The first section of this article will introduce both XQuery and XQJ and equip the reader with some code and tools to get their hands dirty. Then we will revisit the questions raised above, taking a particular context as an example. We proceed by first understanding the real pain points experienced by developers in data transformations, and then we take the reader through a simple case study, again with some working code. Throughout the article we will use XQuery as implemented by Saxon to demonstrate the concepts in code. In doing so, we also introduce the Saxon SQL extensions, with the intention of setting reader expectations for a few forthcoming implementations.
XQuery is a declarative query language for XML, playing much the same role that SQL plays for relational data. XQuery 1.0 is a query language being developed by the W3C XML Query Working Group. At least a few of you should be familiar with the SAX and/or DOM APIs for manipulating XML data. We are happy with these APIs, and here we will look at how XQuery-based XQJ makes life even better for developers. XQJ will conform to the XQuery 1.0 specification and will define a set of interfaces and classes that enable an application to submit XQuery queries to an XML data source and process the results of these queries. XQJ will also facilitate submitting XPath 2.0 expressions to an XML data source. Unlike a general-purpose programming language such as Java or C#, which manipulates XML data through the SAX or DOM model, XQuery is specific to one domain: querying XML. Because of this, a single line written in an XML language like XSLT or XQuery can often produce the same effect as what may take hundreds of lines of Java, C#, or some other general-purpose language. XQuery is thus a declarative language, designed up front to work with XML data. Perhaps we should also look at how XQuery differs from its counterparts, XPath and XSLT.
XPath is optimized for accessing sections or parts of an XML document. Thus we can immediately use XPath if the requirement is just to select a node from within an XML document. But XPath cannot return a part of the selected node (like the node's element tag alone, omitting its content), and it cannot create new XML. XSLT includes XPath as a subset to address XML document parts and also includes many other features. XSLT can contain variables and namespaces and can create new documents. XSLT is optimized for recursively processing an XML document or translating XML into HTML, WML, VoiceXML, etc. But writing user-defined functions and other common operations is tedious in XSLT. XQuery scores here: it expresses joins and sorts directly, and it can manipulate sequences of values and nodes in arbitrary order, not just in document order. It is also easy to write user-defined and recursive functions in XQuery. An introductory XML.com article answers the question "What Is XQuery," and another one explains "Generating XML and HTML using XQuery."
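To make the contrast between general-purpose API code and a declarative expression concrete, here is a minimal sketch that collects the same titles twice: once by walking the DOM tree, and once with a single path expression. It uses only the XPath 1.0 engine bundled with the JDK (not XQuery or Saxon), and the class name, helper names, and sample document are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomVersusPath {
    // Hypothetical sample data, not the article's books.xml.
    static final String SAMPLE =
        "<BOOKLIST><BOOKS>"
      + "<ITEM><TITLE>First</TITLE></ITEM>"
      + "<ITEM><TITLE>Second</TITLE></ITEM>"
      + "</BOOKS></BOOKLIST>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Imperative style: walk the tree level by level with the DOM API.
    static List<String> titlesWithDom(Document doc) {
        List<String> titles = new ArrayList<String>();
        for (Element books : children(doc.getDocumentElement(), "BOOKS")) {
            for (Element item : children(books, "ITEM")) {
                for (Element title : children(item, "TITLE")) {
                    titles.add(title.getTextContent());
                }
            }
        }
        return titles;
    }

    // Returns the direct child elements of parent that have the given name.
    static List<Element> children(Element parent, String name) {
        List<Element> result = new ArrayList<Element>();
        NodeList nodes = parent.getChildNodes();
        for (int i = 0; i < nodes.getLength(); i++) {
            Node n = nodes.item(i);
            if (n.getNodeType() == Node.ELEMENT_NODE && name.equals(n.getNodeName())) {
                result.add((Element) n);
            }
        }
        return result;
    }

    // Declarative style: one path expression states what is wanted.
    static List<String> titlesWithPath(Document doc) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("/BOOKLIST/BOOKS/ITEM/TITLE", doc, XPathConstants.NODESET);
            List<String> titles = new ArrayList<String>();
            for (int i = 0; i < nodes.getLength(); i++) {
                titles.add(nodes.item(i).getTextContent());
            }
            return titles;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("could not evaluate path", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        System.out.println(titlesWithDom(doc));
        System.out.println(titlesWithPath(doc));
    }
}
```

Both methods return the same list. The difference is that the path version states the result it wants, while the DOM version spells out how to find it.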
XQJ defines a set of interfaces and classes that enables a Java application to submit XQuery queries to an XML data source and process the results of these queries. Queries may be executed against individual XML documents or collections of XML documents. The XQuery standard provides a great degree of freedom for implementers in how they choose to implement many of its features. This means different implementations can differ in how they handle a temporary intermediate result as long as the query produces the correct, final "answer." A few XQuery implementations are available now, amongst which Qexo is worth mentioning. Similarly, Saxon is a collection of XML processing tools by Saxonica for XSLT 2.0, XPath 2.0, XQuery 1.0, and XML Schema 1.0. Saxon also offers two other APIs for XQuery processing: Saxon's own native API, and an early implementation of the XQJ. Saxon is available for both the Java and .NET platforms as two packages: Saxon-B and Saxon-SA. Saxon-B and all its features are available under an open source license to all users, whereas Saxon-SA requires activation by a license key.
XQJ specifies that a data source may be obtained from a JNDI
source or through other means, but is not very clear on what those
"other means" may be. Once instantiated, an XQDataSource
can act as a factory for creating XQuery connection objects,
sequences, and items. The XQDataSource has three
overloaded getConnection() methods to get a
connection, as shown below:
public XQConnection getConnection() throws XQException;
public XQConnection getConnection(java.lang.String username,
        java.lang.String passwd) throws XQException;
public XQConnection getConnection(java.sql.Connection con)
        throws XQException;
The last one is promising because it attempts to create a
connection to an XML data source using an existing JDBC
connection. An XQJ implementation is not required to support this
method, but if it does, the XQJ and JDBC connections will operate
under the same transaction context. Once an XQConnection is
retrieved, we can call prepareExpression() to compile a query.
The resulting XQPreparedExpression object has a method called
executeQuery(), which evaluates the query and returns an
XQResultSequence, a specialization of XQSequence. The
sequence can act as a cursor, with a
next() method that allows us to change the cursor
position, and a getItem() method that allows us to
retrieve the item at the current position. The result of
getItem() is an XQItem object with
methods that allow us to determine the item type and convert the
item into a suitable Java object or value.
One issue with Saxon is that Saxon generally only recognizes its
own implementation of XQJ interfaces.
SaxonXQDataSource is Saxon's XQDataSource
and an XQJ client application has to instantiate a
SaxonXQDataSource directly. There is no factory class,
and hence an application that does not want compile-time references
to the Saxon XQJ implementation needs to instantiate this class
dynamically using the reflection API (e.g., with a call to
Class.newInstance()). We will look at the steps in executing an
XQuery using XQJ in the code listing below.
String content = null;
// Instantiate Saxon's XQDataSource implementation directly
XQDataSource ds = new SaxonXQDataSource();
/* or look the data source up through JNDI:
InitialContext ctx = new InitialContext();
XQDataSource ds = (XQDataSource) ctx.lookup("java:comp/env/ddxq/ds");
*/
XQConnection conn = ds.getConnection();
// Compile the query, then evaluate it
XQPreparedExpression exp =
    conn.prepareExpression("doc(\"books.xml\")/BOOKLIST/BOOKS/ITEM/TITLE");
XQResultSequence result = exp.executeQuery();
// Move the cursor through the result; content ends up holding the last item
while (result.next()) {
    content = result.getItemAsString();
}
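The reflective instantiation mentioned earlier, for applications that want no compile-time reference to Saxon, could look roughly like the sketch below. The helper class and method names are hypothetical, and the demonstration uses a JDK class as a stand-in so that it runs without Saxon. The Saxon class name in the comment is an assumption that may differ between Saxon releases.

```java
class ReflectiveFactory {
    // Loads a class by name and instantiates it through its public no-argument
    // constructor, so the caller only needs the interface type at compile time.
    static <T> T instantiate(String className, Class<T> expectedType) {
        try {
            Class<?> impl = Class.forName(className);
            return expectedType.cast(impl.getDeclaredConstructor().newInstance());
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        // Stand-in demonstration with a JDK class. With Saxon on the classpath,
        // the call would presumably look like (class name is an assumption):
        //   instantiate("net.sf.saxon.xqj.SaxonXQDataSource", XQDataSource.class)
        java.util.List<?> list = instantiate("java.util.ArrayList", java.util.List.class);
        System.out.println(list.getClass().getName());
    }
}
```

The class name can then come from a configuration file or system property, which keeps the choice of XQJ implementation out of the compiled code.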
We can't have a detailed discussion on the power of XQuery in an article like this, nor will we attempt to solve complex problems here. Instead, this section will introduce simple expressions and then hook them into XQJ to get the queries evaluated. For a detailed discussion of XQuery expressions, readers are directed to the books XQuery from the Experts and XQuery: Rough Cuts Version. For the discussion in this section, we will use this sample XML data. Let us now look at a few expressions and understand what they will fetch:
/BOOKLIST/BOOKS/ITEM/TITLE: Retrieves the titles of all the book items in the book list.
(/BOOKLIST/BOOKS/ITEM/TITLE)[2]: Retrieves the title of the second book item in the book list.
/BOOKLIST/BOOKS/ITEM[TITLE="The Big Over Easy"]/AUTHOR: Retrieves the author of the book item with title "The Big Over Easy" in the book list.
/BOOKLIST/BOOKS/ITEM/@CAT: Retrieves all the available categories of book items in the book list.
/BOOKLIST/BOOKS/ITEM[2]/*: Retrieves all the elements of the second book item in the book list.
BOOKLIST/BOOKS/ITEM/*/@*: Retrieves any attributes of any elements of book items in the book list.
To get the sample code working, download the attached
XQueryForJavaEnablerForSOASrc.zip file (see the Resources section for sample code), and unzip it
to some folder in your local file system. Go to the
PathExpressions directory, and type ant run,
which will print out the results of the above XQuery into the
console, as shown in Figure 1.
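For readers who want to try a few of these path expressions without the sample download, the following sketch evaluates them with the XPath 1.0 engine that ships with the JDK. This is not the Saxon-based sample code from the .zip file. The inline document is a hypothetical miniature of a book list, and the class and method names are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class PathExpressionDemo {
    // Hypothetical miniature book list; the article's real sample file is not reproduced here.
    static final String SAMPLE =
        "<BOOKLIST><BOOKS>"
      + "<ITEM CAT=\"MMP\"><TITLE>Pride and Prejudice</TITLE><AUTHOR>Jane Austen</AUTHOR></ITEM>"
      + "<ITEM CAT=\"H\"><TITLE>The Big Over Easy</TITLE><AUTHOR>Jasper Fforde</AUTHOR></ITEM>"
      + "</BOOKS></BOOKLIST>";

    // Evaluates a path expression against SAMPLE and returns the string value
    // of the first selected node (or an empty string if nothing is selected).
    static String first(String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return xpath.evaluate(expression, new InputSource(new StringReader(SAMPLE)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad expression: " + expression, e);
        }
    }

    public static void main(String[] args) {
        String[] expressions = {
            "/BOOKLIST/BOOKS/ITEM/TITLE",
            "(/BOOKLIST/BOOKS/ITEM/TITLE)[2]",
            "/BOOKLIST/BOOKS/ITEM[TITLE=\"The Big Over Easy\"]/AUTHOR",
            "/BOOKLIST/BOOKS/ITEM/@CAT"
        };
        for (String expression : expressions) {
            System.out.println(expression + " -> " + first(expression));
        }
    }
}
```

Because the helper returns only the first selected node, an expression that selects several nodes (such as the first one) shows just its first match here. A full XQuery processor would return the whole sequence.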
How many times in your life have you converted objects into XML format and vice versa? We have been doing this for many years, and continue today. Most of the time, the business tier exposes data as XML, either in SOAP format or in some other ad hoc XML format, in which case we don't care about the interoperability of our data with some client that is consuming the data. Needless to say, we have also been using relational databases for many years as our safe, transaction-aware, and concurrently accessible data stores. Hmm. Now we need an object-relational (OR) mapping tool (like Hibernate, TopLink, etc.) to convert our relational data to Java objects, and then some Java-XML binding tools (like Castor, XMLBeans, etc.) to convert Java objects to XML and vice versa. The full dynamic is shown in Figure 2:
Figure 2. Data transformation dynamics
At least some of you should be raising your eyebrows now about the relevance of the intermediate conversion of data to "objects." We will list out our usual justifications here:
So far so good. Now, we remember at least one requirement to build a Data Access Layer (DAL) over a relational database. The DAL in this case study has to function as the data provider for an Enterprise Service Bus (ESB) through which all kinds of clients (data consumers) will route their requests (queries). Since the normalized message format within the ESB is XML and no major processing needs to be done at the provider side, a feasible architecture is to make the data access layer a thin shim layer with minimum overhead. This layer will then retrieve data from the database and convert it into XML format. We first looked into ways by which SQL can be used to do this. SQL is a query language for relational data. Relational databases usually host unordered sets of "flat" rows, and SQL is best suited to operate on this data model. XML data structures, in contrast, contain hierarchical nodes, and XQuery is best suited to this data structure. Thus SQL as such cannot be used directly over XML data, nor is XQuery meant to act directly on relational data.
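The core job of such a shim layer, turning flat rows into hierarchical XML without an intermediate object model, can be sketched in a few lines of plain Java. The example below is only an illustration of the idea, not the approach used in the case study. It uses the JDK's StAX writer, and a list of maps stands in for a JDBC result set. The class name, method name, and the column values other than the customer ID are hypothetical.

```java
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

class RowsToXml {
    // Writes each row as a <rowTag> element whose children are named after the columns.
    static String toXml(String rootTag, String rowTag, List<Map<String, String>> rows) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            xml.writeStartElement(rootTag);
            for (Map<String, String> row : rows) {
                xml.writeStartElement(rowTag);
                for (Map.Entry<String, String> column : row.entrySet()) {
                    xml.writeStartElement(column.getKey());
                    xml.writeCharacters(column.getValue()); // escapes markup characters
                    xml.writeEndElement();
                }
                xml.writeEndElement();
            }
            xml.writeEndElement();
            xml.flush();
            xml.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not serialize rows", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // One in-memory row standing in for a row read from the customer table.
        Map<String, String> row = new LinkedHashMap<String, String>();
        row.put("CUSTOMERID", "452");
        row.put("CUSTOMERLASTNAME", "Smith & Sons"); // hypothetical value
        List<Map<String, String>> rows = new ArrayList<Map<String, String>>();
        rows.add(row);
        System.out.println(toXml("CUSTOMERS", "CUSTOMER", rows));
    }
}
```

Hand-written code like this works, but every schema change means another code change, which is the motivation for pushing the mapping into stylesheets or queries instead.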
Of course there is more than one way to do XML-relational
transformation, but let us look at how we can use the Saxon SQL
extensions for this. Using the Saxon SQL extensions, we can enhance
the capability of the processor to access SQL databases. The first
step in doing this is to define a namespace prefix (for example,
sql) in the extension-element-prefixes attribute of the
xsl:stylesheet element, and then to map this prefix to a
namespace URI that ends in
net.sf.saxon.sql.SQLElementFactory. Now we have seven
new stylesheet elements at our disposal to do SQL operations:
sql:connect
sql:query
sql:insert
sql:update
sql:delete
sql:column
sql:close
The sql:connect element returns a database connection
as a value, specifically a value of the type external object, which
can be referred to using the type
java:java.sql.Connection.
<xsl:param name="driver" select="'oracle.jdbc.driver.OracleDriver'"/>
<xsl:param name="database" select="'jdbc:oracle:thin:@127.0.0.1:1521:orcl'"/>
<xsl:param name="user">scott</xsl:param>
<xsl:param name="password">tiger</xsl:param>
<xsl:variable name="connection" as="java:java.sql.Connection"
        xmlns:java="http://saxon.sf.net/java-type">
    <sql:connect driver="{$driver}" database="{$database}"
            user="{$user}" password="{$password}"
            xsl:extension-element-prefixes="sql"/>
</xsl:variable>
Once the connection is retrieved, we can now do CRUD (create, read, update, delete) operations in the SQL database.
Figure 3. Customer order LineItem DB schema
Let us first look at how we can insert a few rows into the table shown in Figure 3. We will use the customer_insert.xml to demonstrate our insert operations. The simple Java code to do the insert operation is shown below:
import net.sf.saxon.Transform;

public class CustomerOrder {
    public void insert() {
        // Run Saxon's command-line driver with the source document
        // and the stylesheet as its two arguments
        Transform transformData = new Transform();
        String[] args = {"customer_insert.xml",
                         "customer_insertupdate.xsl"};
        transformData.doTransform(args, null);
    }
}
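The listing above drives Saxon through its command-line class. The standard JAXP API is another common way to run a stylesheet from Java. The sketch below shows it with the XSLT 1.0 processor built into the JDK, which knows nothing about the Saxon SQL extensions, so the stylesheet here is a hypothetical stand-in that only reads the customer ID from the input. The class and method names are also hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class JaxpTransformSketch {
    // Hypothetical stylesheet: outputs the first CUSTOMERID found in the input as text.
    static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"text\"/>"
      + "<xsl:template match=\"/\">"
      + "<xsl:value-of select=\"//CUSTOMERID\"/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Compiles the stylesheet, applies it to the input document, and returns the result.
    static String transform(String stylesheet, String inputXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(stylesheet)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(inputXml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String input = "<CUSTOMERORDERS><CUSTOMERORDER><CUSTOMER>"
                     + "<CUSTOMERID>452</CUSTOMERID>"
                     + "</CUSTOMER></CUSTOMERORDER></CUSTOMERORDERS>";
        System.out.println(transform(STYLESHEET, input));
    }
}
```

To use the SQL extension elements this way, a Saxon TransformerFactory would have to be selected instead of the JDK default. How that is configured depends on the Saxon release and is not covered here.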
The magic lies in the sql:insert tag in customer_insertupdate.xsl. Here, we first check
whether the customer is already present in the database; if not, we
do an insert operation:
<xsl:variable name="customerid" select="CUSTOMERID"/>
<xsl:variable name="customer-table">
    <sql:query connection="$connection" table="customer"
            where="CUSTOMERID='{$customerid}'" column="*"
            row-tag="CUSTOMERORDER" column-tag="col"/>
</xsl:variable>
<xsl:if test="count($customer-table//CUSTOMERORDER) = 0">
    <sql:insert table="customer" connection="$connection">
        <sql:column name="CUSTOMERID" select="CUSTOMERID"/>
        <sql:column name="CUSTOMERLASTNAME" select="CUSTOMERLASTNAME"/>
        <sql:column name="CUSTOMERFIRSTNAME" select="CUSTOMERFIRSTNAME"/>
        <sql:column name="CUSTOMEREMAIL" select="CUSTOMEREMAIL"/>
    </sql:insert>
</xsl:if>
The CustomerOrderLineItem folder in the attached .zip file
contains the code for this. Make sure to create the database tables
and make any relevant changes in the .xsl files to suit your
database settings (driver, URL, username, and password). Then
execute ant insert, which will create rows in relevant
tables in the database as shown in Figure 4.
Figure 4. XQuery-inserted data
For read we make use of sql:query. We use customer_query.xml
with the following XML content to pass the required query
parameters to customer_query.xsl.
<?xml version="1.0"?>
<CUSTOMERORDERS>
    <CUSTOMERORDER>
        <CUSTORDER>
            <ORDERID>456</ORDERID>
        </CUSTORDER>
    </CUSTOMERORDER>
</CUSTOMERORDERS>
The aim here is to retrieve all the order items for the
order with ID 456. Obviously, when you need to use these
techniques in your own applications, you may have to dynamically
generate those XML documents with query parameters instead of using
static XML files. The customer_query.xsl file has the following two
template match blocks:
<xsl:template match="CUSTOMERORDERS">
    <xsl:message>customer_query.xsl :
Connecting to <xsl:value-of select="$database"/>...</xsl:message>
    <xsl:message>customer_query.xsl : query records....</xsl:message>
    <xsl:apply-templates select="CUSTOMERORDER" mode="Query"/>
    <!-- Close the connection once, after all orders have been processed -->
    <sql:close connection="$connection"/>
</xsl:template>
<xsl:template match="CUSTOMERORDER" mode="Query">
    <xsl:variable name="orderid" select="CUSTORDER/ORDERID"/>
    <xsl:variable name="orderitem-table">
        <sql:query connection="$connection" table="ORDERITEM"
                where="ORDERID='{$orderid}'" column="*"
                row-tag="ORDERITEM" column-tag="col"/>
    </xsl:variable>
    <xsl:message>There are now <xsl:value-of
            select="count($orderitem-table//ORDERITEM)"/> orderitems.</xsl:message>
    <ORDER>
        <xsl:copy-of select="$orderitem-table"/>
    </ORDER>
</xsl:template>
The main match block is CUSTOMERORDER. Here we query the
ORDERITEM table and select all columns matching the query
parameter. We then display them to the console. Executing ant
query will demonstrate this as shown in Figure 5.
Figure 5. XQuery data
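A query-parameter document such as customer_query.xml does not have to be a static file. As a minimal sketch, assuming order IDs are numeric as in the sample, the document could be built at run time with the JDK's DOM API. The class and method names below are hypothetical.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class QueryDocumentBuilder {
    // Builds CUSTOMERORDERS/CUSTOMERORDER/CUSTORDER/ORDERID for the given order ID.
    static Document forOrder(String orderId) {
        // Assumption: order IDs are numeric. Rejecting anything else also keeps
        // unexpected text out of the SQL where clause built by the stylesheet.
        if (orderId == null || !orderId.matches("\\d+")) {
            throw new IllegalArgumentException("order ID must be numeric: " + orderId);
        }
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element orders = doc.createElement("CUSTOMERORDERS");
            Element order = doc.createElement("CUSTOMERORDER");
            Element custOrder = doc.createElement("CUSTORDER");
            Element id = doc.createElement("ORDERID");
            id.setTextContent(orderId);
            custOrder.appendChild(id);
            order.appendChild(custOrder);
            orders.appendChild(order);
            doc.appendChild(orders);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no DOM implementation available", e);
        }
    }

    // Serializes the document with an identity transformation, without an XML declaration.
    static String serialize(Document doc) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("could not serialize document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize(forOrder("456")));
    }
}
```

When the stylesheet is run through JAXP, the in-memory document can be handed over as a DOMSource, so it never needs to be written to disk.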
The ant update command will use customer_update.xml to demonstrate the
update operations. The notable change here is that the customer
email has been changed from
<CUSTOMEREMAIL>sowmya.hubert<{at}>ustri.com</CUSTOMEREMAIL>
in customer_insert.xml to
<CUSTOMEREMAIL>hubertsowmya<{at}>yahoo.co.in</CUSTOMEREMAIL>
in customer_update.xml. As earlier, we first check whether
the customer is already present in the database; if the customer
exists, we do an update instead of insert, using
sql:update. Figure 6 shows the result.
<xsl:if test="count($customer-table//CUSTOMERORDER) > 0">
    <sql:update table="customer" connection="$connection"
            where="CUSTOMERID='{$customerid}'">
        <sql:column name="CUSTOMERLASTNAME" select="CUSTOMERLASTNAME"/>
        <sql:column name="CUSTOMERFIRSTNAME" select="CUSTOMERFIRSTNAME"/>
        <sql:column name="CUSTOMEREMAIL" select="CUSTOMEREMAIL"/>
    </sql:update>
</xsl:if>
Figure 6. XQuery updated data
Again, since we don't want to complicate the examples, we'll only do
a simple table delete using sql:delete. For this,
we just pass an empty XML DELETE element as a command
in customer_delete.xml, as shown below:
<?xml version="1.0"?>
<CUSTOMERORDERS>
    <DELETE></DELETE>
</CUSTOMERORDERS>
Now, customer_delete.xsl will contain the required
sql:delete command to empty the tables one by one,
which is shown below:
<xsl:template match="CUSTOMERORDERS">
    <xsl:apply-templates select="DELETE" />
</xsl:template>
<xsl:template match="DELETE">
    <sql:delete table="address" connection="$connection" />
    <sql:delete table="orderitem" connection="$connection" />
    <sql:delete table="custorder" connection="$connection" />
    <sql:delete table="customer" connection="$connection" />
</xsl:template>
It is also possible to hook into external .xq files like customerorder.xq. Here, we will use an XML document (customer_insert.xml) as the data source. The XQuery is listed below:
xquery version "1.0";
declare copy-namespaces no-preserve, inherit;
declare variable $custid as xs:integer external;
declare variable $ordid as xs:integer external;
for $customerorder in //CUSTOMERORDERS/CUSTOMERORDER,
    $customer in $customerorder/CUSTOMER,
    $order in $customerorder/CUSTORDER,
    $orderitem in $order/ORDERITEM
where $customer/CUSTOMERID = $custid and $order/ORDERID = $ordid
order by string-length($customer/CUSTOMERID), string-length($order/ORDERID)
return <customerorder>
    <customer>
        { $customer/CUSTOMERID }
        { $customer/CUSTOMERFIRSTNAME }
        { $customer/CUSTOMERLASTNAME }
        { $customer/CUSTOMEREMAIL }
    </customer>
    <order>
        { $order/ORDERID }
        { $order/ORDERDATE }
        <orderitem>
            { $orderitem/ITEMID }
            { $orderitem/NUMBER }
            { $orderitem/INSTRUCTIONS }
        </orderitem>
    </order>
</customerorder>
We have to pass customer ID and order ID as parameters to query the details. This we do from our Java code as follows:
// Compile the query from the external .xq file
final Configuration config = new Configuration();
final StaticQueryContext sqc = new StaticQueryContext(config);
final XQueryExpression exp = sqc.compileQuery(
        new FileReader("customerorder.xq"));
// Use customer_insert.xml as the context item, that is, the data source
final DynamicQueryContext dynamicContext = new DynamicQueryContext(config);
dynamicContext.setContextItem(sqc.buildDocument(
        new StreamSource("customer_insert.xml")));
// Bind the external variables $custid and $ordid
dynamicContext.setParameter("custid", new Long(452));
dynamicContext.setParameter("ordid", new Long(461));
final SequenceIterator iter = exp.iterator(dynamicContext);
// Serialize the result to the console, indented and without an XML declaration
Properties props = new Properties();
props.setProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
props.setProperty(OutputKeys.INDENT, "yes");
QueryResult.serializeSequence(iter, config, System.out, props);
Typing ant queryXQ will run the sample and Figure 7 shows the
query results.
Figure 7. Query using external XQ file
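The idea of binding external variables such as $custid and $ordid from Java can also be tried without Saxon. The sketch below is an analogy only: it uses the JDK's XPath 1.0 engine and its variable resolver rather than XQuery. The inline document is a hypothetical miniature of customer_insert.xml, and the class name, method name, and email address are invented.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.namespace.QName;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import javax.xml.xpath.XPathVariableResolver;
import org.xml.sax.InputSource;

class ExternalVariablesSketch {
    // Hypothetical miniature of the customer order document.
    static final String SAMPLE =
        "<CUSTOMERORDERS><CUSTOMERORDER>"
      + "<CUSTOMER><CUSTOMERID>452</CUSTOMERID>"
      + "<CUSTOMEREMAIL>someone@example.com</CUSTOMEREMAIL></CUSTOMER>"
      + "<CUSTORDER><ORDERID>461</ORDERID></CUSTORDER>"
      + "</CUSTOMERORDER></CUSTOMERORDERS>";

    // Returns the email of the customer matching both IDs, or "" if there is no match.
    static String emailFor(long custId, long ordId) {
        final Map<String, Object> values = new HashMap<String, Object>();
        values.put("custid", Double.valueOf(custId)); // XPath 1.0 numbers are doubles
        values.put("ordid", Double.valueOf(ordId));

        XPath xpath = XPathFactory.newInstance().newXPath();
        // The resolver supplies a value whenever the expression references $name.
        xpath.setXPathVariableResolver(new XPathVariableResolver() {
            public Object resolveVariable(QName name) {
                return values.get(name.getLocalPart());
            }
        });
        try {
            return xpath.evaluate(
                "/CUSTOMERORDERS/CUSTOMERORDER"
              + "[CUSTOMER/CUSTOMERID = $custid and CUSTORDER/ORDERID = $ordid]"
              + "/CUSTOMER/CUSTOMEREMAIL",
                new InputSource(new StringReader(SAMPLE)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("could not evaluate expression", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(emailFor(452, 461));
    }
}
```

Unlike the XQuery version, this can only select existing nodes. It cannot construct the new customerorder structure that the query's return clause builds.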
Going by our case-study objective, we've now realized XML generation from a relational store and we hope you will agree that we haven't written much Java code for this. Of course, we have XSLT code, but it only increases the system flexibility since we are no longer constrained by the specifics of relational schema. If the table schema changes, it is just a matter of updating the respective XSLT files.
The XML data generated here is arbitrary, but we can leverage XML Schema to enable B2B participants to express shared vocabularies and allow machines to carry out rules made by people.
The next step is to expose XML data for consumption. We will not go into further detail here since this is outside the scope of this article. Still, there are multiple options available as below (please note that this list is not exhaustive):
This article introduced the concepts of XQuery and XQJ. Accepting the fact that we're ignoring some significant issues like benchmarking performance, we have a working data access layer. As we have already mentioned, there are multiple products available in both the open source and commercial XML worlds. All of them, among other things, are trying to ease XML operations, especially bridging the XML and relational worlds. Even though the above case study implementation is based on direct XQuery, XQJ is supposed to be even more powerful and brings new promise, especially when it comes to performing operations against data stores (look at Jonathan Bruce and Jonathan Robie's XQJ tutorial for more information). DataDirect XQuery is also worth mentioning here, and the quest for such frameworks is on the rise, since we can rarely find an enterprise-class application without the need to work on XML data. The promise of some of these new-generation frameworks is also to do transactional ACID operations on XML-based databases. These transactions can even be a part of a bigger, global transaction. If so, developers can cheer up, since a lot of code would go away: code that would otherwise be transforming relational data to objects to XML and back.