Suggestion

19 replies [Last post]
mohitanchlia
Offline
Joined: 2006-04-24
Points: 0

I have a simple question, for a given xml below:

---

one

two

three

---
Parse the xml and write out every book node except the one whose title is "two". So the xml that the app writes out should look like:
---

one

three

---
What would be the best way of doing this (SAX/DOM)? Any links to examples would help.

joehw
Offline
Joined: 2004-12-15
Points: 0

If the xml file is small, using DOM is easier since the DOM parser returns the complete xml document as a tree structure. You can go through the nodes and remove the one that contains the text, in this case "two". The following statement would do the trick:
bookNode.getParentNode().removeChild(bookNode);

After that, you may use a transformer to write the DOM document back into a file:
xformer.transform(new DOMSource(doc), new StreamResult(file));
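
Putting the two statements above together, a minimal sketch could look like the following. The root element name "books", the extra nested elements, and the class and method names are assumptions made for illustration, since the original sample xml is not fully visible here. The point is that removeChild takes out the whole subtree, so nothing about the children of "book" has to be known in advance.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class RemoveBookDom {

    // Hypothetical sample input; the real element names may differ.
    static final String SAMPLE =
        "<books>"
        + "<book><title>one</title></book>"
        + "<book><title>two</title><extra><deep>x</deep></extra></book>"
        + "<book><title>three</title></book>"
        + "</books>";

    // Removes every <book> whose first <title> descendant has the given text,
    // then serializes whatever is left of the document.
    static String removeBooks(String xml, String unwantedTitle) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // getElementsByTagName returns a live list, so collect the matches
        // first and remove them in a second pass.
        NodeList books = doc.getElementsByTagName("book");
        List<Element> toRemove = new ArrayList<>();
        for (int i = 0; i < books.getLength(); i++) {
            Element book = (Element) books.item(i);
            NodeList titles = book.getElementsByTagName("title");
            if (titles.getLength() > 0
                    && unwantedTitle.equals(titles.item(0).getTextContent().trim())) {
                toRemove.add(book);
            }
        }
        for (Element book : toRemove) {
            book.getParentNode().removeChild(book); // drops the whole subtree
        }

        // A transformer created without a stylesheet copies the source as-is.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(removeBooks(SAMPLE, "two"));
    }
}
```

To write to a file instead of a string, pass new StreamResult(file) as in the statement above.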

If the xml file is very big, using StAX would be more efficient. Take a look at the event reader/writer. It may be necessary to cache a few events before the Characters event "two" is found, and so all the events starting from the "book" start element to the end element may be eliminated.
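
As a rough sketch of the event caching described above: the version below buffers all events of one "book" at a time and only forwards them once the title is known not to match. The element names, the root element "books", and the class and method names are assumptions for illustration. Buffering a whole book is the simplest variant; memory use is then bounded by the largest single book rather than by the whole file.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.events.XMLEvent;

class RemoveBookStax {

    // Hypothetical sample input; the real element names may differ.
    static final String SAMPLE =
        "<books>"
        + "<book><title>one</title></book>"
        + "<book><title>two</title><extra>x</extra></book>"
        + "<book><title>three</title></book>"
        + "</books>";

    static String removeBooks(String xml, String unwantedTitle) throws Exception {
        XMLEventReader reader = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
        StringWriter out = new StringWriter();
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);

        List<XMLEvent> buffer = null;      // non-null while inside a <book>
        int depth = 0;                     // element depth relative to the current <book>
        boolean inTitle = false;           // true between <title> and </title>
        boolean drop = false;              // true once the unwanted title was seen
        StringBuilder titleText = new StringBuilder();

        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();

            if (buffer == null) {
                boolean startsBook = event.isStartElement()
                        && "book".equals(event.asStartElement().getName().getLocalPart());
                if (!startsBook) {
                    writer.add(event);     // outside any book: pass straight through
                    continue;
                }
                buffer = new ArrayList<>();
                buffer.add(event);
                depth = 1;
                drop = false;
                continue;
            }

            buffer.add(event);             // inside a book: hold the event back
            if (event.isStartElement()) {
                depth++;
                if ("title".equals(event.asStartElement().getName().getLocalPart())) {
                    inTitle = true;
                    titleText.setLength(0);
                }
            } else if (event.isCharacters() && inTitle) {
                // Text may arrive in several Characters events, so accumulate it.
                titleText.append(event.asCharacters().getData());
            } else if (event.isEndElement()) {
                depth--;
                if (inTitle && "title".equals(event.asEndElement().getName().getLocalPart())) {
                    inTitle = false;
                    if (unwantedTitle.equals(titleText.toString().trim())) {
                        drop = true;
                    }
                }
                if (depth == 0) {          // reached </book>: flush or discard
                    if (!drop) {
                        for (XMLEvent buffered : buffer) {
                            writer.add(buffered);
                        }
                    }
                    buffer = null;
                }
            }
        }
        writer.flush();
        writer.close();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(removeBooks(SAMPLE, "two"));
    }
}
```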

Hope that helps.
Joe

mohitanchlia
Offline
Joined: 2006-04-24
Points: 0

Using DOM, how can I write out the entire tree structure after removing "two"? The reason I ask is that I know there is a "title" element, but there could be more children, or children's children, under the "book" element. In DOM, is there a way to write the complete XML without knowing the tree structure?

joehw
Offline
Joined: 2004-12-15
Points: 0

You can use the transformer API. A transformer can be used to process XML from a variety of sources and write the transformation output to a variety of results. In this case, your source is a DOMSource and the result is a StreamResult. Performing the transformation effectively writes the whole Document out to a file:

String outputFile = "out_books.xml";
Source source = new DOMSource(document);
Result result = new StreamResult(new File(outputFile));

Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.transform(source, result);

Hope that works for you.

Joe

Message was edited by: joehw

mohitanchlia
Offline
Joined: 2006-04-24
Points: 0

As you pointed out, to get selected nodes I need to use XSLT. Since I am new to XSLT, I thought of first trying out an example from J2EETutorial.pdf. I am getting an exception; here is what I did:

article.xml
---



Title of my (Docbook) article


Title of Section 1
This is a paragraph


----
article.xsl
--








<xsl:apply-templates/>








---
Transform.java
--
import java.io.*;
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.stream.*;
import javax.xml.transform.dom.*;
import org.w3c.dom.*;

// using DOM
public class Transform {

    public static void main(String... argv) {
        try {
            File f = new File(argv[0]);
            if (!f.exists()) { System.out.println("File doesn't exist\n"); return; }
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(f);
            TransformerFactory tFactory = TransformerFactory.newInstance();
            Transformer transformer = tFactory.newTransformer(new StreamSource(argv[1]));
            transformer.transform(new DOMSource(doc), new StreamResult(new FileOutputStream("o.out")));
        } catch (Exception e) { e.printStackTrace(); }
    }
}
----
Exception:
$ java Transform article.xml article.xsl
[Fatal Error] article.xml:1:20: A pseudo attribute name is expected.
org.xml.sax.SAXParseException: A pseudo attribute name is expected.
at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:264)
at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:292)
at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:172)
at Transform.main(Transform.java:18)
----

joehw
Offline
Joined: 2004-12-15
Points: 0

Try correcting in article.xml to .

mohitanchlia
Offline
Joined: 2006-04-24
Points: 0

Thanks, that works. Now I am trying to do what I asked for in my original post:

Input XML
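---
<?xml version="1.0"?>
<Article>
<ArtHeader>
<Title>Title of my (Docbook) article</Title>
<Body>Body of my (Docbook) article</Body>
</ArtHeader>
<ArtHeader>
<Title>Title of my (Docbook1) article</Title>
<Body>Body of my (Docbook1) article</Body>
</ArtHeader>
</Article>
---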
If Title has Docbook1 then don't print the complete node. Output should look like:

<Article>
<ArtHeader>
<Title>Title of my (Docbook) article</Title>
<Body>Body of my (Docbook) article</Body>
</ArtHeader>
</Article>
---
I don't see a way this could be done using XSLT, because if I define the XPath /ArtHeader/Title then there is no way to refer back to ArtHeader. How do I include or exclude the complete parent node?

joehw
Offline
Joined: 2004-12-15
Points: 0

This expression will do the trick for you:

--Joe

mohitanchlia
Offline
Joined: 2006-04-24
Points: 0

Thanks, this looks like it would work. But when I do the following, I still get the text printed even though it doesn't match the criteria:







<xsl:apply-templates/>








----
I get:

Title of my (Docbook) article


Title of Section 1
This is a paragraph

Title of my (Docbook1) article

Title of Section 1
This is a paragraph


---
I even tried to remove all the xsl tags, but still I get the text with no tags.

joehw
Offline
Joined: 2004-12-15
Points: 0

I'm confused about the xml/xsl files you're using: this xsl is a little different from the one in your last post, and I can't tell exactly what your xml file looks like.

Anyway, the expression in my last post would virtually remove any ArtHeader that has a child element "Title" with a text containing Docbook1.
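
To illustrate how a predicate lets an expression select the parent based on its child's text, here is a small sketch using the JAXP XPath API. The expression strings are an assumption about the general shape of such a pattern, not the exact expression from the earlier post, and the class and method names are made up for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ParentByChildText {

    // Sample modeled on the Article/ArtHeader/Title/Body xml in this thread.
    static final String XML =
        "<Article>"
        + "<ArtHeader><Title>Title of my (Docbook) article</Title>"
        + "<Body>Body of my (Docbook) article</Body></ArtHeader>"
        + "<ArtHeader><Title>Title of my (Docbook1) article</Title>"
        + "<Body>Body of my (Docbook1) article</Body></ArtHeader>"
        + "</Article>";

    // Returns how many nodes the given XPath expression selects in XML.
    static int count(String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList hits = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
        return hits.getLength();
    }

    public static void main(String[] args) throws Exception {
        // The predicate in [...] tests the Title child, but the step that is
        // selected is still the ArtHeader parent itself.
        System.out.println(count("/Article/ArtHeader[contains(Title, 'Docbook1')]"));
        System.out.println(count("/Article/ArtHeader[not(contains(Title, 'Docbook1'))]"));
        // contains() is a substring test: 'Docbook' also matches 'Docbook1'.
        System.out.println(count("/Article/ArtHeader[contains(Title, 'Docbook')]"));
    }
}
```

The same predicate syntax can be used in the match attribute of an xsl:template, which is how a stylesheet can address the whole ArtHeader rather than only its Title.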

mohitanchlia
Offline
Joined: 2006-04-24
Points: 0

I changed the xsl to add the expression that you suggested in your previous post. That's the only change I made. Here is what I did:

1. I included the expression that you suggested and ran the app. When I run the app I get:
----

Title of my (Docbook) article


Title of Section 1
This is a paragraph

Title of my (Docbook1) article

Title of Section 1
This is a paragraph


---
Note: "Title of my (Docbook) article" still shows in the output but without "" tags. I think it's being excluded, but it's still being printed somewhere else by the transformer by default.

2. I changed the xsl to:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="xml"/>
</xsl:stylesheet>

Even with no template functions I get all the text values. So I think even though the expression in bullet 1. above excludes it from being matched, it's still being printed. How can I eliminate it from being printed?

Please let me know if I am not clear in what I am doing.

joehw
Offline
Joined: 2004-12-15
Points: 0
Re: Suggestion, March 26, 2008 - 20:34 (#11)

I don't have the xml file you're presenting now (with SECT etc.). Using the original xml you provided in an earlier post, I've created an xsl file that would transform:

<?xml version="1.0"?>
<Article>
<ArtHeader>
<Title>Title of my (Docbook) article</Title>
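<Body>Body of my (Docbook) article</Body>
</ArtHeader>
<ArtHeader>
<Title>Title of my (Docbook1) article</Title>
<Body>Body of my (Docbook1) article</Body>
</ArtHeader>
</Article>

to:

<Article>
<ArtHeader>
<Title>Title of my (Docbook) article</Title>
<Body>Body of my (Docbook) article</Body>
</ArtHeader>
</Article>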

Here's the xsl:













<xsl:apply-templates/>




Joe

mohitanchlia
Offline
Joined: 2006-04-24
Points: 0

Thanks a lot. That works. But for my own understanding, I am trying to figure out what I was doing wrong. Why was I still getting those nodes in spite of having a match condition on "Docbook"? Why was the transformer not able to exclude the tags? I'll paste everything that I am doing:

1. xml file:




Title of my (Docbook) article


Title of Section 1
This is a paragraph


Title of my (Docbook1) article


Title of Section 1
This is a paragraph

2. xsl file







<xsl:apply-templates/>







3. Output:

Title of my (Docbook) article


Title of Section 1
This is a paragraph

Title of my (Docbook1) article

Title of Section 1
This is a paragraph

P.S. Note that "Title of my (Docbook) article" is still being generated (but without tags), which really means there is something that I don't understand about the transformer.

joehw
Offline
Joined: 2004-12-15
Points: 0

That template rule basically says: find a match and output text. Try removing everything but:


You'll see what I mean.

In your xsl, you specified:

<xsl:apply-templates/>

So when there's a match for the above, you tell the transformer to output text in the TITLE element. So you'd get:
Title of my (Docbook1) article

But you didn't specify a template for the ArtHeaders which contain Docbook. That's why the transformer just outputs the text of that element.
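
The behaviour described above comes from XSLT's built-in template rules: an element with no matching template just has its children processed, and a text node with no matching template is copied to the output. A minimal sketch, with a made-up class name and a shortened sample xml, shows that a stylesheet with no templates still prints all the text, and that an empty template for text() silences it:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class BuiltInRules {

    static final String XML =
        "<Article><ArtHeader>"
        + "<Title>Title of my (Docbook) article</Title>"
        + "</ArtHeader></Article>";

    // No templates at all: only the built-in rules apply.
    static final String NO_TEMPLATES =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "</xsl:stylesheet>";

    // One empty template that overrides the built-in rule for text nodes.
    static final String SUPPRESS_TEXT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='text()'/>"
        + "</xsl:stylesheet>";

    // Applies the stylesheet text to the xml text and returns the result.
    static String transform(String xslt, String xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Text appears without any tags, because nothing copies the elements.
        System.out.println("[" + transform(NO_TEMPLATES, XML) + "]");
        // With text() matched by an empty template, the text is gone too.
        System.out.println("[" + transform(SUPPRESS_TEXT, XML) + "]");
    }
}
```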

Joe

mohitanchlia
Offline
Joined: 2006-04-24
Points: 0

Thanks. Correct me if I am wrong: what your xsl really means is that if ArtHeader contains Docbook then print the ArtHeader, Title, and Body elements, even though the templates are not nested. So it is basically a kind of if / else-if structure. But why isn't it nested? Something like:






<xsl:apply-templates/>





The XSLT above reads more naturally to me: if 'Docbook1' is the text, then print the ArtHeader, Title ... tags. I remember reading somewhere that nested templates are not allowed. Sorry for all the questions; I am just trying to clear up the basics. I went through the J2EE tutorial but it doesn't have this kind of information. I guess you only get it through experience.

joehw
Offline
Joined: 2004-12-15
Points: 0

The way XSLT works is that it reads the xml file and builds a tree. It then goes through the nodes in the tree and, for each node, applies the template that matches best (the most specific one). So templates are like rules that XSLT uses to process the current node. There is no concept of nested templates, as you pointed out. Think of them as rules: as XSLT processes your xml file, you tell it which rules to apply to the nodes you're interested in.

However, you could use a recursive template to do what you wanted, something like:







which makes a copy of the current node, copies its attributes if any, and then processes all of its children. Since the xsl:copy element only handles the node itself, you would also need a template to output the text nodes:

. So the complete templates look like this:











Joe
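
For readers following along, here is a hedged sketch of what such a set of templates could look like, run from Java. It follows the description above (copy the node, copy its attributes, process the children, plus a template for text nodes) and adds an empty template for the ArtHeader to drop; the exact stylesheet text, the class name, and the sample xml are assumptions, not the stylesheet from the post.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class RecursiveCopyFilter {

    static final String XML =
        "<Article>"
        + "<ArtHeader><Title>Title of my (Docbook) article</Title>"
        + "<Body>Body of my (Docbook) article</Body></ArtHeader>"
        + "<ArtHeader><Title>Title of my (Docbook1) article</Title>"
        + "<Body>Body of my (Docbook1) article</Body></ArtHeader>"
        + "</Article>";

    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        // Recursive rule: copy the element, its attributes, then its children.
        + "<xsl:template match='*'>"
        + "<xsl:copy><xsl:copy-of select='@*'/><xsl:apply-templates/></xsl:copy>"
        + "</xsl:template>"
        // Text nodes are written out as they are.
        + "<xsl:template match='text()'><xsl:value-of select='.'/></xsl:template>"
        // More specific rule with an empty body: matching ArtHeaders produce nothing.
        + "<xsl:template match=\"ArtHeader[contains(Title, 'Docbook1')]\"/>"
        + "</xsl:stylesheet>";

    // Applies the stylesheet text to the xml text and returns the result.
    static String transform(String xslt, String xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform(XSLT, XML));
    }
}
```

Because the pattern with a predicate has a higher default priority than the generic match on '*', the empty template wins for the Docbook1 header, and no element names need to be hard-coded for everything else.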

mohitanchlia
Offline
Joined: 2006-04-24
Points: 0

Thanks a lot for going extra mile to explain the recursive template. Can I ask you 2 more questions:

1. - Does it just copy, or does it copy the nodes and also print them like apply-templates? The reason I am asking is that in "" there is no apply-templates. Sorry if it's a stupid question.
2. Is there a performance impact of using copy? And why would copy be used over the traditional way of applying rules?

Thanks again

joehw
Offline
Joined: 2004-12-15
Points: 0

You're very welcome.

What xsl:copy does is to create a copy of the current node. The template you mentioned above selects any text nodes. Since a text node does not have any children, we would simply make a copy of its text.

Since xsl:copy just makes a copy of the current node, which has already been read by the transformer, there should be no performance penalty. xsl:copy eliminates the need for hard-coded element names like the ones we used earlier in this thread. So it can be used in a more general way to copy the original xml elements with no additional (hand-coded) text injected.

From the sample code above, we already know both would work. We just need to decide which method suits the application requirement better.

Hope that helps.
Joe

mohitanchlia
Offline
Joined: 2006-04-24
Points: 0

Thanks so much. I guess that's it for now.

joehw
Offline
Joined: 2004-12-15
Points: 0

You're welcome. Let us know how your application goes.