Tuesday, May 11, 2010

XSLT, XQuery, XPath tooling

I'm always on the lookout for nice tooling. Preferably small and free (as in beer, though I like the other free too). Not because I'm cheap (though I am), but because those are the sort of tools I can use without having to go through the bureaucratic hell associated with getting the boss or customer to pay for software licenses.

Here are two of my favorite free tools for working with XSLT, XQuery and XPath.

Architag XRay XML Editor. Technically a general purpose XML editor, but it really shines when you're writing XSLTs. As you're writing your XSLT, you can set an input document, and it'll continually evaluate your XSLT against that document. That way, the second you make a change to the XSLT, you'll get to see the effect it has. Awesome!

Besides this, it continually informs you of invalidities in your XML in a very unobtrusive way (no popups or any of that nonsense), and will automatically perform Schema validations if you have the right schema open. (No need to explicitly associate documents with schemas.)

Sadly, it only supports XSLT 1.0. Still, that's the version I usually have to use at work, so XRay still comes in handy very often.

Kernow. Kernow's stated goal is "to make it faster and easier to repeatedly run transforms using Saxon." It does so admirably, but it also has very convenient sandboxes for performing XSLT 2.0 or XQuery transforms, and XML Schema or Schematron validations. Maybe not yet as convenient as XRay, but still quite nice. Definitely recommended!

Sunday, May 09, 2010

On WebLogic deployment plans

Last week I was struggling to get an MQ adapter configured using a WebLogic deployment plan. I must have been especially low on brain power that day, as they're really not all that complex. Still, they are tricky if you've never seen them before (as I hadn't), so here are some pointers to help you out.

These deployment plans turn out to be part of an implementation of JSR-88. Don't try to read that spec unless you feel significantly more masochistic than I do though. Suffice it to say that - under WebLogic, at least - you can use them to provide server-specific configuration information to an application.

You may want to use a plan if your config is included in a jar/war/ear/etc. file; for instance in an included web.xml. Having config in such a place can be a pain, as you typically don't want to change those *ar files after QA has signed off on them. A deployment plan lets you work around this problem by telling the server to pretend that the config was actually a bit (or entirely) different from what those files in the *ar said.

The way this actually works is somewhat unintuitive, but it does work. Under WebLogic, you use an XML document that conforms to this schema. (I can't comment on other platforms, as I have only tried this on WebLogic.)

The most interesting part of this schema is the module-override element. With this element, we can specify what descriptors to update, and in what modules (rar, jar, etc.). For instance:

<module-override>
  <module-name>MQSeriesAdapter.rar</module-name>
  <module-type>rar</module-type>
  <module-descriptor external="false">
    <root-element>weblogic-connector</root-element>
    <uri>META-INF/weblogic-ra.xml</uri>
    <variable-assignment>
      ...
    </variable-assignment>
  </module-descriptor>
</module-override>

This config will update the META-INF/weblogic-ra.xml descriptor in MQSeriesAdapter.rar. The actual fields being updated are specified in the variable-assignment block. (You can have multiple of those, by the way.)

Those variable-assignment blocks are where things get confusing. Before I explain, allow me to show you their basic form.

<variable-assignment>
  <name>name</name>
  <xpath>pseudoXPath</xpath>
  <operation>add</operation> <!-- Can also be remove or replace. -->
</variable-assignment>

The idea is that the data from the "name" element gets assigned to the location in the descriptor (i.e. META-INF/weblogic-ra.xml in the example) indicated by the "xpath" element. The "operation" element specifies whether this value must be added, removed or replaced. (The default is "add", and in this case, any XML elements that are specified in the xpath tag, but missing in the target document, will be added.)

This is where things start getting confusing. First of all, you can't actually specify the value you want to assign in the "name" element. There is an element of indirection here in that you must assign this value to a variable first, and then you can refer to this variable here. Sound confusing? I sure thought it was.

Here's an example of how it works:

<variable-definition>
  <variable>
    <name>myVar</name>
    <value>niftyValue</value>
  </variable>
</variable-definition>
<module-override>
  <module-name>MQSeriesAdapter.rar</module-name>
  <module-type>rar</module-type>
  <module-descriptor external="false">
    <root-element>weblogic-connector</root-element>
    <uri>META-INF/weblogic-ra.xml</uri>
    <variable-assignment>
      <name>myVar</name>
      <xpath>pseudoXPath</xpath>
    </variable-assignment>
  </module-descriptor>
</module-override>

In the example we assign the value "niftyValue" via the variable myVar. Ridiculously circumlocutious (word of the day), but it works.
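
For context, the variable-definition and module-override elements sit side by side under the plan's root element. Here is a minimal sketch of what a complete plan file might look like. The root namespace URI and the application-name value are assumptions, not taken from the example above: the namespace shown is the one used by more recent WebLogic releases (older ones used a bea.com URI), so check it against the schema for your version.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: the namespace URI and application-name are assumptions;
     verify both against the schema of your WebLogic version. -->
<deployment-plan xmlns="http://xmlns.oracle.com/weblogic/deployment-plan">
  <application-name>MQSeriesAdapter</application-name>

  <!-- Step 1: declare the variable and give it a value. -->
  <variable-definition>
    <variable>
      <name>myVar</name>
      <value>niftyValue</value>
    </variable>
  </variable-definition>

  <!-- Step 2: point the variable at a location in a descriptor. -->
  <module-override>
    <module-name>MQSeriesAdapter.rar</module-name>
    <module-type>rar</module-type>
    <module-descriptor external="false">
      <root-element>weblogic-connector</root-element>
      <uri>META-INF/weblogic-ra.xml</uri>
      <variable-assignment>
        <name>myVar</name>
        <xpath>pseudoXPath</xpath>
      </variable-assignment>
    </module-descriptor>
  </module-override>
</deployment-plan>
```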

There's something even more confusing though. Have you noticed how I put the value "pseudoXPath" in every "xpath" element so far? This is because despite what the name suggests, this element doesn't actually accept proper XPath!

JSR-88 restricts the XPaths allowed to just those that contain only ".", "..", "/", and tag names. I suppose it makes sense to restrict the allowed expressions a bit, considering that the expression can be used to create a new element. (In which case the plan processor must manipulate the XML until the XPath expression evaluates to a node. Could be tricky for complex XPaths.) Still, as it stands now, it's probably a bit too restrictive.

Fortunately, WebLogic seems to accept an enhanced syntax, but this makes it deviate even more from the XPath standard. I haven't yet found proper documentation of the syntax, but I know that it accepts XPath predicates with equality comparisons. Unlike proper XPath though, they must be preceded by a slash ("/"). So you can have a construction like
book/[title="Lord of the Rings"]
to select only books with a title element with value "Lord of the Rings".

Don't bother with namespaces in these XPaths, by the way. WebLogic seems to ignore namespaces when processing them, which I guess is usually pretty convenient. Yet another way in which it differs from true XPath, however.

Putting it all together, you could end up with something like this:

<variable-definition>
  <variable>
    <name>port</name>
    <value>1414</value>
  </variable>
</variable-definition>
<module-override>
  <module-name>MQSeriesAdapter.rar</module-name>
  <module-type>rar</module-type>
  <module-descriptor external="false">
    <root-element>weblogic-connector</root-element>
    <uri>META-INF/weblogic-ra.xml</uri>
    <variable-assignment>
      <name>port</name>
      <xpath>/weblogic-connector/outbound-resource-adapter/connection-definition-group/[connection-factory-interface="javax.resource.cci.ConnectionFactory"]/connection-instance/[jndi-name="eisMQ/MQAdapter"]/connection-properties/properties/property/[name="portNumber"]/value</xpath>
    </variable-assignment>
  </module-descriptor>
</module-override>
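
To see what that long xpath is pointing at, here is a sketch of the part of META-INF/weblogic-ra.xml it walks through. The element names and the predicate values come straight from the xpath above; the layout is reconstructed from that path alone, the namespace declaration is omitted, and everything else a real descriptor would contain is left out.

```xml
<!-- Sketch reconstructed from the xpath; not a complete descriptor. -->
<weblogic-connector>
  <outbound-resource-adapter>
    <connection-definition-group>
      <!-- matched by /[connection-factory-interface="..."] -->
      <connection-factory-interface>javax.resource.cci.ConnectionFactory</connection-factory-interface>
      <connection-instance>
        <!-- matched by /[jndi-name="eisMQ/MQAdapter"] -->
        <jndi-name>eisMQ/MQAdapter</jndi-name>
        <connection-properties>
          <properties>
            <property>
              <!-- matched by /[name="portNumber"] -->
              <name>portNumber</name>
              <!-- the plan assigns the "port" variable (1414) here -->
              <value>1414</value>
            </property>
          </properties>
        </connection-properties>
      </connection-instance>
    </connection-definition-group>
  </outbound-resource-adapter>
</weblogic-connector>
```

Each predicate step picks one element out of several possible siblings, which is why the plain JSR-88 syntax (tag names only) wouldn't be enough as soon as the descriptor holds more than one connection instance or property.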

By now you should have a pretty good idea of how these plans work. If you want to know how to deploy them, then you could do worse than to check out this helpful blog.

Wednesday, May 05, 2010

Humble Indie Bundle

Wow. Five indie classics for literally whatever you want to pay for them. And you get to decide how much of whatever you want to pay goes to charity, and how much to the developers. Nothing to any middle-men except for the payment processor. (Your choice of Amazon, Google, or PayPal.) Also, no DRM, and all titles are available for Windows, Linux, and Mac.

If ever you said that you wouldn't pirate if only prices weren't so high, if only developers/distributors weren't so evil, if only your favorite platform was supported, or if only there wasn't any DRM, then here's your chance to show that you meant it!

Check out the Humble Indie Bundle.

P.S. Also check it out if you never said anything like those things above. :-)

Saturday, May 01, 2010

XSLT copies and sequences

And yet another post! Is it a trend or an aberration? Only time will tell.

In a previous post, I offered some solutions for problems that arise from having to deal with nodes that are functionally identical, yet still different. Sort of how two cars can be absolutely identical in terms of brand, model, year, color, etc. and yet still remain two distinct cars. (Just try to argue with the tax man that those two identical cars are actually one and the same.)

This problem can arise very easily when you deal with variables in XSLT. Consider the following:

<xsl:variable name="foo">
  <a/>
</xsl:variable>

<xsl:variable name="bar">
  <a/>
</xsl:variable>

If you compare the nodes in these variables with "$foo/a is $bar/a", the result will be "false", indicating that while these nodes may look awfully identical, XPath doesn't consider them to be the same node. And XPath does have a point, because these <a> elements will be distinct copies in memory.
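
If you'd like to see this for yourself, a small stylesheet along these lines should do. It's only a sketch: it assumes an XSLT 2.0 processor (Saxon, for instance), can be run against any input document, and uses deep-equal() (not mentioned above) to contrast identity with equality of content.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Two variables with identical-looking, but separately created, content. -->
  <xsl:variable name="foo"><a/></xsl:variable>
  <xsl:variable name="bar"><a/></xsl:variable>

  <xsl:template match="/">
    <!-- Identity: are these the very same node? Prints "false". -->
    <xsl:value-of select="$foo/a is $bar/a"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Content: do the nodes look the same? Prints "true". -->
    <xsl:value-of select="deep-equal($foo/a, $bar/a)"/>
  </xsl:template>
</xsl:stylesheet>
```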

In fact, you may not realize just how many copies your XSLTs are making. It's not just these hard coded bits of XML in variables, it's also any time you use an xsl:copy, xsl:copy-of, or xsl:element, as well as when you use included content in an xsl:variable, xsl:param, or xsl:with-param without a type declaration. (And some other, less common constructs as well.)

Not only can this be most inconvenient (for instance when you want to use set operators), but you may also be wasting machine resources in your performance critical application.

Fortunately, it's relatively easy to reduce the number of copies. You just have to know the tricks of the trade. And those are just what I'm going to tell you right now.

Don't duplicate hard coded XML content
Whenever you have hard coded XML content, this will result in nodes being created. If you create the same XML content in multiple places (such as we did in the foo and bar variables earlier), those will be duplicates. We could have avoided this by just copying foo to bar, like so:

<xsl:variable name="bar" select="$foo"/>

Now XSLT will create a new node for foo, but not for bar, as the latter will simply point to the same node that was already created for foo.

Replace xsl:copy-of with xsl:sequence
Unlike xsl:copy-of, xsl:sequence can return existing nodes. And since everything is a sequence anyway in XSLT 2.0 (including the result of xsl:copy-of), there's really no reason to not just use xsl:sequence instead of xsl:copy-of. The same goes for xsl:copy's without children, but those tend to be uncommon.

So rather than this:
<xsl:copy-of select="//baz"/>

Use this:
<xsl:sequence select="//baz"/>

Simple!

Make sure xsl:variable, xsl:param, xsl:with-param have either a "select" or an "as" attribute (or both)
The elements xsl:variable, xsl:param, and xsl:with-param always make a copy of their content, unless you specify either the "select" attribute, or the "as" attribute (or both).

Whenever possible, use the select attribute, as in those cases you'll never get a copy. If that's not possible, and you really do have to use the element content, you can specify the variable's type with the "as" attribute. In such a case, XSLT will not force the copy to be made. However, if you use hard coded XML, or an xsl:copy-of in the variable content, then those'll still result in copies!

So this is good:
<xsl:variable name="baz" select="//bazElem"/>

As is:
<xsl:variable name="baz" as="node()">
  <xsl:sequence select="//bazElem"/>
</xsl:variable>

But this is going to create a new node in any case:
<xsl:variable name="baz" as="node()">
  <bazElem/>
</xsl:variable>

And any nodes here will also be copies:
<xsl:variable name="baz" as="node()">
  <xsl:copy-of select="//bazElem"/>
</xsl:variable>

Pro-tip: if you're unsure about the type of your variable, just specify "item()*". That'll allow any sort of sequence.
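
The difference between the two variable styles can be checked with the "is" operator. The sketch below assumes an XSLT 2.0 processor and an input document containing at least one bazElem element; the variable names byRef and byCopy are made up for the illustration, and "node()*" is used as the type so that any number of matches is allowed.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- xsl:sequence plus "as": the variable holds the original nodes. -->
    <xsl:variable name="byRef" as="node()*">
      <xsl:sequence select="//bazElem"/>
    </xsl:variable>

    <!-- xsl:copy-of: the variable holds freshly made copies. -->
    <xsl:variable name="byCopy" as="node()*">
      <xsl:copy-of select="//bazElem"/>
    </xsl:variable>

    <!-- Same node as in the input document? Prints "true". -->
    <xsl:value-of select="$byRef[1] is (//bazElem)[1]"/>
    <xsl:text>&#10;</xsl:text>
    <!-- A copy is a different node. Prints "false". -->
    <xsl:value-of select="$byCopy[1] is (//bazElem)[1]"/>
  </xsl:template>
</xsl:stylesheet>
```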

And that's all you need to know to get rid of most of those unnecessary copies. :-)