Tuesday, June 01, 2010

Making XML Schema less of a pain by parsing text with XSLT

Allow me to get to the point immediately. XML Schema can be a royal pain.

Don't get me wrong; I'm glad it exists. It's powerful, serves a clear purpose, is well-supported, yadda, yadda, yadda. Unfortunately, it's also quite complex, has a lot of pitfalls (elementFormDefault!), and is terribly verbose.

For instance, would you rather have this:


http://blog.jwbroek.com/nifty-namespace
thingamabob        ; This is a comment.
  foo xsd:string
  bar xsd:boolean  ; Set to true to enable bar.
  baz
    alice
      count xsd:integer?  ; Count is optional.
      description  ; Type defaults to string.
    bobs           ; List of 0 or more bobs.
      bob xsd:boolean*
    charles +      ; At least one charles.


Or this:


<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:tns="http://blog.jwbroek.com/nifty-namespace"
            xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            elementFormDefault="qualified"
            attributeFormDefault="unqualified"
            targetNamespace="http://blog.jwbroek.com/nifty-namespace">
   <xsd:element name="thingamabob">
      <xsd:annotation>
         <xsd:documentation>This is a comment.</xsd:documentation>
      </xsd:annotation>
      <xsd:complexType>
         <xsd:sequence>
            <xsd:element name="foo" type="xsd:string"/>
            <xsd:element name="bar" type="xsd:boolean">
               <xsd:annotation>
                  <xsd:documentation>Set to true to enable bar.</xsd:documentation>
               </xsd:annotation>
            </xsd:element>
            <xsd:element name="baz">
               <xsd:complexType>
                  <xsd:sequence>
                     <xsd:element name="alice">
                        <xsd:complexType>
                           <xsd:sequence>
                              <xsd:element name="count" type="xsd:integer" minOccurs="0">
                                 <xsd:annotation>
                                    <xsd:documentation>Count is optional.</xsd:documentation>
                                 </xsd:annotation>
                              </xsd:element>
                              <xsd:element name="description" type="xsd:string">
                                 <xsd:annotation>
                                    <xsd:documentation>Type defaults to string.</xsd:documentation>
                                 </xsd:annotation>
                              </xsd:element>
                           </xsd:sequence>
                        </xsd:complexType>
                     </xsd:element>
                     <xsd:element name="bobs">
                        <xsd:annotation>
                           <xsd:documentation>List of 0 or more bobs.</xsd:documentation>
                        </xsd:annotation>
                        <xsd:complexType>
                           <xsd:sequence>
                              <xsd:element name="bob" type="xsd:boolean" minOccurs="0" maxOccurs="unbounded"/>
                           </xsd:sequence>
                        </xsd:complexType>
                     </xsd:element>
                     <xsd:element name="charles" type="xsd:string" maxOccurs="unbounded">
                        <xsd:annotation>
                           <xsd:documentation>At least one charles.</xsd:documentation>
                        </xsd:annotation>
                     </xsd:element>
                  </xsd:sequence>
               </xsd:complexType>
            </xsd:element>
         </xsd:sequence>
      </xsd:complexType>
   </xsd:element>
</xsd:schema>


Both describe the same XML structure, but if you ask me, the first one is much clearer, and much quicker to write as well.
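
To make that shared structure concrete, here is a small instance document that should validate against the Schema above. The values are made up purely for illustration. Note that count is left out (it's optional), bobs holds two bob elements, and charles appears twice.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative sample only; all element values are invented. -->
<!-- The default namespace matches the targetNamespace, because the Schema uses elementFormDefault="qualified". -->
<thingamabob xmlns="http://blog.jwbroek.com/nifty-namespace">
   <foo>hello</foo>
   <bar>true</bar>
   <baz>
      <alice>
         <!-- count is optional, so it is omitted here. -->
         <description>An alice without a count.</description>
      </alice>
      <bobs>
         <!-- Zero or more bob elements are allowed. -->
         <bob>true</bob>
         <bob>false</bob>
      </bobs>
      <!-- At least one charles is required; its type defaults to string. -->
      <charles>first</charles>
      <charles>second</charles>
   </baz>
</thingamabob>
```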

Granted, we're not using any of the fancy bells and whistles of XML Schema here. However, this would be quite sufficient for most of the things I see Schema being used for.

Wouldn't it be nice if you could actually write your Schemas using the first syntax?

Well, you're in luck: you can! The Schema above was entirely generated by applying the XSLT below to the simple syntax at the top. Hope you'll enjoy it as much as I do. :-)

(Tip: use Kernow to execute the XSLT. Put your input in C:\dev\projects\schemagen\test\input.txt, or override the input-file parameter to use a file of your choice.)


<!--
Copyright 2010 J.W. van den Broek

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:jws="http://blog.jwbroek.com/xslt/xsd/functions"
   exclude-result-prefixes="#all">
  
   <xsl:output indent="yes"/>
  
   <!-- Override this to read your file. -->
   <xsl:param name="input-file" select="'file:///C:/dev/projects/schemagen/test/input.txt'"/>
  
   <xsl:template match="/">
      <!-- Sequence of all non-empty lines in the input. Splits on CRLF, CR, or LF line endings. -->
      <xsl:variable name="lines" select="tokenize(unparsed-text($input-file),'\r\n|\r|\n')[not(matches(.,'^\s*$'))]"/>
     
      <!-- Create the schema. Make the root schema element here, taking the target namespace from the first line of input. -->
      <xsd:schema elementFormDefault="qualified" attributeFormDefault="unqualified" targetNamespace="{$lines[1]}">
         <xsl:namespace name="tns" select="$lines[1]"/>
         <!-- Pass all other lines on to the element-declarations function, which will create the element declarations. -->
         <xsl:sequence select="jws:element-declarations(subsequence($lines,2,count($lines)-1))"/>
      </xsd:schema>
   </xsl:template>
  
   <!-- Create element declarations based on lines of input. -->
   <xsl:function name="jws:element-declarations" as="element()*">
      <xsl:param name="rawLines" as="xsd:string*"/>
     
      <!-- Only continue if we have lines of input remaining. -->
      <xsl:if test="exists($rawLines)">
         <!-- Take the indentation from the first line. We'll create declarations for all elements with this level of indentation. -->
         <!-- We'll recursively create declarations for elements at higher indentation. -->
         <xsl:variable name="curIndent" select="replace($rawLines[1],'^(\s*).+$','$1')"/>
         <!-- Remove the base indentation from all lines. The elements we're going to make declarations for now have no indentation. -->
         <xsl:variable name="lines" select="for $l in $rawLines return substring-after($l, $curIndent)"/>
         <!-- Determine indices for all elements without indentation. We'll use this info to efficiently access the right lines of input. -->
         <xsl:variable name="indicesAtRoot" select="index-of((for $l in $lines return matches($l, '^\i+.*')), true())"/>
         <!-- Contains the root indices, but also the end of input. We'll use this to create subsequences for our recursive calls. -->
         <xsl:variable name="indicesAndBound" select="$indicesAtRoot, count($lines)+1"/>
        
         <!-- Create declarations for all root elements. (And recursively all child elements as well.) -->
         <xsl:for-each select="$indicesAtRoot">
            <!-- Current line of input. -->
            <xsl:variable name="curLine" select="$lines[current()]"/>
            <!-- Name of current element. -->
            <xsl:variable name="name" select="replace($curLine,'^(\i\c*).*$','$1')"/>
            <!-- Type of current element. May be empty, in which case we'll use xsd:string as default later on. -->
            <xsl:variable name="type" select="replace($curLine,'^\i\c*\s*([^?*+;\s]*)?.*$','$1')"/>
            <!-- Occurrence of current element. ?: optional, *: 0 or more, +: 1 or more. Empty is XSD default (1). -->
            <xsl:variable name="occurrence" select="replace($curLine,'^[^?*+;]*(\?|\*|\+).*$','$1')"/>
            <!-- Documentation. Will go into a documentation annotation. -->
            <xsl:variable name="doc" select="replace($curLine,'^[^;]+(;\s*(.*))?$','$2')"/>
            <!-- Current position in the $indicesAtRoot sequence. -->
            <xsl:variable name="pos" select="position()"/>
            <!-- Select the subsequence of all lines that contain children of the current element. -->
            <xsl:variable name="children" select="subsequence($lines, $indicesAndBound[$pos]+1, $indicesAndBound[$pos+1] - $indicesAndBound[$pos] - 1)"/>
           
            <!-- Create the element declaration. -->
            <xsd:element name="{$name}">
               <!-- No type attribute if there are children; those get an inline complex type declaration instead. -->
               <xsl:if test="empty($children)">
                  <xsl:choose>
                     <!-- On empty type, we default to string. -->
                     <xsl:when test="$type = ''">
                        <xsl:attribute name="type" select="'xsd:string'"/>
                     </xsl:when>
                     <xsl:otherwise>
                        <xsl:attribute name="type" select="$type"/>
                     </xsl:otherwise>
                  </xsl:choose>
               </xsl:if>
              
               <!-- Set minOccurs and maxOccurs. -->
               <xsl:choose>
                  <xsl:when test="$occurrence='?'">
                     <xsl:attribute name="minOccurs" select="'0'"/>
                  </xsl:when>
                  <xsl:when test="$occurrence='*'">
                     <xsl:attribute name="minOccurs" select="'0'"/>
                     <xsl:attribute name="maxOccurs" select="'unbounded'"/>
                  </xsl:when>
                  <xsl:when test="$occurrence='+'">
                     <xsl:attribute name="maxOccurs" select="'unbounded'"/>
                  </xsl:when>
               </xsl:choose>
              
               <!-- Set documentation annotation. -->
               <xsl:if test="$doc != ''">
                  <xsd:annotation>
                     <xsd:documentation>
                        <xsl:sequence select="$doc"/>
                     </xsd:documentation>
                  </xsd:annotation>
               </xsl:if>
              
               <!-- Recursively do child declarations. -->
               <xsl:if test="exists($children)">
                  <xsd:complexType>
                     <xsd:sequence>
                        <xsl:sequence select="jws:element-declarations($children)"/>
                     </xsd:sequence>
                  </xsd:complexType>
               </xsl:if>
            </xsd:element>
         </xsl:for-each>
      </xsl:if>
   </xsl:function>
  
</xsl:stylesheet>
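
If you'd rather not edit the stylesheet just to change the input path, one option is a tiny wrapper that imports it and redeclares the parameter. This is only a sketch under a couple of assumptions: schemagen.xsl is a hypothetical file name for the stylesheet above, and the file: URI is a made-up example path. The redeclared parameter wins because it has higher import precedence than the one in the imported stylesheet.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

   <!-- Assumption: the stylesheet above is saved as schemagen.xsl next to this wrapper. -->
   <xsl:import href="schemagen.xsl"/>

   <!-- Redeclare the parameter; the path is an example, so replace it with your own file. -->
   <xsl:param name="input-file" select="'file:///home/me/schemas/thingamabob.txt'"/>

</xsl:stylesheet>
```

Since the main template matches the document root, the transformation still needs a source document to kick it off. The stylesheet never reads that document, so any well-formed XML file should do.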
