A common problem: you have XML files in a semi-fixed format. There is no document type definition (DTD) and no XML schema definition (XSD), just some "agreed-on" XML structure. You want to load a bunch of those files into Java and work with them, ideally by mapping them to Java classes or beans. Castor allows this; see http://www.castor.org/

Install Castor 1.2. You'll need the complete version with source code; otherwise, some of the dependencies seem to be missing, e.g., velocity-1.5.jar. The scripts to run are found in <CASTOR>/bin as .sh and/or .bat scripts, e.g., classpath.bat/.sh, which is used in the following.
Step 1: Generating a schema definition
Castor can generate an XSD file from an XML instance file. This schema may be neither complete nor correct, yet it is a good starting point. You may have to patch it, either to remove nodes that have no well-defined structure or to add others that don't appear in the selected instance file.

    classpath.bat org.exolab.castor.xml.schema.util.XMLInstance2Schema input.xml [output.xsd]

If no output file is given, the schema is written to standard out. Alternatively, you can use the class from your own code:
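    import java.io.FileOutputStream;
    import java.io.PrintWriter;
    import java.io.Writer;
    import org.exolab.castor.xml.schema.Schema;
    import org.exolab.castor.xml.schema.util.XMLInstance2Schema;
    import org.exolab.castor.xml.schema.writer.SchemaWriter;

    XMLInstance2Schema instance2Schema = new XMLInstance2Schema();
    Schema schema = instance2Schema.createSchema("input.xml");
    System.out.println(schema);

    // copied from XMLInstance2Schema#main
    Writer dstWriter = new PrintWriter(new FileOutputStream("output.xsd"), true);
    SchemaWriter schemaWriter = new SchemaWriter(dstWriter);
    schemaWriter.write(schema);
    dstWriter.flush();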
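Since the generated schema only knows what the chosen instance file shows, it can help to check which elements other instance files contain that the sample lacks. Here is a minimal sketch of such a check, using only the JDK's SAX parser (it is not part of Castor). The class name ElementPaths and the element names top, subItem, someId and p are made up for illustration.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of Castor.
class ElementPaths {

    /** Returns every distinct element path (e.g. /top/subItem/p) found in the document. */
    static Set<String> collect(Reader xml) {
        final Set<String> paths = new TreeSet<String>();
        final StringBuilder current = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                current.append('/').append(qName);
                paths.add(current.toString());
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                // Drop the last path segment again.
                current.setLength(current.lastIndexOf("/"));
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(new InputSource(xml), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return paths;
    }

    /** Paths that occur in 'other' but never in 'sample'. */
    static Set<String> missingFrom(Set<String> sample, Set<String> other) {
        Set<String> missing = new TreeSet<String>(other);
        missing.removeAll(sample);
        return missing;
    }

    public static void main(String[] args) {
        // Made-up documents; in practice you would pass FileReaders for real instance files.
        String sample = "<top><subItem><someId>1</someId></subItem></top>";
        String other = "<top><subItem><someId>2</someId><p>text</p></subItem></top>";
        Set<String> samplePaths = collect(new StringReader(sample));
        Set<String> otherPaths = collect(new StringReader(other));
        System.out.println(missingFrom(samplePaths, otherPaths)); // prints [/top/subItem/p]
    }
}
```

Any path reported here is a candidate for a manual addition to the generated schema.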
Step 2: Patch the generated schema
Often, changes to the generated schema file are necessary. The input.xml may, for example, contain a set of nodes whose structure is not really agreed on, changes regularly, or differs widely between instance files. In our case, it was some HTML-formatted text that was just barely made XML-compatible by making sure each <p> also had a closing </p>; not even XHTML, I'd say. So, we replaced a complex node structure

    <sequence>
      <element name="p">
        <complexType>
          <all>
            <element name="i">
              <complexType mixed="true">
                <sequence>
                  [...]

with a simple

    <element name="p" type="xsd:anyType" />
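To check that such a patch really accepts the loosely structured content, you can validate an instance against the patched schema before generating any classes. Here is a minimal sketch using the JDK's javax.xml.validation API (not Castor). The tiny schema and the element name subItem are made-up stand-ins for the real output-patched.xsd.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of Castor.
class PatchedSchemaCheck {

    // Made-up miniature of a patched schema: <p> may contain anything.
    static final String PATCHED_XSD =
          "<xsd:schema xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\">"
        + "  <xsd:element name=\"subItem\">"
        + "    <xsd:complexType>"
        + "      <xsd:sequence>"
        + "        <xsd:element name=\"p\" type=\"xsd:anyType\"/>"
        + "      </xsd:sequence>"
        + "    </xsd:complexType>"
        + "  </xsd:element>"
        + "</xsd:schema>";

    /** Returns true if the XML text validates against the schema text. */
    static boolean accepts(String xsd, String xml) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("The schema itself is broken", e);
        }
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the instance violates the schema
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String html = "<subItem><p>Some <i>italic</i> and <b>bold</b> text</p></subItem>";
        System.out.println(accepts(PATCHED_XSD, html)); // prints true
    }
}
```

For real files, the two StringReader sources would be replaced by StreamSource objects created from the schema file and the instance file.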
Step 3: Generate the Java classes
In the next step, Castor generates Java classes from the schema definition. Again, this can be done either with the sourceGen.bat provided with Castor, or programmatically via org.exolab.castor.builder.SourceGeneratorMain.main(new String[] {param1, param2, ...}).
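For the programmatic route, here is a minimal sketch of a wrapper. It builds the same arguments as the sourceGen.bat call used in this step and invokes SourceGeneratorMain through reflection, so it still compiles and runs (with a message only) when Castor is not on the classpath. The class name GenerateSources and the method buildArgs are my own, not part of Castor.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper, not part of Castor.
class GenerateSources {

    /** Builds the argument list: -i <schema> -package <pkg> -dest <dir> -f -types j2 */
    static String[] buildArgs(String schemaFile, String packageName, String destDir) {
        List<String> args = new ArrayList<String>();
        args.add("-i");
        args.add(schemaFile);
        args.add("-package");
        args.add(packageName);
        args.add("-dest");
        args.add(destDir);
        args.add("-f");
        args.add("-types");
        args.add("j2");
        return args.toArray(new String[0]);
    }

    public static void main(String[] ignored) throws Exception {
        String[] args = buildArgs("output-patched.xsd", "my.package.name", "src");
        try {
            // Same effect as calling SourceGeneratorMain.main(args) directly,
            // but without a compile-time dependency on the Castor jars.
            Class<?> generator = Class.forName("org.exolab.castor.builder.SourceGeneratorMain");
            Method main = generator.getMethod("main", String[].class);
            main.invoke(null, (Object) args);
        } catch (ClassNotFoundException e) {
            System.err.println("Castor is not on the classpath; arguments would have been: "
                    + String.join(" ", args));
        }
    }
}
```

With the Castor jars on the classpath you can of course call SourceGeneratorMain.main(args) directly and drop the reflection.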
    sourceGen.bat -i output-patched.xsd -package my.package.name -dest src -f -types j2

-f suppresses any non-fatal warnings, including the one about overwriting existing files. -types j2 uses java.util.List for collections, or even List<Type> when Java 5.0 is configured as below. For each type Type in the schema, a my.package.name.Type Java file is generated, along with a my.package.name.descriptors.TypeDescriptor for Castor's own use.

I also put a castorbuilder.properties file into the current directory, which contained
    # Defines the XML parser to be used by Castor.
    # The parser must implement org.xml.sax.Parser.
    org.exolab.castor.parser=org.xml.sax.helpers.XMLReaderAdapter

    # Defines the (default) XML serializer factory to use by Castor, which must
    # implement org.exolab.castor.xml.SerializerFactory; default is
    # org.exolab.castor.xml.XercesXMLSerializerFactory
    org.exolab.castor.xml.serializer.factory=org.exolab.castor.xml.XercesJDK5XMLSerializerFactory

    # Defines the default XML parser to be used by Castor.
    org.exolab.castor.parser=com.sun.org.apache.xerces.internal.parsers.SAXParser

    org.exolab.castor.builder.javaVersion=5.0

See also: the Castor Source-Generation documentation.
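Note that the properties file sets org.exolab.castor.parser twice. Assuming Castor reads the file through java.util.Properties (I have not checked its loader, so treat this as an assumption), the later entry silently replaces the earlier one. Here is a minimal sketch that shows this behaviour with the JDK alone; the class name DuplicateKeyCheck is made up.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper; demonstrates java.util.Properties only, not Castor's own loader.
class DuplicateKeyCheck {

    /** Parses properties text the way Properties.load does. */
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        String text =
              "org.exolab.castor.parser=org.xml.sax.helpers.XMLReaderAdapter\n"
            + "org.exolab.castor.parser=com.sun.org.apache.xerces.internal.parsers.SAXParser\n"
            + "org.exolab.castor.builder.javaVersion=5.0\n";
        Properties props = load(text);
        // The second definition of the key wins.
        System.out.println(props.getProperty("org.exolab.castor.parser"));
        // prints com.sun.org.apache.xerces.internal.parsers.SAXParser
    }
}
```

If that is not the parser you intended, remove one of the two entries rather than relying on their order.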
Step 4: Use the classes
Write some code that unmarshals the XML file(s) and prints the resulting objects. toString() is not overridden, so you have to query each attribute and subnode individually.

    TopType top = (TopType) Unmarshaller.unmarshal(
            TopType.class, new FileReader("input.xml"));

    // top.getSubItem() returns SubItem[]
    for (SubItem item : top.getSubItem()) {
        System.out.printf("SubItem id: %s; value: %s\n",
                item.getSomeId(), item.getSomeValue());
        // p is just the anyType object from above; its toString()
        // prints the XML content as a fragment.
        System.out.println(item.getP());
    }

Have fun with it!
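One more thing on the missing toString(): instead of calling every getter by hand, a small reflection helper can print all of them at once. Here is a minimal sketch using plain JDK reflection. The class BeanDump is my own, and the nested SubItem class is only a made-up stand-in for a generated class so that the example runs without Castor.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Castor.
class BeanDump {

    /** Calls every public no-argument getXxx() method and lists the results by name. */
    static String describe(Object bean) {
        Map<String, Object> values = new TreeMap<String, Object>();
        for (Method method : bean.getClass().getMethods()) {
            String name = method.getName();
            boolean isGetter = name.startsWith("get") && name.length() > 3
                    && method.getParameterTypes().length == 0
                    && method.getReturnType() != void.class
                    && !Modifier.isStatic(method.getModifiers())
                    && method.getDeclaringClass() != Object.class; // skips getClass()
            if (!isGetter) {
                continue;
            }
            try {
                values.put(name.substring(3), method.invoke(bean));
            } catch (Exception e) {
                values.put(name.substring(3), "<error: " + e + ">");
            }
        }
        return bean.getClass().getSimpleName() + values;
    }

    /** Made-up stand-in for a class generated by Castor. */
    public static class SubItem {
        private final String someId;
        private final String someValue;

        public SubItem(String someId, String someValue) {
            this.someId = someId;
            this.someValue = someValue;
        }

        public String getSomeId() {
            return someId;
        }

        public String getSomeValue() {
            return someValue;
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(new SubItem("42", "hello")));
        // prints SubItem{SomeId=42, SomeValue=hello}
    }
}
```

Getters that return arrays, such as getSubItem() above, only show the array's default toString() here, so you would still loop over those yourself.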