=head1 NAME

XML::Compile::Schema - Compile a schema into CODE

=head1 INHERITANCE

 XML::Compile::Schema
   is a XML::Compile

=head1 SYNOPSIS

 # compile tree yourself
 my $parser = XML::LibXML->new;
 my $tree   = $parser->parse...(...);
 my $schema = XML::Compile::Schema->new($tree);

 # get schema from string
 my $schema = XML::Compile::Schema->new($xml_string);

 # get schema from file
 my $schema = XML::Compile::Schema->new($filename);

 # adding schemas
 $schema->addSchemas($tree);
 $schema->importDefinitions('http://www.w3.org/2001/XMLSchema');
 $schema->importDefinitions('2001-XMLSchema.xsd');

 # create and use a reader
 my $read   = $schema->compile(READER => '{myns}mytype');
 my $hash   = $read->($xml);

 # create and use a writer
 my $doc    = XML::LibXML::Document->new('1.0', 'UTF-8');
 my $write  = $schema->compile(WRITER => '{myns}mytype');
 my $xml    = $write->($doc, $hash);

 # show result
 print $xml->toString;

 # to create the type nicely
 use XML::Compile::Util qw/pack_type/;
 my $type   = pack_type 'myns', 'mytype';
 print $type;  # shows  {myns}mytype

=head1 DESCRIPTION

This module collects knowledge about one or more schemas.  The most
important method provided is L<compile()>, which can create XML file
readers and writers based on the schema information and some selected
element or attribute type.

Various implementations use the translator, and more can be added
later:

=over 4

=item C<< $schema->compile('READER', ...) >> translates XML to HASH

The XML reader produces a HASH from a XML::LibXML::Node tree or an
XML string.  Those represent the input data.  The values are checked.
An error is produced when a value or the data-structure does not
match the specs.

The CODE reference which is returned can be called with anything
accepted by L.

example: create an XML reader

 my $msgin  = $rules->compile(READER => '{myns}mytype');
 # or ...
 #          = $rules->compile(READER => pack_type('myns', 'mytype'));

 my $xml    = $parser->parse_file("some-xml.xml");
 my $hash   = $msgin->($xml);

or

 my $hash   = $msgin->('some-xml.xml');
 my $hash   = $msgin->($xml_string);
 my $hash   = $msgin->($xml_node);

=item C<< $schema->compile('WRITER', ...) >> translates HASH to XML

The writer produces schema compliant XML, based on a Perl HASH.  To get
the data encoding correct, you are required to pass a document object
in which the XML nodes may get a place later.

example: create an XML writer

 my $doc    = XML::LibXML::Document->new('1.0', 'UTF-8');
 my $write  = $schema->compile(WRITER => '{myns}mytype');
 my $xml    = $write->($doc, $hash);
 print $xml->toString;

alternative

 my $write  = $schema->compile(WRITER => 'myns#myid');

=item C<< $schema->template('XML', ...) >> creates an XML example

Based on the schema, this produces an XML message as example.  Schemas
are usually so complex that people lose overview.  This example may
put you back on track, and can be used as a starting point for creating
the XML version of the message.

=item C<< $schema->template('PERL', ...) >> creates a Perl example

Based on the schema, this produces a Perl HASH structure (a bit
like the output of Data::Dumper), which can be used as a template
for creating messages.  The output contains documentation, and is
usually much clearer than the schema itself.

=back

Be warned that the B<schema itself is not validated>; you can develop
schemas which do work well with this module, but are not valid according
to W3C.  In many cases, however, the translator will refuse to accept
mistakes: mainly because it cannot produce valid code.

=head1 METHODS

=head2 Constructors

XML::Compile::Schema-E<gt>B<new>(TOP, OPTIONS)

=over 4

Collect schema information.  Details about many name-spaces can be
organized with only a single schema object (actually, the data is
administered in an internal L object).

 Option       --Defined in     --Default
 hook                            undef
 hooks                           []
 schema_dirs    XML::Compile     undef

. hook => ARRAY-WITH-HOOKDATA | HOOK

=over 4

See L.  Adds one HOOK (HASH).
=back

. hooks => ARRAY-OF-HOOK

=over 4

See L.

=back

. schema_dirs => DIRECTORY|ARRAY-OF-DIRECTORIES

=back

=head2 Accessors

$obj-EB(HOOKDATA|HOOK|undef)

=over 4

HOOKDATA is a LIST of options as key-value pairs, HOOK is a HASH with
the same data.  C<undef> is ignored.  See L and L below.

=back

$obj-EB(HOOK, [HOOK, ...])

=over 4

Add multiple hooks at once.  These must all be HASHes.  See L and L.
C<undef> values are ignored.

=back

$obj-EB(DIRECTORIES)

XML::Compile::Schema-EB(DIRECTORIES)

=over 4

See L

=back

$obj-E<gt>B<addSchemas>(XML, OPTIONS)

=over 4

Collect all the schemas defined in the XML data.  The XML parameter
must be a XML::LibXML node, therefore it is advised to use
L<importDefinitions()>, which has a much more flexible way to specify
the data.  No OPTIONS are defined at the moment.

=back

$obj-EB(FILENAME)

=over 4

See L

=back

$obj-EB

=over 4

Returns the LIST of defined hooks (as HASHes).

=back

$obj-E<gt>B<importDefinitions>(XMLDATA, OPTIONS)

=over 4

Import (include) the schema information included in the XMLDATA.  The
XMLDATA must be acceptable for L.  The resulting node and the OPTIONS
are passed to L<addSchemas()>.

=back

$obj-EB(NAMESPACE|PAIRS)

XML::Compile::Schema-EB(NAMESPACE|PAIRS)

=over 4

See L

=back

$obj-EB

=over 4

Returns the L object which is used to collect schemas.

=back

=head2 Read XML

$obj-EB(NODE|REF-XML-STRING|XML-STRING|FILENAME|KNOWN)

=over 4

See L

=back

=head2 Filters

$obj-EB(NODE, CODE)

=over 4

See L

=back

=head2 Compilers

$obj-E<gt>B<compile>(('READER'|'WRITER'), TYPE, OPTIONS)

=over 4

Translate the specified ELEMENT (found in one of the read schemas) into
a CODE reference which is able to translate between XML-text and a HASH.
When the TYPE is C, an empty LIST is returned.

The indicated TYPE is the starting-point for processing in the
data-structure, a toplevel element or attribute name.  The name must
be specified in C<{url}name> format, where the url is the name-space.
An alternative is the C<url#id> form, which refers to an element or
type with the specific C<id> attribute value.
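As a minimal sketch of the C<{url}name> notation described above, the
following stand-alone Perl builds and splits such a type string.  The
helpers C<pack_type_sketch> and C<unpack_type_sketch> are hypothetical
names introduced only for this illustration; they are not part of
XML::Compile, and in real code the C<pack_type> function from
XML::Compile::Util shown in the SYNOPSIS is the better choice.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of XML::Compile): join a name-space
# and a local name into the "{url}name" notation.
sub pack_type_sketch {
    my ($ns, $local) = @_;
    return "{$ns}$local";
}

# Hypothetical helper: split "{url}name" back into its two parts.
# Returns an empty list when the string is not in that notation.
sub unpack_type_sketch {
    my ($type) = @_;
    return $type =~ m/^\{([^}]*)\}(.+)$/ ? ($1, $2) : ();
}

my $type = pack_type_sketch('myns', 'mytype');
my ($ns, $local) = unpack_type_sketch($type);

print "$type\n";          # {myns}mytype
print "$ns / $local\n";   # myns / mytype
```

The resulting string is what would be passed as the TYPE argument when
compiling a reader or writer.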
When a READER is created, a CODE reference is returned which needs
to be called with XML, as accepted by L.  Returned is a nested HASH
structure which contains the data from the XML.  The transformation
rules are explained below.

When a WRITER is created, a CODE reference is returned which needs
to be called with an XML::LibXML::Document object and a HASH, and
returns a XML::LibXML::Node.

Most options below are explained in more detail in the manual-page
L, which implements the compilation.

 Option               --Default
 anyAttribute           undef
 anyElement             undef
 attributes_qualified
 check_occurs
 check_values
 elements_qualified
 hook                   undef
 hooks                  undef
 ignore_facets
 include_namespaces
 namespace_reset
 output_namespaces      {}
 path
 sloppy_integers

. anyAttribute => CODE

=over 4

In general, C<anyAttribute> schema components cannot be handled
automatically.  If you need to create or process anyAttribute
information, then read about wildcards in the DETAILS chapter of the
manual-page for the specific back-end.

=back

. anyElement => CODE

=over 4

In general, C<any> schema components cannot be handled automatically.
If you need to create or process any information, then read about
wildcards in the DETAILS chapter of the manual-page for the specific
back-end.

=back

. attributes_qualified => BOOLEAN

=over 4

When defined, this will overrule the C<attributeFormDefault> flags in
all schemas.  When not qualified, the XML will neither produce nor
process prefixes on attributes.

=back

. check_occurs => BOOLEAN

=over 4

Whether code will be produced to do bounds checking on elements and
blocks which may appear more than once.  When the schema says that
maxOccurs is 1, then that element becomes optional.  When the schema
says that maxOccurs is larger than 1, then the output is still always
an ARRAY, but now of unrestricted length.

=back

. check_values => BOOLEAN

=over 4

Whether code will be produced to check that the XML fields contain
the expected data format.
Turning this off will improve the processing speed significantly, but is
(of course) much less safe.  Do not turn it off when you expect data
from external sources: validation is a crucial requirement for XML.

=back

. elements_qualified => C|C|C|BOOLEAN

=over 4

When defined, this will overrule the C<elementFormDefault> flags in
all schemas.  When C is specified, at least the top-element will be
name-space qualified.  When C or a true value is given, then all
elements will be used qualified.  When C or a false value is given,
the XML will not produce or process prefixes on the elements.

The C
attributes will be respected, except on the top element when C is
specified.  Use hooks when you need to fix name-space use in more
subtle ways.

=back

. hook => HOOK|ARRAY-OF-HOOKS

=over 4

Define one or more processing hooks.  See L below.  These hooks are
only active for this compiled entity, whereas L and L can be used to
define hooks which are used for all results of L<compile()>.  The
hooks specified with the C<hook> or C<hooks> option are run before
the global definitions.

=back

. hooks => HOOK|ARRAY-OF-HOOKS

=over 4

Alternative for option C<hook>.

=back

. ignore_facets => BOOLEAN

=over 4

Facets influence the formatting and range of values.  Checking them
does not come cheap, so it can be turned off.  It affects the
restrictions set for a simpleType.  The processing speed will improve,
but validation is a crucial requirement for XML: please do not turn
this off when the data comes from external sources.

=back

. include_namespaces => BOOLEAN

=over 4

Indicates whether the WRITER should include the prefix to namespace
translation on the top-level element of the returned tree.  If not,
you may continue with the same name-space table to combine various
XML components into one, and add the namespaces later.

=back

. namespace_reset => BOOLEAN

=over 4

Use the same prefixes in C as with some other compiled piece, but
reset the counts to zero first.

=back

. output_namespaces => HASH

=over 4

Can be used to predefine an output namespace (when 'WRITER'), for
instance to reserve common abbreviations like C for external use.
Each entry in the hash has the namespace uri as key.  The value is a
hash which contains C, C, and C fields.  Pass a reference to a
private hash to catch this index.

=back

. path => STRING

=over 4

Prepended to each error report, to indicate the location of the error
in the XML-Schema tree.

=back

. sloppy_integers => BOOLEAN

=over 4

The C and C types must support at least 18 digits, which is larger
than Perl's 32 bit internal integers.
Therefore, the implementation will use Math::BigInt objects to handle
them.  However, often a simple C type would have sufficed, but the
XML designer was lazy.  A long is much faster to handle.  Set this flag
to use C as fast (but imprecise) replacements.

Be aware that C and C objects are nearly but not fully transparent
in mimicking the behavior of Perl's ints and floats.  See their
respective manual-pages.  Especially when you wish for some
performance, you should optimize access to these objects to avoid
expensive copying, which is exactly the spot where the differences are.

=back

=back

$obj-EB

=over 4

List all elements, defined by all schemas sorted alphabetically.

=back

$obj-EB