
XML parser for valid XML streams in Lua. This module is a fork of the xml2lua library by @manoelcampos. It is available under the MIT license, as with the original library.

The parser provides a partially object-oriented API with its functionality split into tokeniser and handler components.

The handler instance from xml.handlers is passed to the tokeniser via xml.parser and receives callbacks for each XML element processed (if a suitable handler function is defined). The API is conceptually similar to the SAX API but implemented differently.

XML data is passed to the parser instance through the XMLParser:parse method. Note that the parser currently accepts only a single string.

The default XML handler is xml.handlers.DOM, due to its ability to nondestructively parse any XML (representing comments, text nodes and mixed content appropriately). The module provides a serialiser supporting XML DOM root tables at xml.serialise, which has a compatibility layer for XML tree root tables.

If your application involves bidirectional parsing of data, such as the contents of templates using Wikia's infobox component, the xml.handlers.DOM handler is recommended. When creating XML configuration files for use in Lua modules, it is recommended to use the xml.handlers.Tree handler which allows for easier node traversal and data extraction.

Features

Limitations

Usage

local xml = require('Dev:XML')

-------- Uses a handler that converts the XML to a Lua table. --------
-- The handler is selected by name in the second argument of xml.parse.
local inspect = require('Module:Inspect')
local options = { indent = '    ' }

----------------------- Books XML parse code. ------------------------
mw.log('books.xml')
local books_root = xml.parse(xml.load('Dev:XML/testcases/books'), 'Tree')
mw.log(inspect(books_root, options))

----------------------- People XML parse code. -----------------------
mw.log('people.xml')
local people_root = xml.parse(xml.load('Dev:XML/testcases/people'), 'Tree')
mw.log(inspect(people_root, options))

Documentation

Package items

xml.parse(str, handler, parser_opts, handler_opts) (function)
Parses an XML string into an abstract syntax tree or event trace. This function includes logic to attach a handler to the XML parser, making it much more convenient than xml.parser.
Parameters:
  • str XML string to be parsed. (string)
  • handler Handler to use. Default: "DOM". (string|table) Accepts the following values:
    • "DOM" - DOM handler (typed).
    • "Tree" - tree handler.
    • "Print" - parser logging.
    • Custom handler in the form of a Lua table.
  • parser_opts Parser configuration options. Defaults are listed in xml.parser options. (table; optional)
  • handler_opts Handler configuration options. Defaults are listed in xml.handler options. (table; optional)
Error: 'XML handler "$handler" not found' (line 688)
Returns: Lua representation of XML root structure. (table)
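As a minimal sketch of the call described above, the helper below parses a string with the "Tree" handler and traps the documented error rather than letting it abort the caller. The name parse_config is hypothetical, and the module table is passed in as an argument (on a wiki it would come from require('Dev:XML'), as in the Usage section).

```lua
-- Hypothetical helper: parses an XML string with the "Tree" handler.
-- `xml` is the module table, e.g. the value of require('Dev:XML').
local function parse_config(xml, source)
    -- pcall traps the 'XML handler "$handler" not found' error (and any
    -- error raised while parsing) so the caller can decide what to do.
    local ok, result = pcall(xml.parse, source, 'Tree')
    if not ok then
        return nil, result
    end
    return result
end
```

On failure the helper returns nil plus the error message, which follows the usual Lua convention for recoverable errors.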
xml.serialise(tbl, level) (function)
Converts a Lua XML DOM tree to an XML string representation.
Parameters:
  • tbl DOM or tree root for XML conversion. This parameter is the root table generated by an xml.handlers.DOM or xml.handlers.Tree parser instance. (table)
  • level Only used internally, when the function is called recursively to print indentation. (number; optional)
Error: 'cannot serialise this value. Are you using a handler other than "xml.handlers.DOM" and "xml.handlers.Tree"?' (line 739)
Returns: XML string representation for table. (string)
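A parse followed by a serialise can be sketched as below. The helper name round_trip is hypothetical, and the module table is again passed in as an argument. Whether the output is byte-identical to the input is not stated in this documentation, so the sketch makes no claim about that.

```lua
-- Hypothetical helper: parses `source` with the default handler and
-- serialises the resulting root table back to an XML string.
-- `xml` is the module table, e.g. the value of require('Dev:XML').
local function round_trip(xml, source)
    local root = xml.parse(source)   -- handler defaults to "DOM"
    return xml.serialise(root)       -- `level` is internal, so it is omitted
end
```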
xml.load(filepath) (function)
Loads an XML file from a specified path. If the file is in the Module namespace, the loader assumes the page is a Lua module returning a string. Otherwise, the loader will fetch the page's raw text, removing any leading non-XML comment/shebang.
Parameter: filepath XML file target path (including namespace). (string)
Error: 'file "$filepath" does not contain XML'
  • The page filepath does not exist.
  • The module filepath does not exist or does not export a string.
(line 784)
Returns: The contents of the XML file. (string)
xml.parser(handler, options) (function)
Instantiates an XmlParser object to parse an XML string.
Parameters:
  • handler Handler object to be used to convert the XML string to another format, usually from xml.handlers. (table)
  • options Options for parsing XML. (table; optional)
    • options.stripWS Strip non-significant whitespace (leading or trailing) and do not generate events for empty text elements. Default: true. (boolean; optional)
    • options.expandEntities Expand entities. Currently only standard entities and single-character numeric entities are expanded; this could be extended at runtime if a suitable DTD parser added elements to the table (see XMLParser._ENTITIES). It may also be possible to expand multibyte entities for UTF-8 only. Default: true. (boolean; optional)
    • options.errorHandler Custom error handler function. (function; optional)
Returns: An XML parser instance used to parse the XML.
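For cases where xml.parse is too coarse, the lower-level route can be sketched as follows: create a handler, hand it to xml.parser with explicit options, then call parse on the returned instance. The helper name is hypothetical. This section does not document where a handler stores its result, so the sketch simply returns the handler instance for the caller to inspect.

```lua
-- Hypothetical helper: wires a DOM handler to a parser that keeps
-- whitespace-only text. `xml` is the module table, e.g. require('Dev:XML').
local function parse_keep_whitespace(xml, source)
    local handler = xml.handlers.DOM:new()
    -- stripWS = false asks the parser not to strip non-significant whitespace.
    local parser = xml.parser(handler, { stripWS = false })
    parser:parse(source)
    return handler
end
```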
xml.handlers (table)
XML handlers for conversion logic in the XML parser.
xml.handlers.DOM (table)
Handler to generate a DOM-like node tree structure. The tree structure has a single ROOT node parent, and is capable of representing any valid XML document. Each node is a table comprising the fields below:
  • _name - element name (string)
  • _type - any of 'ROOT', 'ELEMENT', 'TEXT', 'COMMENT', 'PI', 'DECL', 'DTD' (string)
    • PI - XML Processing Instruction tag.
    • DECL - XML declaration tag.
  • _attr - node attributes - see callback API (table)
  • _parent - parent node (table)
  • _children - child nodes (table)
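The node layout above lends itself to a simple recursive walk. The sketch below counts nodes per _type using only the fields listed here. The sample tree is built by hand for illustration and is not captured parser output.

```lua
-- Counts nodes per _type, walking _children recursively.
local function count_types(node, counts)
    counts = counts or {}
    counts[node._type] = (counts[node._type] or 0) + 1
    -- Leaf nodes may have no _children table, hence the fallback.
    for _, child in ipairs(node._children or {}) do
        count_types(child, counts)
    end
    return counts
end

-- Hand-built sample that uses only the documented fields.
local root = { _type = 'ROOT', _children = {} }
local book = {
    _type = 'ELEMENT', _name = 'book', _attr = { id = '1' },
    _parent = root, _children = {},
}
root._children[1] = book
book._children[1] = { _type = 'COMMENT', _parent = book }
book._children[2] = {
    _type = 'ELEMENT', _name = 'title', _parent = book, _children = {},
}

local counts = count_types(root)
```

Because every node carries a _parent reference, the tree contains cycles. A walk should therefore follow _children only, as this one does.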
xml.handlers.DOM:new(options) (function • constructor)
Instantiates a new DOM handler.
Parameters:
  • options Handler options for parsing. (table)
    • options.commentNode Whether to include comment nodes. Default: true. (boolean; optional)
    • options.piNode Whether to include processing instruction nodes. Default: true. (boolean; optional)
    • options.dtdNode Whether to include DTD declaration nodes. Default: true. (boolean; optional)
    • options.declNode Whether to include XML declaration nodes. Default: true. (boolean; optional)
xml.handlers.Tree (table)
Handler to generate a natural table-based tree. This handler supports many XML formats. The XML structure tree is mapped into a recursive map of node names to child elements (as a string representing text, or a table of values).
Where there is only a single child element, it is inserted as a named key. If there are multiple elements, they are inserted as array elements (in some cases it may be preferable to always insert elements as array elements, which can be specified on a per-element basis in the options). Attributes are inserted as a child element with a key of '_attr'.
In general, this format is relatively useful, despite the following limitations:
  • Tag/text & CDATA elements are processed - all others are ignored.
  • Mixed-Content XML behaves unpredictably.
  • If a leaf element has both a text element and attributes, the text must be accessed through an array element (to provide a container for the attribute).
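Since a child may appear either as a single named value or as an array, calling code often needs to normalise the two cases. The sketch below shows one way to do that. The helper as_list and both sample tables are hypothetical: the tables are hand-built from the description above and are not captured output of the handler.

```lua
-- Returns `value` as an array, whether one child was stored (a single
-- value under the named key) or several (an array of values).
-- Caveat: a leaf stored as { _attr = {...}, 'text' } also has an element
-- at index 1, so this check cannot tell it apart from a list of children.
local function as_list(value)
    if value == nil then
        return {}
    end
    if type(value) == 'table' and value[1] ~= nil then
        return value
    end
    return { value }
end

-- One <book> child: stored directly under the key "book".
local one = { library = { book = { title = 'A' } } }
-- Two <book> children: stored as an array under the key "book".
local many = { library = { book = { { title = 'A' }, { title = 'B' } } } }

local n_one = #as_list(one.library.book)
local n_many = #as_list(many.library.book)
```

The caveat in the comment is one reason to prefer the per-element option that always stores children as an array.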
xml.handlers.Tree:new(options) (function • constructor)
Instantiates a new tree handler.
Parameters:
  • options Handler options for parsing. (table)
    • options.noreduce Boolean map of tag names whose child elements are never reduced to a single named key, even when there is only one child. (table; optional)
Returns: Tree handler instance. (Handler)
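A minimal sketch of passing noreduce through xml.parse follows. It assumes that the handler_opts argument of xml.parse is forwarded to this constructor, which the parameter description suggests but this page does not state outright. The helper name and the tag name "book" are hypothetical.

```lua
-- Hypothetical helper: requests that <book> children always be stored as
-- an array. `xml` is the module table, e.g. require('Dev:XML').
local function parse_books(xml, source)
    local handler_opts = { noreduce = { book = true } }
    -- Third argument (parser_opts) is left nil to keep the parser defaults.
    return xml.parse(source, 'Tree', nil, handler_opts)
end
```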
xml.handlers.Print (table)
Handler to generate simple event tracing during parsing. Outputs messages to the Scribunto console during the parse process, usually for debugging purposes.
xml.handlers.Print:new(options) (function • constructor)
Instantiates a new Print handler.
Parameters:
  • options Handler options for parsing. (table)
    • options.commentNode Whether to include comment nodes. Default: true. (boolean; optional)
    • options.piNode Whether to include processing instruction nodes. Default: true. (boolean; optional)
    • options.dtdNode Whether to include DTD declaration nodes. Default: true. (boolean; optional)
    • options.declNode Whether to include XML declaration nodes. Default: true. (boolean; optional)

XMLParser

Class providing the actual XML parser.

XmlParser.new(_handler, _options) (function)
Instantiates an XmlParser object.
Parameters:
  • _handler Handler object to be used to convert the XML string to another format. See the available handlers at xml.handlers. (table)
  • _options Options for this XmlParser instance, defined in xml.parser.
XmlParser:parse(str, parseAttributes) (function)
Main function which starts the XML parsing process.
Parameters:
  • str the XML string to parse (string)
  • parseAttributes indicates if tag attributes should be parsed or not. Default: true. (boolean; optional)

Handler

Handler object, used to generate parser output.

Handler:new(options) (function)
Instantiates a new handler object. Each instance can handle a single XML string. By using such a constructor, you can parse multiple XML files in the same application.
Parameter: options Handler configuration options. (table; optional)
Returns: Handler object instance. (Handler)
Note: This method is not available in xml.handlers.Print.
Handler:starttag(tag, tag1, tag2, s, e) (function)
Parses a start tag.
Parameters:
  • tag A table describing the opening tag and its attribute nodes. (table)
  • tag1 The name of the tag. (string; optional)
  • tag2 The attribute nodes of the tag. (table; optional)
  • s Start index of match. (number; optional)
  • e End index of match. (number; optional)
Handler:endtag(tag, tag1, tag2, s, e) (function)
Parses an end tag.
Parameters:
  • tag A table describing the closing tag and its attribute nodes. (table)
  • tag1 The name of the tag. (string; optional)
  • tag2 The attribute nodes of the tag. (table; optional)
  • s Start index of match. (number; optional)
  • e End index of match. (number; optional)
Handler:text(text, s, e) (function)
Parses the text content of a tag.
Parameters:
  • text Text content to process. (string)
  • s Start index of match. (number; optional)
  • e End index of match. (number; optional)
Handler:comment(text, s, e) (function)
Parses a comment tag.
Parameters:
  • text Comment text to process. (string)
  • s Start index of match. (number; optional)
  • e End index of match. (number; optional)
Handler:pi(tag, tag1, tag2, s, e) (function)
Parses an XML processing instruction (PI) tag.
Parameters:
  • tag A table describing the opening tag and its attribute nodes. (table)
  • tag1 The name of the tag. (string; optional)
  • tag2 The attribute nodes of the tag. (table; optional)
  • s Start index of match. (number; optional)
  • e End index of match. (number; optional)
Handler:decl(tag, tag1, tag2, s, e) (function)
Parses the XML declaration line (indicating the XML version).
Parameters:
  • tag A table describing the opening tag and its attribute nodes. (table)
  • tag1 The name of the tag. (string; optional)
  • tag2 The attribute nodes of the tag. (table; optional)
  • s Start index of match. (number; optional)
  • e End index of match. (number; optional)
Handler:dtd(tag, tag1, tag2, s, e) (function)
Parses a DTD tag.
Parameters:
  • tag A table describing the opening tag and its attribute nodes. (table)
  • tag1 The name of the tag. (string; optional)
  • tag2 The attribute nodes of the tag. (table; optional)
  • s Start index of match. (number; optional)
  • e End index of match. (number; optional)
Handler:cdata(text, s, e) (function)
Parses a CDATA section.
Parameters:
  • text Text content to process. (string)
  • s Start index of match. (number; optional)
  • e End index of match. (number; optional)
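Because xml.parse accepts a custom handler in the form of a Lua table, and callbacks are only issued where a handler function is defined, a handler can implement just the methods it needs. The sketch below is a hypothetical statistics handler that tracks element count, nesting depth and text. It deliberately does not read any fields of the tag argument, since their names are not documented here. The event sequence at the end is simulated by hand, and the depth tracking assumes every starttag call is eventually paired with an endtag call.

```lua
-- Hypothetical custom handler: gathers simple statistics from callbacks.
local function new_stats_handler()
    local handler = { depth = 0, max_depth = 0, elements = 0, chunks = {} }

    function handler:starttag(tag)
        self.elements = self.elements + 1
        self.depth = self.depth + 1
        if self.depth > self.max_depth then
            self.max_depth = self.depth
        end
    end

    function handler:endtag(tag)
        self.depth = self.depth - 1
    end

    function handler:text(text)
        self.chunks[#self.chunks + 1] = text
    end

    return handler
end

-- Simulated callbacks for <a><b>hi</b></a>. In real use the parser
-- issues these calls while it processes the document.
local stats = new_stats_handler()
stats:starttag({})
stats:starttag({})
stats:text('hi')
stats:endtag({})
stats:endtag({})
local collected = table.concat(stats.chunks)
```

Creating a fresh table per call to new_stats_handler mirrors the note above that each handler instance handles a single XML string.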