
Woodstox

High-performance XML processor.

StAX2

(Note: as of 09-Jun-2009, the following list is somewhat incomplete, but it should cover all major areas of functionality up to and including Stax2 v3.0.)

"StAX2" API

StAX2 is an experimental API that extends the basic StAX specification, so that implementations can experiment with features before they end up in the actual StAX specification (if they ever do). As such, it is intended to be freely implementable by all StAX implementations, in the same way as StAX itself, but without going through a formal JCP process.

Currently Woodstox is the only known implementation.

To get the StAX2 API jar, go to the Woodstox Download page.

For javadocs, go back to the Woodstox Documentation page.

Goals

The main goals of the StAX2 API are:

Access to formerly missing/inaccessible information

  • DOCTYPE declaration: the root name, public/system identifiers, and internal subset can all be accessed separately; matching output methods have been added as well
  • Attribute information: can find the index of a specified attribute, as well as the indexes of ID and NOTATION attributes (as per the DTD)
  • Location information (XMLStreamLocation2):
    • Access to nested Locations (parent context)
    • Separate byte/char offsets, with values typed as long instead of int to allow for multi-gigabyte input (NOT FULLY IMPLEMENTED)
  • Misc. information:
    • XMLStreamReader2.isEmptyElement() (whether the current START_ELEMENT resulted from an empty element, i.e. <tag />)
    • XMLStreamReader2.getDepth() (number of enclosing start elements, including the current START_ELEMENT)
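
For comparison, plain StAX 1.0 leaves this kind of bookkeeping to the caller. The following is a minimal sketch of the depth tracking that getDepth() is meant to make unnecessary. It uses only the standard javax.xml.stream API bundled with the JDK, not Stax2, and the class and method names are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: not part of StAX or Stax2.
class ManualDepthSketch {

    /** Returns the deepest nesting level seen, counting the root element as depth 1. */
    static int maxDepth(String xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        int depth = 0;
        int max = 0;
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    depth++;                      // entering an element
                    max = Math.max(max, depth);
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    depth--;                      // leaving an element
                }
            }
        } finally {
            reader.close();
        }
        return max;
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(maxDepth("<a><b><c/></b><d/></a>"));
    }
}
```

Note that the standard reader reports <c/> as a START_ELEMENT followed by an END_ELEMENT, exactly as it would for <c></c>; telling the two apart is the information that isEmptyElement() adds.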

Efficient (fully streaming) access

  • getText methods that write the contents of textual events to a Writer object passed as an argument
  • Pass-through XMLStreamWriter2.copyEventFromReader method
  • XMLStreamReader2.skipElement allows efficient skipping of all children of an element
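
To show what skipElement saves, here is a rough sketch of the equivalent loop written against the standard javax.xml.stream API only (not Stax2). The names skipSubtree and textOutside are hypothetical, and a Stax2 implementation can presumably do the same work more cheaply because it controls the parser internals.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: not part of StAX or Stax2.
class ManualSkipSketch {

    /** Advances from the current START_ELEMENT to its matching END_ELEMENT. */
    static void skipSubtree(XMLStreamReader reader) throws XMLStreamException {
        if (reader.getEventType() != XMLStreamConstants.START_ELEMENT) {
            throw new IllegalStateException("not positioned on START_ELEMENT");
        }
        int open = 1; // number of elements opened but not yet closed
        while (open > 0) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                open++;
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                open--;
            }
        }
    }

    /** Collects all character data except what lies inside elements named skipName. */
    static String textOutside(String xml, String skipName) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        StringBuilder text = new StringBuilder();
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals(skipName)) {
                    skipSubtree(reader);
                } else if (event == XMLStreamConstants.CHARACTERS) {
                    text.append(reader.getText());
                }
            }
        } finally {
            reader.close();
        }
        return text.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(textOutside("<r><skip><x>no</x></skip><keep>yes</keep></r>", "skip"));
    }
}
```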

Configurable writers

  • All events can now be output through a specified XMLStreamWriter; this allows for full configuration of output (namespace bindings, encoding/quoting)
  • Can define a custom quoter/encoder for textual content
  • Can define a custom quoter/encoder for attribute values
  • Raw write access using XMLStreamWriter2.writeRaw(), for outputting text as is, without quoting/encoding
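
As background for the last point: the standard writer always quotes markup characters in text content, and StAX 1.0 offers no way around that. The sketch below uses only the JDK's javax.xml.stream writer (not Stax2) to show this default behaviour; the class and method names are invented for the example.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper: not part of StAX or Stax2.
class DefaultEscapingSketch {

    /** Writes the text inside a <note> element using the standard, always-escaping writer. */
    static String writeText(String text) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        writer.writeStartElement("note");
        writer.writeCharacters(text);   // '<' and '&' are quoted by the writer
        writer.writeEndElement();
        writer.flush();
        writer.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        // The '<' and '&' characters come out as entity references.
        System.out.println(writeText("a < b & c"));
    }
}
```

A raw write method is useful precisely when the caller already holds well-formed, pre-escaped markup and wants it copied to the output unchanged.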

Basic support for per-reader/writer configuration

  • XMLStreamReader: allow overriding of external DTD subset.
  • XMLStreamWriter: (no options yet)

Fully pluggable bi-directional validation framework

(for Stax2 v2.0 and above, i.e. Woodstox 3.0)

  • Extensible: easy to add custom validators, and infoset augmenters (for example, DTD validator adds default attribute values).
  • Support for multiple chainable validators (for example: can do both DTD validation and RelaxNG validation simultaneously).
  • Fully bi-directional: can validate not only when parsing, but also when outputting.
  • Configurable problem (validation error) handling: can collect all validation problems, or do fail-fast validation (exception on encountering the first problem).
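
The difference between the two problem-handling styles can be illustrated without Stax2 at all. The sketch below is an assumption-laden stand-in: it uses the JDK's validating SAX parser and a SAX error handler rather than the Stax2 validation classes, and the class and method names are hypothetical. It only shows the idea of collecting every problem versus failing on the first one.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper built on JDK SAX validation, NOT on the Stax2 validation API.
class ProblemHandlingSketch {

    /**
     * Validates the document against its own DTD. With failFast set, the first
     * validity error aborts parsing; otherwise all error messages are collected.
     */
    static List<String> validate(String xml, boolean failFast)
            throws ParserConfigurationException, SAXException, IOException {
        List<String> problems = new ArrayList<>();
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true); // enable DTD validation
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void error(SAXParseException e) throws SAXException {
                if (failFast) {
                    throw e;                     // exception on the first problem
                }
                problems.add(e.getMessage());    // otherwise remember it and continue
            }
        };
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return problems;
    }

    public static void main(String[] args) throws Exception {
        String dtd = "<!DOCTYPE r [<!ELEMENT r (a)><!ELEMENT a EMPTY>]>";
        for (String problem : validate(dtd + "<r><b/><c/></r>", false)) {
            System.out.println(problem);
        }
    }
}
```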

Symmetry between Readers and Writers (where applicable)

Within Stax 1.0, writers seem like ugly stepchildren, ignored by the specification. Whereas readers have a full set of accessors and configuration options, writers have only bare-bones support (except for the confusing two-in-one handling of the repairing and non-repairing modes).

Stax2 tries to address this asymmetry by bringing over to the Writer side those Reader methods that make sense there, and by making sure that the new Stax2 functionality listed above is also available for Writers where applicable. For example:

  • You can add an XMLReporter to writers as well, to be able to report non-fatal problems.
  • Optional Location information will be accessible from Writers too: either so that they can relay information from incoming events (for event writers), or even to keep track of the output line numbers. This should be useful for debugging purposes, especially when used with Writer-side validation.

Typed Access

(for Stax2 v3.0 and above, i.e. Woodstox 4.0)

This means adding the ability to read and write textual content (attribute values, text segments) as typed content: if the contents are serialized numbers, QNames, arrays, or base64-encoded binary data, they can be accessed directly as such. The Stax2 implementation handles the necessary conversions as efficiently and conveniently as possible.

In addition to the supported types, simple plug-in extensions exist to allow for binding custom types as well.
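
For context, this is roughly what callers have to write without typed access. The sketch uses only the standard javax.xml.stream API (not the Stax2 typed methods): values are fetched as Strings, trimmed, and parsed by hand. The element and attribute names (order, item, qty) and the class name are made up for the example.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: not part of StAX or Stax2.
class ManualTypedAccessSketch {

    /** Sums qty * price over all <item qty="...">price</item> elements. */
    static int orderTotal(String xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        int total = 0;
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "item".equals(reader.getLocalName())) {
                    // Attributes must be read before getElementText() moves the reader on.
                    int qty = Integer.parseInt(reader.getAttributeValue(null, "qty").trim());
                    int price = Integer.parseInt(reader.getElementText().trim());
                    total += qty * price;
                }
            }
        } finally {
            reader.close();
        }
        return total;
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(orderTotal(
                "<order><item qty=\"2\">40</item><item qty=\"3\"> 5 </item></order>"));
    }
}
```

Each value here passes through an intermediate String, and the whitespace handling and error reporting are the caller's problem; typed access is meant to move both concerns into the implementation.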

Naming

The idea behind the name "Stax2" is really either "Stax squared" (i.e. Stax^2) or "Stax, take 2", but NOT "Stax version 2.0". There is hope that new revisions of the official Stax specification/standard will be produced at some point in the future; work on Stax2 is meant at most to provide ideas and experience, not to replace actual standardization efforts.