Syntax and XML syntax specifications

As the parent page describes, the command syntax "picture" has two distinct parts. A command class registers Argument objects with the infrastructure to specify its formal command parameters. The concrete syntax for the command line is represented in memory by Syntax objects.

This page documents the syntactic constructs provided by the Syntax objects, and the XML syntax that provides the normal way of specifying a syntax.

You will notice that there can be a number of ways to build a command syntax from the constructs provided. This is redundancy is intentional.

The Syntax base class

The Syntax class is the abstract base class for all classes that represent high-level syntactic constructs in the "new" syntax mechanisms. A Syntax object has two (optional) attributes that are relevant to the process of specifying syntax:

  • The "label" attribute gives a name for the syntax node that will be used when the node is formatted; e.g. for "help" messages.
  • The "description" attribute gives a basic description for the syntax node.

These attributes are represented in an XML syntax element using optional XML attributes named "label" and "description" respectively.

ArgumentSyntax

An ArgumentSyntax captures one value for an Argument with a given argument label. Specifically, an ArgumentSyntax instance will cause the parser to consume one token, and to attempt to bind it to the Argument with the specified argument label in the current ArgumentBundle.

Note that many Arguments are very non-selective in the tokens that they will match. For example, while an IntegerArgument will accept "123" as valid, so will "FileArgument" and many other Argument classes. It is therefore important to take account the parser's handling of ambiguity when designing command syntaxes; see below.

Here are some ArgumentSyntax instances, as specified in XML:

    <argument argLabel="foo">
    <argument label="foo" description="this controls the command's fooing" argLabel="foo">

An EmptySyntax matches absolutely nothing. It is typically used when a command requires no arguments.

    <empty description="dir with no arguments lists the current directory">

OptionSyntax

An OptionSyntax also captures a value for an Argument, but it requires the value token to be preceded by a token that gives an option "name". The OptionSyntax class supports both short option names (e.g. "-f filename") and long option names (e.g. "--file filename"), depending on the constructor parameters.

    <option argLabel="filename" shortName="f">
    <option argLabel="filename" longName="file">
    <option argLabel="filename" shortName="f" longName="file">

If the Argument denoted by the "argLabel" is a FlagArgument, the OptionSyntax matches just an option name (short or long depending on the attributes).

SymbolSyntax

A SymbolSyntax matches a single token from the command line without capturing any Argument value.

    <symbol symbol="subcommand1">

VerbSyntax

A VerbSyntax matches a single token from the command line, setting an associated Argument's value to "true".

    <verb symbol="subcommand1" argLabel="someArg">

SequenceSyntax

A SequenceSyntax matches a list of child Syntaxes in the order specified.

    <sequence description="the input and output files">
        <argument argLabel="input"/>
        <argument argLabel="output"/>
    </sequence>

AlternativesSyntax

An AlternativesSyntax matches one of a list of alternative child Syntaxes. The child syntaxes are tried one at a time in the order specified until one is found that matches the tokens.

    <alternatives description="specify an input or output file">
        <option shortName="i" argLabel="input"/>
        <option shortName="o" argLabel="output"/>
    </alternatives>

RepeatSyntax

A RepeatSyntax matches a single child Syntax repeated a number of times. By default, any number of matches (including zero) will satisfy a RepeatSyntax. The number of required and allowed repetitions can be constrained using the "minCount" and "maxCount" attributes. The default behavior is to match lazily; i.e. to match as few instances of the child syntax as is possible. Setting the attribute eager="true" causes the powerset to match as many child instances as possible, within the constraints of the "minCount" and "maxCount" attributes.

    <repeat description="zero or more files">
        <argument argLabel="file"/>
    </repeat>

    <repeat minCount="1" description="one or more files">
        <argument argLabel="file"/>
    </repeat>

    <repeat maxCount="5" eager="true" 
              description="as many files as possible, up to 5">
        <argument argLabel="file"/>
    </repeat>

OptionalSyntax

An OptionalSyntax optionally matches a sequence of child Syntaxes; i.e. it matches nothing or the sequence. The default behavior is to match lazily; i.e. to try the "nothing" case first. Setting the attribute eager="true" causes the "nothing" case to be tried second.

    <optional description="nothing, or an input file and an output file">
        <argument argLabel="input"/>
        <argument argLabel="output"/>
    </optional>

    <optional eager="true"
                 description="an input file and an output file, or nothing">
        <argument argLabel="input"/>
        <argument argLabel="output"/>
    </optional>

PowerSetSyntax

A PowerSetSyntax takes a list of child Syntaxes and matches any number of each of them in any order or any interleaving. The default behavior is to match lazily; i.e. to match as few instances of the child syntax as is possible. Setting the attribute eager="true" causes the powerset to match as many child instances as possible.

    <powerSet description="any number of inputs and outputs">
        <option argLabel="input" shortName="i"/>
        <option argLabel="output" shortName="o"/>
    </powerSet>

OptionSetSyntax

An OptionSetSyntax is like a PowerSetSyntax with the restriction that the child syntaxes must all be OptionSyntax instances. But what OptionSetSyntax different is that it allows options for FlagArguments to be combined in the classic Unix idiom; i.e. "-a -b" can be written as "-ab".

    <optionSet description="flags and value options">
        <option argLabel="flagOne" shortName="1"/>
        <option argLabel="flagTwo" shortName="2"/>
        <option argLabel="argThree" shortName="3"/>
    </optionSet>

Assuming that the "flagOne" and "flagTwo" correspond to FlagArguments, and "argThree" corresponds to (say) a FileArgument, the above syntax will match any of the following: "-1 -2 -3 three", "-12 -3 three", "-1 -3 three -1", "-3 three" or even an empty argument list.

The <syntax ... > element

The outermost element of an XML Syntax specification is the <syntax> element. This element has a mandatory "alias" attribute which associates the syntax with an alias that is in force for the shell. The actual syntax is given by the <syntax> element's zero or more child elements. These must be XML elements representing Syntax sub-class instances, as described above. Conceptually, each of the child elements represents an alternative syntax for the command denoted by the alias.

Here are some examples of complete syntaxes:

    <syntax alias="cpuid">
        <empty description="output the computer's id">
    </syntax>

    <syntax alias="dir">
        <empty description="list the current directory"/>
        <argument argLabel="directory" description="list the given directory"/>
    </syntax>

Ambiguous Syntax specifications

If you have implemented a language grammar using a parser generator (like Yacc, Bison, AntLR and so on), we will recall how the parser generator could be very picky about your input grammar. For example, these tools will often complain about "shift-reduce" or "reduce-reduce" conflicts. This is a parser generator's way of saying that the grammar appears (to it) to be ambiguous.

The new-style command syntax parser takes a different approach. Basically, it does not care if a command syntax supports multiple interpretations of a command line. Instead, it uses a simple runtime strategy to resolve ambiguity: the first complete parse "wins".

Since the syntax mechanisms don't detect ambiguity, it is up to the syntax designer to be aware of the issue, and take it into account when designing the syntax. Here is an example:

    <alternatives>
        <argument argLabel="number">
        <argument argLabel="file">
    </alternatives>

Assuming that "number" refers to an IntegerArgument, and "file" refers to a FileArgument, the syntax above is actually ambiguous. For example, a parser could in theory bind "123" to the IntegerArgument or the FileArgument. In practice, the new-style command argument parser will pick the first alternative that gives a complete parse, and bind "123" to the IntegerArgument. If you (the syntax designer) don't want this (e.g. because you want the command to work for all legal filenames), you will need to use OptionSyntax or TokenSyntax or something else to allow the end user to force a particular interpretation.

SyntaxSpecLoader and friends

More about the Syntax base class.

If you are planning on defining new sub-classes of Syntax, the two key behavioral methods that must be implemented are as follows:

  • The "prepare" method is responsible for translating the syntax node into the MuSyntax graph that will be used by the parser. This will typically be produced by preparing any child syntaxes, and assembling them using the appropriate MuSyntax constructors. If the syntax entails recursion at the MuSyntax levels, this will initially be expressed using MuBackReferences. The recursion points will then transformed into graph cycles by calling "MuSyntax.resolveBackReferences()".
    Another technique that can be used is to introduce "synthetic" Argument nodes with special semantics. For example, the OptionSetSyntax uses a special Argument class to deal with combined short options; e.g. where "-a -b" is expressed as "-ab".

  • The "format" method renders the Syntax in a form that is suitable for "usage" messages.
  • The "toXML" method creates a "nanoxml.XMLElement" that expresses this Syntax node in XML form. It is used by the "SyntaxCommand" class when it dumps an syntax specification as text. It is important that "toXML" produces XML that is compatible with the SyntaxSpecLoader class; see below.