The new command line syntax mechanism

In the classical Java world, a command line application is launched by calling the "main" entry point method on a nominated class, passing the user's command arguments as an array of Strings. The command is responsible for working out which arguments represent options, which represent parameters and so on. While there are (non-Sun) libraries to help with this task (like the Java version of GNU getOpts), they are rather primitive.

In JNode, we take a more sophisticated approach to the issue of command arguments. A native JNode command specifies its formal arguments and command line syntax. The task of matching actual command line arguments is performed by JNode library classes. This approach offers a number of advantages over the classical Java approach:

The application programmer has less work to do.
The user sees more uniform command syntax.
Diagnostics for incorrect command arguments can be more uniform.

In addition, this approach allows us to do some things at the Shell level that are difficult with (for example) UNIX style shells.

The JNode shell does intelligent command line completion based on a command's declared syntax and argument types. For example, if the syntax requires a device name at the cursor position when the user hits TAB, the JNode shell will complete against the device namespace.
The JNode help command uses a command's declared syntax to produce accurate "usage" and parameter type descriptions. These can be augmented by descriptions embedded in the syntax, or in separate files.
In the new version of the JNode syntax mechanisms, command syntaxes are specified in XML separate from the Java source code. Users can tailor the command syntax, like UNIX aliases only better. This can be used to support portable scripting; e.g. Unix-like command syntaxes could be used with a POSIX shell compatible interpreter to run Unix shell scripts.
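
The completion behaviour described above can be pictured with a minimal sketch. It assumes that a namespace can be viewed as a plain list of names; the class name, method name and device names are made up for illustration and are not part of the JNode API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: not the JNode completion API.
class CompletionSketch {
    /** Returns the names that start with the partial token, in namespace order. */
    static List<String> complete(String partial, List<String> namespace) {
        List<String> matches = new ArrayList<>();
        for (String name : namespace) {
            if (name.startsWith(partial)) {
                matches.add(name);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Made-up device names standing in for the device namespace.
        List<String> devices = Arrays.asList("hda", "hdb", "fd0", "eth0");
        System.out.println(complete("hd", devices)); // prints [hda, hdb]
    }
}
```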

As the above suggests, there are two versions of the JNode command syntax and associated mechanisms; i.e. parsing, completion, help and so on. In the first version (the "old" mechanisms), the application class declares a static Argument object for each formal parameter, and creates a static "Help.Info" data structure containing Syntax objects that reference the Arguments. The command line parser and completer traverse the data structures, binding values to the Arguments.

The problems with the "old" mechanisms include:

Use of statics to hold the Argument and Help.Info objects makes JNode commands non-reentrant, leading to unpredictable results when a command is executed in two threads.
The Syntax, Argument and associated classes were never properly documented, making them hard to maintain and hard to use.
There were numerous bugs and implementation issues; e.g. Unix-style named options didn't work, completion didn't work properly with alternative syntaxes, and so on.
Command syntaxes could not be tailored, as described above.

The second version (the "new" mechanisms) is a ground-up redesign and reimplementation:

Argument objects are created by the command class constructor, and registered to form an ArgumentBundle. Thus, command syntax is not an impediment to making command classes re-entrant.
Syntax objects are created from XML that is defined in the command's plugin descriptor, and that can be overridden from the JNode shell using the "syntax" command.
The "new" Syntax classes are much richer than the "old" versions. Each Syntax class has a "prepare" method that emits a simple BNF-like grammar; i.e. the MuSyntax classes. This grammar is used by the MuParser, which performs n-level backtracking and supports "normal" and "completion" modes. (Completion mode parsing works by capturing completions at the appropriate point and then initiating backtracking to find other alternatives.)
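
To make the re-entrancy point concrete, here is a minimal sketch that contrasts a static value holder with a per-instance one. Both command classes are hypothetical stand-ins, not JNode classes; the only point is that static state is shared by every execution, while state created in the constructor is not.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration; these are not JNode classes.
class ReentrancySketch {
    /** "Old" style: the value holder is static, so all executions share it. */
    static class StaticHolderCommand {
        static final List<String> VALUES = new ArrayList<>();
        void bind(String value) { VALUES.add(value); }
        List<String> values() { return VALUES; }
    }

    /** "New" style: the value holder is created per command instance. */
    static class InstanceHolderCommand {
        private final List<String> values = new ArrayList<>();
        void bind(String value) { values.add(value); }
        List<String> values() { return values; }
    }

    public static void main(String[] args) {
        StaticHolderCommand s1 = new StaticHolderCommand();
        StaticHolderCommand s2 = new StaticHolderCommand();
        s1.bind("a.txt");
        s2.bind("b.txt");
        System.out.println(s1.values()); // s1 also sees the value bound via s2

        InstanceHolderCommand i1 = new InstanceHolderCommand();
        InstanceHolderCommand i2 = new InstanceHolderCommand();
        i1.bind("a.txt");
        i2.bind("b.txt");
        System.out.println(i1.values()); // i1 sees only its own value
    }
}
```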

A worked example: the Cat command.

(This example is based on material provided by gchii)

The cat command is a JNode file system command for the concatenation of files.
The alternative command line syntaxes for the command are as follows:

 cat 
 cat -u | --urls <url> ...
 cat <file> ...

The simplest use of cat is to copy a single file to standard output, displaying its contents; for example:

 cat d.txt

The following example displays a.txt, followed by b.txt and then c.txt.

 cat a.txt b.txt c.txt

The following example concatenates a.txt, b.txt and c.txt, writing the resulting file to d.txt.

 cat a.txt b.txt c.txt > d.txt

In fact, the > output redirection in the example above is performed by the command shell and interpreter, and the "> d.txt" arguments are removed before the command arguments are processed. As far as the command class is concerned, this is equivalent to the previous example.

Finally, the following example displays the raw HTML for the JNode home page:
 cat --urls http://www.jnode.org/

Syntax specification
The syntax for the cat command is defined in fs/descriptors/org.jnode.fs.command.xml.

The relevant section of the document is as follows:

   39   <extension point="org.jnode.shell.syntaxes">
   40     <syntax alias="cat">
   41       <empty description="copy standard input to standard output"/>
   42       <sequence description="fetch and concatenate urls to standard output">
   43         <option argLabel="urls" shortName="u" longName="urls"/>
   44         <repeat minCount="1">
   45           <argument argLabel="url"/>
   46         </repeat>
   47       </sequence>
   48       <repeat minCount="1" description="concatenate files to standard output">
   49         <argument argLabel="file"/>
   50       </repeat>
   51     </syntax>

Line 39: "org.jnode.shell.syntaxes" is an extension point for command syntax.

Line 40: The syntax entity represents the entire syntax for a command. The alias attribute is required and associates a syntax with a command.

Line 41: When parsing a command line, the empty tag does not consume any arguments. Its description attribute documents this form of the cat command.

Line 42: A sequence tag represents a group of child elements (options, arguments and so on) that are matched in the order given.

Line 43: An option tag is a command line option, such as -u and --urls. Since -u and --urls are actually one and the same option, the argLabel attribute identifies the option internally.

Line 44: A repeat tag allows its child element to be matched more than once on a command line. When minCount is one or more, at least that many matches are required.

Line 45: An argument tag consumes one command line argument.

Line 48: When minCount is 1, at least one file argument is required.

Line 49: An argument tag consumes one command line argument.

The cat command is implemented in CatCommand.java. The salient parts of the command's implementation are as follows.

    private final FileArgument ARG_FILE =
        new FileArgument("file", Argument.OPTIONAL | Argument.MULTIPLE,
                "the files to be concatenated");

This declares a formal argument to capture JNode file/directory pathnames from the command line; see the specification of org.jnode.shell.syntax.FileArgument. The "Argument.OPTIONAL | Argument.MULTIPLE" parameter gives the argument flags. Argument.OPTIONAL means that the argument may be optional in the syntax, and Argument.MULTIPLE means that it may be repeated in the syntax. Finally, the "file" label matches the argLabel="file" attribute in the XML above at line 49.

    private final URLArgument ARG_URL =
        new URLArgument("url", Argument.OPTIONAL | Argument.MULTIPLE,
                "the urls to be concatenated");

This declares a formal argument to capture URLs from the command line. Its "url" label matches the argLabel="url" attribute in the XML above at line 45.

    private final FlagArgument FLAG_URLS =
        new FlagArgument("urls", Argument.OPTIONAL, "If set, arguments will be urls");

This declares a formal flag whose "urls" label matches the argLabel="urls" attribute in the XML above at line 43.

    public CatCommand() {
        super("Concatenate the contents of files, urls or standard input to standard output");
        registerArguments(ARG_FILE, ARG_URL, FLAG_URLS);
    }

The constructor for the CatCommand registers the three formal arguments, ARG_FILE, ARG_URL and FLAG_URLS. The registerArguments() method is implemented in AbstractCommand.java. It simply adds the formal arguments to the command's ArgumentBundle, making them available to the syntax mechanism.
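
The registration step can be pictured as filling a label-to-argument table that the syntax mechanism later consults through each syntax element's argLabel. The sketch below is a simplified model under that assumption; the Arg and Bundle classes and their methods are hypothetical and do not mirror the real ArgumentBundle API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; the real ArgumentBundle and Argument classes differ.
class BundleSketch {
    static class Arg {
        final String label;
        Arg(String label) { this.label = label; }
    }

    static class Bundle {
        private final Map<String, Arg> byLabel = new LinkedHashMap<>();

        /** Adds arguments, rejecting a label that is already registered. */
        void register(Arg... args) {
            for (Arg arg : args) {
                if (byLabel.containsKey(arg.label)) {
                    throw new IllegalArgumentException("duplicate label: " + arg.label);
                }
                byLabel.put(arg.label, arg);
            }
        }

        /** Finds the argument that a syntax element's argLabel refers to. */
        Arg lookup(String argLabel) {
            Arg arg = byLabel.get(argLabel);
            if (arg == null) {
                throw new IllegalArgumentException("no such argument: " + argLabel);
            }
            return arg;
        }
    }

    public static void main(String[] args) {
        Bundle bundle = new Bundle();
        bundle.register(new Arg("file"), new Arg("url"), new Arg("urls"));
        System.out.println(bundle.lookup("url").label); // prints url
    }
}
```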

    public void execute() throws IOException {
        this.err = getError().getPrintWriter();
        OutputStream out = getOutput().getOutputStream();
        File[] files = ARG_FILE.getValues();
        URL[] urls = ARG_URL.getValues();

        boolean ok = true;
        if (urls != null && urls.length > 0) {
            for (URL url : urls) {
    ...
        } else if (files != null && files.length > 0) {
            for (File file : files) {
    ...
        } else {
            process(getInput().getInputStream(), out);
        }
        out.flush();
        if (!ok) {
            exit(1);
        }
    }

The "execute" method is called after the syntax processing has occurred, and after the command argument values have been converted to the relevant Java types and bound to the formals. As the code above shows, the method uses a method on the formal argument to retrieve the actual values. Other methods implemented by AbstractCommand allow the "execute" method to access the command's standard input, output and error streams as Stream objects or Reader/Writer objects, and to set the command's return code.

Note: ideally the syntax of the JNode cat command should include this alternative:

 cat ( ( -u | --urls <url> ) | <file> ) ...

or even this:

 cat ( <url> | <file> ) ...

allowing <file> and <url> arguments to be interspersed. The problem with the first alternative syntax above is that the Argument objects do not allow the syntax to capture the complete order of the interspersed <file> and <url> arguments. In order to support this, we would need to replace ARG_FILE and ARG_URL with a suitably defined ARG_FILE_OR_URL. The problem with the second alternative syntax above is that some legal <url> values are also legal <file> values, and the syntax does not allow the user to control the disambiguation.

For more information, see also org.jnode.fs.command.xml - http://jnode.svn.sourceforge.net/viewvc/jnode/trunk/fs/descriptors/org.j... .

CatCommand.java - http://jnode.svn.sourceforge.net/viewvc/jnode/trunk/fs/src/fs/org/jnode/...

Ideas for future Syntax enhancements

Here are some ideas for work to be done in this area:

Extend OptionSetSyntax to support "--" as meaning everything after here is not an option.
Make OptionSetSyntax smarter in its handling of repeated options. For example completing "cp --recursive " should not offer "--recursive" as a completion.
Improve "help", including improving the output, incorporating more descriptions from the syntax, in preference to descriptions from the Command class, and supporting multi-lingual descriptions. (In fact, we need to go a lot further ... including supporting full documentation complete with a way to specify markup and cross-references. But that's a different problem really.)
Extend the Argument APIs so that we can specify (for example) that a FileArgument should match an existing file, an existing directory, a path to an object that does not exist, etc. This potentially applies to all name arguments over dynamic namespaces.
Extend the Argument APIs to support expansion of patterns against the FS and other namespaces. This needs to be done in a way that allows the user, shell and command to control whether or not expansion occurs. We don't want commands to have to understand that there are patterns at all .... except in cases where the command needs to know (e.g. some flavours of rename command). And we also need to cater for shell languages (e.g. UNIX derived ones) where FS pattern expansion is clearly a shell responsibility.
Add support for command-specific Syntax classes; e.g. to support complex command syntaxes like UNIX style "expr" and "test" commands.
Add command syntax support for command-line interactive commands like old-school UNIX ftp and nslookup. (In JNode, we already have a tftp client that runs this way.)
Implement a compatibility library to allow JNode commands to be executed in the classic Java world.

JNode Command and Syntax APIs

This page is an overview of the JNode APIs that are involved in the new syntax mechanisms. For more nitty-gritty details, please refer to the relevant javadocs.
Note:

These APIs still change a bit from time to time. (But if your code is in the JNode code base, you won't need to deal with these changes.)
The javadocs on the JNode website currently do not include the "shell" APIs.
You can generate the javadocs in a JNode build sandbox by running "./build.sh javadoc".
If the javadocs are inadequate, please let us know via a JNode "bug" request.

Java package structure

The following classes mostly reside in the "org.jnode.shell.syntax" package. The exceptions are "Command" and "AbstractCommand" which live in "org.jnode.shell". (Similarly named classes in the "org.jnode.shell.help" and "org.jnode.shell.help.args" packages are part of the old-style syntax support.)

Command
The JNode command shell (or more accurately, the command invokers) understand two entry points for launching classes as "commands". The first entry point is the "public static void main(String[])" entry point used by classic Java command line applications. When a command class has (just) a "main" method, the shell will launch it by calling the method, passing the command arguments. What happens next is up to the command class:

A non-JNode application will typically deal with the command arguments itself, or using some third party class like "gnu.getopt.GetOpt".
A JNode-aware application can also use the old-style syntax method directly, by calling a "Help.Info" object's "parse(String[])" method on the argument strings.

The preferred entry point for a JNode command class is the "Command.execute(CommandLine, InputStream, PrintStream, PrintStream)" method. On the face of it, this entry point offers a number of advantages over the "main" entry point:

The "execute" method provides the command's IO streams explicitly, rather than relying on the "System.{in,out,err}" statics. (Those statics are problematic, unless you are using proclets or isolates.)
The "execute" method gives the application access to more information gleaned from the command line; e.g. the command name (alias) supplied by the user.

Unless you are using the "default" command invoker, a command class with an "execute" entry point will be invoked via that entry point, even if it also has a "main" entry point. What happens next is up to the command class:

The "execute" method may fetch the user's argument strings from the CommandLine object and do its own argument analysis.
If the command class is designed to use old-style syntax mechanisms, the "execute" method will typically call the "parse(String[])" method and proceed as described above.
If the command class is designed to use new-style syntax mechanisms, argument analysis will already have been done. This can only happen if the command class extends the AbstractCommand class; see below.

AbstractCommand

The AbstractCommand class is a base class for JNode-aware command classes. For command classes that do their own argument processing, or that use the old-style syntax mechanisms, use of this class is optional. For commands that want to use the new-style syntax mechanisms, the command class must be a direct or indirect subclass of AbstractCommand.

The AbstractCommand class provides helper methods useful to all command classes:

The "exit(int)" method can be called from the command thread to terminate command execution with a return code. This is roughly equivalent to a classic Java application calling "System.exit(int)".
The "getInput()", "getOutput()", "getError()" and "getIO(int)" methods return "CommandIO" instances that can be used to get a command's "standard io" streams as Java Input/OutputStream or Reader/Writer objects.

The "getCommandLine" method returns a CommandLine instance that holds the command's command name and unparsed arguments.

But more importantly, the AbstractCommand class provides infrastructure that is key to the new-style syntax mechanism. Specifically, the AbstractCommand maintains an ArgumentBundle for each command instance. The ArgumentBundle is created when either of the following happens:

The child class constructor chains the AbstractCommand(String) constructor. In this case an (initially) empty ArgumentBundle is created.
The child class constructor calls the "registerArguments(Argument ...)" method. In this case, an ArgumentBundle is created (if necessary) and the arguments are added to it.

If it was created, the ArgumentBundle is populated with argument values before the "execute" method is called. The existence of an ArgumentBundle determines whether the shell uses old-style or new-style syntax, for command execution and completion. (Don't try to mix the two mechanisms: it is liable to lead to inconsistent command behavior.)

Finally, the AbstractCommand class provides an "execute(String[])" method. This is intended to provide a bridge between the "main" and "execute" entry points for situations where a JNode-aware command class has to be executed via the former entry point. The "main" method should be implemented as follows:

    public static void main(String[] args) throws Exception {
        new XxxClass().execute(args);
    }

CommandIO and its implementation classes

The CommandIO interface and its implementation classes allow commands to obtain "standard io" streams without knowing whether the underlying data streams are byte or character oriented. This API also manages the creation of 'print' wrappers.

Argument and sub-classes

The Argument classes play a central role in the new syntax mechanism. As we have seen above, a command class creates Argument instances to act as value holders for its formal arguments, and adds them to its ArgumentBundle. When the argument parser is invoked, it traverses the command syntax and binds values to the Arguments in the bundle. When the command's "execute" entry point is called, it can access the values bound to the Arguments.

The most important methods in the Argument API are as follows:

The "accept(Token)" method is called by the parser when it has a candidate token for the Argument. If the supplied Token is acceptable, the Argument uses "addValue(...)" to add the Token to its collection. If it is not acceptable, "SyntaxErrorException" is thrown.
The "doAccept(Token)" abstract method is called by "accept" after it has done the multiplicity checks. It is required to either return a non-null value, or throw an exception; typically SyntaxErrorException.
In completion mode, the parser calls the "complete(...)" method to get Argument specific completions for a partial argument. The "complete" method is supplied a CompletionInfo object, and should use it to record any completions.
The "isSet()", "getValue()" and "getValues()" methods are called by a command class to obtain the value or values bound to an Argument.
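
The split between "accept" and "doAccept" described above is a template-method arrangement: the base class checks multiplicity and stores the value, and the subclass only converts the token. The sketch below models that under simplifying assumptions: tokens are plain Strings rather than Token objects, and the Arg, IntArg and SyntaxError names are hypothetical rather than the real JNode classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the accept/doAccept split; not the JNode source.
class AcceptSketch {
    static class SyntaxError extends RuntimeException {
        SyntaxError(String message) { super(message); }
    }

    abstract static class Arg<V> {
        private final boolean multiple;
        private final List<V> values = new ArrayList<>();

        Arg(boolean multiple) { this.multiple = multiple; }

        /** Checks multiplicity, then delegates the conversion to doAccept. */
        final void accept(String token) {
            if (!multiple && !values.isEmpty()) {
                throw new SyntaxError("too many values");
            }
            values.add(doAccept(token));
        }

        /** Converts the token to a value, or throws SyntaxError. */
        protected abstract V doAccept(String token);

        boolean isSet() { return !values.isEmpty(); }
        List<V> getValues() { return values; }
    }

    static class IntArg extends Arg<Integer> {
        IntArg(boolean multiple) { super(multiple); }

        @Override
        protected Integer doAccept(String token) {
            try {
                return Integer.valueOf(token);
            } catch (NumberFormatException e) {
                throw new SyntaxError("not an integer: " + token);
            }
        }
    }

    public static void main(String[] args) {
        IntArg counts = new IntArg(true);
        counts.accept("1");
        counts.accept("2");
        System.out.println(counts.getValues()); // prints [1, 2]
    }
}
```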

The constructors for the descendent classes of Argument provide the following common parameters:

The "label" parameter provides a name for the Argument that is used to bind the Argument to Syntax elements. It must be unique in the context of the command's ArgumentBundle.
The "flags" parameter specifies the Argument's multiplicity; i.e. how many values are allowed or required for the Argument. The allowed flag values are defined in the Argument class. A well-formed "flags" parameter consists of OPTIONAL or MANDATORY "or-ed" with SINGLE or MULTIPLE.
The "description" parameter gives a default description for the Argument that can be used in "help" messages.
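
As an illustration of how an "or-ed" flags word can be checked, here is a minimal sketch. The numeric values of the constants are assumptions made for this example (the real ones are defined in the Argument class), and the check only rejects contradictory combinations; it does not model whatever defaults the real class applies when a flag is omitted.

```java
// Hypothetical flag values; the real constants live in JNode's Argument class.
class FlagsSketch {
    static final int OPTIONAL = 0x01;
    static final int MANDATORY = 0x02;
    static final int SINGLE = 0x04;
    static final int MULTIPLE = 0x08;

    /** False if the flags ask for both OPTIONAL and MANDATORY, or both SINGLE and MULTIPLE. */
    static boolean isConsistent(int flags) {
        boolean bothPresence = (flags & OPTIONAL) != 0 && (flags & MANDATORY) != 0;
        boolean bothCount = (flags & SINGLE) != 0 && (flags & MULTIPLE) != 0;
        return !bothPresence && !bothCount;
    }

    /** True if the flags allow more than one value to be bound. */
    static boolean allowsMultiple(int flags) {
        return (flags & MULTIPLE) != 0;
    }

    public static void main(String[] args) {
        System.out.println(isConsistent(OPTIONAL | MULTIPLE));  // prints true
        System.out.println(isConsistent(OPTIONAL | MANDATORY)); // prints false
    }
}
```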

The descendent classes of Argument correspond to different kinds of argument. For example:

StringArgument accepts any String value,
IntegerArgument accepts and (in some cases) completes an Integer value,
FileArgument accepts a pathname argument and completes it against paths for existing objects in the file system, and
DeviceArgument accepts a device name and completes it against the registered device names.

There are two abstract sub-classes of Argument:

EnumArgument accepts values for a given Java enum.
MappedArgument accepts values based on a String to value mapping supplied as a Java Map.

Please refer to the javadoc for an up-to-date list of the Argument classes.

Syntax and sub-classes

As we have seen above, Argument instances are used to specify the command class's argument requirements. These Arguments correspond to nodes in one or more syntaxes for the command. These syntaxes are represented in memory by the Syntax classes.

A typical command class does not see Syntax objects. They are typically created by loading XML (as described under "Syntax and XML syntax specifications"), and are used by various components of the shell. As such, the APIs need not concern the application developer.

ArgumentBundle

This class is largely internal, and a JNode application programmer doesn't need to access it directly. Its purpose is to act as the container for the new-style Argument instances that belong to a command class instance.

MuSyntax and sub-classes

The MuSyntax class and its subclasses represent the BNF-like syntax graphs that the command argument parser actually operates on. These graphs are created by the "prepare" method of new-style Syntax objects, in two stages. The first stage is to build a tree of MuSyntax objects, using symbolic references to represent cycles. The second stage is to traverse the tree, replacing the symbolic references with their referents.

There are currently 6 kinds of MuSyntax node:

MuSymbol - this denotes a symbol (keyword) in the syntax. When a MuSymbol is matched, no argument capture takes place.
MuArgument - this denotes a placeholder for an Argument in the syntax. When a MuArgument is encountered, the corresponding Argument's "accept" method is called to see if the current token is acceptable. If it is, the token is bound to the Argument; otherwise the parser starts backtracking.
MuPreset - this is a variation on a MuArgument in which a "preset" token is passed to the Argument. Unlike MuArgument and MuSymbol, a MuPreset does not cause the parser to advance to the next token.
MuSequence - this denotes that a list of child MuSyntax nodes must be matched in a given sequence.
MuAlternation - this denotes that a list of child MuSyntax nodes must be tried one at a time in a given order.
MuBackReference - this denotes a reference to an ancestor node in the MuSyntax tree. These nodes are replaced with their referents before parsing takes place.

MuParser

The MuParser class does the real work of command line parsing. The "parse" method takes input parameters that provide a MuSyntax graph, a TokenSource and some control parameters.

The parser maintains three stacks:

The "syntaxStack" holds the current "productions" waiting to be matched against the token stream.
The "choicePointStack" holds "choicePoint" objects that represent alternates that the parser hasn't tried yet. The choicepoints also record the state of the "syntaxStack" when the alternation was encountered, and the top of the "argsModified" stack.
The "argsModified" stack keeps track of the Arguments that need to be "unbound" when the parser backtracks.

In normal parsing mode, the "parse" method matches tokens until either the parse is complete, or an error occurs. The parse is complete if the parser reaches the end of the token stream and discovers that the syntax stack is also empty. The "parse" method then returns, leaving the Arguments bound to the relevant source tokens. The error case occurs when a MuSyntax does not match the current token, or the parser reaches the end of the TokenSource when there are still unmatched MuSyntaxes on the syntax stack. In this case, the parser backtracks to the last "choicepoint" and then resumes parsing with the next alternative. If no choicepoints are left, the parse fails.
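
The normal-mode loop described above can be sketched as a much-simplified model. This is not the MuParser code: the Node, ChoicePoint and catLikeSyntax names are hypothetical, an argument node accepts any token, and the "argsModified" stack is approximated by truncating a list of bindings when the parser backtracks.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical, much-simplified model of a backtracking parser; not MuParser itself.
class BacktrackSketch {
    enum Kind { SYMBOL, ARGUMENT, SEQUENCE, ALTERNATION }

    static final class Node {
        final Kind kind;
        final String text;      // symbol text or argument label
        final Node[] children;  // parts of a sequence or alternation
        Node(Kind kind, String text, Node... children) {
            this.kind = kind; this.text = text; this.children = children;
        }
    }

    static Node sym(String s) { return new Node(Kind.SYMBOL, s); }
    static Node arg(String label) { return new Node(Kind.ARGUMENT, label); }
    static Node seq(Node... parts) { return new Node(Kind.SEQUENCE, null, parts); }
    static Node alt(Node... choices) { return new Node(Kind.ALTERNATION, null, choices); }

    /** Saved parser state plus the index of the next untried alternative. */
    static final class ChoicePoint {
        final Deque<Node> stack; final int pos; final int boundSize; final Node alt; final int next;
        ChoicePoint(Deque<Node> stack, int pos, int boundSize, Node alt, int next) {
            this.stack = stack; this.pos = pos; this.boundSize = boundSize; this.alt = alt; this.next = next;
        }
    }

    /** Returns "label=token" bindings for the first complete parse, or null if there is none. */
    static List<String> parse(Node root, List<String> tokens) {
        Deque<Node> stack = new ArrayDeque<>();
        Deque<ChoicePoint> choicePoints = new ArrayDeque<>();
        List<String> bound = new ArrayList<>();
        int pos = 0;
        stack.push(root);
        while (true) {
            boolean failed = false;
            if (stack.isEmpty()) {
                if (pos == tokens.size()) return bound; // syntax and tokens both exhausted
                failed = true;                           // tokens left over
            } else {
                Node n = stack.pop();
                switch (n.kind) {
                    case SEQUENCE:
                        for (int i = n.children.length - 1; i >= 0; i--) stack.push(n.children[i]);
                        break;
                    case ALTERNATION:
                        if (n.children.length == 0) { failed = true; break; }
                        if (n.children.length > 1) {
                            choicePoints.push(new ChoicePoint(new ArrayDeque<>(stack), pos, bound.size(), n, 1));
                        }
                        stack.push(n.children[0]);
                        break;
                    case SYMBOL:
                        if (pos < tokens.size() && tokens.get(pos).equals(n.text)) pos++;
                        else failed = true;
                        break;
                    case ARGUMENT:
                        if (pos < tokens.size()) bound.add(n.text + "=" + tokens.get(pos++));
                        else failed = true;
                        break;
                }
            }
            if (failed) {
                if (choicePoints.isEmpty()) return null;  // no alternatives left: parse fails
                ChoicePoint cp = choicePoints.pop();
                stack = new ArrayDeque<>(cp.stack);       // restore the syntax stack
                pos = cp.pos;                             // rewind the token position
                while (bound.size() > cp.boundSize) bound.remove(bound.size() - 1); // unbind
                stack.push(cp.alt.children[cp.next]);
                if (cp.next + 1 < cp.alt.children.length) {
                    choicePoints.push(new ChoicePoint(cp.stack, cp.pos, cp.boundSize, cp.alt, cp.next + 1));
                }
            }
        }
    }

    /** A made-up three-way alternation, loosely inspired by the cat example. */
    static Node catLikeSyntax() {
        return alt(seq(sym("-u"), arg("url")), seq(arg("file"), arg("file")), arg("file"));
    }

    public static void main(String[] args) {
        System.out.println(parse(catLikeSyntax(), Arrays.asList("a.txt"))); // prints [file=a.txt]
    }
}
```

In the "a.txt" run, the second alternative binds the token, fails for lack of a second token, and is undone before the third alternative succeeds, which is the unbinding step that the "argsModified" stack exists to support.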

In completion mode, the "parse" method behaves differently when it encounters the end of the TokenSource. The first thing it does is to attempt to capture a completion; e.g. by calling the current Argument's "complete(...)" method. Then it starts backtracking to find more completions. As a result, a completion parse may do a lot more work than a normal parse.

The astute reader may be wondering what happens if the "MuParser.parse" method is applied to a pathological MuSyntax; e.g. one which loops for ever, or that requires exponential backtracking. The answer is that the "parse" method has a "stepLimit" parameter that places an upper limit on the number of main loop iterations that the parser will perform. This indirectly addresses the issue of space usage as well, though we could probably improve on this. (In theory, we could analyse the MuSyntax for common pathologies, but this would degrade parser performance for non-pathological MuSyntaxes. Besides, we are not (currently) allowing applications to supply MuSyntax graphs directly, so all we really need to do is ensure that the Syntax classes generate well-behaved MuSyntax graphs.)
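
The effect of a step limit can be shown with a small sketch. It assumes a toy "syntax" in which the production named "loop" expands to itself without consuming input; the class and method names are hypothetical, and how the real parser reports an exceeded limit is not modelled here.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of a step-limited main loop; not the MuParser code.
class StepLimitSketch {
    /**
     * Processes productions from a stack. The production "loop" expands to
     * itself, so it never finishes; anything else is treated as a terminal.
     * Returns true if the stack empties within stepLimit iterations.
     */
    static boolean run(String start, int stepLimit) {
        Deque<String> stack = new ArrayDeque<>();
        stack.push(start);
        int steps = 0;
        while (!stack.isEmpty()) {
            if (++steps > stepLimit) {
                return false; // give up: the syntax looks pathological
            }
            String production = stack.pop();
            if (production.equals("loop")) {
                stack.push("loop"); // expands to itself forever
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(run("leaf", 1000)); // prints true
        System.out.println(run("loop", 1000)); // prints false
    }
}
```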

Syntax and XML syntax specifications

As the parent page describes, the command syntax "picture" has two distinct parts. A command class registers Argument objects with the infrastructure to specify its formal command parameters. The concrete syntax for the command line is represented in memory by Syntax objects.

This page documents the syntactic constructs provided by the Syntax objects, and the XML syntax that provides the normal way of specifying a syntax.

You will notice that there can be a number of ways to build a command syntax from the constructs provided. This redundancy is intentional.

The Syntax base class

The Syntax class is the abstract base class for all classes that represent high-level syntactic constructs in the "new" syntax mechanisms. A Syntax object has two (optional) attributes that are relevant to the process of specifying syntax:

The "label" attribute gives a name for the syntax node that will be used when the node is formatted; e.g. for "help" messages.
The "description" attribute gives a basic description for the syntax node.

These attributes are represented in an XML syntax element using optional XML attributes named "label" and "description" respectively.

ArgumentSyntax

An ArgumentSyntax captures one value for an Argument with a given argument label. Specifically, an ArgumentSyntax instance will cause the parser to consume one token, and to attempt to bind it to the Argument with the specified argument label in the current ArgumentBundle.

Note that many Arguments are very non-selective in the tokens that they will match. For example, while an IntegerArgument will accept "123" as valid, so will "FileArgument" and many other Argument classes. It is therefore important to take into account the parser's handling of ambiguity when designing command syntaxes; see below.

Here are some ArgumentSyntax instances, as specified in XML:

    <argument argLabel="foo"/>
    <argument label="foo" description="this controls the command's fooing" argLabel="foo"/>

EmptySyntax

An EmptySyntax matches absolutely nothing. It is typically used when a command requires no arguments.

    <empty description="dir with no arguments lists the current directory"/>

OptionSyntax

An OptionSyntax also captures a value for an Argument, but it requires the value token to be preceded by a token that gives an option "name". The OptionSyntax class supports both short option names (e.g. "-f filename") and long option names (e.g. "--file filename"), depending on the constructor parameters.

    <option argLabel="filename" shortName="f"/>
    <option argLabel="filename" longName="file"/>
    <option argLabel="filename" shortName="f" longName="file"/>

If the Argument denoted by the "argLabel" is a FlagArgument, the OptionSyntax matches just an option name (short or long depending on the attributes).

SymbolSyntax

A SymbolSyntax matches a single token from the command line without capturing any Argument value.

    <symbol symbol="subcommand1"/>

VerbSyntax

A VerbSyntax matches a single token from the command line, setting an associated Argument's value to "true".

    <verb symbol="subcommand1" argLabel="someArg"/>

SequenceSyntax

A SequenceSyntax matches a list of child Syntaxes in the order specified.

    <sequence description="the input and output files">
        <argument argLabel="input"/>
        <argument argLabel="output"/>
    </sequence>

AlternativesSyntax

An AlternativesSyntax matches one of a list of alternative child Syntaxes. The child syntaxes are tried one at a time in the order specified until one is found that matches the tokens.

    <alternatives description="specify an input or output file">
        <option shortName="i" argLabel="input"/>
        <option shortName="o" argLabel="output"/>
    </alternatives>

RepeatSyntax

A RepeatSyntax matches a single child Syntax repeated a number of times. By default, any number of matches (including zero) will satisfy a RepeatSyntax. The number of required and allowed repetitions can be constrained using the "minCount" and "maxCount" attributes. The default behavior is to match lazily; i.e. to match as few instances of the child syntax as is possible. Setting the attribute eager="true" causes the repeat to match as many child instances as possible, within the constraints of the "minCount" and "maxCount" attributes.

    <repeat description="zero or more files">
        <argument argLabel="file"/>
    </repeat>

    <repeat minCount="1" description="one or more files">
        <argument argLabel="file"/>
    </repeat>

    <repeat maxCount="5" eager="true" 
              description="as many files as possible, up to 5">
        <argument argLabel="file"/>
    </repeat>
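
One way to picture lazy versus eager matching is as the order in which candidate repetition counts are tried before the parser backtracks to the next one. The sketch below only illustrates that ordering idea under this assumption; it is not how RepeatSyntax is implemented, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of the order in which repetition counts might be tried.
class RepeatOrderSketch {
    /** Lazy tries the smallest count first; eager tries the largest first. */
    static List<Integer> countsToTry(int minCount, int maxCount, boolean eager) {
        List<Integer> counts = new ArrayList<>();
        for (int n = minCount; n <= maxCount; n++) {
            counts.add(n);
        }
        if (eager) {
            Collections.reverse(counts);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countsToTry(1, 3, false)); // prints [1, 2, 3]
        System.out.println(countsToTry(1, 3, true));  // prints [3, 2, 1]
    }
}
```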

OptionalSyntax

An OptionalSyntax optionally matches a sequence of child Syntaxes; i.e. it matches nothing or the sequence. The default behavior is to match lazily; i.e. to try the "nothing" case first. Setting the attribute eager="true" causes the "nothing" case to be tried second.

    <optional description="nothing, or an input file and an output file">
        <argument argLabel="input"/>
        <argument argLabel="output"/>
    </optional>

    <optional eager="true"
                 description="an input file and an output file, or nothing">
        <argument argLabel="input"/>
        <argument argLabel="output"/>
    </optional>

PowerSetSyntax

A PowerSetSyntax takes a list of child Syntaxes and matches any number of each of them in any order or any interleaving. The default behavior is to match lazily; i.e. to match as few instances of the child syntax as is possible. Setting the attribute eager="true" causes the powerset to match as many child instances as possible.

    <powerSet description="any number of inputs and outputs">
        <option argLabel="input" shortName="i"/>
        <option argLabel="output" shortName="o"/>
    </powerSet>

OptionSetSyntax

An OptionSetSyntax is like a PowerSetSyntax with the restriction that the child syntaxes must all be OptionSyntax instances. What makes OptionSetSyntax different is that it allows options for FlagArguments to be combined in the classic Unix idiom; i.e. "-a -b" can be written as "-ab".

    <optionSet description="flags and value options">
        <option argLabel="flagOne" shortName="1"/>
        <option argLabel="flagTwo" shortName="2"/>
        <option argLabel="argThree" shortName="3"/>
    </optionSet>

Assuming that the "flagOne" and "flagTwo" correspond to FlagArguments, and "argThree" corresponds to (say) a FileArgument, the above syntax will match any of the following: "-1 -2 -3 three", "-12 -3 three", "-1 -3 three -1", "-3 three" or even an empty argument list.
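
The combined-flag idiom itself is easy to show in isolation. The sketch below simply splits a token such as "-12" into "-1" and "-2"; it assumes every character after the dash names a flag, whereas the real OptionSetSyntax only combines options that correspond to FlagArguments. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the "-ab" idiom; not the OptionSetSyntax implementation.
class CombinedFlagsSketch {
    /** Splits "-ab" into ["-a", "-b"]; any other token is returned unchanged. */
    static List<String> expand(String token) {
        List<String> out = new ArrayList<>();
        boolean combined = token.length() > 2
                && token.charAt(0) == '-'
                && token.charAt(1) != '-';
        if (!combined) {
            out.add(token);
            return out;
        }
        for (int i = 1; i < token.length(); i++) {
            out.add("-" + token.charAt(i));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(expand("-12"));    // prints [-1, -2]
        System.out.println(expand("--file")); // prints [--file]
    }
}
```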

The <syntax ... > element

The outermost element of an XML Syntax specification is the <syntax> element. This element has a mandatory "alias" attribute which associates the syntax with an alias that is in force for the shell. The actual syntax is given by the <syntax> element's zero or more child elements. These must be XML elements representing Syntax sub-class instances, as described above. Conceptually, each of the child elements represents an alternative syntax for the command denoted by the alias.

Here are some examples of complete syntaxes:

    <syntax alias="cpuid">
        <empty description="output the computer's id"/>
    </syntax>

    <syntax alias="dir">
        <empty description="list the current directory"/>
        <argument argLabel="directory" description="list the given directory"/>
    </syntax>
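
To show how the child elements of a <syntax> element line up as alternatives, here is a minimal sketch that reads a syntax specification and lists its top-level children. It uses the JDK's DOM parser purely for illustration; that choice, and the class and method names, are assumptions for this example and say nothing about how JNode itself loads syntax XML.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical sketch using the JDK DOM API; not JNode's syntax loader.
class SyntaxXmlSketch {
    /** Returns "alias: tag" for each top-level alternative of a <syntax> element. */
    static List<String> alternatives(String xml) {
        try {
            Element syntax = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            String alias = syntax.getAttribute("alias");
            List<String> result = new ArrayList<>();
            NodeList children = syntax.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    result.add(alias + ": " + child.getNodeName());
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot read syntax XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<syntax alias=\"dir\">"
                + "<empty description=\"list the current directory\"/>"
                + "<argument argLabel=\"directory\" description=\"list the given directory\"/>"
                + "</syntax>";
        System.out.println(alternatives(xml)); // prints [dir: empty, dir: argument]
    }
}
```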

Ambiguous Syntax specifications

If you have implemented a language grammar using a parser generator (like Yacc, Bison, ANTLR and so on), you will recall how picky the parser generator could be about your input grammar. For example, these tools will often complain about "shift-reduce" or "reduce-reduce" conflicts. This is a parser generator's way of saying that the grammar appears (to it) to be ambiguous.

The new-style command syntax parser takes a different approach. Basically, it does not care if a command syntax supports multiple interpretations of a command line. Instead, it uses a simple runtime strategy to resolve ambiguity: the first complete parse "wins".

Since the syntax mechanisms don't detect ambiguity, it is up to the syntax designer to be aware of the issue, and take it into account when designing the syntax. Here is an example:

    <alternatives>
        <argument argLabel="number"/>
        <argument argLabel="file"/>
    </alternatives>

Assuming that "number" refers to an IntegerArgument, and "file" refers to a FileArgument, the syntax above is actually ambiguous. For example, a parser could in theory bind "123" to the IntegerArgument or the FileArgument. In practice, the new-style command argument parser will pick the first alternative that gives a complete parse, and bind "123" to the IntegerArgument. If you (the syntax designer) don't want this (e.g. because you want the command to work for all legal filenames), you will need to use OptionSyntax or TokenSyntax or something else to allow the end user to force a particular interpretation.
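
The "first complete parse wins" rule can be illustrated with a minimal sketch for the single-token case. The two predicates stand in for an IntegerArgument and a FileArgument; treating any non-empty token as a pathname is an assumption made for this example, and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch of ordered alternatives; not the JNode parser.
class FirstMatchSketch {
    /** Returns the label of the first alternative that accepts the token, or null. */
    static String bind(String token, Map<String, Predicate<String>> alternatives) {
        for (Map.Entry<String, Predicate<String>> entry : alternatives.entrySet()) {
            if (entry.getValue().test(token)) {
                return entry.getKey();
            }
        }
        return null;
    }

    /** Alternatives in the same order as the XML above: "number", then "file". */
    static Map<String, Predicate<String>> numberThenFile() {
        Map<String, Predicate<String>> alternatives = new LinkedHashMap<>();
        alternatives.put("number", token -> token.matches("-?\\d+"));
        alternatives.put("file", token -> !token.isEmpty()); // assumption: any non-empty token is a pathname
        return alternatives;
    }

    public static void main(String[] args) {
        System.out.println(bind("123", numberThenFile()));       // prints number
        System.out.println(bind("notes.txt", numberThenFile())); // prints file
    }
}
```

Swapping the order of the two entries would make "123" bind as a file, which is why the order of alternatives matters to the syntax designer.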

SyntaxSpecLoader and friends

More about the Syntax base class.

If you are planning on defining new sub-classes of Syntax, the key methods that must be implemented are as follows:

The "prepare" method is responsible for translating the syntax node into the MuSyntax graph that will be used by the parser. This will typically be produced by preparing any child syntaxes, and assembling them using the appropriate MuSyntax constructors. If the syntax entails recursion at the MuSyntax level, this will initially be expressed using MuBackReferences. The recursion points will then be transformed into graph cycles by calling "MuSyntax.resolveBackReferences()".
Another technique that can be used is to introduce "synthetic" Argument nodes with special semantics. For example, the OptionSetSyntax uses a special Argument class to deal with combined short options; e.g. where "-a -b" is expressed as "-ab".
The "format" method renders the Syntax in a form that is suitable for "usage" messages.
The "toXML" method creates a "nanoxml.XMLElement" that expresses this Syntax node in XML form. It is used by the "SyntaxCommand" class when it dumps a syntax specification as text. It is important that "toXML" produces XML that is compatible with the SyntaxSpecLoader class; see below.