In the classical Java world, a command line application is launched by calling the "main" entry point method on a nominated class, passing the user's command arguments as an array of Strings. The command is responsible for working out which arguments represent options, which represent parameters and so on. While there are (non-Sun) libraries to help with this task (like the Java port of GNU getopt), they are rather primitive.
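To make the classical approach concrete, here is a minimal sketch of the hand-written argument handling that a "main"-style command typically carries around. The class name ClassicCat and its option handling are illustrative assumptions, not code from JNode or from any library.

```java
import java.util.ArrayList;
import java.util.List;

/** Hand-rolled option parsing in the classical "main" style (illustrative only). */
class ClassicCat {
    /** Result of parsing: whether -u/--urls was seen, plus the remaining operands. */
    static final class Parsed {
        boolean urls;
        final List<String> operands = new ArrayList<>();
    }

    /** The command itself decides which tokens are options and which are operands. */
    static Parsed parse(String[] args) {
        Parsed parsed = new Parsed();
        for (String arg : args) {
            if (arg.equals("-u") || arg.equals("--urls")) {
                parsed.urls = true;
            } else if (arg.startsWith("-") && arg.length() > 1) {
                throw new IllegalArgumentException("unknown option: " + arg);
            } else {
                parsed.operands.add(arg);
            }
        }
        return parsed;
    }

    public static void main(String[] args) {
        Parsed parsed = parse(args);
        System.out.println((parsed.urls ? "urls: " : "files: ") + parsed.operands);
    }
}
```

Note that nothing outside the class can see this syntax, so a shell has no way to offer completion or generated help for it.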
In JNode, we take a more sophisticated approach to the issue of command arguments. A native JNode command specifies its formal arguments and command line syntax. The task of matching actual command line arguments is performed by JNode library classes. This approach offers a number of advantages over the classical Java approach:
In addition, this approach allows us to do some things at the Shell level that are difficult with (for example) UNIX style shells.
As the above suggests, there are two versions of JNode command syntax and associated mechanisms; i.e. parsing, completion, help and so on. In the first version (the "old" mechanisms) the application class declares a static Argument object for each formal parameter, and creates a static "Help.Info" data structure containing Syntax objects that reference the Arguments. The command line parser and completer traverse the data structures, binding values to the Arguments.
The problems with the "old" mechanisms include:
The second version (the "new" mechanisms) is a ground-up redesign and reimplementation:
(This example is based on material provided by gchii)
The cat command is a JNode file system command for the concatenation of files.
The alternative command line syntaxes for the command are as follows:
cat
cat -u | --urls <url> ...
cat <file> ...
The simplest use of cat is to copy a single file to standard output, displaying its contents; for example:
cat d.txt
The following example displays a.txt, followed by b.txt and then c.txt.
cat a.txt b.txt c.txt
The following example concatenates a.txt, b.txt and c.txt, writing the resulting file to d.txt.
cat a.txt b.txt c.txt > d.txt
In fact, the > output redirection in the example above is performed by the command shell and interpreter, and the "> d.txt" arguments are removed before the command arguments are processed. As far as the command class is concerned, this is equivalent to the previous example.
Finally, the following example displays the raw HTML for the JNode home page:
cat --urls http://www.jnode.org/
Syntax specification
The syntax for the cat command is defined in fs/descriptors/org.jnode.fs.command.xml.
The relevant section of the document is as follows:
39 <extension point="org.jnode.shell.syntaxes">
40   <syntax alias="cat">
41     <empty description="copy standard input to standard output"/>
42     <sequence description="fetch and concatenate urls to standard output">
43       <option argLabel="urls" shortName="u" longName="urls"/>
44       <repeat minCount="1">
45         <argument argLabel="url"/>
46       </repeat>
47     </sequence>
48     <repeat minCount="1" description="concatenate files to standard output">
49       <argument argLabel="file"/>
50     </repeat>
51   </syntax>
Line 39: "org.jnode.shell.syntaxes" is an extension point for command syntax.
Line 40: The syntax entity represents the entire syntax for a command. The alias attribute is required and associates a syntax with a command.
Line 41: When parsing a command line, the empty tag does not consume arguments. Its description attribute documents this form of the cat command.
Line 42: A sequence tag groups child elements (options, arguments and other constructs) that must match in the order given.
Line 43: An option tag is a command line option, such as -u and --urls. Since -u and --urls are actually one and the same option, the argLabel attribute identifies the option internally.
Line 44: A repeat tag allows its child syntax to be matched more than once on a command line. When minCount is one or more, at least that many matches are required.
Line 45: An argument tag consumes one command line argument.
Line 48: When minCount is 1, at least one match of the child syntax is required.
Line 49: An argument tag consumes one command line argument.
The cat command is implemented in CatCommand.java. The salient parts of the command's implementation are as follows.
private final FileArgument ARG_FILE =
    new FileArgument("file", Argument.OPTIONAL | Argument.MULTIPLE,
        "the files to be concatenated");
This declares a formal argument to capture JNode file/directory pathnames from the command line; see the specification of org.jnode.shell.syntax.FileArgument. The "Argument.OPTIONAL | Argument.MULTIPLE" parameter gives the argument flags. Argument.OPTIONAL means that the argument may be optional in the syntax, and Argument.MULTIPLE means that it may be repeated in the syntax. Finally, the "file" label matches the argLabel="file" attribute at line 49 of the XML above.
private final URLArgument ARG_URL =
    new URLArgument("url", Argument.OPTIONAL | Argument.MULTIPLE,
        "the urls to be concatenated");
This declares a formal argument to capture URLs from the command line. Its "url" label matches the argLabel="url" attribute at line 45 of the XML above.
private final FlagArgument FLAG_URLS =
    new FlagArgument("urls", Argument.OPTIONAL, "If set, arguments will be urls");
This declares a formal flag whose "urls" label matches the argLabel="urls" attribute at line 43 of the XML above.
public CatCommand() {
    super("Concatenate the contents of files, urls or standard input to standard output");
    registerArguments(ARG_FILE, ARG_URL, FLAG_URLS);
}
The constructor for the CatCommand registers the three formal arguments, ARG_FILE, ARG_URL and FLAG_URLS. The registerArguments() method is implemented in AbstractCommand.java. It simply adds the formal arguments to the command's ArgumentBundle, making them available to the syntax mechanism.
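The relationship between labelled formal arguments and the bundle that holds them can be pictured with a small stand-alone model. This is only a sketch: ToyArgument and ToyBundle are hypothetical stand-ins invented for illustration, and they do not reproduce the real Argument or ArgumentBundle APIs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy stand-ins for Argument and ArgumentBundle; NOT the JNode classes. */
class BundleSketch {
    /** A formal argument: a label plus the values bound to it during parsing. */
    static final class ToyArgument {
        final String label;
        final List<String> values = new ArrayList<>();

        ToyArgument(String label) {
            this.label = label;
        }
    }

    /** A container that lets a parser look up formal arguments by label. */
    static final class ToyBundle {
        private final Map<String, ToyArgument> byLabel = new LinkedHashMap<>();

        /** Plays the role that registerArguments() plays for a command. */
        void register(ToyArgument... args) {
            for (ToyArgument arg : args) {
                if (byLabel.put(arg.label, arg) != null) {
                    throw new IllegalArgumentException("duplicate label: " + arg.label);
                }
            }
        }

        /** What a syntax node carrying an argLabel would do when it matches a token. */
        void bind(String argLabel, String token) {
            ToyArgument arg = byLabel.get(argLabel);
            if (arg == null) {
                throw new IllegalArgumentException("no argument labelled: " + argLabel);
            }
            arg.values.add(token);
        }
    }

    public static void main(String[] args) {
        ToyArgument file = new ToyArgument("file");
        ToyArgument url = new ToyArgument("url");
        ToyBundle bundle = new ToyBundle();
        bundle.register(file, url);
        bundle.bind("file", "a.txt");
        bundle.bind("file", "b.txt");
        System.out.println(file.values + " " + url.values);
    }
}
```

The point of the model is that the label is the only link between the XML syntax and the Java field, which is why the labels in the two places must agree.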
public void execute() throws IOException {
    this.err = getError().getPrintWriter();
    OutputStream out = getOutput().getOutputStream();
    File[] files = ARG_FILE.getValues();
    URL[] urls = ARG_URL.getValues();

    boolean ok = true;
    if (urls != null && urls.length > 0) {
        for (URL url : urls) {
            ...
    } else if (files != null && files.length > 0) {
        for (File file : files) {
            ...
    } else {
        process(getInput().getInputStream(), out);
    }
    out.flush();
    if (!ok) {
        exit(1);
    }
}
The "execute" method is called after the syntax processing has occurred, and after the command argument values have been converted to the relevant Java types and bound to the formals. As the code above shows, the method uses a method on the formal argument to retrieve the actual values. Other methods implemented by AbstractCommand allow the "execute" to access the command's standard input, output and error streams as Stream objects or Reader/Writer objects, and to set the command's return code.
Note: ideally the syntax of the JNode cat command should include this alternative:
cat ( ( -u | --urls <url> ) | <file> ) ...
or even this:
cat ( <url> | <file> ) ...
allowing <file> and <url> arguments to be interspersed. The problem with the first alternative syntax above is that the Argument objects do not allow the syntax to capture the complete order of the interspersed <file> and <url> arguments. In order to support this, we would need to replace ARG_FILE and ARG_URL with a suitably defined ARG_FILE_OR_URL. The problem with the second alternative syntax above is that some legal <url> values are also legal <file> values, and the syntax does not allow the user to control the disambiguation.
For more information, see also org.jnode.fs.command.xml - http://jnode.svn.sourceforge.net/viewvc/jnode/trunk/fs/descriptors/org.j... .
CatCommand.java - http://jnode.svn.sourceforge.net/viewvc/jnode/trunk/fs/src/fs/org/jnode/...
Here are some ideas for work to be done in this area:
This page is an overview of the JNode APIs that are involved in the new syntax mechanisms. For more nitty-gritty details, please refer to the relevant javadocs.
Note:
Java package structure
The following classes mostly reside in the "org.jnode.shell.syntax" package. The exceptions are "Command" and "AbstractCommand" which live in "org.jnode.shell". (Similarly named classes in the "org.jnode.shell.help" and "org.jnode.shell.help.args" packages are part of the old-style syntax support.)
Command
The JNode command shell (or more accurately, the command invokers) understand two entry points for launching classes as "commands". The first entry point is the "public static void main(String[])" entry point used by classic Java command line applications. When a command class has (just) a "main" method, the shell will launch it by calling the method, passing the command arguments. What happens next is up to the command class:
The preferred entry point for a JNode command class is the "Command.execute(CommandLine, InputStream, PrintStream, PrintStream)" method. On the face of it, this entry point offers a number of advantages over the "main" entry point:
Unless you are using the "default" command invoker, a command class with an "execute" entry point will be invoked via that entry point, even if it also has a "main" entry point. What happens next is up to the command class:
AbstractCommand
The AbstractCommand class is a base class for JNode-aware command classes. For command classes that do their own argument processing, or that use the old-style syntax mechanisms, use of this class is optional. For commands that want to use the new-style syntax mechanisms, the command class must be a direct or indirect subclass of AbstractCommand.
The AbstractCommand class provides helper methods useful to all command classes.
The "getCommandLine" method returns a CommandLine instance that holds the command's command name and unparsed arguments.
But more importantly, the AbstractCommand class provides infrastructure that is key to the new-style syntax mechanism. Specifically, the AbstractCommand maintains an ArgumentBundle for each command instance. The ArgumentBundle is created when either of the following happens:
If it was created, the ArgumentBundle is populated with argument values before the "execute" method is called. The existence of an ArgumentBundle determines whether the shell uses old-style or new-style syntax, for command execution and completion. (Don't try to mix the two mechanisms: it is liable to lead to inconsistent command behavior.)
Finally, the AbstractCommand class provides an "execute(String[])" method. This is intended to provide a bridge between the "main" and "execute" entry points for situations where a JNode-aware command class has to be executed via the former entry point. The "main" method should be implemented as follows:
public static void main(String[] args) throws Exception {
    new XxxClass().execute(args);
}
CommandIO and its implementation classes
The CommandIO interface and its implementation classes allow commands to obtain "standard io" streams without knowing whether the underlying data streams are byte or character oriented. This API also manages the creation of 'print' wrappers.
Argument and sub-classes
The Argument classes play a central role in the new syntax mechanism. As we have seen above, a command class creates Argument instances to act as value holders for its formal arguments, and adds them to its ArgumentBundle. When the argument parser is invoked, it traverses the command syntax and binds values to the Arguments in the bundle. When the command's "execute" entry point is called, it can access the values bound to the Arguments.
The most important methods in the Argument API are as follows:
The constructors for the descendent classes of Argument provide the following common parameters:
The descendent classes of Argument correspond to different kinds of argument. For example:
There are two abstract sub-classes of Argument:
Please refer to the javadoc for an up-to-date list of the Argument classes.
Syntax and sub-classes
As we have seen above, Argument instances are used to specify the command class's argument requirements. These Arguments correspond to nodes in one or more syntaxes for the command. These syntaxes are represented in memory by the Syntax classes.
A typical command class does not see Syntax objects. They are typically created by loading XML (as specified here), and are used by various components of the shell. As such, the APIs need not concern the application developer.
ArgumentBundle
This class is largely internal, and a JNode application programmer doesn't need to access it directly. Its purpose is to act as the container for the new-style Argument instances that belong to a command class instance.
MuSyntax and sub-classes
The MuSyntax class and its subclasses represent the BNF-like syntax graphs that the command argument parser actually operates on. These graphs are created by the "prepare" method of new-style Syntax objects, in two stages. The first stage is to build a tree of MuSyntax objects, using symbolic references to represent cycles. The second stage is to traverse the tree, replacing the symbolic references with their referents.
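The two-stage construction can be sketched with a generic node type. The classes below (Node, BackRef) and the resolve method are assumptions made up for this illustration; they show the general technique of building a tree with named placeholders and then patching in the real targets, and they are not the MuSyntax implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy two-stage graph build: a tree with symbolic back-references, then resolution. */
class ResolveSketch {
    static class Node {
        final String label;                       // optional name that a BackRef can target
        final List<Node> children = new ArrayList<>();

        Node(String label) {
            this.label = label;
        }
    }

    /** Stage 1 placeholder: names its target instead of pointing at it. */
    static final class BackRef extends Node {
        final String target;

        BackRef(String target) {
            super(null);
            this.target = target;
        }
    }

    /** Stage 2: replace every BackRef in the tree with the labelled node it names. */
    static void resolve(Node root) {
        Map<String, Node> labelled = new HashMap<>();
        collect(root, labelled);
        patch(root, labelled);
    }

    private static void collect(Node node, Map<String, Node> labelled) {
        if (node.label != null) {
            labelled.put(node.label, node);
        }
        for (Node child : node.children) {
            collect(child, labelled);
        }
    }

    private static void patch(Node node, Map<String, Node> labelled) {
        for (int i = 0; i < node.children.size(); i++) {
            Node child = node.children.get(i);
            if (child instanceof BackRef) {
                Node referent = labelled.get(((BackRef) child).target);
                if (referent == null) {
                    throw new IllegalStateException("unresolved reference");
                }
                node.children.set(i, referent);   // this edge may close a cycle
            } else {
                patch(child, labelled);           // only original tree edges are followed
            }
        }
    }

    public static void main(String[] args) {
        Node loop = new Node("loop");             // models "one item, then loop again"
        loop.children.add(new Node(null));
        loop.children.add(new BackRef("loop"));
        resolve(loop);
        System.out.println(loop.children.get(1) == loop);
    }
}
```

Because the patch pass never descends through an edge it has just rewritten, it terminates even though the finished structure is cyclic.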
There are currently 6 kinds of MuSyntax node:
MuParser
The MuParser class does the real work of command line parsing. The "parse" method takes input parameters that provide a MuSyntax graph, a TokenSource and some control parameters.
The parser maintains three stacks:
In normal parsing mode, the "parse" method matches tokens until either the parse is complete, or an error occurs. The parse is complete if the parser reaches the end of the token stream and discovers that the syntax stack is also empty. The "parse" method then returns, leaving the Arguments bound to the relevant source tokens. The error case occurs when a MuSyntax does not match the current token, or the parser reaches the end of the TokenSource when there are still unmatched MuSyntaxes on the syntax stack. In this case, the parser backtracks to the last "choicepoint" and then resumes parsing with the next alternative. If no choicepoints are left, the parse fails.
In completion mode, the "parse" method behaves differently when it encounters the end of the TokenSource. The first thing it does is to attempt to capture a completion; e.g. by calling the current Argument's "complete(...)" method. Then it starts backtracking to find more completions. As a result, a completion parse may do a lot more work than a normal parse.
The astute reader may be wondering what happens if the "MuParser.parse" method is applied to a pathological MuSyntax; e.g. one which loops for ever, or that requires exponential backtracking. The answer is that the "parse" method has a "stepLimit" parameter that places an upper limit on the number of main loop iterations that the parser will perform. This indirectly addresses the issue of space usage as well, though we could probably improve on this. (In theory, we could analyse the MuSyntax for common pathologies, but this would degrade parser performance for non-pathological MuSyntaxes. Besides, we are not (currently) allowing applications to supply MuSyntax graphs directly, so all we really need to do is ensure that the Syntax classes generate well-behaved MuSyntax graphs.)
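The combination of a syntax stack, a choicepoint stack and a step limit can be illustrated with a much smaller parser. The sketch below is an assumption-laden model: the node types (Sym, Seq, Alt), the immutable SynStack, and the choice to throw IllegalStateException when the limit is hit are all invented here, and the real MuParser differs in its node kinds, its stacks and its error reporting.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

/** Toy backtracking matcher with a step limit; NOT the MuParser implementation. */
class BacktrackSketch {
    interface Syn {}
    static final class Sym implements Syn {
        final String text;
        Sym(String text) { this.text = text; }
    }
    static final class Seq implements Syn {
        final List<Syn> parts;
        Seq(Syn... parts) { this.parts = Arrays.asList(parts); }
    }
    static final class Alt implements Syn {
        final List<Syn> options;
        Alt(Syn... options) { this.options = Arrays.asList(options); }
    }

    /** Immutable linked stack, so a choicepoint can snapshot it without copying. */
    static final class SynStack {
        final Syn head;
        final SynStack tail;
        SynStack(Syn head, SynStack tail) { this.head = head; this.tail = tail; }
    }

    /** Saved state to resume from when the current path fails. */
    static final class ChoicePoint {
        final SynStack syntax;
        final int pos;
        ChoicePoint(SynStack syntax, int pos) { this.syntax = syntax; this.pos = pos; }
    }

    static boolean parse(Syn root, List<String> tokens, int stepLimit) {
        Deque<ChoicePoint> choices = new ArrayDeque<>();
        SynStack syntax = new SynStack(root, null);
        int pos = 0;
        for (int step = 0; ; step++) {
            if (step >= stepLimit) {
                throw new IllegalStateException("step limit exceeded");
            }
            boolean failed = false;
            if (syntax == null) {
                if (pos == tokens.size()) {
                    return true;                  // tokens and syntax both exhausted
                }
                failed = true;                    // tokens left over
            } else {
                Syn node = syntax.head;
                SynStack rest = syntax.tail;
                if (node instanceof Sym) {
                    if (pos < tokens.size() && tokens.get(pos).equals(((Sym) node).text)) {
                        pos++;
                        syntax = rest;
                    } else {
                        failed = true;
                    }
                } else if (node instanceof Seq) {
                    List<Syn> parts = ((Seq) node).parts;
                    for (int i = parts.size() - 1; i >= 0; i--) {
                        rest = new SynStack(parts.get(i), rest);
                    }
                    syntax = rest;
                } else {
                    List<Syn> options = ((Alt) node).options;
                    if (options.isEmpty()) {
                        failed = true;
                    } else {
                        // Save later alternatives in reverse, so the earliest is retried first.
                        for (int i = options.size() - 1; i >= 1; i--) {
                            choices.push(new ChoicePoint(new SynStack(options.get(i), rest), pos));
                        }
                        syntax = new SynStack(options.get(0), rest);
                    }
                }
            }
            if (failed) {
                if (choices.isEmpty()) {
                    return false;                 // no choicepoints left: the parse fails
                }
                ChoicePoint cp = choices.pop();   // backtrack to the last choicepoint
                syntax = cp.syntax;
                pos = cp.pos;
            }
        }
    }
}
```

Counting loop iterations rather than analysing the grammar keeps the common case cheap: a well-behaved syntax never gets near the limit, and a pathological one is cut off regardless of why it misbehaves.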
As the parent page describes, the command syntax "picture" has two distinct parts. A command class registers Argument objects with the infrastructure to specify its formal command parameters. The concrete syntax for the command line is represented in memory by Syntax objects.
This page documents the syntactic constructs provided by the Syntax objects, and the XML syntax that provides the normal way of specifying a syntax.
You will notice that there can be a number of ways to build a command syntax from the constructs provided. This redundancy is intentional.
The Syntax base class
The Syntax class is the abstract base class for all classes that represent high-level syntactic constructs in the "new" syntax mechanisms. A Syntax object has two (optional) attributes that are relevant to the process of specifying syntax:
These attributes are represented in an XML syntax element using optional XML attributes named "label" and "description" respectively.
ArgumentSyntax
An ArgumentSyntax captures one value for an Argument with a given argument label. Specifically, an ArgumentSyntax instance will cause the parser to consume one token, and to attempt to bind it to the Argument with the specified argument label in the current ArgumentBundle.
Note that many Arguments are very non-selective in the tokens that they will match. For example, while an IntegerArgument will accept "123" as valid, so will a FileArgument and many other Argument classes. It is therefore important to take account of the parser's handling of ambiguity when designing command syntaxes; see below.
Here are some ArgumentSyntax instances, as specified in XML:
<argument argLabel="foo"/>
<argument label="foo" description="this controls the command's fooing" argLabel="foo"/>
EmptySyntax
An EmptySyntax matches absolutely nothing. It is typically used when a command requires no arguments.
<empty description="dir with no arguments lists the current directory"/>
OptionSyntax
An OptionSyntax also captures a value for an Argument, but it requires the value token to be preceded by a token that gives an option "name". The OptionSyntax class supports both short option names (e.g. "-f filename") and long option names (e.g. "--file filename"), depending on the constructor parameters.
<option argLabel="filename" shortName="f"/>
<option argLabel="filename" longName="file"/>
<option argLabel="filename" shortName="f" longName="file"/>
If the Argument denoted by the "argLabel" is a FlagArgument, the OptionSyntax matches just an option name (short or long depending on the attributes).
SymbolSyntax
A SymbolSyntax matches a single token from the command line without capturing any Argument value.
<symbol symbol="subcommand1"/>
VerbSyntax
A VerbSyntax matches a single token from the command line, setting an associated Argument's value to "true".
<verb symbol="subcommand1" argLabel="someArg"/>
SequenceSyntax
A SequenceSyntax matches a list of child Syntaxes in the order specified.
<sequence description="the input and output files">
  <argument argLabel="input"/>
  <argument argLabel="output"/>
</sequence>
AlternativesSyntax
An AlternativesSyntax matches one of a list of alternative child Syntaxes. The child syntaxes are tried one at a time in the order specified until one is found that matches the tokens.
<alternatives description="specify an input or output file">
  <option shortName="i" argLabel="input"/>
  <option shortName="o" argLabel="output"/>
</alternatives>
RepeatSyntax
A RepeatSyntax matches a single child Syntax repeated a number of times. By default, any number of matches (including zero) will satisfy a RepeatSyntax. The number of required and allowed repetitions can be constrained using the "minCount" and "maxCount" attributes. The default behavior is to match lazily; i.e. to match as few instances of the child syntax as is possible. Setting the attribute eager="true" causes the repeat to match as many child instances as possible, within the constraints of the "minCount" and "maxCount" attributes.
<repeat description="zero or more files">
  <argument argLabel="file"/>
</repeat>
<repeat minCount="1" description="one or more files">
  <argument argLabel="file"/>
</repeat>
<repeat maxCount="5" eager="true" description="as many files as possible, up to 5">
  <argument argLabel="file"/>
</repeat>
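The difference between lazy and eager repetition comes down to the order in which repetition counts are tried. The sketch below models only that ordering; the class RepeatSketch, its methods, and the "restAccepts" callback standing in for the remainder of the syntax are assumptions for illustration. In the real parser the same effect is presumably achieved through backtracking rather than by enumerating counts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/** Models the order in which a lazy or eager repeat tries repetition counts. */
class RepeatSketch {
    /** Candidate counts between minCount and maxCount, limited by the tokens available. */
    static List<Integer> candidateCounts(int minCount, int maxCount, int available, boolean eager) {
        int upper = Math.min(maxCount, available);
        List<Integer> counts = new ArrayList<>();
        for (int n = minCount; n <= upper; n++) {
            if (eager) {
                counts.add(0, n);                 // largest count first
            } else {
                counts.add(n);                    // smallest count first
            }
        }
        return counts;
    }

    /**
     * Returns the first count for which the rest of the syntax accepts the
     * tokens left over, or -1 if no count works.
     */
    static int chooseCount(int minCount, int maxCount, int available, boolean eager,
                           IntPredicate restAccepts) {
        for (int n : candidateCounts(minCount, maxCount, available, eager)) {
            if (restAccepts.test(available - n)) {
                return n;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Three tokens, followed by a syntax that accepts any number of leftovers.
        System.out.println(chooseCount(0, Integer.MAX_VALUE, 3, false, left -> true));
        System.out.println(chooseCount(0, Integer.MAX_VALUE, 3, true, left -> true));
    }
}
```

When the repeat is the last thing in the syntax, both modes end up consuming every token, so the setting only matters when something after the repeat could also match the same tokens.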
OptionalSyntax
An OptionalSyntax optionally matches a sequence of child Syntaxes; i.e. it matches nothing or the sequence. The default behavior is to match lazily; i.e. to try the "nothing" case first. Setting the attribute eager="true" causes the "nothing" case to be tried second.
<optional description="nothing, or an input file and an output file">
  <argument argLabel="input"/>
  <argument argLabel="output"/>
</optional>
<optional eager="true" description="an input file and an output file, or nothing">
  <argument argLabel="input"/>
  <argument argLabel="output"/>
</optional>
PowerSetSyntax
A PowerSetSyntax takes a list of child Syntaxes and matches any number of each of them in any order or any interleaving. The default behavior is to match lazily; i.e. to match as few instances of the child syntax as is possible. Setting the attribute eager="true" causes the powerset to match as many child instances as possible.
<powerSet description="any number of inputs and outputs">
  <option argLabel="input" shortName="i"/>
  <option argLabel="output" shortName="o"/>
</powerSet>
OptionSetSyntax
An OptionSetSyntax is like a PowerSetSyntax with the restriction that the child syntaxes must all be OptionSyntax instances. What makes OptionSetSyntax different is that it allows options for FlagArguments to be combined in the classic Unix idiom; i.e. "-a -b" can be written as "-ab".
<optionSet description="flags and value options">
  <option argLabel="flagOne" shortName="1"/>
  <option argLabel="flagTwo" shortName="2"/>
  <option argLabel="argThree" shortName="3"/>
</optionSet>
Assuming that the "flagOne" and "flagTwo" correspond to FlagArguments, and "argThree" corresponds to (say) a FileArgument, the above syntax will match any of the following: "-1 -2 -3 three", "-12 -3 three", "-1 -3 three -1", "-3 three" or even an empty argument list.
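One way to picture the combined-flag idiom is as a rewrite of the token list before matching. The sketch below takes that view; it is an assumption about how the behavior could be modelled, not a description of how OptionSetSyntax is implemented, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Models "-12" as shorthand for "-1 -2" when both names belong to flags. */
class OptionSetSketch {
    /** Expands clusters of short flag names; every other token passes through unchanged. */
    static List<String> expand(List<String> tokens, Set<Character> flagNames) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            if (isFlagCluster(token, flagNames)) {
                for (char c : token.substring(1).toCharArray()) {
                    out.add("-" + c);
                }
            } else {
                out.add(token);
            }
        }
        return out;
    }

    /** A cluster is "-" followed by two or more characters that all name flags. */
    private static boolean isFlagCluster(String token, Set<Character> flagNames) {
        if (token.length() < 3 || token.charAt(0) != '-' || token.charAt(1) == '-') {
            return false;
        }
        for (int i = 1; i < token.length(); i++) {
            if (!flagNames.contains(token.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Set<Character> flags = new HashSet<>(Arrays.asList('1', '2'));
        System.out.println(expand(Arrays.asList("-12", "-3", "three"), flags));
    }
}
```

Only names that belong to flags are eligible for clustering here, because an option that takes a value (like "-3" in the example) needs the following token for itself.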
The <syntax ... > element
The outermost element of an XML Syntax specification is the <syntax> element. This element has a mandatory "alias" attribute which associates the syntax with an alias that is in force for the shell. The actual syntax is given by the <syntax> element's zero or more child elements. These must be XML elements representing Syntax sub-class instances, as described above. Conceptually, each of the child elements represents an alternative syntax for the command denoted by the alias.
Here are some examples of complete syntaxes:
<syntax alias="cpuid">
  <empty description="output the computer's id"/>
</syntax>
<syntax alias="dir">
  <empty description="list the current directory"/>
  <argument argLabel="directory" description="list the given directory"/>
</syntax>
Ambiguous Syntax specifications
If you have implemented a language grammar using a parser generator (like Yacc, Bison, ANTLR and so on), you will recall how picky the parser generator could be about your input grammar. For example, these tools will often complain about "shift-reduce" or "reduce-reduce" conflicts. This is a parser generator's way of saying that the grammar appears (to it) to be ambiguous.
The new-style command syntax parser takes a different approach. Basically, it does not care if a command syntax supports multiple interpretations of a command line. Instead, it uses a simple runtime strategy to resolve ambiguity: the first complete parse "wins".
Since the syntax mechanisms don't detect ambiguity, it is up to the syntax designer to be aware of the issue, and take it into account when designing the syntax. Here is an example:
<alternatives>
  <argument argLabel="number"/>
  <argument argLabel="file"/>
</alternatives>
Assuming that "number" refers to an IntegerArgument, and "file" refers to a FileArgument, the syntax above is actually ambiguous. For example, a parser could in theory bind "123" to the IntegerArgument or the FileArgument. In practice, the new-style command argument parser will pick the first alternative that gives a complete parse, and bind "123" to the IntegerArgument. If you (the syntax designer) don't want this (e.g. because you want the command to work for all legal filenames), you will need to use OptionSyntax or TokenSyntax or something else to allow the end user to force a particular interpretation.
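The "first complete parse wins" rule for this example can be sketched as an ordered list of acceptance tests. The predicates below are rough assumptions standing in for IntegerArgument and FileArgument (a decimal integer pattern, and "any non-empty token"), and the class FirstMatchSketch is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

/** Models ordered alternatives where the first accepting alternative wins. */
class FirstMatchSketch {
    /** Returns the label of the first alternative that accepts the token, or null. */
    static String bind(String token, Map<String, Predicate<String>> alternatives) {
        for (Map.Entry<String, Predicate<String>> alt : alternatives.entrySet()) {
            if (alt.getValue().test(token)) {
                return alt.getKey();
            }
        }
        return null;
    }

    /** The alternatives from the example, in declaration order. */
    static Map<String, Predicate<String>> numberThenFile() {
        Map<String, Predicate<String>> alts = new LinkedHashMap<>();  // keeps insertion order
        alts.put("number", token -> token.matches("-?\\d+"));        // stand-in for IntegerArgument
        alts.put("file", token -> !token.isEmpty());                 // stand-in for FileArgument
        return alts;
    }

    public static void main(String[] args) {
        System.out.println(bind("123", numberThenFile()));
        System.out.println(bind("notes.txt", numberThenFile()));
    }
}
```

With this ordering a file that happens to be named "123" can never be selected, which is exactly the situation where an explicit option is needed to let the user force the interpretation.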
SyntaxSpecLoader and friends
More about the Syntax base class.
If you are planning on defining new sub-classes of Syntax, the two key behavioral methods that must be implemented are as follows: