Support more complicated command syntaxes

Project: JNode Shell
Category: feature request

It is becoming apparent that the JNode command syntax mechanisms currently cannot handle some of the more complicated command syntaxes that we want to support. Here are some examples:

"cat a -u file:b c" - we cannot currently bind to FileArgument and URLArgument and also capture the order of the arguments.

"expr 1 + ( 2 \* 3 )", etcetera - we cannot express the "expr" command syntax in a way that reflects the expression grammar. Even if we could, the current Argument binding scheme cannot capture the parse tree.

The UNIX "test" and "find" have the same problem as "expr".

As the first step to addressing these shortcomings, I'd like to gather more examples. If JNode developers could please contribute problem syntaxes / commands as comments, that would be a great help. Thanks.


First off, these are just some ideas I had, and any implementation details are only for illustration.

One thing I thought of was options that have a positive and a negative form. A lot of --option flags also have a --no-option form that basically 'unsets' the flag. If --option were a FlagArgument, then one could check the flag with isSet; if the --no-option flag was given, then isSet would return false.

This might be as simple as

<option argLabel="option" longName="option" negName="no-option"/>

Along with this, we might also wish to be able to set a default :

<option argLabel="option" longName="option" negName="no-option" default="true"/>

Here we initially say that the option is set, even if not specified, and have it unset if --no-option is given.
These two additions would be mostly aimed at FlagArgument, although the default parameter could also be used with other Argument types.

Concerning the cat example. Maybe something like...

<repeat label="files">
    <match shortName="u" longName="urls"/>
      <break label="files"/>
  <argument argLabel="files"/>
  <argument argLabel="urls"/>

The idea here is to give some control over the syntax parser. We don't actually need to have a FlagArgument for -u | --urls; its only purpose is to specify that arguments after it are to go to a different argLabel. In this case, it tells the parser to stop adding arguments to 'files' and send the rest to 'urls'.


Another example is when options affect a list of arguments but we want to handle the options in the command, as it may just be too complex for the syntax. In the command, when we iterate over the list, we want to know what options were set before that argument was parsed.

  <argument argLabel="patterns">
    <option label="recurse" longName="recurse" negName="no-recurse"/>
    <option label="anchor" longName="anchor" negName="no-anchor"/>
    <option label="ignore-case" longName="ignore-case" negName="no-ignore-case"/>
  </argument>

From the Command side, we would need some way of querying which modifiers were set for each argument in 'patterns'. I'm not sure how this could be done. At the moment, we would need some way, within the Argument, to 'attach' any modifiers. Perhaps a different Argument type is needed for this, in which getValues() returns a Map. The keys of the map would be of the type of the Argument used (a StringArgument has String keys), and the values would be an Argument[] holding zero or more arguments, depending on whether that argument had any of the listed modifier flags before it.

So let's say we have the command:

$ tar -xf foo.tbz2 --ignore-case '*.h' --no-ignore-case '*.c'

Initially in TarCommand I would have a boolean ignoreCase set to whatever default value I want; in this case we'll set it to false. When iterating over the list of arguments I could have code something like:

for (Map.Entry<String, Argument[]> arg : patterns.getValues().entrySet()) {
}
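The per-pattern lookup can be sketched in plain Java without any JNode classes. This is only an illustration of the idea, not a proposal for the Argument API: the class `PatternModifiers` and its `scan` method are hypothetical, and the map value is a plain Boolean instead of the Argument[] described above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of JNode.
final class PatternModifiers {
    /** Maps each pattern to the ignore-case setting in effect when it was seen. */
    static Map<String, Boolean> scan(List<String> tokens, boolean defaultIgnoreCase) {
        Map<String, Boolean> result = new LinkedHashMap<>();
        boolean ignoreCase = defaultIgnoreCase;
        for (String token : tokens) {
            if (token.equals("--ignore-case")) {
                ignoreCase = true;
            } else if (token.equals("--no-ignore-case")) {
                ignoreCase = false;
            } else {
                // Any other token is treated as a pattern in this sketch.
                result.put(token, ignoreCase);
            }
        }
        return result;
    }
}
```

For the tar command line above, scanning the tokens after the archive name would record '*.h' with ignore-case set and '*.c' with it unset.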


What about providing the ability to define a set of syntax elements, and allowing it to be included into a section simply by name?

<define label="common_options">
  <option .../>
  <option .../>
</define>

and then wherever I want that block of options I can simply

<include label="common_options"/>

Not a high priority item, but it is something that might be handy when syntaxes grow more complex. For example, with tar, there are 8 operating modes. Each mode allows a large set of options that are common across all 8 modes, but some options are specific to certain modes. At the top of the syntax tree, I could have 8 sequences, one for each mode, and be able to <include> the common options in the optionSet for each sequence, and also list options that are specific to that sequence.

  <option argName="create" shortName="c"/>
    <include label="common_options"/>
    //create specific options
  <option argName="extract" shortName="x"/>
    <include label="common_options"/>
    //extract specific options


Allow options to have a value parameter. This could be used to bundle up multiple flag arguments into a common argument, like an IntegerArgument.

A couple of examples are the mode flags of tar, and the compression level flags of gzip/bzip. Instead of specifying FlagArguments for each flag, they could all have the same argLabel, and a different value.

<option argLabel="mode" shortName="c" value="1"/>
<option argLabel="mode" shortName="x" value="2"/>


<option argLabel="clevel" shortName="1" value="1"/>
<option argLabel="clevel" shortName="9" value="9"/>

and have associated IntegerArguments like:

IntegerArgument Mode = new IntegerArgument("mode", Argument.MANDATORY | Argument.NONEXISTENT, "");
IntegerArgument CLevel = new IntegerArgument("clevel", Argument.OPTIONAL, "");


Another feature: allowing files specified on the command line to be included into the set of options. Many commands have some option flag that names a file containing a list of arguments, allowing the file to be included into the command line at that position, as if the user had typed each of the elements themselves.

<includeFile shortName="T" longName="file-list"/>

Would read the file given to -T | --file-list and parse the arguments as if they appeared on the command line in place of -T.
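A minimal sketch of that expansion step, done on the raw token list before any syntax matching. The class name `FileListExpander` is hypothetical, and the file format (one argument per line, blank lines skipped) is an assumption; the comment above does not say how the file would be tokenized.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of JNode.
final class FileListExpander {
    /** Replaces each "-T file" or "--file-list file" pair with the lines of that file. */
    static List<String> expand(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            String token = tokens.get(i);
            boolean isInclude = token.equals("-T") || token.equals("--file-list");
            if (isInclude && i + 1 < tokens.size()) {
                String fileName = tokens.get(++i);
                try {
                    // Assumption: one argument per line, blank lines ignored.
                    for (String line : Files.readAllLines(Paths.get(fileName))) {
                        if (!line.trim().isEmpty()) {
                            out.add(line.trim());
                        }
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            } else {
                out.add(token);
            }
        }
        return out;
    }
}
```

Expanding before parsing keeps the rest of the syntax unaware of the include, at the cost of losing completion inside the included file.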
Like I said, these are just some examples off the top of my head, not necessarily an example of how to implement, just a means of reaching an end.


Status: active » patch (comments requested)

This patch updates OptionSyntax to allow negName and value attributes.

negName is only really valid with FlagArgument. As such, FlagArgument is updated to read the token to determine what boolean value to set.

The value attribute in this patch only affects IntegerArgument. It could probably be applied to more Argument types, but IntegerArgument was all I felt necessary for the time being. If value is present in an option, then a MuPreset is used instead of a MuArgument, with the preset set to the value.


Had a second thought about this. I think both of these attributes can be satisfied with the value attribute. To support the negative version of a flag, you could write:

<option argName="option" longName="option"/>
<option argName="option" longName="no-option" value="false"/>

Leave the update to FlagArgument.

I'll work up another patch that does that. Implementing negName is having its issues, and it adds a lot of clutter to OptionSyntax anyway.


New patch up.

OptionSyntax-value.patch (9.93 KB)


Sorry cluster, I've been offline for a couple of days. I'll look at this tonight.


I think the following should work for cat, and might provide some insight into building more complex syntaxes.

  <repeat minCount="1">
          <symbol symbol="-u"/>
          <symbol symbol="--urls"/>
        <repeat minCount="1">
          <argument argLabel="url"/>
      <argument argLabel="file"/>

Which can basically be read like :

for (int i = 0; i < symbols.length; i++) {
  Symbol s = symbols[i];
  if (s.equals("-u") || s.equals("--urls")) {
    if (++i >= symbols.length) {
      // error: at least one url must follow -u | --urls
    }
    for (; i < symbols.length; i++) {
      // parse url
    }
  } else {
    // parse file
  }
}

The only problem is, I'm not sure it formats properly for help; it's backwards. Help outputs

cat ( -u | --urls <url> ... ) | <file> ...

Which matches the exact ordering and definition of the syntax, but the parser doesn't see it that way. The parser sees it as:

cat <file> ... [ ( -u | --urls <url> ... ) ]

But I'm not sure how to make it 'know' that it is better to read the alternatives backwards.


I cannot help thinking that you are missing the root problem for "cat ... -u ...". The CatCommand needs to be able to tell the difference between "cat a -u http://xxx b" and "cat a b -u http://xxx". AFAIK, any solution that captures Files and URLs in separate Argument instances will not actually solve the problem.

What (I think) we need for the 'cat' use-case is a compound argument class that will accept either a File or a URL, depending on whether a "-u" / "--url" token precedes the token that is currently being accepted ... or something else if the user chooses to use a different syntax.
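A rough sketch of what such a compound value might capture, in plain Java. `CatSources` and `Source` are hypothetical names, and the rule that "-u" / "--url" applies only to the single token after it is an assumption made for the example; the real class would have to plug into Argument.accept and completion.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not part of JNode.
final class CatSources {
    /** One thing to concatenate: a local file path or a URL. */
    static final class Source {
        final boolean isUrl;
        final String value;

        Source(boolean isUrl, String value) {
            this.isUrl = isUrl;
            this.value = value;
        }

        @Override
        public String toString() {
            return (isUrl ? "url:" : "file:") + value;
        }
    }

    /** Keeps files and URLs in one list, in command-line order. */
    static List<Source> parse(List<String> tokens) {
        List<Source> sources = new ArrayList<>();
        boolean nextIsUrl = false;
        for (String token : tokens) {
            if (token.equals("-u") || token.equals("--url")) {
                // Assumption: the marker affects only the next token.
                nextIsUrl = true;
            } else {
                sources.add(new Source(nextIsUrl, token));
                nextIsUrl = false;
            }
        }
        return sources;
    }
}
```

Because everything lands in a single ordered list, "cat a -u http://xxx b" and "cat a b -u http://xxx" produce different results, which is the distinction CatCommand needs.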


New patch, this expands on the values patch, and allows flags to overwrite the values of other flags. It works for flag/integer arguments only atm and makes use of the EXISTING argument flag.

Basically, if an argument is created without the MULTIPLE flag, but with the EXISTING flag, and is of type flag or integer, then finding the argument a second time on the command line will cause an overwrite of its current value.

So you can do something like

command --option --no-option

With a syntax of

<option argLabel="option" longName="option"/>
<option argLabel="option" longName="no-option" value="false"/>

and the argument with the label 'option' will have a value of false, not true.

This works equally well when you have a list of flags that correspond to the same argument, but with different integer values. Like when specifying compression level with flags -1...-9. Create the argument with the existing flag, and create an option for each, tied to the same label, with a value of 1-9.
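The overwrite behaviour amounts to "the last option bound to a label wins". A small stand-alone sketch of that rule follows; `PresetOptions` and its table are hypothetical and only mimic what the option elements with value attributes would declare, they are not the patch itself.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the patch code.
final class PresetOptions {
    /** Option token -> { argument label, preset value }. */
    private static final Map<String, String[]> OPTIONS = new HashMap<>();

    static {
        OPTIONS.put("--option", new String[] {"option", "true"});
        OPTIONS.put("--no-option", new String[] {"option", "false"});
        for (int level = 1; level <= 9; level++) {
            OPTIONS.put("-" + level, new String[] {"clevel", Integer.toString(level)});
        }
    }

    /** Binds each option to its label; a later option overwrites an earlier one. */
    static Map<String, String> bind(List<String> tokens) {
        Map<String, String> values = new HashMap<>();
        for (String token : tokens) {
            String[] binding = OPTIONS.get(token);
            if (binding == null) {
                throw new IllegalArgumentException("unknown option: " + token);
            }
            values.put(binding[0], binding[1]);
        }
        return values;
    }
}
```

With this rule, "command --option --no-option" leaves 'option' at false, and "-1 -9" leaves 'clevel' at 9.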

This patch includes the previous patch, and has been rebased to r5231

syntax-value_overwrite.patch (12.29 KB)


Cluster: could you please read and respond to my comment #5?

Your recent checkin for the 'cat' syntax and CatCommand makes the program behave differently to what people would expect. For example, a user would expect

   cat a -u http://xxx c

to concatenate the file 'a' then the url 'http://xxx' then the file 'c'. In fact the current implementation of 'cat' will concatenate 'http://xxx' then 'a' then 'c'. To my mind, this means that your checkin breaks 'cat', and should be backed out .... until we figure out how to capture the file/url sequence intended by the user.


I don't see how that commit 'breaks' cat; it never worked like that before. Before, you had two options, cat -u [urls] or cat [files], and you could not mix the two. Now you are able to cat [files] -u [urls]. I'll commit a change that allows mixed files and urls. But I have to ask, why does cat even have the option to fetch urls? Aside from all the things that could go wrong fetching urls, wget has many more features, and is specifically designed for url fetching. IMO, wget should be doing the url fetching; then cat can work as it's supposed to, once the remote files are on the local filesystem.

I'm starting to think too that, as far as syntax is concerned, perhaps the complex syntax tree required for expr, test and friends should be put in the hands of a capable auxiliary class that is designed for creating abstract syntax trees from a series of tokens. We can probably even reuse code from another library for this task; there ought to be plenty to choose from.

Bottom line is, where do we draw the line between command line syntax parsing and abstract syntax parsing that would make the command line parsing too complex, and should therefore be done by an auxiliary class or library?


But I have to ask, why does cat even have the option to fetch urls? Aside from all the things that could go wrong fetching urls, wget has many more features, and is specifically designed for url fetching. IMO, wget should be doing the url fetching; then cat can work as it's supposed to, once the remote files are on the local filesystem.

Actually, I entirely agree with you. IMO, getting rid of the "-u" option is the best solution in the long term for 'cat'.

Re: your 2nd to last paragraph. Creating a Syntax from a tokenized stream (rather than XML) won't make much difference IMO ... it is just using a different concrete syntax for the same abstract language. OTOH, if you create a MuSyntax from a tokenized stream, the stream must effectively be some variant of BNF (Backus-Naur form). The reason I went for the current two-level approach was that I thought that developers would be more comfortable with this than a BNF-like language. Besides, the 2-level approach makes it easy to generate UNIX-like "usage" descriptions for use in "help" etc. That would be much harder starting with a BNF-style syntax specification.

Re: your "bottom line". You are right, but the question is where should the line be drawn. And assuming that (say) the "expr" command did its own argument parsing, how would it interact with JNode syntax completion. (You will notice of course that completion currently doesn't work for "expr" arguments!!)


Should command completion work for things like expr? I could see it being usable for the string operators, but other than that you're only dealing with single character tokens, none of which I would expect tab completion for, and the tokens for strings and numbers you can't complete against anyway...

It's no different from what a lot of commands are going to be, really. You couldn't expect much in the way of command completion for awk, sed or grep internal syntaxes. I wouldn't see expr as being any worse off without it.

I think you missed my meaning when I was suggesting an auxiliary class/library for handling abstract syntax trees for expr and friends. I didn't mean that we should write a new syntax mechanism for handling it; I meant that we should be able to find a library already written. There must be multiple libraries capable of this. I'll see what I can dig up.

As for drawing the line, I don't think the MuSyntax parser should evolve to the level of being able to recognize precedence amongst tokens, at least not yet. In order to do expr, you would need something like a BinaryArgument that has a concept of left and right. Doing this would require either multiple passes over the tokens, or major syntax changes that allowed specifying precedence. The latter still requires the ability to look forward, to know whether the next value is the right side of this operator or the left side of the next, and I think this is beyond the scope of the MuSyntax, at least for now. Maybe once we're more certain that the MuParser and its collection of buddies is good and stable, we could look to bigger things, but that's a pretty big item IMO to stick on the todo list for now.


Should command completion work for things like expr? I could see it being usable for the string operators, but other than that you're only dealing with single character tokens, none of which I would expect tab completion for, and the tokens for strings and numbers you can't complete against anyway...

With 'expr', completion should also be able to tell you what operators are available. And inline help (which is closely tied to completion / syntax handling) should be able to tell you what comes next even if it cannot offer sensible completions. Besides, it is not just 'expr' we need to consider. There may be other "complicated" syntaxes that may benefit more from completion than 'expr'.

(By the way ... I notice you've now changed CatCommand to use a StringArgument rather than FileArgument and URLArgument. This means that the user cannot complete file arguments any more. Please just revert to the original version of 'cat' / CatCommand where the user cannot mix files and URLs. Trust me ... there is no better solution unless we make changes to Syntax infrastructure.)

I don't think the MuSyntax parser should evolve to the level of being able to recognize precedence amongst tokens, at least not yet. In order to do expr, you would need something like a BinaryArgument that has a concept of left and right. Doing this would require either multiple passes over the tokens, or major syntax changes that allowed specifying precedence.

Actually, I don't think precedence is necessary. An alternative is to treat each precedence group as a distinct production in the grammar; e.g. in BNF one could write:

expr ::= add 
add ::= mul | add '+' add
mul ::= primary | mul '*' mul
primary ::= literal | '(' add ')' 
literal ::= '1' | '2' | '3'

Strictly speaking, the grammar above is ambiguous; e.g. it gives two parses for "1 + 2 + 3". However, this is not a problem for us since MuParser is happy with ambiguous grammars ... it just gives you the first parse it finds.
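To show that one production per precedence group is enough to get the right tree, here is a small hand-written parser for the same language. It is not MuParser and does not backtrack: it swaps in plain recursive descent, with the left-recursive rules rewritten as loops, and it accepts any run of digits as a literal. The class name `ExprParser` and the prefix-style output are made up for the example.

```java
import java.util.List;

// Hypothetical sketch, not part of JNode.
final class ExprParser {
    private final List<String> tokens;
    private int pos = 0;

    private ExprParser(List<String> tokens) {
        this.tokens = tokens;
    }

    /** Parses the tokens and returns the tree in prefix form, e.g. "(+ 1 2)". */
    static String parse(List<String> tokens) {
        ExprParser p = new ExprParser(tokens);
        String tree = p.add();
        if (p.pos != tokens.size()) {
            throw new IllegalArgumentException("unexpected token: " + tokens.get(p.pos));
        }
        return tree;
    }

    // add ::= mul ( '+' mul )*
    private String add() {
        String left = mul();
        while (peek("+")) {
            pos++;
            left = "(+ " + left + " " + mul() + ")";
        }
        return left;
    }

    // mul ::= primary ( '*' primary )*
    private String mul() {
        String left = primary();
        while (peek("*")) {
            pos++;
            left = "(* " + left + " " + primary() + ")";
        }
        return left;
    }

    // primary ::= literal | '(' add ')'
    private String primary() {
        if (pos >= tokens.size()) {
            throw new IllegalArgumentException("unexpected end of input");
        }
        String token = tokens.get(pos++);
        if (token.equals("(")) {
            String inner = add();
            if (!peek(")")) {
                throw new IllegalArgumentException("expected ')'");
            }
            pos++;
            return inner;
        }
        if (!token.matches("\\d+")) {
            throw new IllegalArgumentException("expected a number: " + token);
        }
        return token;
    }

    private boolean peek(String expected) {
        return pos < tokens.size() && tokens.get(pos).equals(expected);
    }
}
```

The loops make '+' and '*' left-associative, so this version picks one of the two parses of "1 + 2 + 3" deterministically instead of relying on which parse is found first.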

But first can we get back to making sure that we've CAPTURED THE REQUIREMENTS? Then we can decide on how to prioritize them ... and how / when to address them.


Cat is reverted.

I hear what you're saying about the completion, and it sounds like a really neat feature. I think what I'm getting at is that, as nice as it is to have features that would make commands work better and more intuitively, we need commands that work properly.

The patchset I'm working on atm for syntax will allow more flexible command line parsing, allowing the syntax to set preset values for flags and integers, and allowing multiple options to map to the same argument, overwriting a single value. This was mostly included in the last patch I put up. The last part includes a new argument class that wraps another argument, mirroring it. When accept() is called, the wrapped argument does the actual accept() of the token, and afterwards the wrapper class saves the current values of each argument it was told to track. From the command, you can call getValues() on the wrapped argument to get the arguments as expected, whether they be strings, files or whatever. And you can call getValues() on the wrapper class to get an array of List<?> objects, each with an entry for each label it was told to track, in the order the labels were given.

I'm testing it against an implementation of the head command that I have, and what I'm aiming for is the ability to do the following.

head a -n 15 b -n -20 c -c 400 -

which would output the first 10 lines of a, the first 15 lines of b, all but the last 20 lines of c and the first 400 bytes of stdin. I still have some testing to do; once I have it working properly I'll put up a final patch that includes the whole patch set.

In case you noticed the error in what I'm doing above: I realize that, at least on my Linux machine, head and tail don't work like this. If I fed the above command line into head on my Linux machine, I would get the first 400 bytes of a, then b, then c, then stdin. It's just an easy example to test my code on because it doesn't deal with a lot of flags. The real head implementation will work as expected, not like the example above.


Here are two more requirements raised by recent work on FindCommand:

FileArgument has the habit of matching option names and other things that are "valid" as file names ... but not what the user expects. In the "find" case, this meant that "find . --nome foo" (where the user has mistyped the option) would bind all three tokens as File arguments. There is a partial workaround for this in the form of the Argument.EXISTING flag, but this is not the real answer. A possible solution might be to add another Argument flag to say "don't match an option". But that is not a good idea because it makes the command class aware of the concrete syntax. Another solution might be to add a new Syntax / MuSyntax construct to support this. For example, a Syntax that looked for an argument that starts with a "-", but doesn't consume it. Update ... a third option is to allow the syntax to (in effect) add extra flags to an Argument to modify its behaviour. So for example, the syntax for 'find' might add a "don't match an option" flag to the "dirs" argument.

Another problem is that RepeatSyntax is non-eager; i.e. it tries to consume as few tokens as possible (modulo 'minCount'). IIRC, I implemented it this way in part to mitigate FileArgument's tendency to match option names! But if your syntax consists of a 'repeat' followed by other stuff, parsing a long command line will entail a lot of backtracking. So, maybe we need an option for RepeatSyntax (and maybe others) to enable eager matching.


I commented in the find command issue, but I'll reiterate here: I think the parser should 'assume' to some degree that any token prefixed with a '-' should match a shortName or longName of some option. If the user actually wants to give a file that starts with a '-', then it should be escaped, quoted, or put in a list following ' -- ' to tell the parser to stop parsing flags. At least this is the behavior I would expect.
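The behavior I have in mind can be sketched on a plain token list. `OptionTokens` is a hypothetical name, options are treated as simple flags without values, and a lone "-" is assumed to be an ordinary operand (e.g. stdin), which is my assumption and not something the parser does today.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not part of JNode.
final class OptionTokens {
    /**
     * Returns the non-option tokens. Before a bare "--", any token starting
     * with '-' must be a known option; after it, every token is an operand.
     */
    static List<String> operands(List<String> tokens, Set<String> knownOptions) {
        List<String> operands = new ArrayList<>();
        boolean optionsEnded = false;
        for (String token : tokens) {
            if (optionsEnded) {
                operands.add(token);
            } else if (token.equals("--")) {
                optionsEnded = true;
            } else if (token.startsWith("-") && token.length() > 1) {
                if (!knownOptions.contains(token)) {
                    throw new IllegalArgumentException("unknown option: " + token);
                }
            } else {
                operands.add(token);
            }
        }
        return operands;
    }
}
```

Under this rule a mistyped option such as "--nome" is reported as an error instead of being bound as a file name.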


I have done some thinking about how we can handle more situations on the command line that our current syntax/argument scheme can't resolve. The problem isn't the syntax; it's the view of the command line that a command is left with. In this view, arguments are grouped, in the order they were interpreted, into a single Argument to which they bind via the argLabel. For the most part this works fine, but the vital piece of information we are losing is the order in which arguments were interpreted as a whole.

Now I wouldn't suggest changing anything that is in place. I like the current means of being able to call getValue and getValues to retrieve objects of the expected type. What we are missing is the ability to find what options came before, or possibly after, some given argument. If I were to have:

tar -cf foo.tar -C ~/code/java *.java -C ~/code/C *.c

and I were to bind -C to FileArguments and the patterns to StringArguments, then I would have two values in each, but no way to determine which came before which. That would change if I were able to query the FileArguments with something like getValuesBefore(int), where the int is the absolute position on the command line of one of the StringArguments.

There are different ways to implement this, but the main point is that we need some way, from the command side, to determine the order of arguments amongst different argument objects.

One idea would be to have the accept() call on an argument also take an integer. As the parser parses tokens and arguments accept them, the integer tells the argument that it is the nth argument to be accepted. If the accept does not throw an exception, then the parser increments the counter for the next accept. If we have to backtrack, we decrement the counter for every argument we undo, and have undoLastvalue also remove the index of the value.

Then at least we could have a bigger query interface from the command to the arguments. Instead of just getValue and getValues, we could also have getValueBefore(int), getValuesBefore(int), or any other query that asks for elements relative to some point on the command line.
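A stand-alone sketch of that bookkeeping, assuming the parser supplies the position on each accept. `PositionedValues` is a hypothetical class, not a change to Argument; the method names follow the ones suggested above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not part of JNode.
final class PositionedValues<T> {
    private final List<T> values = new ArrayList<>();
    private final List<Integer> positions = new ArrayList<>();

    /** Records a value together with its absolute position on the command line. */
    void accept(T value, int position) {
        values.add(value);
        positions.add(position);
    }

    /** Undoes the most recent accept, as the parser would when backtracking. */
    void undoLastValue() {
        values.remove(values.size() - 1);
        positions.remove(positions.size() - 1);
    }

    /** Returns all values accepted before the given position, in order. */
    List<T> getValuesBefore(int position) {
        List<T> before = new ArrayList<>();
        for (int i = 0; i < values.size(); i++) {
            if (positions.get(i) < position) {
                before.add(values.get(i));
            }
        }
        return before;
    }

    /** Returns the last value accepted before the given position, or null if none. */
    T getValueBefore(int position) {
        List<T> before = getValuesBefore(position);
        return before.isEmpty() ? null : before.get(before.size() - 1);
    }
}
```

For the tar example, the command would ask the -C argument for getValueBefore(position of '*.c') and get ~/code/C, while the same query for '*.java' gives ~/code/java.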


I have cleaned up and rebased my syntax patch, included below, here are the details.

    New syntax features.
    Option and Verb accept a value attribute to set the value of an associated
    Integer or Flag argument.
    String, Integer and Flag arguments will allow multiple accept() calls when
    MULTIPLE flag is not set, but EXISTING flag is. The behavior is to
    overwrite the single value the argument holds.
 .../src/shell/org/jnode/shell/syntax/ |    8 ++
 .../org/jnode/shell/syntax/     |    4 +-
 .../shell/org/jnode/shell/syntax/ |    8 ++-
 .../org/jnode/shell/syntax/    |    4 +
 .../shell/org/jnode/shell/syntax/ |   79 +++++++++-----------
 .../org/jnode/shell/syntax/   |   10 ++-
 .../shell/org/jnode/shell/syntax/   |   14 +++-
 .../test/org/jnode/test/shell/  |    2 +-
 8 files changed, 75 insertions(+), 54 deletions(-)
syntax-value_and_existing.patch (14.95 KB)


Status: patch (comments requested) » active

I'm not convinced. I started out thinking that this was a good idea (apart from the overloading of EXISTING), but now I realise that it is violating the principle of separation of concerns that underlies the current design. The Command doesn't and shouldn't care whether a syntax resets a single-valued Argument to a different value. So why should it need to set a flag to allow a syntax to do this?

The behaviour we are trying to allow here is the concern of the Syntax, and therefore should be specified in the syntax spec(s) that bind to arguments. This would then be mapped to a (new) flag on MuArgument that tells the MuParser to do the Argument.accept / Argument.complete call differently.

As you can imagine, this is going to add complexity in a number of areas. For example, the MuParser's backtrackStack will have to record the old value. But when you think about it ... MuParser would need to do this even with your proposed approach ... which means (I think) that your patch is probably incorrect.

Can I suggest that you stop trying to create patches for the syntax mechanisms for now? Try to think about / discuss the problems and solutions at the conceptual level, and leave it to me to figure out how to code the solutions. The code is doing a lot of complicated things ... and is not amenable to simple patching.

If you want to do some coding, one thing that would be really helpful is if you could contribute some better unit tests for regression testing of the syntax mechanism. For example, the bug in processing the syntax for 'find' should have been picked up earlier.


Fine. The main reason I was supplying patches is that sometimes I think an explanation makes more sense in code than it does in text. For me, I would rather interpret patches than people's explanations. Some people call me a geek :shrug:

If you just want to talk conceptual syntax, then the only real suggestion I have in mind is to allow an eager attribute in repeat. At the moment, I think of repeat as:

for (i = 0; i < maxCount; i++) {
  if (i == minCount || !match) break;
}
if (i < minCount) backtrack;

but if it were to have eager='true', then it would be more like:

for (i = 0; i < maxCount; i++) {
  if ((!eager && i == minCount) || !match) break;
}
if (i < minCount) backtrack;

This would allow find to work properly as it is specified at the moment, except that it would need eager='true' added to its repeat of directories.
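The two loops can be turned into one runnable method to compare the outcomes. This is only a model of the first attempt the repeat makes: `RepeatMatcher` is a hypothetical name, the token test is passed in as a predicate, and the real parser's backtracking (which can extend a non-eager repeat later) is left out.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch, not part of JNode.
final class RepeatMatcher {
    /**
     * Returns how many leading tokens the repeat consumes, or -1 if fewer than
     * minCount tokens match (the "backtrack" case in the pseudocode).
     */
    static int match(List<String> tokens, Predicate<String> matches,
                     int minCount, int maxCount, boolean eager) {
        int i = 0;
        while (i < maxCount) {
            // A non-eager repeat stops as soon as the minimum is satisfied.
            if (!eager && i == minCount) {
                break;
            }
            if (i >= tokens.size() || !matches.test(tokens.get(i))) {
                break;
            }
            i++;
        }
        return i < minCount ? -1 : i;
    }
}
```

For tokens ". src -name" with minCount 1 and a test that rejects tokens starting with '-', the non-eager form consumes one directory and the eager form consumes both.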