FileArgument's file pattern matching has problems (patch V2)

Project: JNode Core
Component: Code
Category: bug report
Priority: normal
Assigned: Unassigned
Status: closed
Description

I took a look at FileArgument's getFiles() method to see if I could steal some code / ideas for pathname expansion in the bjorne shell. It looks like the code has a variety of problems:

  • It doesn't cope with filename patterns containing '/'.
  • It doesn't support '[...]' in patterns.
  • Pattern matching doesn't suppress "hidden" files like UNIX globbing does.
  • Pattern matching will behave strangely for patterns containing certain characters. For example, the pattern "a.b" will match "a.b" but also "aab", "abb" and so on. (Hint: '.' matches any single character in a Java "regex".)
  • The '*' and '?' metacharacters are translated to regex character sequences that will only match "word" characters in filenames; i.e. letters, digits and underscore.
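The last two points are easy to reproduce with java.util.regex alone. The sketch below is not FileArgument's actual code. It assumes a naive translation (the hypothetical naiveRegex method) in which '*' becomes "\w*", '?' becomes "\w" and every other character is copied unescaped, which is consistent with the symptoms listed above.

```java
import java.util.regex.Pattern;

final class NaiveGlobDemo {
    // Hypothetical naive translation, assumed for illustration only:
    // '*' -> "\\w*", '?' -> "\\w", everything else copied as-is.
    static String naiveRegex(String glob) {
        StringBuilder sb = new StringBuilder();
        for (char ch : glob.toCharArray()) {
            if (ch == '*') {
                sb.append("\\w*");
            } else if (ch == '?') {
                sb.append("\\w");
            } else {
                sb.append(ch); // bug: '.' stays a regex metacharacter
            }
        }
        return sb.toString();
    }

    static boolean naiveMatch(String glob, String name) {
        return Pattern.matches(naiveRegex(glob), name);
    }

    public static void main(String[] args) {
        System.out.println(naiveMatch("a.b", "a.b"));     // true
        System.out.println(naiveMatch("a.b", "aab"));     // true, but should be false
        System.out.println(naiveMatch("*", "my-file"));   // false, but should be true
        // Quoting the literal text avoids the first problem.
        System.out.println(Pattern.matches(Pattern.quote("a.b"), "aab")); // false
    }
}
```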

I have a prototype for a reusable PathnamePattern class in my sandbox. My primary goal is to implement file expansion for bjorne, so the primary requirement is Bourne shell / POSIX compatibility. However, the class can easily be plugged into FileArgument to address its problems, and used in other places that need to do pathname "globbing".

The signature looks something like this:

public class PathnamePattern {
   // no public constructor

   // create / compile a pattern
   public static PathnamePattern compile(String pat) ...

   // test if a string is a likely pattern (i.e. contains metacharacters)
   public static boolean isPattern(String pat) ...

   // expand a pattern. The argument is the start directory for a relative pattern
   // and is ignored for an absolute pattern
   public List expand(File startDir) ...
}

What do people think? Any opinions on an appropriate Java package for this class?

I have various ideas for bells and whistles ... they will be in the javadoc on the initial check-in :-)

#1

Status: active » closed

I'm aware of the various problems with the filename pattern support and I want to fix them. My first idea for a fix was to apply regex escaping to the regions of the filename pattern delimited by occurrences of the two special characters '?' and '*'. This would implement simple DOS-like pattern support. Supporting general regexp-based patterns would be interesting too, though that would probably need a different syntax.
I wonder what approach your prototype takes. We might consider reusing the code if that makes our life easier than handling the relatively simple case of the default shell separately.
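For reference, the "escape the regions between '?' and '*'" idea could look roughly like the sketch below. It is only an illustration of that first idea, not code from the JNode tree; the class and method names are made up, and using Pattern.quote for the escaping is an assumption.

```java
import java.util.regex.Pattern;

final class DosWildcardDemo {
    // Hypothetical helper: only '?' and '*' are special, everything
    // between them is quoted as literal text.
    static Pattern compile(String wildcard) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (int i = 0; i < wildcard.length(); i++) {
            char ch = wildcard.charAt(i);
            if (ch == '?' || ch == '*') {
                flushLiteral(literal, regex);
                regex.append(ch == '?' ? "." : ".*");
            } else {
                literal.append(ch);
            }
        }
        flushLiteral(literal, regex);
        return Pattern.compile(regex.toString());
    }

    // Append the pending literal region, quoted, and reset the buffer.
    private static void flushLiteral(StringBuilder literal, StringBuilder regex) {
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
            literal.setLength(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(compile("*.txt").matcher("my-notes.txt").matches()); // true
        System.out.println(compile("a.b").matcher("aab").matches());            // false
        System.out.println(compile("a?c").matcher("abc").matches());            // true
    }
}
```

This handles neither '/' nor hidden files, so it only covers the simple default-shell case mentioned above.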

#2

The approach I'm using is as follows:

  1. Do a preliminary scan for '*', '?' and '['. If none are present, the string is a plain pathname.
  2. Split the string into filename components based on '/' characters.
  3. Compile each component to a regex by iterating over the component chars as follows:
    1. If char[i] = '?' --> "." (or "[^\.]" if i == 0)
    2. If char[i] = '*' --> ".*" (or "(|[^\.].*)" if i == 0)
    3. If char[i] = '[' look for matching ']' and translate ...
    4. If char[i] = '\' drop it, and char[i+1] --> protect(char[i+1])
    5. Otherwise char[i] --> protect(char[i])
  4. Compile regexes using java.util.regex.Pattern.compile()
  5. Recursively walk the directory tree, matching the appropriate regex, and building the pathnames for the matches.

where protect(ch) adds a regex escape if ch is a regex metacharacter.
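To make the steps concrete, here is a rough, self-contained sketch of them. It is not the PathnamePattern code: the class name GlobSketch and its method names are invented for illustration, only relative patterns are handled, and the '[...]' translation of step 3.3 is left out because its details are not spelled out above (a '[' is simply treated as a literal here).

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class GlobSketch {
    // Characters that are special in a Java regex outside a character class.
    private static final String REGEX_META = "\\[]{}()*+?.^$|";

    // Add a regex escape if ch is a regex metacharacter.
    static String protect(char ch) {
        return REGEX_META.indexOf(ch) >= 0 ? "\\" + ch : String.valueOf(ch);
    }

    // Steps 3 and 4 for one filename component (without step 3.3).
    static Pattern compileComponent(String component) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < component.length(); i++) {
            char ch = component.charAt(i);
            if (ch == '?') {
                sb.append(i == 0 ? "[^.]" : ".");       // no leading '.' match
            } else if (ch == '*') {
                sb.append(i == 0 ? "(|[^.].*)" : ".*"); // empty, or not starting with '.'
            } else if (ch == '\\' && i + 1 < component.length()) {
                sb.append(protect(component.charAt(++i))); // escaped char is literal
            } else {
                sb.append(protect(ch));
            }
        }
        return Pattern.compile(sb.toString());
    }

    // Steps 2 and 5: split on '/', then walk the tree from startDir.
    static List<String> expand(String pattern, File startDir) {
        String[] parts = pattern.split("/");
        Pattern[] regexes = new Pattern[parts.length];
        for (int i = 0; i < parts.length; i++) {
            regexes[i] = compileComponent(parts[i]);
        }
        List<String> out = new ArrayList<>();
        walk(startDir, regexes, 0, "", out);
        return out;
    }

    private static void walk(File dir, Pattern[] regexes, int idx,
                             String prefix, List<String> out) {
        String[] names = dir.list(); // null if dir is not a readable directory
        if (names == null) {
            return;
        }
        for (String name : names) {
            if (!regexes[idx].matcher(name).matches()) {
                continue;
            }
            if (idx == regexes.length - 1) {
                out.add(prefix + name);
            } else {
                walk(new File(dir, name), regexes, idx + 1, prefix + name + "/", out);
            }
        }
    }
}
```

With this sketch, GlobSketch.expand("src/*.java", new File(".")) would return the matching relative pathnames in whatever order File.list() reports them; no sorting is done.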

I'll upload the code in a day or two.

#3

This looks good. Just in case you are not using it already, Pattern.quote(String) might be useful too.

#4

Here's the PathnamePattern class, and modifications to FileArgument to use it.

The pattern matching behavior of the PathnamePattern class is controlled by a flags word that you pass to the expand, compile and isPattern methods. In FileArgument, I'm using no flags, which means that just '?' and '*' matching are enabled. The flags control such things as "[...]" matching, "'" quoting, "\" escaping, hidden file support and sorting of the pathname list.
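As a rough illustration of how a flags word can switch features on and off, consider the sketch below. The flag names and values are hypothetical and are not taken from the patch; only the idea that a flags value of 0 enables just '?' and '*' comes from the description above.

```java
final class PatternFlagsDemo {
    // Hypothetical flag bits, invented for this example.
    static final int CHARACTER_CLASSES = 0x01; // enable "[...]" matching
    static final int BACKSLASH_ESCAPES = 0x02; // enable "\" escaping

    // Test whether s contains an active metacharacter under the given flags.
    static boolean isPattern(String s, int flags) {
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (ch == '*' || ch == '?') {
                return true;
            }
            if (ch == '[' && (flags & CHARACTER_CLASSES) != 0) {
                return true;
            }
            if (ch == '\\' && (flags & BACKSLASH_ESCAPES) != 0) {
                i++; // skip the escaped character
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isPattern("a[bc]", 0));                 // false
        System.out.println(isPattern("a[bc]", CHARACTER_CLASSES)); // true
        System.out.println(isPattern("a\\*", BACKSLASH_ESCAPES));  // false
        System.out.println(isPattern("a\\*", 0));                  // true
    }
}
```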

I'm going to write a blog entry about some issues with FileArgument.

#5

This version of the patch has the following minor improvements:

  • There are more flags for controlling pattern matching features.
  • The "." and ".." directory entries are optionally included in the domain of matchable names. (I hadn't realised that the File.listFiles() methods filter these out.)
  • I've improved the javadoc comments a bit.
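The point about "." and ".." follows from the documented behaviour of java.io.File's listing methods (list() and listFiles()), which never report the directory itself or its parent. A minimal sketch of putting them back on request might look like this; the class, method and parameter names are hypothetical, not those used in the patch.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class DotEntriesDemo {
    // Build the list of names a pattern may be matched against.
    static List<String> candidateNames(File dir, boolean includeDotAndDotDot) {
        List<String> names = new ArrayList<>();
        if (includeDotAndDotDot) {
            names.add(".");
            names.add("..");
        }
        String[] listed = dir.list(); // null if dir is not a readable directory
        if (listed != null) {
            names.addAll(Arrays.asList(listed));
        }
        return names;
    }

    public static void main(String[] args) {
        File cwd = new File(".");
        System.out.println(candidateNames(cwd, false).contains("..")); // false
        System.out.println(candidateNames(cwd, true).contains(".."));  // true
    }
}
```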

#6

This is a major improvement over the initial path name pattern support.
It's committed.
