FileArgument's file pattern matching has problems (patch V2)
Project: JNode Core
Component: Code
Category: bug report
Priority: normal
Assigned: Unassigned
Status: closed
I took a look at FileArgument's getFiles() method to see if I could steal some code / ideas for pathname expansion in the bjorne shell. It looks like the code has a variety of problems:
- It doesn't cope with filename patterns containing '/'.
- It doesn't support '[...]' in patterns.
- Pattern matching doesn't suppress "hidden" files like UNIX globbing does.
- Pattern matching behaves strangely for patterns containing certain characters. For example, the pattern "a.b" will match "a.b" but also "aab", "abb" and so on. (Hint: '.' means "match any one character" in a Java regex.)
- The '*' and '?' metacharacters are translated to regex character sequences that will only match "word" characters in filenames; i.e. letters, digits and underscore.
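The last two problems can be reproduced with a small sketch. This is a hypothetical reconstruction of the kind of translation described above, not the actual FileArgument code; the class and method names (NaiveGlobDemo, naiveToRegex, naiveMatches) are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical reconstruction of the flawed translation; NOT the real FileArgument code.
class NaiveGlobDemo {
    // Translate '*' and '?' to "word character" regexes and copy everything else verbatim.
    static String naiveToRegex(String glob) {
        StringBuilder sb = new StringBuilder();
        for (char ch : glob.toCharArray()) {
            switch (ch) {
                case '*': sb.append("\\w*"); break;
                case '?': sb.append("\\w"); break;
                default:  sb.append(ch); // bug: '.' and other regex metacharacters pass through
            }
        }
        return sb.toString();
    }

    static boolean naiveMatches(String glob, String name) {
        return Pattern.matches(naiveToRegex(glob), name);
    }

    public static void main(String[] args) {
        System.out.println(naiveMatches("a.b", "a.b"));            // true
        System.out.println(naiveMatches("a.b", "aab"));            // true, but should be false
        System.out.println(naiveMatches("a?c", "a-c"));            // false, but should be true
        System.out.println(naiveMatches("*.txt", "my-notes.txt")); // false, but should be true
    }
}
```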
I have a prototype for a reusable PathnamePattern class in my sandbox. My primary goal is to implement file expansion for bjorne, so the primary requirement is Bourne shell / POSIX compatibility. However, the class can easily be plugged into FileArgument to address its problems, and used in other places that need to do pathname "globbing".
The signature looks something like this:
public class PathnamePattern {
    // no public constructor
    // create / compile a pattern
    public static PathnamePattern compile(String pat) ...
    // test if a string is a likely pattern (i.e. contains metacharacters)
    public static boolean isPattern(String pat) ...
    // expand a pattern. The argument is the start directory for a relative pattern
    // and is ignored for an absolute pattern
    public List expand(File startDir) ...
}
What do people think? Any opinions on an appropriate Java package for this class?
I have various ideas for bells and whistles ... they will be in the javadoc on the initial check-in.
#1
I'm aware of the various problems with the filename pattern support and I want to fix them. My first idea for a fix was to apply regex escaping to the regions of the filename pattern delimited by occurrences of the two special characters '?' and '*'. This would implement simple DOS-like pattern support. Supporting general regexp-based patterns would be interesting too, though that would probably need a different syntax.
I wonder what your approach is in the prototype. We might consider reusing the code if that makes our life easier than handling the relatively simple case of the default shell separately.
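For reference, the region-escaping idea could be sketched roughly as below. This is only an illustration under some assumptions: it handles a single filename component (no '/' handling, no hidden-file rule), it uses the standard Pattern.quote(String) to escape the literal regions, and the SimpleGlob class name is invented for the example.

```java
import java.util.regex.Pattern;

// Illustrative sketch only; SimpleGlob is a hypothetical name, not a JNode class.
class SimpleGlob {
    // Compile a DOS-like pattern: '*' matches any run of characters, '?' exactly one,
    // and every region between those metacharacters is matched literally.
    static Pattern compile(String glob) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (int i = 0; i < glob.length(); i++) {
            char ch = glob.charAt(i);
            if (ch == '*' || ch == '?') {
                flush(literal, regex);
                regex.append(ch == '*' ? ".*" : ".");
            } else {
                literal.append(ch);
            }
        }
        flush(literal, regex);
        // DOTALL so that '.' also matches line terminators embedded in a name
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    // Append the pending literal region, regex-quoted, and reset the buffer.
    private static void flush(StringBuilder literal, StringBuilder regex) {
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
            literal.setLength(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(compile("a.b").matcher("aab").matches());            // false
        System.out.println(compile("*.txt").matcher("my-notes.txt").matches()); // true
    }
}
```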
#2
The approach I'm using is as follows:
where protect(ch) adds a regex escape if ch is a regex metacharacter,
I'll upload the code in a day or two.
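Until the code is uploaded, here is a guess at what a helper like protect(ch) might look like. The set of characters treated as regex metacharacters and the RegexProtect class name are assumptions made for this sketch; the prototype may well differ.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of protect(ch); the metacharacter set is an assumption.
class RegexProtect {
    private static final String META = "\\^$.|?*+()[]{}";

    // Return ch as a regex fragment, backslash-escaped if it is a regex metacharacter.
    static String protect(char ch) {
        return META.indexOf(ch) >= 0 ? "\\" + ch : String.valueOf(ch);
    }

    public static void main(String[] args) {
        System.out.println(protect('.'));                       // \.
        System.out.println(Pattern.matches(protect('.'), "a")); // false
    }
}
```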
#3
This looks good. Just in case you are not using it already, Pattern.quote(String) might be useful too.
#4
Here's the PathnamePattern class, and modifications to FileArgument to use it.
The pattern matching behavior of the PathnamePattern class is controlled by a flags word that you pass to the expand, compile and isPattern methods. In FileArgument, I'm using no flags, which means that just '?' and '*' matching are enabled. The flags control such things as "[...]" matching, "'" quoting, "\" escaping, hidden file support and sorting of the pathname list.
I'm going to write a blog entry about some issues with FileArgument.
#5
This version of the patch has the following minor improvements:
#6
This is a major improvement over the initial path name pattern support.
It's committed.
#7