KeyboardInterpreter_*.java encoding

When I compile the JNode sources, I get errors with the core/src/driver/org/jnode/driver/input/l10n/KeyboardInterpreter_*.java files. I guess this is related to the source encoding.

I have two questions:

1. Which encoding should I use for JNode source compilation?

2. Is there any easy way to specify an encoding without changing my default locale?

what is your locale?

What is your locale?

AFAIK, nobody has had to deal with the source code encoding until now, so it is probably a standard one for a Java JDK (ASCII?).

Do you have an international JDK (there are also non-international JDKs)?

I know that, under Eclipse, it is possible to specify the encoding of a file... maybe that can solve your problem, but it is only a temporary fix: we should find a better way.

Fabien

my locale

The locale set on my PC is Korean.

javac reads the source using the OS's default locale unless the -encoding option is given, and I guess that is what caused the error. I could change my locale to compile the JNode sources, but that would be very inconvenient for me, or for anyone around the globe with a different locale.

Two solutions would save me:

1. As brihaye suggested, converting the sources to ASCII, using the \uXXXX syntax for non-ASCII characters.

Or

2. Specifying the right encoding for the sources explicitly in the Ant build script, so that compilation is not affected by the OS's locale.

I know that solution #1 is not exciting because it would bother us when editing the sources. My preference is #2, with UTF-8 as the encoding. Nowadays, many text editors and IDEs let us select the file encoding, including UTF-8, and I think UTF-8 is a good choice for a project like JNode targeting global use.

Let me explain my solution

Of course, if every computer were working natively in UTF-8, a UTF-8 encoding would be the definitive solution.

Unfortunately, this is not the case.

If an appropriate font is installed, setting an IDE to a UTF-8 encoding will give visually good results, but the IDE must *also* be configured to load/save in UTF-8.

On a French Windows 98, your magnificent Korean (or whatever) characters would generally be displayed as squares with the default fonts, even when edited in UTF-8, and replaced by "?" when saved with the default CP1252 encoding. Worse, they would visually appear as two "odd" characters when edited with the default ISO-8859-1 encoding.

Even with proper fonts, I have experienced that problem myself: one full day of work with Arabic characters was lost the next morning!

This is because most IDEs fail to warn the user when the characters displayed on screen are not compatible with the file encoding. I have filed bug reports against several programs on this topic.

In other words, if you want to manage everything in UTF-8, you have to ask three things of your project developers:

1. Install an appropriate font that won't display squares (because everybody wants to delete them).
2. Edit in UTF-8 (because otherwise double-byte characters will be shown as two odd single-byte characters, which everybody wants to erase).
3. Load/save in UTF-8 (because otherwise the OS will erase the characters itself).

Well, that seems to me too difficult a task to achieve.

The "\uXXXX" syntax, although ugly, remains the best solution when you work with developers all around the world. They all have two thing in common :

1. they are lazy;
2. they can all deal with ASCII.

Cheers,

p.b.

Decision

OK, let's all use ASCII and encode all non-ASCII characters using the \uXXXX syntax.

I'm converting all files to ASCII now.

Ewout

okay

Okay, I guess you have a good point. UTF-8 is probably a bit too early to use right now. I agree the "\uXXXX" syntax is going to be the best option for a while.

Bernie

Question

I think I'm starting to understand the problem. Let me check.

Is it correct that JNode does not compile for you because your javac thinks it should read the source files using your local encoding?

Ewout

That's right

That's right. My javac does not compile JNode for me unless I change my OS's locale, and this would be true of any javac. There is no way for javac to detect the encoding of a source file automatically, so it assumes the source is in the encoding associated with the OS's locale. If the source is encoded in anything other than your locale's encoding, you have to give javac the -encoding option. Here is the JDK documentation of the -encoding option:

-encoding
Set the source file encoding name, such as EUCJIS/SJIS. If -encoding is not specified, the platform default converter is used.
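
For instance, to compile one of the l10n files with an explicit encoding, one could run something like the following (ISO-8859-1 is an assumption here, based on how the current files appear to be encoded):

javac -encoding ISO-8859-1 core/src/driver/org/jnode/driver/input/l10n/KeyboardInterpreter_FR.java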

When you compile Java sources with Ant, you can give the 'encoding' attribute to the 'javac' task.
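
A minimal sketch of what that could look like (the srcdir/destdir values are illustrative, not taken from JNode's actual build script, and the encoding value has to match the actual encoding of the files):

<javac srcdir="core/src" destdir="build/classes" encoding="ISO-8859-1"/>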

Bernie

Good practices

It looks like the *.java files are encoded in ISO-8859-1.

Although it is the least bad choice for now, IMHO, developers should stay as close to ASCII as possible.

If we look at KeyboardInterpreter_FR.java, for example, we have:

keys.setKey(3, new Key('é', '2', '~', KeyEvent.VK_2));
keys.setKey(8, new Key('è', '7', '`', KeyEvent.VK_7));
keys.setKey(10, new Key('ç', '9', '^', KeyEvent.VK_9));
keys.setKey(11, new Key('à', KeyEvent.VK_UNDEFINED, '0', KeyEvent.VK_0, '@', KeyEvent.VK_AT));
keys.setKey(16, new Key('a', 'A', 'æ', KeyEvent.VK_A));
...

The good practice is to use the Unicode character representation, i.e. '\uXXXX'. This prevents problems when the user's default encoding differs from the file's actual encoding.
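
For illustration, here are the same lines with the accented characters replaced by their Unicode escapes (\u00E9 = 'é', \u00E8 = 'è', \u00E7 = 'ç', \u00E0 = 'à', \u00E6 = 'æ'):

keys.setKey(3, new Key('\u00E9', '2', '~', KeyEvent.VK_2)); // 'é'
keys.setKey(8, new Key('\u00E8', '7', '`', KeyEvent.VK_7)); // 'è'
keys.setKey(10, new Key('\u00E7', '9', '^', KeyEvent.VK_9)); // 'ç'
keys.setKey(11, new Key('\u00E0', KeyEvent.VK_UNDEFINED, '0', KeyEvent.VK_0, '@', KeyEvent.VK_AT)); // 'à'
keys.setKey(16, new Key('a', 'A', '\u00E6', KeyEvent.VK_A)); // 'æ'

These files then contain only ASCII bytes, so they compile the same way under any locale.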

Not such a difficult task, but not a very exciting one either.

Maybe one day every computer will use UTF-8 encoding as its default to prevent such problems.

p.b.

native2ascii

We could use the "native2ascii" tool on the Java files, but the "special" characters will not be very readable after this is done. Anyway, I will try converting some of the files this weekend.
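
For the record, a possible invocation (native2ascii ships with the JDK; ISO-8859-1 is assumed to be the current encoding of the files, and the -reverse flag would convert back if needed):

native2ascii -encoding ISO-8859-1 KeyboardInterpreter_FR.java KeyboardInterpreter_FR.java.new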

Martin