Isolate and IsolateInvoker tracking issue

Project: JNode Shell
Component: Code
Category: task
Priority: normal
Assigned: Unassigned
Status: active
Description

This issue has been created to track progress on JNode's isolate implementation and the "isolate invoker".

Things that are known ... or believed ... to work:

  • Running a console in a new isolate via "console -n -i".
  • Some commands will run via the isolate invoker.
  • The CommandIO streams mechanism is working.
  • An isolate launched using the isolate invoker gets the current properties and sees environment variables exported using the bjorne interpreter's "export" builtin.
  • Instantiation of a command class and its arguments, and the subsequent argument parsing, all happen in the child isolate.

Things that are known ... or believed ... to not work, or not to be implemented fully or properly:

  • The "javac" command throws a different exception during class initialization.
  • Sending of isolate lifecycle link messages is not fully implemented.
  • Command argument completion occurs in the parent isolate. This is incorrect, though mostly harmless.
  • There is a problem with isolates and pipelines; see the last comment on issue 2996. In summary, a simple pipeline like "echo hi | cat" hangs without producing any output.
  • The Isolate API methods for requesting or forcing an isolate to exit/halt don't actually cause the isolate to stop. The approach used by the (disabled) VmIsolate.stopAllThreads() method to kill threads is fundamentally unsafe.
  • ^C has no effect. At the moment there is no safe way to kill threads, proclets or isolates, so I've disabled ^C handling in the CommandShell.
  • When an isolate exits, (AFAIK) it does not close any Streams, Readers, Writers, Sockets, etc. that its application opened but failed to close.
  • There is currently nothing to prevent an application thread in an isolate from taking out a monitor lock on a @SharedStatic class, or on some other object returned by the system. This could potentially block threads in the root isolate.

Anybody who is working in the isolates area: please update the above lists with any progress you make / problems you know about. And use the comments to say what you are doing, so that we don't tread on each other's toes.

Other people: please report anything that seems to be broken as a comment below.

#1

For the record: I am currently working on the 'environment' issue.

#2

The isolate invoker now passes the environment and system properties to the command in the child isolate ... and it seems to see them. Also, the problem with streams seems to have ... ummm ... gone away.

#3

The problem where a command class and its dependent Arguments were instantiated in the parent isolate has been fixed. Instantiation and command argument parsing now happens in the child isolate.

#4

I have an idea for a scheme for killing isolates safely: that is, a way that doesn't incur the risk of damaging operating system data structures. Some aspects are a bit nebulous, so please bear with me.

First we need to understand the current (commented out) approach, and why it is unsafe. The current approach to killing an isolate is simply to find all active threads in the isolate and (provided they are not the current thread) call VmThread.stopForced(null) on them. This marks the VmThread as stopping, then calls the VmThread.doStop method, which releases all of the thread's monitor locks and then removes it from the scheduler.

The reason that this is unsafe is that it is possible (and in some cases guaranteed) that the thread being killed is in the middle of some call into JNode's operating system. (This could be something obvious like reading a file or waiting for an AWT event, or it could be something like loading a class or waiting for the isolate's (hypothetical) local GC to complete.) There is a serious problem with calling doStop() on a thread while such a call is in progress. If the thread was in the middle of updating an operating system data structure, it won't get a chance to complete the job. Even 'finally' clauses won't be run for the thread. Instead, the monitors that were preventing other threads from seeing the data structures in their intermediate / incomplete state will be released, and ... anything could happen.

I hope that it is clear why this is bad.

We need to avoid killing threads while they are (or might be) in the middle of updating operating system data structures. (This would also apply to any non-OS in-memory data structures that are shared with threads outside of the isolate. But that is problematic ... so we must avoid any such sharing.)

The first step is identifying when a thread has JNode OS calls in progress. One way to do this would be to identify all possible OS entry points and tag them with some new annotation. The thread killer would then scan all stack frames of the thread being killed, looking for frames for methods with the annotation. Unfortunately, I don't think we could reliably identify all entry points. A more workable way would be to examine the fully qualified class name for each stack frame, looking for package names that start with "org.jnode.vm", "org.jnode.driver", "org.jnode.js" and so on.
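For illustration, here is a minimal sketch of that package-prefix test, written against the standard StackTraceElement API (the real implementation would presumably walk VmThread frames directly, but the classification logic would be the same; hasOsCallInProgress is a made-up name):

static final String[] OS_PREFIXES = {
    "org.jnode.vm.", "org.jnode.driver."    // ... and so on
};

static boolean hasOsCallInProgress(Thread t) {
    for (StackTraceElement frame : t.getStackTrace()) {
        String className = frame.getClassName();
        for (String prefix : OS_PREFIXES) {
            if (className.startsWith(prefix)) {
                return true;    // an OS call is (or may be) in progress
            }
        }
    }
    return false;               // no OS frames: safe to kill outright
}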

If a thread has no OS calls on the stack, it can simply be killed. If it does have OS calls on the stack, we need to keep the thread alive so that any OS calls can complete when whatever is currently blocking them gets unblocked. But when the outermost OS call (i.e. the one nearest to the bottom of the stack) returns, we want the thread to go away. I think the way to achieve this is to hack the stored PC and frame pointer in the outermost OS call's frame so that a return does a "long jump" out of the thread's run method.

Are there holes in this idea? Things that I haven't thought of?

#5

In principle that idea reminds me a lot of what I was thinking about a while ago.

But in case the idea above turns out to be too difficult, I have other ones too...

One thing to try is to use java.util.concurrent instead of synchronized at critical places in the OS where isolate threads can interfere. Another tool in the box is to use the @Uninterruptible annotation for fast updates to OS data structures, so that it is not possible for a thread to be interrupted by any means while doing them. Still related to this: wherever we can, we could block the isolate thread at the OS boundary and pass the work to an OS thread to be completed. In this case the worst that can happen is that when the OS thread finishes and puts the result in a queue (for instance), there will be nobody left to pick it up. Such orphaned results can be flushed over time.
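As a minimal sketch of that producer/consumer idea (my own illustration using plain java.util.concurrent, not existing JNode code; OsBoundary and perform are made-up names):

import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class OsBoundary {
    // A single OS worker thread, owned by the root isolate (illustration only).
    private static final ExecutorService OS_WORKER = Executors.newSingleThreadExecutor();

    // Called by an isolate thread: the kernel work runs on the OS thread
    // while the caller blocks in a state where it is safe to kill it.
    static <T> T perform(Callable<T> kernelWork) throws Exception {
        Future<T> result = OS_WORKER.submit(kernelWork);
        try {
            return result.get();    // safe point: the caller holds no OS locks
        } catch (InterruptedException e) {
            // The isolate is being torn down; the OS thread finishes on its
            // own, and the orphaned result can be flushed later.
            throw e;
        }
    }
}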

So to summarize this, we could probably avoid problems by using a fault-tolerant programming style in the kernel.

A good analysis of the kernel interface is required in any case.

#6

What about @Uninterruptible? At the moment I believe all it does is prevent a thread from yielding. Critical sections would normally want to be protected by this, so that they don't block while holding the resource they are using. Could we make stopping a thread respect this annotation as well? Even if a thread is within the operating system, it may not be in a critical part, and may therefore still be killable. We would just have to pay attention to what counts as critical and protect it accordingly.

Perhaps @Uninterruptible shouldn't be used on such a broad scope, as it has its side effects on yield() too. Either way, the same idea applies: an annotation on methods that modify operating system structures. I think a blanket ban on whole packages is a little too much, as there are plenty of good reasons why a thread can be in those packages with no plan of causing any mutations.

That being said, it might actually be a good idea to start with a blanket ban on killing threads in specific packages, then slowly peel it back once certain areas have been confirmed and tested to be safe. I know for a fact that the memory manager is very fragile in this respect. If an object has been allocated, but its size field has not yet been written to the header, then the heap is corrupted! This is a very small window, but nonetheless it suffers from exactly this problem.
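To make the window concrete, here is an illustrative sketch of that allocation sequence (the field layout and helper are made up; the ordering hazard is the point):

final class HeapSketch {
    private long next;    // bump-allocation pointer

    long allocate(int size) {
        long addr = next;                    // step 1: carve out raw space
        next += size;
        // <-- killing the thread here leaves an object with no size field,
        //     so a heap walk can no longer find the 'next' object
        writeHeaderSizeField(addr, size);    // step 2: make the object walkable
        return addr;
    }

    private void writeHeaderSizeField(long addr, int size) {
        // stand-in for the real header write
    }
}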

Edit:
Still related to this: wherever we can, we could block the isolate thread at the OS boundary and pass the work to an OS thread to be completed

I like that idea. Keep userland threads out of the OS when possible and set up a producer/consumer interface. OS threads would likely be scheduled at a slightly higher priority to lower OS-level contention. This would also give us much better control if one wanted to implement a quota system to prevent isolates/threads from hogging OS-level resources. The only major downside might be in making sure there isn't a lot of lag time. But we won't know how much lag there is, or whether there is any at all, unless we try.

#7

I'll add some other ideas and thoughts.

I wanted to start a thread about @Uninterruptible before, and I think it should not be used on such a broad scope, as cluster said. At the moment it is used in some places as a kind of synchronization. IMHO @Uninterruptible's only "valid" use-case is as a "speedup" for methods where you _know_ that yielding would hurt performance. Every other usage pattern (especially synchronization) will just hide bugs until we have SMP going.

Regarding concurrency: I like lockless algorithms very much. But besides their being difficult to implement properly and error-free, I suspect only a small amount of the VM code could be made lockless at all. For some algorithms it would just be too hard or too inefficient to make them lockless.

Another thought: what happens when you lock on a .class object that is @SharedStatic? E.g. what happens if you lock on "org.jnode.vm.SomeClass.class"? This would not be a call into "kernel code", but it might trigger similar concurrency bugs, mightn't it?
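To make that concrete (SomeClass is just the hypothetical class from the example above):

// Application code running in a child isolate. Because SomeClass is
// @SharedStatic, its .class monitor is visible across isolates.
synchronized (org.jnode.vm.SomeClass.class) {
    // While this monitor is held, any root-isolate / OS code that
    // synchronizes on the same class object is blocked, even though no
    // kernel entry point was ever called.
}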

Regarding Steve's comment, I wanted to mention the Kernelmode annotation. But AFAIR it just tells the compiler to use the kernel stack instead of the thread's user stack. And it is used in very few low-level methods.

And as a last addon: I wondered how e.g. Linux does this. And IMHO the big difference is that they distinguish between kernel locks and user locks. You can interrupt user locks but you can't do that for kernel locks (OK, I know there _are_ kernel locks that can be interrupted ... but anyway). So perhaps we might come to a solution if we distinguish between the two types of locks. Though I have no idea how we could make this distinction.

Generally, I second levente's point that we need a good analysis of the kernel interface, and to define what it actually is.

Sorry that I could not add anything useful towards solving this, but I wanted to add some thoughts of mine.

#8

Another thought: what happens when you lock on a .class object that is @SharedStatic? E.g. what happens if you lock on "org.jnode.vm.SomeClass.class"? This would not be a call into "kernel code", but it might trigger similar concurrency bugs, mightn't it?

I wouldn't think that is something that should be done, period. There should be a well-known 'lock' object, or better yet, if you are in the operating system, proper spin-locks / semaphores, not synchronized {}. synchronized is fine for Java apps, but not for OS-level code. It's just too much of a blanket synchronization mechanism, and it hinges specifically on monitors, which are not the best synchronization mechanism for a kernel.
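For illustration, a minimal spin-lock of the kind I mean, built on java.util.concurrent.atomic (a sketch, not proposed kernel code as-is):

import java.util.concurrent.atomic.AtomicBoolean;

final class SpinLock {
    private final AtomicBoolean locked = new AtomicBoolean(false);

    void lock() {
        while (!locked.compareAndSet(false, true)) {
            // busy-wait; in kernel code the protected section must be short
        }
    }

    void unlock() {
        locked.set(false);
    }
}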

As for Linux, I think the advantage they have is the fact that they do a context switch into kernel-land via syscall. For CPUs that have no syscall, I think they either take the 'Big Kernel Lock', which is something we really don't want to get into, or they provide some other mechanism for doing a context switch into kernel-land. In effect, this would be quite similar to the idea of OS-level threads that wake up upon request from user-land threads: the user-land thread yields, there is a context switch to the kernel thread, and then back again. The major difference is that we're not using hardware support, but this might also make porting to other architectures that much easier.

#9

I wouldn't think that is something that should be done, period.

That is a fine principle. But (IMO) we need isolates to be bombproof even if applications do stupid (or even deliberately nasty) things.

It would be best to stop the application from taking out a monitor lock on a shared-static class in the first place. If that is not feasible, I'm not sure that it necessarily matters if the lock just gets broken. What matters is not the lock per se, but whatever it is protecting. If the application is using a lock on a shared-static class to protect its own data structures ... well, tough luck.

Actually, the problem of application locks on 'shared static' classes is a subcase of a larger problem that needs to be addressed in the long term to "harden" isolates. If anything in the OS layers uses monitor locks on objects that can be seen by application code, a malicious application can interfere with (block) the OS by locking the object.

Anyway, I think we need to treat locks on shared static classes (and the more general case) as a separate problem. I'll add it/them to the list in the issue description.

Back to the topic of this discussion: I don't like the idea of suborning @Uninterruptible to mean "don't kill". If we take the annotation approach, we should create a new annotation (say @Unkillable) to mean that. @Uninterruptible would almost certainly imply @Unkillable, but there are lots of cases where @Unkillable does not imply @Uninterruptible.
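For concreteness, such an annotation might look something like this (a sketch only; nothing like it exists in the tree yet, and the retention/target choices are my guesses):

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

@Retention(RetentionPolicy.CLASS)    // read by the native-code compiler
@Target({ElementType.METHOD, ElementType.CONSTRUCTOR})
public @interface Unkillable {
    // Marks a method whose stack frames must not be discarded by the
    // isolate/thread killer. @Uninterruptible would imply this; the
    // converse does not hold.
}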

(Aside: I've always thought of @Uninterruptible as a mechanism for stopping hardware interrupt triggered interruption / rescheduling. AFAIK it is intended to be used to prevent harm / breakage ... not as a hack to make certain OS operations go faster. But none of this is documented anywhere that I know of.)

Any annotation-based approach is (IMO) problematic because it relies on developers finding all methods that need the annotation, annotating them, and remembering to keep doing this for ever after. I really don't think that this is sustainable ... too much opportunity for human error.

Leaving aside the question of how we identify a thread that shouldn't be interrupted, what are your thoughts on my idea (see the last paragraph of comment #4) for dealing with an uninterruptible thread by tweaking the return PC / FP? Would it work?

#10

The annotation method may not be bullet-proof, and yes, it would require the programmer to pay attention to where he's using it, no different from the other annotations that are used. It's just that the annotation approach allows the kernel dev more precise control.

The long jump approach would probably work just fine. And for the time being it would probably also solve the issue, which we could revisit later if we decided to make this more precise. At the moment we clearly need _some_ solution. The only 'hole' I see is a possible race condition. If you make sure that the thread is not in a running state when you change the return pointer, then you would probably be OK. But if the thread is running, especially on SMP, and you change the return pointer after the thread has already returned from that frame, then the kill attempt would simply fail. Perhaps this isn't as much of a problem as it seems to me, but to be safe, keep it in mind.

This assumes that when you go to kill the thread, you check its frames for OS calls and, if one is found, commit to killing it via the frame PC. If, between making that decision and actually changing the PC, the thread does its return, then changing the PC would fail, as that frame would be in the process of being torn down. So long as this is properly synchronized, though, it shouldn't be an issue.

As for @Uninterruptible stopping hardware interrupts, I'm not sure that entering/returning from an @Uninterruptible method actually emits cli/sti instructions. I had thought it simply prevented yield() from following through.

If you want a test case, the mm has a very obvious race condition when being killed, as I pointed out earlier, between memory allocation and writing the header. If the header is not written, specifically the SIZE field, then the heap will be corrupt, as this size field is used to find the 'next' object when walking the heap.

#11

As for @Uninterruptible stopping hardware interrupts, I'm not sure that entering/returning from an @Uninterruptible method actually emits cli/sti instructions. I had thought it simply prevented yield() from following through.

I didn't say that. I said "hardware interrupt triggered interruption / rescheduling" ... and I was trying to say thread preemption. Now, I was technically wrong about that too. But if you view things at the Java level (rather than at the level of the native code emitted by the compiler), I believe that what we have at the moment is tantamount to thread preemption. (The purist in me says that we ought to rename @Uninterruptible to @Nonpreemptible, and reserve @Uninterruptible for the (hypothetical) case where we DO want the native code compiler to emit STI / CLI instructions. But that is "busy work" at the moment.)

IMO, the "mm" example only illustrates why annotations aren't a sustainable solution. We'd need to carefully analyse large amounts of existing JNode code to figure out where to add the annotations. And I don't think we can do it to the level of accuracy required ... either in the first instance, or over time. The problem is that if we get it wrong, the result will be lots of potential race conditions (bugs) that occur rarely and are next to impossible to reproduce. OK ... so call me a pessimist.

Anyway, after reading people's responses, I realized that we don't actually have to decide one way or the other (i.e. annotations versus package names). We can implement both approaches and select one or the other via a build-time or boot-time switch.

I've just figured out a complication. Suppose that application code calls into JNode OS code (identified as noninterruptible) which then calls back into application code, and that the inner application code then gets stuck in a loop or blocked. My scheme described previously won't cope with that. Instead, we need to arrange that the inner application code gets interrupted, but exits from the OS callback with an unchecked exception / error. The OS call handles it (or propagates it) normally and exits to the outer application code, which behaves as before. The key thing is to avoid executing any application code, no matter where it is on the call stack. And this is complicated by the fact that some code (e.g. java.util.*) is called by both OS and application code, and needs to be treated differently in the two cases.
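To illustrate the shape of what I mean (IsolateKilledError, osEntryPoint and cleanUpOsState are all made-up names; none of this exists today):

// The kill mechanism would raise this unchecked error in application frames.
class IsolateKilledError extends Error {
}

class OsCode {
    static void osEntryPoint(Runnable appCallback) {
        // ... update OS data structures under proper locks ...
        try {
            appCallback.run();    // application code on top of OS frames
        } catch (IsolateKilledError killed) {
            // The inner application code was interrupted and unwound with an
            // unchecked error. Complete or roll back the OS-side work, then
            // propagate so the outer application frames are unwound too.
            cleanUpOsState();
            throw killed;
        }
        // normal path: the remaining OS work completes as usual
    }

    static void cleanUpOsState() {
        // stand-in for whatever tidy-up the OS call needs
    }
}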

The other point to note is that allowing kernel developers to "be more selective" about what is interruptible doesn't achieve a lot, assuming that the set of package names (e.g. "org.jnode.vm.*", etc.) includes all code that should be uninterruptible. While it is a good idea to kill off the threads quickly, the insertion of long jumps etc. should mean that it doesn't matter if they hang around after the isolate has been killed off. (I'm assuming that we've dealt with the complication described in the previous paragraph!) The case where an operating system call is waiting (indefinitely) on a pipe read / socket read / etc. could be handled by the mechanism that closes an isolate's streams, sockets, etc.

#12

Ah, my mistake, I read that a little too quickly.

Back on topic: do you think there is a possibility of the race condition I mentioned being an issue? At least it is better to have one race condition with a possibly very mild side effect and an extremely small exposure window, versus the thousands of race conditions that currently exist and are much more dangerous...

#13

I wanted to provide a bit of history regarding the @Uninterruptible annotation. As noted correctly above, it is just a means of telling the JIT compiler to _not_ insert yieldpoints in the code. It was thought of as a means of low-level locking (where you cannot use monitors, i.e. synchronized) for critical sections in kernel code. And as this mechanism fails when you call regular application code, the bootimage builder keeps giving you tons of warnings about it. Stuff like:

[bootimage] Method calls interruptible method:
[bootimage] 	caller: org.jnode.vm.scheduler.MonitorManager#testThreadId!(I)V

Now, back when Ewout started to work on the SMP stuff, he saw the need for a real locking mechanism. This was used in Monitor and MemoryManagement (the Address.attempt(..) code that uses the CAS instruction). If you go back in the svn history you can see old versions of Monitor using @Uninterruptible instead of CAS.

So, as it provides no real means of locking in an SMP world, I called it a "performance hack", since most places where the annotation is used are "critical" in a way where a reschedule would not be very useful.

What sounds interesting is the @Unkillable idea from Steve. I'm not sure I understand how you would use it, especially given your concern that kernel code can call back into user code. But what I would propose in that regard is: we could simply extend the JIT to set a per-thread flag once we enter an @Unkillable method and unset it once we leave the method (actually it would need to be a counter, as we can call kernel code recursively, ...).

It is still the question how to use this extra information, and before talking about that I'll give another possible bad case: an Isolate might still have two threads, one within kernel code and one outside of kernel code. We have already stopped the one outside of kernel code. Can there be cases where the thread in kernel code will not return without the other thread?

Anyway, regarding this flag/counter for @Unkillable: we could further extend the yield handler / reschedule code in a second way. When we want to kill an Isolate, we set a "signal flag" for that Isolate. The yield code continues to check the TSI flag first (as it does now, to keep the fast path fast), but if that is set, the code further checks the Isolate's "signal flag". If it indicates that the Isolate should be killed, we stop all threads if the @Unkillable counter is 0; otherwise we continue to reschedule (either all of the Isolate's threads, or just the ones that are within @Unkillable code).
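Roughly, the check might look like this (all names are stand-ins for illustration; in reality the JIT would emit the counter updates around @Unkillable methods):

class VmIsolateSketch {
    volatile boolean killRequested;    // the Isolate's "signal flag"
}

class VmThreadSketch {
    final VmIsolateSketch isolate;
    int unkillableDepth;               // maintained by JIT-emitted code

    VmThreadSketch(VmIsolateSketch isolate) {
        this.isolate = isolate;
    }

    // Called from the yield handler / reschedule path after the TSI check.
    void checkKillSignal() {
        if (isolate.killRequested && unkillableDepth == 0) {
            stop();    // no @Unkillable frames on the stack: stop it now
        }
        // otherwise just reschedule; the thread is stopped once it leaves
        // @Unkillable code and the counter drops back to zero
    }

    void stop() {
        // release monitors, remove from scheduler, etc.
    }
}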

#14

I think a proper long-term solution is to restrict kernel space, or more specifically kernel entry points that cause structure mutations, from being entered by user-land threads, period. (We _could_ still allow user-land threads 'read-only' access to kernel structures.) When a user-land thread needs to e.g. allocate memory, create a file descriptor, or open a socket, the thread will tell some API that it needs something performed, it will block (in a safe way such that it can be killed), and the system will context switch into kernel-land and process the request. It's not _that_ difficult to outline all the entry points if you define a proper API. I'm not sure whether it was meant to be used in this way, but InitialNaming and the Service interfaces seem like a ripe target for defining specific entry points for user->kernel requests. It wouldn't be the _only_ entry point, as some of the JDK code might bypass the InitialNaming lookup; I haven't looked that deeply into it, it was just an idea that popped into my head. (There is also reflection to deal with...)
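As a rough sketch of what such an entry-point service might look like (KernelRequestService and perform are made-up names; only the general InitialNaming lookup pattern is borrowed from JNode):

import java.util.concurrent.Callable;

public interface KernelRequestService {
    // Performs the given kernel-side work on an OS thread while the calling
    // user-land thread blocks in a killable state.
    <T> T perform(Callable<T> kernelWork) throws Exception;
}

// Hypothetical usage from user-land code:
//   KernelRequestService krs = InitialNaming.lookup(KernelRequestService.class);
//   SomeResource r = krs.perform(() -> allocateResourceInKernel());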

At any rate, this would all take a good deal of time to implement, test and debug properly. It should definitely be a priority to harden the kernel interface and identify entry points, but until then, I'd say that any fix is better than no fix...

#15

Assigned to: Stephen Crawley » Anonymous

Deassigning issue while I do some more Bjorne work