Sven Steinseifer

Java expert, working on Parasuite product.

Error handling in JavaFBP

Up to and including revision 63 in SVN, error handling was very difficult in JavaFBP. In Java, errors are typically handled using exceptions, but the execute() method of the Component class did not allow the component developer to throw an exception in case of an error and let the JavaFBP framework react accordingly. Instead, the developer was responsible for cleanly shutting down the component and making sure the component did not get activated again (by closing and draining all incoming connections). To let the caller (or user) know what had happened, I came up with the following idea: I introduced a special component which was connected to all components of a network and collected all error messages (exceptions) that occurred. After the run, this component could be asked whether there had been an error during execution of the network. In short, error handling was done outside the framework, because the framework did not support it properly.

In my opinion, it would be nice if a developer could simply let the component throw an exception in case of an error and let the framework handle the rest. In conversation with Paul it turned out that it may be sensible to provide more than one error handling strategy:

  1. "All-Or-Nothing" strategy: In case of an error shut down the whole network immediately.
  2. "Graceful Degradation" strategy: In case of an error only shut down the parts of the network which depend on the erroneous component and let the rest keep on going.
  3. ?

The All-Or-Nothing strategy would normally be used for testing purposes or with small networks whose results are only of interest when they are complete. For production, especially when big networks are involved, one would use the Graceful Degradation strategy.

Implementation considerations and preparations

  1. Allow the Component.execute() method to throw exceptions. Exceptions thrown in this method are considered as component errors by the framework.

Implementing All-Or-Nothing

FBP networks can be considered as distributed systems. A component does not have knowledge about other components and networks only know about their direct descendants. Therefore we have to implement a distributed shutdown.

A component should behave like this:

If it encounters an error,

  1. it enters the error state and
  2. tells its mother that an error has occurred.

If it is asked to terminate, it interrupts its execution.

A subnet should behave like this:

If it encounters an error signal from one of its descendants,

  1. it enters the error state,
  2. tells its mother that an error has occurred and
  3. asks its descendants to terminate.

If it is asked to terminate, it shuts down its descendants.

To implement this, I did the following:

In Component.java:

  1. Surrounded the contents of the run() method with a try-catch block.
  2. In the catch block, set the component status to ERROR.
  3. Also in the catch block, called mother.signalError(this, e), where e is the exception that occurred.
  4. Introduced a terminate() method receiving the new status as parameter which
    1. sets the new status and
    2. calls the thread's interrupt() method
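
The Component.java changes listed above might look roughly like the following minimal sketch. The names SketchComponent, ErrorListener and Status are hypothetical stand-ins, not the actual JavaFBP classes, and the guard at the top of the catch block is an assumption that is not part of the list above.

```java
// Hypothetical, simplified stand-ins for the JavaFBP classes; not the real API.
enum Status { ACTIVE, TERMINATED, ERROR }

// Plays the role of the "mother" network.
interface ErrorListener {
    void signalError(SketchComponent source, Exception e);
}

abstract class SketchComponent implements Runnable {
    private volatile Status status = Status.ACTIVE;
    private volatile Thread thread;
    private final ErrorListener mother;

    SketchComponent(ErrorListener mother) { this.mother = mother; }

    // Component code may now simply throw in case of an error.
    protected abstract void execute() throws Exception;

    @Override
    public void run() {
        thread = Thread.currentThread();
        try {
            execute();
            if (status == Status.ACTIVE) status = Status.TERMINATED;
        } catch (Exception e) {
            // Assumption: if terminate() was already called, the exception is
            // most likely a consequence of the interrupt, so it is not reported.
            if (status != Status.ACTIVE) return;
            status = Status.ERROR;
            mother.signalError(this, e);
        }
    }

    // Sets the requested status and interrupts the component's thread, if it is running.
    void terminate(Status newStatus) {
        status = newStatus;
        Thread t = thread;
        if (t != null) t.interrupt();
    }

    Status getStatus() { return status; }
}
```

A component whose execute() throws ends up in the ERROR state, and its mother receives the exception through signalError().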

In Network.java:

  1. Introduced a signalError() method which receives the exception that occurred as a parameter and checks whether an error has already occurred. If not, it
    1. stores the exception in a field called "error" and
    2. asks the network's descendants to terminate (comp.terminate(StatusValues.ERROR)).
  2. Skipped the deadlock detection in case of an error (waitForAll() method).
  3. Allowed go() to throw Exceptions.
  4. Let go() pass on FlowErrors.
  5. Let go() throw the exception that occurred, if any.
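
As a rough illustration of the Network.java changes, here is a minimal sketch. SketchNetwork and Terminable are hypothetical names, statuses are plain strings, and go() takes a Runnable that stands in for starting the components and waiting for them to finish; the real go() method is not shaped like this.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a component that can be asked to terminate.
interface Terminable {
    void terminate(String newStatus);
}

class SketchNetwork {
    private final List<Terminable> components = new ArrayList<>();
    private Exception error;   // first error reported, if any

    void add(Terminable c) { components.add(c); }

    // Called by a failing component. Only the first error triggers the shutdown.
    synchronized void signalError(Object source, Exception e) {
        if (error != null) return;
        error = e;
        for (Terminable c : components) c.terminate("ERROR");
    }

    // Stand-in for go(): once the run is over, rethrow the stored error, if any.
    void go(Runnable body) throws Exception {
        body.run();
        Exception e;
        synchronized (this) { e = error; }
        if (e != null) throw e;
    }
}
```

With this shape, the caller of go() sees the first exception reported by any component, and later errors are ignored.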

In Subnet.java:

  1. Overrode signalError() with the following: If the subnet is not already in the error state,
    1. call the mother's signalError() method and
    2. call the subnet's terminate() method.
  2. Overrode terminate() with the following:
    1. Ask the components to terminate,
    2. set the new status and
    3. terminate the subnet.
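
The subnet behaviour can be sketched in the same spirit. SketchSubnet, Node and Mother are hypothetical types, statuses are plain strings, the locking is deliberately simplistic, and the final step of terminate() (stopping the subnet's own thread) is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal types; the real JavaFBP classes differ.
interface Node {
    void terminate(String newStatus);
}

interface Mother {
    void signalError(Node source, Exception e);
}

// A subnet is a component towards its mother and a network towards its children.
class SketchSubnet implements Node, Mother {
    private final Mother mother;
    private final List<Node> children = new ArrayList<>();
    private String status = "ACTIVE";

    SketchSubnet(Mother mother) { this.mother = mother; }

    void add(Node child) { children.add(child); }

    // Passes the first error up to the mother, then shuts the subnet down.
    @Override
    public synchronized void signalError(Node source, Exception e) {
        if ("ERROR".equals(status)) return;
        mother.signalError(this, e);
        terminate("ERROR");
    }

    // Asks all children to terminate, then records the new status.
    @Override
    public synchronized void terminate(String newStatus) {
        for (Node child : children) child.terminate(newStatus);
        status = newStatus;
    }

    synchronized String getStatus() { return status; }
}
```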

Some notes:

  1. Calling the interrupt() method of a thread does not always stop the thread immediately. In particular, a component that is busy processing without running into a wait() or Lock.lockInterruptibly() call won't terminate immediately. Eventually it will of course stop, once the adjacent components have closed their ports. I don't know whether this is a problem in practice. If it is, one solution would be to check the thread's interrupt flag on each invocation of a framework method. Alternatively, the component developer could do this check at suitable places in the component code, using the interrupted() or isInterrupted() method of Thread.
  2. The new terminate() method does not put the component in the ERROR state. Instead, it receives a parameter containing the new state. I have done this to allow networks to be shut down externally, e.g. when a user cancels the operation. For this to work, I've added a terminate() method (without parameters) to the Network class, which shuts down the network.
  3. The FlowError class does not extend Error anymore. Instead, it extends RuntimeException, which also does not need to be caught by application code. But now we can handle FlowErrors with the new error handling code. Extending Error should be reserved for really severe errors which leave the system in an unrecoverable state (out of memory, VM errors, etc.).
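
To illustrate the first note, here is a minimal sketch of a busy loop that polls the interrupt flag itself. InterruptCheckSketch and startBusyWorker are hypothetical names. Note that isInterrupted() leaves the flag set, whereas the static Thread.interrupted() clears it.

```java
import java.util.concurrent.atomic.AtomicLong;

class InterruptCheckSketch {
    // Hypothetical CPU-bound worker that checks the interrupt flag on every
    // iteration, so interrupt() stops it even though it never blocks.
    static Thread startBusyWorker(AtomicLong counter) {
        Thread t = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                counter.incrementAndGet();   // stands in for real processing
            }
        });
        t.start();
        return t;
    }
}
```

Without the check in the loop condition, this worker would keep running after interrupt(), because it never reaches a blocking call.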

Implementing Graceful Degradation

This needs some more thought.

Miscellaneous

  1. I added two helper methods, declareInputPort() and declareOutputPort(), to the SubNet class. We need this in our project because we turn XML network descriptions into JavaFBP network descriptions. For this to work we have to define a general subnet class without annotations and declare the ports at run-time.
  2. Tracing will now be disabled if the trace file cannot be created or opened.
  3. Added a static setTracePath() method to the Network class. This can be called to specify the path where the trace files should go.

Deadlock testing and subnets

As of revision 71 the deadlock test considered each network (main or subnet) separately. Now imagine a network consisting of a component producing packets very slowly (say one per second) connected to a subnet consisting of an input port IN and a packet-swallowing component (say Discard). After the first packet has been consumed by Discard, the status of the components in the subnet is the following:

IN: SUSP_RECV

Discard: DORMANT

Now the deadlock test would say: We have a deadlock, because nobody is active. But this is not true. The packet producer outside of the subnet is active and will eventually produce the next packet, at which point IN will be activated again. Thus, the deadlock test has to consider the whole network. Solution: Network.listCompStatus() now recursively queries the status of all components in all subnets.
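
The recursive query can be pictured with the following minimal sketch. StatusNode is a hypothetical model in which a subnet is a node with a null status and children, and "no component is active" is used as a simplified deadlock criterion; the real deadlock test is more involved.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: a leaf component has a status, a subnet has status == null.
class StatusNode {
    final String name;
    final String status;
    final List<StatusNode> children = new ArrayList<>();

    StatusNode(String name, String status) { this.name = name; this.status = status; }

    StatusNode add(StatusNode child) { children.add(child); return this; }

    // Collects "name: status" for all leaf components, descending into subnets.
    void listCompStatus(List<String> out) {
        if (status != null) { out.add(name + ": " + status); return; }
        for (StatusNode c : children) c.listCompStatus(out);
    }

    // True if any leaf component anywhere below this node is ACTIVE.
    boolean anyActive() {
        if (status != null) return "ACTIVE".equals(status);
        for (StatusNode c : children) if (c.anyActive()) return true;
        return false;
    }
}
```

Looking only at the subnet from the example (IN: SUSP_RECV, Discard: DORMANT), anyActive() is false. Asked on the main network, which also contains the active producer, it is true, so no deadlock is reported.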

Tracing

When using subnets the main network and the subnets write concurrently to the trace file which may result in lines like these:

  	Network: Pass.IN: ReceivNetwork: Active: discard

Resolution: A possible solution would be to let subnets call the traceFuncs method. But I think it would be nicer if each subnet created its own trace file.

This would create a Thread-<x>-fulltrace.txt file for the main network and a <subnet name>-fulltrace.txt file for each subnet.

Tracing in JavaFBP-2.4 has now been enhanced with a number of new features, as has the latest (unreleased) version of C#FBP. These features are described in FBPTracing.

A future enhancement may be to route trace information over a (network-general) connection to a separate component which will look after writing out the trace data. This arises from the fact that sometimes production runs have to be run with tracing enabled, so you want tracing to slow down production components as little as possible.
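
One way to picture this idea is the following minimal sketch, which uses a plain BlockingQueue instead of an FBP connection. TraceWriterSketch is a hypothetical class, and a StringBuilder stands in for the trace file. Because a single thread does all the writing, whole lines are never interleaved, and callers only pay for enqueueing a line.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical trace writer: components enqueue lines, one thread writes them out.
class TraceWriterSketch implements Runnable {
    // Unique sentinel object, compared by identity, that tells the writer to stop.
    private static final String STOP = new String("<stop>");
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final StringBuilder sink = new StringBuilder();   // stands in for the trace file

    // Called by components; returns as soon as the line is queued.
    void trace(String line) { queue.add(line); }

    // Asks the writer to finish after all queued lines have been written.
    void close() { queue.add(STOP); }

    @Override
    public void run() {
        try {
            for (String line = queue.take(); line != STOP; line = queue.take()) {
                sink.append(line).append('\n');
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();   // preserve the interrupt for the caller
        }
    }

    // Only meaningful after the writer thread has finished.
    String written() { return sink.toString(); }
}
```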


Last edited March 26, 2009 9:27 pm by PaulMorrison