Flow-Based Programming: Cognates

Flow-Based Programming - Chap. XXVII

Related Concepts and Systems

This chapter has been excerpted from the book "Flow-Based Programming: A New Approach to Application Development" (van Nostrand Reinhold, 1994), by J.Paul Morrison.

To find out more about FBP, click on FBP . This will also tell you how to order a copy.

For definitions of FBP terms, see Glossary

[This is a survey of the stage the “related” work was at up until 1994. Of course, many other systems have appeared since then, and indeed there are now companies expressly founded to capitalize on these concepts and similar ones. Some of the systems described in this chapter are still alive and doing well – but maybe this chapter is chiefly of interest as a historical record.]

Material from book starts here:

This chapter provides an overview of some of the work going on in universities and business which has concepts in common with FBP. My main problem with trying to give a good overview is that new software is appearing faster than I can keep pace with it! Every time I open a magazine I see work that has resonances with FBP, which I therefore resolve to read. Then I have to follow up their bibliography, which results in more to read, and so on!

The above is a roundabout way of apologizing to all those researchers and developers out there: if I have omitted your favourite piece of software or research effort from what follows, please accept this apology and send me information about it for my ever-growing files!

In a similar vein, I will summarize what I perceive as the salient features of each item, and I may have misunderstood their thrust. In most cases I have had no contact with the authors, and I will inevitably tend to look at their work through FBP-coloured glasses. Again, please contact me to exchange ideas or to throw brickbats!

For the above reasons, what follows is probably only a fraction of what is out there - I just hope I have included enough to pique your interest.

FBP seems to me to stand on three main "legs":

asynchronous processes
data packets (IPs) with a lifetime of their own
external definition of connections

You will find one or two of these in many systems, but it is the combination of all three which makes FBP unique. And even when you find all three of these items present in a system, you will find that they have been used to address a particular area or problem, such as distributed systems. I believe many of their developers have not yet understood the potential of this combination for improving all application development.

The first system listed below is not even an application development system, but a simulation system. In the preceding chapters we have often remarked that FBP allows simulations to be "grown" into full-scale applications, and the developers of GPSS saw this as a very intriguing possibility.

GPSS (General Purpose Simulation System)

This is a simulation system developed by Geoffrey Gordon and his team at IBM in the early 60s. I am including it because it shares all three characteristics above (the data packets were called "transactions"), and it was a very successful system for simulations. I also include it because it profoundly influenced my thinking about how to do application development. Gordon had hopes of extending it to run as an operating system control program, and did quite a bit of work in this area. Many people have also dreamed that it should eventually be possible to "grow" an application from its simulation, and in fact this was tried out in 1963 (30 years ago!) using GPSS, and was quite successful. Using FBP this process is now very straight-forward.

MASCOT (A Modular Approach to Software Construction, Operation and Test) (K. Jackson and H. Simpson 1975)

This system, built by Ken Jackson and his team at the Royal Radar Establishment in England, shares all three of the above characteristics also. I believe it has become quite widely used within the military establishment in England. Jackson's motivations for developing MASCOT were similar to ours, plus a desire to make interrupt-driven software resemble the remainder of the computer load, as a lot of their signals work involves handling asynchronous interrupts. In MASCOT an IP is called a "message", as in much of the work listed below, and a connection a "channel". The basic MASCOT unit of software is a subnet, called a "subsystem", and is drawn graphically, and then converted into a textual representation. MASCOT's services are somewhat lower level than FBP's, but its channel management is very similar overall. Here's a quote on how well the developers feel they met their objectives:

" ... the overall philosophy of MASCOT operation ... is to process data when it is available, to explicitly wait when no data is available and to pass on a stimulus to adjacent data users when data of interest is passed on.... Further, the use of the control queue within the interrupt handling software enables the expression of interrupt handling software to be unusually straightforward. The elimination of polling and searching within the kernel is complete...

"Conclusion

"MASCOT provides a very basic yet sound machine-independent kernel to produce a suitable environment for real-time programming.... Finally MASCOT looks to be a promising base upon which to build the high integrity systems which are the subject of current research."

You'll note the term "message": this reflects the idea that processes send messages to each other. Some of FBP's IPs are definitely messages, but our experience is that the use of messages as the main communication vehicle implies a finer process granularity than the level we have found to be most productive.

CHIEF

This seems a classic case of parallel development, as this work must have been going on at about the same time as ours was going on in Canada, but was totally independent of it. J. Boukens and F. Deckers, working for the Shell Company in Holland, came up with a system which shows a number of really remarkable similarities to FBP. In fact, the main difference is that their diagrams are vertical, whereas FBP's are horizontal! Another difference is that they allow multiple consumers for a connection (causing the records to be automatically copied), while in FBP we do not allow this. On the other hand, FBP dialects usually have an off-the-shelf Replicate component, which has the same effect. CHIEF's ports are numbered, as in AMPS and DFDM. To convert one of their diagrams into machinable text, they list the connections (called buffers), naming their producers and consumers. A list of such specifications is called a "cooperation". To program individual processes they developed their own interpretive language, whose execution was integrated with the multithreading driver. There is also a discussion of deadlocks.

When reading their paper, and especially the discussion after their presentation, I felt very strongly their excitement over these concepts, and perhaps some frustration that others weren't feeling this excitement too...

Actually, speaking of parallel evolution, after I had been working on these concepts for a few years, I came across two papers dating back quite a few years: one was written in 1967 (32 years ago!) by two people at the Rome Air Development Center, E. Morenoff and J.B. McLean, entitled "Inter-program Communications, Program String Structures and Buffer Files" (1967), describing a program interconnect structure based on the control of data flow via intermediate queues. For a single processor, queues could be maintained in main memory. For a distributed processor structure, queues could be held on disk.

A few years later (1971) R.B. Balzer published a paper on what he called PORTS, describing work done at RAND where modules could be shielded from other modules by utilizing software commands such as "connect", "disconnect", "send" and "receive". It is interesting that this paper is one of the most cited papers in the field, but, according to private communications with the author, little came of it.

STREMA

Another related system I came across in my reading was developed by Ian Clark of IBM UK (1976). It is described as a graphic conversational language for specifying and running application processes. STREMA uses a relational model and is intended to allow relational data to be treated in a uniform manner with flat files and subroutines. In STREMA, all these are made available to the programmer as "streams", which resemble most closely the processes of FBP. You can specify graphically how "streams" are connected, and what happens to the fields in the records travelling through them - Clark uses the term "component" to describe what a field resides in as it is in transit through a given stream (not to be confused with FBP "components"). Streams drive each other, are described by a "relator", and may be subject to constraints on their components. Components (fields) have values, but they also have status: one of UNDEFINED, VALID or INVALID (similar to DFDM's dynamic attributes). As a record enters a stream, what happens is determined by the stream's "relator", and the constraints on, and status of, the incoming components. Constraints may be such things as bounds on a value, type specifications, or forcing a value not to repeat nor descend in a run. This concept can support processes as diverse as applying subroutines to streams, collating data streams, or getting data from or writing data to a relational table. Combining the concepts of relators and constraints simplifies a lot of the logic conventional programs have to do validating fields and deciding what to do if things go wrong. Clark has done a good job of combining a number of useful concepts into a single framework. Again, there are similarities with FBP, even though his thrust was slightly different.

LGDF

R.G. Babb II (1984) uses the term "large-grain data flow" (LGDF) to describe a level of granularity like FBP's, which he describes as a compromise between fine-grained data flow and traditional approaches. He points out that, under LGDF, the lowest level programs can be written in almost any language (he uses Fortran). He also starts the design process by showing data flow dependencies in diagram form and then converting them to what he calls a "wirelist". Here's a quote:

"We have come to some unexpected conclusions... The most surprising is that sequential programs are often easier to design and implement reliably when based on a parallel (asynchronous) model of computation."

In case you start getting confused as you read this chapter: please take note that the term "message" used above is not the same as the Smalltalk "message", which is basically a linearized subroutine call, comprising a selector (which indirectly specifies the code to be executed) and the parameters to be passed to it. The more common use of "message" is similar to its meaning in colloquial English: a short piece of text carrying instructions and/or information from one process to another.

NIL

This system, developed by Rob Strom and his team (1983) at the IBM Research Center at Yorktown Heights, has some very strong similarities to FBP, except that it is a programming language, rather than a coordination language. The original motivation for NIL seems to have been for programming communication software - you will perhaps have noticed that the multiple layers of communications software can be implemented as pairs of complementary processes. They also wanted good inter-process protection, which, as we have seen above, means minimizing side-effects. Like FBP, NIL also allows applications to be built up out of communicating sequential processes; only one process can own a data object at a time; "send" is "destructive" (the sent object can no longer be accessed by the sender); and so on. Strom makes the point that the ability to have processes on the same machine run in a single address space makes the cost of a message exchange very low, comparable to the cost of a subroutine call, and also makes it possible to have larger numbers of smaller-grained processes.

While it does not have an explicit coordination language, NIL is so powerful that this kind of language can easily be added - a parent process has enough facilities available that it could build a running program based on a file which specifies the connections between processes. NIL was deliberately designed to allow dynamic modification of networks, where processes can be created dynamically, and also ports of the created process can be dynamically connected to ports of other processes, both at process creation time and during subsequent execution of the process. The way it does this is by making ports objects, so that the process of connecting ports happens at run-time, under control of what are called "capabilities".

Just as in FBP, the only way processes can affect each other is via the communication channels. Strom and Yemini (in a recent paper on NIL) point out that this fact, plus an extension to strong typing in the NIL language called "typestate checking", provides a high degree of security. Since NIL has its own compiler, it can enforce typestate checking, and thus eliminate the risk of moving data to an address defined by an uninitialized pointer. DFDM, on the other hand, was explicitly designed to interface with existing languages: S/370 Assembler, PL/I and COBOL, and THREADS is C-based, so this risk is present with these implementations, but it can be minimized by good programming techniques and inspections. DFDM and THREADS also adopt the strategy of invalidating a pointer to an IP after that IP has been disposed of - this ensures that an erroneous attempt to access that IP afterwards will cause an immediate crash. Like the NIL group, we also found that this kind of environment provides very good isolation between processes. I believe there were very few cases where one process damaged another one's code or data.

Strom's group has since developed a follow-on to NIL, called Hermes (Strom et al. 1991) because Hermes was the "process-server of the gods"! They are currently building a "global desktop" which will allow applications to be developed graphically. This is reminiscent of the work we did generating running programs from diagrams, alluded to in Chapter 2, but the global desktop will also allow programs to be connected up and reconfigured while they are actually running. It is designed to be used by people who may or may not be knowledgeable about programming. Strom's team has also been doing interesting work on what they call "Optimistic Recovery" - recovery strategies based on the idea that failures are rarer than successes, so one should go ahead with logic on the assumption that things will work and only undo it if there is a failure. Their infrastructure keeps the required information so programmers really don't have to think about recovery.

In the previous chapter I mentioned Rob Strom's remark about his team's decision to think of themselves as part of the OO world. In that chapter I also alluded to FBP-like work coming out of the OO fraternity, such as the media objects of Nierstrasz et al. (1992).

Parallel Logic Programming

There is an ever-growing series of parallel logic programming systems like Parlog, Vulcan, and a number of projects proceeding under the aegis of ICOT in Japan. PROLOG itself can be combined with FBP to give some very interesting capabilities, so it is not surprising that some of these projects are starting to look a lot like FBP: for example, I find A'UM by K. Yoshida and T. Chikayama (1988) very interesting. Incidentally this article has an excellent bibliography. The subtitle of this article is itself quite evocative: A Stream-Based Concurrent Object-Oriented Language - all the good words in a single title!

This same article started me thinking again about "streams". In A'UM and some of the other systems related to it, a distinction is made between "streams" and "channels". If I understand it right, in A'UM, a "stream" runs from one source to one destination, whereas a "channel" may contain more than one stream, coming from different sources: the items in each stream must stay in sequence relative to each other, but the streams in a channel are not constrained relative to each other. In A'UM only one reader is allowed for a channel, while in Tribble's paper on channels (Tribble et al. 1987), he allows multiple readers for a channel. The authors of A'UM feel that not allowing multiple readers makes their semantics sounder and the implementation simpler. Our experience tends to support this view.

In FBP we define a stream as the set of data IPs passing across one connection, but we also allow multiple sources (but only one destination). It may well be that the distinction between stream and channel is more rigorous than the FBP concept, and I look forward to seeing how these concepts evolve. We also pragmatically allow a stream which is accepted by one process to be passed on to the next, e.g. a stream of detail records might flow from a Reader, through an Edit, and on to a Process component. Some writers might prefer to call these multiple streams which just happen to contain the same data. I admit in hindsight that our concept of streams is a little fuzzy at the edges, but I feel it has never caused confusion in practice, and has been implemented in all of the FBP dialects. Multiple destinations, on the other hand, have never been implemented in any of our implementations, partly because it is not clear whether the data should be replicated or should be assigned randomly to the receivers, like Linda's piranhas (see below) - in any case, both solutions can be realized very easily by means of generalized components.

Hewitt's Actors take processes down to the finest possible granularity: "Hewitt declared", to quote Robin Milner (1993), "that a value, an operator on values, and a process should all be the same kind of thing: an actor." This approach has considerable theoretical attractiveness, but in my view, to be practical, it basically has to be implemented as hardware, rather than software. There are also of course a number of projects growing out of Hewitt's Actors, which also seem to be on a converging path with all the other work (albeit at the more granular end of the scale), e.g. Agha's COOP (1990).

BSP

In Chapter 1, I said I would describe L.G. Valiant's work (1990) in a little more detail, so this is as good a time as any! BSP stands for "bulk-synchronous parallel", and it is of considerable interest because the author proposes it as a new "bridging model", to replace the current, von Neumann bridging model. He stresses that he is proposing it neither as a hardware nor a programming model, but to "insulate software and hardware development from each other, and make possible both general purpose machines and transportable software." The BSP model is defined as comprising the following three attributes:

A number of components,.....
A router which delivers messages point to point between pairs of components,
Facilities for synchronizing all or a subset of components at regular intervals of L time units, where L is the periodicity parameter. A computation consists of a sequence of supersteps. .... After each period of L time units, a global check is made to determine whether the superstep has been completed by all the components. If it has, the machine proceeds to the next superstep. Otherwise, the next period of L units is allocated to the unfinished superstep.

Now look back at Chapter 20, and you can see that Valiant's third attribute is very similar to FBP's use of subnets in checkpointing, except that he checks for completion on a regular basis - in FBP implementations, we usually count the processes, and then wait until that number have terminated. Otherwise it is very similar.

Valiant describes a variety of implementations, plus their appropriate performance measures, such as packet switching networks, a hypercube-connected computer and optical crossbars. Here is an interesting comment in his conclusion: "... if the relative investment in communication hardware were suitably increased, machines with a new level of programmability would be obtained." Note the juxtaposition: not just improved performance, but improved programmability.

UNIX and its descendants

There is another group of related approaches, based on the very popular UNIX(tm) system. In these systems, the connectivity seems to be less rich, but the data passing between the processes is more like IPs or file records than messages. UNIX supports the concept of "pipelining", where the output of one process becomes the input of another, and so on repeatedly. This is definitely a form of configurable modularity, and I found a lot of their experience using this technique relates closely to things we discovered using FBP.

You will find the word "pipe" used quite often for what we call "connection" in FBP. In UNIX the '|' operator represents what it calls a "pipe". Processes can be assembled into working systems by connecting them together using this operator. For instance, suppose a user enters:

ls | pr -2 | lpr

The effect is for 'ls', 'pr' and 'lpr' to be assembled on the fly in such a way that the "standard output" of 'ls' feeds the "standard input" of 'pr', and so on. So this command means "list the files in the current directory; format the result 2-up and send the results to a line printer". This is very similar to the interpreted mode of DFDM and THREADS. UNIX's equivalents to ports are the file descriptors 0 (standard input) and 1 (standard output), which are automatically open whenever a process gets control. What flows between the UNIX processes is streams of characters, rather than structured IPs, so the metaphor is not as powerful as FBP's, nor does UNIX pipelining support complex networks. On the other hand UNIX's character string orientation makes it a very suitable for text manipulation, and a large number of the well-known UNIX components address this application area.

OS/2 Interprocess Communication

Orfali and Harkey (1991) list four techniques for interprocess communication in OS/2: anonymous pipes, named pipes, queues and shared memory. Anonymous pipes can be accessed by means of a write handle and a read handle and are mostly used by parent processes to communicate with their descendants, while named pipes allow communication between unrelated processes. Named pipes are accessed using standard OS/2 file services, so code does not have to distinguish whether it is writing to a file or to a named pipe - this will be determined by the file name.

CMS Pipelines

This system was developed by John Hartmann of IBM Denmark for the CMS environment. It is also consciously modelled on UNIX, but since it is specialized for the CMS environment, it is record-oriented, rather than byte-oriented. It supports more complex topologies than UNIX does by means of a notation for ending one pipeline and starting a new one attached to a previous "stage" in the pipeline definition (using a label notation). I therefore see it as a halfway house between UNIX and FBP (again developed independently of FBP). A program may also dynamically redefine the pipeline topology by replacing itself or one of its neighbours with a newly defined pipeline.

Here is an example of a CMS Pipelines pipeline:

pipe < logged users | split , | take 5 | console

This is a CMS command to set up a pipeline which reads the file called LOGGED USERS, splits each record into multiple records, using the comma as the delimiter, selects the first 5, and displays them at the console.

Pipelines components are written in REXX, using a SUBCOM environment for the "pipe" services.

CSP (Communicating Sequential Processes)

This seminal work by Tony Hoare (1978) has been the basis for a large amount of work by other writers. Here, the external specification indicates which processes are running concurrently (using the '||' operator to indicate parallel execution), not how they are connected. The actual connectivity is implied by the send and receive commands, which must explicitly name the process being communicated with. Connections are assumed to have zero capacity.

As an example of the CSP coordination notation, here is Hoare's notation, given in his article, for what I have referred to above as the Telegram problem, as follows:

[west::DISASSEMBLE||X::COPY||east::ASSEMBLE]

Here "west", "X" and "east" are process names, while the capitalized names stand for program sections (analogous to FBP components).

The problem is that DISASSEMBLE has to know the name "X", COPY has to know the names "west" and "east", etc. Hoare's orientation seems to be towards 'write new', rather than reuse. He does mention port names as a possible solution to this problem, but doesn't stress the fundamental paradigm shift involved in changing from 'write new' to reuse, nor the importance of finding a good notation for combining black box components.

Interestingly, he also makes the same point that Carriero and Gelernter made about a subroutine being a special case of a coroutine.

You will notice the frequent occurrence of processes, connections and sometimes ports, with different names being used from system to system. Also configurable modularity at any more complex level than that of UNIX requires some agreement of names (or numbers) between the outsides and insides of processes. NIL avoids this by making ports local to a process, but allowing a parent process to pass information about connections to the child process in the form of parameters.

Linda

Now let's move off in a different direction: Carriero and Gelernter, whom I have mentioned above, have developed and written extensively about a very interesting system called Linda (1989), which has stirred up a lot of interest in academic circles. Instead of IPs, Linda uses "tuples", ordered lists of values. "Tuples" are created in "tuple space", just as FBP IPs are created in space managed by FBP, not by the components. Unlike FBP's IPs, however, a tuple just floats in tuple space until retrieved by a process which knows its identification (or part of it). Professor Gelernter uses a neat analogy in a recent Scientific American article to explain his concept of a "tuple". Imagine two spacemen working in space building a space station: one of the workers has finished working with a wrench and wishes it to be available for other workers - he or she can put it "down" (so that it follows its own orbit in space), and the other worker can then pick it up whenever convenient. In the same way, tuples or FBP IPs have an independent existence and follow their own orbit from one worker (process) to another.

How does the spaceman actually pick up the wrench? Here is chiefly where Linda diverges from FBP: in Linda, access is done by pattern matching. A process may need a tuple with value X in field Y - it just has to request a tuple matching those specifications, and it will eventually receive an appropriate tuple. Receiving a tuple can be destructive or non-destructive ("consume" or "read"). If more than one tuple matches the specification, the system picks one at random. If there are none, the requesting process is suspended. Values are not communicated back from the receive to the tuple.

One other important feature of Linda is that of "active" tuples - these are tuples which execute code at the moment of creation, and then turn into ordinary "passive" tuples. You can do distributed logic such as matrix operations this way, where each tuple does a calculation when it is created and then becomes a passive matrix element. Perhaps the nearest to this in FBP is something like the Subnet Manager, which takes a passive chunk of code and turns it into a process.

Another Linda image which is evocative is the "school of piranhas". Here a number of processes lie in wait for a tuple, and when it appears in the tuple space any one of them can grab it. This is an effective technique of load balancing. In Chapter 22, I described a performance improvement technique where we had 18 occurrences of a process executing the same disk traversal logic. In this case, we had to provide connections from the client to the 18 servers, so we had to have a "load balancing" process in between - the piranha technique would do this without the need for the extra process.

To summarize the essential difference between Linda and FBP, Linda is a bus while FBP is a tram - Linda has more degrees of freedom but I believe is more expensive in terms of resources. If one Linda process generates a series of tuples with an extra field containing numbers in ascending sequence, you could have another process request them in sequence. So Linda can simulate FBP. Conversely, FBP can just as easily simulate Linda's associative style of retrieval by having one or more processes manage an associative store. It seems to me that Linda and FBP are so closely related that systems of the future may combine both approaches. After all, there may be areas of an application where tracks are more appropriate, and others where you want more degrees of freedom - sort of like containers moving from rail to ship and back to rail. Here is a final quote from Gelernter and Carriero (1992):

"In general we see computation and programming languages as areas in which further progress will be slow, incremental and, in many cases, of marginal importance to working programmers. Coordination languages are a field of potentially great significance. A growing number of groups will play major roles in this work."

I couldn't agree more!

RIG, Accent, Mach

These are a series of network operating systems (Rashid 1988), each one evolving out of the previous one. RIG (Rochester's Intelligent Gateway) was an interprocess communication facility allowing processes to communicate by means of information packets. RIG allowed any process to access any other's port using a pair of integers (process number.port number). While this allowed port numbers to be passed around as data, it meant that port defined services could not easily be moved from one process to another. Also, if a process failed, this information could not be signalled back to processes depending on it. Processes therefore had to register their dependence on other processes with a special process which was notified of process death events (called the "grim reaper"). This makes me wonder if systems like Linda may not suffer from the same problem.

Accent evolved out of RIG, by defining ports to be capabilities as well as communication objects and adding an integrated virtual memory management system which allowed processes to transmit large objects (RIG could only transmit 2K bytes at a time). Here is a quote about Accent:

"Experience with Accent showed that a message based network operating system, properly designed, can compete with more traditional operating system organizations. The advantages of the approach are system extensibility, protection and network transparency."

Mach then evolved out of Accent because of the need in their environment for complete UNIX compatibility, and "to better accommodate the kind of general purpose shared-memory multiprocessors which appear to be on their way to becoming the successors to traditional general purpose uniprocessor workstations and timesharing systems." Among other things, Mach splits the Accent concept of "process" into "task" (basic unit of resource allocation) and "thread" (basic unit of CPU utilization).

Advanced Hardware Designs

Another characteristic of the three "legs" of FBP listed above is that they could actually describe a network of independent processors, all interacting to do a job of work. This approach to building super-powerful machines is getting a lot of attention lately, as it is generally recognized that the single processor is running out of steam. Although we can probably make individual processors smaller and faster (after all, a bit is simply a choice between two states, for instance two states of a molecule), you start to run into limitations such as the speed of light or the potential damage which can be caused by a single cosmic ray! A lot of work is going on on linking multiple processors together to achieve enormous amounts of computing power without any one processor having to be incredibly fast. Suppose we put 1000 processors together, each running at 20 MIPS (millions of instructions per second) - this would provide 20,000,000,000 instructions per second. Since such a machine is normally thought of as being oriented towards scientific calculations, so that the instructions would tend to be floating-point operations, this machine would be a 20 gigaflop machine. Multiply either of these factors by 50 (or one by 5 and one by 10), and you are into the "teraflop" per second range (1,000,000,000,000 floating point operations per second).

The two main approaches here are multiprocessors and multicomputers. I am indebted to my son, Joseph Morrison, for some of the following comments. A number of writers seem to favour multiprocessors (with shared memory) because they do not require us to radically change our approach to programming. The programming technique I have described in the foregoing pages seems to be a good match with this approach, as it can be mapped onto a multiprocessor in a straightforward manner: IPs are allocated from the shared memory, and FBP processes are spread across the available processors to obtain parallelism. All commercial multiprocessors provide concurrency control mechanisms such as semaphores; these can be used to manage the concurrent accesses to the IPs. Examples of this type of machine are the KSR 1, CEDAR, DASH, T*, Alewife - this list is from (Bell 1992).

FBP networks also have a natural mapping to multicomputers. Here parallelism is obtained by having a network of connected processors, each with its own memory. The data must be transmitted from one processor to another, as required, so communication speed and bandwidth become important considerations. Examples of this type of machine are the Intel Paragon, CM5, Mercury, nCube 2 and Teradata/NCR/AT&T systems. A number of different network topologies have been investigated - examples are meshes, tree structures, hypercubes, and ring structures.

FBP could be mapped onto multicomputer systems by again evenly distributing the processes among the processors. An IP would be created in the local memory of the processor on which the creating process resides. If an IP had to be transferred to another processor, the entire IP could be copied over the communication network. Although this sounds inefficient, communication costs can be minimized by having "neighbour" processes reside in directly connected processors, or even in some cases time-share the same processor, where the economics justify it. There is a considerable body of work on different strategies for handling communication between processors, and for doing the routing when paths are tied up or damaged, and I was struck by how similar some of the problems they have to solve are to those we had to solve for FBP. I found the article by P. Gaughan and S. Yalamanchili (1993) a good survey, as well as providing some interesting solutions and simulation results. Of course, I am not a hardware person, but it does seem that some of the techniques described would support FBP quite nicely.

Most of the academic work with multiprocessor configurations seems to be oriented towards determining what parallelism can be obtained from a COBOL or FORTRAN program. However, MIT has a dataflow computer called Monsoon, which "demonstrates scalability and implicit parallelism using a dataflow language" (Bell 1992), to be followed by one called T* which will be "multithreaded to absorb network latency". Researchers at Berkeley are using a 64-node CM5 to explore various programming models and languages including dataflow. There is an enormous amount of research going on in the areas of multiprocessors and multicomputers. This is a whole area of specialization within computing research, and one which I expect I will never get to know very much about! However, a lot of people who do have some understanding of this area see a good match with FBP. Applications designed using FBP should map very naturally onto a system of communicating processors. If you have more processors than processes, this mapping should be pretty easy; if less, then some processors are going to have to time-share, just as the present implementations of FBP do today. Here is a quote from an article (Cann 1992) comparing FORTRAN with functional languages for programming tomorrow's massively parallel machines (remember that we related FBP to functional languages in an earlier chapter):

"Tomorrow's parallel machines will not only provide massive parallelism, but will present programmers with a combinatorial explosion of concerns and details. The imperative programming model will continue to hinder the exploitation of parallelism.

"In summary, the functional paradigm yields several important benefits. First, programs are more concise and easier to write and maintain. Second, programs are conducive to analysis, being mathematically sound and free of side effects and aliasing. Third, programs exhibit implicit parallelism and favour automatic parallelization. Fourth, programs can run as fast as, if not faster than, conventional languages."

One last point about the data flowing between these computing engines: a lot of the mathematically oriented work with big computers (and most of this work is mathematically oriented) seems to assume that what should travel between the processors is either values, like integers or strings, or messages. I actually started out with values in my early work, in 1967, but became convinced over the years that you should send whole "things" (entities, records or IPs), which stay together as they travel, rather than low-level datatypes (e.g. integers or strings) which are not in the application domain. Our experience is that, if you decompose a record into individual values, it may be so expensive to recombine them that it's not worth it.

FBP without FBP

Maybe now is the time to talk about how to do FBP without FBP software. This could be a stepping stone to full FBP for a number of companies. Many of the basic concepts can be simulated with conventional languages without any special software. A friend of mine took our AMPS course, but then went to work on a COBOL application. In those days we had no FBP support for COBOL, but he told us AMPS had helped him write better COBOL programs, by suggesting better ways to structure his code.

Of course, you could even use JCL, using files for your connections, if you didn't care about overhead. Remember that the steps in a job must execute in a fixed order, so you would have to string out your network into almost a straight line - and forget about loop-type networks. Another similar approach is to use secondary transactions in IMS - this has actually been tried several times, but the overhead prevents the granularity from being made fine enough to be really productive.

A number of Data Flow COBOL products available in Japan generate pure COBOL from data flow specifications, which can then be compiled and link edited like any other COBOL program. They support rather simple networks, but require no special run-time software.

An article (Kar 1989) written by a senior engineer with Intel Corporation, shows how to do calculations using data flow, rather than synchronous code. He gives the actual C code to do this. He sees what he calls "data flow multitasking" as a way that we can use today to write programs which will not only be easy to port to the powerful, multiprocessing machines coming on stream during the 90s (and he should know!), but which will take advantage of these machines' capabilities. In the summary section he says,

"Data-flow multitasking is a promising solution to the challenges software faces over the next decade. It involves looking at a sequential program through new eyes [my italics]. ... Data-flow multitasking is particularly relevant for real-time applications."

Along somewhat similar lines, some of my early data flow research was done using a single APL program to simulate the scheduler. In this case, different processes were implemented as sections of a single APL program, and control was transferred between them by using a computed "goto". This would be very easy to do in any HLL. Another approach would be to use single data areas to represent connections, together with a switch to indicate whether a given area is occupied or not. The scheduler could then cycle looking to see which process has data waiting to be processed. Wayne Stevens pointed out (1991) that non-looping FBP components are very similar to subroutines, and therefore networks consisting mostly of non-loopers should be easy to implement directly using a HLL.

At the beginning, I mentioned the growing use of these concepts for programming distributed systems. I would like to mention the recent IBM announcement of IBM Messaging and Queuing (MQSeries). Similar efforts by other companies are listed in a recent article (Moad 1993). IBM plans to bring out a set of products which will allow asynchronous communication between a large set of IBM and non-IBM platforms. There will be a standard interface based on the concept of queues, which will relieve programmers from the complexities of making different applications communicate between different vendors' hardware and software. Instead of having to use one set of macros for VTAM, different commands for CICS, still another set for IMS, all applications will use the same simple set of MQI (Messaging and Queueing Interface) calls. This concept seems to me to be completely compatible with the application structuring ideas presented in this book. In Chapter 15, we talked about the ability to "cleave" networks across systems - the combination of FBP and MQSeries or similar software should provide a very powerful way of splitting an application across multiple systems or locations, or of moving functions from one node to another as desired. As I said in that chapter, cleaving applications between different computers introduces new problems of synchronization, but they will definitely be solved! It doesn't make sense to try to pretend that a distributed system is one big, synchronous application. If you request data from a remote location, you want to be able to do other things while the data is working its way through the system. Systems are getting just too big. Also, many of the systems which are being connected "speak different languages", so we are seeing the development of standards which will allow them to interpret each others' data formats. I predict that those problems which will inevitably arise will be solved, but not necessarily by means of one general solution for every problem.

As you read this chapter, I hope you have got some impression of how ideas spring up independently in different places and times, how they flower in unexpected places, how they cross-pollinate and give rise to interesting new hybrids. You have to be a botanist to keep track of it all! There are many other concepts and languages, other than the few I have mentioned here, which have points in common with FBP, and which have certainly been influential in the field of computing, but there is not room in this book to do them justice. They have cross-fertilized each other and in many cases only industry archivists know which led to which. Examples which spring to mind are: SIMULA, MODULA-2 (and now MODULA-3), Concurrent Pascal, Ada, Lucid, Occam, ....

It would be wonderful if any readers who are expert in these different languages could share their knowledge and insights with the rest of us. I have occasionally dreamed of collecting all the developers and theoreticians of concurrent, stream-oriented, modular systems together in a huge symposium, and seeing if we can't get some synergy going. I have found that there is something about data-orientation which seems to allow practitioners of different disciplines to communicate (just as IPs do for different languages!). I sincerely hope that we won't waste our energy in internecine wars, as has happened in some disciplines, but that we will all be able to work together towards a shared goal. There are far more similarities than differences in our work, and if we can get a real dialogue going, who knows what we might achieve together!