This is a classical programming problem, originally described by PeterNaur, and commonly known as the "Telegram problem". It affords a very simple illustration of the difference between FBP and conventional styles of programming: the problem is surprisingly hard in conventional programming, but relatively easy in FBP.
The task is to write a program which accepts lines of text and generates output lines of a different length, without splitting any of the words in the text (we assume no word is longer than the size of the output lines). Because it is so hard to get right conventionally, it is often used as an exercise in courses on conventional programming. Unless the student realizes that neither the input logic nor the output logic should be the main line - the main line has to be a separate piece of code whose main job is to process one word at a time - he or she soon gets snarled in a lot of confused logic.
In FBP, it is much more obvious how to approach the problem: the input is a stream of records and the output is a stream of records of a different length, and the one thing they have in common is the words. This suggests that words should flow between processes as IPs in their own right.
So we have established that we should have IPs represent words somewhere in the application. We should have a Read Sequential component on the left of the network, reading input records from a file, and a Write Sequential component writing the new records onto an output file. Here is a partial network:
*-------*                                    *-------*
|       |                                    |       |
| RSEQ  |-- ?? ........................ ?? --| WSEQ  |
|       |               words                |       |
*-------*                                    *-------*
Now the output of Read Sequential and the input of Write Sequential both consist of streams of IPs (records) containing words and blank space, so it seems reasonable that what we need, at minimum, is a component to decompose records into words and a matching one to recompose words back into records. Given the problem as defined, I do not see a need for any more components, but I want to stress at this point that there is no single right answer. What you select as your basic black boxes depends on how much they are going to be used versus how much it costs to create them.
Let us add our two new components into the picture:
*-------*      *-------*                *-------*      *-------*
|       |      |       |                |       |      |       |
| RSEQ  |----->|  DC   |--------------->|  RC   |----->| WSEQ  |
|       |      |       |     words      |       |      |       |
*-------*      *-------*                *-------*      *-------*
Now we have another matched pair of components - in the diagram I have labelled them DC (for Decompose) and RC (for Recompose).
One additional point has to do with parametrization. The Write Sequential and Recompose components will have to know what size records to create - this can be read from a file descriptor, or from a parameter specified with the network definition. A useful tool in FBP is the concept of the InitialInformationPacket (IIP). IIPs will typically also be used for the file designations or paths for the occurrences of the Read Sequential and Write Sequential components.
Now we have a matched pair of useful components, but, of course, you don't have to use them together all the time. Let's suppose we simply want to count the number of words in a piece of text. We have already mentioned the Count component in the Concepts chapter - it simply counts all its incoming IPs and generates a count IP at the end. This type of component is sometimes called a "reference" component, meaning that the original input IPs are passed through unchanged, while some derived information is sent out of another output port. So the resulting structure will look something like this:
*-------*      *-------*                *-------*      *-------*
|       |      |       |                |       |      |       |
| RSEQ  |----->|  DC   |--------------->| COUNT |  *-->| WSEQ  |
|       |      |       |     words      |       |  |   |       |
*-------*      *-------*                *-------*  |   *-------*
                                            |      |
                                            *------*
                                              COUNT

We can keep on adding or changing processes indefinitely, and yet the mental model stays very natural and straightforward!
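A reference component like Count can be sketched in the same Go-channel stand-in for FBP connections (again, the names and port layout are my own): incoming IPs pass through unchanged on one port, and a single derived count IP comes out of another port when the input stream is exhausted.

```go
package main

import "fmt"

// count: a "reference" component - passes every IP through unchanged
// on out, and emits one count IP on the cnt port at end of stream.
func count(in <-chan string, out chan<- string, cnt chan<- int) {
	n := 0
	for ip := range in {
		n++
		out <- ip
	}
	close(out)
	cnt <- n
}

func main() {
	in := make(chan string, 4)
	out := make(chan string, 4)
	cnt := make(chan int, 1)

	for _, w := range []string{"three", "short", "words"} {
		in <- w
	}
	close(in)

	count(in, out, cnt)

	for w := range out {
		fmt.Println(w)
	}
	fmt.Println("count:", <-cnt)
	// Output:
	// three
	// short
	// words
	// count: 3
}
```

Because the original IPs emerge untouched, Count can be dropped into the middle of any connection without disturbing the rest of the network.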
To see how one might document such a diagram using (enhanced) wiki notation, see TelegramProblemWR.
In conventional programming, something has to be the "boss". From the above discussion, we see that words are the key concept needed to make this problem tractable. Once we realize this, we can go ahead and code it up, using something like the following call hierarchy:
                  *-----------*
                  |           |
                  |   MAIN    |
                  |           |
                  *-----------*
                        |
         *--------------*---------------*
         |                              |
   *-----------*                  *-----------*
   |           |                  |           |
   |  GETWORD  |                  |  PUTWORD  |
   |           |                  |           |
   *-----------*                  *-----------*
         |                              |
   *-----------*                  *-----------*
   |           |                  |           |
   |  GETREC   |                  |  PUTREC   |
   |           |                  |           |
   *-----------*                  *-----------*
As we have said above, it is not at all obvious at first in conventional programming that this is the right way to tackle this job. Most people who tackle this problem start off by making GETWORD or PUTWORD the "boss" program, and promptly get into trouble. So we now realize that we have to "bring in a boss from outside" (call it MAIN), instead of "promoting" GETWORD or PUTWORD to boss. Now MAIN can call GETWORD and PUTWORD to retrieve and store a single word at a time, respectively. To do this GETWORD must in turn call GETREC and PUTWORD must call PUTREC, to look after the I/O. Now note that all four of these subroutines have to "keep their place" in streams of data (streams of words or streams of records). In the old days we did this by writing them all as non-reentrant code, so the place-holder information essentially became global information. This is quite correctly frowned on as poor programming practice (see GlobalVariable), as it has a number of significant disadvantages, so today we normally manage this kind of place-holding logic using the concept of "handles".
The general idea (for those of you who haven't had to struggle with this kind of logic) is for MAIN to pass a null handle (in many systems this will be a pointer which is initially set to zero) to GETWORD. When GETWORD sees the null handle, it allocates a block of storage and puts its address into the handle. Thereafter it uses this block of storage indirectly via the handle, and at the end of the run, frees it up again. Although this block of storage is allocated and freed by GETWORD, it is considered to be owned by MAIN. This same logic is also used between MAIN and PUTWORD, between GETWORD and GETREC, and between PUTWORD and PUTREC.
At the basic level, our problem is that subroutines cannot maintain internal information which lasts longer than one invocation. In contrast, FBP components are long-running objects which maintain their own internal information. They do not have to be continually reinvoked - you just start them up and they run until their input streams are exhausted. Thus, FBP components can do the same things subroutines do, but in a way that is more robust and - of considerable interest for our future needs - more easily distributable. It is important to note that FBP does not prevent us from using subroutines, but my experience is that they are most appropriate for such tasks as mathematical calculations on a few variables, doing look-ups in tables, accessing databases, and so on - in other words, tasks which match as closely as possible the mathematical idea of a function. Such subroutines are said to be "side-effect free", and experience has shown that side-effects are one of the most common causes of programming bugs. Hence subroutines which rely on side-effects for their proper functioning are a pretty poor basis on which to build sophisticated software!
Here is an OO "collaboration diagram" for this problem as it would normally be implemented using a conventional ("passive object") approach: