FlowBasedProgramming | RecentChanges | Preferences

There is a very common type of batch program, called a MasterFileUpdate? (Update for short). It is likely that as much as 50% of all batch programs are updates, which amounts to a sizable number of computer cycles.

The simplest form of update program involves passing a file of detail records against a master file. Details will typically be adds, changes or deletes, and will be matched against the master file records on the basis of a key which consists of one or more fields, usually alphameric. The details may be sorted in the order add, change, delete, within a given key value, or delete, add, change, depending on whether one wants to be able to add a record of a given key in a given run, and then delete it again, or the reverse (delete it, and the add it in the same run).

Now, strange as it may seem, there has never been a piece of reusable code that facilitates writing this kind of program. Instead, there is a diagram showing the canonical form of this program, usually called the "balance line" technique. This logic diagram uses a number of switches, is hard to understand and harder still to modify. Now consider that this diagram only represents the very basic logic of an update. Update programs in real life are much more complex: they may have multiple input files, multiple control breaks (for totals and subtotals), and various control reports. Of course, the basic problem is that we are coding this program to run on a von Neumann machine, so the exact sequence of every event has to be preplanned, with the result that the difficulty of coordinating data use and modification in such an environment far outweighs the complexity of the "real" business logic... "When you are up to your a** in alligators..."

Now, rather like the TelegramProblem, it turns out that it is much easier to code such a program in the FBP environment. Not only that, but a reusable software component is available to facilitate the task even more. This component is not immediately obvious from the description of the task, but, once invented/discovered, you can't do without it!

Modelled on the Unit Record "collator" machine, this component takes two streams of data records and merges them based on a shared key or keys. If you think of this as an independent "machine" being fed from two "file read" machines, and in turn feeding a downstream machine or machines, clearly the job of the downstream machine(s) has been vastly simplified by the fact that it or they see a single combined stream of masters and details, rather than having to try to correlate two separate streams.


|      | 
| Read |---------*
|      |         |
*------*         |        *---------* 
                 *------->|         |
                          | Collate |----------> .........
                 *------->|         |
*------*         |        *---------*
|      |         |
| Read |---------* 
|      |

If the input streams are respectively

M1 M2 M3 M4 M6 ....


D1 D1 D2 D5 D5 ....

the output of the Collate would be

M1 D1 D1 M2 D2 M3 M4 D5 D5 M6 ....

where the numbers following the M's and D's represent key values. Note that you can get M's without D's, and D's without M's...

This suggests a natural generalization of the function of Collate: since it has to use key values to figure out how to merge the streams, why not have it use this same information to insert delimiter InformationPackets (IPs)? The earlier implementations of FBP used "break" IPs indicating various levels (e.g. major, intermediate, minor), but later implementations used "brackets", where the IP only has to indicate whether it is an open or closed bracket. So the Collate output shown above, now becomes (assuming only one level of grouping):

( M1 D1 D1 ) ( M2 D2 ) ( M3 ) ( M4 ) ( D5 D5 ) ( M6 ....

If the key is composed of multiple fields, it will be found that this naturally leads to multiple nested groupings, e.g. country, state, city, street, etc. These higher levels sometimes appear in the master file as separate record types, but often in practice have to be deduced from the sequence of key values.

This relieves downstream components of the need to keep track of changes in key value - much of the logic in an update is related to key value changes, rather than actual incoming records, so Collate allows even that logic to be triggered by the arrival of an IP of a particular type.

One other obvious generalization of Collate is to allow it to handle other numbers of input streams than 2. One interesting case is that of 1 input stream - in this case, obviously we are not using the "merge" function, but the ability to insert bracket IPs is still very useful.

The logic for Collate is very simple: assume there are n input ports, labelled IN[0] to IN[n-1]. The output port is labelled OUT. For simplicity, assume a single key field at a fixed position in each incoming InformationPacket. Assume also an array of InformationPacket handles called iph.

Then we have, schematically:

for i = 0 to n - 1
   receive IP from IN[i] and store its handle in iph[i] 

while not end of stream on all input ports
   determine IP 'i' in iph with lowest key 
        (treating end of stream as greater than highest possible)
   send IP in iph[i] to OUT

   receive IP from IN[i] and store handle in iph[i] 

By observation, one can see that the number of outgoing IPs matches the number of incoming, as the initial receives on each input port are balanced by the last receives on each input port (as they detect end of stream).

FlowBasedProgramming | RecentChanges | Preferences
This page is read-only - contact owner for a password | View other revisions
Last edited November 9, 2004 9:52 pm by PaulMorrison (diff)