Problem-Oriented Mini-Languages

Flow-Based Programming - Chap. XVII

This chapter has been excerpted from the book "Flow-Based Programming: A New Approach to Application Development" (van Nostrand Reinhold, 1994), by J.Paul Morrison.

"One of the greatest advantages of little languages is that one processor's input can be another processor's output" - Jon Bentley (1988)

"Another lesson we should have learned from the recent past is that the development of 'richer' or 'more powerful' programming languages was a mistake in the sense that these baroque monstrosities, these conglomerations of idiosyncrasies, are really unmanageable, both mechanically and mentally. I see a great future for very systematic and very modest programming languages" - E.W. Dijkstra (1972)

Problem-oriented mini-languages can be thought of as essentially an extension of the types of parametrization we talked about in Chapter 5. Bentley, like many other writers, has pointed out the importance of finding the right language for a programmer to express his or her requirements. This language should match as closely as possible the language used in the problem domain. If you view the programmer's job as being a process of mapping the user's language to a language understood by the machine, then clearly this job becomes easier the closer the two languages are to each other.

You may have gathered, correctly, that I am not overly excited about the HLLs (Higher-Level Languages) of today. They appear to me to occupy an uncomfortable middle ground between more elegant options at each end of the spectrum. At the low end: machine code - Assembler is fine; at the high end: reusable "black box" modules. Some jobs can really only be done in Assembler (although C is moving in on its turf) - the problem is portability. Black box modules solve the portability problem much more effectively than HLLs - you just port at the function level, rather than at the code level. Historically, HLLs evolved out of "small" languages like FORTRAN by the gradual accretion of features as people tried to extend the languages to new application areas. Most of them are on firmest ground when dealing with arithmetic because here they had traditional notations and experience to build on.

In my opinion, there are three major problem areas common to almost all HLLs:

the data types of variables,
constants, and
the basically synchronous nature of these languages.

We have already talked about variables in the context of the usual computer concept of storage. Data types we have talked about in the chapter on Descriptors. Now let's talk about constants.

Constants are usually not (constant, that is)! Constants are vastly overused, mostly because it is so easy to hard-code a constant into a program. To take an extreme position, I would like to see constants only used when they represent the structure of the universe, such as Planck's constant or π, or conversion rates between, say, kilometres and miles - they're not likely to change. A case can also be made for allowing metadata (data about data) although much of this can be avoided by judicious use of descriptors and generalized conversion routines. I also vote for zero and one - they're too useful (for things like clearing and incrementing counters) to give up! Also to one and zero let's add their Boolean cousins, "true" and "false". But that's the lot!

An ex-colleague of mine claims that he knew someone who wrote a program for a company which had exactly 365 employees. You guessed it - an employee left the company, and all the date calculations were off! While this story is probably apocryphal, there is an interesting point here: the only way it could have happened is if the program used the same constant for both numbers. Many shops ban literals in an attempt to reduce this kind of thing but, of course, that only helps if people are also encouraged to use meaningful symbol names. If, as many programmers do, the constant in question had been labelled F365, that would actually have made this kind of error more likely.

Literals would actually have prevented this kind of thing from happening, whereas calling a constant F365 works the other way! One of the constants should have been NUMBER_OF_DAYS_IN_YEAR (that's non-leap, of course), and the other one shouldn't have been a constant at all! Remember: most constants aren't!
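The distinction can be sketched in Python (the language used for all the illustrations added here; the names and the staff list are hypothetical). The number of days in a non-leap year is a genuine constant and gets a meaningful name, while the head count is derived from the data each time rather than being written into the code:

```python
# The one genuine constant here: days in a non-leap year.
NUMBER_OF_DAYS_IN_YEAR = 365

def daily_rate(annual_amount: float) -> float:
    """Spread an annual amount evenly over a non-leap year."""
    return annual_amount / NUMBER_OF_DAYS_IN_YEAR

def head_count(employees: list[str]) -> int:
    """Derive the number of employees from the data; never hard-code it."""
    return len(employees)

# Hypothetical staff list; in practice it would come from a file or data base.
staff = [f"EMP{n:03d}" for n in range(365)]
staff.remove("EMP042")  # one employee leaves

print(head_count(staff))      # 364
print(daily_rate(365_000.0))  # 1000.0
```

Had the head count shared a constant with the day count, the departure of one employee would have silently changed every date calculation; here the two cannot be confused.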

So far we have mostly talked about numeric values. There is another class of constants that one runs into quite often in programs: strings which identify entities or objects. Consider a test like

IF PROVINCE = 'ONT'.....

I would argue that this shouldn't be used, for two reasons. The first is that in HLLs we are forced to compare two character strings, whereas what we would like to ask is (in English):

If this province is Ontario,....

This may not look very different, but in the first case we are dealing with how affiliation with Ontario is encoded in a particular field; in the second, we are asking if the entity being referenced is the entity Ontario (with all its connotations). Smalltalk is better in this regard: PROVINCE would be an attribute of one object which contains the handle of another object (ontario) belonging to the class Province, and you can ask if two objects are the same object (==). Another programmer, or even a different field of the same object which needs to reference Ontario, might specify it using the value 2, so you would spend all sorts of machine resources converting back and forth between 2 and 'ONT'. Nan Shu has identified conversions between codes as one of the most common functions performed by business application code (Shu 1985).
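Python does not give us Smalltalk's objects out of the box, but a rough analogue can be sketched. The Province class, the lookup tables and Quebec's codes are hypothetical; only 'ONT' and the value 2 come from the text. Each external encoding is converted once into a reference to the single Ontario object, and the test then asks about identity, much as Smalltalk's == does:

```python
from dataclasses import dataclass

# eq=False keeps Python's default equality, which is object identity.
@dataclass(frozen=True, eq=False)
class Province:
    name: str
    abbreviation: str  # one program's encoding, e.g. 'ONT'
    number: int        # another program's encoding, e.g. 2

# Hypothetical entity objects; Quebec's codes are illustrative only.
ONTARIO = Province("Ontario", "ONT", 2)
QUEBEC = Province("Quebec", "QUE", 3)

# Each external encoding is translated once into a reference to the entity.
BY_ABBREVIATION = {p.abbreviation: p for p in (ONTARIO, QUEBEC)}
BY_NUMBER = {p.number: p for p in (ONTARIO, QUEBEC)}

def is_ontario(province: Province) -> bool:
    """Ask 'is this the entity Ontario?' rather than comparing strings."""
    return province is ONTARIO

print(is_ontario(BY_ABBREVIATION["ONT"]))  # True
print(is_ontario(BY_NUMBER[2]))            # True
```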

The second point is more subtle: even if you can refer to unique objects as a whole, rather than by an indirect encoding, should you? Consider the following sort of code, which we often find in business applications:

IF PROVINCE = 'NB' OR PROVINCE = 'PEI' OR PROVINCE = 'NS'...

If Canada adds a new province, how do you find all the lists like the above, and how do you decide if the new province should be added or not? What is the concept that makes this set of provinces different from all the others? This is an example of a very common problem with code which is probably responsible for significant maintenance costs in shops around the world. At this point you will probably realize that we haven't done anything yet in FBP to prevent people doing this. And as long as we are stuck with today's computers as the underlying engine, we probably can't, unless we ban constants entirely! What we can do is provide tools to help with this kind of situation (i.e. support the logic which people are trying to implement), and raise programmers' consciousnesses by means of walk-throughs, inspections, apprenticeship or buddy systems, or whatever.

The general concept which I would like to see known and used more widely is what IBM's Bucky Pope calls "class codes". He suggests you first ask what is the underlying concept behind the list; you then build a table or data base, and implement the concept as an attribute of the entities (in this case, provinces). So the above test becomes something like:

FIND PROVINCE IN PROVINCE_TABLE
IF MARITIME, .....

The overhead goes up a bit, but maintenance costs go down drastically, and since it is now generally accepted that human time is a lot more expensive than machine time (Kendall 1977), it seems short-sighted to keep on perpetuating this type of code, and incurring the resultant costs. By the way, this principle applies equally if only one province happens to be mentioned in a particular list: how do you know that a new province won't show up which shares attributes with the one you've picked?
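A minimal Python sketch of Pope's class-code idea, assuming a hypothetical province table held as CSV text (standing in for a file or data base) with a MARITIME column. The test no longer lists provinces; it looks the province up and asks about the attribute:

```python
import csv
import io

# Hypothetical province table; in practice a data file or data base, not code.
PROVINCE_TABLE_CSV = """code,name,maritime
NB,New Brunswick,Y
PEI,Prince Edward Island,Y
NS,Nova Scotia,Y
ONT,Ontario,N
NFLD,Newfoundland,N
"""

def load_province_table(text: str) -> dict[str, dict[str, str]]:
    """Index the table rows by province code."""
    return {row["code"]: row for row in csv.DictReader(io.StringIO(text))}

PROVINCE_TABLE = load_province_table(PROVINCE_TABLE_CSV)

def is_maritime(province_code: str) -> bool:
    # FIND PROVINCE IN PROVINCE_TABLE, then test the class-code attribute.
    return PROVINCE_TABLE[province_code]["maritime"] == "Y"

print(is_maritime("PEI"))  # True
print(is_maritime("ONT"))  # False
```

A new province then means one new row in the table, with its MARITIME value decided once by whoever maintains the table, rather than a hunt through every program for lists like the one above.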

If we are going to (almost) ban constants from code, put entity attributes and what have been called "variable constants" on disk, why not put logic on disk as well? This gets us into the domain of rules-driven systems. Remember we said above that FBP gets rid of a lot of the non-business logic, so most of the remaining logic in code should be either general-purpose, e.g. the logic to Collate two data streams, or it should be business-related, e.g. IF INCOME > $50,000, CALCULATE SURTAX USING FORMULA ..... Now, both the criterion and the formula are certainly going to change, and on a regular basis, so why put them in code which requires programmers to change it, after which the changed programs have to be recompiled, approved, promoted, the old ones archived, and so on and so on? It seems much better to put the whole thing somewhere where it can be maintained more easily, and perhaps even by non-programmers. I believe that much business logic, perhaps almost all, can be put on disk and interpreted using simple interpreters.
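To make this concrete, here is a small Python sketch of a rule kept as data and executed by a simple interpreter. The rule format, the 5% rate and the way the surtax is computed are all hypothetical (the text leaves the formula unspecified); the point is only that changing the threshold or the rate means editing the rule data, not recompiling a program:

```python
import json
import operator

# Hypothetical contents of a rule file kept on disk. The threshold, rate and
# formula are illustrative only; editing them needs no recompilation.
RULES_JSON = """
[
  {"if":   {"field": "income", "op": ">", "value": 50000},
   "then": {"set": "surtax", "rate": 0.05, "of": "income", "excess_over": 50000}}
]
"""

# The interpreter understands only these comparison operators.
OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt,
       "<=": operator.le, "==": operator.eq}

def apply_rules(record: dict, rules: list[dict]) -> dict:
    """Return a copy of `record` extended with the result of every rule that fires."""
    result = dict(record)
    for rule in rules:
        cond, action = rule["if"], rule["then"]
        if OPS[cond["op"]](record[cond["field"]], cond["value"]):
            excess = record[action["of"]] - action["excess_over"]
            result[action["set"]] = action["rate"] * excess
    return result

rules = json.loads(RULES_JSON)
print(apply_rules({"income": 60000}, rules))  # rule fires: surtax is 5% of 10,000
print(apply_rules({"income": 40000}, rules))  # {'income': 40000}
```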

One important question remains: whether to put our attribute tables, rules, etc., in load modules (created with a compiler and linkage editor) or in data files. The former is really a half-way house, as you still need programmers to maintain them, but at least the information is separate from code, so it can be shared, and is much easier to manage and control than when it is buried in many different code modules. However, I believe putting this kind of information into data bases is a still better solution, as it can be updated by non-programmers, e.g. clerical staff, and it can have separate and specific authorization.

In rules-driven systems, straight sequential logic may not be the best way to express the rules. PROLOG and its derivatives provide a very interesting approach to expressing logic. It should be stressed that logic programming is not incompatible with FBP - I once wrote an experimental component which drove a PROLOG set of rules to perform some logic tests on incoming IPs. The effect was like having a friendly data base, because you could ask questions like "list all the mothers older than 50" even when you had not stored the attribute "mother" explicitly in the data base (just tell PROLOG that a "mother" is anyone who is female and has offspring). The logic programming people have independently explored the possibilities of combining logic programming with parallelism, e.g. in languages such as PARLOG.
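Python has no built-in logic programming, so the following is only a plain-Python imitation of the "mother" example, not PROLOG itself; the people and facts are made up. The stored facts hold sex, age and parent/child links, and "mother" is derived by a rule, shown in PROLOG form in the comment:

```python
# Hypothetical stored facts: no "mother" attribute anywhere.
people = {
    "ann": {"sex": "F", "age": 62},
    "bob": {"sex": "M", "age": 64},
    "cat": {"sex": "F", "age": 35},
    "dan": {"sex": "M", "age": 8},
    "eve": {"sex": "F", "age": 55},
}
parent_of = {("ann", "cat"), ("bob", "cat"), ("cat", "dan")}  # (parent, child)

def is_mother(person: str) -> bool:
    # Derived rule, as one might tell PROLOG:
    #   mother(X) :- female(X), parent(X, _).
    return people[person]["sex"] == "F" and any(p == person for p, _ in parent_of)

mothers_over_50 = sorted(p for p in people if is_mother(p) and people[p]["age"] > 50)
print(mothers_over_50)  # ['ann']
```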

Now, if we can put our rules out on disk, avoiding such perilous traps as variables and constants, and restrict the code proper to non-business-related logic, then it seems that there really isn't much role for today's HLLs!

So far, we have talked about the basics of most programming languages as being arithmetic and logic, but there exist languages today which address quite different domains. Over the years, we have seen a number of other kinds of specialized languages, such as SNOBOL (pattern-matching), or IPL V (list processing) appear and sometimes disappear. Since in conventional programming it is hard to make languages talk to each other, it is generally the richer ones which have survived. FBP, on the other hand, makes it much easier for languages to communicate, which suggests that what we may see is a larger number of more specialized languages communicating by means of data streams. Based on our experience with human languages, what would be very nice is if they could all share the same syntax, but work with different classes of objects (semantics). It has been found that humans have a lot of difficulty switching from one syntax to another, whereas we can hold many sets of words independently without getting them confused. This is supported by recent work with bi- and multi-lingual communities - people tend to use one syntax for both languages, or a hybrid, but they can keep the vocabularies quite well separated.
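As a rough sketch of processors that each know only their own small vocabulary and communicate by data streams, here are three Python generator functions chained so that each one's output is the next one's input. The record layout and function names are hypothetical, and Python generators run as coroutines pulled by the consumer, so this only approximates FBP's independently scheduled processes:

```python
from typing import Iterable, Iterator

def read_records(lines: Iterable[str]) -> Iterator[dict]:
    """Component 1: turn 'province,amount' text lines into records."""
    for line in lines:
        province, amount = line.split(",")
        yield {"province": province, "amount": int(amount)}

def select_province(records: Iterable[dict], province: str) -> Iterator[dict]:
    """Component 2: pass through only the records for one province."""
    return (r for r in records if r["province"] == province)

def sum_amounts(records: Iterable[dict]) -> int:
    """Component 3: total the amounts arriving on its input stream."""
    return sum(r["amount"] for r in records)

# Each component's output stream is the next component's input.
lines = ["ONT,10", "NS,5", "ONT,7"]
print(sum_amounts(select_province(read_records(lines), "ONT")))  # 17
```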

A colleague of mine makes the interesting point about the above scenario that it ties in with the new science of Chaos. If you consider the total application space as chaotic, then you will get areas of order and areas of disorder (Olson 1993). Each mini-language can handle its own area of order, and can communicate using standard interfaces (IPs) with other areas of order. Each mini-language defines its own paradigm - while no paradigm should be expected to do the whole job, judicious combination of many paradigms is often highly effective.

I would argue that, any time a set of parameters reaches a certain level of complexity, you are approaching a mini-language. The parameters to the IBM Sort utility almost constitute a language about sorting, and in fact IBM has added a free-form, HLL-like syntax to the older-style pointer list which it used before. Since decoding the sort parameters is relatively fast compared with the sort itself, it is reasonable to make the sort parameters as human-convenient as possible. The semantics of a set of sort control statements are quite simple, just referring to objects of interest to the Sort (and its user).
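A toy version of this can be sketched in Python. The free-form syntax below ("SORT BY field ASC|DESC, ...") is hypothetical and is not IBM Sort's actual control-statement format; it simply shows how little interpretation such a statement needs once its vocabulary is restricted to fields and sort orders:

```python
from operator import itemgetter

def parse_sort_spec(spec: str) -> list[tuple[str, bool]]:
    """Parse e.g. 'SORT BY province ASC, amount DESC' into (field, descending) pairs."""
    keyword, by, rest = spec.split(maxsplit=2)
    if (keyword.upper(), by.upper()) != ("SORT", "BY"):
        raise ValueError(f"not a sort statement: {spec!r}")
    keys = []
    for clause in rest.split(","):
        field, *order = clause.split()
        direction = order[0].upper() if order else "ASC"  # ascending by default
        if direction not in ("ASC", "DESC"):
            raise ValueError(f"unknown sort order: {direction!r}")
        keys.append((field, direction == "DESC"))
    return keys

def run_sort(records: list[dict], spec: str) -> list[dict]:
    """Apply the keys least significant first; Python's sort is stable."""
    result = list(records)
    for field, descending in reversed(parse_sort_spec(spec)):
        result.sort(key=itemgetter(field), reverse=descending)
    return result

data = [{"province": "ONT", "amount": 5},
        {"province": "NS", "amount": 9},
        {"province": "ONT", "amount": 8}]
print(run_sort(data, "SORT BY province ASC, amount DESC"))  # NS 9, ONT 8, ONT 5
```

The multi-key sort relies on Python's sort being stable: sorting by the least significant key first and the most significant key last yields the combined ordering.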

Earlier in this book I mentioned a prototyping system which I built, in which you could describe screen layouts in WYSIWYG form. For fun I added a graphics subsystem to it, which let you specify simple pictures. The problem was: what mental model would its users find convenient? I decided to use the idea of colour-filled polygons, which had been used successfully by the Canadian videotext system, Telidon. I later added the ability to have curved edges, as well as straight ones, plus various fill patterns, such as cross-hatching. While this choice of mental model may seem fairly obvious, I could have used bit maps, straight lines, or some other primitive, but I found the polygon idea seemed to be an especially good bridge between the artist and the computer. Not only was it a good match with the software I was using (GDDM), but also I was pleased to discover how fast I could develop a drawing or modify an existing one. The artist I was working with found that he was able to adapt to this medium, and I was impressed with the quality of what we turned out, working together! I use this perhaps rather simple example to make the point that it is the mental model which is important, and which makes the most difference in how usable people find a tool. Humans are visualizing creatures, and, if they have trouble developing a mental model of your tool or system, it will never become real to them, and they will have endless trouble with it.

If you have spent the last few decades labouring with a conventional HLL, by now you may be wondering how one can do anything without using variables or constants! Well, quite a bit, as it turns out. In a later chapter (Streams and Recursive Functions), we will talk about a style of programming which is called generically applicative or functional. When you combine the idea of functions with recursive definition, it turns out that you can express quite complex computations without ever using a variable (Burge 1975)! As for constants, as long as we put most of them on disk, I'll be quite happy! Of course, I am not suggesting that tax specialists have to learn recursive programming in order to be able to describe tax calculations to the computer, but, based on various experiences, I have a strong intuition that, by judiciously combining a number of the ideas described so far in this book and some that are about to be described, we could develop user-friendly languages which would be decades ahead of the rather user-hostile tools we are forced to use today.
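As a small foretaste, here is a Python sketch of computing by recursion instead of assignment: neither function updates a variable or uses a loop counter. Python is not a functional language (it has no tail-call elimination and limits recursion depth), so this only illustrates the style; it is not a recommended way to process long sequences in Python:

```python
def recursive_sum(amounts: tuple[float, ...]) -> float:
    """Sum by recursion: no loop counter and no running-total variable."""
    return 0.0 if not amounts else amounts[0] + recursive_sum(amounts[1:])

def recursive_map(f, items: tuple) -> tuple:
    """Build the output purely from the input by applying f to each item."""
    return () if not items else (f(items[0]),) + recursive_map(f, items[1:])

print(recursive_sum((100.0, 250.0, 50.0)))        # 400.0
print(recursive_map(lambda x: x * 2, (1, 2, 3)))  # (2, 4, 6)
```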

In the next chapter I am going to describe some work I did on a suggested approach to a mini-language for describing business processes running in an FBP environment, taking advantage of the fact that there are other precoded FBP components (such as Collate) to do a lot of the hard work. It is applicative, in that it doesn't use variables but defines its outputs purely in terms of operations on its inputs. This is not supposed to be a definition of a complete language, but more of a sketch of how a language might look which breaks with many time-honoured but rather shop-worn traditional solutions.
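To suggest the flavour, here is a Python sketch built on a simplified stand-in for Collate (not the actual FBP component): it merges two streams already sorted on a key and tags each record with its source. The per-account totals are then defined purely as operations on the two inputs, with no variables being updated; the record layouts and names are hypothetical:

```python
import heapq
from itertools import groupby
from typing import Iterable, Iterator

def collate(masters: Iterable[dict], details: Iterable[dict], key: str) -> Iterator[dict]:
    """Merge two streams already sorted on `key` into one sorted stream.
    Each record is tagged with its source; on equal keys, masters come out
    before details (as with sorted() over the concatenated inputs)."""
    tagged_masters = ({**r, "source": "master"} for r in masters)
    tagged_details = ({**r, "source": "detail"} for r in details)
    return heapq.merge(tagged_masters, tagged_details, key=lambda r: r[key])

def totals_by_account(masters: Iterable[dict], details: Iterable[dict]) -> list[tuple[int, int]]:
    """Output defined purely in terms of the inputs: collate, group, sum."""
    return [
        (acct, sum(r["amount"] for r in group if r["source"] == "detail"))
        for acct, group in groupby(collate(masters, details, "acct"), key=lambda r: r["acct"])
    ]

masters = [{"acct": 1}, {"acct": 2}, {"acct": 3}]
details = [{"acct": 1, "amount": 10}, {"acct": 1, "amount": 5}, {"acct": 3, "amount": 7}]
print(totals_by_account(masters, details))  # [(1, 15), (2, 0), (3, 7)]
```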