FBP vs. FBP-inspired Systems
While a lot of credit is due to the NoFlo team for bringing FBP to the attention of the computer world, what the developers of NoFlo call "FBP" in fact differs in a number of respects from FBP as it has evolved over the last 40+ years. While NoFlo shares a number of technical and philosophical ideas with FBP, it is much more similar to what we now call "conventional" programming - procedural, algorithmic, one-thing-at-a-time - and does not truly embody the "FBP paradigm shift", in which application development can be thought of as designing a data processing "factory". The latter is a very different way of looking at application development.
Ali Razeen, at Duke University, has pointed out, in a very insightful 2015 note, that a number of people have now built software which has only the componentry and "configurable modularity" features of FBP, usually in combination with some visual representation, and assume they have built an FBP implementation. He then goes on to say that these should not be viewed as true FBP implementations, as they are missing some key characteristics of true FBP - mainly, asynchronism and information packets with unique ownership and lifetime - and so typically miss out on the critical paradigm shift... and a number of its attendant benefits. NoFlo is an example of this type of system. Because of the proliferation of such packages, we will use the term "FBP-inspired" (as suggested by Joe Witt of HortonWorks) when it is necessary to distinguish between them and FBP proper. You may also see the phrase "classical FBP" showing up from time to time, particularly in discussions with proponents of FBP-inspired systems.
We will be using the term "von Neumann paradigm" from time to time. For those unfamiliar with the term, it refers to a computer design where a single instruction counter walks through a program accessing a uniform array of non-destructive-readout memory cells. This has in fact been the standard computer architecture for several decades, but people are increasingly finding it inadequate for today's challenges, as shown by frequent cost and schedule overruns, weird bugs, and difficulty maintaining large applications. More and more writers have started to point out that these problems derive in large part from the architecture itself. Unfortunately programmers are exposed to this approach from the very start, and have a great deal of difficulty breaking loose from it! Ken Kan has pointed out this quote from Edsger Dijkstra (thanks, Ken!):
It is practically impossible to teach good programming to students that have had a prior exposure to BASIC: as potential programmers they are mentally mutilated beyond hope of regeneration.
With all due respect to Dijkstra, it's not just BASIC! I have frequently detected a certain degree of nervousness on the part of many programmers encountering FBP for the first time, at not being able to control the exact timing of every event in a running application! This is in part due to the very sensitive nature of the von Neumann storage model, and the fact that it confuses data with its storage medium.
I have been wrestling with how best to convey the difference between the old "von Neumann" storage mental model and that of Flow-Based Programming, and I am starting to think that the description in Chap. 3 of the book, "Flow-Based Programming", says it best. Since it is a little long for this essay, I would ask the reader to click on Chap. 3 - Concepts online, or look it up in their copy of the book - do a find on Fig. 3.1, and continue from there.
We sometimes refer to FBP as a "new/old" paradigm, because in fact its approach and methodology have parallels with Unit Record systems, which were used for the first data processing applications and were highly asynchronous and component-oriented. When these applications started being replaced by computers, which seemed so much more powerful, a lot of useful concepts were lost... which FBP is now reintroducing.
An application built using FBP may be thought of as a "data processing factory": a network of independent "machines", communicating by means of conveyor belts, across which travel structured chunks of data, which are modified by successive "machines" until they are output to files or discarded. The various "machines" run in parallel, or interleaved, as determined by the number of processors in the machine. It should be pointed out that this same image can be applied to networks of computers or other devices - Wayne Stevens pointed out that FBP provides a "consistent application view" from "maxi" to "mini". Granted each FBP process is a von Neumann program, but it runs independently of all other processes, and so tends to be quite simple internally. Almost all of the data that an FBP process deals with is held in "information packets" (IPs) or in method local storage. Unlike in conventional programming, the programmer does not have to worry about controlling the exact sequence of events - all s/he needs to concentrate on is the transformations that apply to the data to convert the original inputs to the desired output.
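To make the "factory" image a little more concrete, here is a minimal Python sketch, assuming that threads stand in for FBP processes and bounded queues stand in for the conveyor belts. It is not taken from JavaFBP or any other FBP implementation; the names END, generate, double, collect and run_network are invented for this illustration.

```python
import queue
import threading

END = object()  # hypothetical end-of-stream marker, not part of any FBP API


def generate(out_port, count):
    """Source 'machine': creates packets and places them on its belt."""
    for n in range(count):
        out_port.put({"id": n, "value": n})
    out_port.put(END)


def double(in_port, out_port):
    """Transform 'machine': modifies each packet and passes it along."""
    while (packet := in_port.get()) is not END:
        packet["value"] *= 2
        out_port.put(packet)
    out_port.put(END)


def collect(in_port, results):
    """Sink 'machine': takes packets off the belt and records their values."""
    while (packet := in_port.get()) is not END:
        results.append(packet["value"])


def run_network(count=5, capacity=2):
    """Wire three independent 'machines' together and run them to completion."""
    belt_a = queue.Queue(maxsize=capacity)  # bounded connection
    belt_b = queue.Queue(maxsize=capacity)
    results = []
    machines = [
        threading.Thread(target=generate, args=(belt_a, count)),
        threading.Thread(target=double, args=(belt_a, belt_b)),
        threading.Thread(target=collect, args=(belt_b, results)),
    ]
    for machine in machines:
        machine.start()
    for machine in machines:
        machine.join()
    return results
```

Note that none of the three functions knows anything about the others or about the order in which they are scheduled; each one only describes what happens to the data passing through it.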
More importantly, the ways data is viewed in FBP vs. conventional programming (as well as many FBP-inspired systems) are completely different: in FBP, data is managed in packets (IPs), which have a well-defined lifetime, from creation to destruction, and can only be owned by one process at a time, or be in transit between processes - just like real-life objects. In conventional programming, data does not have a well-defined lifetime or clear ownership, as the data is confused with its storage medium. This, in combination with the single-threaded restriction, leads to many of the weird bugs that bedevil today's complex systems, as it is so sensitive to the exact timing of events that a minor timing error can have catastrophic results!
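The ownership rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: real FBP implementations track ownership inside the scheduler, and the Packet class, its method names and OwnershipError are all hypothetical.

```python
class OwnershipError(Exception):
    """Raised when a process touches a packet it does not own."""


class Packet:
    """A packet with a single owner and a lifetime from creation to drop."""

    def __init__(self, payload, owner):
        self.payload = payload
        self._owner = owner   # exactly one owner, or None while in transit
        self._alive = True

    def _check(self, who):
        if not self._alive:
            raise OwnershipError("packet has already been dropped")
        if self._owner != who:
            raise OwnershipError(f"{who} does not own this packet")

    def send(self, sender):
        """The sender gives up ownership; the packet is now in transit."""
        self._check(sender)
        self._owner = None

    def receive(self, receiver):
        """A packet in transit acquires its new, single owner."""
        if not self._alive or self._owner is not None:
            raise OwnershipError("packet is not in transit")
        self._owner = receiver

    def drop(self, owner):
        """Only the current owner may end the packet's lifetime."""
        self._check(owner)
        self._alive = False
        self._owner = None
```

With this model, a process that has sent a packet downstream can no longer modify, resend or drop it, which is exactly what we expect of a real-life object that has left our hands.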
FBP supports data processing applications (business or scientific), typically long-running and high volume, and, as we have shown, involves a way of thinking (the new "paradigm") that is fundamentally different from that of conventional programming. This paradigm is actually more similar to engineering than to conventional programming, and, not surprisingly, involves a period of what might be called "apprenticeship", during which the practitioner is getting comfortable using its concepts. Conventional programming, by comparison, is as if you gave an engineer a bunch of blueprints and some girders, and told him or her to go build a bridge! It's not surprising that so many systems built using conventional technologies in recent years have suffered from cost overruns, logic glitches, etc., etc., and the problem is getting worse!
While data-oriented models have been used for application design for a number of years, up until now there was no easy way of converting these designs into running programs. Programmers could indeed design systems using data-oriented thinking, but then had to laboriously convert these designs into procedural code. In comparison, FBP provides a seamless transition from design to implementation, and our experience with it shows that it results in more maintainable and in fact better performing systems. It also facilitates communication between designers, programmers, maintenance staff and users. One large program written using an early ("green thread") implementation of FBP had been running in production for almost 40 years (as of the beginning of 2014), processing millions of transactions a night, while undergoing continuous maintenance during all that time, often by people who weren't even born when it was written!
While an FBP process is a "black box" component with its own internal environment and control thread, a NoFlo process is essentially a cloud of callbacks linked by instance variables. By comparison, the FBP mental model is much simpler - indeed, very similar to that of conventional programming - where basically each process has a single high-level method, which can then call subroutines in the regular way, since each process has its own independent call stack. There is then no confusion between the method's local storage and the process object's instance variables. Henri Bergius was able to simulate many FBP-inspired characteristics on the Node.js infrastructure, but some rather basic, and necessary, FBP techniques have no obvious counterpart in NoFlo. For instance, basic FBP business functions such as "Collate" require a process to be specific about which port it wants to receive from, and to be able to suspend until data arrives at that port - this function, or something similar, is being introduced gradually into NoFlo, but it logically requires a related architectural concept, missing from NoFlo, called "back pressure", where an upstream process will be suspended if the connection it feeds into becomes full. One other strange (from an FBP point of view) restriction is that, in the NoFlo world, a process can only send or receive from the highest level method - this becomes obvious when you think about how callbacks manipulate the stack.
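As a rough illustration of why a function like "Collate" needs selective receive, here is a Python sketch of a two-input merge by key. It assumes threads and bounded queues as stand-ins for FBP processes and connections; it is not the Collate component from any FBP implementation, and END, feed, collate and run_collate are names made up for this example.

```python
import queue
import threading

END = object()  # hypothetical end-of-stream marker


def feed(port, items):
    """Upstream process: suspended by put() whenever the connection is full."""
    for item in items:
        port.put(item)
    port.put(END)


def collate(in0, in1, out):
    """Merge two key-ordered streams, choosing which port to receive from."""
    a = in0.get()  # suspend until data arrives at port 0
    b = in1.get()  # suspend until data arrives at port 1
    while a is not END or b is not END:
        if b is END or (a is not END and a[0] <= b[0]):
            out.append(a)
            a = in0.get()  # explicitly receive the next packet from port 0
        else:
            out.append(b)
            b = in1.get()  # explicitly receive the next packet from port 1


def run_collate(masters, details, capacity=1):
    in0 = queue.Queue(maxsize=capacity)
    in1 = queue.Queue(maxsize=capacity)
    merged = []
    feeders = [
        threading.Thread(target=feed, args=(in0, masters)),
        threading.Thread(target=feed, args=(in1, details)),
    ]
    for f in feeders:
        f.start()
    collate(in0, in1, merged)
    for f in feeders:
        f.join()
    return merged
```

The merge logic is an ordinary loop with its own local variables, and it decides at each step which port to wait on. In a purely callback-driven model the component is instead handed data from whichever port happens to fire, and has to buffer it until the other side catches up.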
In a major divergence from classical FBP, mentioned above, NoFlo lacks the concept of information packet (IP) "lifetimes", by which an IP is tracked from creation to destruction and can only be "owned" by a single process at a time, or be in transit between processes. This is mainly because its developers are still stuck in the von Neumann concept of data as a set of anonymous pigeon-holes, which confuses the data as "object" with the "location" of the data. This in fact is the reason so many subtle bugs show up in conventional programs. This also explains NoFlo's insistence on allowing one output port to connect to multiple input ports, implying automatic replication of data. If data is seen as an "object", this makes very little sense, just like being able to send one pigeon to multiple destinations at the same time! Or having a single soft-drink bottle pass through two different machines at the same time! Conversely, if your view of data is not as an object, you will see nothing wrong with this image. Here is a description from Henri Bergius on how the basic send/receive linkage works in NoFlo:
and with regard to "back pressure":
Right now the NoFlo buffers are only limited by system memory.
Adding limits and backpressure is certainly something to consider down the line. Hasn't really been a consideration for things NoFlo is usually used for, though.
A consequence of this is that NoFlo requires that all data be processed by one node before being passed on to the next. This becomes prohibitive if we are dealing with large volumes of data packets. Back pressure is the only way I am aware of that allows "infinite" amounts of data to be processed using finite resources! The NoFlo team tells me that they have been making changes to NoFlo to bring it closer to FBP, so we shall see what the future brings.
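The effect of back pressure can be demonstrated with a small Python sketch, assuming a bounded queue.Queue as the connection: put() suspends the upstream thread whenever the connection is full, so the number of packets held in memory never exceeds the connection capacity, however long the stream is. The function names and the stats dictionary are hypothetical, chosen only for this illustration.

```python
import queue
import threading

END = object()  # hypothetical end-of-stream marker


def producer(conn, total, stats):
    """Upstream process: put() blocks while the connection is full."""
    for n in range(total):
        conn.put(n)
        # Record the largest number of packets seen waiting in the connection.
        stats["peak"] = max(stats["peak"], conn.qsize())
    conn.put(END)


def consumer(conn, stats):
    """Downstream process: handles one packet at a time."""
    while (item := conn.get()) is not END:
        stats["count"] += 1
        stats["sum"] += item


def run_pipeline(total=10_000, capacity=10):
    conn = queue.Queue(maxsize=capacity)  # bounded connection
    stats = {"count": 0, "sum": 0, "peak": 0}
    threads = [
        threading.Thread(target=producer, args=(conn, total, stats)),
        threading.Thread(target=consumer, args=(conn, stats)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return stats
```

Raising total makes the run longer but does not change the peak number of packets in flight, which stays at or below capacity. With an unbounded buffer, a fast producer could instead queue up the entire stream in memory before the consumer had processed any of it.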
It is too easy to just make FBP work for JS, but what we really want to do is make JS work for FBP!
Because FBP is not trying to cram all sorts of logic into a single thread, it actually has a fairly simple set of scheduling rules, and FBP components have a fairly simple internal structure. In general, processes in FBP-inspired systems cannot decide which input port to receive data from (and possibly be suspended). I therefore thought I would compare one commonly used component in classical FBP against the same function written in NoFlo. The result is in "Concat" Component. Here is my conclusion, at the end of this article:
I may be biassed, but I definitely prefer the FBP version...
For those wishing to gain experience with FBP, there is no substitute for reading the book (Flow-Based Programming, 2nd edition), and then starting to use one of the FBP implementations such as JavaFBP, C#FBP or JSFBP, or even the C++/Boost implementation currently under development, as described on the FBP web site. JavaFBP has the advantage of being closely integrated with a powerful diagramming tool, called DrawFBP, although DrawFBP can support any data flow language - and indeed can support high-level, language-independent design as well.
For the time being, users wishing to work with FBP can code up networks by hand using JavaFBP, C#FBP, CppFBP or JSFBP. Alternatively, they can use the DrawFBP drawing tool, written using Java Swing, which is also quite general, and can in fact generate networks for JavaFBP and C#FBP, as well as the .fbp notation used by NoFlo and CppFBP, plus NoFlo JSON networks. If JavaFBP is chosen, DrawFBP can load any chosen components, display their descriptions and ports, and even check whether all required ports are connected.
While DrawFBP does not support run-time network execution, the networks it generates are complete programs. Its diagrams are stored in XML format, and additional generators can be added easily, or users can build their own generators using the XML format as input. DrawFBP also has the capability of carving out a piece of a network and converting it into a subnet.
FBP and OO
For a discussion of the differences and similarities between FBP and OO, see Comparison between FBP and Object-Oriented Programming (Chapter 25 of the 2nd edition).