I have started another page called VorlathDiscussion --pm
For an explanation of the following by JohnCowan in terms of the 4-colour theorem, do a find on "Cowan."
I thought I'd put my ideas here for anyone interested. If anyone wants my real info, leave a request with contact info. I'm also on MSN.
I am rewriting this right now. Feel free to leave comments. I will be checking this page once more before the final change.
These are the same thing. This might be a shocking statement, but there is no reason why one can't naturally evolve into the other.
At the lowest level, in machine code, instructions read in values, perform an operation on them and then spit out a result. If you'll notice, this involves multiple steps even at this low level. The first step involves reading in the instruction itself. The second involves reading in the values. The third involves executing the operation itself, such as addition. The last step involves outputting the result. Instead of waiting for all four steps to complete before executing the next machine instruction, we can start to execute the next one as soon as the first step is completed for the previous instruction. Different architectures can use different numbers of steps. But now, we can theoretically execute as many instructions at once as there are steps. The only drawback is that there cannot be any dependencies between machine instructions that are inside the pipeline. If an instruction needs the result from an earlier operation, then it will have to wait until that computation is complete. This is called a stall.
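To make the cycle counts concrete, here is a toy Python model of a four-stage pipeline. The stage count and the stall formula are illustrative assumptions, not a model of any real CPU:

```python
# A toy model of a 4-stage pipeline: fetch, read, execute, write-back.
# Stage count and formulas are illustrative assumptions only.

STAGES = 4

def cycles_sequential(n_instructions):
    """Each instruction runs all four stages before the next one starts."""
    return n_instructions * STAGES

def cycles_pipelined(n_instructions, stalls=0):
    """A new instruction enters the pipeline every cycle; the first one
    takes STAGES cycles to drain, and each dependency stall adds a cycle."""
    if n_instructions == 0:
        return 0
    return (STAGES - 1) + n_instructions + stalls

print(cycles_sequential(10))           # 40 cycles without pipelining
print(cycles_pipelined(10))            # 13 cycles with a full pipeline
print(cycles_pipelined(10, stalls=2))  # 15 cycles if two results are needed early
```

The gap between 40 and 13 cycles is the parallelism the pipeline buys; the stall parameter shows how dependencies eat into it.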
See this page for an alternate explanation on pipelining: http://cse.stanford.edu/class/sophomore-college/projects-00/risc/pipelining/index.html
So sequential programs are in fact executed in a somewhat parallel fashion. And if you look closely at how machine code works, by reading in values, operating on them and outputting a result, this is awfully similar to a Flow Based Component, is it not? Yet, in FBP we have elementary components that cannot be broken down into smaller components. My question is: why not? I'm not arguing that we shouldn't program elementary components. What I'm wondering is why they can't be broken down any further. If a component can't be broken down, then it would necessarily be written in a sequential fashion and share the same problems as regular sequential code. Maybe not on a larger scale, but it would have the same problems on a component scale.
Another thing is that pipelining doesn't require synchronisation between stages. The only synchronisation needed is to be notified when all stages have completed their work, so that each section can move forward to the next stage. In FBP, since each component is executed asynchronously, the stream of data or the connection between the components must be synchronised. You cannot read and write to the same stream without risk of corruption.
So there are two issues at hand here. One is the sequential nature of components. There is a gap between the parallel nature of sequential machine code and the parallel nature of components. The number of stages in the processor's pipeline limits how many instructions can run in parallel; the remaining instructions in the component will have to wait their turn. I believe this to be a discrepancy that need not be there. The second issue is the synchronisation problem. The amount of synchronisation for streams can also be removed or reduced.
I believe that any large-scale system works best when it is self-symmetric. Look at the Universe: it has symmetry everywhere you look. If this paradigm works for the Universe, then is this not the holy grail of paradigms? I realise that I'm biased towards hardware, but no one can deny that this is what our software runs on. It seems to me that creating a self-symmetric system based on this hardware would be the way to go. I posit that FBP components betray that notion somewhat in their current incarnation. There is a fundamental gap between parallel instructions and parallel components. This gap creates a non-symmetric system. I believe that FBP can be improved drastically. My point is that there is an opportunity here to make FBP superior to anything out there, not by a mile, but by light-years. Even more than it is now.
Let's consider three components chained together:
IN -> A -> B -> C -> OUT
We cannot execute all of these in parallel without synchronisation. The buffers between components cannot be written and read at the same time without risk of corruption. Here is perhaps a better diagram showing that I/O (input and output streams) needs to be sync'd between components.
For a true pipeline, it should be possible to execute all three components at once without any synchronisation at all. We could split the input and output streams into two separate and independent sections. In this fashion, you can safely access the streams without synchronisation. When all the processes are done, what was put in the output end will now show up on the input end of the next component. And now you can repeat the process. Figuring out when all the processes are done is not complicated. Even if one value is synchronised for this purpose, it is only done once per cycle and completely removes the need to synchronise the input and output streams. Here is a diagram showing both cycles of this pipeline technique.
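A minimal sketch of this double-buffering idea in Python. The `Connection` class and the single-component wiring are my own illustrative names; the point is only that components touch different halves of each stream during a cycle, and the one synchronised step is the swap:

```python
from collections import deque

# Double-buffered connection: each component reads from the "front" half and
# writes to the "back" half, so no two components ever touch the same deque
# during a cycle. When every component has finished, the halves are swapped.

class Connection:
    def __init__(self):
        self.front = deque()  # read side for this cycle
        self.back = deque()   # write side for this cycle

    def swap(self):
        self.front, self.back = self.back, self.front

def run_cycle(components, connections):
    for run in components:    # these could run in parallel: no shared buffer
        run()
    for c in connections:     # the only synchronised step: one swap per cycle
        c.swap()

# Wire a single transforming component between two connections.
src, dst = Connection(), Connection()

def double():
    while src.front:
        dst.back.append(src.front.popleft() * 2)

src.back.extend([1, 2, 3])
src.swap()                    # make the seed data readable
run_cycle([double], [src, dst])
print(list(dst.front))        # [2, 4, 6]
```

In a real runtime, `run_cycle` would dispatch the components to separate processing units; none of them needs a lock on its streams.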
But we can do better. In the above scheme, we need to update the streams every time after all components are done. This is extra processing that we don't need to do. If we only execute every second component in parallel, then we no longer need the swapping. Since no two components can ever access the same stream at the same time, we just alternate the processing cycle without any need for stream synchronisation. First A and C would execute. Once they are done, B would execute. This cycle would go back and forth until there's nothing left to process.
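The alternating scheme can be sketched concretely for the IN -> A -> B -> C -> OUT chain above. The component bodies here are arbitrary stand-ins; what matters is that on even ticks only A and C run, and on odd ticks only B, so no stream ever has a reader and a writer active at the same time:

```python
from collections import deque

# Alternating ("crankshaft") schedule: streams are plain deques with no
# locking, because the schedule guarantees exclusive access.

in_s, ab, bc, out_s = deque([1, 2, 3]), deque(), deque(), deque()

def A():  # doubles each value
    while in_s:
        ab.append(in_s.popleft() * 2)

def B():  # adds one
    while ab:
        bc.append(ab.popleft() + 1)

def C():  # stringifies
    while bc:
        out_s.append(str(bc.popleft()))

phases = [[A, C], [B]]       # even tick, odd tick
for tick in range(4):        # enough ticks to drain three items
    for component in phases[tick % 2]:
        component()          # components in one phase could run in parallel

print(list(out_s))           # ['3', '5', '7']
```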
In a perfect world, there would never be asynchronous components. The only thing that can cause asynchronous events is hardware. It's a rather unfortunate turn of history that there are so many blocking OS calls. Files are the worst. But there is never a need to block, especially when using FBP. Asynchronous events are handled by interrupts. The interrupt will restart a "source" component once it has inserted its data in a queue. A normal component needs to disable interrupts to grab the items produced by the hardware and then reset the list. It will keep the items in an internal list, ready to output its data once its controller activates it. This means that the technique described in the previous section can continue to work even with asynchronous events.
Just like a V8 engine, only every second piston/component would be active at any time. That's why I've called this crankshaft threading. It requires half as many processing units as there are components and eliminates synchronisation of streams. Unfortunately for us, OS's aren't designed to support this. They hide hardware events in a procedural manner. You can have callbacks or asynchronous messages sent, but you still need to work around the way the OS handles things.
So we've seen how we can pipeline components almost exactly like instructions are pipelined on the processor. This is fine, but we're only adapting to current technology. What about the future? What if something else comes out that is much more parallel than this, but perhaps on a much finer scale? Atomic and organic computers may be real one day. In the future, computers may mold themselves into a configuration dictated by our software. Each atom or molecule would be able to execute only the simplest of operations. But because there can be millions of molecules all operating at once, even FBP will waste much computing power if groups of molecules have to be configured as a unit to fit the description of an entire component. Because a component cannot be reduced any further, a whole section of normally parallel molecules will have to be executed sequentially, wasting excessive amounts of processing power.
This is the gap I was talking about earlier. I think there is a solution to this.
Object Oriented Programming concentrates on encapsulating functionality and data so that we don't have to worry about how something is done, just as long as it does get done. This is flawed logic, as I'm sure I don't have to explain to anyone supporting FBP. But there's more to it than that. This isn't about what's good or bad. This is about how it can be made better. Code and data are different. Trying to hide one and expose the other seems near-sighted to me. We should embrace this duality and use the qualities of both to their fullest potential. That's why I believe that hiding data only serves to diminish what we can do with computers. By contrast, FBP embodies this ideal of duality.
If we use these properties not only for components, but for all aspects of programming, then every single line of code can be made parallel to some extent. That is the key. Inside components, there can be variables, simple statements that can be compiled directly to machine code, and function calls. I believe the reason there has been no unification of paradigms is because most programmers label things as in the previous sentence. This is wrong. There is only data and code, nothing else, no matter the label. If we treat functions as taking input through their parameters and we allow multiple return values, then would this not be similar to a regular component? And do not regular statements consume variables and produce a result in a similar fashion? So you would have symmetric entities in decreasing order of magnitude: compound components, components, functions, statements, instructions. They all work the same way. This is the code aspect of programming. What links them all together is data. In a way, you can view things from both points of reference. Perhaps code is what binds data together.
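The function-as-component idea can be shown in a few lines of Python (multiple return values come for free as tuples). The function names are made up for illustration; the point is that inputs arrive only as parameters, results leave only as return values, and nothing shared is touched, so any two calls are free to run in parallel:

```python
# Treating ordinary functions like components: inputs in through parameters,
# multiple outputs out through the return value, no side effects.

def divmod_component(a, b):
    """Two inputs in, two outputs out."""
    return a // b, a % b

def stats_component(values):
    """One stream in, two outputs out."""
    return min(values), max(values)

q, r = divmod_component(17, 5)
lo, hi = stats_component([4, 1, 9])
print(q, r, lo, hi)   # 3 2 1 9
```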
Compilers are now great at separating distinct control flows in our functions. This allows for much optimisation. If we model our functions and statements exactly like components behave, then future (and possibly current) compilers can make every single piece of code in your software parallel, no matter how big or small. The only requirement is that functions no longer reach into outer scopes, or at least that block scoping is restricted to the block itself.
If you need to bind a function to some variable so that you don't have to pass it the same parameters over and over again, you can use currying. It perfectly fits the idea of FBP. So you can simulate OOP if you want to, but from within a FBP paradigm.
Now that we have a self-symmetric, self-synchronising system, would this not be enough to get excited over? Perhaps, but that's not enough for me. Why not go all the way? What's the holy grail of all holy grails? Portability. Right now, there's some idea that a VM or bytecode will solve portability problems. I don't know if it's just me, but I don't understand why anyone would believe this. Bytecode is yet another instruction set. We already have tons of those. That's what created the problem in the first place. Creating yet another one is only going to aggravate the situation. This seems so obvious to me that I'm baffled when I hear comments in support of it.
Something else that bothers me is that if you want to use two devices, but you want to use the exact same commands on both, then you're going to have to go with the lowest common denominator and give up any enhanced power or functionality that one of the devices is capable of. This is often found on PCs when we compile software to native code. Do we compile for the 386, 486 or Pentium? There are also huge differences between the PII, PIII and PIV. Each one introduced new registers and new instructions. If you want your software to work on all these processors, you can only use 386 instructions. The very same thing happens with bytecode. You can't use specialised features found on certain computers or OSes. Sure, you can work around it, but it's a real pain when you really need those features.
Instead of portability by lowest common denominator, why not portability by specialisation? In FBP, if we have components, then it shouldn't be too hard to have components that at a high level are the same across platforms, but at a lower level use specific components to that platform. So a video codec on one platform would be significantly different than on another. This is to achieve the most speed available. You should also be able to use alternatives depending on what platform you're on. If one computer has a feature that is well known to its users, but not found on any other platform, does it not make sense to include it anyhow if this is a large percentage of your user base? The other platforms would do without or have an alternative. This is still portability. It's just not what's being pushed at the moment. Many of these specific components would be found within subcomponents anyhow, so you wouldn't even know about it. When your software is run on that computer, the appropriate custom components would be loaded up.
An example of this could be the system tray. On Linux with KDE, a special KDE system tray component would be loaded up. On Windows, a different component would be used. File systems are also different. Dialogs too. We all know about the GUI crisis in Java: making all systems look the same is not what the users of each system want. The application looks alien to them.
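One way this "portability by specialisation" could look in code. The registry, component names, and platform keys below are all hypothetical; a real system would discover these from plugin files rather than a hard-coded table:

```python
import sys

# One high-level name ("system_tray") maps to a different low-level
# component per platform, with a generic fallback when no specialised
# version exists. All names here are illustrative.

REGISTRY = {
    "system_tray": {
        "win32":  lambda: "WindowsTrayComponent",
        "linux":  lambda: "KdeTrayComponent",
        "darwin": lambda: "MacMenuExtraComponent",
    },
}

def load_component(name, platform=None):
    platform = platform or sys.platform
    impls = REGISTRY[name]
    if platform in impls:
        return impls[platform]()   # specialised version for this platform
    # fall back to a generic implementation
    return "Generic" + name.title().replace("_", "")

print(load_component("system_tray", "linux"))   # KdeTrayComponent
print(load_component("system_tray", "beos"))    # GenericSystemTray
```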
And again, if this idea can be used on components, then it can be used all the way down to assembly code, because it's all the same thing. So if you can load different instructions on different systems, that's your portability right there. Each system can take advantage of vector processing or floating point units, or simulate what it needs if certain functionality isn't there. A Turing-complete instruction set can be remarkably small (one-instruction machines exist), so you can emulate anything. Higher-level functions could be translated directly into instructions if they are available on that platform. That's using each system to its fullest. For example, adding two arrays together is often done in a loop. This is ridiculous. We need to think in the form of components and put this in a function that does the array addition for us. On vector computers, the compiler can ditch this function and use vector instructions instead.
That leads me to my next point, and this one is obvious. Paul Morrison has mentioned this many times: reusing components should be a priority. So I think common vector functionality should also be available as components. Adding, subtracting and multiplying arrays should be in a component. Notice that arrays can be thought of as streams. So FBP is inherent here too.
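A sketch of the array-addition component idea. The function name is illustrative; the design point is that callers never write the loop themselves, so a compiler or runtime that recognises this one entry point is free to substitute vector instructions:

```python
# Array addition as a reusable component rather than an inline loop.
# The two input arrays are treated as equal-length streams.

def add_arrays(xs, ys):
    if len(xs) != len(ys):
        raise ValueError("streams must have equal length")
    return [x + y for x, y in zip(xs, ys)]

print(add_arrays([1, 2, 3], [10, 20, 30]))   # [11, 22, 33]
```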
Now think about all of these features. A portable language that takes advantage of each system to its fullest and is automatically scalable to parallel or distributed architectures of any kind. FBP is great. I think it can be awesome.
I came to these conclusions and ideas mostly on my own. I'm so glad Paul contacted me and that this wiki exists. I hope sharing my ideas is appreciated, as I believe them to be in sync with FBP. I hope that what I've mentioned here contains at least SOME things that are new. Finding out that Paul thought of all this years ago took some of the satisfaction out of figuring it out. But now (after a day, haha) I think there could be real benefits to joining with others on this.
Currently, I'm working on a programming environment for FBP. It's not a language exactly. It has an intermediate language, for which I've finished the parser. It supports qualified types (i.e. integers that can be bounded or qualified, such as distance vs. speed). I'm now trying to sort out my ideas on how to pass the data back and forth. I think I'm very close to being able to have a simple compiler. The HLL is not very fancy at this point. The toughest part will be writing the component layout design part. Basically, every part of the compiler has to be written in some other language at first, and lots of it will have to be hard-coded. After it works, I need to port it so that it compiles under itself. So it's a little weird, but I've done this kind of thing before.
I'm hoping that I can get this going soon. At this time, I don't want to bore anyone with details that may change. So I'll end it here. Let me know your thoughts.
I've been reading some more. There's a lot of info in Paul's book. He seems to be promoting asynchronous components. I used to believe that sequential code could not be made parallel, but I no longer share that view. In fact, FBP passes data from one component to the next in an apparently sequential manner. Sure, each component is executing in parallel, but I see no reason why this can't be done with sequential code as long as scoping is significantly reduced or eliminated. My opinion is that if sequential code follows the same restrictions as components, then it becomes self-similar. Thus, you can make sequential code parallel in the same manner as components. And although the techniques I mention above can't be used all the time, there are plenty of places where they can.
The program counter is also something that is frowned upon, from what I've read. But my technique above takes advantage of it to remove a lot of synchronisation. Is this wrong? Am I taking a step backwards? I think it would work rather well. I'm just mentioning this because I think my ideas are contrary to Paul's view of FBP. I'd like to hear some views on this and maybe come to a consensus. BTW, I want to remove any explicit notion of threading. This should be handled by the compiler. You can of course add modules to the compiler to tell it how to handle the threading, but it shouldn't be part of the language. I want to be able to write software with not a single place that mentions threading, yet have it scale well automatically on parallel machines or distributed systems. Being explicit comes AFTER the software is written. So you can say: run this part here because this is where the monitor is attached, and this other part there because that's where the database is. Stuff like that. But you don't change the software itself.
I don't mean to belittle the opinions that sequential code can't be made parallel or that components should be asynchronous. I'm just curious about it. I find this topic fascinating. Hopefully, there are others out there. Help me out, eh?
BTW, I agree that procedural code on its own cannot be made parallel. I should clarify that I'm talking about procedural code inside components. I think a functional approach without side-effects would be better.
Here is a rephrasing of some of these ideas by JohnCowan - many thanks, John!
Consider an arbitrary network of Components, and let's color them such that no two directly interacting components have the same color. Many straightforward networks will require only two colors. This is the case that Vorlath describes.
Take all components of one color and let them run concurrently; all components of other colors are blocked. Now each component can read from its input buffers without locking them, because the components that write to those buffers are blocked; likewise, each running component can write to its output buffers without locking them, because the components that read from those buffers are blocked. This eliminates all overhead from buffer locking. Indeed, if the components don't involve external I/O, they can be run sequentially in an arbitrary order rather than concurrently.
When every non-blocked component has either emptied one of its input buffers or filled one of its output buffers, the scheduler stops, blocks all components of the current color, and moves on to the next color. The up-to-four colors form a cycle.
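John's colouring description can be sketched as a small Python scheduler. The chain-building helper and the per-item functions are my own illustrative scaffolding; the scheduling itself follows his description for the two-colour case, where even-indexed and odd-indexed components form the two colours:

```python
from collections import deque

# Two-colour scheduling of a component chain: run one colour at a time, so
# no buffer ever has a reader and a writer active simultaneously, and no
# buffer locking is needed.

def make_chain(funcs, seed):
    """Build unlocked buffers around a chain of per-item functions."""
    buffers = [deque(seed)] + [deque() for _ in funcs]
    components = []
    for i, f in enumerate(funcs):
        def component(i=i, f=f):
            while buffers[i]:                       # drain input buffer
                buffers[i + 1].append(f(buffers[i].popleft()))
        components.append(component)
    return buffers, components

buffers, comps = make_chain([lambda x: x + 1, lambda x: x * 2, lambda x: -x],
                            [1, 2, 3])
colours = [comps[0::2], comps[1::2]]   # even-indexed vs odd-indexed components
for step in range(2 * len(comps)):     # enough rounds to drain the chain
    for c in colours[step % 2]:
        c()                            # same-colour components are independent

print(list(buffers[-1]))               # [-4, -6, -8]
```

As John notes, when no external I/O is involved, the same-colour components could just as well be run sequentially in any order.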
Just thought I'd also mention something about qualified types, or smart types. I'm going to use name/value pairs to define each primitive type. You can use these for structures as well. The point is that these properties live "outside" of the actual value, in a separate layer. You can also have references to other objects, but the pointer is stored in a property, so you can't access it directly. The actual value that you can change is the index. So you can change this value all you want, and it won't matter: pointer arithmetic without the drawbacks. It will be bounds-checked (unless you have that turned off, at your own risk). If you want to change the entity that the reference points to, you have to explicitly set the property. So far I have five primitive types: Integral, Floating Point, Fixed Point, String, and Reference. You can also have structures.
There would be two layers. One for the actual data used. And another for the properties. The compiler will discard the properties at compile time if they are not needed.
Function arguments are always const. Multiple return values are allowed. Comparison between primitive types is done with an appropriate comparison function, in the same manner as addition or any other math operator is implemented. Comparison between structures is done item by item, unless the value item is overridden. This allows each structure to produce another structure or primitive type that doesn't contain references. The reason for this is that reference matching will recurse, and sometimes we want to avoid that. You can still do explicit reference matching, though, as you would compare pointers. I'm not sure yet if there's a need for this with all the restrictions I'm putting in.
Note that comparing a float that has a subtype property of "meters" with a float of "feet" will not find them equal just because they have the same numeric value; one will be converted before the comparison is done. And yes, I know you shouldn't check floats for equality; it's just an example. You can do cool things like have different scales of magnitude too: kilometers and miles. There are all sorts of measurements that would benefit from this. This would help with entities that are internally similar but represent completely different things. For example, a complex number and a pair of (x,y) coordinate values are identical in representation, but used for very different things. The compiler can now catch the error right away. In the past, we had to rely on the names of these structures being different. But some languages don't have that restriction; some allow the same operation on different structures just as long as they are structurally identical. With properties (or attributes), there can be a cleaner way to deal with these issues.
Could you imagine:
feet dist1=10; meters dist2=20; meters total = dist1 + dist2;
And it would give the correct answer in meters.
That's exactly what I want! --pm
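A minimal Python sketch of how such unit-carrying values might behave. Here the unit is a runtime property rather than a compile-time qualifier as Vorlath intends, and only metres and feet are wired up; everything (class name, conversion table) is illustrative:

```python
# Unit-carrying values: the unit lives alongside the number as a property,
# and addition converts the right operand before combining. Illustrative
# sketch only; a real implementation would check units at compile time.

FACTORS = {"meters": 1.0, "feet": 0.3048}   # conversion factors to metres

class Quantity:
    def __init__(self, value, unit):
        if unit not in FACTORS:
            raise ValueError(f"unknown unit: {unit}")
        self.value, self.unit = value, unit

    def to(self, unit):
        """Convert to another unit via the metre."""
        return Quantity(self.value * FACTORS[self.unit] / FACTORS[unit], unit)

    def __add__(self, other):
        # the result takes the left operand's unit
        return Quantity(self.value + other.to(self.unit).value, self.unit)

dist1 = Quantity(10, "feet")
dist2 = Quantity(20, "meters")
total = dist2 + dist1                      # answer comes out in meters
print(round(total.value, 3), total.unit)   # 23.048 meters
```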
JohnCowan has also made some comments on this section:
As for units, it's a very difficult problem to solve in the general case. To take just *one* of the difficulties, torque and energy are measured in the same units (newton-meters = joules = kg m^2 s^-2), but they cannot be added or subtracted meaningfully. It's easy to think about typing variables in Feet and Meters as Vorlath mentions, but not so easy to get everything right. Fortress (Sun's new language) makes a stab at it, though: see http://research.sun.com/projects/plrg/ for details.