Yes, compilers have become better at optimising code, but everything after compilation has stayed the same. The linking phase of C / C++ programs is still largely a brute-force operation that pulls in everything the program might need, and very often code that will never be executed. This leads to enormously bloated programs that have to be:
- stored in non-volatile storage, and
- held in the RAM of the system that executes them.
A simple “Hello World” might need a few MBytes and link in tens of thousands of functions. While this is less of an issue in desktop-type systems, which have ample cheap (D)RAM and reasonably sized caches, the same is not true for embedded systems, which represent the ever-growing bulk of computer-driven systems on the planet.
Needing a lot of (D)RAM costs not only money but also energy, because (D)RAM needs to be continuously refreshed and often operates with hundreds of wait states compared with superfast GHz CPUs. This becomes part of the power wall we are currently hitting. To keep following Moore’s law, the only way forward is more parallel processing cores on the same die, even though that does not increase the access speed to the external (D)RAM. In the end, chips are pin-bound. Performance on such chips is cache-bound, and therefore code size still matters.
However, this is only a sideline to the real problem that developers hit today when trying to exploit the parallelism. First of all, the threading approach used today is difficult to get right: the state space easily explodes beyond what a single developer can keep in their head, and traditional testing cannot cope with this. The situation is worsened by the fact that most thread synchronisation mechanisms are themselves hard to use correctly. However, there is good news.
The problem was already solved over 30 years ago by C. A. R. Hoare, who developed the formal Communicating Sequential Processes (CSP) process algebra. CSP has been implemented in software and in hardware systems, and even today there are implementations available for many different programming languages: JCSP for Java, PyCSP for Python, C++CSP for C++, and libCSP2 for C, to name just a few.
However, most of these CSP implementations are still designed for single-CPU systems, even if there are environments that spread CSP systems over multiple CPUs, even across the internet. At first sight, CSP is also not intuitive to the “normal” developer, who usually works within a sequential programming paradigm, whereas CSP takes more of a top-down approach.
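For readers coming from sequential programming, the CSP style can be illustrated with a rough sketch in plain Python. This is not the API of PyCSP or of any other library named above. It only assumes the standard `threading` and `queue` modules, and it approximates a channel with a queue of capacity one, whereas a true CSP channel is an unbuffered rendezvous. The names `producer`, `squarer`, `run_pipeline` and `DONE` are made up for the illustration.

```python
import threading
import queue

DONE = object()  # hypothetical sentinel marking the end of the stream


def producer(out_ch, n):
    """Process 1: emits 0..n-1 on its output channel, then the sentinel."""
    for i in range(n):
        out_ch.put(i)  # blocks while the channel is full
    out_ch.put(DONE)


def squarer(in_ch, out_ch):
    """Process 2: reads values, squares them, passes them on."""
    while True:
        item = in_ch.get()  # blocks until a value is available
        if item is DONE:
            out_ch.put(DONE)
            return
        out_ch.put(item * item)


def run_pipeline(n):
    """Wire the two processes together and collect the results."""
    a = queue.Queue(maxsize=1)  # channel: producer -> squarer
    b = queue.Queue(maxsize=1)  # channel: squarer -> collector
    workers = [
        threading.Thread(target=producer, args=(a, n)),
        threading.Thread(target=squarer, args=(a, b)),
    ]
    for w in workers:
        w.start()
    results = []
    while True:
        item = b.get()
        if item is DONE:
            break
        results.append(item)
    for w in workers:
        w.join()
    return results
```

Each process here is an ordinary sequential function that shares no variables with the others. All coordination happens through the channels, which is the part of the design that keeps the state space manageable.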
But 30 years of history also means that these problems have been recognised, and people have worked on them and improved the situation. Preceded by work on the Virtuoso parallel RTOS (acquired by Wind River Systems in 2001), one novel result of such work is the so-called Interacting Entities approach developed by Altreonic and implemented in their network-centric OpenComRTOS.
It provides a scalable, concurrent way of programming, whether the target is a single-chip multicore device or a networked heterogeneous system with thousands of nodes. In the Interacting Entities approach there are two types of entities: tasks and hubs. Tasks can be compared with “clean” threads, i.e. they are active entities with their own user-defined functionality but with a private workspace.
Hubs, on the other hand, are passive entities that implement the interactions between the tasks. Hubs can represent synchronisation primitives such as semaphores and mutexes, but they can do much more: users can develop their own hubs, which allows very complex synchronisation protocols to be implemented easily. Tasks interact only via hubs, so there are no direct task-to-task interactions.
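The split between active tasks and passive hubs can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the OpenComRTOS or PIE API: the class names `SemaphoreHub` and `FifoHub`, their methods, and the two example tasks are all hypothetical, and the hubs are built on the standard `threading.Condition`.

```python
import threading
from collections import deque


class SemaphoreHub:
    """Passive entity (hypothetical): counts signals, blocks while zero."""

    def __init__(self):
        self._count = 0
        self._cond = threading.Condition()

    def signal(self):
        with self._cond:
            self._count += 1
            self._cond.notify()

    def test(self):
        with self._cond:
            while self._count == 0:
                self._cond.wait()
            self._count -= 1


class FifoHub:
    """Passive entity (hypothetical): bounded FIFO that carries data."""

    def __init__(self, capacity):
        self._items = deque()
        self._capacity = capacity
        self._cond = threading.Condition()

    def enqueue(self, item):
        with self._cond:
            while len(self._items) >= self._capacity:
                self._cond.wait()
            self._items.append(item)
            self._cond.notify_all()

    def dequeue(self):
        with self._cond:
            while not self._items:
                self._cond.wait()
            item = self._items.popleft()
            self._cond.notify_all()
            return item


def sampler_task(fifo, samples):
    """Active entity: only knows the hub it writes to."""
    for s in samples:
        fifo.enqueue(s)


def averager_task(fifo, n, results, done):
    """Active entity: keeps its running total in a private workspace."""
    total = 0
    for _ in range(n):
        total += fifo.dequeue()
    results.enqueue(total / n)
    done.signal()


def run_demo(samples):
    fifo, results, done = FifoHub(2), FifoHub(1), SemaphoreHub()
    tasks = [
        threading.Thread(target=sampler_task, args=(fifo, samples)),
        threading.Thread(target=averager_task,
                         args=(fifo, len(samples), results, done)),
    ]
    for t in tasks:
        t.start()
    done.test()                 # wait until the averager has signalled
    average = results.dequeue()
    for t in tasks:
        t.join()
    return average
```

Neither task holds a reference to the other. Each one sees only hubs, so a task can be replaced or moved without the other task noticing, provided the hubs stay reachable.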
With this strong decoupling between active and passive entities, and with each of them having a unique address within the system, it is possible to distribute a system over many CPUs and still achieve the same logical behaviour without having to rewrite the code. This mechanism is now also being ported from the original embedded implementation in C (requiring only 5 KiBytes per node) to other language environments, starting with Python under the name Python Interacting Entities (PIE).
Other languages to follow are Java, Ruby and Haskell. This demonstrates that a straightforward, efficient programming paradigm that works across heterogeneous hardware platforms as well as across heterogeneous programming languages for new multi-processor platforms is not a pipe dream requiring lots of new research. The fundamental work was already done by Dijkstra and further formalised by Hoare and others. Interacting Entities is the pragmatic superset that makes things happen.
More information is available at www.altreonic.com