Notes on real-time computing architectures

By Peter Clarke

Real-time analog-oriented systems often have embedded microcontrollers (MCUs), if not DSPs or some synergism of the two. Various MCU architectures can be applied to real-time, embedded applications in instrumentation, power conversion, and motion control. This article surveys some design features and tradeoffs of μC architectural types applicable to analog applications.

Memory-Based Computing

The early generations of MCUs were based on architectures with few registers. Their instruction sets made up for this by including addressing modes that eased access to the right memory locations. For instance, the 8-bit 6502, used in the Apple II computer and many Japanese-made VCRs, has one accumulator register, A, and two index registers, X and Y. With Y, it is possible to do table addressing with the indirect, indexed address mode.

For instance, a table-driven step motor could be advanced by driving its four power switches (for the +/-A and +/-B windings) by incrementing through the table and rolling over to the other end of it. The direction of the motor depends on which direction the program advances through the table. The indirect, indexed addressing mode makes it possible to retrieve the switch states for a given step from memory in one instruction.
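The same table-driven scheme can be sketched in C. The packed winding patterns and the wrap-around index below are illustrative assumptions, not taken from any particular driver:

```c
#include <stdint.h>

/* Full-step drive table: each entry packs the four switch states
   (for the +/-A and +/-B windings) into the low nibble. */
static const uint8_t step_table[4] = { 0x9, 0x5, 0x6, 0xA };

static uint8_t step_index = 0;

/* Advance one step; dir is +1 or -1.  The index rolls over at either
   end of the table, so the sign of dir sets the rotation direction. */
uint8_t step_motor(int dir)
{
    step_index = (uint8_t)((step_index + dir) & 0x3);
    return step_table[step_index];   /* value to write to the switch port */
}
```

On the 6502 the table fetch is a single indirect, indexed instruction; here the compiler generates the equivalent address arithmetic.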

With powerful addressing modes, this architecture might be well-suited for real-time control, and in some ways it is. The minimalist design of the 6502 (about 3,500 transistors), along with its deterministic instruction pipelining, results in an instruction set in which many instructions take only two machine cycles. It partly inspired the design of the increasingly popular, though well-seasoned, ARM (originally Advanced RISC Machine) architecture.

Register-based computing

Reduced-instruction-set computing (RISC) architecture has caught on as a serious challenger to memory-based architectures with their complex-instruction-set computing. The “complex” aspect is the aforementioned addressing modes, which reduce the number of instructions required to access memory but at the expense of more computing cycles required to compute the addresses. The idea emerged in the 1980s that by adding more registers to the CPU, fewer of the von-Neumann-bottleneck memory accesses would be needed, making the machine run faster. This results in simple instruction sets with fast – one or two cycle – instructions. The tradeoff is that many more instructions are needed to accomplish the same task. The ARM-based MCU is a leading example.

Atmel has put a complete, in-circuit-programmable controller for a power converter or DC motor drive into an 8-pin DIP or SMD package. Move up to a 20-pin ATtiny2313, which has been sold in the surface-mount package at low volume for as little as $0.85 US by a major component distributor, and you have the core component of a three-phase motor controller. The AVR parts, like their PIC competitors from Microchip, have an 8-bit (data) microprocessor core surrounded by useful I/O, including serial communications, counter/timers, parallel ports, comparators, ADCs, and PWM generators. The PWM of AVR MCUs is centered: the on-time occurs in the middle of the switching cycle.

Interrupts that occur at the end of the switching cycle, and also at the end of the on-time, trigger the code to switch the bridge to the next state while the PWM drive is off. This reduces stress on the bridge power switches. The extensive I/O of MCUs takes up more silicon area than the core processor itself.
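That interrupt scheme can be sketched in C. The commutation table, function name, and port argument are hypothetical placeholders, not actual AVR register or vector names:

```c
#include <stdint.h>

/* Six-step bridge commutation states for a three-phase drive
   (hypothetical switch patterns). */
static const uint8_t bridge_states[6] = { 0x01, 0x03, 0x02, 0x06, 0x04, 0x05 };
static volatile uint8_t state = 0;

/* Invoked at the end of the switching cycle and at the end of the
   on-time, i.e., while the centered PWM drive is off.  Switching the
   bridge only at these instants reduces stress on the power switches. */
void pwm_edge_isr(volatile uint8_t *bridge_port)
{
    state = (uint8_t)((state + 1) % 6);
    *bridge_port = bridge_states[state];
}
```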

Like the PIC, the AVR has a DSP-like architecture – a Harvard architecture, which separates program and data memory. The advantage is that there are now two channels from CPU to memory, and instructions and data can be transferred simultaneously. The CPU-memory single-channel limitation, or “von Neumann bottleneck,” is eased somewhat. The only loss of flexibility with separated code and data spaces is that self-modifying code cannot be used. This is not a significant loss, nor does it affect most kinds of programming.

These processors are also horizontally microprogrammed, with a long instruction word (16 bits), so that multiple program bits can be applied simultaneously in executing the instruction. This results in single clock cycles for most instructions.

Another DSP-like feature is that the AVR and PIC are register-based. They lack the more powerful addressing modes of conventional (CISC) processors, such as the older 6800 or 6502, which provide memory-based indirection and indexing into tables in data memory. In these newer processors, it is even necessary to compute table look-up addresses by keeping the index value in one register and the base address of the table in another. The index register is added to the base address whenever an indexed table address in SRAM is needed. (Or alternatively, the indexed address is kept in a register and a subtraction from the base-address register is used to obtain the index value.) This adds machine cycles to program execution time, though overall, if the registers are used to hold the data under computation, fewer data-memory accesses are required, which minimizes the von Neumann bottleneck. For programmers accustomed to CISC processors, this DSP-like or RISC architecture is at first frustrating. Register-based computing takes some adjustment in coding technique.

The data-intensive instructions of these register-based processors are dual operand, allowing both data and result registers to be named in one instruction. In accumulator-based, single-operand machines, results are limited to appear in a single accumulator register. More instructions (and machine cycles) are required to move the results to memory or another register. By specifying two operands, the result has multiple possible destinations and can be put immediately where it is wanted.

MCUs versus DSPs

The register-based, single-cycle instruction (or RISC) approach to MCU architecture is where DSPs have always been. The major functional differences between the AVR and Analog Devices, Inc. ADSP2100 series of DSPs are that the DSPs have a longer instruction word (24 bits for the ADSP21XX), a hardware multiply (which some MCUs now have), a barrel shifter, and address computation hardware. MCUs are evolving into DSPs, and can be considered low-performance, low-cost DSPs. Meanwhile, DSPs are acquiring more I/O around their cores, leading to a μC-DSP convergence that is nearly complete for mainstream embedded computing.

The classic cost-performance design tradeoff depends mainly on the application volume and required computing “bandwidth”, not time-to-market, for either kind of processor can be programmed in assembler or high-level languages. After a computer architectural core is chosen, over time a large amount of useful code accumulates. This investment of effort will, in the long run, have a great influence upon choice of processor and may even be the determining criterion. For even though code may be written in a high-level language, the software development environment for a different instruction set may need to be different.

Development time (and cost) can be heavily impacted by having to start at square one with a strange, new compiler. No wonder the 6800 instruction-set core is still marketed 30 years after its introduction. Its second-generation design, the 6502, continues largely through Renesas for high-volume office and consumer electronics applications. It was in 1975 that its designer, Chuck Peddle, was handing out free samples to engineers. One of them was Steve Wozniak.

Zero-operand machines

On the other end of the operand spectrum from dual-operand RISC machines, some lesser-known processors are emerging that are instead zero-operand. No data registers are used as such. All data is handled on a hardware stack. (A stack is an ordering of data items accessed on a last-in, first-out basis, like a stack of trays.) The top register roughly functions as the accumulator does in single-operand machines, and all the data items – as many as needed for the operation – come from the top end of the stack. Data moves up and down, on and off the stack as operations execute. (Or rather, the stack pointer increments and decrements through fast local memory in the CPU.) Global memory is accessed only for more permanent storage of data, such as global variables.
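A minimal sketch in C of such a zero-operand machine, assuming a small fixed-depth stack; the operation names follow common stack-machine usage:

```c
#include <stdint.h>

/* A minimal zero-operand "machine": every operation takes its
   operands from the top of the data stack and leaves results there. */
#define STACK_DEPTH 16
static int32_t stack[STACK_DEPTH];
static int sp = 0;                   /* points one past the top of stack */

void push(int32_t v) { stack[sp++] = v; }
int32_t pop(void)    { return stack[--sp]; }

/* Zero-operand add: no registers are named in the "instruction";
   both operands come off the stack and the sum replaces them. */
void op_add(void) { int32_t b = pop(), a = pop(); push(a + b); }

/* dup and swap, typical stack-manipulation primitives. */
void op_dup(void)  { int32_t a = pop(); push(a); push(a); }
void op_swap(void) { int32_t b = pop(), a = pop(); push(b); push(a); }
```

In a hardware stack machine these primitives are single instructions, with the stack held in fast local memory inside the CPU.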

The benefit of stack-based computing has been known for a long time in software engineering, and stack push, pop and put addressing modes are found in the newer designs of MCUs, both CISC and RISC. The superior efficiency of stack-based computing has been demonstrated in computing theory, and popular languages are not uncommonly implemented using data stacks. Hewlett-Packard RPN calculators have a stack-based user interface. The Motorola 68000 has instructions facilitating the use of stack frames, to hold local variables for routines.

The leading high-level stack-based language, Forth, has been around about as long as FORTRAN or LISP and continues to be the choice for real-time MCU computing for some. It is available for almost any instruction set. Those who use it generally find it to be the superior language for embedded applications, though so many Forth programmers in the ’70s were raving about it that it developed somewhat of a “geek” reputation, even among software engineers (whom the rest of the world considers geeks anyway). With the advent of UNIX, the world went crazy for C, and Forth found a niche as the OpenBoot firmware standard in Sun computers. Otherwise, Forth persists, mainly among hardware-oriented programmers who still appreciate its productivity benefits and unmatched extensibility.

Besides the influence of C, Forth probably has not caught on because most programmers prefer names for local variables to keeping track of where items sit on a stack. Stack notation eases this problem immensely and avoids having to contrive meaningless names for local variables. Even so, some famous computer applications have been written in Forth, and it lurks “under the hood” to a greater extent than is generally realized. It is a versatile tool for writing languages and user interfaces.

Forth is a logical extension of assembler programming in that routines, called words, are given names and linked into lists that the Forth interpreter can search. Forth itself consists of a defined set of words in the list, and programming merely adds new words to the list. There is no artificial boundary between language and program. All words have a run-time routine associated with their name, even data. For example, the run-time action of a variable is to leave the variable’s address on the top of the stack; the run-time routine a word invokes implicitly places it in an object class. Most good assembly code is organized hierarchically as a sequence of subroutine calls to lower-level routines. The lowest routines are optimized in machine (assembly) code for performance. In typical implementations, Forth has a few machine-dependent words at low level and the rest are compiled using them.
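The dictionary structure can be sketched in C, assuming a simple linked list searched newest-first; the type and function names here are illustrative, not from any particular Forth implementation:

```c
#include <string.h>

/* Sketch of a Forth-style dictionary: each word is a named entry,
   linked to the previous definition, carrying a run-time routine. */
typedef struct word {
    const char  *name;
    void       (*runtime)(void);   /* run-time action of the word */
    struct word *link;             /* previous word in the dictionary */
} word_t;

static word_t *latest = 0;         /* head of the dictionary list */

/* Defining a word merely extends the list; there is no boundary
   between language-defined words and application words. */
void define(word_t *w, const char *name, void (*rt)(void))
{
    w->name = name;
    w->runtime = rt;
    w->link = latest;
    latest = w;
}

/* The interpreter searches from the most recent definition backward. */
word_t *find(const char *name)
{
    for (word_t *w = latest; w; w = w->link)
        if (strcmp(w->name, name) == 0)
            return w;
    return 0;
}
```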

Forth allows more words to be added to its set of words, or dictionary, by the programmer, and makes no distinction, other than in the standard specification, between Forth words and application words. They are all part of the same openly extensible computing environment. The words programmers write merely extend the language-defined set, including words for compiling. In contrast, languages such as C have a fixed (and limited) set of language constructs. While libraries of routines can be added, the compile-time capabilities of C are fixed by the language definition. Forth, by contrast, has extensibility exceeding that of LISP, allowing the programmer to define new compiling words. Newer languages like Java incorporate some Forth-like extensibility features.

Zero-operand, or stack-based, or Forth machines made their first major commercial debut as the Harris RTX2000 series. These processors were relatively expensive and did not succeed in the general market, retreating into radiation-hardened aerospace applications. Since then, Forth’s inventor Chuck Moore has developed a fast stack machine, and others have FPGA-based designs (see Stack Computers for a list of commercially available stack machines).

Greg Bailey of Green Arrays Inc. has developed a chip with 144 interconnected stack machines (see Green Arrays’ site for details). While stack machines have not yet caught on widely for embedded computing, the underlying theory of stack computing, and the elegance known to its enthusiasts, suggest that they will eventually be rediscovered. The advantages are too elementary to be ignored forever.

Stack-oriented, register-based computing

For those who like powerful addressing modes, the undesirable aspects of register-based computing are reduced by using register-based MCUs as stack machines. Forth implementations on machines such as the AVR MCUs illustrate this. The top of the stack is defined to be one of the ALU registers, and another register holds the next-on-stack value; the rest of the stack continues in SRAM. By using a stack for parameter passing, programming is made easier, though RISC does not necessarily reduce the number of computing cycles or memory use relative to CISC. The leader in CISC computing, Intel, continues to compete with the ongoing interest in RISC. In the interim, the compromise computing scheme will probably win out: a RISC programmed as a near-zero-operand virtual machine.
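A sketch in C of that arrangement, with the top of stack held in a variable a compiler would typically keep in a register; the names and stack depth are illustrative:

```c
#include <stdint.h>

/* Virtual stack machine on a register MCU: the top of stack (tos)
   lives in a register; the rest of the stack spills into SRAM. */
static int16_t tos;                /* top of stack, register-resident */
static int16_t mem_stack[32];      /* remainder of the stack in SRAM */
static int sp = 0;

void vm_push(int16_t v) { mem_stack[sp++] = tos; tos = v; }
int16_t vm_pop(void)    { int16_t v = tos; tos = mem_stack[--sp]; return v; }

/* With tos cached in a register, a binary operation touches SRAM
   only once, easing the von Neumann bottleneck. */
void vm_add(void) { tos = (int16_t)(tos + mem_stack[--sp]); }
```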

Electronics designers seeking an edge in embedded computing technology should look into zero-operand computing, whether implemented in hardware or software. What limits its use is the narrow range of commercially available stack machines, which lack the variety of packages, I/O features, and software development systems found for mainstream MCUs. Yet that is always true of emerging technology. Any of the many commercially available RISC machines with stack addressing modes constitutes an optimal interim solution.

Dennis Feucht has his own laboratory, Innovatia, on a jungle hilltop in Belize, where he performs electronics research, technical writing, and helps others with product development. He has written a four-volume book-set on analog circuit design, has completed a book on transistor amplifier design and is working on a book on power electronics.

This article first appeared on EE Times’ Planet Analog website.

Related links and articles:

Stack Computers

