ISS and architectural exploration

Technology News |
By eeNews Europe



Every time the conversation on performance analysis and architecture exploration crops up, the question turns to the ISS, or Instruction Set Simulator: 'Do you have the ISS for XYZ processor?' This leads to a discussion of what an ISS is suitable for.  Many EDA companies have developed ISSs with the false promise of solving everything from software debugging and hardware verification to auto-generating a board with all the peripherals pre-loaded. This creates the impression that the ISS is the solution for all your system development needs.

 

In reality, architectural exploration is a way to obtain high-quality results faster, and it has different needs. An Instruction Set Simulator gives the user the ability to load the operating system and execute the compiled code.  This is a good solution for early software debugging.  It is not a good solution when you are experimenting with new architectures, such as a new bus topology, a different memory hierarchy, or processor clock speed sizing. The OS and the executable are also tied to one processor family.  If you want to evaluate another processor family, or a processor with a different set of peripherals, you need to get a new ISS and recompile the entire code.  In addition, there is a significant lag between the processor release and the ISS availability.  An alternative to ISS-based code execution is therefore imperative for architecture exploration.

Code execution presents a very specific sequence of instructions executing in a mostly repetitive fashion.  Table 1 shows the instructions that execute for an inverse Fast Fourier transform in OFDM communication software. Notice how it is made up of a series of floating-point adds and a few branch instructions.  If one is evaluating the architecture of a system, the first 20 instructions of this sequence are typically sufficient.  The code has been written by one software engineer based on one standard.  Even if the second part of this code is written to another standard with an entirely different structure, the instruction sequence will be only slightly different.

 

Table 1: Instruction sequence for an inverse Fast Fourier transform

ISS and executable software have been demonstrated to be ineffective in evaluating the metrics of an optimal architecture.  1000 lines of code will simply repeat a small set of instructions, thus providing very little variability.  Table 1 shows the instruction sequence for 1000 lines of code.  This lack of variability is especially pronounced in DSP and data flow code.  Control logic has a little more diversity, but the sequence is still not very different.

Emulating software operations

A number of good alternatives exist to emulate software operation for architecture exploration.  System modeling experience shows that three types of software modeling work quite well.  At the statistical level, a delay value for each function is sufficient to trigger the traffic on the bus and the memory devices.  At the hardware level, an application-specific instruction allocation called an instruction-mix table provides an extremely accurate representation of a software task.  The last method is to annotate performance-intensive portions of the code and generate an instruction trace during execution.  This last technique is good for testing the architecture behavior for a benchmark or set of benchmarks.  It is also good for evaluating how a piece of code will behave in a multi-core environment.

The first approach requires a table with the name of each task and its associated delay.  During execution, the processor model looks up the task requested by the RTOS (A_Task_Name in Table 2) and delays the processor accordingly; the delay reflects the number and type of instructions in the task.
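The table lookup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the delay values, the second task name, the cycle-based units and the function names are hypothetical and do not come from the article's tables or from any particular modeling tool.

```python
# Hypothetical task-delay table: task name -> delay in processor cycles.
# The values are illustrative, not taken from the article's tables.
TASK_DELAY_CYCLES = {
    "A_Task_Name": 1200,
    "Another_Task": 450,
}


def task_delay_seconds(task_name, clock_hz, table=TASK_DELAY_CYCLES):
    """Return how long the processor model stays busy for one task."""
    cycles = table[task_name]  # raises KeyError for an unknown task
    return cycles / clock_hz


def run_schedule(task_names, clock_hz):
    """Advance a simulated clock by the table delay of each task in turn."""
    now = 0.0
    timeline = []
    for name in task_names:
        start = now
        now += task_delay_seconds(name, clock_hz)
        timeline.append((name, start, now))
    return timeline
```

Calling run_schedule(["A_Task_Name", "Another_Task"], clock_hz=1e6) returns one (task, start, end) tuple per task. In a full system model, each busy interval would also be the point at which bus and memory traffic is triggered.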

 

Table 2: Instruction mix table for a software task

The application-specific instruction allocation technique is the most versatile and can be used for software testing, hardware verification and architecture optimization.  As shown in Table 2, each software task or thread has a number of instructions and a percentage for each type of instruction. In the case of My_Task_1, we have 10% integer, 48% floating-point, 10% logical, 7% load-store, and 25% branch instructions.  This table is fed into a software generator block that generates the instruction sequence based on an intelligent algorithm.  This sequence is used for hardware testing, thus providing a more realistic test of the platform architecture.
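To make the idea concrete, here is a minimal sketch of a generator that turns an instruction-mix row into a sequence of instruction types. It uses the My_Task_1 percentages quoted above. The article does not describe the generator's actual algorithm, so plain weighted random sampling is an assumption here, as are the function name, the type labels and the instruction count passed in.

```python
import random

# Instruction mix for My_Task_1, percentages as quoted in the text.
MY_TASK_1_MIX = {
    "integer": 10,
    "floating_point": 48,
    "logical": 10,
    "load_store": 7,
    "branch": 25,
}


def generate_sequence(mix_percent, num_instructions, seed=None):
    """Draw instruction types at random, weighted by the mix percentages.

    Assumption: simple weighted sampling stands in for the generator's
    real algorithm, which the article does not specify.
    """
    if sum(mix_percent.values()) != 100:
        raise ValueError("instruction mix must add up to 100%")
    rng = random.Random(seed)
    types = list(mix_percent)
    weights = [mix_percent[t] for t in types]
    return rng.choices(types, weights=weights, k=num_instructions)
```

A call such as generate_sequence(MY_TASK_1_MIX, 1000, seed=1) yields a list of 1000 type labels whose proportions approach the table percentages as the count grows. Fixing the seed makes a run repeatable when comparing two architectures.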

Table 3 shows the output for My_Task_1.  To get an accurate distribution of the instruction types within a code structure, use a good decompiler or profiler such as Hex-Rays, Intel VTune or Boomerang.  The number of tasks or threads will differ based on the application.  This degree of flexibility in the simulated instruction sequence is hard to achieve using an ISS but fairly easy using a good software generator.
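Going the other way, the percentages for a mix table can be tallied from an existing instruction trace. The sketch below assumes the trace has already been reduced to a list of instruction-type labels (for example by post-processing a disassembly or profile); that pre-processing step and the function name are assumptions, not features of the tools named above.

```python
from collections import Counter


def instruction_mix(trace):
    """Return the percentage of each instruction type in a trace."""
    if not trace:
        raise ValueError("trace is empty")
    counts = Counter(trace)
    total = len(trace)
    return {kind: 100.0 * n / total for kind, n in counts.items()}
```

The resulting dictionary has the same shape as one row of the instruction-mix table, so measured code and hand-edited what-if mixes can be treated the same way.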

Moreover, you can run the tasks in order, in random order or based on the input request.  This mechanism can provide a lot more variety in terms of cache access, hit-miss ratio, bus activity and pipeline flushes.  You can modify the task instruction mix and study the impact on your architecture by simply modifying the percentage table.  This is quick to do and is not locked to a specific code implementation.  If you look at the generated output for My_Task_1, you can see diversity in the instruction sequence, allowing for a much larger level of architecture testing.
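The three ordering options mentioned above can be sketched as a small helper. The mode names, the function name and any task name other than My_Task_1 are hypothetical; a real model would take the request stream from the rest of the system rather than from a list.

```python
import random


def task_order(tasks, mode, requests=None, seed=None):
    """Return the order in which tasks are issued to the processor model."""
    if mode == "in_order":
        return list(tasks)
    if mode == "random":
        shuffled = list(tasks)
        random.Random(seed).shuffle(shuffled)
        return shuffled
    if mode == "on_request":
        # Keep only requests that name a known task, in arrival order.
        known = set(tasks)
        return [r for r in (requests or []) if r in known]
    raise ValueError(f"unknown mode: {mode}")
```

For example, task_order(["My_Task_1", "My_Task_2"], "on_request", requests=["My_Task_2", "My_Task_2", "My_Task_1"]) replays the tasks in the order they were requested, which changes the cache and bus behavior without touching the mix table.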

 

Table 3: Instruction sequence output associated with the first line of the instruction mix table

Simulation Model

To view and simulate a model that uses this application-specific instruction-mix table, go to https://www.mirabilisdesign.com/new/software/demo/Partitioning/SoC/Power_Perf.htm.  Accept all security warnings and the model will load in the web page as an applet.  Click GO to run the simulation.  You can also change a parameter in the model view and click GO again to see the changes in the reports.

A recent TechTalk at https://youtu.be/_csv53LlXp8 by Robert Juliano, Ph.D., Sr. Director of Applications at Mirabilis Design, covers a similar topic.

Conclusion

The instruction-mix table method of software emulation offers the most advantages for architecture exploration.  Using this approach, the designer can view the depth of the pipeline, identify the cause of a stall, and examine the impact of the power management algorithm, the operation of the memory hierarchy, the slowdown caused by load/store requests, and the quality of the cache coherency algorithm. The simulation reports provide significant visibility into the architecture operation and allow for substantial optimization of the system throughput.

A number of other approaches can also be used for architecture exploration, but they are extremely hard to generate.  They include hand-annotating specific sections of the code, generating a bus trace with a list of instructions, and tapping the operating system for cache accesses.  These approaches are implementation-specific but can be targeted at a timing-intensive function.  So, the next time you are doing architecture exploration, look at your options for software emulation to test the architecture. Look beyond the ISS.  Look at the instruction-mix table.

About the author:

Deepak Shankar is the Founder of Mirabilis Design, a systems engineering software solutions provider.  Mr. Shankar has been involved with architecture exploration of embedded systems, semiconductors and real-time software for over 20 years.  While at Mirabilis Design, he has developed new methodologies and solutions to streamline the validation of system specifications, make architecture exploration extremely accurate and accelerate the systems engineering process.  Prior to Mirabilis Design, he worked at Cadence, Spincircuit and Memcall in technical, marketing and executive management roles.  Mr. Shankar has published over 30 articles in technical journals around the world and has been the lead speaker at various IEEE and other organizations' events.  He has an MS in Electronics from Clemson University, an MBA from the University of California, Berkeley, and a BS in Electronics and Communication from Coimbatore Institute of Technology.
