Interfacing QDR-II+ Synchronous SRAM with high-speed FPGAs, part 1

By eeNews Europe

Quad data rate synchronous static random access memory (SRAM) is an integral part of next-generation networking equipment operating at higher throughput rates. QDR SRAM offers low latency compared to dynamic random access memory (DRAM). The random transaction rates of QDR SRAM are higher than for DRAM, as well.

QDR SRAM modules are suited to high-bandwidth applications and are used for lookup tables, packet buffering, linked lists, and similar structures. SRAMs are also a popular choice for Level 2 (L2) cache in FPGA-based systems. QDR SRAMs are typically interfaced to application-specific networking processors or high-speed FPGAs. Getting the best performance from both processor and memory requires properly interfacing the two. This article takes a closer look at the challenges and pitfalls, and at the techniques for optimizing the system.

The basics of QDR-II+ SRAM
The latest QDR SRAMs operate at up to 633 MHz and have an improved data-valid window so that host processors can capture data easily at high speeds. QDR uses two separate input/output (I/O) ports: a read port for reading from the memory and a write port for writing to it. There are independent clock domains for the read and write ports. Data is written and read on both the rising and falling edges of a clock (i.e., double data rate). With two ports each transferring data on both clock edges, four data items are transmitted per clock cycle, hence the name quad data rate.
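
As a rough illustration of the arithmetic above, the following Python sketch estimates peak bandwidth. The function name and the 18-bit port width are assumptions chosen for illustration; the 633 MHz figure is the maximum clock rate quoted above. Sustained throughput in a real design depends on the controller and the access pattern.

```python
def peak_bandwidth_gbps(clock_mhz: float, data_width_bits: int) -> dict:
    """Estimate peak bandwidth in Gb/s for one read port and one write port.

    Hypothetical helper: each port moves one word on the rising edge and
    one on the falling edge, so two ports give four transfers per clock.
    """
    transfers_per_clock_per_port = 2  # rising edge + falling edge
    per_port = (clock_mhz * 1e6 * transfers_per_clock_per_port
                * data_width_bits / 1e9)
    return {"read": per_port, "write": per_port, "total": 2 * per_port}


bw = peak_bandwidth_gbps(633, 18)  # 18-bit ports assumed for illustration
print(f"read {bw['read']:.3f} Gb/s, write {bw['write']:.3f} Gb/s, "
      f"total {bw['total']:.3f} Gb/s")
```

With these assumed inputs, each port peaks at roughly 22.8 Gb/s, or about 45.6 Gb/s combined.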

Let’s review the hardware details of interfacing the QDR-II+ SRAM with an FPGA. QDR-II+ SRAMs are available in densities from 18 Mb to 144 Mb. Internally, they are organized as two or four blocks. The devices are available as burst-of-two or burst-of-four parts; the names indicate the minimum number of data words that can be written to or read from the memory in a single transaction.

Consider a QDR-II+ SRAM with 18 Mb density and 18 data lines. This means it is organized as 1 M × 18. For a burst-of-two device, 36 bits of data (2 × 18) can be written and read in a single transaction (i.e., at the same time). For a burst-of-four device, 72 bits of data (4 × 18) can be read and written in a single transaction. Internally, a burst-of-two device has two blocks of memory of 512 K × 18 each, and a burst-of-four device has four blocks of memory of 256 K × 18 each.

The number of address lines for the burst-of-two devices is 19 and the number of address lines for the burst-of-four device is 18 (see figure 1). Both devices have 18 data lines for writing into the device and 18 separate data lines for reading from the device. The address is indicated by “A,” the write port is indicated by “D,” and the read port is indicated by “Q.”
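
The organization figures above follow from simple arithmetic, sketched here in Python. The function and field names are hypothetical; the sketch assumes that one address selects a whole burst and that the number of internal blocks equals the burst length, as described above.

```python
def qdr_organization(density_bits: int, data_width: int, burst_length: int) -> dict:
    """Derive address-line count and transaction size for a QDR-style SRAM.

    Hypothetical helper for illustration only.
    """
    words = density_bits // data_width      # total data words in the array
    locations = words // burst_length       # one address selects a full burst
    return {
        "words": words,
        "address_lines": (locations - 1).bit_length(),
        "bits_per_transaction": data_width * burst_length,
        "words_per_block": locations,       # one block per word in the burst
    }


DENSITY_18MB = 18 * 2**20  # 18 Mb expressed in bits

for burst in (2, 4):
    org = qdr_organization(DENSITY_18MB, data_width=18, burst_length=burst)
    print(f"burst-of-{burst}: {org['address_lines']} address lines, "
          f"{org['bits_per_transaction']} bits per transaction, "
          f"{org['words_per_block'] // 1024} K x 18 per block")
```

For the 18 Mb ×18 part this gives 19 address lines, 36 bits, and 512 K × 18 blocks for burst-of-two, and 18 address lines, 72 bits, and 256 K × 18 blocks for burst-of-four.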


Figure 1a: Block diagram of 18 Mb (1 M × 18) QDR-II+ burst-of-two SRAM


Figure 1b: Block diagram of 18 Mb (1 M × 18) QDR-II+ burst-of-four SRAM

Power requirements
The main power supplies for the SRAM are Vdd and Vddq. Vdd is the core supply: it powers the core of the memory and keeps the memory contents intact. Vddq is the I/O supply and is responsible for input/output transactions. The voltage levels on the output lines are a function of the I/O supply. In high-speed systems, the core voltage and I/O voltage are typically different.

Over the last few years, there has been a drastic reduction in operating voltage to save power. Having different core and I/O voltages ensures that the high switching noise from the I/O will not affect the core voltage. Proper bypassing and decoupling techniques have to be used to ensure the power integrity of the system. This is very important for reliable operation, especially when the memory is located far away from the power supply on the board and the same power supply is used to power multiple chips in the design. A decoupling capacitor prevents voltage swings on the power and ground lines, gives a low-impedance path from the power plane to the ground plane, and provides a return path between the power and ground planes.

Both the Vdd and Vddq pins must have multiple capacitors. Small capacitors with low series inductance must be placed in parallel with large bulk capacitors to supply burst current during high-frequency transitions on the power supply. The low-value decoupling capacitors must be placed as close to the memory as possible, and the bulk capacitors must be placed close to the decoupling capacitors. This helps to minimize the current loops and hence lowers the radiation in the system.

Clocking system for QDR-II+ device
The clocking system for QDR-II+ devices can be divided into input clocks and output clocks. Input clocks are referred to as K and K# clocks. These are provided to the memory by the external controller. These are not differential clocks but are single-ended; however, they are out of phase with each other by 180°. The rising edge of the K clock is used to capture synchronous inputs on the device. All accesses are initiated on the rising edge of the K clock. All synchronous data inputs pass to the input registers and to the core of the memory using the K and K# clocks. The K and K# clocks also pass data from the memory core to the output registers.

The other set of clocks are the output clocks, CQ and CQ#. These clocks help to simplify data capture for high-speed systems. The CQ clock is referenced with respect to the K clock, and CQ# is referenced with respect to the K# clock. The CQ and CQ# clocks are generated by the QDR-II+ device and are called echo clocks. The data on the Q pins is source synchronous with respect to the echo clocks. The user has to shift the echo clock to latch the data. The echo clocks can be phase shifted through board trace delay or by using on-chip circuitry in an FPGA. If circuitry within an FPGA is used to capture the echo clock, then the trace length of the CQ and CQ# clocks must be the same as that of the Q pins so that the FPGA can phase shift and capture the data accordingly.
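
To give a feel for the size of the shift involved, the Python sketch below converts a phase shift into a time delay. The 90° default is an assumption made here for illustration (a quarter-cycle shift places the capture edge near the middle of a half-cycle data window); the article does not specify a value, and the actual shift must come from the datasheet timing and the FPGA's capture circuitry. The function name is hypothetical.

```python
def echo_clock_delay_ps(clock_mhz: float, phase_deg: float = 90.0) -> float:
    """Convert a phase shift of the echo clock into a delay in picoseconds.

    Hypothetical helper; the 90-degree default is an illustrative assumption.
    """
    period_ps = 1e6 / clock_mhz           # clock period in picoseconds
    return period_ps * phase_deg / 360.0  # fraction of one period


for f_mhz in (250, 500, 633):
    print(f"{f_mhz} MHz: shift CQ/CQ# by about "
          f"{echo_clock_delay_ps(f_mhz):.0f} ps for a 90 degree shift")
```

The delay budget shrinks quickly as the clock rate rises, which is why trace-length matching between CQ/CQ# and the Q pins matters.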

A phase-locked loop (PLL) internal to the chip generates the echo clocks. The advantage of using echo clocks to capture the data is that any jitter present in the K/K# clocks does not propagate to the output clocks. The QDR-II+ device has a pin called DOFF#, which switches the internal PLL on or off. During power-up, when the DOFF# pin is tied high and 20 µs of stable K/K# clock is provided, the PLL locks and the echo clock is generated synchronously to the K/K# clock. When DOFF# is driven low, the PLL is switched off and the memory performs sub-optimally. There is a minimum K/K# clock frequency for the PLL to lock; this frequency is provided by the QDR-II+ SRAM manufacturer in the datasheet. Below this minimum frequency the PLL will not lock, which can affect memory performance.

Locking of the PLL is critical for the proper operation of the memory device. The following conditions have to be satisfied for the PLL to lock to the correct frequency:

  • DOFF# must be high.
  • A stable K/K# clock has to be provided for the time specified in the datasheet (20 µs).
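
The lock conditions above can be expressed as a simple check, sketched here in Python. The function names are hypothetical, and the minimum-frequency value in the usage lines is a placeholder rather than a datasheet number; the 20 µs lock time is the figure quoted above.

```python
import math


def pll_lock_cycles(clock_mhz: float, lock_time_us: float = 20.0) -> int:
    """Number of K clock cycles spanned by the PLL lock time."""
    return math.ceil(clock_mhz * lock_time_us)  # MHz x us = cycles


def pll_can_lock(doff_n_high: bool, stable_clock_us: float,
                 clock_mhz: float, min_clock_mhz: float,
                 lock_time_us: float = 20.0) -> bool:
    """Check the lock conditions: DOFF# high, stable K/K# for long enough,
    and a clock at or above the datasheet minimum frequency."""
    return (doff_n_high
            and stable_clock_us >= lock_time_us
            and clock_mhz >= min_clock_mhz)


# A controller could wait this many K cycles before the first access.
print(pll_lock_cycles(633))

# min_clock_mhz=120 is a placeholder; use the value from the datasheet.
print(pll_can_lock(True, stable_clock_us=25, clock_mhz=400, min_clock_mhz=120))
```

In an FPGA design, the same cycle count could be used to size a power-up wait counter in the controller.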

External controllers switch off the PLL of the device to train the memory. When the PLL is off, the maximum operating speed of the system is limited. The FPGA uses this mode to check the operation of the devices before actual memory operations begin.

Read and write operation
The control signals for the read and write operations are RPS#, WPS#, QVLD and BWS#. RPS# is sampled on the rising edge of the K clock, and a read operation is initiated when RPS# is low. A write operation is initiated on the rising edge of the K clock when WPS# is low. BWS# is sampled on the rising edge of the clock and is used to write selectively to one particular byte of the memory. De-selecting BWS# causes the corresponding byte of data to be ignored, so that it is not written to the memory. The trace lengths for the address lines, the ‘D’ lines, and the control lines should be closely matched. QVLD is an output signal that indicates valid output data. The QVLD signal is edge-aligned to the CQ and CQ# lines.
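
The effect of the byte write select can be modeled in a few lines of Python. This is a behavioral sketch only, not a description of the device internals. It assumes an 18-bit word split into two 9-bit bytes with one active-low select bit per byte; the function name and this bit mapping are assumptions, so check the datasheet for the actual BWS# pin assignment.

```python
def apply_byte_write(old_word: int, new_word: int, bws_n: int,
                     byte_width: int = 9, num_bytes: int = 2) -> int:
    """Return the stored word after a write with active-low byte selects.

    Bit i of bws_n is assumed to control byte i; a 0 means "write this byte".
    """
    result = old_word
    for i in range(num_bytes):
        if not (bws_n >> i) & 1:  # active low: 0 selects the byte
            mask = ((1 << byte_width) - 1) << (i * byte_width)
            result = (result & ~mask) | (new_word & mask)
    return result


stored = 0x3FFFF  # all 18 bits set
print(hex(apply_byte_write(stored, 0x00000, bws_n=0b10)))  # only byte 0 written
print(hex(apply_byte_write(stored, 0x00000, bws_n=0b11)))  # nothing written
```

With both selects de-asserted (high), the stored word is unchanged even though a write cycle takes place.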

Programmable output driver impedance
The QDR-II+ SRAM chip has a pin called ‘ZQ.’ A resistor has to be connected between this pin and ground. The value of this resistor sets the output driver impedance. The resistance must be five times the desired output impedance of the driver; to obtain an output impedance of 50 Ω, for example, the resistor on ZQ should be 250 Ω.
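
The five-to-one relationship above is easy to capture in code. The helper names in this Python sketch are hypothetical, and the supported resistor range for a given part has to be taken from its datasheet.

```python
ZQ_RATIO = 5  # ZQ resistor is five times the desired output impedance


def zq_resistor_ohms(output_impedance_ohms: float) -> float:
    """Resistor to connect between ZQ and ground for a target impedance."""
    return ZQ_RATIO * output_impedance_ohms


def output_impedance_ohms(zq_resistor: float) -> float:
    """Output driver impedance that results from a given ZQ resistor."""
    return zq_resistor / ZQ_RATIO


print(zq_resistor_ohms(50))        # resistor for a 50-ohm driver
print(output_impedance_ohms(250))  # impedance set by a 250-ohm resistor
```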

For high-speed digital devices, matching the driver impedance to the transmission-line impedance is critical to the signal integrity of the overall system. The impedance is matched by making the source impedance equal to the load impedance. By changing the value of the resistor connected to the ZQ pin, the Zsource of the output drivers can be changed accordingly (see figure 2).

Zload must be equal to Zline, which in turn must be equal to Zsource. In this case, Zsource is controlled by ZQ. The PCB trace from the memory to the memory controller acts like a transmission line, and Zline is the characteristic impedance of that trace, which can be designed to equal Zsource. Zload should also be matched at the FPGA end. The entire system has to be simulated using the IBIS models available for the memory to determine the actual termination values.

Figure 2: Configuration of output driver impedance.

Termination of signal lines
Signal integrity is a very important aspect of high-speed digital design, and interfacing QDR-II+ SRAM is no exception. The inputs and outputs of QDR-II+ SRAMs use high-speed transceiver logic (HSTL). HSTL is a standard interface for digital ICs that references the signal to a reference voltage rather than to ground. This allows smaller swings in the I/O signal and improves performance by improving signal integrity. HSTL is becoming a de facto standard for high-speed digital systems. HSTL requires a reference voltage level that is 50% of the maximum voltage. This has to be provided to the Vref pin of the QDR-II+ SRAM.

Figure 3: HSTL I/O levels

It is very important to terminate all high-frequency signals because mismatched impedance causes signals to reflect back and forth along the transmission lines, causing ringing and thus impacting the reliability of the system. To eliminate reflection at the source, the impedance of the source must be matched with the transmission line impedance. To eliminate reflection at the load, the impedance of the load must be matched with the impedance of the trace.
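
The size of a reflection can be estimated with the standard transmission-line reflection coefficient, Γ = (Z_load − Z_line) / (Z_load + Z_line). This relation is not given in the article; it is included here as a minimal Python sketch with hypothetical function names and illustrative impedance values.

```python
def reflection_coefficient(z_load_ohms: float, z_line_ohms: float) -> float:
    """Fraction of the incident voltage reflected at an impedance step."""
    return (z_load_ohms - z_line_ohms) / (z_load_ohms + z_line_ohms)


# Matched load: nothing is reflected.
print(reflection_coefficient(50, 50))

# Load higher than the trace impedance: part of the wave bounces back.
print(reflection_coefficient(75, 50))

# Load lower than the trace impedance: the reflection is inverted.
print(reflection_coefficient(25, 50))
```

A coefficient of zero corresponds to the matched case described above; the further the termination is from the trace impedance, the larger the reflected fraction and the more ringing appears on the line.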

Although multiple termination schemes are available, the most popular and recommended method is to terminate the signal at the load with a pull-up resistor to Vddq/2 (see figure 4). This scheme requires a separate voltage source that can sink and source current at the output transfer rates. The value of the pull-up resistor can be adjusted to match the load and ensure proper signal integrity.

Figure 4: Parallel termination at the load.

All input pins of the QDR-II+ SRAM must be terminated for proper signal integrity. The K and K# clocks must each be terminated separately with a pull-up resistor to Vddq/2. The K/K# signals are not fully differential, so a common termination resistor between them is not recommended. The output driver impedance must be matched to the board impedance for best performance. Certain parts in the QDR-II+ families have on-die termination: resistors that are built into the chip and can be programmed according to the termination required. These reduce the number of external components and hence conserve board space.

The block diagram below summarizes all the connections required for designing the hardware for QDR-II+ SRAM.


Figure 5: Hardware connections for the interface between QDR-II+ SRAM and FPGA

This concludes the hardware details of interfacing QDR-II+ SRAM with an FPGA. Part two of this article describes the firmware implementation of a QDR controller in popular FPGAs.

About the author
Reshmi Ravindran works as an Applications Engineer at Cypress Semiconductor and supports Cypress’ SRAM products. She holds a Masters in VLSI & Embedded Systems from Model Engineering College, India and a Bachelors in Electronics and Communication from Govt. Rajiv Gandhi Institute of Technology, India.

