
# An Efficient, High-Performance DSP Architecture for W-CDMA Receivers

In today's world of evolving standards and fast-changing requirements, developers of wireless communication modem systems increasingly rely on programmability. This white paper describes an efficient implementation of a high-performance DSP architecture for wideband code division multiple access (W-CDMA) receivers. The implementation uses the Cadence<sup>®</sup> Tensilica<sup>®</sup> ConnX DSP family, which has been optimized for 3G W-CDMA systems. The design not only provides size and power benefits but also increases design headroom, leaving processing capacity available for future algorithms.

#### Contents

- Introduction to W-CDMA
- Structure of W-CDMA Data Transmissions
- HSPA and HSPA+
- High-Level W-CDMA UE Receiver Block Architecture
- Implementation Using the Tensilica ConnX Processor Family
- Conclusion
- Additional Information

Programmability is increasingly being used in wireless communication modem systems to more easily provide multi-standard support and greater flexibility in handling post-silicon algorithm changes. This programmability not only enables bug fixes and customer specification changes, but also allows design changes based on the state and condition of the communication channels.

The Cadence Tensilica ConnX DSP family, based on the Xtensa<sup>®</sup> customizable processor, is specifically designed for wireless communication modem systems at the physical layer (PHY, Layer 1). To guide product development, Cadence has worked closely with customers and with software PHY companies that create wireless communication algorithms. The Xtensa customizable processor gives modem developers a powerful starting point for system development. You can add instructions to the base instruction set architecture (ISA) from predefined options, or you can create your own using the Tensilica Instruction Extension (TIE) language. The processor is optimized with an automated flow that requires no extra processor verification and keeps the development tools updated with every change. This automated process guarantees that your new processor is correct-by-construction.

This white paper begins with a summary of the algorithms used in W-CDMA modem systems. It then describes in detail how those algorithms can be implemented with the Tensilica ConnX processor family.

**Note:** This white paper is an abridged version of a much longer application note that is available from your Cadence sales representative. The application note details a use case for easily updating an existing long-term evolution (LTE)/LTE-Advanced modem system to become a multi-standard 3G/W-CDMA system.

See the *An Efficient, High-Performance DSP Architecture for LTE and LTE-Advanced User Equipment Receivers* application note for more information on using the Tensilica ConnX DSP family to implement LTE/LTE-Advanced modem systems.

#### Introduction to W-CDMA

W-CDMA systems are used throughout the world for 3rd Generation (3G) cellular radio communications. The 3rd Generation Partnership Project (3GPP), with participation from a number of telecommunication industry representatives, developed the W-CDMA specification into an international standard. The standard is now a component of the Universal Mobile Telecommunications System (UMTS), which is 3GPP's term for the 3G radio technologies developed within the organization.

W-CDMA is based on spread-spectrum modulation techniques and supports both frequency division duplex (FDD) and time division duplex (TDD) modes. In spread-spectrum communications, the transmitted signal is modulated so that it is "spread" over a channel bandwidth many times wider than would be required to deliver the maximum data rate in a dedicated frequency band. The frequency bands specified for W-CDMA are 5MHz wide, and the original 1999 3GPP specification supported data rates of up to 2Mbps. A primary benefit of spread-spectrum modulation is that data is encoded in a controlled manner, so other users' transmissions appear as noise to the receiver. As a result, multiple users can share the same spectrum, which increases overall system capacity and simplifies RF planning.

With code division multiple access (CDMA), multiple simultaneous transmitters and receivers share the channel bandwidth. Each user equipment (UE) device detects only the signals spread with its uniquely assigned code and rejects all others as noise. With direct-sequence spread spectrum (DSSS) techniques, each data symbol is multiplied by the user's unique code and transmitted at a constant "chip rate." The chip rate (3.84Mchip/s in W-CDMA) is therefore much higher than the symbol rate, which varies with the service being provided. The ratio of chip rate to symbol rate is the spreading factor (SF), so high data rate transmissions use a low SF, and vice versa. A critical DSP function in a W-CDMA receiver is to identify the spreading code sequence used by the transmitter and apply that same sequence. This lets the receiver accurately despread the signal and extract the transmitted information.
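The spreading and despreading steps described above can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions, not a W-CDMA implementation. It assumes real-valued ±1 symbols, a hypothetical 8-chip code rather than an actual 3GPP code, and no noise. Despreading correlates each block of SF chips with the code and divides by SF. A second user on an orthogonal code contributes nothing to that correlation. The helper `symbol_rate` (a name chosen here) shows how the symbol rate relates to the chip rate and SF.

```python
CHIP_RATE = 3.84e6  # W-CDMA chip rate, chips per second


def spread(symbols, code):
    """DSSS: multiply each symbol by every chip of the spreading code."""
    return [s * c for s in symbols for c in code]


def despread(chips, code):
    """Correlate each SF-chip block with the code and normalize by SF."""
    sf = len(code)
    return [sum(x * c for x, c in zip(chips[k:k + sf], code)) / sf
            for k in range(0, len(chips), sf)]


def symbol_rate(sf):
    """Symbol rate implied by a spreading factor at the W-CDMA chip rate."""
    return CHIP_RATE / sf


code = [1, -1, 1, 1, -1, -1, 1, -1]   # hypothetical SF = 8 code, illustration only
symbols = [1, -1, -1, 1]
chips = spread(symbols, code)
print(len(chips), despread(chips, code))   # 32 [1.0, -1.0, -1.0, 1.0]

# A second user on an orthogonal code shares the channel but cancels out.
other = spread([1, 1, -1, -1], [1] * 8)
rx = [a + b for a, b in zip(chips, other)]
print(despread(rx, code))                  # [1.0, -1.0, -1.0, 1.0]
print(symbol_rate(256), symbol_rate(4))    # 15000.0 960000.0
```

The printed rates show the inverse relationship between SF and data rate: SF = 256 leaves 15,000 symbols/s, while SF = 4 allows 960,000 symbols/s.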

Signals in W-CDMA communications are spread with two different code types. The orthogonal variable spreading factor (OVSF) codes, also called channelization codes, allow the receiver to separate channels from a single transmitter. The OVSF code length equals the spreading factor. Orthogonality is required to serve multiple users while minimizing interference. Radio network controllers (RNCs) coordinate transmissions from multiple cells within the network by managing the downlink orthogonal codes used by each base station and UE. After the OVSF codes are applied, the transmitter multiplies the data stream by a pseudo-random number (PN) scrambling code. This code distinguishes one transmitter from another: a cell in the downlink, or a UE in the uplink.
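The OVSF codes form a binary tree. Each code C of length n has two children of length 2n: [C, C] and [C, -C]. The sketch below generates the codes for a given SF and checks that codes of the same SF are mutually orthogonal. It then separates two channels after scrambling and descrambling. As a simplifying assumption, the scrambling code is a real-valued ±1 sequence from a seeded random generator. Actual W-CDMA scrambling codes are complex-valued Gold-code segments. The helper names are hypothetical.

```python
import random


def ovsf_codes(sf):
    """Return the OVSF codes C(sf, 0) .. C(sf, sf - 1); sf must be a power of two."""
    if sf < 1 or sf & (sf - 1):
        raise ValueError("spreading factor must be a power of two")
    codes = [[1]]
    while len(codes[0]) < sf:
        children = []
        for c in codes:
            children.append(c + c)                 # C(2n, 2k)     = [C, C]
            children.append(c + [-x for x in c])   # C(2n, 2k + 1) = [C, -C]
        codes = children
    return codes


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def apply_code(chips, seq):
    """Chip-wise multiply by a +/-1 sequence; applying it twice restores the input."""
    return [x * s for x, s in zip(chips, seq)]


codes4 = ovsf_codes(4)
print(codes4)  # [[1, 1, 1, 1], [1, 1, -1, -1], [1, -1, 1, -1], [1, -1, -1, 1]]
print(all(dot(a, b) == 0 for i, a in enumerate(codes4) for b in codes4[i + 1:]))  # True

# Two channels from one transmitter, each on its own SF = 4 code.
ch1 = [s * c for s in (1, -1) for c in codes4[1]]
ch2 = [s * c for s in (-1, -1) for c in codes4[2]]
tx = [a + b for a, b in zip(ch1, ch2)]

rng = random.Random(1)
pn = [rng.choice((1, -1)) for _ in tx]   # hypothetical real-valued scrambling sequence
rx = apply_code(apply_code(tx, pn), pn)   # scramble at the transmitter, descramble at the receiver
ch1_hat = [dot(rx[k:k + 4], codes4[1]) / 4 for k in range(0, len(rx), 4)]
print(ch1_hat)  # [1.0, -1.0]
```

Codes of different lengths are orthogonal only when neither lies on the other's branch of the tree. This is why assigning one code blocks its ancestors and descendants from simultaneous use.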

#### Structure of W-CDMA Data Transmissions

W-CDMA transmissions consist of three categories, or layers, of channels: logical, transport, and physical. These channels enable delivery of system status, control messages, and the main user data. Logical channels designate the type of data that is transmitted—control data channels or user communications data channels.

The transport layer provides the interface between the media access control (MAC) and the physical layer. The transport layer describes detailed characteristics of the data transmissions and how they are to be processed by the physical layer. Transport channels can be dedicated to an individual user for functions such as soft handover, or designated as common channels for any user.

W-CDMA data is transmitted in 10ms radio frames, which contain 38,400 chips. Each frame is further divided into fifteen 0.667ms slots of 2,560 chips, which are allocated to user and control data. User data is transmitted in a dedicated physical data channel (DPDCH) for both downlink (DL) and uplink (UL). A dedicated physical control channel (DPCCH) is also associated with both the UL and DL DPDCH.

In the uplink, the DPDCH and DPCCH are transmitted as two separate physical channels. Within each slot, they are I/Q code-multiplexed using dual-channel quadrature phase shift keying (QPSK) modulation. The DPDCH data rate can vary from frame to frame, with an SF from 4 to 256. The uplink DPCCH has an SF of 256. It contains pilot bits that enable channel estimation, followed by transmit power control (TPC) bits, feedback information (FBI) bits, and the transport format combination indicator (TFCI) field. The TFCI tells the receiver the bit rate, channel decoding, and interleaving parameters for every DPDCH frame.

In the downlink, the DPDCH and DPCCH are time-multiplexed within each slot on a combined I/Q branch, forming one QPSK-modulated data stream. The downlink DPCCH contains pilot bits, TPC, and TFCI fields, but no FBI.

Base stations broadcast system and cell-specific information on the downlink broadcast channel (BCH). The BCH transport channel is mapped onto the primary common control physical channel (P-CCPCH). The BCH is transmitted continuously over the entire cell to all users at a low, fixed bit rate. UE receivers must decode the BCH to register a connection with a cell.

W-CDMA communications require precise timing synchronization between transmitter and receiver, both to correctly link UE with base stations and to properly demodulate the data stream. Every base station transmits a common pilot channel (CPICH) in the downlink. The UE uses the CPICH to estimate timing for signal demodulation and to select the best cell for establishing a communications link. The 3GPP specification describes two types of CPICH, primary (P-CPICH) and secondary (S-CPICH). Both are fixed-rate (30Kbps, SF = 256) physical channels that carry a predefined bit/symbol sequence. Each cell has one P-CPICH, which always uses the same channelization code. The W-CDMA specification allows dual-antenna transmit diversity on any downlink channel in the cell. In that case, the CPICH is transmitted from both antennas with the same channelization and scrambling code, but with a different predefined symbol sequence for Antenna 1 and Antenna 2. Each cell may optionally have one or more S-CPICHs, each using an arbitrary channelization code with an SF of 256.
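The 3.84Mchip/s chip rate ties together the frame, slot, and rate figures in this section. The short calculation below reproduces the 38,400-chip frame, the 2,560-chip slot, and the 30Kbps CPICH rate. It assumes one bit per symbol on each uplink I or Q branch and two bits per QPSK symbol in the downlink. `channel_rate` is a hypothetical helper name.

```python
CHIP_RATE = 3_840_000          # chips per second
FRAMES_PER_SECOND = 100        # one 10 ms radio frame every 1/100 s
SLOTS_PER_FRAME = 15

CHIPS_PER_FRAME = CHIP_RATE // FRAMES_PER_SECOND        # 38,400
CHIPS_PER_SLOT = CHIPS_PER_FRAME // SLOTS_PER_FRAME     # 2,560
SLOT_MS = 1000 / (FRAMES_PER_SECOND * SLOTS_PER_FRAME)  # about 0.667 ms


def channel_rate(sf, bits_per_symbol):
    """Return (bits per slot, bits per second) for a channel spread with `sf`."""
    symbols_per_slot = CHIPS_PER_SLOT // sf
    bits_per_slot = symbols_per_slot * bits_per_symbol
    return bits_per_slot, bits_per_slot * SLOTS_PER_FRAME * FRAMES_PER_SECOND


print(CHIPS_PER_FRAME, CHIPS_PER_SLOT, f"{SLOT_MS:.3f} ms")  # 38400 2560 0.667 ms
print(channel_rate(256, 1))  # uplink DPCCH, one bit per symbol on its branch: (10, 15000)
print(channel_rate(4, 1))    # uplink DPDCH at SF = 4: (640, 960000)
print(channel_rate(256, 2))  # CPICH, two bits per QPSK symbol: (20, 30000)
```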

The W-CDMA specification provides two synchronization channels that support cell search. They use their own spreading codes, which are separate from the OVSF and PN codes:

- Primary synchronization channel (P-SCH)—Transmitted during the first 256 chips of every time slot to enable the UE to synchronize timing with the base station. The P-SCH is the same for all cells.
- Secondary synchronization channel (S-SCH)—One of 16 secondary synchronization codes (SSCs) is transmitted at the beginning of every time slot, in parallel with the P-SCH. The UE decodes the sequence of 15 consecutive SSCs that spans one frame. This establishes frame synchronization and identifies which of the 64 scrambling code groups the cell belongs to; each group contains eight primary scrambling codes.
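The frame-synchronization step can be pictured as a lookup. The UE detects one secondary synchronization code (SSC) index per slot. It then compares the 15 detected indices against every scrambling code group at every cyclic shift. The best match identifies the group, and the winning shift gives the slot number of the first observed slot, which locates the frame boundary. In the sketch below, the codebook is a seeded random stand-in, not the actual 3GPP allocation table. The real sequences are constructed so that their cyclic shifts are unique. `decode_group` is a hypothetical name.

```python
import random

N_GROUPS, SLOTS, N_SSC = 64, 15, 16

# Hypothetical stand-in for the 3GPP S-SCH allocation table: one sequence of
# 15 SSC indices (1..16) per scrambling code group. NOT the real table.
rng = random.Random(2015)
CODEBOOK = [[rng.randint(1, N_SSC) for _ in range(SLOTS)] for _ in range(N_GROUPS)]


def decode_group(observed):
    """Match 15 detected SSC indices against every group and cyclic shift.

    Returns (group, shift, score). `shift` is the slot number, within the
    frame, of the first observed slot, which locates the frame boundary.
    """
    best = (-1, -1, -1)
    for g, seq in enumerate(CODEBOOK):
        for s in range(SLOTS):
            score = sum(observed[i] == seq[(i + s) % SLOTS] for i in range(SLOTS))
            if score > best[2]:
                best = (g, s, score)
    return best


# The UE starts listening at slot 5 of a frame from group 37 and mis-detects one SSC.
obs = [CODEBOOK[37][(i + 5) % SLOTS] for i in range(SLOTS)]
obs[3] = obs[3] % N_SSC + 1   # replace with a different, wrong index
print(decode_group(obs))      # (37, 5, 14)
```

Scoring by the number of matching slots, rather than requiring an exact match, lets a single SSC detection error still produce the correct group and frame timing.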

#### HSPA and HSPA+

The W-CDMA specification, originally released by 3GPP in 1999, has gone through many iterations to support new features as the technology evolved. High-speed downlink packet access (HSDPA), incorporated in 2002, added a new transport channel, the high-speed downlink shared channel (HS-DSCH), which enables data rates of up to 10Mbps. HSDPA introduced 16QAM modulation, adaptive modulation and coding (AMC), and hybrid automatic repeat request (HARQ).

Subsequent releases added the following:

- High-speed uplink packet access (HSUPA).
- Upgrades for evolved high-speed packet access (HSPA+).
- Carrier aggregation, which improves resource utilization and spectrum efficiency through joint resource allocation and load balancing across multiple downlink carriers.

Later additions brought multiple-input/multiple-output (MIMO) antenna capability and higher-order modulation, with 16QAM in the uplink and 64QAM in the downlink. Cat 20 combines MIMO with 64QAM. HSPA+ was designed for uplink speeds of 11Mbps and downlink speeds of 42Mbps. A subsequent introduction added MIMO and 64QAM in conjunction with carrier aggregation (Cat 28).

Table 1 summarizes the W-CDMA/HSPA UE categories considered here. Two features distinguish the categories in the table: the number of carriers and the number of data streams.

| Category | Number of Carriers | Number of Data<br>Streams |
|----------|--------------------|---------------------------|
| 14       | 1                  | 1                         |
| 20       | 1                  | 2                         |
| 24       | 2                  | 1                         |
| 28       | 2                  | 2                         |

Table 1: Receiver configurations per UE category

#### High-Level W-CDMA UE Receiver Block Architecture

Figure 1 shows a top-level diagram of a W-CDMA UE receiver. The analog-to-digital converter (ADC) in the receiver's radio frequency block samples the baseband signals from the antenna(s) at a rate of Nosf × Fc, where Fc is the transmission chip rate and Nosf is an oversampling factor.

Figure 1 shows a two-antenna receiver, but more antennas can be used. A square-root-raised-cosine (SRRC) filter (SRRC1) filters and pulse-shapes the ADC samples, which minimizes inter-symbol interference (ISI). A demultiplexer at the output of the SRRC filter (not shown) generates Nosf sample streams, each at the 3.84MHz chip rate Fc. After demultiplexing and filtering, a frequency offset compensation (FOC) block de-rotates the received signal to remove any residual carrier frequency offset. The successive interference cancellation (SIC) block produces estimates of undesired users' signals. The receiver subtracts these estimates from the frequency-offset-compensated signal before passing it to the synchronization and chip processing blocks.
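The SRRC pulse shaping and the demultiplexing into Nosf chip-rate streams can be sketched as follows. The filter uses the standard root-raised-cosine impulse response with the W-CDMA roll-off factor of 0.22, sampled at Nosf samples per chip. The oversampling factor of 2, the 6-chip filter span, and the function names are illustrative assumptions. A production filter would use fixed-point coefficients.

```python
import math

ROLL_OFF = 0.22  # W-CDMA pulse-shaping roll-off factor


def rrc_tap(t, beta=ROLL_OFF):
    """Root-raised-cosine impulse response at time t, in chips (T = 1 chip)."""
    if abs(t) < 1e-12:
        return 1 - beta + 4 * beta / math.pi
    if abs(abs(t) - 1 / (4 * beta)) < 1e-12:
        return (beta / math.sqrt(2)) * (
            (1 + 2 / math.pi) * math.sin(math.pi / (4 * beta))
            + (1 - 2 / math.pi) * math.cos(math.pi / (4 * beta)))
    num = (math.sin(math.pi * t * (1 - beta))
           + 4 * beta * t * math.cos(math.pi * t * (1 + beta)))
    den = math.pi * t * (1 - (4 * beta * t) ** 2)
    return num / den


def rrc_taps(nosf, span_chips):
    """Filter taps at nosf samples per chip, covering +/- span_chips."""
    return [rrc_tap(k / nosf) for k in range(-span_chips * nosf, span_chips * nosf + 1)]


def fir(x, h):
    """Direct-form FIR filter with zero initial state; output has len(x) samples."""
    return [sum(h[j] * x[n - j] for j in range(min(n + 1, len(h)))) for n in range(len(x))]


def demux(samples, nosf):
    """Split an Nosf x Fc sample stream into nosf sample-phase streams at Fc."""
    return [samples[p::nosf] for p in range(nosf)]


NOSF = 2
taps = rrc_taps(NOSF, span_chips=6)    # 25 taps (illustrative length)
print(len(taps), round(taps[len(taps) // 2], 4))   # 25 1.0601
filtered = fir([1.0] + [0.0] * 24, taps)           # impulse in, taps out
phases = demux(filtered, NOSF)                      # two streams at the chip rate
print([len(p) for p in phases])                     # [13, 12]
```

The general formula divides by zero at t = 0 and at t = ±1/(4β), so the two special cases in `rrc_tap` return the corresponding limit values.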

#### Synchronization block

The synchronization block performs a variety of functions:

- Locating and identifying base stations.
- Discovering the primary scrambling code for a cell.
- Decoding the BCH.
- Establishing frame and slot synchronization.
- Managing handover between base stations.

It consists of two main parts:

- Cell search and handover manager blocks—The cell search block runs whenever the UE is in operation. The handover manager block triggers the synchronization state machine into operation upon initial UE power-up, or when a cell handover is required.
- Synchronization state machine—The state machine processes signals to establish frame and slot synchronization and acquire other cell information, then becomes idle until commanded to run again by the handover manager.
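Slot synchronization, the first stage of the standard three-step W-CDMA cell search, can be sketched as a matched filter. The received chips are correlated with the 256-chip primary synchronization code at every candidate offset within a 2,560-chip slot. The correlation magnitude is accumulated over several slots so that the true slot boundary stands out. This sketch assumes a real-valued, randomly generated stand-in for the primary synchronization code and Gaussian noise. The real code is complex-valued, and the names are hypothetical.

```python
import random

PSC_LEN, SLOT_LEN = 256, 2560

rng = random.Random(7)
psc = [rng.choice((1.0, -1.0)) for _ in range(PSC_LEN)]  # hypothetical stand-in for the real PSC


def slot_timing(rx, n_slots):
    """Return the chip offset (0..2559) where the accumulated PSC correlation peaks."""
    energy = [0.0] * SLOT_LEN
    for slot in range(n_slots):
        base = slot * SLOT_LEN
        for off in range(SLOT_LEN):
            seg = rx[base + off: base + off + PSC_LEN]
            energy[off] += abs(sum(a * b for a, b in zip(seg, psc)))  # non-coherent accumulation
    return max(range(SLOT_LEN), key=energy.__getitem__)


# Synthetic input: unit-variance Gaussian noise plus the PSC at the start of each slot.
TRUE_OFFSET, N_SLOTS = 1234, 3
rx = [rng.gauss(0.0, 1.0) for _ in range((N_SLOTS + 1) * SLOT_LEN)]
for slot in range(N_SLOTS):
    start = slot * SLOT_LEN + TRUE_OFFSET
    for i, c in enumerate(psc):
        rx[start + i] += c

found = slot_timing(rx, N_SLOTS)
print(found)  # 1234
```

Accumulating magnitudes rather than complex values keeps the accumulation robust when the channel phase drifts between slots. The brute-force loop is written for clarity, not speed.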

#### Chip processing block

All radio receivers are susceptible to multipath distortion. This occurs when multiple copies of a signal arrive after traveling different paths due to reflections, which produces signal delays and phase differences. The spreading process in W-CDMA uses code sequences that help the receiver distinguish a signal from its echoes. The path searcher in the chip processing block determines the time delay profile of the strongest received multipath signals. The chip processing block then uses these delays in the functions of a RAKE receiver. A RAKE receiver processes the signal along several delay paths in parallel, correlates each path with the scrambling code, and combines the results. Each correlator is referred to as a "finger" and represents one of the received signal images.
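A RAKE receiver can be sketched as a set of despreaders, one per path delay reported by the path searcher, whose outputs are then combined. The example below uses maximal-ratio combining (MRC), which weights each finger by the conjugate of its channel gain. It makes the following assumptions:

- Ideal channel estimates.
- A hypothetical three-path channel.
- BPSK symbols.
- A random ±1 sequence in place of the scrambled spreading code.
- No noise.

```python
import random

SF = 64
rng = random.Random(3)
symbols = [rng.choice((1, -1)) for _ in range(20)]                # BPSK data
seq = [rng.choice((1, -1)) for _ in range(len(symbols) * SF)]     # stand-in scrambled code
tx = [symbols[n // SF] * seq[n] for n in range(len(seq))]

# Hypothetical three-path channel: (delay in chips, complex gain).
paths = [(0, 1.0), (3, 0.6j), (7, -0.4 + 0.2j)]
rx = [0j] * (len(tx) + max(d for d, _ in paths))
for d, g in paths:
    for n, x in enumerate(tx):
        rx[n + d] += g * x


def finger(rx, delay, n):
    """Despread symbol n from the path that arrives `delay` chips late."""
    base = n * SF
    return sum(rx[base + delay + k] * seq[base + k] for k in range(SF)) / SF


def rake(rx, fingers, n_symbols):
    """Maximal-ratio combining: weight each finger by its conjugate channel gain."""
    return [sum(complex(g).conjugate() * finger(rx, d, n) for d, g in fingers)
            for n in range(n_symbols)]


combined = rake(rx, paths, len(symbols))   # ideal channel estimates assumed
decisions = [1 if z.real > 0 else -1 for z in combined]
print(decisions == symbols)  # True
```

The conjugate weights align the phases of the fingers, and stronger paths contribute proportionally more. Residual cross-correlation between the delayed copies of the code is what limits a real RAKE receiver.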

The chip processing block also performs frequency offset estimation and channel estimation. Channel estimation and correction are performed separately for each of the DPDCH, DPCCH, and high-speed shared channels (HS-SCH).
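Frequency offset and channel estimation from a known pilot can be sketched with a delay-and-multiply estimator. A residual offset of f Hz rotates consecutive pilot symbols by 2πf·Ts, so the angle of the summed products r[n+1]·conj(r[n]) yields f. The pilots are then de-rotated and averaged to estimate the complex channel gain. This sketch assumes the following:

- CPICH-like pilots at SF = 256, so Ts ≈ 66.7µs.
- A known pilot symbol of 1+j.
- Noise-free synthetic data.

The function names are hypothetical.

```python
import cmath
import math

CHIP_RATE = 3.84e6
TS = 256 / CHIP_RATE     # pilot symbol period at SF = 256 (about 66.7 us)
PILOT = 1 + 1j           # assumed known pilot symbol


def estimate_freq_offset(pilots):
    """Delay-and-multiply estimator: average phase step between consecutive pilots."""
    acc = sum(b * a.conjugate() for a, b in zip(pilots, pilots[1:]))
    return cmath.phase(acc) / (2 * math.pi * TS)


def derotate(samples, f_hz):
    """Remove a frequency offset of f_hz from symbol-spaced samples."""
    return [x * cmath.exp(-2j * math.pi * f_hz * n * TS) for n, x in enumerate(samples)]


def estimate_channel(pilots):
    """Strip the known pilot symbol and average to estimate the complex gain."""
    return sum(p / PILOT for p in pilots) / len(pilots)


# Synthetic, noise-free pilots: channel gain h with a 200 Hz residual offset.
h, f_true = 0.8 * cmath.exp(0.5j), 200.0
rx = [h * PILOT * cmath.exp(2j * math.pi * f_true * n * TS) for n in range(30)]
f_est = estimate_freq_offset(rx)
h_est = estimate_channel(derotate(rx, f_est))
print(round(f_est, 3), abs(h_est - h) < 1e-9)  # 200.0 True
```

At this symbol spacing, the estimator is unambiguous only for offsets below 1/(2Ts) = 7.5kHz. With noise, averaging over more pilots trades tracking speed for accuracy.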

The chip processing block outputs soft symbol estimates for each of the three channels. The DPCCH decoder block operates on the DPCCH control data, while a soft symbol decoder coupled to a turbo decoder processes the soft DPDCH data symbols. The turbo decoder output is sent to the SIC block, which uses it to estimate the interference terms. These estimates are subtracted from the baseband signal at the output of the FOC block. This feedback process runs over multiple iterations.
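One SIC iteration follows three steps. First, the receiver detects the strongest user. Next, it regenerates that user's chip-level contribution by respreading the decisions. Finally, it subtracts this contribution from the received signal before detecting the next user. The sketch below makes several simplifying assumptions:

- Hard decisions stand in for the turbo decoder output.
- The strong user's amplitude (4.0) is known.
- The codes are hypothetical random ±1 sequences that are not orthogonal.
- There is no noise.

```python
import random

SF = 16
rng = random.Random(11)


def spread(symbols, code):
    return [s * c for s in symbols for c in code]


def despread(chips, code):
    return [sum(x * c for x, c in zip(chips[k:k + SF], code)) / SF
            for k in range(0, len(chips), SF)]


code_a = [rng.choice((1, -1)) for _ in range(SF)]   # hypothetical codes; not orthogonal
code_b = [rng.choice((1, -1)) for _ in range(SF)]
sym_a = [rng.choice((1, -1)) for _ in range(8)]
sym_b = [rng.choice((1, -1)) for _ in range(8)]
AMP_A, AMP_B = 4.0, 0.5                              # strong and weak user amplitudes
rx = [AMP_A * a + AMP_B * b for a, b in zip(spread(sym_a, code_a), spread(sym_b, code_b))]

# Stage 1: detect the strong user (hard decisions stand in for decoder output).
dec_a = [1 if y > 0 else -1 for y in despread(rx, code_a)]
# Stage 2: regenerate its chips and subtract them from the received signal.
residual = [r - AMP_A * x for r, x in zip(rx, spread(dec_a, code_a))]
# Stage 3: the weak user is now detected free of the strong user's interference.
print(despread(residual, code_b) == [AMP_B * s for s in sym_b])  # True
```

Here, the regenerated signal matches the strong user exactly, so the residual contains only the weak user. With decoding errors or amplitude mismatch, some interference would remain, which is one reason the receiver iterates.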



Figure 1: Top-level W-CDMA receiver block diagram

#### Implementation Using the Tensilica ConnX Processor Family

A complete Category 24 (Cat24) W-CDMA UE receiver implementation, which processes two carriers with a single data stream, requires just four Tensilica DSPs operating at a 350MHz clock rate. Figure 2 shows a block architecture for a Cat24 receiver based on two clusters: a front-end and DPCCH processing cluster (A), and a sync, DPDCH, and HS-SCH cluster (B).

The receiver uses two instances of the Tensilica ConnX BBE32EP DSP. The ConnX BBE32EP is a high-performance, very low-power DSP built around a vector pipeline of 32 multiplier-accumulators (MACs). Its 16b × 16b multipliers support signed and unsigned operands. Associated adder and multiplexer trees enable operations such as matrix computation, parallel complex multiplication, and filtering.
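A real-valued MAC array maps onto complex arithmetic in a simple way. One complex multiply-accumulate needs four real multiplies: (ar + j·ai)(br + j·bi) = (ar·br - ai·bi) + j(ar·bi + ai·br). The sketch below performs that decomposition on Q15 fixed-point operands with wide integer accumulators. By this count, 32 real MACs could sustain up to eight complex MACs per cycle if every multiplier is kept busy. This figure is simple arithmetic, not a published throughput number. The Q15/Q30 formats and function names are assumptions.

```python
def to_q15(x):
    """Quantize a float in [-1, 1) to a signed 16-bit Q15 integer (saturating)."""
    return max(-32768, min(32767, round(x * 32768)))


def cmac_q15(acc, a, b):
    """acc += a * b for complex Q15 operands using four real 16b x 16b products.

    `acc` is a (re, im) pair of wide integer accumulators holding Q30 values.
    """
    (ar, ai), (br, bi) = a, b
    re = acc[0] + ar * br - ai * bi   # two real multiplies
    im = acc[1] + ar * bi + ai * br   # two more real multiplies
    return re, im


def complex_dot_q15(xs, ys):
    """Complex dot product of Q15 vectors, returned as floats for inspection."""
    acc = (0, 0)
    for a, b in zip(xs, ys):
        acc = cmac_q15(acc, a, b)
    return acc[0] / 2**30, acc[1] / 2**30


x = [(to_q15(0.5), to_q15(0.25))]    # 0.5 + 0.25j
y = [(to_q15(0.5), to_q15(-0.5))]    # 0.5 - 0.5j
print(complex_dot_q15(x, y))         # (0.375, -0.125)

REAL_MACS = 32
print(REAL_MACS // 4)  # at most 8 complex MACs per cycle, if all 32 MACs are busy
```

The accumulators are kept wider than the 32-bit products so that long dot products, such as filter taps or correlations, do not overflow before the final rounding step.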

Table 2 summarizes the Tensilica ConnX processors used in the design.

| ConnX Processor | Type                   | Description |
|-----------------|------------------------|-------------|
| BBE32           | Wireless communication | 32-MAC DSP with an instruction set that accelerates DSP filter routines and OFDM algorithms. |
| SSP16           | Wireless communication | Specifically optimized for soft-bit processing. The architecture and instruction set are defined to offer very high performance for wireless communication algorithms such as HARQ and soft demapping. |
| BSP3            | Wireless communication | Accelerator specifically optimized for bit processing in the encode path. The architecture and instruction set are defined to offer very high performance for wireless communication algorithms such as bit interleaving and cyclic redundancy check (CRC). |

Table 2: Tensilica ConnX processors used in implementation

Figure 2 shows an example of how the Tensilica ConnX processors are combined with hardware blocks in a W-CDMA implementation. Clusters A and B use the ConnX BBE32EP DSP. The despread offload computation accelerator block can be implemented as a custom hardware RTL block. Alternatively, it can be an Xtensa customizable processor with instructions optimized for the cell search, path search, and DPDCH and HS-SCH despreading operations. The FFT block can be implemented as a custom hardware block for 2K FFT computation or in an Xtensa processor. Similarly, the turbo decoder block can be implemented as a hardware block or in an Xtensa processor. The DigRF and rate dematcher blocks are hardware blocks.



Figure 2: W-CDMA receiver implementation with Tensilica ConnX processors (W-CDMA UE Cat24 receiver reference architecture)

Table 3 describes the functionality of the Tensilica ConnX processors used in this Cat24 implementation.

| Path        | ConnX Processor | L1 Memory (KB) | Functions |
|-------------|-----------------|----------------|-----------|
| Receiver    | BBE32EP (A)     | 64             | Symbol-rate operations:<br>SRRC filtering<br>Frequency offset correction<br>DPCCH channel estimation, demodulation<br>DPCCH despreading |
| Receiver    | BBE32EP (B)     | 32             | Slot-rate/frame-rate operations:<br>Cell search<br>Frame synchronization<br>Code generation<br>DPDCH and HS-SCH channel estimation, demodulation, and soft symbol demapping |
| Receiver    | SSP16           | 64             | Soft symbol decoding<br>Deinterleaving<br>Transport channel demultiplexing<br>Physical channel/radio frame collection<br>Rate dematching<br>HARQ chase combining |
| Transmitter | BSP3            | 32             | Transmission data encoding:<br>Physical channel mapping<br>Interleaving<br>Physical channel segmentation<br>Radio frame equalization<br>HARQ functionality<br>Rate matching<br>Channel encoding<br>Code block segmentation<br>Transport block concatenation<br>CRC check |

Table 3: Summary of ConnX processors used for Cat24 receiver implementation
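Soft values make the HARQ chase combining listed for the SSP16 easy to illustrate. Each retransmission carries the same coded bits, so the receiver adds the log-likelihood ratios (LLRs) of all copies in a per-process soft buffer before decoding. The sketch below uses hand-picked, illustrative LLR values, where a positive value means bit 0. Each transmission alone has one bit error, and combining the two corrects it. The class name and sign convention are assumptions. A real receiver would pass the combined LLRs to the turbo decoder rather than slice them.

```python
def hard_decision(llrs):
    """Slice LLRs; a positive LLR means bit 0 is more likely (assumed convention)."""
    return [0 if llr > 0 else 1 for llr in llrs]


class ChaseBuffer:
    """Soft buffer for one HARQ process. Chase combining adds the LLRs of
    identical retransmissions before they are passed to the channel decoder."""

    def __init__(self, n_bits):
        self.llrs = [0.0] * n_bits

    def combine(self, rx_llrs):
        self.llrs = [acc + new for acc, new in zip(self.llrs, rx_llrs)]
        return self.llrs


sent = [0, 1, 1, 0]
first = [2.1, -0.4, 0.3, 1.5]    # illustrative LLRs; bit 2 is wrong on its own
second = [1.2, 0.3, -1.1, 0.2]   # illustrative LLRs; bit 1 is wrong on its own

buf = ChaseBuffer(len(sent))
buf.combine(first)
combined = buf.combine(second)
print(hard_decision(first), hard_decision(second), hard_decision(combined))
# [0, 1, 0, 0] [0, 0, 1, 0] [0, 1, 1, 0]
```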

#### Conclusion

This white paper provides an overview of the W-CDMA algorithms. It also describes a well-balanced implementation of an optimized, software-programmable modem using the ConnX processor family, which is optimized for 3G and W-CDMA algorithms. The implementation assigns each computation domain to an optimized DSP:

- The ConnX BBE32EP performs complex-domain computation.
- The ConnX SSP16 performs soft-bit processing in parallel.
- The ConnX BSP3 performs hard-bit processing.

Compared to a single DSP, this solution fits the pipelined computation between the different domains more efficiently. It gives the system developer higher performance at a lower clock frequency, with more headroom for future algorithms. The optimized processors and lower clock frequency enable a smaller, lower-energy solution than one using a single type of DSP. In addition, the ConnX BSP3 acts as the system controller, since it is architected for high efficiency in system control code. This eliminates the need for a dedicated system controller.

Cadence Tensilica ConnX processors offer a variety of benefits to W-CDMA and multi-standard modem developers.

- Small size, low energy—Computations can be offloaded and balanced across the optimized ConnX processor family. This achieves more computation per clock cycle, which means lower system MHz and lower energy.
- More headroom—Reducing the required system MHz frees capacity for other future algorithms.
- Ease of design—This implementation supports modular partitioning of functional algorithms. It is fully supported by DSP libraries and an advanced auto-vectorizing compiler, so there is no need to write assembly code.


#### Additional Information

For more information on the unique abilities and features of Cadence Tensilica Xtensa processors, see ip.cadence.com.



Cadence Design Systems enables global electronic design innovation and plays an essential role in the creation of today's electronics. Customers use Cadence software, hardware, IP, and expertise to design and verify today's mobile, cloud and connectivity applications. www.cadence.com

© 2015 Cadence Design Systems, Inc. All rights reserved. Cadence, the Cadence logo, Tensilica and Xtensa are registered trademarks of Cadence Design Systems, Inc. All others are properties of their respective holders. 3507 6/15 SC/DM/PDF