Tensilica FloatingPoint DSP Family

Specially designed for floating-point processing with exceptional PPA

The Cadence Tensilica FloatingPoint family of high-performing digital signal processors (DSPs) is specially designed for floating-point-centric processing while providing exceptional power, performance, and area (PPA). The Tensilica FloatingPoint DSPs offer a wide range of software-compatible scalability, from 128-bit to 1024-bit vector width. This scalability, combined with the configurability of the FloatingPoint DSPs, gives SoC designers the flexibility to design for a broad spectrum of applications, ranging from energy-efficient solutions for battery-operated devices to high-performance computing (HPC).

Overview

Optimized with various performance-enhancing features, Tensilica FloatingPoint DSPs offer outstanding performance per unit area and performance per unit power in floating-point computation for a wide range of applications. Based on the Tensilica Xtensa 32-bit RISC micro-architecture, the family (Figure 1) comprises the Tensilica FloatingPoint KP1 DSP, the Tensilica FloatingPoint KP6 DSP, the Tensilica FloatingPoint KQ7 DSP, and the Tensilica FloatingPoint KQ8 DSP.

Figure 1: The Tensilica FloatingPoint DSP family

Features

  • Single-instruction, multiple-data (SIMD) vector processing
  • Very long instruction word (VLIW) for parallel load/store, FMA, and ALU ops 
  • IEEE 754 vector floating-point options
    • Half precision
    • Single precision
    • Double precision
  • 8-bit/16-bit/32-bit/64-bit ALU 
  • 16b x 16b fixed-point/integer multiply-add 
  • Performance-enhanced fused multiply-add (FMA)
  • Optimized complex arithmetic 
  • Enhanced FFT, convolution, matrix, filter operations 
  • Vector divide, RECIP, RSQRT, and SQRT
  • Predicated vector instructions 
  • Xtensa LX Secure Mode option 
  • Scatter-gather option 
  • iDMA option
  • Faster clock speed compared to a fixed-point DSP with a floating-point unit
  • High-performance C/C++ compiler with automatic vectorization of scalar C code and full support for vector data 
  • Rich, optimized library support: Eigen library, NatureDSP library, SLAM library, and math library
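
To make the "predicated vector instructions" feature concrete, the following is a minimal scalar reference model of a predicated (masked) vector add, written in plain C. It is a sketch only: `predicated_add` is a hypothetical helper rather than a Cadence intrinsic, and the choice that inactive lanes keep their previous value is an assumption, since some instruction sets zero them instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Scalar reference model of a predicated (masked) vector add.
   ASSUMPTION: lanes whose predicate is false keep their previous dst value;
   a real ISA may zero them instead. Hypothetical helper, not a Cadence API. */
static void predicated_add(size_t lanes, const bool *pred,
                           const float *a, const float *b, float *dst)
{
    assert(lanes == 0 || (pred && a && b && dst));
    for (size_t i = 0; i < lanes; ++i) {
        if (pred[i]) {
            dst[i] = a[i] + b[i];   /* active lane: compute and write */
        }
        /* inactive lane: dst[i] is left untouched */
    }
}
```

Predication of this kind lets a loop with a data-dependent condition, or a loop tail shorter than the vector width, run in vector form without a separate scalar clean-up path.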

 

Tensilica FloatingPoint DSPs   KP1    KP6    KQ7    KQ8
Xtensa Platform                LX     LX     NX     NX
Vector Width (b)               128    512    512    1024
Xtensa LX Secure Mode          •      •
8b/16b/32b/64b ALU Ops         •      •      •      •
Half Precision                        •      •      •
Single Precision               •      •      •      •
Double Precision               •      •      •      •

Benefits

  • High performance and low energy consumption for a wide range of application algorithms, including artificial intelligence/machine learning (AI/ML), motor control, battery management, sensor fusion, object tracking, augmented reality/virtual reality (AR/VR), and HPC
  • Built upon a 32-bit scalar RISC processor for efficient execution of control code 
  • Scalable – A common ISA across the family, from the ultra-low-energy, very small 128-bit FloatingPoint KP1 DSP to the super-high-performance 1024-bit FloatingPoint KQ8 DSP, provides a solution for each PPA budget and makes migration easy
  • Configurable – Select only the vector packages you need, without unnecessary hardware
  • Extensible – Further enhance performance and differentiation with a customizable instruction set defined in the Verilog-like Tensilica Instruction Extension (TIE) language
  • Virtually unlimited bandwidth from custom FIFO, lookup, and GPIO interfaces 
  • Xtensa LX Secure Mode provides an option to partition the memory space into secure and non-secure regions, thereby allowing the protection of secure memory from untrusted code
  • Fast development through familiar C programming in an Eclipse-based IDE along with optimized software libraries and application examples with source code 
  • Full support for hardware/software co-design 
  • Easy integration into SystemC simulations with functional, cycle-accurate, and hardware pin-level models

Figure 2: Block diagram of Tensilica FloatingPoint KQ7 and KQ8 DSPs

Scalable, Configurable, and Extensible

The highly scalable Tensilica FloatingPoint DSP family lets the SoC designer choose a solution that fits the PPA budget envelope. For energy-sensitive applications, the FloatingPoint KP1 DSP offers an ultra-low-energy solution. The FloatingPoint KP6 DSP provides balanced high performance in a small area, yielding excellent performance per unit area. If much higher performance and clock speed are required, the FloatingPoint KQ7 and KQ8 DSPs (Figure 2) deliver superior vector floating-point throughput. All of the Tensilica FloatingPoint DSPs share a common ISA, making software portability and migration easy.
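
As a rough illustration of what the family's vector widths mean for data parallelism, the number of SIMD lanes per vector is the vector width divided by the element size. The helper below is a sketch under that assumption only. `simd_lanes` is a hypothetical name, and the actual per-cycle throughput of each DSP depends on configuration details that are not given here.

```c
#include <assert.h>
#include <stddef.h>

/* Number of SIMD lanes for a given vector width and element size (in bits).
   Illustrative helper only; hypothetical, not part of any Cadence API. */
static size_t simd_lanes(size_t vector_bits, size_t element_bits)
{
    assert(element_bits != 0 && vector_bits % element_bits == 0);
    return vector_bits / element_bits;
}
```

For example, `simd_lanes(128, 32)` gives 4 single-precision lanes at the 128-bit width, `simd_lanes(512, 64)` gives 8 double-precision lanes at the 512-bit width, and `simd_lanes(1024, 16)` gives 64 half-precision lanes at the 1024-bit width.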

The Tensilica FloatingPoint DSPs offer easy, checkbox-style configurability for pre-verified instruction options. This simple approach to defining a DSP core integrates each selected feature seamlessly into the hardware, the compiler, the modeling tools, and the verification scripts. These capabilities let the solution designer build an optimized, custom DSP with minimal to no impact on the development schedule, compared to what a hardware design change would typically incur.

The performance of Tensilica FloatingPoint DSPs can be further enhanced and differentiated using the TIE language. Custom operations defined in the Verilog-like TIE language are automatically integrated into and recognized by the Xtensa tool chain. The FloatingPoint DSPs can also be extended to support custom interfaces, such as queues and ports, for efficient connection to external hardware blocks. These custom interfaces can be defined to match the interfaces of existing third-party IP. Hence, the FloatingPoint DSPs can access hardware offload accelerators in a deterministic single- or multi-cycle operation, greatly reducing power consumption without impacting the shared system bus.

High-Performance Floating-Point Processing, Energy Efficiency, and Small Area Footprint

Floating-point numbers are used in most technical and engineering computations. Some designers select a floating-point format because it handles the dynamic range of the data values easily, and some simply choose to run the floating-point code generated by signal processing modeling tools. Running the floating-point code produced by modeling tools speeds time to market and reduces the scope of the project, because the code does not have to be converted to a fixed-point version.

In applications that process large or unpredictable data sets, using floating-point numbers in the computation is no longer a convenience but a requirement. In other applications, the floating-point format simply does a better job than fixed-point computation. In a motor control application, for example, systems using floating-point numbers can control speed and torque more accurately and efficiently, resulting in better performance and greater energy efficiency than a system using fixed-point numbers.
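
The dynamic-range point can be illustrated with a small sketch that converts values to a 16-bit fixed-point format. The Q15 format (1 sign bit, 15 fraction bits) is chosen here purely as an example, and `float_to_q15` and `q15_to_float` are hypothetical helper names. The sketch shows how values outside the fixed range saturate and how very small values quantize to zero, whereas a single-precision float represents both without rescaling.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a real value to Q15 fixed point (example format, an assumption):
   representable range is [-1, 1 - 2^-15] with a step of 2^-15. */
static int16_t float_to_q15(float x)
{
    assert(x == x);                       /* reject NaN */
    float scaled = x * 32768.0f;
    if (scaled >= 32767.0f)  return INT16_MAX;   /* saturate high */
    if (scaled <= -32768.0f) return INT16_MIN;   /* saturate low  */
    /* Round to nearest, ties away from zero. */
    return (int16_t)(scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f);
}

/* Convert a Q15 value back to a real value. */
static float q15_to_float(int16_t q)
{
    return (float)q / 32768.0f;
}
```

With these helpers, `float_to_q15(0.5f)` is 16384 and converts back exactly, but `float_to_q15(3.0f)` saturates to 32767 and `float_to_q15(1e-6f)` becomes 0. A fixed-point design has to manage such scaling by hand for every signal, which is the conversion effort that staying in floating point avoids.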

The high-performance software tools accompanying Tensilica FloatingPoint DSPs provide superior auto-vectorization of scalar code, so that it makes effective use of the vector floating-point units. The FloatingPoint DSPs also offer a vector data type and an N-way programming model to make scaling between different SIMD widths easy. With the support of the optimized Eigen library, NatureDSP library, SLAM (simultaneous localization and mapping) library, and math library, the FloatingPoint DSPs provide an easy programming environment that makes porting and migrating floating-point software much easier.
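
As an example of the kind of scalar code an auto-vectorizing compiler typically handles well, here is a minimal single-precision multiply-add loop in standard C99. It is a generic sketch and is not taken from Cadence documentation. The function name `saxpy` is illustrative, and whether and how a particular compiler vectorizes the loop depends on that compiler and its options.

```c
#include <assert.h>
#include <stddef.h>

/* y[i] = a * x[i] + y[i] for i in [0, n).
   The `restrict` qualifiers promise the compiler that x and y do not overlap,
   which removes a common obstacle to vectorization. The loop has a unit
   stride and no loop-carried dependence, and its body is one multiply-add. */
static void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    assert(n == 0 || (x != NULL && y != NULL));
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Because the source never mentions a vector width, the same function can be recompiled for a narrower or wider SIMD configuration without being rewritten.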

The Tensilica FloatingPoint DSP family was specially designed to provide cost-effective and energy-efficient DSP solutions in high-performance floating-point-centric computation. Whether you are looking for an ultra-low energy and small area-cost floating-point DSP solution, or you need a super-high-performance floating-point compute engine for your complex mathematical models, you have the flexibility to find a suitable DSP solution from the Tensilica FloatingPoint DSP family.

Toolchain

Tensilica FloatingPoint DSPs are delivered with a complete set of software tools. The toolset includes a high-performance C/C++ compiler with automatic vectorization and instruction bundling to support the VLIW pipeline in the DSP. This comprehensive toolset also includes the linker, assembler, debugger, profiler, and graphical visualization.

A comprehensive instruction set simulator (ISS) allows you to quickly simulate and evaluate performance. When working with large systems or lengthy test vectors, the fast, functional Tensilica TurboXim simulator option achieves speeds that are 40X to 80X faster than the ISS for efficient software development and functional verification.
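
To put the 40X to 80X figure in practical terms, the sketch below estimates wall-clock time for a run of known ISS duration. The helper name `scaled_runtime_minutes` and the 8-hour baseline are illustrative assumptions, not measured data.

```c
#include <assert.h>

/* Estimated wall-clock time on a simulator that is `speedup` times faster
   than the baseline. Hypothetical helper for back-of-the-envelope planning. */
static double scaled_runtime_minutes(double baseline_minutes, double speedup)
{
    assert(baseline_minutes >= 0.0 && speedup > 0.0);
    return baseline_minutes / speedup;
}
```

Under these assumptions, a test that takes 8 hours (480 minutes) on the ISS would take about 12 minutes at 40X (`scaled_runtime_minutes(480.0, 40.0)`) and about 6 minutes at 80X.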

Tensilica Xtensa SystemC (XTSC) and C-based Xtensa Modeling Protocol (XTMP) system modeling are available for full-chip simulations. Pin-level XTSC offers co-simulation of SystemC and RTL-level offload accelerator blocks for fast, cycle-accurate simulations.

The Tensilica FloatingPoint DSPs support all major back-end EDA flows and represent the ultimate in customizable DSPs from Cadence, the leader in scalable, configurable, and extensible solutions for advanced floating-point signal processing. This proven development environment for both hardware and software reduces time to market and risk, and provides maximum flexibility in designing a broad range of applications using floating-point formats.

For more information, visit IP.