Datasheet

Tensilica FloatingPoint DSP Family

Specially designed for floating-point processing with exceptional PPA

The Cadence Tensilica FloatingPoint family of high-performance digital signal processors (DSPs) is specially designed for floating-point-centric processing while providing exceptional power, performance, and area (PPA). The Tensilica FloatingPoint DSPs offer a wide range of software-compatible scalability, from 128-bit to 1024-bit vector width. This scalability, combined with the configurability of the FloatingPoint DSPs, gives SoC designers the flexibility to design for a broad spectrum of applications, ranging from energy-efficient solutions for battery-operated devices to high-performance computing (HPC).

Overview

Optimized with various performance-enhancing features, Tensilica FloatingPoint DSPs offer outstanding performance per unit area and performance per unit power in floating-point computation for a wide range of applications. Based on the Tensilica Xtensa 32-bit RISC micro-architecture, the family (Figure 1) comprises the Tensilica FloatingPoint KP1 DSP, the Tensilica FloatingPoint KP6 DSP, the Tensilica FloatingPoint KQ7 DSP, and the Tensilica FloatingPoint KQ8 DSP.

Figure 1: The Tensilica FloatingPoint DSP family

Features

Single-instruction, multiple-data (SIMD) vector processing
Very long instruction word (VLIW) for parallel load/store, FMA, and ALU ops
IEEE 754 vector floating-point options
- Half precision
- Single precision
- Double precision
8-bit/16-bit/32-bit/64-bit ALU
16b x 16b fixed-point/integer multiply-add
Performance-enhanced fused multiply-add (FMA)
Optimized complex arithmetic
Enhanced FFT, convolution, matrix, filter operations
Vector divide, RECIP, RSQRT, and SQRT
Predicated vector instructions
Xtensa LX Secure Mode option
Scatter-gather option
iDMA option
Faster clock speed than a fixed-point DSP with a floating-point unit
High-performance C/C++ compiler with automatic vectorization of scalar C code and full support for vector data
Rich, optimized library support: Eigen library, NatureDSP library, SLAM library, and math library

	Tensilica FloatingPoint DSPs
	KP1	KP6	KQ7	KQ8
Xtensa Platform	LX	LX	NX	NX
Vector Width (b)	128	512	512	1024
Xtensa LX Secure Mode
8b/16b/32b/64b ALU Ops
Half Precision
Single Precision
Double Precision
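
As a rough illustration of what the vector widths in the table imply, the number of SIMD lanes is the vector width divided by the element width. The helper below is a hypothetical sketch and not part of any Cadence tool: the name `simd_lanes` is an assumption, and the element widths passed to it are the IEEE 754 storage sizes (16, 32, and 64 bits for half, single, and double precision).

```c
#include <assert.h>

/* Hypothetical helper: number of SIMD lanes that fit in one vector,
   given the vector width and the element width, both in bits. */
static unsigned simd_lanes(unsigned vector_bits, unsigned element_bits)
{
    /* The element width must be non-zero and divide the vector width. */
    assert(element_bits != 0 && vector_bits % element_bits == 0);
    return vector_bits / element_bits;
}
```

By this arithmetic, a 128-bit vector holds 4 single-precision values and a 1024-bit vector holds 64 half-precision values. How many operations a given core issues per cycle depends on its configuration and is not derived here.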

Benefits

High performance and low energy consumption for a wide range of application algorithms, including support for artificial intelligence / machine learning (AI/ML), motor control, battery management, sensor fusion, object tracking, augmented reality / virtual reality (AR/VR), HPC, etc.
Built upon a 32-bit scalar RISC processor for efficient execution of control code
Scalable – A common ISA across the family, from the ultra-low-energy, very small 128-bit FloatingPoint KP1 DSP to the super-high-performance 1024-bit FloatingPoint KQ8 DSP, provides a solution for each PPA budget and makes migration easy
Configurable – Select only the vector packages needed, without unnecessary hardware
Extensible – Further enhance performance and differentiation with a customizable instruction set defined in the Verilog-like Tensilica Instruction Extension (TIE) language
Virtually unlimited bandwidth from custom FIFO, lookup, and GPIO interfaces
Xtensa LX Secure Mode provides an option to partition the memory space into secure and non-secure regions, thereby allowing the protection of secure memory from untrusted code
Fast development through familiar C programming in an Eclipse-based IDE along with optimized software libraries and application examples with source code
Full support for hardware/software co-design
Easy integration into SystemC simulations with functional, cycle-accurate, and hardware pin-level models

Figure 2: Block diagram of Tensilica FloatingPoint KQ7 and KQ8 DSPs

Scalable, Configurable, and Extensible

The highly scalable Tensilica FloatingPoint DSP family offers the SoC designer peace of mind when designing a solution that meets their PPA budget envelope. For energy-sensitive applications, the FloatingPoint KP1 DSP offers an ultra-low energy consumption solution. The FloatingPoint KP6 DSP provides balanced high performance in a small area, yielding excellent performance-per-unit area design. If much higher performance and clock speed are required, the FloatingPoint KQ7 and KQ8 DSPs (Figure 2) present superior vector floating-point operational throughput. All of the Tensilica FloatingPoint DSPs have a common ISA, making software portability and migration easy.

The Tensilica FloatingPoint DSPs offer easy, checkbox-style configurability for pre-verified instruction options. This simple approach to defining a DSP core results in seamless integration of the selected features into the hardware, the compiler, the modeling tools, and the verification scripts. These capabilities let the solution designer build an optimized, custom DSP with minimal to no impact on the development schedule, compared to what a change in hardware design would typically incur.

The performance of Tensilica FloatingPoint DSPs can be further enhanced and differentiated using the TIE language. Custom operations defined through the Verilog-like TIE language are automatically integrated and recognized by the Xtensa toolchain. The FloatingPoint DSPs can also be extended to support custom interfaces, such as queues and ports, for efficient connection to external hardware blocks. These custom interfaces can be defined to match the interfaces of existing third-party IP. Hence, the FloatingPoint DSPs can access hardware offload accelerators in deterministic single- or multi-cycle operations, greatly reducing power consumption without impacting the shared system bus.

High-Performance Floating-Point Processing, Energy Efficiency, and Small Area Footprint

Floating-point numbers are used in most technical and engineering computations. Some designers select a floating-point format because it easily handles the dynamic range of the data values, and some simply choose to run the floating-point code generated by signal processing modeling tools. Running that code directly, rather than converting it to a fixed-point version, helps speed time to market and reduce the scope of the project.

In applications that process large or unpredictable data sets, floating-point computation is no longer a convenience but a requirement. In other applications, the floating-point format simply does a better job than fixed-point computation. In a motor control application, for example, systems using floating-point numbers can control speed and torque more accurately and efficiently, resulting in better performance and greater energy efficiency than a system using fixed-point numbers.
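
The dynamic-range point can be made concrete with a small sketch in plain C. It assumes a 16-bit Q15 fixed-point format (1 sign bit, 15 fractional bits), chosen here only for illustration; the function names `float_to_q15` and `q15_to_float` are hypothetical and do not come from any Tensilica library.

```c
#include <stdint.h>

/* Hypothetical helper: convert a float to Q15 with rounding and saturation.
   Q15 represents values in [-1, 1) with a resolution of 1/32768. */
static int16_t float_to_q15(float x)
{
    float scaled = x * 32768.0f;
    if (scaled >= 32767.0f)  return INT16_MAX;  /* saturate on positive overflow */
    if (scaled <= -32768.0f) return INT16_MIN;  /* saturate on negative overflow */
    /* Round to nearest, ties away from zero. */
    return (int16_t)(scaled + (scaled >= 0.0f ? 0.5f : -0.5f));
}

/* Hypothetical helper: convert a Q15 value back to float. */
static float q15_to_float(int16_t q)
{
    return (float)q / 32768.0f;
}
```

In this format, a value such as 1.5 saturates to the largest representable code, and a small value such as 1.0e-5 rounds to zero because it is below the 1/32768 resolution. A single-precision float represents both without rescaling, which is the convenience the paragraph above describes.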

The high-performance software tools accompanying Tensilica FloatingPoint DSPs provide superior auto-vectorization of scalar code, so that it effectively utilizes the vector floating-point units. The FloatingPoint DSPs also offer a vector data type and an N-way programming model that make scaling between different SIMD widths easy. With the support of the optimized Eigen library, NatureDSP library, SLAM (Simultaneous Localization and Mapping) library, and math library, the FloatingPoint DSPs provide an easy programming environment, making it much easier to port and migrate floating-point software.
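
As a minimal sketch of the kind of scalar code an auto-vectorizing compiler typically handles well, consider the multiply-add loop below. It is standard C with no Xtensa-specific intrinsics or vector types; the function name `saxpy` is illustrative, and whether a particular compiler version vectorizes this exact loop is an assumption rather than something stated in this datasheet.

```c
#include <stddef.h>

/* Scalar multiply-add over two arrays: y[i] = a * x[i] + y[i].
   The restrict qualifiers promise that x and y do not overlap, which
   removes an aliasing concern that can otherwise block vectorization. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Because the loop contains no lane count, the same source can in principle be compiled for different SIMD widths without modification.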

The Tensilica FloatingPoint DSP family was specially designed to provide cost-effective and energy-efficient DSP solutions in high-performance floating-point-centric computation. Whether you are looking for an ultra-low energy and small area-cost floating-point DSP solution, or you need a super-high-performance floating-point compute engine for your complex mathematical models, you have the flexibility to find a suitable DSP solution from the Tensilica FloatingPoint DSP family.

Toolchain

Tensilica FloatingPoint DSPs are delivered with a complete set of software tools. The toolset includes a high-performance C/C++ compiler with automatic vectorization and instruction bundling to support the VLIW pipeline in the DSP. This comprehensive toolset also includes the linker, assembler, debugger, profiler, and graphical visualization.

A comprehensive instruction set simulator (ISS) allows you to quickly simulate and evaluate performance. When working with large systems or lengthy test vectors, the fast, functional Tensilica TurboXim simulator option achieves speeds that are 40X to 80X faster than the ISS for efficient software development and functional verification.

Tensilica Xtensa SystemC (XTSC) and C-based Xtensa Modeling Protocol (XTMP) system modeling are available for full-chip simulations. Pin-level XTSC offers co-simulation of SystemC and RTL-level offload accelerator blocks for fast, cycle-accurate simulations.

The Tensilica FloatingPoint DSPs support all major back-end EDA flows and represent the ultimate in customizable DSPs from Cadence, the leader in scalable, configurable, and extensible solutions for advanced floating-point signal processing. This proven development environment for both hardware and software reduces time to market and risk, and provides maximum flexibility in designing a broad range of applications using floating-point formats.

For more information, visit IP.
