Tensilica DSPs Support in Eigen Library

November 2022

Eigen is a high-level C++ library of template headers for linear algebra, matrix and vector operations, geometrical transformations, numerical solvers, and related algorithms. Eigen is open-source software licensed under the Mozilla Public License 2.0 (MPL2). Eigen is implemented using the expression templates metaprogramming technique: it builds expression trees at compile time and generates custom code to evaluate them. Using expression templates and a cost model of floating-point operations, the library performs its own loop unrolling and vectorization. Eigen automatically enables vectorization if it detects a supported SIMD instruction set and a supported compiler. Cadence has added support for Tensilica DSPs in Eigen, and the release package is currently available to Tensilica DSP customers. This paper describes the features of the Eigen library and the vectorization support added for various Tensilica DSPs.

Introduction

The Eigen library, or simply Eigen, is a high-level open-source C++ library of template headers for linear algebra, matrix and vector operations, geometrical transformations, numerical solvers, and related algorithms. For more details, refer to the Eigen main page [1].

Eigen supports all matrix sizes, from small fixed-size matrices to arbitrarily large dense matrices. It supports various matrix decompositions and geometrical transformations. The expression templates used in Eigen allow the library to remove temporaries intelligently and to evaluate expressions lazily when appropriate. Algorithms are carefully selected for accuracy and reliability. Reliability tradeoffs are documented, and safe decompositions are available. Eigen is thoroughly tested through its own test suite, the standard BLAS test suite, and parts of the LAPACK test suite. With the help of expression templates, the APIs are expressive while feeling natural to C++ programmers.
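The idea of removing temporaries through lazy evaluation can be illustrated with a toy sketch. The code below is not Eigen's implementation; the names Vec, Sum, and sum3 are hypothetical and exist only for this illustration. An expression such as a + b + c builds a small tree of Sum nodes, and the assignment evaluates the whole tree in a single loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {

// Expression node: stores references only and computes elements on demand.
template <typename L, typename R>
struct Sum {
    const L& lhs;
    const R& rhs;
    double operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
    std::size_t size() const { return lhs.size(); }
};

// Toy vector that owns its data.
struct Vec {
    std::vector<double> data;
    explicit Vec(std::size_t n, double v = 0.0) : data(n, v) {}
    double operator[](std::size_t i) const { return data[i]; }
    std::size_t size() const { return data.size(); }

    // Assigning an expression runs ONE loop over the whole tree;
    // no intermediate vector is created for (a + b).
    template <typename E>
    Vec& operator=(const E& expr) {
        assert(expr.size() == data.size());
        for (std::size_t i = 0; i < data.size(); ++i) data[i] = expr[i];
        return *this;
    }
};

// a + b only records the operands; nothing is computed here.
inline Sum<Vec, Vec> operator+(const Vec& a, const Vec& b) { return {a, b}; }

// (a + b) + c extends the tree by one node.
template <typename L, typename R>
Sum<Sum<L, R>, Vec> operator+(const Sum<L, R>& a, const Vec& b) { return {a, b}; }

// Helper for the illustration: evaluates a + b + c into a new vector.
inline Vec sum3(const Vec& a, const Vec& b, const Vec& c) {
    Vec out(a.size());
    out = a + b + c;  // the tree lives only for this statement
    return out;
}

}  // namespace sketch
```

A naive operator+ that returned a full vector would allocate one temporary for a + b before adding c; here the only storage written is the destination.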

Eigen library code is intuitive and reads much like modern numeric computing languages such as MATLAB and Python. For example, the addition of two matrices a and b is written simply as a+b. For more details, refer to the Eigen quick reference guide [3].

DSP and embedded C libraries typically expose matrix operations as type-specific functions in custom libraries. For example, a specific function dsplib_mat_add_f() is called to add two matrices a[MxN] and b[MxN] of the float data type. If the data type is changed from float to double, then the function call must also be updated.
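The original listing is not reproduced in this text. As a rough illustration of the C-style approach, the sketch below assumes a signature for dsplib_mat_add_f(); the real prototype may differ. The double-precision function dsplib_mat_add_d() is a hypothetical name added here to show that a type change forces a different call.

```cpp
#include <cassert>
#include <cstddef>

// ASSUMED signature: adds two rows x cols float matrices stored contiguously.
inline void dsplib_mat_add_f(const float* a, const float* b, float* c,
                             std::size_t rows, std::size_t cols) {
    assert(a && b && c);
    for (std::size_t i = 0; i < rows * cols; ++i) c[i] = a[i] + b[i];
}

// HYPOTHETICAL double-precision counterpart: same logic, different name and types.
inline void dsplib_mat_add_d(const double* a, const double* b, double* c,
                             std::size_t rows, std::size_t cols) {
    assert(a && b && c);
    for (std::size_t i = 0; i < rows * cols; ++i) c[i] = a[i] + b[i];
}
```

Switching an application from float to double means changing both the buffer types and every call site from dsplib_mat_add_f to dsplib_mat_add_d.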

In Eigen, the same matrix addition is written as the naturally intuitive expression a+b. If the data type is changed, Eigen automatically uses the appropriate underlying function; the user only needs to declare the matrices with the proper data types.
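With Eigen itself, the operands would be Eigen matrix objects (for example, Eigen::MatrixXf or Eigen::MatrixXd) and the sum is simply a + b. So that the example compiles without Eigen, the sketch below uses a toy stand-in class, sketch::Matrix<T>, which is hypothetical and far simpler than Eigen's matrix types. It shows only the point made above: one expression serves every scalar type.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {

// Toy dense matrix templated on the scalar type (stand-in for an Eigen matrix).
template <typename T>
struct Matrix {
    std::size_t rows, cols;
    std::vector<T> data;  // column-major storage

    Matrix(std::size_t r, std::size_t c, T v = T())
        : rows(r), cols(c), data(r * c, v) {}

    T& operator()(std::size_t i, std::size_t j) { return data[j * rows + i]; }
    const T& operator()(std::size_t i, std::size_t j) const { return data[j * rows + i]; }
};

// One operator covers float, double, and any other scalar type.
template <typename T>
Matrix<T> operator+(const Matrix<T>& a, const Matrix<T>& b) {
    assert(a.rows == b.rows && a.cols == b.cols);
    Matrix<T> c(a.rows, a.cols);
    for (std::size_t k = 0; k < c.data.size(); ++k) c.data[k] = a.data[k] + b.data[k];
    return c;
}

}  // namespace sketch
```

Changing sketch::Matrix<float> to sketch::Matrix<double> at the declaration is the only edit needed; the expression a + b stays the same.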

We have added out-of-the-box (OTB) vectorization support for Cadence Tensilica DSPs in Eigen similar to other industry platforms. Cadence customers can use the Eigen Library package released on XPG.

Cadence provides a broad portfolio of high-performance, application-specific Tensilica DSPs for audio, vision, AI, lidar/radar, communications, and other applications, together with a comprehensive development toolchain and customer support [6]. The Tensilica DSPs support single-instruction-multiple-data (SIMD) widths from 64b to 1024b, which suits a wide range of applications.

Tensilica DSP support in the Eigen library can help customers prototype quickly and develop optimized ported code with little effort. Because Eigen provides unified APIs, the same code can be benchmarked quickly across Tensilica DSPs with different SIMD widths during the early characterization of system requirements.

This paper explains the Eigen features supported for Tensilica DSP SIMD instructions, describes PacketMath, which is used to achieve vectorization, and presents the cycle performance of the Eigen library on Tensilica DSPs.

Tensilica DSPs Support in Eigen Library

Eigen is a multi-platform library. It supports many popular compilers, such as GCC, MSVC, Intel ICC, and Clang/LLVM. It also supports the industry SIMD vectorization instruction sets and achieves automatic vectorization. Eigen follows a modular implementation approach. It provides a way to add support for new processors via architecture-specific PacketMath. Figure 1 shows this concept. We have used the same concept and added PacketMath to support various Tensilica DSPs with different SIMD widths and precision support.

Figure 1: Tensilica DSPs Support in Eigen

The XPG-released workspace supports the latest Eigen 3.4.0 release and can be used with applications in the Xtensa Xplorer IDE and Xtensa tools. The Xtensa Xplorer IDE eases application development and debugging on the Tensilica DSPs. The release also includes a Linux (non-IDE) package.

The Cadence Eigen release supports many Tensilica DSPs, as Table 1 shows.

Family         DSP Names
FloatingPoint  KP1, KP6, KQ7, KQ8
ConnX          ConnX 110, ConnX 120, ConnX B10, ConnX B20
Vision         Vision P1, Vision P6, Vision Q7, Vision Q8
HiFi           HiFi 1, HiFi 5
Fusion         Fusion G3
Table 1: Tensilica DSP Families Supported in Eigen

The Cadence release package supports single and double precision data types for both real and complex data for each DSP, as Table 2 shows.

DSPs Single Precision Double Precision
Real Complex Real Complex
FloatingPoint KP1    ✓         ✓     ✓  
FloatingPoint KP6      ✓         ✓    
FloatingPoint KQ7    ✓         ✓     ✓          ✓
FloatingPoint KQ8    ✓         ✓     ✓          ✓
ConnX 110    ✓         ✓    
ConnX 120    ✓         ✓    
ConnX B10    ✓         ✓     ✓          ✓
ConnX B20    ✓         ✓     ✓          ✓
HiFi 1    ✓         ✓    
HiFi 5    ✓         ✓    
Vision P1    ✓      
Vision P6    ✓      
Vision Q7    ✓       ✓  
Vision Q8    ✓         ✓     ✓          ✓
Fusion G3    ✓         ✓    
Table 2: Tensilica DSP Precision and Data Types Supported in Eigen

Xtensa tools provide DSP-specific macros that allow support for a particular DSP to be enabled or disabled at compile time. For additional information, refer to the Eigen Library User’s Guide for Tensilica DSP [7].

Automatic Vectorization

Eigen provides architecture-specific base operations (base-ops), also called PacketMath, which are the building blocks of the Eigen library’s algorithmic implementations. Adding PacketMath support for an architecture consists of creating architecture-specific header files. These header files contain optimized implementations of the base-ops for the target architecture (for example, Fusion G3). Eigen also supports base-ops for math functions such as sine, cosine, and tanh. These hooks allow architecture-specific optimized code to be added to meet precision and performance requirements. The math operations for Tensilica DSPs use optimized Xtensa C Library (XClib) kernels, which are available with Xtensa tools.

The base-ops are scalar by default, and Eigen allows them to be overloaded with SIMD operations. Tensilica DSPs are a natural fit for these SIMD operations. Eigen performs vectorization when it detects a particular processor architecture or SIMD instruction set. This information is provided by the user in the PacketMath header. Source Listing 1 shows a code snippet of how the information is arranged.

Source Listing 1: Architecture-Specific Packet Information
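The original listing is not reproduced in this text. As a hedged illustration of how such packet information can be arranged, the sketch below is loosely modeled on the packet_traits structure in upstream Eigen's PacketMath headers. Everything lives in a hypothetical sketch namespace; the packet types Packet4f and Packet2d are plain structs standing in for a 128b DSP's vector register types, and the Cadence headers may define different fields and values.

```cpp
#include <cstddef>

namespace sketch {

// HYPOTHETICAL 128-bit packet types (stand-ins for DSP vector registers).
struct Packet4f { float  v[4]; };
struct Packet2d { double v[2]; };

// Default traits: the scalar type is not vectorizable (packet size 1).
template <typename T>
struct packet_traits {
    typedef T type;  // full packet type
    typedef T half;  // half-packet type
    enum { Vectorizable = 0, size = 1, AlignedOnScalar = 0, HasHalfPacket = 0 };
};

// float on a 128b SIMD machine: 4 lanes per packet.
template <>
struct packet_traits<float> {
    typedef Packet4f type;
    typedef Packet4f half;  // no narrower packet in this sketch
    enum {
        Vectorizable = 1, size = 4, AlignedOnScalar = 1, HasHalfPacket = 0,
        HasAdd = 1, HasSin = 1, HasCos = 1, HasTanh = 1  // which base-ops are vectorized
    };
};

// double on a 128b SIMD machine: 2 lanes per packet.
template <>
struct packet_traits<double> {
    typedef Packet2d type;
    typedef Packet2d half;
    enum {
        Vectorizable = 1, size = 2, AlignedOnScalar = 1, HasHalfPacket = 0,
        HasAdd = 1, HasSin = 0, HasCos = 0, HasTanh = 0
    };
};

}  // namespace sketch
```

Generic algorithms can then query, for example, sketch::packet_traits<float>::size at compile time to decide how many elements to process per step.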

Eigen works with packets of SIMD size. For example, ConnX 110 and Vision P1 have a SIMD vector of 128 bits (16 bytes), so one packet can hold four 32-bit integers, four floats, or two doubles. Eigen vectorization works much better if the data is aligned to the SIMD size (e.g., 128b-aligned for ConnX 110 or Vision P1). Eigen can also vectorize unaligned vectors and matrices, but it does so only as a last resort.
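The packet-size arithmetic above can be written out as a short sketch. The helper names lanes() and is_simd_aligned() are hypothetical, and the code assumes 8-bit bytes and a 4-byte float, as on common platforms.

```cpp
#include <cstddef>
#include <cstdint>

namespace sketch {

// Number of elements of type T that fit in a SIMD register of simd_bits bits.
template <typename T>
constexpr std::size_t lanes(std::size_t simd_bits) {
    return simd_bits / (8 * sizeof(T));
}

// True if pointer p is aligned to the SIMD register size (given in bits).
inline bool is_simd_aligned(const void* p, std::size_t simd_bits) {
    return reinterpret_cast<std::uintptr_t>(p) % (simd_bits / 8) == 0;
}

}  // namespace sketch
```

For a 128b register, lanes<float>(128) is 4 and lanes<double>(128) is 2, which matches the packet contents listed above. A buffer declared with alignas(16) satisfies is_simd_aligned(buffer, 128).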

The number of data elements to process is generally not a multiple of the packet (SIMD) size. For example, a vector addition may operate on 10 elements. If the packet size is 4, two full packets cover the first 8 elements, and the last 2 elements would have to be processed one by one. In this scenario, a half-packet, which processes half the packet-size elements as a block, is useful. Half-packets increase the vector efficiency particularly on wider SIMD processors (e.g., the ConnX B20 (512b) or Vision Q8 (1024b) Tensilica DSPs). For more details about Eigen vectorization, refer to [4] and [5].
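One way to read the full-packet, half-packet, and scalar split is sketched below. This is an illustration under stated assumptions, not Eigen's actual loop structure: split_work() and add_vectors() are hypothetical names, the packet size is assumed to be even, and each inner loop stands in for a single SIMD instruction on real hardware.

```cpp
#include <cassert>
#include <cstddef>

namespace sketch {

struct Split {
    std::size_t full_packets;  // blocks of packet_size elements
    std::size_t half_packets;  // blocks of packet_size / 2 elements (0 or 1)
    std::size_t scalars;       // leftover elements processed one by one
};

// Splits n elements into full packets, then half packets, then a scalar tail.
inline Split split_work(std::size_t n, std::size_t packet_size) {
    assert(packet_size >= 2 && packet_size % 2 == 0);
    const std::size_t half = packet_size / 2;
    const std::size_t rest = n % packet_size;
    Split s;
    s.full_packets = n / packet_size;
    s.half_packets = rest / half;
    s.scalars = rest % half;
    return s;
}

// Vector addition c = a + b that walks the three regions in order.
inline void add_vectors(const float* a, const float* b, float* c,
                        std::size_t n, std::size_t packet_size) {
    const Split s = split_work(n, packet_size);
    const std::size_t half = packet_size / 2;
    std::size_t i = 0;
    for (std::size_t p = 0; p < s.full_packets; ++p)        // full-packet region
        for (std::size_t k = 0; k < packet_size; ++k, ++i) c[i] = a[i] + b[i];
    for (std::size_t p = 0; p < s.half_packets; ++p)        // half-packet region
        for (std::size_t k = 0; k < half; ++k, ++i) c[i] = a[i] + b[i];
    for (std::size_t k = 0; k < s.scalars; ++k, ++i)        // scalar tail
        c[i] = a[i] + b[i];
}

}  // namespace sketch
```

For 10 elements and a packet size of 4, split_work() yields 2 full packets, 1 half packet, and no scalar tail. For 10 elements and a packet size of 16, no full packet fits, and the half packet covers 8 of the 10 elements, which shows why half-packets matter more on wider SIMD machines.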

For each SIMD instruction set, architecture-specific custom PacketMath implementations are provided. Source Listing 2 shows the padd() implementation for the Eigen Packet, which is the implementation called for Tensilica DSPs.

Source Listing 2: Example of PacketMath Operation
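The original listing is not reproduced in this text. The sketch below shows the general shape of such a base-op, following the pattern used in upstream Eigen: a generic padd() that falls back to ordinary addition, plus a specialization for a packet type. The Packet4f struct and the sketch namespace are hypothetical, and the element loop is a placeholder for the single vector-add intrinsic that a real Tensilica PacketMath header would use.

```cpp
#include <cstddef>

namespace sketch {

// HYPOTHETICAL packet of four floats (stand-in for a 128b vector register type).
struct Packet4f { float v[4]; };

// Generic fallback: for scalar "packets", padd is ordinary addition.
template <typename Packet>
inline Packet padd(const Packet& a, const Packet& b) {
    return a + b;
}

// Architecture-specific specialization for the packet type.
// On real hardware the body would be one SIMD add intrinsic;
// the loop here only emulates its effect.
template <>
inline Packet4f padd<Packet4f>(const Packet4f& a, const Packet4f& b) {
    Packet4f r;
    for (std::size_t i = 0; i < 4; ++i) r.v[i] = a.v[i] + b.v[i];
    return r;
}

}  // namespace sketch
```

Generic algorithm code calls padd() without knowing the packet type, so the same source vectorizes on any architecture that supplies a specialization.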

Note that Eigen vectorization applies mainly to dense matrices and vectors; sparse matrix/vector operations are not vectorized. Tensilica DSPs do support sparse operations in Eigen, but without vectorization.

Performance

Performance data is generated from Eigen’s performance test bench. Eigen provides a modified version of the Benchmark for Templated Libraries (BTL) for performance measurement, and its implementation is part of the Eigen library source code.

For performance measurement, the library is configured with dynamic-size column-major dense matrices. The performance is measured with the Xtensa Instruction Set Simulator (ISS) with memory modeling (sim-local LSP).

The performance is specified in absolute cycles, so lower is better. The OTB performance is measured with the original Eigen code without PacketMath; in OTB, therefore, Eigen gets no information about the DSP (e.g., its SIMD size). The PacketMath performance is measured with the Cadence Eigen package with Tensilica DSP support, in which the Tensilica DSP-specific Eigen PacketMath header files are enabled. This helps achieve much better performance.

Table 3 lists the benchmarks, which are typical matrix processing operations. Table 4 shows the performance for these benchmarks on a 128b DSP with a matrix size of [256x256]. The data shows the performance improvement of PacketMath over OTB, which ranges from 1.7x to 3.4x for these benchmarks. This improvement is mainly due to the vectorization enabled by PacketMath and the optimized implementation of PacketMath operations provided by Cadence.

Benchmark      Operation Description
axpby          Vector_A = (Scalar_Alpha x Vector_B) + (Scalar_Beta x Vector_A)
matrix_matrix  Matrix_A x Matrix_B
ata            Transpose_Matrix_A x Matrix_A
aat            Matrix_A x Transpose_Matrix_A
matrix_vector  Matrix_A x Vector_B
Table 3: Sample Performance Benchmarks
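To make the operation descriptions in Table 3 concrete, the sketch below gives plain scalar reference versions of two of them, axpby and matrix_vector. These are not the BTL benchmark sources; the function names and signatures are assumptions made for this illustration, and the matrix is stored column-major to match the benchmark configuration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {

// axpby: a = (alpha x b) + (beta x a), element by element.
inline void axpby(float alpha, const std::vector<float>& b,
                  float beta, std::vector<float>& a) {
    assert(a.size() == b.size());
    for (std::size_t i = 0; i < a.size(); ++i) a[i] = alpha * b[i] + beta * a[i];
}

// matrix_vector: y = A x x, with A (rows x cols) stored column-major,
// so element (i, j) lives at A[j * rows + i].
inline std::vector<float> matrix_vector(const std::vector<float>& A,
                                        std::size_t rows, std::size_t cols,
                                        const std::vector<float>& x) {
    assert(A.size() == rows * cols && x.size() == cols);
    std::vector<float> y(rows, 0.0f);
    for (std::size_t j = 0; j < cols; ++j)      // walk one column at a time
        for (std::size_t i = 0; i < rows; ++i)  // contiguous accesses within a column
            y[i] += A[j * rows + i] * x[j];
    return y;
}

}  // namespace sketch
```

The inner loops touch contiguous memory, which is the access pattern that packet-based vectorization exploits.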
Benchmark      PacketMath (cycles)  OTB (cycles)  OTB/PacketMath
axpby          627                  1,578         2.5
matrix_matrix  4.42M                15.1M         3.4
ata            2.62M                8.01M         3.1
aat            2.44M                8.31M         3.4
matrix_vector  33,433               58,213        1.7
Table 4: Eigen Performance for Tensilica 128b SIMD DSP. Matrix size = 256x256 (M: million).
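The last column of Table 4 is the ratio of the two cycle counts. As a small worked example, the hypothetical helper below computes it.

```cpp
namespace sketch {

// Speedup of PacketMath over OTB: both inputs are cycle counts (lower is better),
// so a ratio above 1 means PacketMath is faster.
constexpr double speedup(double otb_cycles, double packetmath_cycles) {
    return otb_cycles / packetmath_cycles;
}

}  // namespace sketch
```

For axpby, speedup(1578, 627) is about 2.52, which Table 4 reports as 2.5. For matrix_vector, speedup(58213, 33433) is about 1.74, reported as 1.7.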

Summary

This paper presents an overview of the Eigen library and its vectorization support via PacketMath. It also discusses Tensilica DSP support in the Cadence Eigen package release and the relevant performance. Vectorization, along with optimized PacketMath, helps improve cycle performance on Tensilica DSPs with various SIMD widths. In the measured matrix operations on a 128b DSP, PacketMath is 1.7x to 3.4x faster than OTB, which shows the benefit of PacketMath in the Eigen library. Tensilica DSP support in the Eigen library can help customers prototype and develop code faster and quickly achieve a good level of performance. Since Eigen provides unified APIs and well-tested industry-standard algorithms, the Cadence Eigen package is helpful for rapid development by Tensilica customers.

References

  1. Eigen main page.
  2. A simple quickref for Eigen.
  3. Eigen quick reference guide.
  4. Eigen Vectorization.
  5. What happens inside Eigen, on a simple example.
  6. Cadence Tensilica home page.
  7. Eigen Library User’s Guide for Tensilica DSP.