Tensorium
MatrixKernel provides specialized SIMD-accelerated matrix multiplication routines for statically-sized square matrices.
#include <MatrixKernel.hpp>
Public Types | |
| using | Simd = simd::SimdTraits<K, DefaultISA> |
| using | reg = typename Simd::reg |
| Public Types inherited from tensorium::Matrix< K, true > | |
| using | Simd |
| using | reg |
Public Member Functions | |
| MatrixKernel (const Matrix< K, true > &m) | |
| Construct a MatrixKernel from a column-major matrix. | |
| MatrixKernel (const Matrix< K, false > &m) | |
| Construct a MatrixKernel from a row-major matrix by copying elements. | |
| MatrixKernel (size_t r, size_t c) | |
| Construct an empty column-major matrix kernel of size (r × c). | |
| Matrix< K > | mul_mat2x2 (const MatrixKernel< K > &B) const |
| Multiply two 2×2 matrices using SIMD. | |
| Matrix< K > | mul_mat3x3 (const MatrixKernel< K > &B) const |
| Multiply two 3×3 matrices using SIMD. | |
| Matrix< K > | mul_mat4x4 (const MatrixKernel< K > &B) const |
| Multiply two 4×4 matrices using SIMD. | |
| Matrix< K > | mul_mat8x8 (const MatrixKernel< K > &B) const |
| Multiply two 8×8 matrices using SIMD. | |
| Matrix< K > | mul_mat16x16 (const MatrixKernel< K > &B) const |
| Multiply two 16×16 matrices using SIMD with FMADD accumulation. This function splits each row into two registers (low/high). | |
| Matrix< K > | mul_mat32x32 (const MatrixKernel< K > &B) const |
| Multiply two 32×32 matrices using SIMD. Each row is split into two registers (16 elements each). | |
| Matrix< K > | mul_mat64x64 (const MatrixKernel< K > &B) const |
| Multiply two 64×64 matrices using SIMD. Each row is split into 4 SIMD registers (4×16 elements). Vectorized FMADD chaining is used for performance. | |
| Public Member Functions inherited from tensorium::Matrix< K, true > | |
| Matrix (size_t r, size_t c) | |
| Construct a matrix of size r × c, initialized with zeros. | |
| size_t | index (size_t i, size_t j) const |
| size_t | size () const |
| Return the total number of elements. | |
| K & | operator() (size_t i, size_t j) |
| Element access (mutable). | |
| void | print () const |
| Print the matrix to stdout. | |
| void | swap_rows (size_t i, size_t j) |
| Swap two rows of the matrix. | |
| Vector< T > | operator* (const Vector< T > &v) const |
| Multiply matrix by a vector (naïve fallback). | |
| void | add (const Matrix &m) |
| In-place matrix addition: this += m. | |
| void | sub (const Matrix &m) |
| In-place matrix subtraction: this -= m. | |
| void | scl (K a) |
| In-place scalar multiplication: this *= a. | |
| void | lerp (const Matrix< K > &A, const Matrix< K > &B, K alpha) |
| Linearly interpolate between two matrices: this = (1 - α) * A + α * B. | |
| Matrix | _mul_mat (const Matrix< K > &mat) const |
| Multiply matrix by another matrix using optimized SIMD path. | |
| Vector< T > | mul_vec (const Vector< T > &x) const |
| Multiply matrix by a vector using SIMD. | |
| Matrix< K > | transpose () const |
| Returns the transpose \( A^T \) of the matrix (column-major layout). | |
| Matrix< K > | trace () const |
| Returns the trace of a square matrix as a 1×1 matrix. | |
| Matrix< K > | inverse () const |
| Compute the inverse of the matrix using Gauss–Jordan elimination. | |
| K | det () const |
| Compute the determinant using Gaussian elimination. | |
| size_t | rank (K eps=K(1e-6)) const |
| Compute the numerical rank of the matrix. | |
| Matrix & | operator+= (const Matrix &m) |
| Matrix & | operator-= (const Matrix &m) |
| Matrix & | operator*= (K alpha) |
Additional Inherited Members | |
| Public Attributes inherited from tensorium::Matrix< K, true > | |
| size_t | rows |
| size_t | cols |
| aligned_vector< K > | data |
| size_t | block_size |
| bool | iscolumn |
| size_t | simd_width |
MatrixKernel inherits from the column-major Matrix<K, true> and offers optimized kernels for specific square sizes (2×2, 3×3, 4×4, 8×8, 16×16, 32×32, and 64×64), using SIMD intrinsics such as AVX for high performance. The kernels exploit register-level blocking and FMADD chaining.
Template Parameters
| K | Scalar type (float, double, etc.) |
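To make "register-level blocking and FMADD chaining" concrete, the following is a minimal, portable sketch of a 4×4 column-major product. It is not the Tensorium implementation: `Col4`, `fmadd_lane`, and `mul4x4_colmajor` are hypothetical names, a `std::array` of four floats stands in for one SIMD register, and `std::fma` stands in for a hardware FMADD instruction. The column-major index convention (element (i, j) at `j * 4 + i`) is also an assumption.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Hypothetical stand-in for one SIMD register holding a 4-element column.
using Col4 = std::array<float, 4>;

// Lane-wise a * b + acc, with the scalar b broadcast to every lane.
inline Col4 fmadd_lane(const Col4 &a, float b, const Col4 &acc) {
    Col4 out{};
    for (std::size_t i = 0; i < 4; ++i)
        out[i] = std::fma(a[i], b, acc[i]);
    return out;
}

// C = A * B for 4x4 column-major matrices stored as 16 contiguous floats.
// Assumed layout: element (i, j) lives at index j * 4 + i.
inline std::array<float, 16> mul4x4_colmajor(const std::array<float, 16> &A,
                                             const std::array<float, 16> &B) {
    // Register-level blocking: load the four columns of A once and reuse
    // them for every output column.
    std::array<Col4, 4> a{};
    for (std::size_t k = 0; k < 4; ++k)
        for (std::size_t i = 0; i < 4; ++i)
            a[k][i] = A[k * 4 + i];

    std::array<float, 16> C{};
    for (std::size_t j = 0; j < 4; ++j) {
        // FMADD chaining: column j of C is accumulated as
        // sum over k of A(:, k) * B(k, j), one fused step per k.
        Col4 acc{};
        for (std::size_t k = 0; k < 4; ++k)
            acc = fmadd_lane(a[k], B[j * 4 + k], acc);
        for (std::size_t i = 0; i < 4; ++i)
            C[j * 4 + i] = acc[i];
    }
    return C;
}
```

In a real kernel the inner loops over lanes would be single intrinsic calls, and the accumulator would stay in a register until the final store.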
| using tensorium::MatrixKernel< K >::reg = typename Simd::reg |
| using tensorium::MatrixKernel< K >::Simd = simd::SimdTraits<K, DefaultISA> |
| tensorium::MatrixKernel< K >::MatrixKernel (const Matrix< K, true > &m) | inline |
Construct a MatrixKernel from a column-major matrix.
| m | Source matrix. |
References tensorium::Matrix< K, true >::Matrix().
Referenced by mul_mat16x16(), mul_mat2x2(), mul_mat32x32(), mul_mat3x3(), mul_mat4x4(), mul_mat64x64(), and mul_mat8x8().
| tensorium::MatrixKernel< K >::MatrixKernel (const Matrix< K, false > &m) | inline |
Construct a MatrixKernel from a row-major matrix by copying elements.
| m | Source matrix. |
References tensorium::Matrix< K, RowMajor >::cols, tensorium::Matrix< K, true >::cols, tensorium::Matrix< K, true >::Matrix(), tensorium::Matrix< K, RowMajor >::rows, and tensorium::Matrix< K, true >::rows.
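The element-wise copy from a row-major source into column-major storage can be sketched as below. This is an illustration under stated assumptions, not the constructor's actual body: `to_column_major` is a hypothetical helper, plain `std::vector<float>` buffers replace the library's `aligned_vector< K >`, and the two index formulas are the conventional ones rather than something this page documents.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Copy a row-major rows x cols buffer into a column-major buffer of the
// same shape. Assumed conventions:
//   row-major:    element (i, j) at index i * cols + j
//   column-major: element (i, j) at index j * rows + i
inline std::vector<float> to_column_major(const std::vector<float> &src,
                                          std::size_t rows, std::size_t cols) {
    assert(src.size() == rows * cols);
    std::vector<float> dst(rows * cols);
    for (std::size_t i = 0; i < rows; ++i)
        for (std::size_t j = 0; j < cols; ++j)
            dst[j * rows + i] = src[i * cols + j];
    return dst;
}
```

For a 2×3 input stored as 1, 2, 3, 4, 5, 6, the column-major result is 1, 4, 2, 5, 3, 6: the logical matrix is unchanged and only the storage order differs.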
| tensorium::MatrixKernel< K >::MatrixKernel (size_t r, size_t c) | inline |
Construct an empty column-major matrix kernel of size (r × c).
References tensorium::Matrix< K, true >::Matrix().
| Matrix< K > tensorium::MatrixKernel< K >::mul_mat16x16 (const MatrixKernel< K > &B) const | inline |
Multiply two 16×16 matrices using SIMD with FMADD accumulation. This function splits each row into two registers (low/high).
| B | Right-hand matrix. |
References tensorium::Matrix< K, RowMajor >::data, tensorium::Matrix< K, true >::data, tensorium::Matrix< K, true >::Matrix(), and MatrixKernel().
Referenced by tensorium::mul_mat().
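The low/high split with FMADD accumulation can be sketched in portable C++ as follows. This is a hedged illustration only: `Lane8`, `load8`, `store8`, `fmadd8`, and `mul16x16_split` are hypothetical names, an 8-lane register width is assumed (as with 256-bit AVX and `float`), and the sketch splits each contiguous 16-element line of column-major storage into two halves. Whether the real kernel splits rows or columns in exactly this way is not shown on this page.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

constexpr std::size_t kDim = 16;   // matrix is kDim x kDim
constexpr std::size_t kLanes = 8;  // assumed lanes per register

// Hypothetical stand-in for one 8-lane SIMD register.
using Lane8 = std::array<float, kLanes>;

inline Lane8 load8(const float *p) {
    Lane8 r{};
    for (std::size_t i = 0; i < kLanes; ++i) r[i] = p[i];
    return r;
}

inline void store8(float *p, const Lane8 &r) {
    for (std::size_t i = 0; i < kLanes; ++i) p[i] = r[i];
}

// Lane-wise a * b + acc with the scalar b broadcast to all lanes.
inline Lane8 fmadd8(const Lane8 &a, float b, const Lane8 &acc) {
    Lane8 out{};
    for (std::size_t i = 0; i < kLanes; ++i) out[i] = std::fma(a[i], b, acc[i]);
    return out;
}

// C = A * B for 16x16 column-major matrices (element (i, j) at j * 16 + i).
inline std::vector<float> mul16x16_split(const std::vector<float> &A,
                                         const std::vector<float> &B) {
    assert(A.size() == kDim * kDim && B.size() == kDim * kDim);
    std::vector<float> C(kDim * kDim, 0.0f);
    for (std::size_t j = 0; j < kDim; ++j) {
        Lane8 lo{};  // accumulates entries 0..7 of output column j
        Lane8 hi{};  // accumulates entries 8..15 of output column j
        for (std::size_t k = 0; k < kDim; ++k) {
            const float b = B[j * kDim + k];  // B(k, j), broadcast
            lo = fmadd8(load8(A.data() + k * kDim), b, lo);
            hi = fmadd8(load8(A.data() + k * kDim + kLanes), b, hi);
        }
        store8(C.data() + j * kDim, lo);
        store8(C.data() + j * kDim + kLanes, hi);
    }
    return C;
}
```

Keeping two independent accumulators gives the processor two FMADD dependency chains to interleave, which is one common motivation for this kind of split.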
| Matrix< K > tensorium::MatrixKernel< K >::mul_mat2x2 (const MatrixKernel< K > &B) const | inline |
Multiply two 2×2 matrices using SIMD.
| B | Right-hand matrix. |
References tensorium::Matrix< K, RowMajor >::data, tensorium::Matrix< K, true >::Matrix(), and MatrixKernel().
| Matrix< K > tensorium::MatrixKernel< K >::mul_mat32x32 (const MatrixKernel< K > &B) const | inline |
Multiply two 32×32 matrices using SIMD. Each row is split into two registers (16 elements each).
| B | Right-hand matrix. |
References tensorium::Matrix< K, RowMajor >::data, tensorium::Matrix< K, true >::data, tensorium::Matrix< K, true >::Matrix(), and MatrixKernel().
Referenced by tensorium::mul_mat().
| Matrix< K > tensorium::MatrixKernel< K >::mul_mat3x3 (const MatrixKernel< K > &B) const | inline |
Multiply two 3×3 matrices using SIMD.
| B | Right-hand matrix. |
References tensorium::Matrix< K, true >::Matrix(), and MatrixKernel().
Referenced by tensorium::mul_mat().
| Matrix< K > tensorium::MatrixKernel< K >::mul_mat4x4 (const MatrixKernel< K > &B) const | inline |
Multiply two 4×4 matrices using SIMD.
| B | Right-hand matrix. |
References tensorium::Matrix< K, RowMajor >::data, tensorium::Matrix< K, true >::data, tensorium::Matrix< K, true >::Matrix(), and MatrixKernel().
Referenced by tensorium::mul_mat().
| Matrix< K > tensorium::MatrixKernel< K >::mul_mat64x64 (const MatrixKernel< K > &B) const | inline |
Multiply two 64×64 matrices using SIMD. Each row is split into 4 SIMD registers (4×16 elements). Vectorized FMADD chaining is used for performance.
| B | Right-hand matrix. |
References tensorium::Matrix< K, RowMajor >::data, tensorium::Matrix< K, true >::data, tensorium::Matrix< K, true >::Matrix(), and MatrixKernel().
| Matrix< K > tensorium::MatrixKernel< K >::mul_mat8x8 (const MatrixKernel< K > &B) const | inline |
Multiply two 8×8 matrices using SIMD.
| B | Right-hand matrix. |
References tensorium::Matrix< K, RowMajor >::data, tensorium::Matrix< K, true >::data, tensorium::Matrix< K, true >::Matrix(), and MatrixKernel().
Referenced by tensorium::mul_mat().