MatrixKernel provides specialized SIMD-accelerated matrix multiplication routines for statically-sized square matrices.

#include <MatrixKernel.hpp>

Public Types
using	Simd = simd::SimdTraits<K, DefaultISA>
using	reg = typename Simd::reg
Public Types inherited from tensorium::Matrix< K, true >
using	Simd
using	reg

Public Member Functions
	MatrixKernel (const Matrix< K, true > &m)
	Construct a MatrixKernel from a column-major matrix.
	MatrixKernel (const Matrix< K, false > &m)
	Construct a MatrixKernel from a row-major matrix by copying elements.
	MatrixKernel (size_t r, size_t c)
	Construct an empty column-major matrix kernel of size (r × c).
Matrix< K >	mul_mat2x2 (const MatrixKernel< K > &B) const
	Multiply two 2×2 matrices using SIMD.
Matrix< K >	mul_mat3x3 (const MatrixKernel< K > &B) const
	Multiply two 3×3 matrices using SIMD.
Matrix< K >	mul_mat4x4 (const MatrixKernel< K > &B) const
	Multiply two 4×4 matrices using SIMD.
Matrix< K >	mul_mat8x8 (const MatrixKernel< K > &B) const
	Multiply two 8×8 matrices using SIMD.
Matrix< K >	mul_mat16x16 (const MatrixKernel< K > &B) const
	Multiply two 16×16 matrices using SIMD with FMADD accumulation. This function splits each row into two registers (low/high).
Matrix< K >	mul_mat32x32 (const MatrixKernel< K > &B) const
	Multiply two 32×32 matrices using SIMD. Each row is split into two registers (16 elements each).
Matrix< K >	mul_mat64x64 (const MatrixKernel< K > &B) const
	Multiply two 64×64 matrices using SIMD. Each row is split into 4 SIMD registers (4×16 elements). Vectorized FMADD chaining is used for performance.
Public Member Functions inherited from tensorium::Matrix< K, true >
	Matrix (size_t r, size_t c)
	Construct a matrix of size r × c, initialized with zeros.
size_t	index (size_t i, size_t j) const
size_t	size () const
	Return the total number of elements.
K &	operator() (size_t i, size_t j)
	Element access (mutable).
void	print () const
	Print the matrix to stdout.
void	swap_rows (size_t i, size_t j)
	Swap two rows of the matrix.
Vector< T >	operator* (const Vector< T > &v) const
	Multiply matrix by a vector (naïve fallback).
void	add (const Matrix &m)
	In-place matrix addition: this += m.
void	sub (const Matrix &m)
	In-place matrix subtraction: this -= m.
void	scl (K a)
	In-place scalar multiplication: this *= a.
void	lerp (const Matrix< K > &A, const Matrix< K > &B, K alpha)
	Linearly interpolate between two matrices: this = (1 - α) * A + α * B.
Matrix	_mul_mat (const Matrix< K > &mat) const
	Multiply matrix by another matrix using optimized SIMD path.
Vector< T >	mul_vec (const Vector< T > &x) const
	Multiply matrix by a vector using SIMD.
Matrix< K >	transpose () const
	Returns the transpose \( A^T \) of the matrix (column-major layout).
Matrix< K >	trace () const
	Returns the trace of a square matrix as a 1×1 matrix.
Matrix< K >	inverse () const
	Compute the inverse of the matrix using Gauss–Jordan elimination.
K	det () const
	Compute the determinant using Gaussian elimination.
size_t	rank (K eps=K(1e-6)) const
	Compute the numerical rank of the matrix.
Matrix &	operator+= (const Matrix &m)
Matrix &	operator-= (const Matrix &m)
Matrix &	operator*= (K alpha)

Additional Inherited Members
Public Attributes inherited from tensorium::Matrix< K, true >
size_t	rows
size_t	cols
aligned_vector< K >	data
size_t	block_size
bool	iscolumn
size_t	simd_width

Detailed Description

template<typename K>
class tensorium::MatrixKernel< K >

MatrixKernel provides specialized SIMD-accelerated matrix multiplication routines for statically-sized square matrices.

This class inherits from the column-major Matrix<K, true> and provides optimized kernels for seven fixed square sizes: 2×2, 3×3, 4×4, 8×8, 16×16, 32×32, and 64×64. The kernels are written with SIMD intrinsics such as AVX and rely on register-level blocking and chained FMADD (fused multiply-add) accumulation.

Template Parameters

K	Scalar type (float, double, etc.)
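
As a point of reference for the layout described above, the following minimal sketch shows column-major indexing together with a scalar product that a SIMD kernel could be checked against. The names `ColMajor`, `at`, and `reference_mul` are hypothetical and introduced only for illustration; they are not part of Tensorium. The indexing rule `data[j * rows + i]` is the usual column-major convention and is assumed here, not confirmed from the library source.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a column-major matrix; NOT the Tensorium type.
struct ColMajor {
    std::size_t rows, cols;
    std::vector<float> data;  // assumed layout: (i, j) lives at data[j * rows + i]

    ColMajor(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0f) {}

    float& at(std::size_t i, std::size_t j) { return data[j * rows + i]; }
    float at(std::size_t i, std::size_t j) const { return data[j * rows + i]; }
};

// Scalar reference product C = A * B, useful as a baseline when testing a kernel.
inline ColMajor reference_mul(const ColMajor& A, const ColMajor& B) {
    assert(A.cols == B.rows);
    ColMajor C(A.rows, B.cols);
    for (std::size_t j = 0; j < B.cols; ++j) {
        for (std::size_t k = 0; k < A.cols; ++k) {
            const float b = B.at(k, j);
            // The inner loop walks down a column, so memory access is contiguous.
            for (std::size_t i = 0; i < A.rows; ++i) {
                C.at(i, j) += A.at(i, k) * b;
            }
        }
    }
    return C;
}
```

The loop order (j, k, i) keeps the innermost accesses contiguous in a column-major buffer, which is the access pattern a vectorized kernel would also want.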

Member Typedef Documentation

◆ reg

template<typename K>

using tensorium::MatrixKernel< K >::reg = typename Simd::reg

◆ Simd

template<typename K>

using tensorium::MatrixKernel< K >::Simd = simd::SimdTraits<K, DefaultISA>

Constructor & Destructor Documentation

◆ MatrixKernel() [1/3]

template<typename K>

tensorium::MatrixKernel< K >::MatrixKernel ( const Matrix< K, true > & m )

inline

Construct a MatrixKernel from a column-major matrix.

Parameters

m	Source matrix.

References tensorium::Matrix< K, true >::Matrix().

Referenced by mul_mat16x16(), mul_mat2x2(), mul_mat32x32(), mul_mat3x3(), mul_mat4x4(), mul_mat64x64(), and mul_mat8x8().

◆ MatrixKernel() [2/3]

template<typename K>

tensorium::MatrixKernel< K >::MatrixKernel ( const Matrix< K, false > & m )

inline

Construct a MatrixKernel from a row-major matrix by copying elements.

Parameters

m	Source matrix.

References tensorium::Matrix< K, RowMajor >::cols, tensorium::Matrix< K, true >::cols, tensorium::Matrix< K, true >::Matrix(), tensorium::Matrix< K, RowMajor >::rows, and tensorium::Matrix< K, true >::rows.
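
A row-major to column-major copy of this kind can be sketched as follows. This is an illustrative sketch on plain buffers, not the constructor's actual code; the function name `row_to_col_major` and the flat `std::vector` storage are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: copy a row-major buffer, where (i, j) is at
// src[i * cols + j], into a column-major buffer, where (i, j) is at
// dst[j * rows + i].
inline std::vector<double> row_to_col_major(const std::vector<double>& src,
                                            std::size_t rows, std::size_t cols) {
    assert(src.size() == rows * cols);
    std::vector<double> dst(rows * cols);
    for (std::size_t i = 0; i < rows; ++i) {
        for (std::size_t j = 0; j < cols; ++j) {
            dst[j * rows + i] = src[i * cols + j];
        }
    }
    return dst;
}
```

For a 2×3 matrix with rows (1, 2, 3) and (4, 5, 6), the row-major buffer 1, 2, 3, 4, 5, 6 becomes 1, 4, 2, 5, 3, 6 in column-major order.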

◆ MatrixKernel() [3/3]

template<typename K>

tensorium::MatrixKernel< K >::MatrixKernel ( size_t r, size_t c )

inline

Construct an empty column-major matrix kernel of size (r × c).

References tensorium::Matrix< K, true >::Matrix().

Member Function Documentation

◆ mul_mat16x16()

template<typename K>

Matrix< K > tensorium::MatrixKernel< K >::mul_mat16x16 ( const MatrixKernel< K > & B ) const

inline

Multiply two 16×16 matrices using SIMD with FMADD accumulation. This function splits each row into two registers (low/high).

Parameters

B	Right-hand matrix.

Returns: The matrix product of this matrix (left operand) and B.

References tensorium::Matrix< K, RowMajor >::data, tensorium::Matrix< K, true >::data, tensorium::Matrix< K, true >::Matrix(), and MatrixKernel().

Referenced by tensorium::mul_mat().
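
The low/high split with FMADD accumulation can be illustrated with a portable sketch that emulates registers with fixed-size arrays. Several things here are assumptions, not facts about the library: the 8-lane register width for float (which is what splitting 16 elements into two registers implies), the helper names `Reg8`, `load8`, `store8`, `fmadd8`, and `mul16x16_sketch`, and the choice to treat each contiguous 16-element column of a column-major buffer as the unit that is split. Real kernels would use ISA intrinsics instead of `std::fma` in a loop.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

constexpr std::size_t kDim16 = 16;  // matrix dimension
constexpr std::size_t kLanes8 = 8;  // ASSUMED register width in floats

using Reg8 = std::array<float, kLanes8>;  // hypothetical stand-in for a SIMD register

inline Reg8 load8(const float* p) {
    Reg8 r{};
    for (std::size_t l = 0; l < kLanes8; ++l) r[l] = p[l];
    return r;
}

inline void store8(float* p, const Reg8& r) {
    for (std::size_t l = 0; l < kLanes8; ++l) p[l] = r[l];
}

// Lane-wise acc + a * b, with the scalar b broadcast to every lane.
inline Reg8 fmadd8(const Reg8& a, float b, const Reg8& acc) {
    Reg8 r{};
    for (std::size_t l = 0; l < kLanes8; ++l) r[l] = std::fma(a[l], b, acc[l]);
    return r;
}

// Column-major 16x16 product: column j of C is the sum over k of
// (column k of A) * B(k, j), accumulated in a low and a high register.
inline std::vector<float> mul16x16_sketch(const std::vector<float>& A,
                                          const std::vector<float>& B) {
    assert(A.size() == kDim16 * kDim16 && B.size() == kDim16 * kDim16);
    std::vector<float> C(kDim16 * kDim16, 0.0f);
    for (std::size_t j = 0; j < kDim16; ++j) {
        Reg8 lo{};  // elements 0..7 of column j of C
        Reg8 hi{};  // elements 8..15 of column j of C
        for (std::size_t k = 0; k < kDim16; ++k) {
            const float b = B[j * kDim16 + k];  // B(k, j)
            lo = fmadd8(load8(&A[k * kDim16]), b, lo);
            hi = fmadd8(load8(&A[k * kDim16 + kLanes8]), b, hi);
        }
        store8(&C[j * kDim16], lo);
        store8(&C[j * kDim16 + kLanes8], hi);
    }
    return C;
}
```

Keeping `lo` and `hi` as separate accumulators gives two independent dependency chains, which is generally what lets fused multiply-add units stay busy.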

◆ mul_mat2x2()

template<typename K>

Matrix< K > tensorium::MatrixKernel< K >::mul_mat2x2 ( const MatrixKernel< K > & B ) const

inline

Multiply two 2×2 matrices using SIMD.

Parameters

B	Right-hand matrix.

Returns: The matrix product of this matrix (left operand) and B.

References tensorium::Matrix< K, RowMajor >::data, tensorium::Matrix< K, true >::Matrix(), and MatrixKernel().

◆ mul_mat32x32()

template<typename K>

Matrix< K > tensorium::MatrixKernel< K >::mul_mat32x32 ( const MatrixKernel< K > & B ) const

inline

Multiply two 32×32 matrices using SIMD. Each row is split into two registers (16 elements each).

Parameters

B	Right-hand matrix.

Returns: The matrix product of this matrix (left operand) and B.

References tensorium::Matrix< K, RowMajor >::data, tensorium::Matrix< K, true >::data, tensorium::Matrix< K, true >::Matrix(), and MatrixKernel().

Referenced by tensorium::mul_mat().

◆ mul_mat3x3()

template<typename K>

Matrix< K > tensorium::MatrixKernel< K >::mul_mat3x3 ( const MatrixKernel< K > & B ) const

inline

Multiply two 3×3 matrices using SIMD.

Parameters

B	Right-hand matrix.

Returns: The matrix product of this matrix (left operand) and B.

References tensorium::Matrix< K, true >::Matrix(), and MatrixKernel().

Referenced by tensorium::mul_mat().

◆ mul_mat4x4()

template<typename K>

Matrix< K > tensorium::MatrixKernel< K >::mul_mat4x4 ( const MatrixKernel< K > & B ) const

inline

Multiply two 4×4 matrices using SIMD.

Parameters

B	Right-hand matrix.

Returns: The matrix product of this matrix (left operand) and B.

References tensorium::Matrix< K, RowMajor >::data, tensorium::Matrix< K, true >::data, tensorium::Matrix< K, true >::Matrix(), and MatrixKernel().

Referenced by tensorium::mul_mat().

◆ mul_mat64x64()

template<typename K>

Matrix< K > tensorium::MatrixKernel< K >::mul_mat64x64 ( const MatrixKernel< K > & B ) const

inline

Multiply two 64×64 matrices using SIMD. Each row is split into 4 SIMD registers (4×16 elements). Vectorized FMADD chaining is used for performance.

Parameters

B	Right-hand matrix.

Returns: The matrix product of this matrix (left operand) and B.

References tensorium::Matrix< K, RowMajor >::data, tensorium::Matrix< K, true >::data, tensorium::Matrix< K, true >::Matrix(), and MatrixKernel().
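
The four-register split can be sketched generically, with the number of accumulators derived from the matrix dimension and the register width. This is a hedged, portable illustration: the function name `mul_chained_sketch`, its template parameters, and the use of `std::fma` on plain arrays in place of SIMD intrinsics are all assumptions made for the example, and the sketch treats each contiguous 64-element column of a column-major buffer as the unit that is split into 4 chunks of 16.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch: N x N column-major product in which each N-element
// column of C is held in N / LANES accumulator "registers"
// (4 accumulators for N = 64, LANES = 16).
template <std::size_t N, std::size_t LANES>
std::vector<float> mul_chained_sketch(const std::vector<float>& A,
                                      const std::vector<float>& B) {
    static_assert(N % LANES == 0, "N must be a multiple of the register width");
    constexpr std::size_t REGS = N / LANES;
    assert(A.size() == N * N && B.size() == N * N);

    std::vector<float> C(N * N, 0.0f);
    for (std::size_t j = 0; j < N; ++j) {
        // One zero-initialized accumulator per register-sized chunk of column j.
        std::array<std::array<float, LANES>, REGS> acc{};
        for (std::size_t k = 0; k < N; ++k) {
            const float b = B[j * N + k];  // B(k, j), broadcast to all lanes
            for (std::size_t r = 0; r < REGS; ++r) {
                for (std::size_t l = 0; l < LANES; ++l) {
                    // Chained fused multiply-add: acc = A(:, k) * b + acc
                    acc[r][l] = std::fma(A[k * N + r * LANES + l], b, acc[r][l]);
                }
            }
        }
        for (std::size_t r = 0; r < REGS; ++r) {
            for (std::size_t l = 0; l < LANES; ++l) {
                C[j * N + r * LANES + l] = acc[r][l];
            }
        }
    }
    return C;
}
```

Calling `mul_chained_sketch<64, 16>(A, B)` on two 4096-element column-major buffers reproduces the 4×16 arrangement described above; each accumulator is updated once per value of k, so the four chains advance independently.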

◆ mul_mat8x8()

template<typename K>

Matrix< K > tensorium::MatrixKernel< K >::mul_mat8x8 ( const MatrixKernel< K > & B ) const

inline

Multiply two 8×8 matrices using SIMD.

Parameters

B	Right-hand matrix.

Returns: The matrix product of this matrix (left operand) and B.

References tensorium::Matrix< K, RowMajor >::data, tensorium::Matrix< K, true >::data, tensorium::Matrix< K, true >::Matrix(), and MatrixKernel().

Referenced by tensorium::mul_mat().

The documentation for this class was generated from the following file:

includes/Tensorium/Backend/CPU_Kernels/MatrixKernel.hpp
