Creating an MLIR dialect out-of-tree means describing your operations in TableGen (.td) files and then implementing the connection in C++. What follows explains how and why each part matters, as a technical narrative that follows the logical workflow rather than a step-by-step tutorial for beginners.
The process begins by defining the dialect itself in a file like include/TutoDialect/TutoDialect.td. This file is concise: it gives MLIR the dialect’s name (tuto) and its C++ namespace, which enables TableGen to generate all C++ symbols and the syntax used in MLIR files. For example:
include "mlir/IR/OpBase.td"
def Tuto_Dialect : Dialect {
  let name = "tuto";
  let summary = "A Tuto out-of-tree MLIR dialect.";
  let description = [{
    This dialect is an example of an out-of-tree MLIR dialect designed to
    illustrate the basic setup required to develop MLIR-based tools without
    working inside of the LLVM source tree.
  }];
  let cppNamespace = "::mlir::tuto";
}
On the same principle, you then describe the dialect’s operations in a dedicated TableGen file, typically include/TutoDialect/TutoDialectOps.td. This file gathers all operations: each operation is declared with its input/output types and names, and its MLIR assembly format. For instance, an addition of floats:
include "TutoDialect.td"
include "mlir/IR/AttrTypeBase.td"
include "mlir/IR/DialectBase.td"
include "mlir/Interfaces/InferTypeOpInterface.td"
include "mlir/Interfaces/SideEffectInterfaces.td"
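// Base class for all Tuto operations. AddOp below derives from it, so it must
// be defined exactly once; omit it here if TutoDialect.td already provides it.
class Tuto_Op<string mnemonic, list<Trait> traits = []> :
    Op<Tuto_Dialect, mnemonic, traits>;
def AddOp : Tuto_Op<"add", [Pure]> {
  let summary = "Addition";
  let description = [{
    Addition operation between two values.
  }];
  // Both operands and the result are constrained to 64-bit floats.
  let arguments = (ins F64:$lhs, F64:$rhs);
  let results = (outs F64:$res);
  // Textual form: tuto.add %a, %b : f64
  let assemblyFormat = [{
    $lhs `,` $rhs `:` type($lhs) attr-dict
  }];
}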
This file is the blueprint for generating the full C++ class for the operation via TableGen, ensuring parsing, syntax, and printing are consistent and correct.
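To see why a single declarative format keeps parsing and printing consistent, the idea can be sketched with a toy format interpreter. This is not MLIR's generated code: FormatElement, addOpFormat, printOp, and parseOp are hypothetical names, the format list is a hand-written encoding of the assemblyFormat string, and tokens are assumed to be separated by whitespace.

```cpp
// Toy model only: a tiny format interpreter, not what TableGen emits.
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// One element of a format: a literal token or a named slot.
struct FormatElement {
  bool isLiteral;
  std::string text; // literal spelling, or slot name such as "lhs"
};

using Format = std::vector<FormatElement>;
using Fields = std::map<std::string, std::string>;

// Hypothetical encoding of: $lhs `,` $rhs `:` type($lhs)
inline Format addOpFormat() {
  return {{false, "lhs"}, {true, ","}, {false, "rhs"},
          {true, ":"},    {false, "type"}};
}

// Printer: walk the format, emitting literals and slot values in order.
inline std::string printOp(const Format &fmt, const Fields &fields) {
  std::string out;
  for (const FormatElement &e : fmt) {
    if (!out.empty())
      out += ' ';
    out += e.isLiteral ? e.text : fields.at(e.text);
  }
  return out;
}

// Parser: walk the same format, checking literals and capturing slots.
// Returns false on a literal mismatch or on missing or extra tokens.
inline bool parseOp(const Format &fmt, const std::string &input,
                    Fields &fields) {
  std::istringstream in(input);
  std::string tok;
  for (const FormatElement &e : fmt) {
    if (!(in >> tok))
      return false;
    if (e.isLiteral) {
      if (tok != e.text)
        return false;
    } else {
      fields[e.text] = tok;
    }
  }
  return !(in >> tok); // reject trailing tokens
}
```

Because printOp and parseOp walk the same Format value, text produced by one is accepted by the other by construction. That is the property the declarative assembly format is meant to provide, here in a much simplified form.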
Designing an MLIR dialect outside the LLVM source tree is fundamentally about separating declaration from implementation. This architectural split, with declarative TableGen on one side and connecting C++ on the other, is what allows MLIR to scale and remain maintainable, even as dialects grow.
Everything starts with TableGen .td files.
TutoDialect.td: This file defines the dialect, its MLIR name, summary, and C++ namespace. It is the canonical description from which TableGen generates all symbols and basic metadata.
TutoDialectOps.td: Here, you describe all operations of your dialect. Each op is defined with its operands/results, assembly syntax, documentation, and interfaces. TableGen will turn this into a complete C++ class, including parsing, printing, and basic verification logic.
Key Point: TableGen .td files are the sole source of truth for the syntax, signatures, and metadata of your dialect and ops.
All boilerplate and repetitive code (parsing, printing, verification stubs, etc.) is generated from here.
The glue between TableGen and the MLIR C++ API consists of several headers:
TutoOps.h
#ifndef TUTO_TUTOOPS_H
#define TUTO_TUTOOPS_H
#include "mlir/IR/BuiltinTypes.h"
#include "mlir/IR/Dialect.h"
#include "mlir/IR/OpDefinition.h"
#include "mlir/Interfaces/InferTypeOpInterface.h"
#include "mlir/Interfaces/SideEffectInterfaces.h"
#define GET_OP_CLASSES
#include "TutoDialect/TutoOps.h.inc"
#endif // TUTO_TUTOOPS_H
This header gathers all operation class declarations that TableGen generates into TutoOps.h.inc, so you can use them from C++.
It also includes all relevant MLIR headers (types, base classes, interfaces).
TutoDialect.h
#ifndef TUTO_TUTODIALECT_H
#define TUTO_TUTODIALECT_H
#include "mlir/IR/Dialect.h"
#include "TutoDialect/TutoOpsDialect.h.inc"
#endif // TUTO_TUTODIALECT_H
This header links MLIR to your dialect class, which is also generated by TableGen as TutoOpsDialect.h.inc.
It makes your dialect visible and instantiable within MLIR.
TutoOps.cpp
#include "TutoDialect/TutoOps.h"
#include "TutoDialect/TutoDialect.h"
#include "mlir/IR/OpImplementation.h"
#define GET_OP_CLASSES
#include "TutoDialect/TutoOps.cpp.inc"
This file pulls in all TableGen-generated implementations for your ops.
Here you would also add any custom verification/builders/etc. for your operations if needed.
TutoDialect.cpp
#include "TutoDialect/TutoDialect.h"
#include "TutoDialect/TutoOps.h"
#include "mlir/IR/DialectImplementation.h"
using namespace mlir;
using namespace mlir::tuto;
void TutoDialect::initialize() {
  addOperations<
#define GET_OP_LIST
#include "TutoDialect/TutoOps.cpp.inc"
      >();
}
This is where the dialect and its operations are registered with MLIR.
addOperations<> is a variadic member template, not a macro: the #include placed between its angle brackets expands to the comma-separated list of generated operation classes, which become its template arguments and are thereby “injected” into your dialect.
This registration is what makes your ops discoverable and usable in tools like mlir-opt and your own binaries.
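The registration mechanism can be sketched in plain C++ without any MLIR headers. Everything below is a toy model under stated assumptions: ToyDialect is not MLIR's Dialect class, MulOp is a hypothetical second operation, and the TOY_OP_LIST macro stands in for the text that the GET_OP_LIST section of the generated .cpp.inc file would supply.

```cpp
// Toy model only: none of these types are MLIR's. It mimics how a variadic
// template receives a comma-separated list of op classes.
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for two TableGen-generated op classes.
struct AddOp {
  static std::string getOperationName() { return "tuto.add"; }
};
struct MulOp {
  static std::string getOperationName() { return "tuto.mul"; }
};

// Stand-in for the generated op list: a bare, comma-separated list of class
// names. In a real project this text comes from the included .cpp.inc file.
#define TOY_OP_LIST AddOp, MulOp

class ToyDialect {
public:
  // Variadic member template: one call registers every listed op type.
  template <typename... Ops>
  void addOperations() {
    // C++17 fold over the comma operator, expanded once per op, in order.
    (registered.push_back(Ops::getOperationName()), ...);
  }

  // Mirrors the shape of initialize() in the dialect source file.
  void initialize() { addOperations<TOY_OP_LIST>(); }

  std::vector<std::string> registered;
};
```

After calling initialize() on a ToyDialect, its registered vector holds the operation names in list order. Adding an operation to the list is enough to register it, which is why the real initialize() never needs editing when the .td file gains new ops.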
When you build, TableGen runs and emits all the necessary .inc headers from your .td files (TutoOps.h.inc, TutoOps.cpp.inc, TutoOpsDialect.h.inc, etc.).
Your C++ files include these headers, and the compiler stitches everything together.
You never manually edit the generated .inc files; they’re regenerated automatically any time your TableGen definitions change.
After this pipeline is in place, you can write, parse, and print your custom operations in .mlir files.
Your driver binary (tuto-opt) is now able to:
Parse and validate your dialect/ops,
Print them in MLIR syntax,
Serve as a testbed for further extensions: types, canonicalizations, lowerings, etc.
TableGen is for declarative structure: syntax, types, interfaces, and signatures.
C++ is for connecting, registering, and (optionally) extending with custom logic.
The generated .inc files are the automatic “bridge” between declarative TableGen and the runtime/compilable C++ world.
The build system keeps everything in sync, allowing you to focus on high-level definitions and advanced extensions.
Once these definitions are written, TableGen (invoked by CMake during the build) generates all the backend C++ (headers and intermediate sources). Then, you just need to implement the minimal glue in C++: the main dialect file, for example lib/TutoDialect/TutoDialect.cpp, is responsible for registering all operations of the dialect within MLIR. This is done with a simple initialize() method that adds your operations to the dialect’s table. There is nothing magic here: this registration is the key that makes your operations usable in tools like mlir-opt or your own binary (tuto-opt).
The driver’s main file looks like this:
#include "mlir/IR/Dialect.h"
#include "mlir/IR/MLIRContext.h"
#include "mlir/InitAllDialects.h"
#include "mlir/InitAllPasses.h"
#include "mlir/Pass/Pass.h"
#include "mlir/Pass/PassManager.h"
#include "mlir/Support/FileUtilities.h"
#include "mlir/Tools/mlir-opt/MlirOptMain.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/InitLLVM.h"
#include "llvm/Support/SourceMgr.h"
#include "llvm/Support/ToolOutputFile.h"
#include "mlir/Dialect/Arith/IR/Arith.h"
#include "mlir/Dialect/Linalg/IR/Linalg.h"
#include "mlir/Dialect/Math/IR/Math.h"
#include "mlir/Dialect/MemRef/IR/MemRef.h"
#include "mlir/Dialect/SCF/IR/SCF.h"
#include "mlir/Dialect/Tensor/IR/Tensor.h"
#include "mlir/Dialect/Affine/IR/AffineOps.h"
#include "mlir/Dialect/LLVMIR/LLVMDialect.h"
#include "TutoDialect/TutoDialect.h"
#include "TutoDialect/TutoOpsDialect.cpp.inc"
#include "mlir/Dialect/Func/IR/FuncOps.h"
int main(int argc, char **argv) {
  mlir::DialectRegistry registry;
  registry.insert<mlir::tuto::TutoDialect>();
  registry.insert<mlir::arith::ArithDialect>();
  registry.insert<mlir::math::MathDialect>();
  registry.insert<mlir::tensor::TensorDialect>();
  registry.insert<mlir::affine::AffineDialect>();
  registry.insert<mlir::linalg::LinalgDialect>();
  registry.insert<mlir::memref::MemRefDialect>();
  registry.insert<mlir::LLVM::LLVMDialect>();
  registry.insert<mlir::func::FuncDialect>();
  return mlir::asMainReturnCode(
      mlir::MlirOptMain(argc, argv, "Tuto optimizer driver\n", registry));
}
In an out-of-tree MLIR project, the main driver source (as shown) serves as the interface between your dialect and the MLIR ecosystem. Its purpose is not to hard-code logic but to register the set of dialects you want your tool to support, including your own, and to delegate all actual IR handling, verification, parsing, pass execution, and pretty-printing to MLIR’s robust infrastructure.
Here, including the core dialect headers alongside your own makes their C++ classes visible to the compiler; what tells MLIR which operations and types to recognize and parse is the dialect registry. By inserting your dialect (mlir::tuto::TutoDialect) and any others (arith, math, tensor, affine, linalg, memref, LLVM, func) into it, you make their ops available as first-class citizens in your IR. This registry becomes the catalogue that MLIR uses at runtime for all dialect resolution and IR manipulation.
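The role of the registry as a catalogue can be illustrated with a small standalone model. ToyRegistry, ToyContext, and isKnownOp are hypothetical names, not MLIR classes, and the sketch assumes a simplified view of the design: the registry only stores factories keyed by dialect namespace, and a context constructs a dialect the first time one of its names is resolved.

```cpp
// Toy model only: ToyRegistry and ToyContext are hypothetical, not MLIR APIs.
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

struct ToyDialect {
  explicit ToyDialect(std::string ns) : nameSpace(std::move(ns)) {}
  std::string nameSpace;
};

// Maps a dialect namespace to a factory. Inserting constructs nothing.
class ToyRegistry {
public:
  using Factory = std::function<std::unique_ptr<ToyDialect>()>;

  void insert(const std::string &ns) {
    factories[ns] = [ns] { return std::make_unique<ToyDialect>(ns); };
  }
  const Factory *lookup(const std::string &ns) const {
    auto it = factories.find(ns);
    return it == factories.end() ? nullptr : &it->second;
  }

private:
  std::map<std::string, Factory> factories;
};

// Owns loaded dialects and consults the registry on first use of a namespace.
// The registry must outlive the context.
class ToyContext {
public:
  explicit ToyContext(const ToyRegistry &r) : registry(r) {}

  // Returns the dialect for `ns`, constructing it on demand, or nullptr if
  // the namespace was never registered.
  ToyDialect *getOrLoad(const std::string &ns) {
    auto it = loaded.find(ns);
    if (it != loaded.end())
      return it->second.get();
    const ToyRegistry::Factory *factory = registry.lookup(ns);
    if (!factory)
      return nullptr;
    return (loaded[ns] = (*factory)()).get();
  }
  std::size_t numLoaded() const { return loaded.size(); }

private:
  const ToyRegistry &registry;
  std::map<std::string, std::unique_ptr<ToyDialect>> loaded;
};

// Resolves an operation name such as "tuto.add" through its dialect prefix.
inline bool isKnownOp(ToyContext &ctx, const std::string &opName) {
  return ctx.getOrLoad(opName.substr(0, opName.find('.'))) != nullptr;
}
```

In this model, registering "tuto" and "func" and then resolving "tuto.add" loads exactly one dialect, while a name from an unregistered namespace is rejected. This is the sense in which registering a dialect is all a driver needs to do.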
The key function is MlirOptMain, which is a generic driver for IR files and passes, provided directly by MLIR. It expects to be handed a dialect registry and takes care of everything else: loading IR, handling passes, running analyses, producing diagnostics, and emitting transformed IR. It abstracts away boilerplate so that your binary focuses solely on declaring support for dialects, not reimplementing existing tooling.
There is no stepwise logic or custom orchestration here; the code is deliberately minimal, reflecting the compositional, declarative design MLIR encourages. Your dialect integrates seamlessly with all standard passes and dialects simply by being registered. The out-of-tree nature is reflected in the lack of special-casing: your dialect is just another extension point, managed at runtime via the registry, never hardwired into MLIR itself.
This is the architectural pattern that enables scalability, extensibility, and modularity in the MLIR ecosystem.
With these files in place, you simply build the project. The dialect can then be used in a .mlir file like:
func.func @add_example(%arg0: f64, %arg1: f64) -> f64 {
  %res = tuto.add %arg0, %arg1 : f64
  return %res : f64
}
And you can test it using your binary:
./bin/tuto-opt test/TutoTest.mlir
Result: your dialect and operations are fully integrated into MLIR and ready to be extended with types, patterns, lowerings, or whatever else you need.
Logical summary:
.td: everything declarative (syntax, signatures, metadata).
TableGen: generates classes/headers.
.cpp: registers and connects things, and (optionally) holds advanced logic.
.mlir: lets you write and test your operations at a high level.
tuto-opt: your binary for parsing, validating, and manipulating your dialect.