C4ML 2026

7th Workshop on Compilers for Machine Learning

📅 Sunday, February 1, 2026 📍 Sydney, Australia (ICC Sydney) 🔗 Co-located with CGO 2026

Scope

Machine learning applications are becoming ubiquitous in large-scale production systems. As data volumes and model complexity continue to scale, efficient execution of machine learning models has become ever more important. The push for greater energy efficiency has driven the emergence of diverse heterogeneous systems and accelerator architectures.

In parallel, model complexity and diversity have pushed for higher-productivity systems: more powerful programming abstractions, type systems, language embeddings, frameworks, and libraries. Compilers have historically been the bridge between programmer productivity and high performance, allowing developers to write code that remains understandable and easy to port and extend while generating high-performance code for diverse architectures. As such, compiler techniques are increasingly being incorporated into machine learning frameworks. The relationship goes both ways: given the widening gap between high-level constructs and hardware accelerators, the compilers inside machine learning frameworks have also emerged as natural clients of machine learning techniques, from domain-specific heuristics to autotuning.

This workshop aims to highlight cutting-edge work and research that applies compiler techniques and algorithms to optimizing machine learning workloads. Topics span from high-level abstract representations down to code generation for accelerators.

Program

The workshop features 10 presentations by leading ML compiler experts from industry and academia.

Opening Remarks

11:00 - 11:10

Session 1: Optimization & Analysis

11:10 - 12:45
Magellan: Autonomous Discovery of Novel Compiler Optimization Heuristics with AlphaEvolve
Hongzheng Chen, Alexander Novikov, Ngân (NV) Vũ, Hanna Alam, Zhiru Zhang, Aiden Grossman, Mircea Trofin, Amir Yazdanbakhsh
Google, Google DeepMind, and Cornell University

Analyzing the Complexities of Graph Compilers for Machine Learning
Kshitij Jain, Satyam Srivastava
d-Matrix Corporation

XTC: A Research Platform for Optimizing AI Workload Operators
Hugo Pompougnac, Christophe Guillon, Sylvain Noiry, Alban Dutilleul, Guillaume Iooss, Fabrice Rastello
Inria, Univ. Grenoble Alpes, CNRS, Grenoble INP

Lunch

12:45 - 13:45

Session 2: Accelerators & Toolchains

13:45 - 15:30
DeepTools: A Full-Stack Machine Learning Compiler for the IBM Spyre Accelerator
Prasanth Chatarasi, Shubham Jain, Alberto Mannari, Sanchari Sen, Swagath Venkataramani, Viji Srinivasan
IBM Research

From Triton to AMD NPU: Compiler-Driven Kernel Generation with MLIR-AIR
Erwei Wang, Emily Furst, Yiannis Papadopoulos, Aaron Knoll, Mike Chu, Joseph Melber, Stephen Neuendorffer, Samuel Bayliss
AMD and AMD Research

From PyTorch to Calyx: An Open-Source Compiler Toolchain for ML Accelerators
Jiahan Xie, Evan Williams, Adrian Sampson
University of California, Santa Cruz and Cornell University

Defeat the Heap: Zero-Copy Data Movement in AXI4MLIR
Elam Cohavi, Nicolas Bohm Agostini, Jude Haris, Antonino Tumeo, David Kaeli, José Cano
University of Glasgow, Northeastern University, and Pacific Northwest National Laboratory

Break

15:30 - 16:00

Session 3: Code Generation & Kernels

16:00 - 17:20
Library Liberation: Competitive Performance Matmul Through Compiler-composed Nanokernels
Arun Thangamani, Md Asghar Ahmad Shahid, Adam Siemieniuk, Rolf Morel, Renato Golin, Alexander Heinecke
Intel Corporation

Enabling Compiler-Driven Transformation of Attention Variants
Ivan Ho, Kunwar Grover, Tobias Grosser
University of Cambridge and AMD

Systematic Code Generation for ML Computations based on Multi-Dimensional Homomorphisms
Ari Rasch, Richard Schulze
University of Muenster

Closing

17:20

Organizers

Jacques Pienaar
Google

Mehdi Amini
Nvidia

Michael Roberts
Meta

Markus Böck
ETH Zurich

Contact Us

For any questions, please contact us at:
c4ml@googlegroups.com