HLPP 2024 Keynote

DataFlow-Threads (DF-Threads): An Execution Model for Scalable Systems

Abstract

Dataflow techniques have been investigated across various levels of granularity. 

At the instruction level, superscalar processors have long exploited dataflow by letting instructions execute out of order as soon as their operands are ready.

Programming models like OmpSs-2 and OpenMP manage the data flow among tasks, scheduling potentially large sets of instructions (dataflow/asynchronous tasks) across available computational units such as CPU cores, GPU cores, and accelerators.
Despite these advances, the hardware-software interface still lacks: (i) a clean and efficient mechanism for managing thread-level parallelism and (ii) a universally accepted memory consistency model.

These challenges stem largely from the need for synchronization, consistency, and coherency, a long-standing issue intensified by the widespread use of cost-effective, massively parallel systems and domain-specific accelerators.
The TERAFLUX and AXIOM projects have explored DataFlow-Threads (DF-Threads) as a potential solution for enhancing performance scalability while providing a simple interface for future massively parallel systems.

DF-Threads can be integrated into the architecture with a few new instructions, thereby extending existing processors to offer more efficient and effective parallelism.
Our experiments demonstrate nearly perfect scalability on systems with over 1000 general-purpose x86_64 cores (extended with the DF-Thread instructions) running off-the-shelf Linux-based operating systems.

Could DF-Threads represent a simpler and more efficient method for deploying highly scalable general-purpose systems?