Abstract: In distributed matrix multiplication, stragglers present a significant challenge. Coding techniques are often employed to mitigate this issue; however, their effectiveness is typically ...
CUDA-L2 is a system that combines large language models (LLMs) and reinforcement learning (RL) to automatically optimize Half-precision General Matrix Multiply (HGEMM) CUDA kernels. CUDA-L2 ...