Vortex: Efficient and Programmable Sparse Attention Serving for AI Agents

Sparse attention is becoming increasingly important for serving large language models (LLMs) as generation lengths continue to grow. However, deploying and evaluating new sparse attention algorithms at scale remains highly engineering-intensive, slowing both human researchers and AI agents as they explore the sparse attention design space. To address this challenge, we present Vortex, a system that combines a Python-embedded frontend language atop a page-centric tensor abstraction for expressing a broad

Paper · arXiv

cs.AI

Authors: Zhuoming Chen, Xinrui Zhong, Qilong Feng, Ranajoy Sadhukhan, Yang Zhou + 3 more
Published: 2026-06-04

via arXiv · 2606.06453