This paper is available as a PDF. It presents theoretical work on geometric sparse attention that reduces attention complexity from the O(n²) of standard transformers to O(n log n), claiming roughly a 50,000x reduction in attention cost at a context length of 1M tokens (consistent with the ratio n / log₂ n ≈ 50,000 at n = 10⁶). The gain is attributed to coordinate bucketing rather than to approximating the attention computation.
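The paper's exact bucketing scheme is not described above, so the following is only a minimal sketch of the general idea, in Python/NumPy, under stated assumptions: each token is given a scalar coordinate (here a random 1-D projection of its key, which is an assumption, not the paper's geometric construction), tokens are sorted and split into contiguous buckets of roughly log₂(n) entries, and exact softmax attention is computed only within each bucket. The function name bucketed_attention and all parameters are illustrative, not the paper's API.

    import numpy as np

    def bucketed_attention(q, k, v, coords, bucket_size):
        # Sort tokens by their scalar coordinate, split the sorted order into
        # contiguous buckets, and run exact softmax attention inside each bucket.
        n, d = q.shape
        order = np.argsort(coords)
        out = np.zeros_like(v)
        for start in range(0, n, bucket_size):
            idx = order[start:start + bucket_size]
            scores = q[idx] @ k[idx].T / np.sqrt(d)          # (b, b) scores, exact within the bucket
            weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
            weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
            out[idx] = weights @ v[idx]
        return out

    # Toy usage; the coordinate is a hypothetical choice for illustration only.
    n, d = 4096, 64
    rng = np.random.default_rng(0)
    q, k, v = (rng.standard_normal((n, d)).astype(np.float32) for _ in range(3))
    coords = k @ rng.standard_normal(d).astype(np.float32)
    out = bucketed_attention(q, k, v, coords, bucket_size=max(1, int(np.log2(n))))

With n/b buckets of size b, the per-bucket cost is O(b²·d), so total work is O(n·b·d); choosing b ≈ log₂ n yields the O(n log n) scaling in sequence length. Whether this particular within-bucket scheme matches the paper's construction is an assumption.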
Contact kit@holdtheline.tech for a copy of the full paper.