mamba paper Things To Know Before You Buy
at last, we provide an example of an entire language product: a deep sequence design spine (with repeating Mamba blocks) + language product head.
working on byte-sized tokens, transformers scale improperly as each and every token ought to "go to" to each other token resulting in O(n2) scaling legal