standing next to me

the last time

do you ever think that, when you do something with someone, that might be the last time ever? in some misbegotten way back when, not yet believing that time is the great leveller?

imagine this

everyone thinks you are super in love. you leave the house at 7:30 AM to buy something for your girlfriend to take to work.

on the way back you start thinking about how this might literally be the last time you do this, and start tearing up, but you don’t want to cry in a public place.

you get back and start the painstaking task of separating the meat from the bones so she doesn’t have to do it later. you are alternately bawling and sobbing all the while.

some time later she wakes up and sees you (still peeling, it takes a while) obviously distraught.

“what’s wrong, babe?”

what’s wrong? how can I even begin to explain what is wrong? what words can I find to describe how the shards of our love chipped away, bit by bit, and fell forever into the long silence that is the only thing that will remain when we walk in opposite directions for the final time?

where else can you find two hearts with us-shaped holes in them? nobody ever falls in love thinking that they will be ground down by the emotionless boot of reality, sheer as a cliff face, featureless as Slenderman. nobody ever thinks, in all the sublime hubris of youth, that they will be unable to buck the odds. they think that love conquers all, and that is true, but not for the reason they think.

“love conquers all” really means that obsession and devotion are two sides of the same coin, and seeing it won’t work with your own eyes is no panacea for the bottomless ache inside. it means that love takes your resilience, your restraint, your self-preservation instinct, and your good sense, and whomps them a good one. it pounds them until they don’t know left from right, and hangs them out to dry.

well.

so anyways…I want to talk about Subquadratic.

as a brief primer, in computer science, we use the term “time complexity” to talk about how the time a task takes grows as its input gets bigger.

for example, finding a given card in a deck of standard poker cards is done in “linear time”, meaning that if the number of cards doubles, the time taken also doubles (because you need to go through them one by one, but you only need to traverse the whole deck once)
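to make the card example concrete, here is a minimal sketch in Python. the function name `find_card` and the use of plain numbers as cards are my own stand-ins for illustration; the point is only that the number of cards checked, in the worst case, grows in step with the size of the deck.

```python
def find_card(deck, target):
    """Walk the deck one card at a time.

    Returns (index, cards_checked); index is -1 if the card is absent.
    """
    for checked, card in enumerate(deck, start=1):
        if card == target:
            return checked - 1, checked
    return -1, len(deck)


# hypothetical decks: cards are just numbers here
deck = list(range(52))
double_deck = list(range(104))

# worst case: the card we want is at the very bottom
print(find_card(deck, 51))          # (51, 52)
print(find_card(double_deck, 103))  # (103, 104)
```

twice the cards, twice the checks.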

now, imagine you have a box of socks. because you are a neat and orderly person, you want to sort them into pairs. however, because you are cursed by the Genie of Analogies, you can take out only two random socks at a time; if they match, you can set them aside. if they don’t, you have to put them back.

on average, if the number of socks in the box doubles, you’ll take four times as much time to sort them; we say this is done in “quadratic time” (because for each sock, you need to go through the entire box once, so the number of searches you do is proportional to the number of socks squared).
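if you'd rather see it than take my word for it, the Genie's rules can be simulated. this is a rough sketch under my own assumptions: every sock has exactly one partner, the two socks are drawn uniformly at random, and the names `draws_to_pair_all` and `average_draws` are made up for this post.

```python
import random


def draws_to_pair_all(num_pairs, rng):
    """Count how many two-sock draws it takes to pair off the whole box."""
    # each pair id appears twice in the box
    box = [pair_id for pair_id in range(num_pairs) for _ in range(2)]
    draws = 0
    while box:
        i, j = rng.sample(range(len(box)), 2)  # two distinct random socks
        draws += 1
        if box[i] == box[j]:
            # a match: set both aside (remove the higher index first)
            for idx in sorted((i, j), reverse=True):
                box.pop(idx)
        # otherwise both socks go back in the box, i.e. nothing changes
    return draws


def average_draws(num_pairs, trials=500, seed=0):
    """Average the draw count over many simulated boxes."""
    rng = random.Random(seed)
    total = sum(draws_to_pair_all(num_pairs, rng) for _ in range(trials))
    return total / trials


small = average_draws(10)  # 20 socks
large = average_draws(20)  # 40 socks
print(small, large, large / small)
```

the ratio printed at the end should hover around 4: double the socks, roughly four times the draws.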

the fundamental mechanism of LLMs is called “attention”, where, for every token (group of characters) in the input, the LLM calculates the relevance of every other token. this maps nicely onto the sock analogy (if you don’t look too hard): every sock needs to be considered in relation to every other sock, and every token needs to be considered in relation to every other token. the attention mechanism therefore also takes quadratic time.
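here is a toy version of that idea in plain Python, just to show where the squaring comes from. it is a heavily simplified sketch, not how any real model is implemented: there are no learned weights, each token's vector plays the role of query, key and value at once, each token is also scored against itself, and `toy_self_attention` is a name I made up.

```python
import math


def toy_self_attention(tokens):
    """tokens: a list of equal-length vectors (lists of floats).

    Returns (outputs, pair_count), where pair_count is the number of
    token-to-token relevance scores that had to be computed.
    """
    dim = len(tokens[0])
    outputs = []
    pair_count = 0
    for query in tokens:
        # score this token against every token in the input
        scores = []
        for key in tokens:
            dot = sum(q * k for q, k in zip(query, key))
            scores.append(dot / math.sqrt(dim))
            pair_count += 1
        # softmax: turn the scores into weights that sum to 1
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # the output is a relevance-weighted mix of all the token vectors
        mixed = [sum(w * value[d] for w, value in zip(weights, tokens))
                 for d in range(dim)]
        outputs.append(mixed)
    return outputs, pair_count


four = [[float(i), 1.0] for i in range(4)]
eight = [[float(i), 1.0] for i in range(8)]
print(toy_self_attention(four)[1])   # 16
print(toy_self_attention(eight)[1])  # 64
```

four tokens means 16 scores, eight tokens means 64. now imagine a few hundred thousand tokens.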

when I read the original paper, I was like “yo this is insane”, because in general, quadratic algorithms are only practical for small inputs; definitely not whole ass books. but thanks to Moore’s law, we made it work!

however, we’re at the point where scaling further is impractical; despite the frenetic pace at which new data centres are being constructed, the industry has settled around “context management” - various techniques to limit LLM input, because otherwise it’s just too expensive and slow (and the quality of the output deteriorates rapidly anyway).

enter Subquadratic’s SubQ model, which claims to be a subquadratic LLM (i.e. one whose cost grows more slowly than the square of the input length; linear, for example). that alone isn’t groundbreaking; many have done that before. no, their real breakthrough, they claim, comes in doing that while matching or beating state of the art models.

if their claims are indeed valid, this would be a pretty big shift. it would mean the ability to ingest and reason across whole codebases, books, and other such repositories that are currently Way Too Big, opening up whole new avenues of LLM use.

deciding whether that is a good or bad thing is left as an exercise for the reader. the ethics of LLM usage and operationalisation (including the consumption of water and other natural resources) aside, though…the science is pretty cool.