Disrupting transformers

We look at Mamba, Liquid and the "RNNs is all we need" line of work: what they optimise, and how relevant they are.

🔥 On replacing transformers

Since transformers were introduced, they have taken the AI world by storm, in large part because they scale efficiently on our current hardware accelerators (GPUs). In the last two years, a few architectures have emerged as serious contenders to transformers. All of them involve, to some extent, rethought RNN architectures: Mamba, xLSTM, Liquid and minGRU.

We think a variant of those architectures will eventually replace transformers, but should you care? In short, no.

The reason is that these architectures do not unlock any new practical applications. They reduce the cost of running AI, and that cost is already falling quickly. The bitter lesson of AI says that scaling (larger models, better hardware) is the main driver of progress, followed by search (AI that thinks). On that reading, a more efficient architecture changes what AI costs, not what it can do.
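To make the cost argument concrete, here is a rough sketch of one place the savings come from: the memory a model has to keep around while generating text. It assumes a plain transformer that caches one key and one value vector per layer for every token seen so far, and a recurrent model that keeps a single fixed-size state per layer. The layer count and sizes are hypothetical, picked only for illustration, and do not describe any of the models named above.

```python
def kv_cache_floats(seq_len: int, n_layers: int, d_model: int) -> int:
    """Numbers a plain transformer stores during generation: one key and
    one value vector of width d_model, per layer, per token seen so far."""
    return 2 * n_layers * d_model * seq_len


def recurrent_state_floats(n_layers: int, d_state: int) -> int:
    """Numbers a recurrent model stores: one fixed-size state per layer,
    regardless of how many tokens it has processed."""
    return n_layers * d_state


# Hypothetical sizes, for illustration only.
N_LAYERS, D_MODEL, D_STATE = 32, 4096, 65536

for seq_len in (1_000, 10_000, 100_000):
    kv = kv_cache_floats(seq_len, N_LAYERS, D_MODEL)
    rnn = recurrent_state_floats(N_LAYERS, D_STATE)
    print(f"{seq_len:>7} tokens: KV cache {kv:,} vs recurrent state {rnn:,}")
```

The cache grows in step with the length of the context, while the recurrent state stays the same size. That is a saving on the bill for long inputs, not a new capability.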

To conclude, we would advise not paying too much attention to these alternative architectures unless the cost of running AI is of utmost importance to you. Even then, they do not yet come with the large supporting ecosystem that transformers have, so be cautious ⚠️