ModernBERT
✨ The BERT makeover
Before the ChatGPT and LLM tsunami 🌊, BERT variants dominated NLP. In fact, good old BERT still dominates structured use cases like text classification and named entity recognition, especially after the prototype phase. At the same time, there have been few to no updates to BERT models, even though many of the learnings from training LLMs apply directly to BERT as well.
ModernBERT changes that, bringing BERT into the modern era with:
📏 a context window of 8192 tokens instead of the original 512
🏎️ FlashAttention for fast inference
💿 2T tokens of training data instead of the original ~3.3B words
among other improvements.
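To get a feel for what the longer context window means in practice, here is a rough sketch that counts how many sliding windows are needed to cover one document. The helper `num_windows`, the 6000-token document length, and the 64-token overlap are all illustrative assumptions, not anything from the ModernBERT paper, and special tokens such as [CLS] and [SEP] are ignored for simplicity.

```python
import math


def num_windows(n_tokens: int, window: int, overlap: int = 0) -> int:
    """Count the sliding windows needed to cover a document of n_tokens tokens.

    Hypothetical helper for illustration only: it ignores special tokens
    and assumes consecutive windows share `overlap` tokens.
    """
    if n_tokens <= 0:
        return 0
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")
    if n_tokens <= window:
        return 1
    stride = window - overlap  # how far each new window advances
    return 1 + math.ceil((n_tokens - window) / stride)


doc_len = 6000  # assumed document length in tokens

# 512-token window (original BERT), with an assumed 64-token overlap
print(num_windows(doc_len, window=512, overlap=64))

# 8192-token window (ModernBERT): the whole document fits in one pass
print(num_windows(doc_len, window=8192))
```

Under these assumptions, the 512-token model has to split the document into many overlapping chunks and merge the predictions afterwards, while the 8192-token model can read it in a single window.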
Let’s bring ModernBERT to more domains 🩺⚖️ and languages 🇫🇷🇩🇪🇪🇸
🔗 Read more: https://arxiv.org/abs/2412.13663
