← Back to homepage

What is ConvBERT and how does it work?

February 26, 2021 by Chris

Convolutional BERT (ConvBERT) improves the original BERT by replacing some Multi-headed Self-attention segments with cheaper and naturally local operations, so-called span-based dynamic convolutions. These are integrated into the self-attention mechanism to form a mixed attention mechanism, allowing Multi-headed Self-attention to capture global patterns; the Convolutions focus more on the local patterns, which are otherwise captured anyway. In other words, they reduce the computational intensity of training BERT.

Hi, I'm Chris!

I know a thing or two about AI and machine learning. Welcome to MachineCurve.com, where machine learning is explained in gentle terms.