BERT, XLNet and where to go with current NLP research

Anyone who is even slightly interested in Natural Language Processing has probably already heard of BERT (Bidirectional Encoder Representations from Transformers). Together with the original Transformer paper and other, newer models, these neural approaches have been sweeping every NLP leaderboard available.

This new fad, however, has not been received without criticism. Papers like this one, which claims that BERT is learning only “spurious statistical cues in the dataset”, raise concerns about how much these neural models actually learn from data and how far we can rely on them.

Another criticism is that these models are simply TOO EXPENSIVE (not to mention their heavy environmental impact) to train from scratch, essentially excluding anyone but industry from developing new models that could beat BERT, XLNet, and others. This Twitter thread, involving some heavy names like Konrad Kording from UPenn, David Pfau from DeepMind, and Yann LeCun (does he need an introduction?), summarized the topic nicely:

However, I’m forced to agree with LeCun’s closing statement:

You see, this is not necessarily a problem. As long as industry keeps releasing code and data (even if only the pre-trained models), smaller research groups can still benefit a lot from these improvements.

And we don’t even need to look much further than SIGIR 2019’s accepted short papers, where at least three papers used BERT as a base for some creative approaches. For instance, MacAvaney et al. came up with a really interesting way to add BERT embeddings to a “traditional” (can we say that?) deep IR model for ad-hoc retrieval, with impressive results (probably a new SOTA for Robust04). Another example is Sakata et al., who also report great results on retrieving and ranking FAQ answers, using BERT to rank query–answer pairs.
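To make the pair-ranking idea concrete, here is a minimal sketch of the cross-encoder pattern such BERT rankers follow: each (query, answer) pair is packed into a single sequence and scored jointly, then candidates are sorted by score. Note that the scorer below is a hypothetical toy stand-in (simple term overlap), not BERT itself; in the actual papers the score comes from a fine-tuned BERT classification head over the packed pair.

```python
def pack_pair(query: str, answer: str) -> str:
    # BERT-style input packing for a cross-encoder: [CLS] query [SEP] answer [SEP]
    return f"[CLS] {query} [SEP] {answer} [SEP]"

def score_pair(query: str, answer: str) -> float:
    # Hypothetical toy scorer standing in for a fine-tuned BERT relevance head:
    # the fraction of query terms that also appear in the answer.
    q_terms = set(query.lower().split())
    a_terms = set(answer.lower().split())
    return len(q_terms & a_terms) / max(len(q_terms), 1)

def rank_answers(query: str, answers: list) -> list:
    # Score every candidate (query, answer) pair, then sort by descending relevance.
    return sorted(answers, key=lambda a: score_pair(query, a), reverse=True)

faq = [
    "Our office is open Monday to Friday.",
    "You can reset your password from the account settings page.",
]
ranked = rank_answers("how do I reset my password", faq)
```

Swapping the toy `score_pair` for a real BERT forward pass over `pack_pair`'s output is the essence of the FAQ-ranking setup described above.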

Smaller research groups cannot (and should not) compete with these large industry-backed laboratories. However, we can be more “inventive” and creative than they are. We can use these models for other tasks and modify them for things no one has thought of before.

I’m not worried about industry’s dominance of NLP research. I’m excited. Let them spend the money. Let them build the large models. And let us be creative in how we apply them, in how we improve upon them, and in how we twist and turn these models for our own good.