These schemas were the subject of a competition held in 2016 in which the winning program was correct on only 58% of the sentences — hardly a better result than if it had guessed. Oren Etzioni, a leading AI researcher, quipped,
“When AI can’t determine what ‘it’ refers to in a sentence, it’s hard to believe that it will take over the world.”

However, the ability of AI programs to solve Winograd schemas rose quickly due to the advent of large neural network language models. A 2020 paper from OpenAI reported that GPT-3 was correct on nearly 90% of the sentences in a benchmark set of Winograd schemas. Other language models have performed even better after training specifically on these tasks. At the time of this writing, neural network language models have achieved about 97% accuracy on a particular set of Winograd schemas that are part of an AI language-understanding competition known as SuperGLUE. This accuracy roughly equals human performance. Does this mean that neural network language models have attained humanlike understanding?
Melanie Mitchell
I mean, the answer to this question is obviously no. It’s one thing to statistically infer the right answers to increasingly convoluted tests, and quite another to understand the meaning of the questions. In some ways, the machine learning approach to passing this linguistic test reminds me of ‘Dieselgate’: a system designed not to perform the real task, but to fulfil the requirements of a predefined test scenario. In other words, you can always find a way to cheat when you know the questions in advance. The AI model may incorporate more and more examples into its training data, but that doesn’t mean it spontaneously understands the concepts behind the sentences, or that it would perform equally well in a situation it hasn’t been prepared for.
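To make the point about statistical inference concrete, here is a minimal sketch of one common way a language model can be made to answer a Winograd schema without any notion of meaning: substitute each candidate referent for the pronoun and keep whichever completed sentence the model scores as more probable. This is only an illustration, not the procedure used by GPT-3 or the SuperGLUE systems mentioned above; the transformers library, the small GPT-2 model, and the helper names (sentence_log_prob, answer_winograd) are my own assumptions for the example.

# A minimal sketch of the purely statistical approach: substitute each
# candidate referent for the pronoun and keep whichever completed sentence
# the language model assigns the higher probability. GPT-2 and the Hugging
# Face transformers library are stand-ins chosen for illustration only.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def sentence_log_prob(sentence: str) -> float:
    """Total log-probability the model assigns to a sentence."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        # The returned loss is the mean negative log-likelihood over the
        # predicted tokens (all but the first), so scale it back up.
        loss = model(ids, labels=ids).loss
    return -loss.item() * (ids.size(1) - 1)

def answer_winograd(template: str, candidates: list[str]) -> str:
    """Pick the candidate whose substitution yields the likelier sentence."""
    return max(candidates, key=lambda c: sentence_log_prob(template.format(c)))

# The classic schema: does "it" refer to the trophy or the suitcase?
template = "The trophy doesn't fit in the suitcase because {} is too big."
print(answer_winograd(template, ["the trophy", "the suitcase"]))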