The obstacles to teaching computers to reason like humans are significant. And with his 30 years of research experience in the field, LeCun believes Facebook can focus on 10 scientific questions to better emulate human-like intelligence. He shared a few of these during our visit.
For instance, at ages three to five months, babies learn the notion of object permanence, a fancy way of explaining that the baby knows that an object behind another is still there and an unsupported object will fall. AI researchers have not built an ML model that understands object permanence.
Steven Max Patterson
The article is mostly fluff, though a few interesting pieces stand out, like the statement above. Artificial intelligence is still far from emulating human cognitive processes, as a recent article also stated, in pretty harsh terms.
Note to the tech world. The thing you call AI is just a natural-language-command-line. It's not artificial, nor is it intelligent.
Thomas Baekdal (@baekdal), January 14, 2017
I admit this paragraph about video recognition had me confused, though. I understand that moving pictures are harder to analyze than stills (especially for an algorithm that has no understanding of object permanence, as stated above), but videos usually contain sounds that can be matched against existing recordings to identify their source (running water, animal sounds, cars, human voices). Furthermore, other companies working in this field, like Microsoft and Google, have developed systems that transcribe human speech automatically and in real time – you would think that what people are saying in a video would help categorize it… Has nobody at Facebook considered the same approach?
Video recognition with the same accuracy achieved with images remains an open problem. Research throughout the AI community has not found a common set of feature descriptors, essentially small regions in a frame used to accurately detect the object in order to classify a wide range of video types. With video, identification problems include action recognition, saliency (which is the identification of the focus of a human viewer's attention), and the equivalent of image captioning (called video summarization).