10 August 2024

Wired: “Perplexity is a Bullshit Machine”

On June 6, Forbes published an investigative report about how former Google CEO Eric Schmidt’s new venture is recruiting heavily and testing AI-powered drones with potential military applications. (Forbes reported that Schmidt declined to comment.) The next day, John Paczkowski, an editor for Forbes, posted on X to note that Perplexity had essentially republished the sum and substance of the scoop. (“It rips off most of our reporting,” he wrote. “It cites us, and a few that reblogged us, as sources in the most easily ignored way possible.”)

That day, Srinivas thanked Paczkowski, noting that the specific product feature that had reproduced Forbes’ exclusive reporting had rough edges, and agreeing that sources should be cited more prominently. Three days later, Srinivas boasted (inaccurately, it turned out) that Perplexity was Forbes’ second-biggest source of referral traffic. (WIRED’s own records show that Perplexity sent 1,265 referrals to WIRED.com in May, an insignificant amount in the context of the site’s overall traffic. The article to which the most traffic was referred got 17 views.) “We have been working on new publisher engagement products and ways to align long-term incentives with media companies that will be announced soon,” he wrote. “Stay tuned!”


In theory, Perplexity’s chatbot shouldn’t be able to summarize WIRED articles, because our engineers have blocked its crawler via our robots.txt file since earlier this year. This file instructs web crawlers on which parts of the site to avoid, and Perplexity claims to respect the robots.txt standard. In practice, though, WIRED’s analysis found that prompting the chatbot with the headline of a WIRED article, or a question based on one, will usually produce a summary appearing to recapitulate the article in detail.
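To illustrate how robots.txt works: it is purely advisory, and compliance is voluntary on the crawler’s side. A minimal sketch using Python’s standard-library `urllib.robotparser`, with a hypothetical crawler name (“ExampleBot”) and example rules standing in for WIRED’s actual file:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block "ExampleBot" everywhere, allow everyone else.
rules = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# A well-behaved crawler checks this before fetching a URL.
print(parser.can_fetch("ExampleBot", "https://example.com/story"))  # False
print(parser.can_fetch("OtherBot", "https://example.com/story"))    # True
```

The catch, as WIRED’s finding suggests, is that nothing in this mechanism enforces the result: a crawler that never runs such a check (or ignores it) can fetch the page anyway.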

Dhruv Mehrotra & Tim Marchman

Another issue looming over the blooming LLM industry: plagiarism and copyright violations. In their hunger for training data, AI companies have repeatedly made dubious decisions that have resulted in a number of damning situations. The most prominent was probably OpenAI releasing a synthetic voice that sounded very similar to Scarlett Johansson’s; WIRED reporters were able to download paywalled articles from publishers like The New York Times and The Atlantic through Quora’s Assistant bot; and the practice is not limited to startups either, as another investigation found that Apple, Nvidia, and Salesforce used thousands of YouTube videos to train AI.

Aravind Srinivas, founder of Perplexity
Former Google intern Aravind Srinivas founded Perplexity three months before ChatGPT launched. Noah Berger/AP

My first, cynical reaction to the Forbes story was: but isn’t this what the news industry itself does all the time? One publication comes out with original reporting, possibly exclusive to its subscribers, and yet you read about it everywhere else almost immediately. Sure, the other publications generally give proper attribution and reference the original, but that still doesn’t translate into meaningful revenue for the initial publisher – I doubt most people who read the basics of the story would pay to access the original version. The issue is more egregious in foreign media: here in Romania, I regularly see articles that are straight-up translations of English-language publications – and not even high-quality translations; they probably just ran the original text through Google Translate and hit publish.

But this does reveal a fundamental flaw of large language models, one that will hamper their growth and sky-high ambitions. These models are highly dependent on access to data, and their outputs quickly degrade without original, human-produced inputs. Before launching publicly, the companies got away with data scraping and circumventing robots.txt rules because nobody was aware of their intentions. Since then, the backlash from artists and publishers has reduced their room for maneuver somewhat. And this all feeds back into the argument about economic viability: these startups assumed they could mine the entire web for free, as search engines have done before; what happens to their prospects when they have to factor in real costs to gather data, either through recurring payments for licensing deals or through legal fees and fines from court cases?

To provide the real-time answers expected from a search engine, as Perplexity presents itself, unrestricted and timely access to the latest information is crucial. Fabricating reporting about breaking news doesn’t look like a winning strategy to me, so figuring out some mutually beneficial arrangement with news sources seems vital to its long-term viability. But would a search engine even work if it indexes only a certain selection of news sources, instead of the entire web?

Perplexity did launch a publisher program at the end of July with partners including Time, Der Spiegel, Fortune, and Automattic. And despite these controversies, recent reports show its popularity rising over the past months. You might notice that Forbes, which first accused Perplexity of plagiarism, is absent from this initial list of partners; might going public have been a negotiating tactic from Forbes to get better terms? It looks like Silicon Valley’s age-old mantra ‘move fast and break things’ has worked, for now…
