YouTube’s recommender AI still a horror show, finds major crowdsourced study

For years YouTube’s video-recommending algorithm has stood accused of fuelling a grab bag of societal ills by feeding users an AI-amplified diet of hate speech, political extremism and/or conspiracy junk/disinformation for the profiteering motive of trying to keep billions of eyeballs stuck to its ad inventory.

And while YouTube’s tech giant parent Google has, sporadically, responded to negative publicity flaring up around the algorithm’s antisocial recommendations — announcing a few policy tweaks or limiting/purging the odd hateful account — it’s not clear how far the platform’s penchant for promoting horribly unhealthy clickbait has actually been rebooted.

The suspicion remains that it’s nowhere near far enough.

New research published today by Mozilla backs that notion up, suggesting YouTube’s AI continues to puff up piles of “bottom-feeding”/low-grade/divisive/disinforming content — stuff that tries to grab eyeballs by triggering people’s sense of outrage, sowing division/polarization or spreading baseless/harmful disinformation — which in turn implies that YouTube’s problem with recommending terrible stuff is indeed systemic; a side effect of the platform’s rapacious appetite to harvest views to serve ads.

That YouTube’s AI is still — per Mozilla’s study — behaving so badly also suggests Google has been pretty successful at fuzzing criticism with superficial claims of reform.

The mainstay of its deflective success here is likely the primary protection mechanism of keeping the recommender engine’s algorithmic workings (and associated data) hidden from public view and external oversight — via the convenient shield of “commercial secrecy.”

But regulation that could help crack open proprietary AI blackboxes is now on the cards — at least in Europe.

To fix YouTube’s algorithm, Mozilla is calling for “common sense transparency laws, better oversight, and consumer pressure” — suggesting a combination of laws that mandate transparency into AI systems; protect independent researchers so they can interrogate algorithmic impacts; and empower platform users with robust controls (such as the ability to opt out of “personalized” recommendations) are what’s needed to rein in the worst excesses of the YouTube AI.

Regrets, YouTube users have had a few …

To gather data on specific recommendations being made to YouTube users — information that Google does not routinely make available to external researchers — Mozilla took a crowdsourced approach, via a browser extension (called RegretsReporter) that lets users self-report YouTube videos they “regret” watching.

The tool can generate a report that includes details of the videos the user had been recommended, as well as earlier video views, to help build up a picture of how YouTube’s recommender system was functioning. (Or, well, “dysfunctioning” as the case may be.)
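
For a concrete sense of what such a report might capture, here is a minimal sketch in Python of one crowdsourced regret report: the reported video, how the user arrived at it, and recent watch history for context. The field names and structure are illustrative assumptions, not RegretsReporter’s actual schema.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class RegretReport:
    """Hypothetical shape of one crowdsourced "regret" report.

    Field names are illustrative assumptions, not RegretsReporter's real schema.
    """
    video_id: str                    # the regretted video
    video_title: str
    reached_via: str                 # e.g. "recommendation", "search", "direct link"
    category: str                    # e.g. "misinformation", "hate speech", "spam/scam"
    prior_video_ids: List[str] = field(default_factory=list)  # recent watch history, for context
    country: str = ""                # lets researchers compare English vs. non-English markets

# Example report for a video reached via the recommendations sidebar (IDs made up).
report = RegretReport(
    video_id="abc123",
    video_title="Example regretted video",
    reached_via="recommendation",
    category="misinformation",
    prior_video_ids=["xyz789"],
    country="DE",
)
print(report.reached_via)  # recommendation-driven reports made up 71% of Mozilla's dataset
```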

The crowdsourced volunteers whose data fed Mozilla’s research reported a wide variety of “regrets,” including videos spreading COVID-19 fear-mongering, political misinformation and “wildly inappropriate” children’s cartoons, per the report — with the most frequently reported content categories being misinformation, violent/graphic content, hate speech and spam/scams.

A substantial majority (71%) of the regret reports came from videos that had been recommended by YouTube’s algorithm itself, underscoring the AI’s starring role in pushing junk into people’s eyeballs.

The research also found that recommended videos were 40% more likely to be reported by the volunteers than videos they’d searched for themselves.

Mozilla even found “several” instances when the recommender algorithm put content in front of users that violated YouTube’s own community guidelines and/or was unrelated to the previous video watched. So a clear fail.

A very notable finding was that regrettable content appears to be a greater problem for YouTube users in non-English speaking countries: Mozilla found YouTube regrets were 60% higher in countries without English as a primary language — with Brazil, Germany and France generating what the report said were “particularly high” levels of regretful YouTubing. (And none of the three can be classed as minor international markets.)

Pandemic-related regrets were also especially prevalent in non-English speaking countries, per the report — a worrying detail to read in the middle of an ongoing global health crisis.

The crowdsourced study — which Mozilla bills as the largest-ever into YouTube’s recommender algorithm — drew on data from more than 37,000 YouTube users who installed the extension, although it was a subset of 1,162 volunteers — from 91 countries — who submitted reports that flagged 3,362 regrettable videos that the report draws on directly.

These reports were generated between July 2020 and May 2021.

What exactly does Mozilla mean by a YouTube “regret”? It says this is a crowdsourced concept based on users self-reporting bad experiences on YouTube, so it’s a subjective measure. But Mozilla argues that taking this “people-powered” approach centres the lived experiences of internet users and is therefore helpful in foregrounding the experiences of marginalised and/or vulnerable people and communities (versus, for example, applying only a narrower, legal definition of “harm”).

“We wanted to interrogate and explore further [people’s experiences of falling down the YouTube “rabbit hole”] and frankly confirm some of these stories — but then also just understand further what are some of the trends that emerged in that,” explained Brandi Geurkink, Mozilla’s senior manager of advocacy and the lead researcher for the project, discussing the aims of the research.

“My main feeling in doing this work was being — I guess — shocked that some of what we had expected to be the case was confirmed … It’s still a limited study in terms of the number of people involved and the methodology that we used but — even with that — it was quite simple; the data just showed that some of what we thought was confirmed.

“Things like the algorithm recommending content essentially accidentally, that it later is like ‘oops, this actually violates our policies; we shouldn’t have actively suggested that to people’ … And things like the non-English-speaking user base having worse experiences — these are things you hear discussed a lot anecdotally and activists have raised these issues. But I was just like — oh wow, it’s actually coming out really clearly in our data.”

Mozilla says the crowdsourced research uncovered “numerous examples” of reported content that would likely or actually breach YouTube’s community guidelines — such as hate speech or debunked political and scientific misinformation.

But it also says the reports flagged a lot of what YouTube “may” consider “borderline content.” Aka, stuff that’s harder to categorize — junk/low-quality videos that perhaps toe the acceptability line and may therefore be trickier for the platform’s algorithmic moderation systems to respond to (and thus content that may also evade a takedown for longer).

However, a related issue the report flags is that YouTube doesn’t provide a definition for borderline content, despite discussing the category in its own guidelines. That, says Mozilla, makes it impossible to verify the researchers’ assumption that much of what the volunteers were reporting as “regretful” would likely fall into YouTube’s own “borderline content” category.

The challenge of independently studying the societal effects of Google’s tech and processes is a running theme underlying the research. But Mozilla’s report also accuses the tech giant of meeting YouTube criticism with “inertia and opacity.”

It’s not alone there either. Critics have long accused YouTube’s ad giant parent of profiting off of engagement generated by hateful outrage and harmful disinformation — allowing “AI-generated bubbles of hate” to surface ever-more baleful (and thus stickily engaging) stuff, exposing unsuspecting YouTube users to increasingly unpleasant and extremist views, even as Google gets to shield its low-grade content business under a user-generated-content umbrella.

Indeed, “falling down the YouTube rabbit hole” has become a well-trodden metaphor for discussing the process of unsuspecting internet users being dragged into the darkest and nastiest corners of the web. This user reprogramming takes place in broad daylight, via AI-generated suggestions that yell at people to follow the conspiracy breadcrumb trail right from inside a mainstream web platform.

Back in 2017 — when concern was riding high about online terrorism and the proliferation of ISIS content on social media — politicians in Europe were accusing YouTube’s algorithm of exactly this: Automating radicalization.

However it’s remained difficult to get hard data to back up anecdotal reports of individual YouTube users being “radicalized” after viewing hours of extremist content or conspiracy theory junk on Google’s platform.

Ex-YouTube insider — Guillaume Chaslot — is one notable critic who’s sought to pull back the curtain shielding the proprietary tech from deeper scrutiny, via his algotransparency project.

Mozilla’s crowdsourced research adds to those efforts by sketching a broad — and broadly problematic — picture of the YouTube AI by collating reports of bad experiences from users themselves.

Of course externally sampling platform-level data that only Google holds in full (at its true depth and dimension) can’t be the whole picture — and self-reporting, in particular, may introduce its own set of biases into Mozilla’s data set. But the problem of effectively studying Big Tech’s blackboxes is a key point accompanying the research, as Mozilla advocates for proper oversight of platform power.

In a series of recommendations the report calls for “robust transparency, scrutiny, and giving people control of recommendation algorithms” — arguing that without proper oversight of the platform, YouTube will continue to be harmful by mindlessly exposing people to damaging and braindead content.

The problematic lack of transparency around so much of how YouTube functions can be picked up from other details in the report. For example, Mozilla found that around 9% of recommended regrets (or almost 200 videos) had since been taken down — for a variety of not always clear reasons (sometimes, presumably, after the content was reported and judged by YouTube to have violated its guidelines).

Collectively, just this subset of videos had a total of 160 million views prior to being removed for whatever reason.

In other findings, the research showed that regretted videos tend to perform well on the platform.

A particularly stark metric is that reported regrets acquired a full 70% more views per day than other videos watched by the volunteers on the platform — lending weight to the argument that YouTube’s engagement-optimising algorithms disproportionately select for triggering/misinforming content over quality (thoughtful/informing) stuff simply because it brings in the clicks.
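
To illustrate what that gap means in practice, here is a toy calculation in Python. The 70% figure comes from the report; the baseline daily view count is invented purely for illustration.

```python
# Toy illustration of the "70% more views per day" finding (baseline number invented).
baseline_views_per_day = 10_000                       # hypothetical typical video watched by volunteers
regret_views_per_day = baseline_views_per_day * 1.7   # reported regrets averaged 70% more views per day
print(f"Typical video:   {baseline_views_per_day:,.0f} views/day")
print(f"Reported regret: {regret_views_per_day:,.0f} views/day")
```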

While that might be great for Google’s ad business, it’s clearly a net negative for democratic societies that value truthful information over nonsense; genuine public debate over artificial/amplified binaries; and constructive civic cohesion over divisive tribalism.

But without legally enforced transparency requirements on ad platforms — and, most likely, regulatory oversight and enforcement that features audit powers — these tech giants are going to continue to be incentivized to turn a blind eye and cash in at society’s expense.

Mozilla’s report also underlines instances where YouTube’s algorithms are clearly driven by a logic that’s unrelated to the content itself — with a finding that, in 43.6% of the cases where the researchers had data about the videos a participant had watched before a reported regret, the recommendation was completely unrelated to the previous video.

The report gives examples of some of these logic-defying AI content pivots/leaps/pitfalls — such as a person watching videos about the U.S. military and then being recommended a misogynistic video entitled “Man humiliates feminist in viral video.”

In another instance, a person watched a video about software rights and was then recommended a video about gun rights. So two rights make yet another wrong YouTube recommendation right there.

In a third example, a person watched an Art Garfunkel music video and was then recommended a political video entitled “Trump Debate Moderator EXPOSED as having Deep Democrat Ties, Media Bias Reaches BREAKING Point.”

To which the only sane response is, umm what???

YouTube’s output in such instances seems — at best — some sort of “AI brain fart.”

A generous interpretation might be that the algorithm got stupidly confused. Albeit, in a number of the examples cited in the report, the confusion is leading YouTube users toward content with a right-leaning political bias. Which seems, well, curious.

Asked what she views as the most concerning findings, Mozilla’s Geurkink told TechCrunch: “One is how clearly misinformation emerged as a dominant problem on the platform. I think that’s something, based on our work talking to Mozilla supporters and people from all around the world, that is a really obvious thing that people are concerned about online. So to see that that is what is emerging as the biggest problem with the YouTube algorithm is really concerning to me.”

She also highlighted the problem of the recommendations being worse for non-English-speaking users as another major concern, suggesting that global inequality in users’ experiences of platform impacts “doesn’t get enough attention” — even when such issues do get discussed.

Responding to Mozilla’s report, a Google spokesperson sent us this statement:

The goal of our recommendation system is to connect viewers with content they love and on any given day, more than 200 million videos are recommended on the homepage alone. Over 80 billion pieces of information is used to help inform our systems, including survey responses from viewers on what they want to watch. We constantly work to improve the experience on YouTube and over the past year alone, we’ve launched over 30 different changes to reduce recommendations of harmful content. Thanks to this change, consumption of borderline content that comes from our recommendations is now significantly below 1%.

Google also claimed it welcomes research into YouTube — and suggested it’s exploring options to bring in external researchers to study the platform, without offering anything concrete on that front.

At the same time, its response queried how Mozilla’s study defines “regrettable” content — and went on to claim that its own user surveys generally show users are satisfied with the content that YouTube recommends.

In further nonquotable remarks, Google noted that earlier this year it started disclosing a “violative view rate” (VVR) metric for YouTube — disclosing for the first time the percentage of views on YouTube that comes from content that violates its policies.

The most recent VVR stands at 0.16%-0.18% — which Google says means that out of every 10,000 views on YouTube, 16-18 come from violative content. It said that figure is down by more than 70% when compared to the same quarter of 2017 — crediting its investments in machine learning as largely being responsible for the drop.
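
As a quick sanity check on those figures, a violative view rate of 0.16% to 0.18% does indeed translate into roughly 16 to 18 violative views per 10,000. A minimal Python calculation, using the rates from Google’s disclosure:

```python
# Convert Google's disclosed violative view rate (VVR) into views per 10,000.
for vvr in (0.0016, 0.0018):  # 0.16% and 0.18%, per Google's latest disclosure
    per_10k = vvr * 10_000
    print(f"VVR {vvr:.2%} -> {per_10k:.0f} violative views per 10,000")
# Prints 16 and 18, matching Google's "16-18 out of every 10,000 views" framing.
```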

However, as Geurkink noted, the VVR is of limited use without Google releasing more data to contextualize and quantify how far its AI was involved in accelerating views of content its own rules state shouldn’t be viewed on its platform. Without that key data the suspicion must be that the VVR is a nice bit of misdirection.

“What would be going further than [VVR] — and what would be really, really helpful — is understanding what’s the role that the recommendation algorithm plays in this?” Geurkink told us on that, adding: “That’s what is a complete blackbox still. In the absence of greater transparency [Google’s] claims of progress have to be taken with a grain of salt.”

Google also flagged a 2019 change it made to how YouTube’s recommender algorithm handles “borderline content” — aka, content that doesn’t violate policies but falls into a problematic grey area — saying that that tweak had also resulted in a 70% drop in watch time for this type of content.

Although the company confirmed this borderline category is a moveable feast — saying it factors in changing trends as well as context and also works with experts to determine what gets classed as borderline — which makes the aforementioned percentage drop pretty meaningless, since there’s no fixed baseline to measure against.

It’s notable that Google’s response to Mozilla’s report makes no mention of the poor experience reported by survey participants in non-English-speaking markets. And Geurkink suggested that, in general, many of the claimed mitigating measures YouTube applies are geographically limited — i.e., to English-speaking markets like the U.S. and U.K. (Or at least arrive in those markets first, before a slower rollout to other places.) 

A January 2019 tweak to reduce amplification of conspiracy-theory content in the U.S. was only expanded to the U.K. market months later — in August — for example.

“YouTube, for the past few years, have only been reporting on their progress of recommendations of harmful or borderline content in the U.S. and in English-speaking markets,” she also said. “And there are very few people questioning that — what about the rest of the world? To me that is something that really deserves more attention and more scrutiny.”

We asked Google to confirm whether it had since applied the 2019 conspiracy-theory-related changes globally — and a spokeswoman told us that it had. But the much higher rate of reports made to Mozilla of (admittedly broader) “regrettable” content in non-English-speaking markets remains notable.

And while there could be other factors at play, which might explain some of the disproportionately higher reporting, the finding may also suggest that, where YouTube’s negative impacts are concerned, Google directs the greatest resources at markets and languages where its reputational risk and the capacity of its machine-learning tech to automate content categorization are strongest.

Yet any such unequal response to AI risk obviously means leaving some users at greater risk of harm than others — adding another harmful dimension and layer of unfairness to what is already a multifaceted, hydra-headed problem.

It’s yet another reason why leaving it up to powerful platforms to rate their own AIs, mark their own homework and counter genuine concerns with self-serving PR is for the birds.

(In additional filler background remarks it sent us, Google described itself as the first company in the industry to incorporate “authoritativeness” into its search and discovery algorithms — without explaining when exactly it claims to have done that or how it imagined it would be able to deliver on its stated mission of “organizing the world’s information and making it universally accessible and useful” without considering the relative value of information sources. So color us baffled at that claim. Most likely it’s a clumsy attempt to throw disinformation shade at rivals.)

Returning to the regulation point, an EU proposal — the Digital Services Act — is set to introduce some transparency requirements on large digital platforms, as part of a wider package of accountability measures. And asked about this Geurkink described the DSA as “a promising avenue for greater transparency.”

But she suggested the legislation needs to go further to tackle recommender systems like the YouTube AI.

“I think that transparency around recommender systems specifically and also people having control over the input of their own data and then the output of recommendations is really important — and is a place where the DSA is currently a bit sparse, so I think that’s where we really need to dig in,” she told us.

One idea she voiced support for is having a “data access framework” baked into the law — to enable vetted researchers to get more of the information they need to study powerful AI technologies — i.e., rather than the law trying to come up with “a laundry list of all of the different pieces of transparency and information that should be applicable,” as she put it.

The EU also now has a draft AI regulation on the table. The legislative plan takes a risk-based approach to regulating certain applications of artificial intelligence. However it’s not clear whether YouTube’s recommender system would fall under one of the more closely regulated categories — or, as seems more likely (at least with the initial Commission proposal), fall entirely outside the scope of the planned law.

“An earlier draft of the proposal talked about systems that manipulate human behavior, which is essentially what recommender systems are. And one could also argue that’s the goal of advertising at large, in some sense. So it was sort of difficult to understand exactly where recommender systems would fall into that,” noted Geurkink.

“There might be a nice harmony between some of the robust data access provisions in the DSA and the new AI regulation,” she added. “I think transparency is what it comes down to, so anything that can provide that kind of greater transparency is a good thing.

“YouTube could also just provide a lot of this … We’ve been working on this for years now and we haven’t seen them take any meaningful action on this front but it’s also, I think, something that we want to keep in mind — legislation can obviously take years. So even if a few of our recommendations were taken up [by Google] that would be a really big step in the right direction.”
