COVID Vaccine Hesitancy on YouTube

May 10, 2021

By Sam Clark, Mishaela Robison & Mark Ledwich

While more than half of the US adult population has received at least one dose of the COVID-19 vaccine, as many as 1 in 4 American adults do not intend to do so, with another 5% “uncertain.” This vaccine skepticism risks continued community transmission and fatalities, as well as endangering a return to normalcy.

Online content creators can be particularly impactful in shaping public opinion around vaccinations. Still, popularity and viewership do not equate to infallibility: these users can spread harmful narratives as easily as positive ones. While most mainstream social media platforms have implemented policies to combat false information surrounding COVID-19, none have gone as far as removing personal expressions of vaccine hesitancy. However, this borderline content is likely increasing vaccine skepticism.

Prior research has focused on skepticism regarding COVID-19 and mask wearing; however, the widespread production of vaccines means that the spotlight must now shift to vaccine hesitancy. In this report we aim to identify cases where individuals on YouTube express vaccine hesitancy. Despite being harmful, the majority of these cases would not be appropriate to remove from the platform given YouTube's current guidelines. Nevertheless, we believe it’s important to measure the scale of this content as the narrative progresses. We believe this data will be beneficial to those working on campaigns targeting vaccine hesitancy, as well as to ongoing vaccine mindset research.


Report highlights

  • We identified 3,634 videos uploaded between Jan 1, 2020 and May 1, 2021 (accounting for 72M views) in which individuals express vaccine hesitancy and COVID is discussed.
  • An interactive chart is available to explore these videos along a number of dimensions and learn more about the individuals who are expressing vaccine hesitancy.
  • We provide a breakdown of the top 50 most subscribed YouTube creators that have expressed vaccine hesitancy.


Methodology

For this analysis, we operationalized “vaccine hesitancy” on video-based formats through three different conditions:

  1. Individuals saying that they (or their children) have refused vaccines in the past
  2. Individuals saying that they (or their children) will not be vaccinated in the future
  3. Individuals suggesting that others refuse vaccinations


Videos conveying vaccine hesitancy can be found in a wide variety of YouTube categories. Since Pendulum is still in the process of ramping up category coverage, we limit this analysis to the following channels:

  • 7,300 political and cultural channels from Transparency Tube, all with 10K+ subscribers
  • 10,000 English language channels with at least 1M subscribers
  • Other channels discovered in Reddit and Parler posts containing COVID keywords


From these channels we have caption data for 14M videos, which we split into 205M “snippets” (short sections of captions that are ~100 tokens long). To limit the amount of caption data that needs to be processed, we restrict the analysis to the following set of videos and snippets:

  • Videos uploaded between Jan 1, 2020 and May 1, 2021 that include at least one reference to COVID in their captions.
  • Caption snippets that contain the pronoun “I” and some form of the word “vaccine”.
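
The splitting and filtering steps above can be sketched in a few lines of Python. This is a minimal illustration, not Pendulum's actual pipeline: the keyword patterns, the whitespace tokenization, and all function names are assumptions made for the example, since the report does not list its exact terms or tokenizer.

```python
import re
from datetime import date
from typing import List

# Hypothetical keyword patterns; the report does not list its exact terms.
COVID_RE = re.compile(r"\b(covid|coronavirus)\b", re.IGNORECASE)
# "Some form of the word vaccine": vaccine, vaccinated, unvaccinated, ...
VACCINE_RE = re.compile(r"vaccin", re.IGNORECASE)
# Standalone pronoun "I", including contractions such as "I'm" or "I've".
PRONOUN_I_RE = re.compile(r"\bI\b")

# Upload window taken from the report.
START, END = date(2020, 1, 1), date(2021, 5, 1)


def split_into_snippets(caption: str, size: int = 100) -> List[str]:
    """Split a caption into consecutive snippets of up to `size` whitespace tokens."""
    tokens = caption.split()
    return [" ".join(tokens[i:i + size]) for i in range(0, len(tokens), size)]


def video_in_scope(upload_date: date, caption: str) -> bool:
    """Keep videos uploaded inside the window whose captions mention COVID."""
    return START <= upload_date <= END and COVID_RE.search(caption) is not None


def snippet_is_candidate(snippet: str) -> bool:
    """Keep snippets containing the pronoun "I" and some form of "vaccine"."""
    return bool(PRONOUN_I_RE.search(snippet)) and bool(VACCINE_RE.search(snippet))
```

Note that the pronoun check here is case-sensitive; captions that lowercase the pronoun would need a different rule.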


These filters result in a set of 167K videos and 729K caption snippets, a dataset that is far too large to review manually. From here it might be tempting to add keywords and hand-crafted patterns to narrow down the number of caption snippets that need further consideration, or perhaps even to use patterns alone to identify vaccine hesitancy cases. However, this is unlikely to work for the following reasons:

  1. There are many ways a person can say they’re not getting vaccinated. This means manually crafted patterns will likely have low “recall” (coverage).
  2. Many conversations use hypotheticals or include hard-to-handle conditions. This means manually crafted patterns will have low “precision” (accuracy). For example:
  • “you know my question is this for those who are teachers who are like hey I'm not getting the vaccine remember the big deal the teachers union made about starting school backup”
  • “as possible of a young the younger generation so I'm then likely not to get the vaccine even in the uk and us until june july august”
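
The precision and recall problems described above can be illustrated with a small sketch. The regular expression below is a hypothetical hand-crafted pattern, written only for this example and not part of Pendulum's method. It fires on the first example above, where the speaker is voicing what teachers might say, and it misses a genuine refusal quoted later in this report (apostrophes normalized to straight quotes).

```python
import re

# Hypothetical hand-crafted refusal pattern, for illustration only.
REFUSAL_RE = re.compile(
    r"\bI(?:'m| am) not (?:getting|taking) the vaccine", re.IGNORECASE
)


def pattern_flags(snippet: str) -> bool:
    """Return True when the hand-crafted pattern matches the snippet."""
    return REFUSAL_RE.search(snippet) is not None


# Hypothetical speech attributed to teachers: the pattern matches anyway,
# which is a false positive (hurts precision).
hypothetical = (
    "you know my question is this for those who are teachers who are like "
    "hey I'm not getting the vaccine remember the big deal the teachers "
    "union made about starting school backup"
)

# A genuine refusal phrased differently: the pattern does not match,
# which is a false negative (hurts recall).
refusal = (
    "I've decided I don't need it because I've already had COVID, "
    "my family has had COVID, I'm not going to get it"
)

print(pattern_flags(hypothetical))
print(pattern_flags(refusal))
```

Adding more patterns to catch the second case tends to create more matches like the first, which is the trade-off that motivates a learned classifier.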

Pendulum has developed a machine learning (ML) method that works significantly better than hand-crafted patterns and requires only a small amount of labeled data and user input. Applying this ML method allows us to identify vaccine hesitancy cases for:

  • 4,237 caption snippets
  • 3,634 videos
  • 1,528 channels

We’ve also created an interactive chart at the bottom of this report to explore these cases and filter them along a variety of dimensions.

Findings

Who is expressing vaccine hesitancy?

In order to measure the accuracy of our method and better understand the cases in which individuals were expressing vaccine hesitancy, we manually labeled 165 random examples from the final dataset. Of these, 45 came from videos that are no longer available (the creator made the video private, or YouTube removed the video or channel). We labeled these cases based on transcripts (all others were labeled by reviewing the video). We found that our model was correct for 76% of these cases.
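
Because the 76% figure comes from a sample of 165 labeled examples, it carries some sampling uncertainty. The sketch below computes the precision and a 95% Wilson score interval. The count of 125 correct examples is an assumption (76% of 165 rounds to about 125; the exact count is not reported), and the interval is an addition for illustration rather than part of the original analysis.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half


N_LABELED = 165  # manually labeled examples, from the report
N_CORRECT = 125  # ASSUMPTION: approximately 76% of 165; exact count not reported

precision = N_CORRECT / N_LABELED
low, high = wilson_interval(N_CORRECT, N_LABELED)
print(f"precision = {precision:.2f}, 95% interval = ({low:.2f}, {high:.2f})")
```

Under these assumed counts, the interval extends several percentage points on either side of 76%, which is worth keeping in mind when reading the totals above.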


For the videos that were still available at the time of review, we also labeled who is expressing vaccine hesitancy in the content and found:

  • 57% are creators or hosts
  • 34% are people being interviewed
  • 9% are recordings or readings of others’ comments


The fifty most popular channels

We also reviewed vaccine hesitancy snippets from the most subscribed channels and identified the top 50 in which the channel creator (or a host of the channel) themselves was expressing vaccine hesitancy (as opposed to someone they were interviewing or a recording). We believe the impact of these top 50 channels is significant. In addition to the videos in which their creators (or hosts) expressed personal vaccine hesitancy, they also posted a combined 1,999 videos mentioning COVID and vaccines in general, yielding 163,717,911 total views between them. It’s likely many of these contain additional harmful narratives.

We use political tags from the Transparency Tube dataset and find that 18 of the top 50 vaccine hesitant creators are “Partisan Right” while only 2 are “Partisan Left”. In addition, half of these top channels are political, even though political channels make up only a small portion of all YouTube channels.

Vaccine hesitancy themes

Over the course of our analysis, we noticed several key themes across justifications for vaccine hesitancy:

Developed too quickly: Concern derived from the fact that the COVID vaccine was developed and approved at a faster rate than other vaccines.

“I’m not taking this one. Something that is created this fast? Having the virus in it? I'm happy with hydroxychloroquine and zinc.”

Immune system strength: Minimizing the importance of the vaccine by claiming their immune system is already strong enough, such as through prior exposure or rarely getting sick.

“I feel like I have been in a lot of situations where I feel like I should have, like, COVID for a minute now… I’ve been in some serious situations but… I rarely get sick”

“I think that our immune systems are pretty freaking amazing, and we take supplements to help to boost our immune systems even further”

“I’ve decided I don’t need it because I’ve already had COVID, my family has had COVID, I’m not going to get it”

General opposition to vaccines: Overall opposition to vaccines, not specific to COVID

“I, as a mom myself, have chosen to not vaccinate my children”

Genetic fear mongering: Claims that the vaccine impacts DNA or “who you are.”

“That's why I will not take the vaccine that they're offering now - It's an rna vaccine that changes your DNA”

Downplaying the severity: Rhetoric that minimizes the severity of COVID

“I’m not getting the vaccine for something that’s like the flu”


Vaccine hesitancy visualization

We provide an interactive chart below to explore channels expressing vaccine hesitancy between Jan 1, 2020 and May 1, 2021. This chart can be filtered by different date ranges and channel classifications, as well as by specific snippets with vaccine hesitancy.


NOTE - These snippets have not been manually reviewed. Our model has a precision of 0.76, meaning roughly 1 in 4 examples will be incorrect. There is also a large variance in how vaccine hesitancy content is contextualized. For example, nearly all news outlets that share interviews with vaccine-hesitant individuals counter them with expert viewpoints. This has a much different impact than a creator expressing vaccine hesitancy to their viewers.

Appendix

We cross-referenced these channels with categorization tags from Transparency.tube, which classifies YouTube channels based on the sociopolitical ideology of their content. The 50 most popular accounts comprised nine different tags: Non-political, Partisan Right, Anti-SJW, Religious Conservative, Conspiracy, Partisan Left, Social Justice, Mainstream News, and QAnon. Overall, half of the top 50 accounts were non-political, while over one-third were Partisan Right.


Other notable findings: 

  • Political channels discuss the vaccine more frequently than non-political channels: of the 10 channels that posted the most videos discussing the COVID vaccine, 6 are affiliated with a political party (Partisan Right or Libertarian) while 3 are non-political.
  • Similarly, the ten most viewed channels from the dataset were also majority partisan: 7 were partisan while only 2 were non-political.
  • There was an 83% overlap between videos tagged as “Conspiracy” and videos tagged as “Partisan Right.”

 * Numbers total more than 50 as a video may have multiple tags. No tags were repeated for the same channel.