While more than half of the US adult population has received at least one dose of the COVID-19 vaccine, as many as 1 in 4 American adults do not intend to do so, with another 5% “uncertain.” This vaccine skepticism carries a potential consequence of continued community transmission and fatalities, as well as endangering a return to normalcy.
Online content creators can be particularly impactful in shaping public opinion around vaccinations. Still, popularity and viewership do not equate to infallibility: these users can spread harmful narratives as easily as positive ones. While most mainstream social media platforms have implemented policies to combat false information surrounding COVID-19, none have gone as far as removing personal expressions of vaccine hesitancy. However, this borderline content is likely increasing vaccine skepticism.
Prior research has focused on skepticism regarding COVID-19 and mask wearing; however, the widespread production of vaccines means that the spotlight must now shift to vaccine hesitancy. In this report we aim to identify cases where individuals on YouTube express vaccine hesitancy. Despite being harmful, the majority of these cases would not be appropriate to remove from the platform given its current guidelines. Nevertheless, we believe it’s important to measure the scale of this content as the narrative progresses. We believe this data will be beneficial to those working on campaigns targeting vaccine hesitancy, as well as ongoing vaccine mindset research.
For this analysis, we operationalized “vaccine hesitancy” on video-based formats through three different conditions:
Videos conveying vaccine hesitancy can be found in a wide variety of YouTube categories. Since Pendulum is still in the process of ramping up category coverage, we limit this analysis to the following channels:
From these channels we have caption data for 14M videos, which we split into 205M “snippets” (short sections of captions that are ~100 tokens long). In order to limit the amount of caption data that needs to be processed, we limit ourselves to the following set of videos and snippets:
These filters result in a set of 167K videos and 729K caption snippets, a dataset that is far too large to manually review. From here it might be tempting to add additional keywords and hand-crafted patterns to narrow down the number of caption snippets that need to be considered further, or perhaps even to rely on patterns alone to identify vaccine hesitancy cases. However, this is unlikely to work for the following reasons:
Pendulum has developed a machine learning (ML) method that works significantly better than using hand-crafted patterns and only requires a small amount of labeled data and user input. Applying this ML method allows us to identify vaccine hesitancy cases for:
We’ve also created an interactive chart at the bottom of this report to explore these cases and filter them along a variety of dimensions.
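To make the snippet step described above more concrete, here is a minimal sketch of how caption text could be split into consecutive snippets of roughly 100 tokens. The function name `split_into_snippets` and the whitespace tokenizer are assumptions made for illustration; this report does not specify which tokenizer or chunking code was actually used.

```python
from typing import List


def split_into_snippets(caption_text: str, snippet_len: int = 100) -> List[str]:
    """Split caption text into consecutive snippets of about `snippet_len` tokens.

    Assumption: tokens are approximated by whitespace-separated words.
    The report does not say which tokenizer was actually used.
    """
    tokens = caption_text.split()
    return [
        " ".join(tokens[i:i + snippet_len])
        for i in range(0, len(tokens), snippet_len)
    ]


# Usage example with a synthetic 250-word caption (not real data).
caption = " ".join(f"word{i}" for i in range(250))
snippets = split_into_snippets(caption)
print([len(s.split()) for s in snippets])  # [100, 100, 50]
```

The last snippet of a video is simply shorter than the rest in this sketch; a real pipeline might instead overlap snippets or merge a short tail into the previous one.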
Who is expressing vaccine hesitancy?
In order to measure the accuracy of our method and better understand the cases in which individuals were expressing vaccine hesitancy, we manually labeled 165 random examples from the final dataset. Of these, 45 came from videos that are no longer available (either the creator made the video private, or YouTube removed the video or channel). We labeled these cases based on transcripts (all others were labeled by reviewing the video). We found that our model was correct for 76% of these cases.
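Because this accuracy figure comes from a sample of 165 examples, it carries some sampling uncertainty. The sketch below estimates a 95% Wilson score interval around it. It assumes the 76% applies to all 165 labeled examples, and it infers the count of correct examples (about 125) from the rounded percentage, since the exact count is not stated in this report. The Wilson interval is our choice for illustration, not part of the original analysis.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    p_hat = successes / n
    denom = 1 + z ** 2 / n
    center = (p_hat + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half_width, center + half_width


n_labeled = 165
# Assumption: count inferred from the rounded 76% figure, not reported directly.
n_correct = round(0.76 * n_labeled)

low, high = wilson_interval(n_correct, n_labeled)
print(f"precision about {n_correct / n_labeled:.2f}, 95% interval about ({low:.2f}, {high:.2f})")
```

Under these assumptions the interval spans several percentage points on either side of 0.76, so the figure is best read as an estimate rather than an exact rate.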
For the videos that were still available at the time of review, we also labeled who is expressing vaccine hesitancy in the content and found:
We also reviewed vaccine hesitancy snippets from the most subscribed channels and identified the top 50 in which the channel creator (or a host of the channel) themselves was expressing vaccine hesitancy (as opposed to someone they were interviewing or a recording). We believe the impact of these top 50 channels is significant. In addition to the videos in which their creators (or hosts) expressed personal vaccine hesitancy, they also posted a combined 1,999 videos mentioning COVID and vaccines in general, yielding 163,717,911 total views between them. It’s likely many of these contain additional harmful narratives.
We use political tags from the Transparency Tube dataset and find that 18 of the top 50 vaccine hesitant creators are “Partisan Right” while only 2 are “Partisan Left”. In addition, half of these top channels are political, even though political channels make up only a small portion of all YouTube channels.
Over the course of our analysis, we noticed several key themes across justifications for vaccine hesitancy:
Developed too quickly: Concern derived from the fact that the COVID vaccine was developed and approved at a faster rate than other vaccines.
“I’m not taking this one. Something that is created this fast? Having the virus in it? I'm happy with hydroxychloroquine and zinc.”
Immune system strength: Minimizing the importance of the vaccine by claiming their immune system is already strong enough, such as through prior exposure or rarely getting sick.
“I feel like I have been in a lot of situations where I feel like I should have, like, COVID for a minute now… I’ve been in some serious situations but… I rarely get sick”
“I think that our immune systems are pretty freaking amazing, and we take supplements to help to boost our immune systems even further”
“I’ve decided I don’t need it because I’ve already had COVID, my family has had COVID, I’m not going to get it”
General opposition to vaccines: Overall opposition to vaccines, not specific to COVID.
“I, as a mom myself, have chosen to not vaccinate my children”
Genetic fear mongering: Claims that the vaccine impacts DNA or “who you are.”
“That's why I will not take the vaccine that they're offering now - It's an rna vaccine that changes your DNA”
Downplaying the severity: Rhetoric that minimizes the severity of COVID.
“I’m not getting the vaccine for something that’s like the flu”
We provide an interactive chart below to explore channels expressing vaccine hesitancy between Jan 1, 2020 and May 1, 2021. This chart can be filtered by different date ranges and channel classifications, as well as by specific snippets with vaccine hesitancy.
NOTE - These snippets have not been manually reviewed. Our model has a precision of 0.76, meaning roughly 1 in 4 examples will be incorrect. There is also a large variance in how vaccine hesitancy content is contextualized. For example, nearly all news outlets that share interviews in which people express vaccine hesitancy counter them with expert viewpoints. This has a much different impact than a creator expressing vaccine hesitancy to their viewers.
We cross-referenced these channels with categorization tags from Transparency.tube, which classifies YouTube channels based on the sociopolitical ideology of their content. The 50 most popular accounts comprised nine different tags: Non-political, Partisan Right, Anti-SJW, Religious Conservative, Conspiracy, Partisan Left, Social Justice, Mainstream News, and QAnon. Overall, half of the top 50 accounts were non-political, while over one-third were Partisan Right.
Other notable findings:
* Numbers total more than 50 as a channel may have multiple tags. No tags were repeated for the same channel.
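As a small illustration of why per-tag counts can add up to more than the number of channels, the sketch below tallies tags across a few channels. The channel names and tag assignments are hypothetical toy data, not taken from the Transparency.tube dataset; only the tag labels themselves appear in this report.

```python
from collections import Counter

# Hypothetical toy data: each channel maps to a set of tags, so a tag
# cannot be repeated for the same channel.
channel_tags = {
    "channel_a": {"Partisan Right", "Conspiracy"},
    "channel_b": {"Non-political"},
    "channel_c": {"Partisan Right", "Religious Conservative"},
}

# Count how many channels carry each tag.
tag_counts = Counter(tag for tags in channel_tags.values() for tag in tags)
total_tag_assignments = sum(tag_counts.values())

print(len(channel_tags), total_tag_assignments)  # 3 5
```

Here three channels produce five tag assignments, which is the same effect that makes the per-tag numbers for the top 50 channels sum to more than 50.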