Comparing Covariation among Vaccine Hesitancy and Broader Beliefs within Twitter and Survey Data

Larner College of Medicine at the University of Vermont (Nowak); RAND Corporation (Nowak, Parker, Gidengil, Matthews); Pardee RAND Graduate School (Chen); Boston Children's Hospital (Gidengil)
"Twitter data could be a valid proxy measure for changes in the co-occurrence of beliefs among the population in response to interventions such as messaging campaigns by public health agencies."
Social media platforms can be conduits of both information and misinformation about vaccines and vaccine-hesitant beliefs. Social media also provides a way for researchers to observe the beliefs of a greater number of individuals than is possible through more traditional research methods, such as surveys and focus groups. However, users of any given social media platform are not representative of the population as a whole, views are volunteered rather than elicited, and the generalisability of findings from social media research is a subject of ongoing debate. In that context, this study examined whether it is possible to draw similar conclusions from Twitter and national (United States, or US) survey data about the relationship between vaccine hesitancy and a broader set of beliefs. The study found that belief distributions on Twitter generally were concordant with those elicited in a survey after bots were removed and agreement/disagreement with the topics tweeted was inferred based on weblinks.
In 2018, the researchers conducted a nationally representative online survey of 615 parents in the US through the RAND American Life Panel (ALP), asking about knowledge and beliefs on 35 items: 10 health conspiracies, 8 political conspiracies, 9 putative vaccine side effects, 2 gestalt vaccine endorsements, and 6 items about vaccine biology/epidemiology. The researchers then developed a set of keyword-based queries corresponding to each of the belief items from the survey and pulled matching tweets from 2017, removing bots.
Furthermore, they developed a procedure to infer whether a tweet endorsed or rejected the idea it mentioned, using the website domains linked in the tweet. Because the data set had over 500,000 tweets, manually coding each tweet for agreement/disagreement was not feasible. The researchers showed, however, that the agree/disagree stances of the linked websites reliably predicted the stances that multiple qualitative coders independently assigned to the tweets themselves. This showed that website stance could be used to infer the tweeter's stance on the topic they tweeted about.
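The idea of inferring a tweet's stance from the domains it links to can be illustrated with a minimal sketch. This is not the researchers' code: the lookup table, the placeholder domains, the +1/-1 coding, and the function names are all assumptions made for illustration, and tweets whose links point to conflicting or unknown domains are simply left uncoded here.

```python
from urllib.parse import urlparse

# Hypothetical lookup table (an assumption, not from the study):
# domain -> stance toward a belief item, +1 = agrees, -1 = disagrees.
DOMAIN_STANCE = {
    "example-agree.org": 1,
    "example-disagree.org": -1,
}


def domain_of(url):
    """Return the lowercased host of a URL without a leading 'www.'."""
    host = urlparse(url).hostname or ""
    if host.startswith("www."):
        host = host[4:]
    return host


def infer_stance(urls):
    """Infer a tweet's stance from the URLs it contains.

    Returns +1 or -1 when all recognised domains share one stance,
    and None when no domain is recognised or the stances conflict.
    """
    stances = {
        DOMAIN_STANCE[d] for d in map(domain_of, urls) if d in DOMAIN_STANCE
    }
    if len(stances) == 1:
        return stances.pop()
    return None
```

In a setup like this, the domain table would have to be validated against hand-coded tweets before being trusted, which mirrors the check against qualitative coders described above.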
The researchers then resampled the Twitter data based on the number of topics an individual tweeted about. The motivation behind this step was that one key difference between Twitter and survey data is that tweets are the result of an individual taking the initiative to post, respond, retweet, or share an article on a particular topic. In contrast, survey beliefs are elicited using the same set of questions for all respondents, which results in many more stances of agreement or disagreement than does having people volunteer information spontaneously (as occurs on Twitter).
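One way to picture resampling by number of topics is sketched below. The summary does not specify the weighting scheme the researchers used, so the choice here (drawing users with replacement, with probability proportional to the number of distinct topics each tweeted about) is purely an assumption, as are the function names and the record layout.

```python
import random
from collections import defaultdict


def topics_per_user(records):
    """Map each user to the set of topics they tweeted about.

    records: iterable of (user_id, topic) pairs, one per tweet.
    """
    topics = defaultdict(set)
    for user_id, topic in records:
        topics[user_id].add(topic)
    return dict(topics)


def resample_users(records, n, seed=0):
    """Draw n users with replacement, weighted by distinct topic count.

    Assumption: users who volunteered stances on more topics look more
    like survey respondents (who answer every item), so they are
    sampled more often. The study's actual scheme may differ.
    """
    topics = topics_per_user(records)
    users = sorted(topics)
    weights = [len(topics[u]) for u in users]
    rng = random.Random(seed)  # seeded for reproducibility
    return rng.choices(users, weights=weights, k=n)
```

Weighting toward multi-topic users is one simple way to make volunteered data resemble elicited data; other schemes, such as stratifying by topic count, would serve the same purpose.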
Once stance was inferred from weblinks in tweets, there was good qualitative and quantitative agreement between the survey and Twitter data. This correspondence improved further after the researchers resampled the Twitter data based on the number of topics an individual tweeted about, as a means of correcting for differential representation of elicited (survey) vs. volunteered (Twitter) beliefs. Overall, the results suggest that analyses using Twitter data may be generalisable in certain contexts, such as assessing belief covariation. The results also indicate that the levels of health misinformation on social media that have concerned public health officials are not just a manifestation of bots or state-sponsored trolls, as the misinformation distributions observed on social media generally correspond to attitudes in the American population.
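To make "belief covariation" concrete, the sketch below correlates every pair of belief items within each data source and then correlates the two resulting sets of pairwise values. The summary does not state which statistic the researchers used, so Pearson correlation, the +1/-1 stance coding, and all function names are assumptions. The sketch also assumes every individual has a stance on every item, which real Twitter data would not satisfy, and that no stance vector is constant.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def pairwise_covariation(stances):
    """stances: {item: [stance per individual]} -> {(item_a, item_b): r}."""
    return {
        (a, b): pearson(stances[a], stances[b])
        for a, b in combinations(sorted(stances), 2)
    }


def concordance(survey, twitter):
    """Correlate the item-pair correlations found in the two sources."""
    s = pairwise_covariation(survey)
    t = pairwise_covariation(twitter)
    shared = sorted(set(s) & set(t))
    return pearson([s[p] for p in shared], [t[p] for p in shared])
```

A concordance near 1 would mean that pairs of beliefs that travel together in the survey also travel together on Twitter, which is the kind of agreement the paragraph above describes.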
PLoS ONE 15(10):e0239826. https://doi.org/10.1371/journal.pone.0239826; and email from Luke J. Matthews to The Communication Initiative on October 26, 2020.