Categorizing Vaccine Confidence With a Transformer-Based Machine Learning Model: Analysis of Nuances of Vaccine Sentiment in Twitter Discourse

FISABIO-Public Health (Kummervold); University of Oxford (Martin); University College London (Martin); London School of Hygiene & Tropical Medicine (Martin, Dada, Kilich, Paterson, Larson); University College Dublin (Dada); Vrije Universiteit Amsterdam (Denny); NIHR Health Protection Research Unit (Paterson, Larson); University of Washington (Larson); The Royal Institute of International Affairs (Larson)
"Being able to categorize and understand the overall stance in social media conversations about vaccinations, especially in terms of identifying clusters of discouraging or ambiguous conversations, will make it easier to spot activities that may signal vaccine hesitancy or a decline in vaccine confidence with greater speed and more accuracy."
Because addressing misunderstandings and inaccuracies as early as possible is vital to making sound vaccine policies, multiple studies have been conducted to understand how to monitor vaccination discussions on social media. However, questions remain about how to effectively categorise the nuances of vaccine sentiment in the large volume of vaccine-related posts shared daily on social media. (Sentiment analysis refers to the process of automatically determining whether the author of a piece of text is in favour of, against, or neutral toward the subject of the statement.) Manual annotation and analysis are difficult and time consuming, and automated processes have faced challenges in extracting complex stances, such as attitudes toward vaccination, from large amounts of text. The aim of this study is to test whether transformer-based machine learning could be used as a tool to assess the stance expressed in social media posts toward vaccination during pregnancy.
A total of 16,604 tweets posted between November 1, 2018, and April 30, 2019, were selected using keyword searches related to maternal vaccination. This data set was collected and coded to complement a larger research study on sentiments and experiences around maternal vaccination across 15 countries (Australia, Brazil, Canada, France, Germany, India, Italy, Mexico, Panama, South Africa, South Korea, Spain, Taiwan, the United Kingdom, and the United States). Individual tweets were manually coded into one of four stance categories toward maternal vaccines: Promotional (in favour of maternal vaccines), Ambiguous (uncertainty with mixed sentiment toward maternal vaccines), Discouraging (against maternal vaccines), and No stance (statements or facts about maternal vaccines that do not reveal the author's stance).
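The four stance categories map naturally onto a multi-class classification target. A minimal sketch of such an encoding (the label names come from the study; the integer ids and helper function are illustrative assumptions, not the study's actual code):

```python
# Illustrative encoding of the four stance categories as class ids.
# The integer assignment is an assumption for demonstration only.
STANCE_LABELS = ["Promotional", "Ambiguous", "Discouraging", "No stance"]
LABEL_TO_ID = {label: i for i, label in enumerate(STANCE_LABELS)}

def encode(label: str) -> int:
    """Map a human-readable stance label to its class id."""
    return LABEL_TO_ID[label]

def decode(class_id: int) -> str:
    """Map a class id back to its stance label."""
    return STANCE_LABELS[class_id]
```

A classifier head trained on such ids would output one of four classes per tweet, which is how a single model can cover promotional, ambiguous, discouraging, and no-stance content at once.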
After creating a final data set of 2,722 unique tweets, multiple machine learning techniques were trained on a part of this data set. The main model was based on the May 2019 Whole Word Masking variant of Google's Bidirectional Encoder Representations from Transformers (BERT). BERT is a bidirectional, contextual encoder built on a network architecture called Transformer. The BERT architecture depends on carrying out unsupervised pretraining using techniques called Masked Language Modeling and Next Sentence Prediction. The researchers trained the model on a domain-specific corpus to expose the model to the vocabulary that is typically used in vaccination posts.
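The Masked Language Modeling objective mentioned above can be illustrated with the standard BERT corruption recipe: a fraction of tokens is selected, and of those, 80% are replaced by a [MASK] token, 10% by a random token, and 10% left unchanged; the model must then predict the original token at every selected position. The sketch below is a toy illustration of that corruption step under those assumptions, not the study's training code:

```python
import random

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """BERT-style Masked Language Modeling corruption (illustrative).
    Selects each token with probability mask_prob; of selected tokens,
    80% become [MASK], 10% become a random token, 10% stay unchanged.
    Returns the corrupted sequence and the positions to predict."""
    rng = random.Random(seed)
    corrupted = list(tokens)
    targets = {}  # position -> original token the model must predict
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            targets[i] = tok
            roll = rng.random()
            if roll < 0.8:
                corrupted[i] = "[MASK]"          # 80%: mask out
            elif roll < 0.9:
                corrupted[i] = rng.choice(tokens)  # 10%: random token
            # else: 10% kept as-is, but still a prediction target
    return corrupted, targets

tokens = "vaccination during pregnancy protects both mother and baby".split()
corrupted, targets = mask_tokens(tokens, mask_prob=0.3)
```

Continuing this unsupervised pretraining on a vaccination-specific corpus, as the researchers did, exposes the model to domain vocabulary before the supervised stance-classification fine-tuning begins.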
Three annotators (fluent English speakers with a postgraduate degree and several years of work experience in the field) individually coded the 2,722 tweets, of which 1,559 (57.27%) were coded identically. After meeting and discussing the tweets they disagreed on, the annotators agreed on a final annotation for all the remaining tweets. Although the annotators agreed on a final category for every tweet, they also reported that 6.83% (186/2,722) of tweets "could be open to interpretation". Comparing the final agreed annotations after the discussions with the annotators' initial annotations, the accuracy rates of the individual annotators were 83.3%, 77.9%, and 77.5%.
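The raw agreement figure reported above is simple percentage agreement, which can be reproduced directly from the counts in the text (the function name is illustrative):

```python
def percent_agreement(identical: int, total: int) -> float:
    """Share of items that all annotators coded identically, as a percentage."""
    return 100 * identical / total

# Counts reported in the study: 1,559 of 2,722 tweets coded identically
# by all three annotators before discussion.
agreement = percent_agreement(1559, 2722)
print(round(agreement, 2))  # → 57.27
```

Percentage agreement does not correct for chance; chance-corrected statistics such as Cohen's or Fleiss' kappa are commonly used alongside it, though the figure quoted here is the raw percentage.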
The accuracy of the machine learning model was also calculated against the final agreed annotations and was found to be 81.8%. This accuracy is better than the average of the 3 annotators, even after the annotators had carried out multiple annotation rounds on the same tweets and had been given the opportunity to recode any inconsistencies. The researchers say: "it is doubtful that any individual coder would achieve more than 90% accuracy on this task simply because it is difficult, even with a much larger number of annotators, to agree on an absolute categorization. There will always be tweets that are open to interpretation, preventing a hard target of absolute accuracy."
In conclusion, this study demonstrates that machine learning models are able to achieve close to the same accuracy in categorising tweets as could be expected from a single human coder. "The potential to use this automated process, which is reliable and accurate, could free valuable time and resources for conducting this analysis, in addition to informing potentially effective and necessary interventions."
JMIR Medical Informatics 2021 (Oct 08); 9(10):e29584. Image credit: Pixabay via JMIR