Automatically Appraising the Credibility of Vaccine-Related Web Pages Shared on Social Media: A Twitter Surveillance Study

Fri, 02/21/2020 - 08:59


Affiliation

Macquarie University (Shah, Surian, Dyda, Coiera, Dunn); Hamad Bin Khalifa University (Shah); Harvard Medical School (Mandl); Boston Children's Hospital (Mandl, Dunn)

Date

Mon, 11/04/2019 - 12:00

Summary

"The ability to measure how people engage and share misinformation on social media may help us better target and monitor the impact of public health interventions..."

Although the rapid growth of Web-based communications has benefited public health by providing access to a much broader range of health information, most people trust what they read on the Web without attempting to assess its credibility. Credibility includes factors like veracity, readability and clarity, the use and transparency of sources, biases and false balance, and disclosure of conflicts of interest. Considering that misinformation can cause harm by influencing attitudes and beliefs, and given the rate at which new information becomes available and the resources needed to appraise it, the researchers developed and tested machine learning methods to support the automatic credibility appraisal of vaccine-related information on Twitter.

The researchers collected 6,591,566 English-language, vaccine-related tweets and retweets from 1,860,662 unique Twitter users between January 17, 2017, and March 14, 2018, using the Twitter Search Application Programming Interface with a set of predefined search terms (including "vaccin*", "immunis*", "vax*", and "antivax*"). For all unique users posting vaccine-related tweets during the study period, they collected the lists of their followers to construct the social network. A set of 143,003 unique URLs extracted from the set of tweets constituted the text-based Web pages that were included in the analysis.
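As a rough illustration of how wildcard search terms like those quoted above could be applied to text, the sketch below turns them into a regular expression. It is a local filter written for this summary, not the Twitter Search API's own matching; the function names are hypothetical, the trailing `*` is assumed to mean "any word ending", and only the four terms named in the summary are used (the study's full list is not given here).

```python
import re

# Only the four terms quoted in the summary; the study's full list is longer.
SEARCH_TERMS = ["vaccin*", "immunis*", "vax*", "antivax*"]


def build_pattern(terms):
    """Compile wildcard terms into one case-insensitive regex.

    Assumption: a trailing '*' means "followed by any word characters".
    """
    parts = []
    for term in terms:
        stem = re.escape(term.rstrip("*"))
        suffix = r"\w*" if term.endswith("*") else ""
        parts.append(stem + suffix)
    return re.compile(r"\b(?:" + "|".join(parts) + r")\b", re.IGNORECASE)


PATTERN = build_pattern(SEARCH_TERMS)


def is_vaccine_related(text):
    """Return True if any search term matches a word in the text."""
    return PATTERN.search(text) is not None
```

With these four terms, the British spelling "immunisation" matches but the American spelling "immunization" does not, which is one reason a real term list would need to be broader.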

The credibility appraisal tool used in the study was developed by 3 of the researchers, who have expertise in public health, public health informatics, science communication, and journalism. Through a process described in the paper, they arrived at a tool with 7 criteria: (1) information presented is based on objective, scientific research; (2) adequate detail about the level of evidence offered by the research is included; (3) uncertainties and limitations in the research in focus are described; (4) the information does not exaggerate, overstate, or misrepresent available evidence; (5) it provides context for the research in focus; (6) it uses clear, nontechnical language that is easy to understand; and (7) it is transparent about sponsorship and funding.
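One simple way to turn a 7-item checklist like this into a score is to count how many criteria a page satisfies and then bin the count. The sketch below does that under stated assumptions: the criterion keys are hypothetical shorthand for the 7 criteria, and the cut-offs for low, medium, and high are illustrative defaults that this summary does not specify.

```python
# Hypothetical short names for the 7 criteria listed in the summary.
CRITERIA = [
    "based_on_scientific_research",
    "level_of_evidence_detailed",
    "uncertainties_described",
    "no_exaggeration",
    "context_provided",
    "clear_language",
    "funding_transparent",
]


def credibility_score(satisfied):
    """Count how many criteria are marked True in a dict of criterion -> bool."""
    return sum(1 for name in CRITERIA if satisfied.get(name, False))


def credibility_band(score, low_max=2, medium_max=4):
    """Bin a 0-7 score into a band.

    Assumption: the thresholds (0-2 low, 3-4 medium, 5-7 high) are
    illustrative and are not taken from this summary.
    """
    if not 0 <= score <= len(CRITERIA):
        raise ValueError("score must be between 0 and 7")
    if score <= low_max:
        return "low"
    if score <= medium_max:
        return "medium"
    return "high"
```

For example, a page that satisfies only the clear-language and funding-transparency criteria would score 2 and fall in the low band under these assumed thresholds.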

They then compared 3 machine learning methods that are commonly used for document classification problems: support vector machines (SVM), random forests (RF), and recurrent neural networks (RNN).

Following the development of what they considered to be a reliable tool for automatically estimating the credibility of vaccine-related communications at scale, they aimed to characterise patterns of potential exposure to low-credibility vaccine communications on Twitter. For each Web page that met the study's inclusion criteria, they estimated its credibility score using the best-performing classifier for each criterion. They then counted the tweets and retweets posted during the study period that included a link to the Web page. Finally, they estimated the potential exposure by summing the total number of followers for all of those tweets and retweets.
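The aggregation described above can be sketched in a few lines. This is a minimal illustration, not the researchers' code: the input format (one `(url, follower_count)` pair per tweet or retweet) and the function name are assumptions made for the example.

```python
from collections import defaultdict


def potential_exposure(posts):
    """Aggregate link shares and follower counts per URL.

    posts: iterable of (url, follower_count) pairs, one per tweet or
    retweet that links to the URL (hypothetical input format).
    """
    post_counts = defaultdict(int)
    exposure = defaultdict(int)
    for url, followers in posts:
        post_counts[url] += 1        # tweets and retweets are counted alike
        exposure[url] += followers   # followers of the posting account
    return {
        url: {"posts": post_counts[url], "potential_exposure": exposure[url]}
        for url in post_counts
    }


# Made-up sample data for illustration only.
sample = [
    ("http://example.org/a", 1200),
    ("http://example.org/a", 300),
    ("http://example.org/b", 50),
]
summary = potential_exposure(sample)
```

Because follower counts are simply summed, a user who follows several accounts that shared the same link is counted more than once, which is why the figure is a potential exposure rather than a count of unique readers.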

The study found that the best-performing classifiers were able to distinguish between low, medium, and high credibility with an accuracy of 78% and labeled low-credibility Web pages with a precision of over 96%. (The RF classifiers produced the highest performance overall and, in most cases, predicted whether the text on a vaccine-related Web page satisfied each of the credibility criteria with over 90% accuracy.) Across the set of unique Web pages, 11.86% (16,961 of 143,003) were estimated to be of low credibility, and they generated 9.34% (1.64 billion of 17.6 billion) of potential exposures. The 100 most popular links to low-credibility Web pages were each potentially seen by an estimated 2 million to 80 million Twitter users globally.
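For readers less familiar with these measures, the sketch below shows how accuracy and per-class precision are conventionally computed, and reproduces the page-share percentage from the counts quoted above. The label lists are made-up toy data, not study results, and the function names are chosen for this example.

```python
def accuracy(y_true, y_pred):
    """Fraction of items whose predicted label equals the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def precision_for_class(y_true, y_pred, label):
    """Of the items predicted as `label`, the fraction that truly are `label`."""
    predicted = [t for t, p in zip(y_true, y_pred) if p == label]
    if not predicted:
        return 0.0
    return sum(1 for t in predicted if t == label) / len(predicted)


# Toy labels for illustration only (not data from the study).
y_true = ["low", "low", "medium", "high", "high", "medium"]
y_pred = ["low", "medium", "medium", "high", "high", "low"]

toy_accuracy = accuracy(y_true, y_pred)
toy_low_precision = precision_for_class(y_true, y_pred, "low")

# Share of unique Web pages estimated as low credibility, from the summary's counts.
low_page_share = round(100 * 16961 / 143003, 2)
```

A high precision for the low-credibility class means that pages flagged as low credibility are rarely mislabelled, even if the classifier misses some low-credibility pages. The same arithmetic applied to the rounded exposure figures (1.64 billion of 17.6 billion) gives roughly 9.3%; the small gap from the reported 9.34% is presumably due to rounding of the billions.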

In short, the study found that although low-credibility Web pages were shared less often overall, there were certain subpopulations where the sharing of low-credibility Web pages was common. "The results show that it is feasible to estimate credibility appraisal for Web pages about vaccination without additional human input, suggesting the performance - although variable - is high enough to warrant their use in surveillance." In particular, "Knowing where low-credibility communications are most commonly shared on social media may support the development of communication interventions targeted specifically at communities that are most likely to benefit...Although the methods are not yet precise enough to reliably identify individual links to low-credibility communications, they may eventually be useful as the basis for countermeasures such as active debunking."

In conclusion: "The results suggest two new ways to address the challenge of misinformation, including ongoing surveillance to identify at-risk communities and better target resources in health promotion and embedding the tool in interventions that flag low-credibility communications for consumers as they engage with links to Web pages on social media."

Web link

Click here to read the article online or to download it in PDF format (14 pages…

Source

Journal of Medical Internet Research 2019 (Nov 04); 21(11):e14007
