Data Mining and Content Analysis of the Chinese Social Media Platform Weibo During the Early COVID-19 Outbreak: Retrospective Observational Infoveillance Study

University of California San Diego School of Medicine (Li, Xu, Cuomo, Purushothaman, Mackey); S-3 Research LLC (Li, Xu, Mackey); University of California San Diego Extension (Li, Xu, Mackey); Global Health Policy Institute (Li, Xu, Cuomo, Purushothaman, Mackey)
"Evaluating whether social media can act as a positive tool to promote global health objectives, particularly in the context of health emergencies, will be tested by COVID-19, along with its utility as a modern approach to public health surveillance."
In the early stages of an outbreak, and in the absence of complete epidemiological data, user-generated social media data can be mined to assess the public's knowledge, attitudes, and behaviours toward the disease and can help characterise disease distribution when cross-validated with traditional disease surveillance data. Leveraging infoveillance approaches, these researchers conducted a retrospective observational study of COVID-19 on the Chinese microblogging website Sina Weibo [新浪微博], often described as the Chinese equivalent of Twitter. In addition to assessing whether Weibo posts about COVID-19 were predictive of the number of reported cases during the outbreak's early stages, the researchers conducted a qualitative analysis of COVID-19-related themes detected and discussed by users located in Wuhan, China, the city where the outbreak began.
Using an automated Python (Python Software Foundation) programming script, the researchers collected Chinese-language messages on Weibo from Wuhan between December 23 2019 and January 30 2020. For quantitative analysis, the total daily cases of COVID-19 in Wuhan were obtained from the Chinese National Health Commission, and a linear regression model was used to determine if Weibo COVID-19 posts were predictive of the number of cases reported. Qualitative content analysis and an inductive manual coding approach were used to identify parent classifications of news and user-generated COVID-19 topics.
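The regression step described above can be illustrated with a minimal sketch of a simple least-squares fit of daily case counts on daily post counts. This is not the researchers' script: the function name `fit_line` and the numbers in `daily_posts` and `daily_cases` are made up for illustration, and the study's actual model specification may have differed.

```python
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = intercept + slope * x.

    Returns (slope, intercept).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and cross-deviations of x and y.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical illustrative numbers, NOT data from the study.
daily_posts = [400, 800, 1200, 1600, 2000]
daily_cases = [105, 205, 305, 405, 505]

slope, intercept = fit_line(daily_posts, daily_cases)
print(f"slope = {slope:.2f} cases per post, intercept = {intercept:.1f}")
```

In a fit like this, the slope is read as the change in reported cases associated with one additional post; a significance test such as the P value reported in the study would require a statistics package and is not shown here.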
During the 39 days of the study time frame, 115,299 Weibo posts were collected, an average of 2,956 posts per day. Keywords included the Chinese-language terms [冠状病毒] (coronavirus), [新型肺炎] (novel pneumonia), [武汉肺炎] (Wuhan pneumonia), [疫情] (epidemic situation), [非典] (severe acute respiratory syndrome), and [华南海鲜市场] (Wuhan Seafood Wholesale Market). Quantitative analysis found a positive correlation between the number of Weibo posts containing those keywords and the number of reported cases from Wuhan, with approximately 10 more COVID-19 cases per 40 social media posts (P<.001). This effect size was larger than what was observed for the rest of China, excluding Hubei Province (of which Wuhan is the capital city), and held when comparing the number of Weibo posts to the incidence proportion of cases in Hubei Province. However, the researchers stress, "any potential predictive value of using social media data as a proxy for real world public health surveillance statistics needs more rigor and added data layers to confirm possible associations, particularly in the context of user reactions to news events as discussed in this and other studies."
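As a rough illustration of how posts might be matched against the keywords listed above and tallied per day, here is a minimal sketch. The function names (`matches_keywords`, `daily_counts`) and the sample posts are hypothetical; the study's own collection script is not reproduced in this summary and may have matched posts differently. The last lines simply check the reported daily average from the totals given above.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence, Tuple

# Keywords as listed in the summary above.
KEYWORDS = ("冠状病毒", "新型肺炎", "武汉肺炎", "疫情", "非典", "华南海鲜市场")


def matches_keywords(text: str, keywords: Sequence[str] = KEYWORDS) -> bool:
    """Return True if the post text contains at least one keyword."""
    return any(keyword in text for keyword in keywords)


def daily_counts(posts: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count keyword-matching posts per date.

    Each post is a (date, text) pair; dates are plain strings here.
    """
    counts: Counter = Counter()
    for date, text in posts:
        if matches_keywords(text):
            counts[date] += 1
    return dict(counts)


# Invented sample posts for illustration only, NOT data from the study.
sample_posts = [
    ("2020-01-20", "武汉肺炎最新消息"),
    ("2020-01-20", "今天天气不错"),
    ("2020-01-21", "疫情通报"),
    ("2020-01-21", "关于冠状病毒的讨论"),
]
counts = daily_counts(sample_posts)
print(counts)

# Arithmetic check of the reported average: 115,299 posts over 39 days.
average_per_day = 115299 / 39
print(round(average_per_day))
```

Simple substring matching is used because the keywords are fixed phrases; a real pipeline would likely also need deduplication and handling of reposts, which this sketch omits.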
Qualitative analysis of 11,893 COVID-19-related posts from the first 21 days of the study period uncovered 4 parent classifications, with intercoder kappa agreement scores for three of the themes as follows: 99.04% for the causative agent of the disease, 98.26% for the changing epidemiological characteristics of the outbreak, and 97.27% for the public reaction to the outbreak control and response measures. A prevailing theme throughout the outbreak period, which shifted as new information became available, was that the causative agent of the outbreak was unknown, leading to uncertainty among Chinese users regarding the risks associated with the outbreak. Overall, the researchers observed a wide variation in user reactions to information, with some users expressing a willingness to undertake protective behaviour and other users downplaying the risks and engaging in behaviours that could have exacerbated disease spread (e.g., leaving Wuhan, attending New Year events). These observed attitudes and behaviours occurred prior to the Chinese government announcing a lockdown of Wuhan and other cities in Hubei on January 23 2020.
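For readers unfamiliar with intercoder agreement, the sketch below shows two common measures for two coders labelling the same posts: raw percent agreement and Cohen's kappa, which corrects for agreement expected by chance. How the study computed its scores is not specified in this summary, so this is an assumption-laden illustration; the function names and the labels in `coder_a` and `coder_b` are invented.

```python
from collections import Counter
from typing import Hashable, Sequence


def percent_agreement(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Share of items that both coders labelled identically."""
    if len(a) != len(b) or not a:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa: (p_o - p_e) / (1 - p_e) for two coders."""
    p_o = percent_agreement(a, b)
    n = len(a)
    freq_a, freq_b = Counter(a), Counter(b)
    # Chance agreement: product of each coder's marginal label proportions.
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # Both coders used a single identical label throughout.
    return (p_o - p_e) / (1.0 - p_e)


# Invented labels for six posts, NOT data from the study.
coder_a = ["agent", "agent", "epi", "response", "other", "epi"]
coder_b = ["agent", "agent", "epi", "response", "other", "response"]

agreement = percent_agreement(coder_a, coder_b)
kappa = cohen_kappa(coder_a, coder_b)
print(f"percent agreement = {agreement:.4f}, kappa = {kappa:.4f}")
```

Kappa is always at or below raw agreement when chance agreement is positive, which is why the two figures should not be compared directly.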
More specific subthemes regarding users' knowledge, attitudes, and responses to COVID-19 changed as more information about the underlying epidemiology became available. A shift in terminology was accompanied by wide variation in user reactions as information from government sources was disseminated and the outbreak worsened (e.g., criticism of the Wuhan Red Cross response and user uncertainty related to news about quarantines). The researchers found that both the nature of the content and the volume of posts were likely driven by a combination of the release of government information and news events.
In conclusion: "More research is needed to better understand the effectiveness of health communication strategies during evolving outbreaks such as COVID-19, particularly in the context of how information is understood, shared, and acted upon by users in the face of uncertainty and changing information. Specifically, we need to better understand how social media platforms can influence the public's risk perception, their trust and credibility of different information sources, and, ultimately, how it changes real-world behavior that can have an impact on control measures enacted to mitigate an outbreak."
JMIR Public Health and Surveillance 2020; 6(2):e18700. http://doi.org/10.2196/18700.