Linguistic Analysis for Identifying Depression and Subsequent Suicidal Ideation on Weibo: Machine Learning Approaches


Pan Wei 1,2,3; Wang Xianbin 1,2,3; Zhou Wenwei 1,2,3; Hang Bowen 1,2,3; Guo Liwen 1,2,3


1. Key Laboratory of Adolescent Cyberpsychology and Behavior (CCNU), Ministry of Education, Wuhan 430079, China

2. School of Psychology, Central China Normal University, Wuhan 430079, China

3. Key Laboratory of Human Development and Mental Health of Hubei Province, Wuhan 430079, China


Depression is one of the most common mental illnesses but remains underdiagnosed. Suicide, as a core symptom of depression, urgently needs to be monitored at an early stage, i.e., the suicidal ideation (SI) stage. Depression and subsequent suicidal ideation should therefore be monitored on social media. In this research, we investigated depression and concomitant suicidal ideation by identifying individuals’ linguistic characteristics through machine learning approaches. On Weibo, we sampled 487,251 posts from 3196 users in the depression super topic community (DSTC) as the depression group and 357,939 posts from 5167 active Weibo users as the control group. A logistic regression model showed that SCLIWC (simplified Chinese version of LIWC) features such as affection, positive emotion, negative emotion, sadness, health, and death significantly predicted depression (Nagelkerke’s R² = 0.64). In terms of model performance, the F-measure was 0.78 and the area under the curve (AUC) was 0.82. An independent-samples t-test showed that SI differed significantly between the depression group (0.28 ± 0.5) and the control group (−0.29 ± 0.72) (t = 24.71, p < 0.001). A linear regression model showed that SCLIWC features such as social, family, affection, positive emotion, negative emotion, sadness, health, work, achieve, and death significantly predicted suicidal ideation (adjusted R² = 0.42). In terms of model performance, the correlation between actual and predicted SI on the test set was significant (r = 0.65, p < 0.001). The topic modeling results were in accordance with the machine learning results. This study systematically investigated the linguistic characteristics related to depression and subsequent SI based on a large-scale Weibo dataset. The findings suggest that analyzing linguistic characteristics in online depression communities is an efficient approach to identifying depression and subsequent suicidal ideation, and can assist further prevention and intervention.
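For readers unfamiliar with the two classification metrics reported above, the following is a minimal sketch of how an F-measure and an AUC can be computed from predicted scores. It is not the authors' code: the function names, the 0.5 decision threshold, and the six toy labels and scores are all assumptions made for illustration, and the numbers have no relation to the Weibo dataset.

```python
from typing import Sequence


def f_measure(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via its rank interpretation: the probability
    that a random positive case scores higher than a random negative case
    (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = depression group, 0 = control group.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1]  # made-up predicted probabilities
preds = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(f"F-measure = {f_measure(labels, preds):.2f}")
print(f"AUC       = {auc(labels, scores):.2f}")
```

The F-measure depends on the chosen threshold, whereas the AUC summarizes ranking quality across all thresholds, which is one reason the two are commonly reported together.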


the Fundamental Research Funds for the Central Universities

Knowledge Innovation Program of Wuhan-Shuguang Project

the Research Program Funds of the Collaborative Innovation Center of Assessment toward Basic Education Quality




Health, Toxicology and Mutagenesis; Public Health, Environmental and Occupational Health

