Using social media data to determine depression risk: A validation study
Bridianne O'Dea, Mark Larsen, Thin Nguyen, Dinh Phung, Svetha Venkatesh, Helen Christensen
Building: Holme Building
Room: Sutherland Room
Date: 2016-07-21 01:30 PM – 03:00 PM
Last modified: 2016-07-01
Abstract
Introduction: Depression is a leading cause of disability and represents a major health and economic burden. Current methods of detection rely on the self-identification of symptoms, which is limited by poor mental health literacy and lack of awareness. Social media provides an extraordinary opportunity to identify individuals who are depressed and those who are at risk of a depressive onset or relapse; however, such an approach needs to be validated. This project aimed to determine the individual markers of linguistic expression (i.e. features, topics, and emotional sentiments) on social media that accurately represent an individual’s risk of depression.
Methods: Using social media and online advertisements, we recruited 117 individuals who had personal blogs into a 16-week study in which their depression and anxiety symptoms were measured fortnightly using the PHQ-9 and GAD-7 self-report questionnaires, delivered via email. Participants also gave permission for their personal blog data to be analysed. Using validated analysis techniques, we extracted linguistic features from the blogs and correlated these with participants’ mental health scores. We then used Bayesian nonparametric methods to determine individual patterns.
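The per-participant correlation step can be illustrated with a minimal sketch. The abstract does not specify the tooling used, so this is not the authors' implementation: the function name `pearson_r`, the variable names, and all numbers below are hypothetical placeholders, and the Bayesian nonparametric modelling is not shown.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross-products around the means.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sp_xy / math.sqrt(ss_x * ss_y)


# Hypothetical data for ONE participant: fortnightly PHQ-9 totals and the
# share of negative-emotion words in blog posts from the same fortnight.
# These values are invented for illustration only.
phq9_scores = [12, 15, 9, 18, 7, 14, 10, 16]
negative_emotion_rate = [0.031, 0.040, 0.022, 0.047, 0.019, 0.035, 0.027, 0.041]

r = pearson_r(negative_emotion_rate, phq9_scores)
print(f"r = {r:.2f}")
```

With only a handful of fortnightly assessments per participant, a correlation of this kind rests on very few paired observations, so the accompanying p-value should be read alongside r.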
Results: We found that certain social media features were highly correlated (r=0.67-0.96) with mental health scores at an individual level in 75% of the study participants. For example, the mental health scores for one participant were highly correlated with the expression of negative emotion (r=0.96, p=0.003). A second participant’s mental health scores were negatively correlated with the use of personal pronouns (r=-0.85, p=0.08), and a third participant’s scores were negatively correlated with their discussion of religion (r=-0.8, p=0.017). Only 24% of participants dropped out; the models robustly accounted for attrition and used all available data.
Discussion: These results indicate that we are able to identify when individuals are in a depressive state based on analysis of their social media data. Innovations include the novel method for depression detection; the application of Bayesian nonparametric methods for context-sensitive modelling; and the ability to identify depressive states using personalised predictive algorithms. By analysing the social media data that individuals generate naturally and effortlessly in the daily course of their lives, we can monitor individuals’ mood to determine when support is required.