Towards Interactive Algorithmic Support for Inductive Content Analysis
Aneesha Bakharia
Building: Law Building
Room: Breakout 5 - Law Building, Room 020
Date: 2012-07-10 01:30 PM – 03:00 PM
Last modified: 2012-06-12
Abstract
There now exist numerous software applications to assist researchers in qualitative content analysis. Despite the increased number of these applications, researchers continue to argue that there has been little progress in computer-aided content analysis over the last few decades. Qualitative content analysis software applications have largely focused on data management and research project lifecycle management. Although more sophisticated algorithms for automated theme discovery have been available, their mainstream adoption has been limited. The main contributing factor has been the lack of interactivity these algorithms provide. Interactivity is seen as a means by which a researcher can incorporate domain knowledge to better contextualise the analysis performed by an algorithm.
This paper will introduce two recent theme discovery algorithms, Non-negative Matrix Factorisation (NMF) and Latent Dirichlet Allocation (LDA), that are particularly suited to the task of finding latent themes within document collections. NMF decomposes a document collection's matrix into two matrices that contain only non-negative values, mapping themes to both documents and words. LDA is a generative model that represents each document as a mixture of themes, where each theme is a different probability distribution over words. Both algorithms possess features that are not found in other statistical procedures such as Latent Semantic Analysis and k-means clustering, but by default both lack interactivity. Results of a study to determine the types of interactivity that content analysts require will also be presented.
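To make the NMF description more concrete, the following is a minimal sketch, not the author's implementation, that factorises a small document-by-word count matrix V into a document-by-theme matrix W and a theme-by-word matrix H using Lee and Seung's multiplicative update rules for squared error. The helper names (`nmf`, `matmul`, `reconstruction_error`) and the toy vocabulary and counts are hypothetical and chosen only for illustration. Practical content analysis would more likely rely on a library implementation and a properly pre-processed corpus.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    inner, cols = len(b), len(b[0])
    return [[sum(a[i][t] * b[t][j] for t in range(inner)) for j in range(cols)]
            for i in range(len(a))]


def transpose(m):
    """Return the transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*m)]


def nmf(v, k, iterations=300, seed=0, eps=1e-9):
    """Approximate a non-negative matrix V (documents x words) as W H.

    W is documents x k (how strongly each document expresses each theme) and
    H is k x words (how strongly each word belongs to each theme). The
    multiplicative updates only multiply by non-negative ratios, so every
    entry of W and H stays non-negative.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Strictly positive random start: an entry that hits zero stays zero.
    w = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return w, h


def reconstruction_error(v, w, h):
    """Squared Frobenius norm of V - W H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


# Hypothetical toy corpus: rows are documents, columns are word counts.
vocab = ["tutor", "lecture", "exam", "parking", "bus", "traffic"]
docs = [
    [2, 1, 1, 0, 0, 0],
    [4, 2, 2, 0, 0, 0],
    [0, 0, 0, 1, 2, 2],
    [0, 0, 0, 2, 4, 4],
]

w, h = nmf(docs, k=2)
for t, row in enumerate(h):
    # List the three highest-weighted words for each theme.
    top = sorted(range(len(vocab)), key=lambda j: row[j], reverse=True)[:3]
    print(f"theme {t}:", [vocab[j] for j in top])
```

Each row of H gives the word weights for one theme, and each row of W shows how much a document draws on each theme. This is the two-sided mapping of themes to documents and words described above. Multiplicative updates are used here only because they are short to write. They converge slowly and only to a local optimum, so the themes found can depend on the random seed.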