ACSPRI Conferences, RC33 Eighth International Conference on Social Science Methodology


Beyond the words: methodological challenges in computer techniques for content analysis

Valeria Pandolfini

Building: Law Building
Room: Breakout 5 - Law Building, Room 020
Date: 2012-07-10 01:30 PM – 03:00 PM
Last modified: 2012-06-12


Recent content analysis studies have been marked by significant changes closely tied to the rapid expansion of ICT and to the increasing use of the Internet as a communication tool, which gives researchers access to large, heterogeneous data as well as to innovative techniques for data collection and analysis. The paper examines the methodological foundations, applications and challenges of using computer techniques for text data collection, pre-processing, extraction and analysis, focusing on their use to go beyond the manifest meaning of texts in order to discover their latent meaning. It addresses the relationship between the role of content analysis software and the role of the researcher: the former works as an assistant and support, and as a documentation centre recording all steps of the analysis, but it does not replace the researcher, who always remains responsible for the important qualitative decisions needed to interpret the meaning of the words. These themes are addressed through a case study that uses statistical methods of content analysis to explore the content of “online talks”, i.e. messages posted in asynchronous forums by adult learners in a blended course, in order to interpret the text data, to categorize them according to their meaning and to study the relationships among the meanings. The meanings of the texts and the relations among meanings are explored by creating thematic categories and examining the context in which terms are used, drawing a conceptual map that links words and developed topics. The content analysis was carried out with the TALTAC software (Lexical and Textual Processing for the Analysis of Content), an application for the automatic analysis of texts according to the logics of both Text Analysis (TA) and Text Mining (TM), employing the techniques of “textual statistics” (Lebart and Salem, 1998; Bolasco, 2005).
Through its automatic functions, TALTAC supports text normalization and lexicalization, categorization according to pre-defined semantic classes and their fusion; through correspondence analysis, it also allows the exploration of the associations among the words of the corpus. The case study makes it possible to examine methodological issues related to the management of human coding processes, and shows the qualitative decisions taken to categorize the texts. It starts from an accurate quantitative analysis of the vocabulary of the corpus (N = 5,129,923 word tokens) and moves to a more qualitative analysis through the classification of texts according to the individual characteristics (age, gender, professional profile) of the subjects posting the messages, and through the study of repeated segments (Salem, 1987), of concordances and of the positive/negative connotation of the text.
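
As a rough illustration of two of the techniques named above, repeated segments and concordances, the following is a minimal sketch in plain Python. It does not use TALTAC and does not reproduce its algorithms; the function names (`tokenize`, `repeated_segments`, `concordance`) and the two example messages are hypothetical and invented for illustration only.

```python
import re
from collections import Counter


def tokenize(text):
    """Crude normalization (an assumption): lowercase, then split into word tokens."""
    return re.findall(r"\w+", text.lower())


def repeated_segments(docs, n=2, min_count=2):
    """Count word sequences of length n that occur at least min_count times.

    Segments are counted within each message, so they never span two messages.
    """
    counts = Counter()
    for tokens in docs:
        counts.update(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return {" ".join(seg): c for seg, c in counts.items() if c >= min_count}


def concordance(docs, keyword, window=3):
    """Return (left context, keyword, right context) for each occurrence of keyword."""
    lines = []
    for tokens in docs:
        for i, tok in enumerate(tokens):
            if tok == keyword:
                left = " ".join(tokens[max(0, i - window):i])
                right = " ".join(tokens[i + 1:i + 1 + window])
                lines.append((left, tok, right))
    return lines


# Invented toy messages standing in for forum posts (not data from the study).
messages = [
    "The online forum helps me study",
    "I use the online forum every week",
]
docs = [tokenize(m) for m in messages]

print(repeated_segments(docs))
print(concordance(docs, "forum", window=2))
```

In this sketch the quantitative step (counting segments) only proposes candidates; deciding which segments or contexts are meaningful remains with the researcher, in line with the division of roles described above.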