ACSPRI Conferences, RC33 Eighth International Conference on Social Science Methodology


Harpoon or maggot? A comparison of various metrics to fish for life course patterns

Nicolas Robette, Xavier Bry

Building: Law Building
Room: Breakout 4 - Law Building, Room 106
Date: 2012-07-10 01:30 PM – 03:00 PM
Last modified: 2011-12-12


Although most empirical life course studies rely on an event-based approach, theory has underlined the importance of the concept of trajectory: events should not be studied independently of each other, and the focus on specific transitions should be complemented by a broader view that conceptualizes the entire life course, i.e. a “‘holistic’ perspective that sees life courses as one meaningful conceptual unit” (Billari, 2001). A holistic approach makes it possible to summarize the timing and sequence of events, the durations spent in the various states and the durations between events. Trajectory-based methods are mostly non-parametric: they make no assumption about the process underlying life courses and they belong to the “algorithmic model culture” (Breiman, 2001). They chiefly aim at exploring and describing life courses and at “fishing for patterns” (Abbott, 2000).
More often than not, the first step of a holistic approach consists in measuring the dissimilarity between life courses (regarded as sequences). The pairwise distances between sequences can then be used in various ways, most often as input to data reduction techniques such as multidimensional scaling or clustering. Many dissimilarity metrics exist in various domains (bioinformatics, data mining, etc.). Their use in the social sciences has risen significantly over the past decade or two. The most widely known is certainly Optimal Matching Analysis, but other metrics for sequence analysis have been proposed, and other techniques based on Correspondence Analysis also exist. A crucial and pervasive issue in papers using holistic approaches is therefore robustness. To what extent do the various techniques lead to consistent and converging results? What kinds of patterns are more easily fished out by each of the metrics?
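To make the idea of a dissimilarity metric concrete, here is a minimal sketch in Python of two textbook measures: an Optimal Matching distance (an edit distance with insertion/deletion and substitution costs) and the Hamming distance. The state alphabet (E for education, W for work), the cost values and all function names are illustrative assumptions, not the settings or the set of metrics used in this study.

```python
from typing import Callable, List, Sequence


def om_distance(a: Sequence[str], b: Sequence[str],
                indel: float = 1.0, sub: float = 2.0) -> float:
    """Optimal Matching distance with constant costs (assumed values).

    Minimum total cost of turning sequence a into sequence b using
    insertions/deletions (cost `indel`) and substitutions (cost `sub`).
    """
    n, m = len(a), len(b)
    # prev[j] = cost of turning the empty prefix of a into b[:j]
    prev = [j * indel for j in range(m + 1)]
    for i in range(1, n + 1):
        curr = [i * indel] + [0.0] * m
        for j in range(1, m + 1):
            match_cost = 0.0 if a[i - 1] == b[j - 1] else sub
            curr[j] = min(prev[j] + indel,           # delete a[i-1]
                          curr[j - 1] + indel,       # insert b[j-1]
                          prev[j - 1] + match_cost)  # match or substitute
        prev = curr
    return prev[m]


def hamming_distance(a: Sequence[str], b: Sequence[str]) -> float:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs sequences of equal length")
    return float(sum(x != y for x, y in zip(a, b)))


def pairwise_matrix(seqs: List[Sequence[str]],
                    metric: Callable[[Sequence[str], Sequence[str]], float]
                    ) -> List[List[float]]:
    """Symmetric matrix of pairwise dissimilarities, zero on the diagonal."""
    n = len(seqs)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = metric(seqs[i], seqs[j])
    return d


# Hypothetical toy careers: the first two follow the same alternating
# pattern, shifted by one period.
careers = ["EWEWEW", "WEWEWE", "EEEWWW"]
om_matrix = pairwise_matrix(careers, om_distance)
hamming_matrix = pairwise_matrix(careers, hamming_distance)
print(om_matrix)
print(hamming_matrix)
```

On the first two toy careers, the two metrics disagree sharply: position by position the sequences never coincide, so the Hamming distance is maximal, whereas Optimal Matching can realign them with one deletion and one insertion. This is the kind of difference in sensitivity that the questions above are about.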
Some comparisons of metrics do exist in recent publications: most of them conclude that sequence analysis is robust, i.e. that the main structure of the data emerges whatever the methodological settings. However, most of these comparisons have shortcomings: they only deal with a limited range of methods at a time; they apply to specific sets of empirical data; and other choices involved in the holistic approach (clustering techniques, etc.) may blur the results. Generalization is therefore often problematic. We propose here a systematic comparison of about ten metrics that have been used in the social science literature, based on the examination of dissimilarity matrices computed from two data sets: a simulated one comprising various sequence patterns that sociologists may aim to identify, and an empirical one (on occupational careers) serving as a “control sample”. Our aim is thus not to single out a hypothetical “best metric”, but rather to unravel the specific patterns to which each alternative is actually more sensitive.
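One simple way to examine how closely two metrics agree is to correlate the off-diagonal entries of their dissimilarity matrices. The sketch below is only an illustration of that general idea: the abstract does not specify how the matrices are examined, and the function name and the toy matrices are hypothetical.

```python
import math
from typing import List


def matrix_correlation(d1: List[List[float]], d2: List[List[float]]) -> float:
    """Pearson correlation between the upper triangles of two square,
    symmetric dissimilarity matrices of the same size."""
    n = len(d1)
    if n != len(d2):
        raise ValueError("matrices must have the same size")
    # Keep each pair of sequences once (i < j); the diagonal is ignored.
    x = [d1[i][j] for i in range(n) for j in range(i + 1, n)]
    y = [d2[i][j] for i in range(n) for j in range(i + 1, n)]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant distances")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical dissimilarity matrices over the same three sequences.
metric_a = [[0.0, 2.0, 4.0],
            [2.0, 0.0, 6.0],
            [4.0, 6.0, 0.0]]
metric_b = [[0.0, 1.0, 2.0],   # same ordering of pairs as metric_a
            [1.0, 0.0, 3.0],
            [2.0, 3.0, 0.0]]
metric_c = [[0.0, 6.0, 4.0],   # reversed ordering of pairs
            [6.0, 0.0, 2.0],
            [4.0, 2.0, 0.0]]

print(matrix_correlation(metric_a, metric_b))
print(matrix_correlation(metric_a, metric_c))
```

A correlation close to 1 indicates that two metrics rank pairs of sequences in much the same way, while lower values point to pairs, and hence patterns, that the metrics treat differently.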