ACSPRI Conferences, RC33 Eighth International Conference on Social Science Methodology


Revolution in Social Science Methodology: Possibilities and Pitfalls

Dipak K. Gupta

Building: Law Building
Room: Breakout 2 - Law Building, Room 026
Date: 2012-07-12 03:30 PM – 05:00 PM
Last modified: 2012-06-19

Abstract


Introduction
There is a new revolution taking place in how social scientists research aggregate phenomena. Analyses of social phenomena received an unprecedented boost after the invention of high-speed computers. Prior to the 1960s, most significant advances in theory building in the social sciences amounted largely to what we may charitably call “informed speculation.” Karl Marx did not have to bother with the falsifiability of his proposed theory. John Maynard Keynes advanced his general theory without formally testing his hypotheses statistically; he could not empirically demonstrate the link between aggregate demand and economic activity, the central tenet of his argument. The social movement theorists of the period had to rely primarily on the perceived plausibility of their postulates or attempt to offer proof through small case studies. There were a number of significant obstacles to the empirical verification of posited hypotheses, the hallmark of scientific reasoning. All of them resulted from the absence of computing technology.
First, without computers, the collection of large-scale aggregate data was problematic and extremely time consuming. The first installation of a commercial computer, UNIVAC I, took place in 1951 at the U.S. Bureau of the Census. Soon thereafter, a similar machine, employed by CBS, was used for the first time to predict the outcome of the 1952 presidential election. With the help of the new technology, 1 percent of the US population was surveyed, and the TV news organization correctly predicted the outcome: the win of Dwight D. Eisenhower. With that, social science entered a new era of empirical verification and prediction.
The invention of computers also enhanced our ability to crunch numbers far beyond human capability. Suddenly, multivariate regressions, which had been relegated largely to theoretical development, became commonplace in estimation. With increased computing power came much more sophisticated statistical techniques, and with this enhanced ability to crunch numbers a virtual explosion began to take place in the collection of data on innumerable aspects of life. For instance, today we need not define a nation’s development solely by per capita GDP; we can define it much more broadly with the multi-faceted Human Development Index. We have cross-national indicators of state failure, corruption, and even gross national happiness.
Despite these breathtaking developments in data collection and computing capability, traditional social science research methodology suffers from some significant shortcomings. These shortcomings stem not from a lack of computational capability but from the process by which social science data are collected. Most social science data are collected in one of two ways: they are either gathered from the “real world,” or sample populations are polled to learn about their preferences and attitudes. In either case, the process can be extremely time consuming. The decennial census data, the most accurate portrayal of life in the nation, take more than a decade to collect, collate, and publish. Even quarterly data are lagged in time. Surveys -- unless we are talking about routine questions, such as daily presidential approval ratings -- take a long time to design, implement, and analyze.
While data collection takes a long time, world events in the age of super connectivity are moving faster than anyone could have imagined. In the past, it would take years for collective movements to take shape: the Civil Rights movement took at least a couple of decades to reach its height, and al-Qaeda, similarly, had to struggle for years before it was recognized on the global stage. Compared to these, the lightning speed with which the so-called “Arab Spring” spread through the Middle East and “Occupy Wall Street” spilled over most of the developed world is simply astounding.
The quick spread of ideas is taking place due to two important factors, both related to the advancement of computer technology. The first relates to the physical nature of the innovations, while the second pertains to the psychology of the users. The ease of communication has made our effort to reach large numbers of people extremely cost effective. In the past, people needed a “megaphone,” which often came at a heavy expenditure of time and money. As a result, those in positions of public prominence had the “bully pulpit” from which to address a large crowd; others, attempting a grass-roots movement, had to slog through years of work. These days, messages can be spread through the web and myriad social media outlets, putting forward ideas that were not readily discussed before. Second, as Olson (1968) pointed out, there is a psychological impediment to voluntarily organizing large protest movements, known in the literature as the “collective action” problem. Part of the problem is that the costs of exposing contrarian viewpoints fall primarily on those who dare to take the first step, while the benefits, if the common good is procured, flow to everyone in the community. As a result, the collective action problem dictates that even when there is a deeply held desire among a large segment of the population for significant social change, such demand may never be properly articulated: the risk of incurring stiff costs weighs heavily on the minds of early activists. Today, however, most activists feel -- albeit often erroneously -- a sense of anonymity when computers are used for political mobilization. Kuran (1988) argued that most people maintain a barrier between their public pronouncements and their privately held beliefs. Each of us has a different “threshold point” of tolerance for the status quo.
When these thresholds are breached, people come out of their cocoons and join a mass movement. A society may appear extremely stable until the day people come out and join the revolution. That is why, Kuran argued, it is impossible to predict the demise of established political systems, such as the Soviet Union, the Shah’s Iran, or Mubarak’s Egypt. Since we often believe that the computer accords us greater anonymity, it is possible that in the present condition of technological advancement the threshold level has been lowered considerably. With so many people willing to express their deeply held ambivalence online, more and more are emboldened to join them, forming large political movements almost out of thin air.
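The fragility described above can be illustrated with a small simulation. The sketch below is a hypothetical Granovetter-style cascade model, not code from Kuran's work or any study cited here; the threshold distributions are invented for the example. Each person joins a movement once the fraction already participating meets his or her private threshold, and shifting a single threshold separates a society that cascades completely from one that appears perfectly stable.

```python
# Hypothetical threshold-cascade sketch (Granovetter-style illustration of
# Kuran's argument; thresholds are invented for the example).

def cascade(thresholds):
    """Return the final fraction of participants, given each person's
    private threshold: the share of the population that must already be
    protesting before that person joins."""
    n = len(thresholds)
    joined = 0
    while True:
        # Everyone whose threshold is met by the current turnout joins.
        new_joined = sum(1 for t in thresholds if t <= joined / n)
        if new_joined == joined:  # no one new joined: equilibrium reached
            return joined / n
        joined = new_joined

# Evenly spread thresholds 0.00, 0.01, ..., 0.99: each joiner tips the
# next person, and the whole society cascades into the movement.
full = cascade([i / 100 for i in range(100)])

# Raise one threshold (0.01 -> 0.02) and the chain breaks: only the lone
# unconditional activist ever protests, and the society looks stable.
stalled = cascade([0.00, 0.02] + [i / 100 for i in range(2, 100)])
```

The point of the illustration is Kuran's: the two societies are observationally almost identical beforehand, which is why the collapse of regimes like the Soviet Union comes as a surprise, and why anything that lowers thresholds across the board, such as perceived online anonymity, can conjure movements seemingly out of thin air.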
The Problem of Traditional Research Methodology
In view of this altered world, social science and the policy world must look for other methods of gathering and analyzing data to understand and predict social events. Given the lag in collecting and disseminating traditional data, the traditional research method is akin to astronomers looking at a distant star: the light that reaches the telescope left the star years ago and says nothing about its current condition. Hence, we must add data mining to our arsenal of social science methodology.
The term “data mining” is usually a dirty word in social science research, where it is seen as fishing for answers by randomly examining large amounts of information without any theoretical grounding. In the field of computer-assisted research, however, the term -- meaning the collection of information by monitoring the Internet -- is finding new respectability. As a result, the monitoring of web sites, Twitter, and the like is opening up new methods of understanding social interactions.
The monitoring of communications on the Internet creates a deluge of data. Where researchers once worried about sample size, data mining methods yield millions of observations. The first challenge, therefore, is to classify these observations in a meaningful way so that we can make some sense of our collective moods, positions, and opinions. For instance, by monitoring Twitter traffic, a group of researchers claimed to have observed the world’s mood swings over the week, over the year, and in response to momentous world events (Miller, 2011). Although we cannot directly observe mood or ask people how they actually feel, we can develop “surrogates” that hint at what people might be feeling. Critics abound; information culled from cyberspace may indeed be perfunctory, misleading, or worse. This, however, is similar to gauging the economic achievement of nations by measuring illumination at night from satellite imagery; likewise, the tonnage of trucking can provide an important clue to a nation’s future economic activity. These may not be perfect indices, but when used judiciously they can be sufficiently good indicators of what we are attempting to measure.
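A surrogate mood indicator of this kind can be sketched very simply. The code below is an invented illustration, not the method of the study cited above: the word lists and messages are made up for the example, whereas real mood-tracking studies rely on validated affect lexicons. It shows the core idea of reducing a stream of messages to a single aggregate mood score.

```python
# Illustrative sketch of a crude "surrogate" mood indicator.
# Word lists and messages are invented; real studies use validated lexicons.

POSITIVE = {"happy", "great", "win", "love", "good"}
NEGATIVE = {"sad", "angry", "fear", "bad", "lose"}

def mood_score(messages):
    """Average net sentiment per message: (positive words - negative words)."""
    total = 0
    for msg in messages:
        words = msg.lower().split()
        total += sum(w in POSITIVE for w in words)
        total -= sum(w in NEGATIVE for w in words)
    return total / len(messages)

# Two invented message streams: the indicator ranks one day gloomier
# than the other without ever asking anyone how they feel.
monday = ["bad start to the week", "so sad and angry today"]
friday = ["great game tonight", "love this weekend feeling", "good times"]
```

Here `mood_score(monday)` comes out negative and `mood_score(friday)` positive. The obvious weaknesses of such a surrogate -- sarcasm, context, negation -- are exactly the grounds on which the critics mentioned above object, which is why judicious use matters.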
Plan of the article
Since the current methodology is still in its infancy, we begin our analysis by examining the differences among the types of data that can be mined from web sites, social media, and Twitter. There are significant differences among the three in how people reveal their inner thoughts and communicate with others, as well as in how the data can be collected. Second, we examine how searches are sensitive to small changes in spelling or semantic content. For instance, searching for the term Kuran or Quran yields different groups, with more secular web sites using the first and more Islamic ones the latter. Similarly, searching for the name al-Awlaki yields one kind of communication, while adding the honorific title Sheikh to his name yields a different set. These searches, therefore, must be conducted with a deep understanding of the subject matter. Computer-based searches often rely on sentiment analyses, but these attempts often produce confusing results. In the third section, we propose a different method of search based on theories of human motivation. The fourth section examines the specific problems of computer-based searches, including the timing of data collection and the associated problems of geolocating the results. The fifth and concluding section is concerned with the human rights aspects of this new methodology: we look into the problem of individual privacy as well as efforts by authoritarian states to shut off communication via social media.
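The spelling-sensitivity problem described above can be made concrete with a toy example. The four corpus entries below are invented for illustration; they are not real search results. The sketch shows how transliteration variants (Kuran vs. Quran) and honorifics (Sheikh) partition what a naive keyword search retrieves.

```python
# Hypothetical illustration of search sensitivity to spelling and honorifics.
# The corpus entries are invented, not actual web or Twitter data.

corpus = [
    "Reading the Kuran in a secular studies seminar",
    "Recitation of the Quran at the mosque",
    "A lecture by al-Awlaki posted online",
    "Followers praise Sheikh al-Awlaki in the forum",
]

def search(term, docs):
    """Case-insensitive substring search; returns matching documents."""
    return [d for d in docs if term.lower() in d.lower()]

# The two transliterations retrieve disjoint document sets:
secular = search("Kuran", corpus)    # the seminar document only
islamic = search("Quran", corpus)    # the mosque document only

# Adding the honorific narrows the results to a different audience:
all_mentions = search("al-Awlaki", corpus)           # both al-Awlaki documents
with_honorific = search("Sheikh al-Awlaki", corpus)  # the forum document only
```

A researcher who queried only one variant would see only one community, which is why such searches demand subject-matter knowledge rather than keyword mechanics alone.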