ACSPRI Conferences, ACSPRI Social Science Methodology Conference 2014


Adaptive sampling from large-scale government hyperlink networks

Robert Ackland, Paul Henman, Timothy Graham

Building: Holme Building
Room: Holme Room
Date: 2014-12-10 01:30 PM – 03:00 PM
Last modified: 2014-10-31


Social scientists are increasingly using digital trace data (e.g. WWW hyperlink networks, Facebook, Twitter) to study the behaviour of individuals, corporations, groups and government. For example, it is relatively straightforward to use a web crawler to automatically construct a large-scale government hyperlink network (comprising government websites and the websites that connect to government via hyperlinks). But to draw meaningful insights from such a dataset, for example into the institutional structure of government or the visibility of government in different policy domains, one cannot rely on automated methods alone. In particular, we need to know more about the organisation behind a given website in a government hyperlink network: for example, its policy domain and its organisation type (corporation, NGO or government?). While sampling and statistical inference are cornerstones of empirical social science research, sampling is not currently widely used in web research, in the social sciences or elsewhere. This paper explores the use of sampling for understanding the structure of government hyperlink networks, using large-scale crawls of the government web in Australia and the UK. We apply an adaptive sampling approach, a technique for sampling hidden or hard-to-reach populations in which the probability of a unit's selection into a given wave of the sample depends on information collected on units drawn in earlier waves (snowball sampling in social networks is an example of adaptive sampling).
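To make the wave-based mechanism concrete, the following is a minimal sketch of a snowball sample over a hyperlink network, the example of adaptive sampling mentioned above. It is an illustration only, not the authors' implementation: the graph representation (a dict of outbound-link sets), the seed set, and the per-site follow-up budget `k` are all assumptions for the sake of the example.

```python
import random


def snowball_sample(graph, seeds, waves, k):
    """Wave-by-wave snowball sample of a hyperlink network.

    graph: dict mapping each website to the set of sites it links to
           (hypothetical representation of crawl output)
    seeds: initial wave of known sites (e.g. government websites)
    waves: number of expansion waves to run
    k:     maximum number of unseen neighbours to follow per site
    """
    sampled = set(seeds)
    frontier = set(seeds)
    for _ in range(waves):
        next_frontier = set()
        for site in frontier:
            # Only neighbours not already drawn are candidates, so
            # selection into this wave depends on what was observed
            # in earlier waves -- the defining adaptive feature.
            neighbours = [n for n in graph.get(site, ()) if n not in sampled]
            for n in random.sample(neighbours, min(k, len(neighbours))):
                next_frontier.add(n)
        sampled |= next_frontier
        frontier = next_frontier
    return sampled
```

In a real study the links would come from crawl data rather than a hard-coded dict, and follow-up decisions could use richer wave information (e.g. organisation type coded for earlier units) rather than a simple random choice of `k` neighbours.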