Researching Wikipedia Holistically: A Tentative Approach


Published: October 11, 2008

This is a tentative article-length introduction to my thesis on Wikipedia. It is an attempt to analyze Wikipedia from an interdisciplinary perspective that tries to make problematic various assumptions, concepts, and relations that function quite well in the "real world" but are not well-suited to studying Wikipedia. I begin by discussing the nature of academic disciplines, then proceed to a detailed but selective review of prior research on Wikipedia. By examining the problems in previous research within the context of disciplines, I establish a tentative methodology for a holistic study of Wikipedia.

Certain topics seem to lend themselves to some academic fields better than others. More accurately, many fields have been constituted around specific topics in a way that makes them appear as the "natural" academic toolkit for research. For example, political science seems like the perfect discipline to study a presidential election, just as economics seems like the perfect discipline to study a worldwide financial crisis. English is obviously not high on the list for those two issues, but it does seem to be a better choice if English literature is to be studied.

However, it is not that certain fields "own" various topics of study, forbidding any other discipline entry. Economics will "allow" a worldwide financial crisis to be studied by political science (what are the political ramifications?), sociology (how are various social structures affected?), rhetoric (how do people describe and argue about the crisis?), history (how does this crisis compare to previous ones?), media studies (how does the media represent and influence the crisis?), psychology (how is the crisis affecting people's psyches?), or any number of other disciplines, each with its own perspective on the topic. The point is that while all of these disciplines have something meaningful to say about a worldwide financial crisis, only economics has the ability to refer to it in a naturalized state. The perspectives of the other disciplines are just that, perspectives, with no legitimacy outside of their particular disciplinary lens. When not studying their "own" topics, these disciplines are constituted within a plane of existence that keeps them on the periphery. A legal analogy is apt: it is not that each discipline owns certain topics, but that it has original jurisdiction over them.

This is not just the case with economics: any other well-established discipline holds original jurisdiction over certain topics, making analysis from outside disciplines always-already marginalized. If one wishes to study, say, the economics of media, there are two options: first, completely ignore the existing academic literature on media from media studies and treat the topic as raw input for disciplinary analysis; or second, attempt to perform some sort of "transdisciplinary" analysis that holistically incorporates the theories, methodologies, practices, techniques, and beliefs of media studies with those of economics.

There are both inherent and disciplinary problems with any academic study of Wikipedia. The inherent issue is that Wikipedia does not lend itself to this system, in which there is one discipline with original jurisdiction and an endless number of effectual disciplinary perspectives. Rather, Wikipedia is made up of distinct but interrelated elements or topics, each of which has its own "home" discipline. Starting at a very technical level, Wikipedia runs on a specific type of software which organizes data in a peculiar manner. Much work has been done in Computer Science on the computational "ontology," or the way in which data are organized and stored. Many in this field have also analyzed issues of collaboration, developing quantitative models of how contributions to the site emerge and develop.

Some of these authors have linked these models to various social or political theories, although these links are largely speculative and based on correlation, not causation. This is because the discipline of Computer Science only has original jurisdiction over well-formed quantitative models of data or computation, not social and political theories. Because of this, any attempt to analyze the social or political aspects of Wikipedia within this discipline is necessarily speculative and perspectival, as the researcher is effectively turning to other disciplines (political science, sociology) in a tangential manner when making such conclusions. These articles generally make a well-formed Computer Science conclusion about data or software, and then use that conclusion as the premise for a much shorter secondary analysis of Wikipedia within another discipline that is not fully deployed.

The much-celebrated "Creating, Destroying, and Restoring Value in Wikipedia" (Priedhorsky et al., 2007) is an exemplar of this type of study. The authors craft an "empirically grounded" (259) classification schema for value and perform a rigorous quantitative analysis of the variables that affect their concept of value. The body of their paper includes the hallmarks of this discipline: three research questions with proper methodology sections, well-defined formulas for their variables and metrics, and eight charts and graphs that illustrate their quantitative analysis. The authors skillfully craft these elements into a scientifically sound conclusion, made in the fourth-to-last paragraph: "1/10th of 1% of editors contributed nearly half of the value" (267) to Wikipedia articles. However, this is only their penultimate conclusion, as they use it to make a different kind of argument in their final paragraphs. Their tone shifts away from the scientific as they argue that:

because a very small proportion of Wikipedia editors account for most of its value, it is important to keep them happy, for example by ensuring that they gain appropriate visibility and status. However, turnover is inevitable in any online community. Wikipedia should also develop policies, tools, and user interfaces to bring in newcomers, teach them community norms, and help them become effective editors. (267-8).

For those in the social sciences and the humanities, there may be an impulse to criticize such a conclusion as pure speculation, given that the authors' methodology and variables did not focus on norms, socialization, or other social processes. However, to do so would be to ignore the complex web of disciplinary relations at work in such an article. Their final paragraphs should be viewed as a well-intentioned attempt to make certain conclusions that their discipline could not make because it does not have original jurisdiction over the matter. If their task had been solely to make a solid recommendation to the Wikipedian community regarding norms and socialization, they would inevitably have failed because of the context within which their article was constructed and deployed. A methodology that incorporated a social science approach would have been necessary to make such conclusions authoritatively, given that sociology has original jurisdiction over the social. However, as they insinuate in their conclusion, this was not their main goal; instead, they see the main benefit of their article as defining a measurable concept of value with respect to Wikipedia, "set[ting] the scientific study of Wikipedia … on a much firmer basis than ever before" (267). In other words, their contribution is to the computational study of data, not to the social study of computational systems. Their speculative social conclusions are not properly constructed and defended, but they do not have to be, as this is not the purpose of the article. In short, within this disciplinary matrix, it is natural that we see the paper as an excellent practice of computer science research and an awful practice of sociology research.

With this in mind, we turn our attention to the constituent elements of Wikipedia that do "lend themselves" (i.e. have been naturalized) to the more socially focused disciplines. In contrast to the Computer Science approach, which sees Wikipedia as software and data, this approach sees Wikipedia as community, society, and politics. The disciplines of sociology and political science have original jurisdiction over these topics, as they are the academic fields that can make well-founded conclusions about them. An exemplary work that is informed by the kind of computer science methods illustrated in Priedhorsky's article but expanded to a social science methodology is "Community, Consensus, Coercion, Control: CS*W or How Policy Mediates Mass Participation" by Kriplean et al. This work is a collaboration between academics in Computer Science and Information Studies, as well as a researcher in Hewlett-Packard's Information Dynamics Lab.

The article presents a methodology that is similar to the variable-based statistical methods used by Priedhorsky. After describing Wikipedia in detail (making sure to give a preliminary description of its policies, or codified norms), the authors present a schema for categorizing discussions in Wikipedia in relation to policy. They present well-defined variables and methods for such an analysis, and draw a figure that illustrates this method in practice. However, this is where Kriplean departs from Priedhorsky: the statistical methodology is used to identify several discussions which are then analyzed qualitatively, presented as a series of “vignettes” (172). In doing so, they squarely place this section of the article within the discipline of sociology, stating their intention to use a Grounded Theory approach to study “power dynamics at work within the ambiguity of the policy environment” (171).

They begin by tentatively assuming that what they call the "policy environment" facilitates the resolution of disputes through the invocation and negotiation of these codified norms. They then use the vignettes to illustrate many different disputes, not all of which use policy as a resolution mechanism. From this, they construct a typology of "power plays" (172), showing that although some disputes are negotiated through the invocation and interpretation of policy (i.e. the policy environment), others are based on non-policy factors. In particular, they note the way in which some disputes were resolved by reference to an editor's reputation or to the resolution of a similar debate on a related article, neither of which is based in policy. In the penultimate section, "Design Implications," they use these conclusions to argue that the software upon which Wikipedia runs should be changed to better facilitate these non-policy factors that influence dispute resolution. Specifically, they tentatively suggest a reputation system and a better way of tracking previous debates. Their ultimate conclusion is that the policy environment is a way in which important "articulation work" (175) is performed, and the software needs to be updated to better facilitate articulation that occurs but is not defined as part of the policy environment.

Kriplean's methods and conclusions are radically different from Priedhorsky's. Priedhorsky uses variables and statistics to construct a computational categorization schema that produces a scientific fact about users and data, which is transformed into a speculative set of social conclusions. In contrast, Kriplean uses variables and statistics to construct a computational categorization schema that is used to bring forth qualitative data, which is analyzed in order to develop a social categorization schema, which in turn gives rise to sociological and design-oriented conclusions. From a sociological perspective, Kriplean's article is far superior to Priedhorsky's; however, it is far inferior from a computer science perspective. This is because Kriplean's article only uses Computer Science to develop a computational ontological schema, not to produce computational facts like Priedhorsky. Although both draw from Computer Science, the conclusions made in the first part of Kriplean's article about topics over which Computer Science has original jurisdiction are not as meaningful on their own as Priedhorsky's conclusions. However, Kriplean's Computer Science conclusions are fed into a sociologically influenced methodology, which allows the authors to make solid conclusions in a domain that Priedhorsky could not: norms and social facts. Finally, Kriplean's article is situated within the field of Computer Supported Cooperative Work (CSCW), which allows the authors to make conclusions regarding design, a topic over which CSCW has original jurisdiction.

We have identified three sets of constituent elements of Wikipedia, each of which has been situated within the original jurisdiction of a field or discipline: first, the software and data, which has its home in Computer Science; second, norms and social facts, which belong to Sociology; and third, design issues, under the purview of Computer Supported Cooperative Work. Priedhorsky’s article was solely within the context of Computer Science and therefore failed to make solid conclusions about social facts, a topic that the discipline of Computer Science does not have original jurisdiction over. In contrast, Kriplean’s article was a collaboration between Computer Science, Sociology, and CSCW. This meant that Computer Science allowed conclusions about computational ontologies and schemas, Sociology allowed conclusions about social norms and facts, and CSCW allowed conclusions about design issues. Kriplean’s article could therefore say more about Wikipedia than Priedhorsky’s, because it deployed three different disciplines in order to make solid conclusions about three different kinds of topics.

However, these three constituent elements are only a small fraction of that which is internal to Wikipedia. Much has been written on the economic model of Wikipedia, asking questions about the organization and division of labor, for example. Elections are held on a regular basis for various high-level administrative positions, and political scientists have analyzed them using the same tools and techniques used for analyzing political elections. The psychological aspect of Wikipedian contributors is often discussed in relation to the motivation and personality of editors. The discourse of the Wikipedian community has been analyzed from various theoretical perspectives, both inside and outside of communication studies. Roy Rosenzweig's "Can History Be Open Source?" looked at Wikipedia as history, comparing the methods and issues in Wikipedia with those in History. Philosophers have examined the epistemological model present in Wikipedia, comparing it to various philosophical positions (Rodríguez 2007). From cultural studies, issues of multiculturalism in Wikipedia have been analyzed in detail (Pfeil et al., 2006). Wikipedia's model of jurisprudence has also been analyzed; in fact, one of the first academic articles about Wikipedia was a comparison of various free knowledge projects to the U.S. legal system (Benkler 2002).

Each of these articles is an example of a work in which a particular element of Wikipedia is analyzed using a disciplinary framework that has original jurisdiction over the element in question. However, this disjointedness creates problems when connections need to be made between topics. While Kriplean's article does a good job of connecting three distinct topics, it does not provide an exhaustive look at Wikipedia. We can imagine an ideal study that would incorporate issues of software development, design, social norms, law, elections, organizational structures, social structures, interpersonal relations, multiculturalism, cultural practices, division of labor, discourse, history, subjectivity, epistemology, philosophical ontology, computational ontology, and back to software development and design as the cycle repeats itself. All of these issues influence each other, and in more complex ways than this simple chain would suggest.

Solutions to the problem of epistemology are not merely epistemological; they are also sociological, normative, psychological, legal, discursive, technological, and subjective, and working out a solution to the one involves working out solutions to these other issues as well. How the community comes to terms with the question of what proper knowledge is simultaneously contains within it issues of what the proper social order ought to be for regulating and enforcing that epistemology, how to reconcile dissidents while keeping them motivated, what the proper way of phrasing such an epistemology ought to be, what technological space ought to be created in order to best facilitate such a discussion, who ought to be included in and excluded from such discussions, and more.

When analyzing an entity or event in the so-called “real world” – for example, a presidential election – it is possible to perform a disciplinary analysis that re-appropriates the topic within the original jurisdiction of the chosen discipline. A rhetorician can choose to ignore (or bracket out) questions of economics, focusing only on what the rhetoric of politicians and pundits reveals, conceals, and so forth. Issues of economics may emerge in such a rhetorical analysis, but they are taken away from the disciplines that have original jurisdiction over them and recontextualized within the discipline of rhetoric. This is only possible because the rhetorician takes for granted the relationship between elections and economics: that during elections, politicians categorically talk about various political issues, and the economy is one of those issues which is discussed.

A psychologist can study a presidential election by measuring how people feel about various candidates or issues and for what reasons. This study may reveal, for example, that people who are more interested in politics are angrier than people who are not. The psychologist is only able to make such a conclusion by taking for granted the relationship between people and the election: that people have various levels of interest in the election, and we know what we mean when we say ‘people’, ‘interest’, and ‘election’. Such an assumption seems entirely unproblematic, and it most assuredly is within the context of American culture. However, in a culture like Wikipedia where the relationship between people and elections (as well as the concept of ‘people’ and ‘elections’) is more problematic, far more theoretical work must go into a similarly-structured research project.

This is because one cannot simply enter Wikipedia and import the categories of 'people' and 'elections' as they have been deployed within American culture. While these have been negotiated and solidified in one cultural context, they are still underdeveloped in the Wikipedian cultural context. For example, in American culture, it seems pedantic to ask what an election is and merely 'academic' to ask who counts as a person; these notions are well-defined and entirely unproblematic within a certain cultural context. In Wikipedia, however, what counts as an election is an essential question, given that the community explicitly claims on many different high-profile pages that it is not a democracy and does not vote. Nevertheless, certain events (Arbitration Committee Elections) are declared to be 'elections' and, from an outsider's perspective, look strikingly similar to an election in which people vote on candidates. Other events with a similar structure (Requests for Adminship) are explicitly declared not to be elections, even though there are what appear to an outsider as candidates who may or may not receive a certain position after things that look like votes are cast by people who look like voters, who are in turn regulated by criteria that look like voter eligibility rules. The question as to what an election is has not been as naturalized in Wikipedia as it has in the United States.

Similarly, the question as to what a person is carries a comparable level of ambiguity in Wikipedia relative to American culture. In the United States, the definition of a person obviously differs from context to context, but each of these definitions is well-defined and functions equally well for the researcher's purpose. It does not matter whether, for the purposes of the psychological study, a 'person' is defined as a human being, a resident (legal or illegal), a legal resident, a U.S. citizen, an adult, an (in)eligible voter, an (un)registered voter, or an (un)likely voter. Each of these definitions is distinguishable and unproblematic in the psychologist's research, and can even be used to make conclusions (e.g., likely voters are angrier than unlikely voters, who are angrier than unregistered voters). In Wikipedia, this concept is far more problematic, as no unified conception of, say, an (in)active editor exists. If a psychologist performed a study on elections and the emotions of people in Wikipedia, there would be no taken-for-granted categorization schema defining the conditions under which someone counts as a person. Is someone a person if they have 'voted' anonymously, that is, without an account? What if they have registered but their 'vote' is the first contribution they have made? Some in the Wikipedian community claim that these people are not to be treated as people, but as 'sockpuppets': multiple hidden accounts that are controlled by a single human being and used to give one person more than one voice. Others disagree, and argue for giving anonymous and newly registered contributors just as much weight as well-established registered users. This stands in contrast to the American political system, which has a solid conception of a 'registered voter,' even though the specific standards of voter registration vary from county to county.

The difference between research in the so-called real world and in Wikipedia is that the real world is held together by taken-for-granted categories and concepts that are not as solidified in Wikipedia. However, this should not be taken as an indication that Wikipedia needs explanation in a way that real-world institutions do not; in other words, our task is not to solidify these categories and concepts so that we can analyze Wikipedia using the same kinds of techniques and methods developed for real-world entities and events. What we must instead realize is that Wikipedia provides a unique site for analysis in which assumptions and relations traditionally taken for granted in the real world exist in a problematized state. At the same time, it must be recognized that this is not due to any essential property of Wikipedia, meaning that we must reject the assumption that Wikipedia has some special characteristic which makes it, at its essence, a space that problematizes these traditionally taken-for-granted relationships. We must therefore not treat Wikipedia as a space in which all conceptual relationships are always-already problematized; rather, it is simply a space in which some traditionally reified relationships have been made problematic.

On one side, we have disciplinary analyses that import theories and categories which are well-functioning (i.e. taken for granted) in the real world but are problematized in Wikipedia. This form of research tends to gloss over those inconsistencies, resulting in an analysis of Wikipedia that is unproblematic for the researcher but ignorant of how such research contradicts local understandings present in the project. On the other side, there is the unreflective research that attempts to give a localized account of how the project operates. This form of research substitutes one misstep for another, choosing to reify the taken-for-granted assumptions and relations developed in the Wikipedia community as an alternative to reifying those taken for granted in the real world. The solution is not to try to find some sort of third way or middle ground, but instead to alternate between these perspectives: a simultaneous analysis of how Wikipedians see the real world and how the real world sees Wikipedia.

Instead of devising some technique or formula for bringing forth the hidden assumptions and relations each side takes for granted, this work will operate at the intersection between an academic researcher's account and what is known as a member's account. We will posit what Bruno Latour calls a "symmetry" between these two accounts: we will give neither researchers nor Wikipedians full authority to speak about what Wikipedia is, nor will we assume that either side's assumptions are valid at the expense of the other's. This means that we cannot use some theory or ideology that functions well in the real world (e.g. Communism) to explain a seemingly congruent observation of Wikipedia. We will not assume that Wikipedia can be explained by already-existing theories or concepts, but we will also not assume that new theories or concepts ought to be constructed in order to explain Wikipedia.

In fact, the task is not to explain Wikipedia in any sense. That would be as futile as attempting to explain American society, with all its contradictory (sub)cultures, norms, institutions, categories, mythologies, and theories that are held together by a set of taken for granted assumptions. The task is also not to reveal or problematize these taken for granted assumptions, as if they were evils to be purged. We instead aim to demonstrate how Wikipedia, with all its (sub)cultures, norms, ideologies, discourses, institutions, mythologies, economics, philosophies, categories, and theories, is held together by a different but just as important set of taken for granted assumptions and relations.

This approach stands in opposition to the structuralist position, which posits universal features, elements, or tendencies present in all societies and then attempts to describe a particular society in terms of these structures. Examples of these universal structures include the previously-mentioned (sub)cultures, norms, ideologies, discourses, institutions, mythologies, economics, philosophies, categories, and theories. Our task is not to articulate the norms, ideologies, discourses, or other structural elements of Wikipedia. We are not to show what mythologies or philosophies compel Wikipedians to action. Concepts like the economy or the social structure – which make sense in our contemporary society – are not to be unproblematically imported into Wikipedia. The folly of this is best shown by an essay on Wikipedia that responds to the charges of Communism that are frequently leveled against the project. Ten internally coherent yet collectively contradictory “points of view” on the subject of economics are made, all of which defend Wikipedia while supporting different ideological worldviews. These include: “Wikipedia does not endorse any value system,” “Wikipedia is like Communism, and that’s a good thing,” “Wikipedia is not like Communism because it is voluntary,” “Wikipedia fuels the free market … [and] engages in competition,” “Wikipedia is like a charity,” “Wikipedia is like Anarchism,” “Wikipedia is a hobby,” and “Who cares, as long as it works?”

The point of the list is to show that concepts like economic ideologies make sense within a certain context, but quickly become unintelligible when they are used to describe something like Wikipedia. Concepts like Communism require a coherent understanding of other concepts, like an economy and a state, each of which in turn requires a coherent understanding of further concepts: property, value, labor, and exchange for an economy; sovereignty, authority, rule, and power for a state. We could go one level further, but there is no need, given that all eight of these dependent concepts are well-defined with respect to real-world nation-states but problematic within the context of Wikipedia. While a Communist, a liberal democrat, a libertarian capitalist, and an anarchist would most likely agree that Cuba is more Communist than the United States, these four individuals could each see Wikipedia as furthering their own political-economic ideologies. This is due to the fact that these concepts and relations have a taken-for-granted status in the real world, but an incoherence in the Wikipedian context. This allows concepts like "the state" in Wikipedia to be described as authoritarian, liberal-democratic, minimalist, or non-existent. However, we should be wary of claiming that this incoherence is fundamental or due to some essential nature of Wikipedia. All that has been observed is an incoherence, which could be explained by various factors, including the technical/material conditions of Wikipedia's existence, its various normative commitments, the relative youth of the community, or any number of other factors. It is not our task to say which.

Instead, we will take this incoherence as our unit of analysis in our study of Wikipedia. We will be examining the way in which these taken for granted, well-settled concepts are imported into Wikipedia and made problematic. In order to facilitate such a task, we will play the role of the anthropological stranger whose mission is to interrogate the conditions of possibility for the seemingly-universal concepts like discourse, governance, power, subjectivity, and norms. However, unlike the structural anthropologist, we do not expect coherent articulations of these concepts as we search for their explanations. Instead, we embrace the confusion as a way of making such concepts problematic. We anticipate that such an exploration will show us that in referring to what we call discourse or norms, for example, we are making certain assumptions that hold within our society but fall apart within Wikipedia.

In this way, we are able to sidestep the entire disciplinary matrix that naturalizes certain topics within certain fields. Thus we avoid the original jurisdiction issue that initially required us to split Wikipedia into various constituent elements, each of which must be analyzed with distinct methods, techniques, and theories. Instead of positing a list of elements and systematically analyzing each from its own "home" discipline's perspective (motivations from psychology, governance from political science, norms from sociology, epistemology from philosophy, and so forth), we take such a framework to be problematic. In doing so, we can reveal the contradictions that emerge when a set of internally coherent practices, theories, methods, techniques, and beliefs is deployed in a foreign context. The questions to be asked therefore involve an attempt to place the project within certain frames. The point is not to make everything fit, but to see what remains on the periphery.


R. Stuart Geiger
