Sunday, 23 October 2011

Why read from primary sources? Or: why reading blog posts is harder, not easier than reading papers

I've been meaning to write this post for a long, long time. Now that I have an enormous pile of marking to get through in double-quick time, I have the perfect excuse for a bit of structured procrastination.

What is a primary source?

A primary source is an original piece of writing, describing some research and written by the person or team who performed that research. A secondary source is a description or discussion of a piece of research by someone who has read about the research, but did not carry it out themselves. So, if an academic performs an experiment and writes it up as a journal paper, that paper is a primary source. If another researcher then quotes the paper and cites it in one of their own papers, then that is a secondary source. Newspaper articles, magazine articles, Wikipedia, and most websites and blog posts are secondary sources. When it comes to scientific research, only writing published in peer-reviewed conferences, journals, books and magazines constitutes a primary source.

What is peer-review and why does it matter?

Even if a paper is a primary source describing some research, that doesn't guarantee that the research is rigorous, reliable and high-quality. To ensure that all academic writing meets basic standards of quality assurance, scientists use peer-review. This means that a number of professional scientists (usually two or more) will read through the work carefully, and critique it before it is published. If the work is of very poor quality, or very badly written, it will be rejected and the authors will have to re-write their papers and try to publish them elsewhere. If the work is of a high enough standard to publish, the authors will be given a list of improvements they must make before the paper goes to print. This way, we ensure that inaccurate, incorrect, or incomprehensible work doesn't get published in high quality conferences and journals.

Why read primary sources?

Students often complain about making the leap from reading textbook-style prose to formal, academic research literature. Part of the problem is that the style of writing is different, and takes some getting used to. More deeply, though, students today have likely grown up with the web and with reading informal, secondary sources, so making the change is hard work, and nerve-wracking for some. Why waste hours wading through pages and pages of long-winded, complicated, weirdly-written prose, when you can read a quick, accessible summary on Wikipedia? Well, of course Wikipedia is a good place to start to get a basic overview of an area and help your understanding of the primary sources you are reading. However, it is absolutely essential to read the primary sources themselves. Why?

Reason 1: secondary sources editorialise

A secondary source will describe some parts of the primary source, but not others. The secondary source will take a particular point of view (i.e. the author will voice their own opinion) and will pick the parts of the primary source that are useful for that discussion. This doesn't necessarily mean that the secondary source is particularly biased (although it might be), it's more that secondary sources are selective in what they discuss. For example, if a paper on Web2.0 discusses the implementation, performance and usability of Web2.0 sites, a secondary source on the subject of usability is likely to leave out any mention of implementation and performance. So, by reading secondary sources you miss out on a lot of the detail of the original work and much of that detail may be very important to you and your work.

It is probably worth saying that there is an important exception to this: survey papers. A good survey paper should be like an extended literature review that discusses, in some detail, the literature available in a broad area of Computer Science. These survey papers are a good place to start when writing your own literature review. You can usually find survey papers in well established journals, or specialist survey journals such as ACM Computing Surveys.

Reason 2: secondary sources are sometimes wrong

Every academic field has a number of ideas which are passed on from one generation to the next with little reference back to the original research that generated those ideas. Be somewhat skeptical about this. Most of the time there are good reasons to feel assured that this knowledge is sound, especially in fields where mathematical proof is the main way of advancing the field. However, in more subjective or experimental fields (such as Software Engineering or Usability) results can sometimes be misunderstood or misinterpreted over the years.

An example of this is Winston Royce's "Waterfall Method" which (as you probably already know) is a method for organising and planning large programming projects. The central idea in Royce (1970) is very simple and easy to understand: you split the work into a number of different "phases" (requirements gathering, analysis, design, coding, testing, maintenance) and your team performs each phase in turn. There's even a nice image to go with the idea, just to make it easy to understand:


Image source: Wikipedia

For many people, this is where their understanding of the waterfall model stops. But in Royce's original paper there is a long discussion of the drawbacks of organising a project in this manner. In fact, Royce says that it is "risky and invites failure" (p. 329). Indeed, the majority of Royce's paper is a list of changes to the sequential model which make it more workable. Some of these are of particular interest: for example, "plan testing" is a step that Royce advocates should accompany program design. In modern, more "agile" development methods we would advocate writing unit tests at around this point, so Royce is presenting a surprisingly modern approach. The last modification Royce makes is to "involve the customer" at several points in the process. Again, a much more modern approach that many authors would associate with agile or "eXtreme" development methods.
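Royce's "plan testing alongside design" step maps fairly naturally onto today's test-first habits. As a loose illustration (not taken from Royce's paper, and with names invented for the example), here is the kind of tiny unit test a modern developer might write while a function is still being designed:

```java
// A minimal test-first sketch (illustrative only): the tests encode the
// design decisions for isLeapYear before the implementation is polished.
public class LeapYearTest {
    // The function under design: the Gregorian leap-year rule.
    static boolean isLeapYear(int year) {
        return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
    }

    static void check(boolean actual, boolean expected, String label) {
        if (actual != expected) throw new AssertionError("failed: " + label);
    }

    public static void main(String[] args) {
        check(isLeapYear(2000), true,  "divisible by 400");
        check(isLeapYear(1900), false, "divisible by 100 but not 400");
        check(isLeapYear(2012), true,  "divisible by 4");
        check(isLeapYear(2011), false, "not divisible by 4");
        System.out.println("all tests passed");
    }
}
```

In a real project these checks would live in a testing framework, but the principle is Royce's: decide how you will test the design while you are still designing it.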

The picture Royce paints is not a simple sequential model at all; it's much more complicated than that. Tarmo Toikkanen has written an interesting blog post on this subject. He speculates that the reason people advocate the basic waterfall method is that the diagram and analogy make it very easy to understand, so people don't delve any deeper into the details. In fact, Toikkanen points out that the US Department of Defense even has a military standard (DOD-STD-2167) based on Royce's work. [Aside: if you wanted to test Toikkanen's idea that it's the diagram in Royce's paper that leads to the misunderstanding, what experiment would you devise to test that idea?]

More parochially, we often see University students writing something like "in my project I will use the Waterfall Method" sometimes even with a citation. DON'T DO THIS! Read Royce (1970) in full, understand what he's really arguing for, then use a more modern method, or at least use Royce's iterative method found at the end of the paper.

Reason 3: different primary sources may disagree

Research is all about creating and discovering new ideas. Very often primary sources disagree on how best to do that, or they have competing ideas, and only through years of research and discussion does a consensus evolve. There are examples of this throughout the history of science. Whether it's the flat Earth debate, the Big Bang vs. the Steady State theory, or structured vs. object-oriented programming, consensus emerges through debate, reason, mathematical arguments, prototype systems, models, simulation and all sorts of other techniques: the history of science is full of arguments and competing ideas.

When you read a secondary source, very often whatever "debate" has taken place is already in the past, and the author of the secondary source will simply describe the consensus that has since been reached. For example, there were many good reasons for cosmologists to believe in the steady state theory before evidence for the Big Bang became overwhelming. Only by going back to this literature can we see how the debate unfolded and why the evidence that supported the Big Bang (the cosmic microwave background radiation, discovered in the 1960s) was so convincing.

In Computer Science there are also many of these debates. For example, most programming languages do not have a "goto" statement. (In fact, Java reserves the keyword "goto", but it is never used.) In the late 1960s and 70s there was a heated debate about whether "goto" was a safe and useful construct, and you can read through that debate in Dijkstra (1968), Knuth (1974) and plenty of other sources. Only by going back to these papers, which were written well before the debate was settled, can you fully understand the arguments that eventually banished the "goto" statement from most modern languages.
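To see what "banishing goto" looks like in practice, here is a small sketch (the class name and grid example are invented for illustration) of Java's structured replacement for the classic "jump out of nested loops" use of goto, the labelled break:

```java
// Java reserves "goto" but provides no goto statement. The structured
// replacement for jumping out of nested loops is a labelled break.
public class NoGoto {
    // Returns true if target appears anywhere in the grid.
    static boolean contains(int[][] grid, int target) {
        boolean found = false;
        search:                      // label on the outer loop
        for (int[] row : grid) {
            for (int value : row) {
                if (value == target) {
                    found = true;
                    break search;    // exits both loops at once
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        int[][] grid = {{1, 2}, {3, 4}};
        System.out.println(contains(grid, 3));  // true
        System.out.println(contains(grid, 9));  // false
    }
}
```

Unlike a general goto, a labelled break can only jump forwards, out of an enclosing loop, which preserves the structured control flow that Dijkstra argued for.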

Conclusions: reading blog posts is harder than reading papers

So, why did I say in the title of this post that reading blog posts is "harder" than reading papers? Actually reading blog posts may be easier, but in terms of getting a good grade in your project you are unlikely to produce a high quality literature review based on blog posts. Blogs tend to be selective and biased in their nature. This isn't a criticism of blogs, far from it: blogs are a great place for lively debates. They aren't such a great place, necessarily, to describe careful, peer-reviewed research in great detail -- that's best left for conferences and journals.


Edsger W. Dijkstra (1968). Letters to the editor: Go to statement considered harmful. Communications of the ACM 11, 3 (March 1968), 147-148. DOI=10.1145/362929.362947

Donald E. Knuth (1974). Structured Programming with goto Statements. ACM Computing Surveys 6, 4 (December 1974), 261-301. DOI=10.1145/356635.356640

Winston W. Royce (1970). Managing the development of large software systems: concepts and techniques. In Proceedings of IEEE WESCON, Los Angeles, 1-9.
