Sunday, 30 October 2011

How to evaluate your project

Image by mpeterke on Flickr

This post is all about how to evaluate your project. Ideally you will do this about halfway through your work, but more realistically you will do it towards the end, just before you write up. However, it's important that you plan your evaluation early on, otherwise you run the risk of getting almost to the end of your work and finding that your evaluation process isn't going to give you any good results. So, the main thing you need to understand when you are planning is how evaluation works in the sciences and how to apply that understanding to your own projects.


Sometimes we see student dissertations which make claims like "this experiment has proved that ...". Almost always the student has used the word "proof" incorrectly, and will lose marks because they have given the impression that they have not understood the contribution that their work has made to the field they are working in. Very often students in this situation simply haven't understood how scientists use the word "proof".

Proof, in a scientific context, is a mathematical argument that is used to convince other mathematicians or scientists that a theorem (or a mathematical idea) is true. Proofs must never involve evidence or experiments, only arguments. There's an example proof of a simple theorem at the end of this post. 

Once mathematicians are convinced that a proof is correct (and sometimes that is difficult in itself, if the proof is several hundred pages long) then it is irrefutable. This is very different to the sort of science that is advanced by experiments, where another scientist can find new data or evidence that shows that an old idea was wrong.

So, we generally say that a theorem can be proved correct whereas a hypothesis (or guess!) can only be tested via experiments. A hypothesis might turn out to be wrong if experimental data cannot be found to support the hypothesis, or contradictory evidence is found. If a lot of evidence is found to support a hypothesis we might call it a theory. Even so, a theory cannot be proved correct in all cases. For example, if you came up with a theory that said all atoms have a particular shape, you might invent a special microscope to look at atoms and find out if they have your shape. This would provide some evidence to support your theory. You couldn't, however, test every single atom in the Universe, so your hypothesis might well become a theory, but it can never be "proved" correct.

[Aside: there's a long literature in this sort of philosophy of science. If you are really interested, read Karl Popper on Falsification, A. J. Ayer on Verification, Paul Feyerabend on scientific revolutions and Imre Lakatos on Proof.]

Scientific method

Scientific method is the way that scientists decide whether a particular hypothesis (or guess) is likely to be a good model for the way the world works. If most scientists accept that the hypothesis is likely to be true, then we call it a theory. Of course, even theories have limitations, and it may be that as more experiments are carried out we find that a different theory fits the evidence better, or that the theory only works in certain circumstances. This is exactly what happened in physics to Newton's laws of motion. It turns out that Newton's laws describe the world pretty well in most cases; they can certainly tell you when your train is likely to arrive at its destination. For other circumstances, for example when you are travelling very fast, close to the speed of light, or for very small particles like quarks, other theories (like Einstein's theories or quantum mechanics) better fit the data we have gathered. Of course, much of this work in areas like physics is driven by what we can measure and observe. Better telescopes mean better theories of cosmology, and so on.

In computer science we also have hypotheses that we can test. For example "functional programming languages can run just as efficiently as imperative languages", "online learning increases student engagement", "objects and inheritance improve code reuse in software companies", and so on.

To be a true hypothesis, and not just the opinion of the author, a statement must be refutable; that is, it must be possible for experiments to determine that the hypothesis is incorrect. The opposite statement to a hypothesis is called an alternate hypothesis (in statistics, a "no effect" statement of this kind is usually called the null hypothesis). Examples for the hypotheses listed above would be "functional languages are necessarily slower (or faster!) than imperative ones", "online learning has no effect on student engagement" and "objects and inheritance have no effect on code reuse in software companies".

So, to evaluate your own research questions, you need to do the following:

  1. Devise a hypothesis.
  2. Form your alternative hypothesis.
  3. Plan an experiment that tests whether the hypothesis or the alternative hypothesis is true. 
  4. Conduct your experiment.
  5. Analyse the results of your experiments.
  6. If the results are conclusive, STOP. Else, re-run the experiments, or devise a better experiment and repeat.

In a student project, you may not have time to repeat your experiments, especially if they involve people, but you should design your evaluation in such a way that this would be possible, were you to continue the work.

About experiments

A good experiment should test one variable and one variable only. So, if your hypothesis is "neural network algorithms run faster in C than in C++" then you will probably want to implement some neural network algorithms in both languages. You should make sure that the programs are as similar as possible, except for the language you are using. If you implement slightly different algorithms, it may be the algorithm and not the language which is causing any change in performance you observe. In this case, the programming language is the independent variable, the algorithms are the controlled variable, and the speed is the dependent variable, which is what you measure.
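To make the three kinds of variable concrete, here is a minimal Python sketch of a timing experiment. It is only an illustration and makes several assumptions: instead of comparing C with C++, the independent variable is the coding style (an explicit loop versus the built-in `sum`), and the function names, input size and repeat counts are all hypothetical. The algorithm and the input data are held fixed (controlled variables) and the running time is the dependent variable.

```python
import random
import timeit


def dot_loop(xs, ys):
    """Dot product written with an explicit index loop."""
    total = 0.0
    for i in range(len(xs)):
        total += xs[i] * ys[i]
    return total


def dot_builtin(xs, ys):
    """The same dot product, written with sum() and zip()."""
    return sum(x * y for x, y in zip(xs, ys))


def run_experiment(size=1000, repeats=5, calls=20, seed=42):
    """Time both variants on identical input and return the best time of each."""
    # Controlled variables: the same seeded input data for both variants.
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(size)]
    ys = [rng.random() for _ in range(size)]

    # Check that both variants really compute the same thing before timing them.
    assert abs(dot_loop(xs, ys) - dot_builtin(xs, ys)) < 1e-9

    results = {}
    # Independent variable: which implementation is being run.
    for name, func in (("loop", dot_loop), ("builtin", dot_builtin)):
        # Dependent variable: elapsed seconds for `calls` calls, repeated several times.
        timings = timeit.repeat(lambda: func(xs, ys), repeat=repeats, number=calls)
        results[name] = min(timings)
    return results


if __name__ == "__main__":
    for name, seconds in run_experiment().items():
        print(f"{name}: {seconds:.6f} s")
```

Taking the minimum over several repeats is one common way to reduce noise from other processes on the machine; it does not remove every uncontrolled factor, so the numbers are still only evidence, not proof.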

Interpreting your results: correlation does not imply causation

Correlation by xkcd

When you perform an experiment, you are hoping that the outcome will lend some evidence to either your hypothesis or your alternate hypothesis. Going back to the example above, the hypothesis "neural network algorithms run faster in C than in C++" has an alternate hypothesis "neural network algorithms run no faster in C than in C++". If we run an experiment to test this, and assume it's a fair experiment, and the results are that all our algorithms run faster in C, what has this told us? A naive answer would be that the experiments have confirmed the hypothesis that C is the faster language for this sort of algorithm. A more subtle answer would be that efficient neural networks are correlated with neural networks written in C. That means that when the algorithm is written in C it's likely to run quickly, which is what the experiment reported. This does not necessarily mean that the algorithms implemented in C ran quickly because they were written in C. It may be that there was some other factor involved that the experiment didn't effectively control.

In experimental work it is very important to understand this subtle distinction, otherwise you can easily fool yourself into believing that your experiments have discovered something far more conclusive than is actually possible. 

To give you a better idea of how this distinction between correlation and causation works, below are some examples of incorrect conclusions drawn from perfectly reasonable correlations. See if you can work out why the conclusions are unreasonable:

  • Children with bigger feet have higher reading ages. Therefore, people with bigger feet are more intelligent.
  • Teenagers who text late at night have poor motivation in class (see news reports here). Therefore, using mobile phones leads to poor performance in class (see also a more skeptical analysis here).
  • In the last 150 years there has been a dramatic increase in the number of people who report being abducted by aliens. There has also been a trend towards global warming. Therefore, alien abductions cause global warming.
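If you want to see how a hidden third factor can produce a correlation like the first one, here is a small Python simulation. It is a sketch on made-up data, not real measurements: the numbers, the function names and the assumption that age drives both foot size and reading age are all invented for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_children(n=2000, seed=1):
    """Invented data: age influences foot size and reading age independently."""
    rng = random.Random(seed)
    children = []
    for _ in range(n):
        age = rng.randint(5, 11)                         # the hidden factor
        foot_cm = 14.0 + 1.0 * age + rng.gauss(0, 1.0)   # depends on age only
        reading_age = age + rng.gauss(0, 1.0)            # depends on age only
        children.append((age, foot_cm, reading_age))
    return children


children = simulate_children()

# Across all children, foot size and reading age move together.
overall = pearson([c[1] for c in children], [c[2] for c in children])

# Holding age fixed (only eight-year-olds), the link should largely disappear.
eight = [c for c in children if c[0] == 8]
within = pearson([c[1] for c in eight], [c[2] for c in eight])

print(f"all ages:     r = {overall:.2f}")
print(f"age 8 only:   r = {within:.2f}")
```

In this made-up world foot size has no effect on reading at all, yet the overall correlation should come out strongly positive, while the correlation among children of the same age should be close to zero.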

In your own work, just be honest and straightforward about your results. If they aren't conclusive then say so and demonstrate your understanding by describing what future work could be done to gather more data.

Some basic dos and don'ts

This is some more specific advice, based on good and bad practice we have seen from students over the years:

  • DO be clear and honest about what results your evaluation has obtained.
  • DON'T claim to have "proven" anything if you haven't written a formal, mathematical proof.
  • DO use an appropriate experiment for your hypothesis. For example, if your work is about evaluating the performance or security of a technique, there is no need to involve real users in your evaluation. If your hypothesis is about usability you really must involve real users.
  • DON'T use questionnaires unless you can guarantee to get a large sample size of answers (always well above thirty) and you understand the statistics needed to analyse the results. If you are in any doubt at all about this then seek the advice of a qualified statistician before you start your project. If you can't do that, think about using an alternative evaluation method such as semi-structured interviews.


Example proof: The square root of 2 cannot be written as a fraction of whole numbers


The square root of 2 cannot be written as a fraction of two whole numbers. (This is sometimes called the Theorem of Theaetetus)

Proof (by contradiction)

Imagine we could write the square root of 2 as a fraction of two whole numbers, say x/y where x and y are integers.

Let's say that x and y don't have any factors in common, so x/y is already written in its simplest form and no numbers can be "cancelled out" of the fraction.

So, we can also say that (x/y)*(x/y)=2

Therefore (x*x)/(y*y)=2

Therefore (x*x)=2*(y*y) 

So we now know that x*x is even, since x*x is 2 times another number.

Since x*x is even, we also know that x is even (by the "Lemma" or little theorem that squares of odd numbers are never even).

Therefore, there must be a number, which we'll call z such that x=2*z

So, (2*z)*(2*z)=2*(y*y)

Or, more simply, 2*z*z=y*y

So y*y is even, and y must also be even, by the same argument that we used to say that x is even.

If y is also even, there must be some number, which we'll call w such that y=2*w

But if x/y=(2*z)/(2*w) then x and y have the factor 2 in common, so the fraction x/y was not in its simplest form as we assumed above.

This contradicts our initial assumptions, which must have been wrong.

So, the square root of 2 cannot be written as a fraction of whole numbers.
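For readers who prefer symbols, the same argument can be written compactly in LaTeX. This is just a restatement of the steps above using the same names x, y, z and w; the notation gcd(x, y) = 1 is my shorthand for "x and y have no factors in common".

```latex
\begin{align*}
\sqrt{2} = \frac{x}{y},\ \gcd(x, y) = 1
  &\implies x^2 = 2y^2 \\
  &\implies x \text{ is even, so } x = 2z \\
  &\implies 4z^2 = 2y^2 \implies y^2 = 2z^2 \\
  &\implies y \text{ is even, so } y = 2w \\
  &\implies \gcd(x, y) \ge 2, \text{ contradicting } \gcd(x, y) = 1.
\end{align*}
```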

Posted via email from Pass your university project

Wednesday, 26 October 2011

How to read research papers

Photo by ailatan on Flickr

These last couple of weeks I've been taking my groups of final year project students through the process of starting their literature reviews. There is a separate post on literature reviews on this site here and a post on why you should read academic literature in Computer Science here. This post isn't to do with those topics; it is about how to read research papers. We often find that if students haven't done much of this sort of reading before they get to their final year, getting started can be a bit of a shock. So, this post is designed to help you get started with academic literature and, just as importantly, to help you get the most out of the papers you read in the short space of time you have available (and it is a short space of time, believe me).

Remember the structure of a paper is just like the structure of your thesis

Other posts on this site have discussed the overall structure of your thesis, but in outline this is the sort of structure you should be expecting to produce:

  • Introduction should introduce the reader to your research question and the broad context of the research.
  • Literature review should describe the work that other people have carried out to answer your (or similar) research questions.
  • Method should describe what you did to answer your research question (or to support your thesis, if you think of it that way), and how you went about it. 
  • Results should evaluate what you have done, and say what answer (to your research question) you have arrived at. 
  • Conclusions should summarise what you have done and how you answered the research question.  

Academic writing (in the sciences) of all sorts follows something like this structure, including all of the papers that you will be reading for your project. There are a couple of exceptions to this rule. One is theoretical papers, which sometimes put their "related work" (or literature review) somewhere towards the end of the paper rather than after the introduction. The second exception is survey papers. Surveys are extended literature reviews and, as such, are a good place to start your own literature review. ACM Computing Surveys is a journal that specialises in survey papers, and you can sometimes find them in other reputable journals.

Briefly review each paper for relevance

You don't have time to read everything, so it's important to make sure that what you do read is really relevant to your thesis. To check whether a paper is likely to be relevant to you, first read the Abstract. This should give you a brief summary of the whole paper; at the very least, it should give you a good idea of what research question the authors were trying to answer. Next, read the Conclusions. This is also likely to be a summary and may well give you a better idea of what results the authors obtained and what work they did not finish but left for "future work". If that doesn't give you a good enough idea of the relevance of the paper to your own work, try reading the last part of the Introduction. This is usually where the authors summarise what is written in each of the following sections of the paper, so that should give you a much more detailed view of what the rest of the paper contains.

If, after all of that, you think the paper is irrelevant to you, then discard it and move on to something else. Otherwise, you are ready to move on with your reading...

Focus your reading on specific questions

If you just go ahead and read a paper from start to finish the chances are that you won't get very much out of your efforts. You are likely to ramble around the paper, not taking very detailed notes and at the end of your efforts you may not have learned much. A much better way to go about your reading is to keep in mind a number of clear, focussed questions and read the paper with the intention of writing down answers to these questions in your notes. That way you will finish with a clear set of notes that you can be confident will be useful to you when you start writing up.

I would recommend you use this set of questions to guide your reading:

  1. What research question were the authors asking?
  2. Why did the authors believe that their research question was important?
  3. How did the authors go about answering their research question?
  4. What results did the authors obtain, or what did the authors learn from answering their research question?

You can find a template for some notes here.

Making use of your notes

When you have finished reading you should have a stack of notes on all the papers you have read. This should be a much more concise way to start writing up than having a much bigger stack of papers and (most likely) not much memory of what was in them! So, the next thing to do is decide on the structure of your literature review chapter.

The first paragraph of your chapter should introduce the rest of the chapter. This is a good place to remind the reader of your research question and explain how the current chapter relates to it. 

The last paragraph of your chapter should summarise what you have reviewed. This is a good chance to help the reader navigate around your thesis. Briefly review what you have said in the chapter and refer the reader to the next chapter, explaining how the next chapter follows on from the current one.

The middle part of the chapter is more difficult and, since your writing will depend on your particular research question and the literature you have read, there isn't much generic advice to be given here. However, you can start by reading through your notes and looking for common themes. Think about how best to present the ideas to a reader who has not read the same literature. Do you want to take the reader chronologically through the literature, from the earliest point to the present day? Would it be easier to understand if you split the reading into particular topics that are related? When you have what you think is a good structure, write some section headings into your thesis and think about which papers go in which sections (of course, some papers may well go into several sections). Write the citations into each section using something like EndNote, Mendeley or BibTeX to format them for you. Play around with the structure until you are convinced that it will make sense then write in the details of each section. Make sure you check out this post to help you with your writing.


Sunday, 23 October 2011

Why read from primary sources? Or: why reading blog posts is harder, not easier than reading papers

I've been meaning to write this post for a long, long time. Now that I have an enormous pile of marking to get through in double-quick time, I have the perfect excuse for a bit of structured procrastination.

What is a primary source?

A primary source is an original piece of writing, describing some research and written by the person or team who performed that research. A secondary source is a description or discussion of a piece of research by someone who has read about the research, but did not carry it out themselves. So, if an academic performs an experiment and writes it up as a journal paper, that paper is a primary source. If another researcher then quotes the paper and cites it in one of their papers, then that is a secondary source. Newspaper articles, magazine articles, Wikipedia, and most websites and blog pages are secondary sources. When it comes to scientific research, only writing published in peer-reviewed conferences, journals, books and magazines constitutes a primary source.

What is peer-review and why does it matter?

Even if a paper is a primary source describing some research, that doesn't guarantee that the research is rigorous, reliable and high-quality. To ensure that all academic writing meets basic standards of quality assurance, scientists use peer-review. This means that a number of professional scientists (usually two or more) will read through the work carefully, and critique it before it is published. If the work is of very poor quality, or very badly written, it will be rejected and the authors will have to re-write their papers and try to publish them elsewhere. If the work is of a high enough standard to publish, the authors will be given a list of improvements they must make before the paper goes to print. This way, we ensure that inaccurate, incorrect, or incomprehensible work doesn't get published in high quality conferences and journals.

Why read primary sources?

Students often complain about making the leap from reading textbook-style prose to formal, academic research literature. Part of the problem is that the style of writing is different, and takes some getting used to. More deeply, though, students today have likely grown up with the web and with reading informal, secondary sources, so making the change is hard work, and nerve-racking for some. Why waste hours wading through pages and pages of long-winded, complicated, weirdly-written prose, when you can read a quick, accessible summary on Wikipedia? Well, of course Wikipedia is a good place to start to get a basic overview of an area and help your understanding of the primary sources you are reading. However, it is absolutely essential to read the primary sources themselves. Why?

Reason 1: secondary sources editorialise

A secondary source will describe some parts of the primary source, but not others. The secondary source will take a particular point of view (i.e. the author will voice their own opinion) and will pick the parts of the primary source that are useful for that discussion. This doesn't necessarily mean that the secondary source is particularly biased (although it might be); it's more that secondary sources are selective in what they discuss. For example, if a paper on Web 2.0 discusses the implementation, performance and usability of Web 2.0 sites, a secondary source on the subject of usability is likely to leave out any mention of implementation and performance. So, by reading secondary sources you miss out on a lot of the detail of the original work, and much of that detail may be very important to you and your work.

It is probably worth saying that there is an important exception to this: survey papers. A good survey paper should be like an extended literature review that discusses, in some detail, the literature available in a broad area of Computer Science. These survey papers are a good place to start when writing your own literature review. You can usually find survey papers in well established journals, or specialist survey journals such as ACM Computing Surveys.

Reason 2: secondary sources are sometimes wrong

Every academic field has a number of ideas which are passed on from one generation to the next with little reference back to the original research that generated those ideas. Be somewhat skeptical about this. Most of the time there are good reasons to feel assured that this knowledge is sound, especially in fields where mathematical proof is the main way of advancing the field. However, in more subjective or experimental fields (such as Software Engineering or Usability) results can sometimes be misunderstood or misinterpreted over the years.

An example of this is Winston Royce's "Waterfall Method" which (as you probably already know) is a method for organising and planning large programming projects. The central idea in Royce (1970) is very simple and easy to understand: you split the work into a number of different "phases" (requirements gathering, analysis, design, coding, testing, maintenance) and your team performs each phase in turn. There's even a nice image to go with the idea, just to make it nice and easy to understand:


Image source: Wikipedia

For many people, this is where their understanding of the waterfall model stops. But in Royce's original paper there is a long discussion of the drawbacks of organising a project in this manner. In fact, Royce says that it is "risky and invites failure" (p. 329). Moving on, the majority of Royce's paper is a list of changes to the sequential model which make it more workable. Some of these are of particular interest, for example "plan testing" is a step that Royce advocates should go with program design. In modern, more "agile" development methods we would advocate writing unit tests around this time, so Royce is presenting a very modern approach. The last modification Royce makes is to "involve the customer" at several points in the process. Again, this is a much more modern approach that many authors would say goes with agile or "eXtreme" development methods.

The picture Royce paints is not a simple sequential model at all; it's much more complicated than that. Tarmo Toikkanen has written an interesting blog post on this subject. He speculates that the reason people advocate for the basic waterfall method is that the diagram and analogy make it very easy to understand, so people don't delve any deeper into the details. In fact, Toikkanen points out that the US Department of Defense even had a military standard (DOD-STD-2167) based on Royce's work. [Aside: If you wanted to test Toikkanen's idea that it's the diagram in Royce's paper that leads to the misunderstanding, what experiment would you devise to test that idea?]

More parochially, we often see university students writing something like "in my project I will use the Waterfall Method", sometimes even with a citation. DON'T DO THIS! Read Royce (1970) in full, understand what he's really arguing for, then use a more modern method, or at least use Royce's iterative method found at the end of the paper.

Reason 3: different primary sources may disagree

Research is all about creating and discovering new ideas. Very often primary sources disagree on how best to do that, or they have competing ideas and only through years of research and discussion does a consensus evolve. There are examples of this throughout the history of science: the flat Earth debate, the Big Bang vs. the Steady State theory, structured programming vs. object oriented programming. Consensus emerges through debate, reason, mathematical arguments, prototype systems, models, simulation and all sorts of other techniques, and the history of science is full of arguments and competing ideas.

When you read a secondary source, very often whatever "debate" has taken place is already in the past and the author of the secondary source will simply describe the consensus that has since been reached. For example, there were many good reasons for cosmologists to believe in the steady state theory before evidence for the Big Bang became overwhelming. Only by going back to this literature can we see how the debate unfolded and why the evidence that supported the Big Bang (to do with background microwave radiation in the cosmos which was discovered in the 1960s) was so convincing.

In Computer Science there are also many of these debates. For example, most programming languages do not have a "goto" statement. In fact, Java has a keyword called "goto", but it is not used. In the late 1960s and 70s there was a heated debate about whether "goto" was a safe and useful construct and you can read through that debate in Dijkstra (1968), Knuth (1974) and plenty of other sources. Without going back to these papers, which were written well before the debate was settled, can you fully understand the arguments that, eventually, banished the "goto" statement from most modern languages?

Conclusions: reading blog posts is harder than reading papers

So, why did I say in the title of this post that reading blog posts is "harder" than reading papers? Actually reading blog posts may be easier, but in terms of getting a good grade in your project you are unlikely to produce a high quality literature review based on blog posts. Blogs will tend to be selective and biased in their nature. This isn't a criticism of blogs, far from it: blogs are a great place for lively debates. They aren't such a great place, necessarily, to describe careful, peer-reviewed research in great detail -- that's best left for conferences and journals.


Edsger W. Dijkstra (1968). Letters to the editor: Go to statement considered harmful. Communications of the ACM 11, 3 (March 1968), 147-148. DOI=10.1145/362929.362947

Donald E. Knuth (1974). Structured Programming with go to Statements. ACM Computing Surveys 6, 4 (December 1974), 261-301. DOI=10.1145/356635.356640

Winston W. Royce (1970). Managing the development of large software systems: concepts and techniques. In Proceedings of IEEE WESTCON, Los Angeles, 1-9.
