Hypothesis testing and why we’re all actually research scientists

Testers are scientists.

You might not wear a lab coat to work (but if you don’t, you should try it), and you might not think of yourself as a scientist, but what you do every day is craft a hypothesis and then design and execute experiments to try to falsify it. With nary a test tube in the office and no Bunsen burners in the server room, how can we be scientists?

Science isn’t defined by the equipment, but by the methodology. Karl Popper’s philosophy of science* defines science, in a horrible paraphrase, as the framing of a hypothesis and repeated attempts to disprove it – which is exactly what you do as a tester. You have a hypothesis, which I’ll designate as the hypothesis under test. The hypothesis under test for software testers is, generally, that the software behaves as expected.

So why does this matter? Well, first you need to define what the expected behaviour is: what the software should do when presented with different inputs and situations. Then you need to see if you can falsify the hypothesis under test by designing experiments that provide a series of inputs and situations, and observing the outcome. You tend to call these test cases. But are you really doing this? It’s all too easy to fall into the trap of merely checking that the software behaves properly when given correct input, but that’s not falsification. You definitely should check that this happens, but there has to be an element of falsification: the hypothesis also claims that the software behaves correctly when given abnormal input or an abnormal situation, and that is the claim you should be trying to break. Of course the label “situation” here covers a multitude of sins, from throttled CPU, to network timeouts, to limited RAM. Your experiment will probably be best served by being broken down into many smaller experiments, each isolating a single input or environmental variable.
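As a rough sketch of the difference between checking and falsifying, here is a tiny experiment runner in Python. Everything in it is made up for illustration: `parse_port` is a hypothetical, deliberately naive function under test, and the hypothesis is that it returns a valid port for valid input and raises `ValueError` (and nothing else) for anything abnormal.

```python
REJECT = object()  # marker meaning "we expect a clean ValueError"


def parse_port(text):
    """Hypothetical, deliberately naive function under test."""
    value = int(text.strip())  # int() raises ValueError on non-numeric text
    if not 1 <= value <= 65535:
        raise ValueError(f"port out of range: {value}")
    return value


# Each experiment is (input, expected outcome). The first two only confirm;
# the rest are attempts to falsify the hypothesis with abnormal input.
EXPERIMENTS = [
    ("80", 80),
    (" 443 ", 443),
    ("0", REJECT),
    ("65536", REJECT),
    ("-1", REJECT),
    ("", REJECT),
    ("eighty", REJECT),
    ("8.0", REJECT),
    (None, REJECT),
]


def run_experiments(func, experiments):
    """Return the (input, outcome) pairs that falsify the hypothesis."""
    falsified = []
    for given, expected in experiments:
        try:
            outcome = func(given)
        except ValueError:
            outcome = REJECT  # the rejection the hypothesis allows
        except Exception as exc:
            outcome = exc  # any other exception type is unexpected
        if not (outcome is expected or outcome == expected):
            falsified.append((given, outcome))
    return falsified


if __name__ == "__main__":
    for given, outcome in run_experiments(parse_port, EXPERIMENTS):
        print(f"falsified by {given!r}: got {outcome!r}")
```

Run with only the first two experiments, nothing is falsified and the software looks fine. Run with the full list, the `None` input falsifies the hypothesis, because this naive `parse_port` raises `AttributeError` rather than the `ValueError` we said it would.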

Your experiment is starting to look a lot like a test plan with test cases and areas under test. Now, you might be testing something like tens of thousands of connections to a database, or packet transfer, where the environment itself will introduce occasional failures, or you might simply have a tolerance for a certain level of failure. In that case, as long as X% or less of your experiments/tests fail, you’re willing to accept that the system as a whole works. This is a statistical method of hypothesis testing; you might have done something like it at school, with confidence intervals and two-tailed tests and the like. There’s a lot of maths around this area and if you’re so inclined (and I honestly think it’s worthwhile knowing) you can find lots about it on the internet**. However, it’s probably enough to get an overview of the basic maths from the links below, and you’ll see how you can account for things like systematic error.
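To make the "X% or less" idea concrete, here is a minimal sketch of one way to do it: an exact one-sided binomial test, using only the Python standard library. It assumes each test run is independent and fails with the same probability, which real environments may not honour. The function names, the 1% tolerance and the 5% significance level are illustrative choices of mine, not a recommendation.

```python
from math import exp, lgamma, log


def binomial_tail(n, k, p):
    """Return P(K >= k) for K ~ Binomial(n, p), with 0 < p < 1.

    Each term is computed in log space so large n does not overflow.
    """
    def log_pmf(i):
        return (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                + i * log(p) + (n - i) * log(1 - p))

    return sum(exp(log_pmf(i)) for i in range(k, n + 1))


def exceeds_tolerance(runs, failures, tolerated_rate, alpha=0.05):
    """One-sided test of the hypothesis 'failure rate <= tolerated_rate'.

    Returns (rejected, p_value). The p-value is the chance of seeing at
    least this many failures if the true rate were exactly the tolerated
    rate; a small value is evidence against the hypothesis.
    """
    p_value = binomial_tail(runs, failures, tolerated_rate)
    return p_value < alpha, p_value


if __name__ == "__main__":
    # Illustrative numbers: 1000 runs, 1% of failures tolerated.
    for failures in (12, 19):
        rejected, p_value = exceeds_tolerance(1000, failures, 0.01)
        verdict = "hypothesis rejected" if rejected else "not rejected"
        print(f"{failures} failures: p = {p_value:.4f}, {verdict}")
```

The test is one-sided because only too many failures count against the system; fewer failures than tolerated are not a problem. Note that "not rejected" only means the experiment failed to falsify the hypothesis, not that the hypothesis has been proven.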

I hope this has illustrated something of how I approach my testing. I was trained at university as a physicist, so I spent a lot of time designing ways to test hypotheses and even more time actually testing them. Then I spent even more time than that going through my results to extract meaning from the morass of data I’d generated.

I won’t be plunging this blog into a morass of mathematics and statistics, at least not in my regular testing blog posts. I do, though, have an idea to explain randomness and how it can be applied to algorithms to get efficient computations and good test coverage. This would require a whole series of posts to explain what randomness is, how it can be useful and why you shouldn’t fear things slipping through a procedurally generated test regime which employs stochastic elements in its tests. Whether anyone would want to read it is another matter; if you feel strongly either way, please let me know in the comments.

* https://en.wikipedia.org/wiki/Karl_Popper#Philosophy_of_science
** http://www.ats.ucla.edu/stat/mult_pkg/faq/general/tail_tests.htm
https://en.wikipedia.org/wiki/Confidence_interval

4 thoughts on “Hypothesis testing and why we’re all actually research scientists”

Paul Coyne 2016-06-27 at 08:16

Bingo! My degree is zoology and I use the SM as a tester. I absolutely agree with your points. This was the topic of my talk to EuroStar 2015. I think that the scientific method should be the “testers’ secret sauce” (horrid term, but you know what I mean) and it should be explicitly part of what is taught and practiced.

Dom Walden 2016-06-27 at 12:43

I’ve always wanted to hear from scientists/people educated in science who are in software testing. What techniques did you use as a scientist to test hypotheses, and can these be applied to software testing?

The Popperian view of science is, of course, not the only view. Do you think that there are other philosophies of science that are applicable to testing? For example, an inductive perspective.

Gem 2016-07-02 at 18:31

I honestly believe my biochem degree helps me test. I don’t do test cases, but I do mental test cases, and collect evidence for them, and it’s pretty close to the labwork I was doing as part of my degree (I talk about it here: http://letstalkabouttests.xyz/index.php/2016/03/24/ep-44-now-science-bit/).

I don’t miss the 7k dissertation at the end though ;)

Jesus Acevedo 2016-07-22 at 17:23

I gleaned your article and have to align myself with you. Embracing hypothesis testing is paramount and should just be plain natural. As a tester the first thing to do is partition your requirements into verification statements, then you need to create a decision table to understand how much test data you’re going to justify and have full coverage, then you need to understand the user actor and what data may cover the same conditions. You also need to understand and simulate all acceptable environments, which should be a part of your test basis as well (never forget that). This will all be paramount for your test specification and testing thereof. The reporting provided is a results reference to the execution you perform (I am being very general here to save bytes but I hope you get my point) and what you want to do is make sure 100% of the requirements are traced and tested and an acceptable % of the rules are passed, both functional and non-functional. If that’s not scientific then maybe I don’t know what I am talking about and am just completely ignorant. Don’t forget the time planned versus the real time executed. This whole thing can be turned into a formula, but I don’t want to go there … if anyone still refuses to believe testing is scientific they should not be in this business to begin with.

Crafty testing

Testing, crafting, and slyness. Follow me @Testing_crafty
