Testers are scientists.
You might not wear a lab coat to work (but if you don’t, you should try it), and you might not think of yourself as a scientist, but what you do every day is craft a hypothesis and then design and execute experiments to try to falsify it. It may seem that we have nary a test tube in most offices and no Bunsen burners in the server room, so how can we be scientists?
Science isn’t defined by the equipment, but by the methodology. Karl Popper’s philosophy of science* defines science, in a horrible paraphrase, as the framing of a hypothesis and repeated attempts to disprove it – which is exactly what you do as a tester. You have a hypothesis, which I’ll designate as the hypothesis under test. The hypothesis under test for software testers is, generally, that the software behaves as expected.
So why does this matter? Well, first you need to define the expected behaviour: what the software should do when presented with different inputs and situations. Then you need to see whether you can falsify the hypothesis under test by designing experiments that feed in a series of inputs and situations and observe the outcome. You probably call these test cases. But are you really doing this? It’s all too easy to fall into the trap of merely checking that the software behaves properly when given correct input, and that isn’t falsification. You should definitely check that it does, but there has to be an element of falsification too: your hypothesis should also claim that the software behaves correctly when given abnormal input or placed in an abnormal situation, and your experiments should try to prove that claim wrong. Of course the label “situation” here covers a multitude of sins, from a throttled CPU, to network timeouts, to running short of RAM. Your experiment will probably be best served by being broken down into many smaller experiments, each isolating a single input or environmental variable.
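Here is a minimal sketch of that idea in Python, using a hypothetical `parse_port` function as the system under test. The hypothesis under test is “`parse_port` returns the port number for any valid port string and raises `ValueError` for anything else”. Each experiment varies a single input and records the outcome the hypothesis predicts, and the harness reports every experiment whose outcome contradicts it. The function, the experiment list and the harness are all illustrative assumptions; environmental variables such as timeouts would need experiments of their own.

```python
def parse_port(text: str) -> int:
    """Hypothetical system under test: parse a TCP port number (1-65535)."""
    value = int(text.strip())  # int() raises ValueError on non-numeric text
    if not 1 <= value <= 65535:
        raise ValueError(f"port out of range: {value}")
    return value


# Each experiment varies one input and states the outcome the hypothesis
# predicts: either the returned port or the exception class expected.
EXPERIMENTS = [
    ("typical port", "8080", 8080),
    ("lower bound", "1", 1),
    ("upper bound", "65535", 65535),
    ("surrounding whitespace", " 443 ", 443),
    # Abnormal inputs: these are the attempts at falsification.
    ("zero", "0", ValueError),
    ("just past upper bound", "65536", ValueError),
    ("negative", "-1", ValueError),
    ("empty string", "", ValueError),
    ("not a number", "http", ValueError),
]


def run_experiments(func, experiments):
    """Return every experiment whose observed outcome contradicts the
    predicted one; an empty list means the hypothesis survived."""
    falsified = []
    for name, given, expected in experiments:
        try:
            outcome = func(given)
        except Exception as exc:  # record which exception was raised
            outcome = type(exc)
        if outcome != expected:
            falsified.append((name, given, expected, outcome))
    return falsified
```

`run_experiments(parse_port, EXPERIMENTS)` returns an empty list, so this round of experiments failed to falsify the hypothesis (which is not the same as proving it). A version that forgets the range check, such as `lambda t: int(t.strip())`, is caught only by the “zero”, “just past upper bound” and “negative” experiments, none of which a happy-path-only check would have included.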
Your experiment is starting to look a lot like a test plan, with test cases grouped into areas under test. Now, you might be testing something like tens of thousands of connections to a database or packet transfers, where the environment itself will introduce occasional failures, or you might simply have a tolerance for a certain level of failure. As long as X% or fewer of your experiments/tests fail, you’re willing to accept that the system as a whole works. This is a statistical method of hypothesis testing; you might have done something like it at school with confidence intervals, two-tailed tests and the like. There’s a lot of maths in this area, and if you’re so inclined (I honestly think it’s worth knowing) you can find plenty about it on the internet**. However, an overview of the basic maths from the links below is probably enough, and you’ll see how you can account for things like systematic error.
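To make the “X% or fewer” idea concrete, here is a minimal sketch of a one-tailed exact binomial test in Python. It treats the hypothesis under test as “the true failure rate is at most the tolerated rate” and asks how surprising the observed number of failures would be if that were true; if a count that high would turn up less than `alpha` of the time (5% here, an assumption), the hypothesis is falsified. It also assumes every run fails independently with the same probability, which real environments may violate; a systematic error is exactly that kind of violation. The function names and the 1% / 10,000-run figures are hypothetical.

```python
import math


def binomial_tail(failures: int, trials: int, rate: float) -> float:
    """Probability of `failures` or more failures in `trials` runs when each
    run fails independently with probability `rate`. Computed in log space
    with lgamma so large trial counts don't overflow a float."""
    if not 0 < rate < 1:
        raise ValueError("rate must be strictly between 0 and 1")
    if not 0 <= failures <= trials:
        raise ValueError("need 0 <= failures <= trials")
    log_p, log_q = math.log(rate), math.log1p(-rate)
    log_n_fact = math.lgamma(trials + 1)
    total = 0.0
    for k in range(failures, trials + 1):
        # log of C(trials, k) * rate**k * (1 - rate)**(trials - k)
        log_pmf = (log_n_fact - math.lgamma(k + 1) - math.lgamma(trials - k + 1)
                   + k * log_p + (trials - k) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)


def hypothesis_falsified(failures: int, trials: int, tolerated_rate: float,
                         alpha: float = 0.05) -> bool:
    """Reject 'failure rate <= tolerated_rate' when this many failures would
    occur less than `alpha` of the time at the tolerated rate."""
    return binomial_tail(failures, trials, tolerated_rate) < alpha
```

With a 1% tolerance over 10,000 runs you would expect about 100 failures, so `hypothesis_falsified(105, 10_000, 0.01)` returns `False` (105 is well within chance), while `hypothesis_falsified(130, 10_000, 0.01)` returns `True`. A one-tailed test fits here because only too many failures count against the hypothesis; unusually few are no cause for concern.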
I hope this has illustrated something of how I approach my testing. I trained as a physicist at university, so I spent a lot of time designing ways to test hypotheses and even more time actually testing them. Then I spent even more time than that going through my results to extract meaning from the morass of data I’d generated.
I won’t be plunging this blog into a sea of mathematics and statistics, at least not in my regular testing posts. That said, I have an idea for explaining randomness and how it can be applied in algorithms, both to make computations efficient and to achieve good test coverage. That would need a whole series of posts to explain what randomness is, how it can be useful and why you shouldn’t fear things slipping through a procedurally generated test regime that uses stochastic elements in its tests. Whether anyone would want to read it is another matter, so if you feel strongly either way, please let me know in the comments.
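As a small taste of that idea, and not a preview of the planned posts, here is a sketch of the standard arithmetic behind random testing plus a seeded input generator. If a bug is triggered by a fraction f of the input space and each of N random inputs is drawn independently and uniformly, the chance that every run misses it is (1 - f)^N, which shrinks quickly as N grows. Seeding the random number generator makes the “random” tests reproducible, so a failure can be replayed exactly. The function names, the input range and the example figures are all hypothetical.

```python
import math
import random
from typing import Callable, List, Optional


def miss_probability(bug_fraction: float, runs: int) -> float:
    """Chance that `runs` independent random inputs all miss a bug that is
    triggered by a fraction `bug_fraction` of the input space."""
    return (1.0 - bug_fraction) ** runs


def runs_needed(bug_fraction: float, max_miss: float) -> int:
    """Smallest N with (1 - bug_fraction) ** N <= max_miss."""
    return math.ceil(math.log(max_miss) / math.log1p(-bug_fraction))


def random_inputs(seed: int, count: int, low: int = -10,
                  high: int = 70_000) -> List[int]:
    """Hypothetical generator: the same seed always yields the same inputs,
    so a failing run can be replayed from its logged seed."""
    rng = random.Random(seed)
    return [rng.randint(low, high) for _ in range(count)]


def find_counterexample(func: Callable[[int], bool],
                        oracle: Callable[[int], bool],
                        seed: int, runs: int) -> Optional[int]:
    """Return the first seeded random input where `func` disagrees with
    `oracle`, or None if every run agrees (the hypothesis survives)."""
    for x in random_inputs(seed, runs):
        if func(x) != oracle(x):
            return x
    return None
```

For a bug that only 0.1% of inputs trigger, `runs_needed(0.001, 0.01)` returns 4603: about 4,600 independent random runs leave less than a 1% chance of missing it. The estimate only holds if the inputs really are drawn independently from the space in which the bug lives.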
* https://en.wikipedia.org/wiki/Karl_Popper#Philosophy_of_science
** http://www.ats.ucla.edu/stat/mult_pkg/faq/general/tail_tests.htm
https://en.wikipedia.org/wiki/Confidence_interval