Systematic testing nano-tutorial



cbrandin
06-23-2010, 12:01 PM
I think a short tutorial on how to test electronic devices might be in order.

First of all, you can’t really test for success. It amounts to trying to prove a negative – that no errors exist. If you wanted to be confident that you won’t get errors in 1000 hours of shooting, you would have to shoot 1000 hours and hope for the best. This is not viable. Failures almost never completely go away; they just become more improbable. Hopefully so improbable that they don’t matter. This is not the problem you might suppose it is, because once a particular component’s reliability exceeds that of the rest of the system, you are essentially there.

There is a way to get around this empirical testing problem though. Failures usually follow some kind of Gaussian distribution (half of a bell curve). Let’s say you have a test that fails in 5 seconds. You back off on a parameter and the failure only happens every 10 seconds. Back off a little more and it only fails after 30 seconds. After you have gathered several data points you can start getting an idea of how these failures fit on a distribution curve. Then you can extrapolate that out to whatever reliability level you want.
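As a rough illustration of that kind of extrapolation, here is a minimal sketch in Python. The settings, failure times and the 1000-hour target are all invented numbers, and a straight-line fit on a log scale is just one plausible model; the real curve has to come from your own measurements.

import numpy as np

# Invented numbers purely for illustration: a bitrate-like setting (Mbps)
# versus the seconds it took to hit the first failure at that setting.
setting_mbps = np.array([70.0, 65.0, 60.0])
seconds_to_fail = np.array([5.0, 50.0, 500.0])

# Fit a straight line to log(time-to-failure) versus the setting; failures that
# become exponentially rarer as you back off look linear on this scale.
slope, intercept = np.polyfit(setting_mbps, np.log(seconds_to_fail), 1)

# Extrapolate: which setting should give roughly 1000 hours between failures?
target_seconds = 1000 * 3600
estimated_setting = (np.log(target_seconds) - intercept) / slope
print(f"Setting estimated to survive ~1000 h: {estimated_setting:.1f} Mbps")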

Now this is a bit of an oversimplification because you have to integrate the failure characterizations of all components (or parameters, as the case may be) and factor in interactions between them to develop an overall reliability statistic. If you’ve ever wondered how a manufacturer can claim a mean time between failure of years without testing for years, this is basically how it is done. That and stressing environmental factors.
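To make the "integrate the failure characterizations of all components" idea concrete, here is a small sketch under the common simplifying assumption of independent components with constant failure rates, any one of which takes the system down. The component names and MTBF figures are made up for illustration.

# Invented component names and MTBF figures, assuming independent components
# with constant failure rates where any single failure stops the system
# (a simple "series" reliability model).
component_mtbf_hours = {
    "sensor readout": 50_000,
    "card interface": 20_000,
    "encoder": 80_000,
}

failure_rate_per_hour = sum(1.0 / mtbf for mtbf in component_mtbf_hours.values())
system_mtbf_hours = 1.0 / failure_rate_per_hour
print(f"Combined system MTBF: {system_mtbf_hours:,.0f} hours")  # about 12,000 hours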

Chris

Svart
06-23-2010, 12:55 PM
It depends really on what you want to test. There are two main types of testing:

1. Testing designed functionality. This is called Product Verification, or PV for short. You've designed something and know how it *should* work. Now you devise testing that will verify that the design works for 100% of its intended use.

2. Testing randomly for failure. This is sort of what we are doing when we try random things and see what happens. Generally this will lead to finding clues and then following clues to build a hypothesis.

We don't really know what the design of the camera should be able to handle, therefore we can't test its true functionality. We can only try random testing until we build up a number of clues that might point to a problem or to a potentially useful scenario. The problem is that there are no set working parameters. Testers can pretty much try anything they want. This can create the problem that we saw with the Native 24p issue.

The Native 24p issue shows how humans gravitate towards testing the things they want, and towards testing to prove the results they want, rather than attempting to make the unit under test fail, as most Test Engineering does. People were testing settings they *hoped* would work in order to get the maximum bitrates out of their cameras. Instead of starting with the worst-case scenario of high bitrates and slow cards and then gradually moving up in card speed or down in bitrate, they simply bought the fastest cards out there, the SanDisk 30MB/s cards, and quickly concluded that the hacks were working because they worked for themselves and a few other people. And instead of changing more unknowns, like card speed or brand/type, to get a feel for how the range of changes would affect the outcome, the people who reported errors on cards that weren't the accepted de facto standard were chastised for it.

Once that was brought to light, it turned out there were two wildcards (pun intended): the SD card being used and the Native 24p patch. The 24p patch was easy to test, since it only has two states, enabled or disabled. The second wildcard was the choice of card itself, where you could simply matrix out a number of cards and settings and try them all. Once you've tried all possible combinations, you are faced with facts. Those facts stated that all cards tested (in my case) would pass at all pre-determined passable bitrates only if Native 24p was not enabled, and would fail at bitrates determined to be very passable if Native 24p was enabled.
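For what it's worth, the "matrix out a number of cards and settings" step can be written down mechanically so that no combination gets skipped. A minimal sketch; the card names and bitrates are placeholders, not the values anyone actually tested:

from itertools import product

# Placeholder card names and bitrates; the real matrix would use whatever
# cards and settings are actually on hand.
cards = ["SanDisk 30MB/s", "Transcend Class 6", "Generic Class 4"]
bitrates_mbps = [32, 44, 50, 65]
native_24p = [False, True]

# One cell per combination; fill each in with "pass" or "fail" after the test clip.
results = {combo: None for combo in product(cards, bitrates_mbps, native_24p)}

for (card, rate, p24), outcome in results.items():
    print(card, rate, "24p on" if p24 else "24p off", "->", outcome)

# Once every cell is filled, a pattern like "fails only when 24p is enabled"
# falls straight out of the table instead of out of anecdotes.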

Those results were still met with resistance, for unknown reasons. As the scientific method (and, to a non-scientific degree, Occam's Razor) has shown for thousands of years, once you rule out all other possibilities, the ones left are the answer, no matter how unlikely.

Scientific testing, something I do for a living, is all about trying to figure out IF/when something will have a problem, not really what kind of problems you will get. It's also not about trying to prove something to be true. Once you set out to prove something true, you invalidate your research, because you have just introduced emotional bias into the equation. This is why most engineers do NOT test their own work: they subconsciously know the flaws and work-arounds and generally fail to account for their own bias. This is part of the reason we are all here to help test Vitaliy's work. More people testing will always help in identifying problems, even if a majority of testers don't get beyond their own agendas.

cbrandin
06-23-2010, 01:05 PM
Good Points! I guess I was addressing the stability testing - which should be the final test. Random testing is indeed a very good way to find errors, especially ones that you haven't anticipated. I think I even said that in the other testing procedure thread. However, if you want to know how reliably something will work, the statistical approach is the only viable systematic way to do it, and no product testing regime is complete without it.

Chris

cbrandin
06-23-2010, 01:10 PM
Of course, that doesn't mean everybody has to do this. One group doing random tests with another being more systematic about it can work quite well. I have seen several complaints about the non-systematic testing going on. Well... non-systematic testing is valuable too - just not so much if there isn't any systematic testing to complement it.

Chris

Svart
06-23-2010, 01:40 PM
Well, as long as there is a goal, random testing can be very useful. Random testing of bitrates at the beginning was extremely useful because it quickly led to understanding the maximum theoretical throughput of the camera and to gauging the changes between maximum and minimum settings. Beyond that, it's not helpful unless you find a problem. Once you find a problem, you simply have to try every single permutation (or possible combination) of settings until a pattern emerges. This can be a problem because people tend to get tired of trying things that fail, tend to stick with things that pass, and then center their testing around the passing tests. This is a known tendency and should be avoided if something is to be tested properly.

The other part is proper documentation. I think everybody is guilty of missing at least one crucial bit of information when they take notes on testing. It's unfortunate, but it's completely expected; it's human nature. We simply can't catch all the details all the time, especially since we might not even know what we are looking for.
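One way to fight that is to make the notes refuse to be incomplete. A minimal sketch of a test log that insists on every field before it records a run; the field names are only a suggestion:

import csv
import datetime

# Suggested fields only; the point is that a run is not logged unless every
# field has been filled in.
FIELDS = ["timestamp", "firmware_patch", "native_24p", "bitrate_mbps",
          "card_brand", "card_rating", "scene", "result", "notes"]

def log_run(path, **run):
    """Append one test run to a CSV file, refusing incomplete records."""
    missing = [f for f in FIELDS if f != "timestamp" and f not in run]
    if missing:
        raise ValueError("fill these in before logging: " + ", ".join(missing))
    run["timestamp"] = datetime.datetime.now().isoformat(timespec="seconds")
    with open(path, "a", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        if fh.tell() == 0:          # new file: write the header once
            writer.writeheader()
        writer.writerow(run)

# Hypothetical usage:
# log_run("gh1_tests.csv", firmware_patch="test_v3", native_24p=True,
#         bitrate_mbps=44, card_brand="SanDisk", card_rating="30MB/s",
#         scene="detail pan, trees", result="fail", notes="write error at 12 s")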

I guess the point is really to keep an open mind. You never know what little detail might be the key to figuring something out.

wturber
06-23-2010, 02:07 PM
The problem with eliminating variables in this environment is twofold (at least). First, you have to be really sure that you have logically accounted for the variables you think you've accounted for. Second, when relying on reports from others, you must trust that they've actually done what they say they have done. When you have careful testers and not-so-careful testers going back and forth, and the not-so-careful testers think they are the careful ones, you can get some unusual resistance.

BTW, I don't do scientific testing, but I have a lot of experience with trouble isolation. I've also done a fair bit of beta testing of graphics software, and that can involve helping to sleuth out the situations that cause failures. My best solve was determining that having numbers in the computer names of render nodes was the cause of a problem with an advanced 3D rendering plug-in. Everybody thought that all the variables had been accounted for. Few people could duplicate the error. The developer was stumped big time.

Anyway, one of the big lessons over time has been that it is easy to think you've properly accounted for some possibility when you really haven't. The willingness to question and re-examine existing assumptions is valuable.

cbrandin
06-23-2010, 03:21 PM
Boy, you guys are making some fine points! Hopefully this will result in more efficient testing.

Chris

Ozpeter
06-23-2010, 06:44 PM
No mention here of qualitative testing - in other words, proving that changing any given parameter causes an improvement in picture quality. And if so, by how much (how to measure that??), and then weighing the cost (cards filling more quickly, possible mid-shot failures in the short or long term) against the benefit (a discernible, demonstrable PQ improvement).
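For the "how to measure that??" part, one option, not something suggested in the thread itself, is to compare frames from each encode against a common reference image of the same test chart and score them with something like PSNR. Whether PSNR captures the improvements people actually care about here is an open question, and the reference-image idea is an assumption of this sketch:

import numpy as np

def psnr(reference, frame):
    """Peak signal-to-noise ratio between two 8-bit frames of the same size."""
    ref = reference.astype(np.float64)
    img = frame.astype(np.float64)
    mse = np.mean((ref - img) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * np.log10(255.0 ** 2 / mse)

# Hypothetical usage: 'reference' is a frame of a static test chart from some
# higher-quality source, and frame_default / frame_hacked are matching frames
# decoded from the stock and patched encodes of the same chart. The encode with
# the higher psnr(reference, frame_x) retained more of the chart's detail.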

That's where I think the biggest testing problem lies. (And yes, I have said that before!)

Ozpeter
06-23-2010, 06:56 PM
And another thing that should be included in systematic testing - you need to check how footage produced with the settings under test (if you are thinking of using them for real) will behave in typical post-production scenarios. Obviously that will vary from NLE installation to NLE installation. For instance, with high-bitrate AVCHD, if you want to edit it directly on the software/hardware of your choice, does it actually handle it in all situations? Here, for instance, I've found one program that will edit it with transitions etc. and render that with no problem, but add colour correction effects and it falls over (crashes in render), whereas it doesn't with standard GH1 footage. No point in glorious footage that you can't actually use the way you want afterwards.