Automated test harnesses for Firefox, zooming out a bit
In December I was going through some of the steps to be able to change, fix, and interpret the results of a small subset of the gazillion automated tests that Mozilla runs on Firefox builds. My last post arrived at the point of being able to run mochitests locally on my laptop on the latest Firefox code. Over the holidays I dug further into the underlying situation and broadened my perspective. I wanted to make sure that I didn’t end up grubbing away at something that no one cared about, or missing the big picture.
My question was, how can I, or QA in general, understand where Firefox is with e10s (Electrolysis, the code name for the multiprocess Firefox project)? Can I answer the question, are we ready to have e10s turned on by default in Firefox: for the Developer Edition (formerly known as Aurora), for Beta, and for a new release? What criteria are we judging by? Complicated. And given those things, how can we help move the project along and ensure good quality: a Firefox that works as well as or, we hope, better than, Firefox with e10s not enabled? My coworker Juan and I boiled it down to two things, basically: stability (lowering the crash rate) and automated test coverage. To answer my bigger questions about how to improve automated test coverage I had to kind of zoom out, and look from another angle. For Firefox developers a lot of what I am about to describe is basic knowledge. It seems worth explaining, since it took me significant time to figure out.
First of all let’s look at treeherder. Treeherder is kind of the new TBPL, which is the old new Tinderbox. It lets us monitor the current state of the code repositories and the tests that run against them. It is a window into Mozilla’s continuous integration setup. A battery of tests is poised to run against Firefox builds on many different platforms. Have a look!
Current view of mozilla-central on treeherder
Edward Tufte would have a cow. Luckily, this is not for Tufte to enjoy. And I love it. You can just keep digging around in there, and it will keep telling you things. What a weird, complicated gold mine.
Digression! When I first started working for Mozilla I went to the Automation and Tools team work week where they all came up with Treeherder. We wanted to name it something about Ents, because it is about the Tree(s) of the code repos. I explained the whole thing to my son, I think from the work week, which awesomely was in London. He was 11 or 12 at the time and he suggested the name “Yggdrazilla” keeping the -zilla theme and in reference to Yggdrasil, the World-Tree from Norse mythology. My son is pretty awesome. We had to reject that name because we can’t have more -zilla names and also no one would be able to spell Yggdrasil. Alas! So, anyway, treeherder.
The left side of the screen describes the latest batch of commits that were merged into mozilla-central. (The “tree” of code that is used to build Nightly.) On the right, there are a lot of operating systems/platforms listed. Linux opt (optimized version of Firefox for release) is at the top, along with a string of letters and numbers which we hope are green. Those letters and numbers represent batches of tests. You can hover over them to see a description. The tests marked M (1 2 3 etc) are mochitests, bc1 is mochitest-browser-chrome, dt are developer tools tests, and so on. For linux-opt you can see that there are some batches of tests with e10s in the name. We need the mochitest-plain tests to run on Firefox whether e10s is enabled or not. So the tests are duplicated, possibly changed to work under e10s, and renamed. We have M(1 2 3 . . .) tests, and also M-e10s(1 2 3 . . . ). Tests that are green are all passing. Orange means they aren’t passing. I am not quite sure what red (busted) means (bustage in the tree! red alert!) but let’s just worry about orange. (If you want to read more about the war on orange and what all this means, read Let’s have more green trees from Vaibhav’s blog. )
I kept asking, in order to figure out what needed doing that I could usefully do within the scope of As Soon As Possible, “So, what controls what tests are in which buckets? How do I know how many there are and what they are? Where are they in the codebase? How can I turn them on and off in a way that doesn’t break everything, or breaks it productively?” Good questions. Therefore the answers are long.
There are many branches of the code other than mozilla-central. Holly is a branch where the builds for all the platforms have e10s enabled. (Many of these repos or branches or twigs or whatever, are named after different kinds of tree.) The tests are the standard set of tests, not particularly tweaked to allow for e10s. We can see what is succeeding and failing on treeherder’s view of holly. A lot of tests are orange on holly! Have a look at holly by clicking through on the link above. Here is a picture of the current state of holly.
If you click a batch of tests where there are some failures — an orange one — then a new panel will open up in treeherder! I will pick a juicy looking one. Right now, for MacOS 10.6 opt, M(2) is orange. Clicking it gives me a ton of info. Scrolling down a bit in the bottom left panel tells me this:
The first number is how many tests ran. Scary. Really? 164546 tests ran? Kind of. This is counting assertions, in other words, “is” statements from SimpleTest. The first number lists how many assertions passed. The second number is for assertion failures and the third is for “todo” statements.
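To make the assertion-counting point concrete, here is a toy model in Python. It is not SimpleTest and does not use any Mozilla code; the `Tally` class and its method names are made up for illustration, and the rule that a todo check which unexpectedly passes counts as a failure is my assumption about how such harnesses usually behave. It only shows why one test file can add many entries to the summary.

```python
from dataclasses import dataclass


@dataclass
class Tally:
    """Running totals in the style of a test summary (toy model only)."""
    passed: int = 0
    failed: int = 0
    todo: int = 0

    def is_(self, actual, expected, message=""):
        """Count one assertion, loosely modeled on an "is" statement."""
        if actual == expected:
            self.passed += 1
        else:
            self.failed += 1

    def todo_is(self, actual, expected, message=""):
        """Count one known-failing check, loosely modeled on a "todo" statement.

        Assumption: if the check unexpectedly passes, treat it as a failure
        so that someone notices and updates the test.
        """
        if actual == expected:
            self.failed += 1
        else:
            self.todo += 1


def run_one_test_file(tally):
    """Stand-in for a single test file, which makes several assertions."""
    tally.is_(1 + 1, 2, "addition works")
    tally.is_("e10s".upper(), "E10S", "upper-casing works")
    tally.is_(len([1, 2, 3]), 3, "length is right")
    tally.todo_is(0.1 + 0.2, 0.3, "known floating point mismatch")


tally = Tally()
run_one_test_file(tally)
print(tally)
```

One "test file" here produces three passes and one todo, so a batch with a few thousand files can easily report a six-digit number of passing assertions.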
The batches of tests running on holly are all running against an e10s build of Firefox. Anything that’s consistently green on holly, we can move over to mozilla-central by making some changes in mozharness. I asked a few people how to do this, and Jim Matthies helpfully pointed me at a past example in Bug 1061014. I figured I could make a stab at adding some of the newly passing tests in Bug 1122901.
As I looked at how to do this, I realized I needed commit access level 2 so I filed a bug to ask for that. And, I also tried merging mozilla-central to holly. That was ridiculously exciting though I hadn’t fixed anything yet. I was just bringing the branch up to date. On my first try doing this, I immediately got a ping on IRC from one of the sheriffs (who do merges and watch the state of the “tree” or code repository) asking me why I had done something irritating and wrong. It took us a bit to figure out what had happened. When I set up mercurial, the setup process and docs told me to install a bunch of Mozilla specific hg extensions. So, I had an extension set up to post to bugzilla every time I updated something. Since I was merging several weeks of one branch to another this touched hundreds of bugs, sending bugmail to untold numbers of people. Mercifully, Bugzilla cut this off after some limit was reached. It was so embarrassing I could feel myself turning beet red as I thought of how many people just saw my mistake and wondered what the heck I was doing. And yet I just had to forge onwards, fix my config file, and try it again. Super nicely, Clint Talbert told me that the first time he tried pushing some change he broke all branches of every product and had no idea what had happened. Little did he know I would blog about his sad story to make myself feel less silly….. That was years ago and I think Mozilla was still using cvs at that point! I merged mozilla-central to holly again, did not break anything this time, and watched the tests run and gradually appear across the screen. Very cool.
I also ended up realizing that the changes to mozharness to turn these batches of tests on again were not super obvious and to figure it out I needed to read a 2000-line configuration file which has somewhat byzantine logic. I’m not judging it, it is clearly something that has grown organically over time and someone else probably in release engineering is an expert on it and can tweak it casually to do whatever is needed.
Back to our story. For a batch of tests on holly that have failures and thus are showing up on treeherder as orange, it should be possible to go through the logs for the failing tests, figure out how to turn them off with some skip-if statements, filing bugs for each skipped failing test. Then, keep doing that till a batch of tests is green and it is ready to be moved over.
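Here is a rough sketch of that triage loop in Python, to show the shape of the work rather than any real tool. It assumes that failures appear in the log as lines containing `TEST-UNEXPECTED-FAIL | <test path> | <message>` and that a manifest stanza can carry a line like `skip-if = e10s` with a trailing bug comment. The function names, the sample paths, and the bug placeholder are all hypothetical, so check the real log and manifest format before relying on any of it.

```python
from collections import Counter

FAIL_MARKER = "TEST-UNEXPECTED-FAIL"


def failing_tests(log_lines):
    """Return a Counter mapping test path -> number of unexpected failures."""
    counts = Counter()
    for line in log_lines:
        if FAIL_MARKER not in line:
            continue
        # Assumed shape: "<prefix> TEST-UNEXPECTED-FAIL | <test path> | <message>"
        fields = [field.strip() for field in line.split("|")]
        if len(fields) >= 2 and fields[1]:
            counts[fields[1]] += 1
    return counts


def skip_if_stanza(test_path, bug="BUG_NUMBER_HERE"):
    """Format a manifest stanza that skips one test under e10s (assumed syntax)."""
    name = test_path.rsplit("/", 1)[-1]
    return f"[{name}]\nskip-if = e10s # Bug {bug}"


# Hypothetical log excerpt; the paths and messages are invented.
sample_log = [
    "101 INFO TEST-PASS | example/tests/test_ok.html | all good",
    "102 INFO TEST-UNEXPECTED-FAIL | example/tests/test_broken.html | got 1, expected 2",
    "103 INFO TEST-UNEXPECTED-FAIL | example/tests/test_broken.html | timed out",
    "104 INFO TEST-UNEXPECTED-FAIL | example/tests/test_other.html | element missing",
]

for path, count in failing_tests(sample_log).most_common():
    print(f"# {count} unexpected failure(s) in {path}")
    print(skip_if_stanza(path))
```

The bug number is left as a placeholder on purpose: each skipped test should get its own bug filed first, so the skip is tracked rather than forgotten.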
That seems like a reasonable plan for improving the automated test landscape, which should help developers to know that their code works in Firefox whether e10s is enabled or not. In effect, having the tests should mean that many problems are prevented from ever becoming bugs. The effect of this test coverage is hard to measure. How do you prove something didn’t happen? Perhaps by looking at which e10s tests fail on pushes to the try server. Another issue here is that there are quite a lot of tests that have been around for many years. It is hard to know how many of them are useful or whether there are a lot of redundant tests; in short, whether there is a lot of cruft (and there probably is). With a bit more experience in the code and in fixing and writing tests, it would be easier to judge the usefulness of these tests.
As usual when I dive into anything technical at Mozilla, I think it’s pretty cool that most of this work happens in the open. It is a great body of data for academics to study, it’s an example of how this work actually happens for anyone interested in the field, and it’s something that anyone can contribute to if they have the time and interest to put in some effort.
This post seems very plain with only some screenshots of Treeherder for illustration. Here, have a photo of me making friends with a chicken.