This will mostly be helpful to Firefox Desktop folks, so if you’re not one of those, please instead enjoy a different blogpost. I recommend this one about the three roles of data engagements.
So you’ve found yourself a plot that looks like this:

You suspect this has something to do with a code change because, wouldn’t you know it, the sharp decline starts around Feb 22 and we released Firefox 123 on Feb 20. But where do you go from here? Here’s a step-by-step of how I went from this plot arriving in Slack#data-help to finding the bugfix that most likely caused the change:
1. Ensure this is actually a version-specific change
It’s interesting that the cliff in the plot happened near a release day, and it’s an excellent intuition to consider code releases for these sorts of sea-changes in data volume or character. But we should verify that this is the case by grouping by mozfun.norm.truncate_version(app_version, 'major') AS major_version which in our case gives us:

Sure enough, in this case the volume cliff happens entirely within the Firefox 123+ colours. If this isn’t what you get, then it’s somewhat less likely that this is caused by a client code change and this guide might not help you. But for us this is near-certain confirmation that the change in the data is caused by a code change that landed in Firefox 123… but which one?
( This is where I spent a little time checking some frequent “gotcha” changes that could’ve happened. I checked: was it because data went from all-channel to pre-release-only? (No, the probe definitions didn’t change and the fall isn’t severe enough for that (would look more like an order of magnitude)) Was it because specific instrumentation within the group happened to expire in Fx123? (No, the first plot is grouped by specific probe, and all of the groups shared the same shape as their sum) Was it an incredibly-successful engagement-boosting experiment that ended? (No, there haven’t been any relevant experiments since last July) )
2. Figure out which Nightly builds are affected
Firefox Desktop releases new software versions twice a day on the Nightly channel. We can look at the numbers reported by these builds to narrow down what specific 12h period the code landed that caused this drastic shift. Or, well, you’d think we could, but when you group by build_id you get:

Because our Nightly population isn’t randomly distributed across timezones, there are usage patterns that affect the population who use which build on which day. And sometimes there are “respins” where specific days will have more than 2 nightlies. And since our Nightly population is so small (You Can Help! Download Nightly Today!), and this data is a little sparse to begin with, little changes have big effects.
No, far more commonly the correct thing to do is to look at what I call a “build day”. This is how GLAM makes things useful, and this is how I make patterns visible. So group by SUBSTR(build_id, 1, 8) AS build_day, and you get:

Much better. We can see that the change likely landed in Jan 18’s nightlies. That Jan 18-20 are all of a level suggests to me that it probably ended up in all of Jan 18’s nightly builds (if it only landed in one of the (normally) two nightly builds we’d expect to see a short fall-off where Jan 18 would be more like an average between Jan 17 and 19.).
Regardless of when during the day, we’re pretty sure we have this nailed down to only one day’s worth of patches! That’s good… but it could be better.
3. Going from build days to pushlog
Ever since I was the human glue keeping the (now-decommissioned) automated regression detection system “alerts.tmo” working, I’ve had a document on my disk reminding me how to transform build days or build_ids into a “pushlog” of changes that landed in the suspect builds. This is how it works:
- Get the hg revisions of the suspect builds by looking through this list of all firefox releases for the suspect builds’ ids. You want the final build of the day before the first suspect build day and the final build of the final suspect build day, which in this case are Jan 17 and Jan 18, so we get f593f07c9772 and 9c0c2aab123:

- Put them into this template:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange={}&tochange={}— which gives us https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=f593f07c9772&tochange=9c0c2aaab123
This gives you a list of all changes that are in the suspect builds, plus links to the specific code changes and the relevant bugs, with the topic sentence from each commit right there for you. Handy!
4. Going from a pushlog to a culprit
This is where human pattern matching, domain expertise, organizational memory, culture and practices, and institutional conventions all combine… or, to put it another way, I don’t know how to help you get from the list of all code that could have caused your data change to the one (or more) likely suspects. My brain has handily built me a heuristic and not handed me the source code, alas. But I’ve noticed some patterns:
- Any change that is backed out can be disregarded. Often for reasons of test failures changes will be backed out and relanded later. Sometimes that’s later the same day. Sometimes that’s outside our pushlog. Skip any changes that have been backed out by disregarding any commits from a bug that is mentioned before a commit that says “Backed out N changesets (bug ###)…”.
- You can often luck out by just text searching for keywords. It is custom at Mozilla to try to be descriptive about the “what” of a change in the commit’s topic, so you could try looking for “telemetry” or “ping” or “glean” to see if there’s anything from the data collection system itself in there. Or, since this particular example had to do with Firefox Relay’s integration with Firefox Desktop, I looked for “relay” (no hits) and then “form” (which hit a few times, like on the word “information”, … but also on the culprit which was in the form detector code.)
- This is a web view on the source code, so you’re not limited to what it gives you. If you have a mozilla-central checkout yourself, you can pull up the commits (if you’re using git-cinnabar you can use its
hg2gitfunctionality to change the revs from hg to git) and dump their sum-total changes to a viewer, or pipe it through grep, or turn it into a spreadsheet you can go through row-by-row, or anything you want. I’m lazy so I always try keywording on the pushlog first, but these are always there for when I strike out.
5. Getting it wrong
Just because you found the one and only commit that landed in a suspect build that is at all related, even if that commit’s bug specifically mentions that it fixed a double-counting issue, even if there’s commentary in the code review that explains that they expect to see this exact change you just saw… you might be wrong.
Do not be brusque in your reporting. Do not cast blame. And for goodness’ sake be kind. Even if you are correct, being the person who caused a change that resulted in this investigation can be a not-fun experience. Ask Me How I Know.
Firefox Desktop is a complex system, and complex systems fail. It’s in their nature.
And that’s it! If you have any comments, question, or (better yet) improvements, please find me on the #glean:mozilla.org channel on Matrix and I’d love to chat.
:chutten







