This post is part of a series:
Introducing quality, Part 1: Software testing
The past year has seen a push for more quality in our software here at Tombola, and this push has happened on many fronts: people, processes and code. As far as people go, we’ve hired QA testers for every team to ensure that whatever gets pushed to live meets our expectations and, of course, introduces no regressions. Lots of internal processes and tools were also introduced.
In this series of articles we will talk about the QA approach in the Bingo Platform team and we will start with the basic building block: the code.
Where we started from
Everyone has come across the testing pyramid:
The width of a layer represents the number of tests, and its position in the pyramid the complexity and coverage/reliability. According to this, a well-tested software system should have a large number of unit tests that are easy to write and maintain and run fast, but each cover minimal code; and a small number of end-to-end tests (or user interface, or acceptance, or functional tests; the jury is still out on the naming) that are a lot slower, harder to write and maintain, and sometimes fail spuriously, but cover much more functionality and have far more end-user relevance. There is a middle layer in there for all the tests that are not quite end-to-end tests but definitely not unit tests either, as they use actual dependencies rather than mocking them. Some people will include an even smaller layer at the top for manual testing (see following posts).
Our main code base has had quite a big suite of unit tests that has been running for a long time, supporting the main functionality. They were run in TeamCity on every check-in, right after the build. There were also a few GhostInspector tests run after deploying to dev. There were no other tests of any kind. Good automated tests (like normal code) are hard to write and can get messy and unmaintainable; clean code practices apply to them too. We can’t skip writing business code even if it is in bad shape, but we can skip testing if the time we spend is not worth the value we get out of it. In our case, then, the unit tests definitely needed some refactoring and cleaning up. The GhostInspector functional tests covered some of the basic functionality but hadn’t been developed in some time and did not influence the deployment process. Part of the code base was not covered at all, and automated testing was not part of the build process but rather something applied after solving problems. In the end we could not have 100% confidence in our tests. And you guessed it! Here come the memes:
The first steps
So, where do we go from here? As with any big change in software development, start with the people first. We needed to get the developers on the team to see the value of software testing and restore their confidence in the tests. The code base is already significantly large, and covering it with unit tests would be hard. It wasn’t written with testability in mind, so we would end up having to do a lot of setup before the tests and/or refactor large parts of the code to bring it into shape for unit testing. Both approaches are too complex and time-consuming, and we needed some quick wins.
We decided to focus on the end-to-end tests (we shall call them e2e tests from now on). Relatively few e2e tests would cover big chunks of the code base and give us lots of confidence that, from a user’s point of view, we did not commit something that broke some core functionality. Having a custom built test suite, we covered all of the main functionality of the site (logging in, registering, paying/withdrawing, navigating, editing account details) and made the tests run in TeamCity on every commit (more details in the next post of the series).
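To make this concrete, here is a minimal sketch of what one of these e2e journey tests might look like, assuming NUnit and Selenium WebDriver. The URL, element ids and credentials are purely illustrative, not the real site’s:

```csharp
using NUnit.Framework;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

[TestFixture]
public class LoginJourneyTests
{
    private IWebDriver _driver;

    [SetUp]
    public void StartBrowser() => _driver = new ChromeDriver();

    [TearDown]
    public void StopBrowser() => _driver.Quit();

    [Test]
    public void RegisteredUser_CanLogIn_AndSeeTheLobby()
    {
        // Illustrative URL and element ids -- placeholders, not the real site.
        _driver.Navigate().GoToUrl("https://dev.example.com/login");
        _driver.FindElement(By.Id("username")).SendKeys("test-user");
        _driver.FindElement(By.Id("password")).SendKeys("secret");
        _driver.FindElement(By.Id("login-button")).Click();

        // The journey passes if the browser lands on the lobby page.
        Assert.That(_driver.Url, Does.Contain("/lobby"));
    }
}
```

Each test drives a real browser through a real user journey, which is exactly why a handful of them buys so much confidence per test.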
This work did not take much time and, after a bit of tweaking, it started to pay off. We caught a few regressions in TeamCity before the code got deployed to any environment and solved them immediately. The feedback cycle was shortened significantly because we did not discover regressions manually on the dev/stage or, God forbid, live environments. We of course held the mandatory mini celebrations and “:)” post-its in the sprint retrospectives. Everyone started seeing the value of building in quality early in the development process and owning the notion collectively, not relying on manual testers or product owners to catch errors before they would go live.
It was very interesting and fulfilling to see everyone coming on board with the idea of software testing and how we evolved it. At first we noticed that the e2e and integration tests had started taking a long time to run. That was only natural, as they were all using the same database and user account and had been growing in number.
We implemented a “user pool” where every e2e test uses a different user, sourced from a pool that grows on demand. This gave the tests independent contexts and allowed us to run them in parallel, which we accomplished with NUnit’s Parallelizable attribute. The run time for e2e tests has dropped by 20-30%, and at the time of writing we intend to retrofit the integration tests with this feature as well. Parallelization also allowed us to refactor the tests into smaller user journeys, making them more stable.
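A rough sketch of the idea, using NUnit’s real Parallelizable attribute; the TestUserPool class and the fixture are hypothetical stand-ins for our actual implementation:

```csharp
using System.Collections.Concurrent;
using System.Threading;
using NUnit.Framework;

// Hypothetical pool: each test leases its own user, so runs are isolated.
public static class TestUserPool
{
    private static readonly ConcurrentBag<string> Available = new ConcurrentBag<string>();
    private static int _created;

    public static string Acquire()
    {
        // Reuse a free user if one exists, otherwise grow the pool on demand.
        if (Available.TryTake(out var user))
            return user;
        var id = Interlocked.Increment(ref _created);
        return $"e2e-user-{id}"; // in reality this would register a fresh account
    }

    public static void Release(string user) => Available.Add(user);
}

[TestFixture]
[Parallelizable(ParallelScope.All)] // NUnit may run these test cases concurrently
public class DepositJourneyTests
{
    private string _user;

    [SetUp]
    public void LeaseUser() => _user = TestUserPool.Acquire();

    [TearDown]
    public void ReturnUser() => TestUserPool.Release(_user);

    [Test]
    public void Deposit_UpdatesBalance()
    {
        // Drive the deposit journey with _user -- no other test shares it.
    }
}
```

Because no two concurrent tests ever share a user, there is no shared state to race on, which is what makes the parallel run safe.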
Secondly, we used NUnit’s Category attribute to filter out the really flaky tests and work on them separately as technical debt. This is ongoing work: every sprint we improve at least a couple, and hopefully we will soon end up with a very robust test suite. The added benefit of filtering them out is that we can run them in their own TeamCity build configuration and not slow down our build pipeline too much. Fixing them is a top priority though; the “Fragile” category is not a throw-out-and-forget pile!
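Quarantining a flaky test is a one-attribute change (the fixture and test below are illustrative):

```csharp
using NUnit.Framework;

[TestFixture]
public class WithdrawalJourneyTests
{
    [Test]
    [Category("Fragile")] // quarantined: runs in its own TeamCity build configuration
    public void Withdrawal_ShowsConfirmation()
    {
        // Flaky journey under repair -- tracked as technical debt.
    }
}
```

The main pipeline can then exclude the category (for instance `nunit3-console Tests.dll --where "cat != Fragile"`, or `dotnet test --filter TestCategory!=Fragile` with the VSTest adapter), while the quarantine build runs only the fragile ones with the inverse filter.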
There are optimizations to be made at the database level too. We started with the dev database as a back-end store to get going, but the tests should have their own datastore, with curated data that resembles production as closely as possible.
All of this work has been done to get us to a comfortable level of confidence in our existing code base, its functionality and its deployability. But the work hasn’t stopped. New features have been and are being released every week. We as a team have agreed that every piece of new functionality on the site must be automatically tested appropriately before we can commit it. This has become part of our pull request verification process, and our “Definition of Done” for a user story has been modified to include automated testing too. We also treat the bugs we find as “test cases”. A bug is a piece of functionality that apparently hadn’t been automatically tested properly; otherwise it would never have occurred. Once it’s fixed, a test is put in place to ensure no regressions happen. Yes, you guessed it again! Meme time:
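Treating a bug as a test case usually means pinning the failing input down as a permanent regression test. A hypothetical example (the bug, the calculator and the test names are all invented for illustration):

```csharp
using NUnit.Framework;

[TestFixture]
public class BonusCalculatorRegressionTests
{
    // Hypothetical bug: a zero deposit once produced a negative bonus.
    // Once fixed, the failing case stays in the suite forever.
    [Test]
    public void ZeroDeposit_YieldsZeroBonus()
    {
        var bonus = BonusCalculator.For(depositPence: 0);
        Assert.That(bonus, Is.EqualTo(0));
    }
}

// Minimal stand-in for the fixed production code.
public static class BonusCalculator
{
    public static int For(int depositPence) =>
        depositPence <= 0 ? 0 : depositPence / 10; // 10% bonus, floored at zero
}
```

If the bug ever creeps back in, the build fails before the code leaves TeamCity, which is precisely the shortened feedback cycle described above.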
But what about unit tests?
We still rely heavily on e2e and integration tests though. Unit testing is a discipline in its own right, and it can take a long time to get used to. The inhibiting factors in the code base are still there, and removing them would be a big piece of work, one that may not even be worth taking on. We have already decided that new functionality should be automatically tested, but that usually means some form of e2e/integration/smoke test. The code written from now on can be in much better shape by employing unit testing and in particular Test Driven Development (TDD). As we already said, this is a discipline that takes time to master and get the benefits from. The only way to get to that point is to just start doing it and practice, practice, practice. It will surely be a significantly slower process at first, maybe even frustrating, but the long-term benefits in code quality and development speed are hard to match. So, again, we are at the start of this journey and we need to get the people on board.
To that end, we held a couple of coding sessions where we talked about the notion of “clean code” and how TDD can help you achieve it. We all paired up and coded away on a simple .NET MVC application, using the 3 rules of TDD and getting to know our way around the basic principles of unit testing. At the end of each session, we all did a mini code review of everyone else’s code.
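The flavour of those sessions can be sketched with a toy red-green cycle (the Basket class and its tests are invented for illustration, not code from the sessions):

```csharp
using NUnit.Framework;

// Rule 1 of TDD: write no production code until there is a failing test.
[TestFixture]
public class BasketTests
{
    [Test]
    public void NewBasket_IsEmpty()
    {
        Assert.That(new Basket().Total, Is.EqualTo(0));
    }

    [Test]
    public void AddingAnItem_IncreasesTheTotal()
    {
        var basket = new Basket();
        basket.Add(price: 250);
        Assert.That(basket.Total, Is.EqualTo(250));
    }
}

// Rule 3 of TDD: write only enough production code to make the failing test pass.
public class Basket
{
    public int Total { get; private set; }
    public void Add(int price) => Total += price;
}
```

Each test was written first and watched fail before the production code existed; the class grows one small, tested step at a time.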
As we expected, everyone felt it was a very slow process at first, but did notice some light at the end of the tunnel. There are many resources online that will tout the benefits of unit testing (code quality, easy refactoring, etc.) and we won’t start any holy wars in this post 🙂 The only thing left is to practice the discipline at every chance and hopefully see the benefits in real-world situations. It’s worth mentioning we enjoyed this TDD exercise so much that we agreed to hold regular coding sessions of any sort (not just TDD) and more frequent code reviews.
This has been a pretty long process and, even though we have reaped many benefits, as always there is still a long way to go. We can always cover more code with tests, refactor them to be faster and more stable, adjust how we run them on our CI pipeline, customize their behavior and our processes when they fail, deal with the test suites scaling ever larger… Thankfully we are in a profession where there is always an interesting problem to solve. Whatever the future holds, though, we know we can rely on our good coding practices, our automated test suites, our manual QA and our robust deployment pipeline to deliver the most value to our Bingo players!