I recently stumbled across a copy of Levinson’s Software Testing with Visual Studio 2010. I found it to be a curious example of the unevenly distributed future: some of the sections are clearly outdated, while others sound like science fiction. It prompted some thoughts about testing in Business Intelligence (BI).
There are many challenges in testing BI software:
It’s not unusual in BI to create software with the idea that it’s going to be modified as soon as the customer sees it. This immediate feedback loop is healthy, but it sometimes leads the programmer to conclude that there are no requirements. This perceived lack of requirements then sets off a cascade of testing failures. The solution to this problem is almost self-evident: software should be created only after it has been designed to a written requirement. Lack of confidence in the current requirements is a reason to keep those requirements lightweight and the design flexible, not a reason to skip them. You should always have tests to cover your software, and written requirements that justify your tests. If you want to write software for which you have no requirement, either resist that urge or, better yet, explore the potential requirement with your customer.
When I take ownership of a database, there is usually neither a build process nor any test automation, never mind a thriving ecosystem built on those foundations. The first step is not to implement build and test systems, but to choose which tools we’re going to adapt to fill those needs. The Extract Transform Load (ETL) unit testing frameworks that do exist are so far behind the frameworks used in mainstream programming that adapting a well-supported general-purpose framework for ETL is often the correct decision. Other BI tools are in a similar state. All of this is a barrier that just doesn’t exist in other sectors of software development.
Finally, there is the tension between testing on a full set of live data and testing on a small, controlled set of data. The small controlled set is required for unit testing. The full set is required for performance testing and reconciliation, and is helpful for acceptance testing. The correct answer is that both sets of data are needed. If your BI is dependent on an expensive DBMS and the design is flawed so that you can’t use one instance for multiple purposes (because the database and schema names are hard-coded), then there may be financial costs. Confront that problem head-on; don’t hide it and let it fester.
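One way to avoid hard-coded names is to inject the schema (or database) name from configuration, so that several environments can share a single instance. The sketch below assumes Python’s built-in sqlite3 module as a stand-in for an expensive DBMS, with attached in-memory databases playing the role of schemas; the function `load_fact_sales`, the table `fact_sales` and the environment names are hypothetical.

```python
import re
import sqlite3

# Identifiers cannot be bound as query parameters, so validate them before
# formatting them into SQL to avoid injection through configuration.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def checked(identifier: str) -> str:
    """Return the identifier unchanged if it is a safe SQL name, else raise."""
    if not _IDENTIFIER.match(identifier):
        raise ValueError(f"unsafe identifier: {identifier!r}")
    return identifier


def load_fact_sales(conn: sqlite3.Connection, schema: str,
                    rows: list[tuple[int, float]]) -> None:
    """Hypothetical ETL step: the target schema comes from configuration, not the code."""
    s = checked(schema)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {s}.fact_sales (sale_id INTEGER, amount REAL)")
    conn.executemany(f"INSERT INTO {s}.fact_sales VALUES (?, ?)", rows)


def connect_with_environments(names: list[str]) -> sqlite3.Connection:
    """Open one 'instance' and attach a separate database per environment."""
    conn = sqlite3.connect(":memory:")
    for name in names:
        conn.execute(f"ATTACH DATABASE ':memory:' AS {checked(name)}")
    return conn


if __name__ == "__main__":
    # Hypothetical environment names sharing one instance.
    conn = connect_with_environments(["unit_test", "acceptance"])
    # The same code loads a tiny control set into one environment...
    load_fact_sales(conn, "unit_test", [(1, 10.0), (2, 5.5)])
    # ...and a (here, simulated) larger data set into the other.
    load_fact_sales(conn, "acceptance", [(i, 1.0) for i in range(1000)])
    for env in ("unit_test", "acceptance"):
        count = conn.execute(f"SELECT COUNT(*) FROM {env}.fact_sales").fetchone()[0]
        print(env, count)
```

With a server DBMS the same idea usually means reading the database or schema name from a configuration file or environment variable at deployment time.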
As with many problems, we simplify BI testing by decomposing it. When testing is automated, the number of test steps is not a direct source of cost. Decomposing the tests into a logical programme of discrete units improves our ability to understand the system, and therefore to make it work efficiently. Costs are reduced by having many simple tests rather than a few complex ones.
Software grows from the requirements, and step one is improving those requirements. Regardless of your software development process, you need written requirements with sufficient detail that an objective third party can determine if the software is correct or not. Once that requirement is written it needs to be analysed by the designers and developers of the application.
When we are doing rapid prototyping, the written requirement may arrive after the software has been “approved” by the customer. Do not let the approval of a small handful of test cases persuade you that the requirement does not need to be stated. That’s a well-paved road to chaos.
I recommend BEAM* as a process for creating good BI requirements.
Once the requirement is frozen, whether for a sprint or for an hour, the software needs to be designed. That design needs to be documented in a format that makes it durable, and then it needs to be reviewed by a knowledgeable peer who wasn’t involved in its creation. Peer design review is regularly shown to be the most effective testing activity.
Every software module should have unit tests. The unit tests use the small, hand-crafted control data set, or even mock out the database entirely. I recommend Test Driven Design, particularly because that process sometimes allows the initial draft of the unit tests to serve as the design that was reviewed in the previous stage.
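As an illustration, here is a minimal unit test in that style, assuming Python’s standard `unittest` and `sqlite3` modules. The transformation under test (`daily_sales_totals`) and the `stg_sales` table are hypothetical; the point is that the control data set is small enough that each expected result can be worked out by hand and written into the test.

```python
import sqlite3
import unittest

# Hypothetical transformation under test: aggregate staged sales into
# daily totals, ignoring rows flagged as cancelled.
DAILY_SALES_SQL = """
    SELECT sale_date, SUM(amount) AS total
    FROM stg_sales
    WHERE cancelled = 0
    GROUP BY sale_date
    ORDER BY sale_date
"""


def daily_sales_totals(conn: sqlite3.Connection) -> list[tuple[str, float]]:
    return conn.execute(DAILY_SALES_SQL).fetchall()


class DailySalesTotalsTest(unittest.TestCase):
    def setUp(self) -> None:
        # A fresh in-memory database per test keeps the tests independent.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE stg_sales (sale_date TEXT, amount REAL, cancelled INTEGER)"
        )

    def tearDown(self) -> None:
        self.conn.close()

    def load(self, rows: list[tuple[str, float, int]]) -> None:
        self.conn.executemany("INSERT INTO stg_sales VALUES (?, ?, ?)", rows)

    def test_sums_each_day(self) -> None:
        self.load([("2024-01-01", 10.0, 0), ("2024-01-01", 2.5, 0),
                   ("2024-01-02", 4.0, 0)])
        self.assertEqual(daily_sales_totals(self.conn),
                         [("2024-01-01", 12.5), ("2024-01-02", 4.0)])

    def test_excludes_cancelled_sales(self) -> None:
        self.load([("2024-01-01", 10.0, 0), ("2024-01-01", 99.0, 1)])
        self.assertEqual(daily_sales_totals(self.conn), [("2024-01-01", 10.0)])

    def test_empty_input_gives_empty_output(self) -> None:
        self.assertEqual(daily_sales_totals(self.conn), [])
```

Saved as a module, this runs with `python -m unittest`. Each test checks one rule, which keeps failures easy to diagnose.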
Every software feature should have acceptance tests. It is important that the customer can understand the acceptance test. Any feature that is difficult to demonstrate working from the customer’s perspective should be considered for redesign or removal.
These acceptance tests run the integrated system end-to-end from the feature’s, and therefore the customer’s, perspective. Acceptance tests should be run on a recent backup of the production data. This is easiest to do by driving the visualisations that your customers interact with through a webdriver. If you have a feature that isn’t readily testable through the user interface, your first step ought to be to consider integrating it into the user interface. You’ll still end up with some features that aren’t readily visible, which is where a customer-readable testing framework like DbFit comes into your toolkit.
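DbFit expresses tests as tables that a customer can read. The sketch below borrows that idea without using DbFit itself: expectations are written as a plain-text table the customer can review, and a small Python runner checks each row against a query. The table format, the `monthly_revenue` view and the function names are hypothetical, and sqlite3 stands in for the restored production backup.

```python
import sqlite3

# Customer-readable expectations (hypothetical): one row per region, as it
# should appear on the monthly revenue report.
EXPECTATIONS = """
| region | month   | revenue |
| North  | 2024-01 | 1500.00 |
| South  | 2024-01 |  820.50 |
"""


def parse_table(text: str) -> list[dict[str, str]]:
    """Turn a pipe-delimited table into a list of {column: value} dicts."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    rows = [[cell.strip() for cell in line.strip("|").split("|")] for line in lines]
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


def check_expectations(conn: sqlite3.Connection, table_text: str) -> list[str]:
    """Return a readable failure message for each row that does not match."""
    failures = []
    for row in parse_table(table_text):
        actual = conn.execute(
            "SELECT revenue FROM monthly_revenue WHERE region = ? AND month = ?",
            (row["region"], row["month"]),
        ).fetchone()
        expected = round(float(row["revenue"]), 2)
        label = f"{row['region']} {row['month']}"
        if actual is None:
            failures.append(f"{label}: no data, expected {expected:.2f}")
        elif round(actual[0], 2) != expected:
            failures.append(f"{label}: got {actual[0]:.2f}, expected {expected:.2f}")
    return failures
```

Because the failure messages repeat the customer’s own terms, a failing run can be read by the customer as well as by the developer.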
There is no characteristic inherent to BI software that obstructs the application of continuous integration. The continuous integration system ought to regularly run all of the unit tests and, if they pass, all of the acceptance tests. Your acceptance tests will probably take minutes or even hours to run, so the CI system probably will not run them on every commit to source control.
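The gating rule described above (acceptance tests only after the unit tests pass) is simple to express. The sketch below is not tied to any particular CI product; each stage is a hypothetical callable that returns True on success, where a real pipeline would invoke the actual test runners.

```python
from typing import Callable

# A stage is a (name, callable) pair; the callable returns True on success.
Stage = tuple[str, Callable[[], bool]]


def run_pipeline(stages: list[Stage]) -> list[tuple[str, str]]:
    """Run stages in order; once one fails, skip the rest.

    Returns (stage name, outcome) pairs, where outcome is
    'passed', 'failed' or 'skipped'.
    """
    report = []
    failed = False
    for name, stage in stages:
        if failed:
            # Later stages are never executed after a failure.
            report.append((name, "skipped"))
            continue
        ok = stage()
        report.append((name, "passed" if ok else "failed"))
        failed = not ok
    return report
```

Because the acceptance stage may take hours, this pipeline would typically be triggered on a schedule (for example nightly) rather than on every commit.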
The CI system’s logs of the acceptance tests provide the bulk of your performance tests. You just need to write a few tests that read the other tests’ logs and compare the recorded timings to your performance thresholds.
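A sketch of such a performance test, assuming a hypothetical log format in which each acceptance test writes a line such as `test=daily_sales_report duration_ms=1834`. Real CI logs will differ, so the regular expression and the threshold values are assumptions to adapt.

```python
import re

# Assumed (hypothetical) log line format: "... test=<name> duration_ms=<integer> ..."
LOG_LINE = re.compile(r"test=(?P<name>\S+)\s+duration_ms=(?P<ms>\d+)")


def durations(log_text: str) -> dict[str, int]:
    """Extract a duration per test; later lines overwrite earlier ones."""
    result = {}
    for match in LOG_LINE.finditer(log_text):
        result[match["name"]] = int(match["ms"])
    return result


def threshold_breaches(log_text: str, thresholds_ms: dict[str, int]) -> list[str]:
    """Compare measured durations with thresholds; a missing measurement also fails."""
    measured = durations(log_text)
    breaches = []
    for name, limit in sorted(thresholds_ms.items()):
        if name not in measured:
            breaches.append(f"{name}: no timing found in log")
        elif measured[name] > limit:
            breaches.append(f"{name}: {measured[name]} ms exceeds {limit} ms")
    return breaches
```

Treating a missing timing as a failure stops a performance test from silently passing when the acceptance test it depends on never ran.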
The customer still needs to interact with the system before it goes into production. I recommend having them watch the acceptance tests for the new development, then having them do some exploratory testing. A screen recording tool is invaluable in making that exploratory testing useful for everyone.
A subset of the acceptance tests should check data without modifying it. These reconciliation tests check whether the data from the source system is reflected in each layer of the BI system. A subset of these reconciliation tests ought to run regularly on the production system.
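A reconciliation check can be as simple as comparing row counts and control totals between the source and each downstream layer, using only SELECT statements so that it is safe to run against production. In the sketch below the layer names, tables and the `amount` column are hypothetical, and a single sqlite3 connection stands in for what are often separate source and warehouse databases.

```python
import sqlite3

# Hypothetical layers, each expected to hold the same sales as the source.
# Table names come from this trusted constant, never from user input.
LAYERS = ["src_sales", "stg_sales", "dw_fact_sales"]


def profile(conn: sqlite3.Connection, table: str) -> tuple[int, float]:
    """Row count and control total for one layer (read-only)."""
    count, total = conn.execute(
        f"SELECT COUNT(*), COALESCE(SUM(amount), 0) FROM {table}"
    ).fetchone()
    return count, round(total, 2)


def reconcile(conn: sqlite3.Connection, layers: list[str]) -> list[str]:
    """Compare every layer with the first (source) layer; return discrepancies."""
    source, *downstream = layers
    expected = profile(conn, source)
    problems = []
    for table in downstream:
        actual = profile(conn, table)
        if actual != expected:
            problems.append(f"{table}: {actual} does not match {source}: {expected}")
    return problems
```

Counts and totals will not catch offsetting errors; comparing the same measures per day or per business key is a natural next step.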
As I mentioned earlier, this plan either sounds archaic and oversimplified, or like science fiction. If you’re doing better than this, congratulate yourself. If not, you may need some motivation:
Faced with the enormity of BI testing, many decision makers choose not to invest in a comprehensive testing programme. This is an age-old problem. As Wiegers says on the first page of Creating a Software Engineering Culture, “Never let your boss or your customer talk you into doing a bad job”. The cost of bad software is enormous. It is our job as software professionals to deliver software of acceptable quality as quickly as we can. A methodical, repeatable, and reliable test process proves that quality. It also ultimately increases the speed of delivery by reducing the time taken by testing, reducing the number of defects that come back from formal testing, and keeping the system as a whole from slipping into chaos.
Effective testing is the difference between order and chaos, and therefore between success and failure of your entire BI initiative. Show your work and its tests the respect that they deserve.