To have constant heartbeat of release, testing has to take the central stage. It can no more be an activity that is performed at the end of release cycle rather it has to happen during all the phases of a release. And mind you release cycle has shrunk from months to days.
In “Effective DevOps” book, the authors lay out many plans for making an effective plan to move towards a DevOps culture. On automation, it suggests that automation tools belong to either of the following three types:
- Scheduled Automation: This tool runs on a predefined schedule.
- Triggered Automation: This tool runs when a specific event happens…
- On-Demand Automation: This tool is run by the user, often on the command line…
(Page 185 under Test and Build Automation section)
The way, we took upon this advice to ramp up our efforts for Continuous Testing is that each Testing that we perform should be available in all three forms:
- Scheduled Testing: Daily or hourly tests to ensure that changes during that time were merged successfully. There are no disruptions by the recent changes.
- Triggered Testing: Testing that gets triggered on some action. For example a CI job that runs tests which was executed due to push by a Developer.
- On-Demand Testing: Testing that is executed on a needs basis. A quick run of tests to find out how things are on a certain front.
Take Performance testing for example. It should be scheduled to find issues on daily or weekly basis, but it could also be triggered as part of release criteria or it could be run On-Demand by an individual Developer on her box.
In order to achieve this, we re-defined our testing jobs to allow all three options at once. As the idea was to use Tools in this way, we picked upon Jenkins.
There are other options too like GoCD and Microsoft Team Foundation Server (TFS) is also catching up but Jenkins has the largest set of available plugins to do a variety of tasks. Also our prime use case was to use Jenkins as Automation Server and we have yet to define delivery pipelines.
(the original icon is at: http://www.iconarchive.com/show/plump-icons-by-zerode/Document-scheduled-tasks-icon.html )
I’ll write separately on Triggered and On-Demand testing soon and now getting into some details on how we accomplished Scheduled Testing below.
We had few physical and virtual machines, on which we were using Windows Task Scheduler to run tasks. That task will kick off on a given day and time, and would trigger Python script. The Python scripts were in a local Mercurial repository based in one of these boxes.
The testing jobs were Scheduled perfectly but the schedule and outcome of these jobs were not known to the rest of the team. Only testing knew when these jobs run and whether last job was successful or not.
We made on of the boxes as Jenkins Master and others as slaves. We configured Jenkins jobs and defined the schedules there. We also moved all our Python scripts to a shared Mercurial repository on a server that anyone could get. We also created custom parts into our home grown Build system that allows running pieces in series or parallel.
Given that Jenkins gives you a web view which can be accessed by all, the testing schedule became public to everyone. Though we had a “Testing Dashboard” but it was an effort to keep it up-to-date. Also anyone in the team could see how was the last few jobs of say Performance testing and what were the results.
Moving to Jenkins and making our scripts public also helped us make same set of tests Triggered and On-Demand. More details on coming posts so as how.
I wish I could show a “Before” and “After” pictures that many marketing campaigns do to show how beautiful it now looks like.
Do you have Scheduled testing in place? What tools you use and what policies you apply?
We are living in ‘survival of the fastest’ era. We don’t have time for anything. We prefer reading blogs instead of books and we look for tweets rather than lengthy press releases. So when it comes to testing a release that has only a few changes, we don’t have time to run all the tests.
The question but is: which subset of tests we should be running?
I have touched this subject in Test small vs. all, but looking at build change logs and picking up tests to run is a task that requires decision making. What if we can know the changes automatically and run tests based upon that?
That is possible through TIAMaps. No this term is not mine but part of it is. It originates from Microsoft’s concept of ‘Test Impact Analysis’ which I got to know from Martin Fowler’s this blog post. I’d recommend to read it first.
If you are lazier than me and couldn’t finish the whole blog, below is a summary along with a picture copied from there:
First you determine which pieces of your source code are touched upon by your tests and you store this information is some sort of maps. Then when your source code changes, you get the tests to run from the map and then just run those tests.
Below is a summary of TIAMap implementation in our project.
Why we needed it:
We didn’t do it for fun or due to “let’s do something shiny and new”. We are running out of time. Our unit tests suite has around six thousand tests and a complete run (yes, they run in parallel) takes about 20 minutes. Hmmm… a little change that needs to go has to go through 20 minutes of Unit test execution, that’s bad. Let’s see what others are doing. Oh yeah, Test Impact Analysis is the solution.
Generating TIA Maps
Code coverage comes to the rescue. If we already have a tool that finds out which lines of code are touched by all tests, can’t we have a list of source files that are touched by a single test?
So we configured a job that would run for tests and saves this simple map: test name -> source file names. There were two lessons that we learned:
- Initially, we had a job that would run for all 6,000 thousands and it was taking days. We became smarter and after generating first TIA Map for all tests, we only update maps for the tests that changed. We don’t have a way to find the test names that changed, but our job is based upon timestamp of files that have test code.
- We were storing the Map in a SQLite Db. As the Db had to pushed to our repository again and again, it was difficult to find deltas of change. We switched to simple text file to store the Map. Changes can be seen in our source control tools and anyone can look at those text files for any inspections.
As you can imagine that the hard part is to get those TIAMaps. Once we have them, we now do the following:
- When there is a need to run tests, we determine which source files have changed since the last run.
- We have a Python script that does the magic of consulting the maps and getting a list of tests to be executed.
- We feed that list of tests to our existing test execution program.
How is it going?
It is early to say that as we have rolled this as pilot and I may have more insights into the results in few months. But the initial feedback is indicative of us being on the right path. Time is being saved big time and we are looking for any issues that may arise due to faulty maps or execution logic.
Have you ever tried anything similar? Or would you like to try it out?
It happens to all of us. We are used to doing a process in a particular way for years and never think of other ways of doing it. Then someday someone says something. That serves as an eye opener and we start seeing other ways of doing the same thing.
This happened to us for our rule: “Break a build if a single unit test fails”
Sounds very simple and rational. We have this rule for may be 10 years and I have repeated this over and over in all new projects that we took up in these years. Our structure follows the Plan A as shown below i.e. to run tests as part of the build process and if a single test fails, build fails.
What changed in year 2017 was a quest to find ways to release faster, you know the DevOps like stuff. So we started to look at the reasons for the build to be broken. We build our source 4-6 times a day, so we had enough data to look into. One of the reasons was always a failing test.
Now we thought, as you must be thinking by now, that it is a good thing. We should not ship a build for which a given test breaks. But our data (and wisdom) suggested that failing tests were in following three categories:
- The underlying code changed but test code was not changed to reflect this. Thus test fails but code doesn’t fail.
- The test is flaky (expect a full blog post on what flaky tests are and what we are doing about them). For now, flaky test is that passes and fails with same code on different occasions.
- The test genuinely fails i.e. the feature is indeed broken.
Now 1 and 2 are important and Developer who wrote the test need to pay attention. But does this stop the build to be used? Of course not.
3 is a serious issue but with the notions of ‘Testing in Production’ combined with the fact that fix is just few hours away, we figured out a new rule as shown in Plan B.
Yes, when a build fails due to a failing test, we actually report bugs for failing tests and declare the build as Pass. Thus the wheel keeps rolling and if it needs to be stopped or rolled back, it can be.
A few weeks into this strategy while all looked good, it happened what was always feared to be happening. A build in which 20% of our tests were failing was declared pass and our bug tracker saw around 100 new bugs that night. Let’s call that spamming.
That raised our understanding to move from binary (fail or pass) to a bit fuzzy implementation. We came with a new rule that if 10 or more tests (where 10 is an arbitrary number) fail, we’ll declare the build as failed. Otherwise we’ll follow the same path. So this is now our current plan called Plan C.
I know what you are thinking that a single important test could be much important than 10 relatively unimportant tests. But we don’t have a way to weigh tests for now. When we have that, we can even improve our plans.
Does this sound familiar to your situation? What is your criteria of a passed build in context of test results?
The very idea that quality can be added later is flawed and quality must be baked in.
Speaking of baking, it is not possible for the inspector who tastes a finished cake to make it any better. That person would just tell you if the cake is good or bad. And if it’s bad, there is nothing you can do about it. But if you want the cake to be always good, you will take good consideration on the ingredients that will go in, you’ll consider hiring a professional baker to do the job, you’d like to inspect it in intervals like how the dough is or opening the oven to see if cake is in good shape.
Similarly, you cannot have Testers jump late into the development cycle and improve the quality. A Tester can only tell you if the Software is good or bad. And if it’s bad, you’ll have to go many cycles of correction before it goes reasonably good. So, if you want your Software to be in good shape, you’ll take care of the technology and tools you use, you’ll consider training or hiring professionals to do the job right, you’d like to review the code time to time to see if it has quality baked into it.
Catching logical errors:
All of us know that humans make errors but all of us also believe that those humans don’t include us. It might be true that some people make more mistakes than others but no single person is error free. So as a matter of fact, when you show your code to a peer to review it, or you explain it yourself to a peer through a walkthrough, you start catching errors. Some of these are obvious logical errors which got slipped as you wrote that code. Don’t feel ashamed as this happens and we should be happy that this error was caught much early in the cycle.
Catching Performance issues:
Writing algorithms that solve complex problems is not an easy job and at times in an attempt to do that, you can take a path that is lengthy. Well, lengthy in terms of how many CPU cycles you’ll need to complete them which can result in slow performance. There are also cases where unnecessary data structures are involved which cause more memory to be consumed than the minimum required one. Your peers can help you find such errors in the code. You can also have some expert on this subject in your team to do this type of review for all the code that is being written.
In one of the teams in which I worked, we had a person (let’s call him Ahmed) who was our performance expert. All team members when confronted with the task to write best performant code would consult Ahmed after writing the code. And if Ahmed was happy about it, it was always a quality code.
Memory leaks and Security considerations:
Most developers are capable of writing code which is free of logical errors and is usually well performing, but there are other things hidden like the bugs who only come out when there is dark and no one is around. Yes, I’m talking about writing secure code.
These aspects of code can again be reviewed by Developers who are experts in this domain. The good thing is that some of this expertise is now being passed on to tools. For example, you can use built-in tools in your IDE or can get many free tools that can help you find memory leaks in your code.
Similarly, if your code is good in terms of all requirements but is not safe and is vulnerable to various type of attacks, we cannot call it Quality code. Before some hacker can detect those vulnerabilities, how wonderful it would be knowing these so early in your development lifecycle.
Knowledge of the tribe:
The collective knowledge of your development team is what matters more than the individual member of your team. In fact, the sum is always greater than the actual sum. Let me explain this through an example.
Let’s assume that you three members in your development team named Ali, Sara and Jamshed. K represents the knowledge possessed and the brackets are meant so as who possess that knowledge. Given the above, the below is always true:
K (all team) > K (Ali) + K (Sara) + K(Jamshed)
So, the real job of management is to create an environment in which knowledge of one person is passed on in a way that it becomes a knowledge of the whole tribe. Code Reviews play a big role in achieving this.
I hope that by now you are convinced that you need to do Code Reviews. So maybe I can help you further on the how part of it where you can ask these questions:
How often should Code Review happen?
There are many school of thoughts on this. Some believe that all commits should be reviewed such as if it is not reviewed, it will not be in the production code. Others think that achieving this “all code must be reviewed” policy takes lot of effort and you should be selective in the process.
If you pick the first path, there are tools that can help you such that no code goes unreviewed. And if you like the second approach more, you’ll have to decide what is the selection criteria and who makes the decision if a particular piece of code ought to be reviewed.
Who should do the Code Review?
Just like the above, there are all and selective approach. Some teams like each member of the team to take the Reviewer role once in a while. In my opinion it helps to give this power to all rather than few. If only some team members review the code of other team members e.g. senior folks reviewing junior folks code, it will create friction in the team.
The exception is obviously special cases where you have Security and/or Performance experts in your team who’ll do more reviews than other members.
How long each review should take?
Though this largely depends on how big is the piece of code being reviewed but leaving this as an open time slot can both slow down the process along with the code author feeling being questioned for too long. You can set sort of “30 minutes or one hour rule” that each small code review should take no more than 30 minutes and each big review should take no more than an hour. These numbers are arbitrary and you have all the powers to set a limit of your own.
What the Reviewer should check?
Typically creating a checklist will help you which can have questions like the below:
- Are there any errors?
- Is code human readable?
- Is there good commenting?
- Are there tests written against this code?
- The coding guidelines have been followed
These questions vary from organization to organization and team to team. So, you should feel free to start with your own list and then grow this list as you see other cases.
All the best on your journey of making this world a better place by producing Quality Software.
Note: The above article was printed in Flash newsletter which is being reproduced for blog’s audience.
DevOps is a big thing these days and it is hot. Lot of people are talking about it and as we are going through this journey of adapting DevOps culture, we are learning it.
In recent month or so, I have given some introduction to fellow testers at my organization and shared thoughts at couple of Industry sessions. So I thought it would be good idea to share my slides here.
For further study, I’d recommend reading following content:
- “Effective DevOps” book by Jennifer Davis & Katherine Daniels
- Microsoft’s guide to DevOps
- Continuous Integration blog on Nutcache
- “A Practical guide to Testing in DevOps” book by Katrina Clokie
- Continuous Testing by Dan Ashby
- Moving to weekly releases e-book by Rob Lambert
All the best with your DevOps journey!
The signs of a growing IT industry in Pakistan are evident with the number of events that happen each year. One such event is the annual conference on Agile which saw it’s 4th edition on October 21, 2017. The Agile Conference Pakistan 2017 hosted by Pakistan Agile Development Society hosted over 300 professionals joining from around the country. ClickChain team from Bahawalpur lead by Muneeb Ali, Contour Software team from Karachi under Owais Ashraf leadership, and half the panelists from Lahore provided the best possible mixture of top IT talent at display during the day. The theme this year was “SCRUM in Pakistan”.
Do use it at Home:
For many commercials and TV programs with stunts, they have a message like “Don’t do it Home”. Naveed Khawaja who is a UK based Agile Trainer and Coach (and adjusted his schedule to be able to join us) mentioned that because he loves Kanban, he does all his personal work through it. He shared examples like:
- His 6 year old daughter using a Kanban board.
- His family using a Ramadan Kanban board.
- His (poor) carpenter being Kanban-ified for some renovation work.
The idea that I got is that if you like something, you can try it for both your professional and personal life. Interestingly, after I attended the 7 Habits training from FranklinCovey few months ago, I practiced the learning both at office and at home.
(more photos at: https://www.facebook.com/bendaoods/)
Purpose is more important than Terminology:
Sumara Farooq in her talk suggested that when SCRUM or any new initiative is implemented in an organization for the first time, the message can be simplified by focusing on the purpose or intent of doing something rather than using the terms. For example:
- A Storyboard is actually there to provide visibility.
- A Daily SCRUM is actually needed for better collaboration
This indeed can change the way you communicate with the real folks who’ll embrace the change if they like the purpose rather than saying that “we need SCRUM from tomorrow”.
Performance = Skill x Will:
Mohsin Lodhi was fantastic in his talk as usual and threw in lot of interesting ideas for anyone who manages people. His main theme was Servant Leadership such that what it is and how to achieve that. As he was setting up the idea, he shared the above formula which I really liked. I have seen many teams under performing because either they lack Skill or they are not Willing to do it. I learned that both of these are important and any leader should work on enhancing Skill of the team and buying their Will to perform.
Selling your idea is the main thing:
The panel discussion which had Naeem Iqbal, Naveed Khawaja, Faisal Tajammul, Shaima Niaz and was moderated by me saw a flurry of question from the audience. The question ranged from difficulties in SCRUM transformation, lack of top management support, collaboration issues, fear of failures etc.
A common theme that I saw in the questions was that you know why an idea is worth (say SCRUM) but you are unable to convince others. I call this a “selling problem” and as someone rightly said:
Selling is not part of the game, it is the game.
Unfortunately all of us, the IT guys think that if a solution is technically good, people would love it and buy it immediately. But buying something is more psychological than technical and we need to learn the art of selling. I am learning it too but have found one thing: if you target on the need of the buyer, you might be able to sell something. For example in case of SCRUM, consider selling cost saving and faster results to management and consider selling self-organization and better prioritization to your team members.
There were lot of amazing people that I met over the conferences and many interesting discussions (including Khurram Ali‘s talk on User Stories, Muhammad Ibrahim‘s research presentation on Scrum and XP) that I saw which I’ll try to cover in future posts. Before I go, want a special mention of Faiza Yousuf and Noorjehan Arif who came from Karachi for the conference and ran our twitter campaign. And yes, #AgilePK and #ACP2017 were the top trends on that day.
Thanks to all members of the organizing team which orbit around Naveed Ramzan, who worked hard to made it a memorable day. And Anum Zaib lifted the level of event host considerable from last year (last year it was me :))
What did you learn from the ACP2017 if you were there? And if you were not there, what else do you want me to cover about this event?