This post was originally published in Spanish on March 21st, 2021.
In today’s world, and especially in highly urbanized cities, our lives literally depend on software. Systems and applications (in their various forms) are part of our daily routine, and a significant portion of our productive, social, personal, and even family time passes through them every day (as information).
We usually take the correct functioning of software for granted; however, since software is (for the time being) a product of humans, it is also subject to errors, which can have many different sources and very diverse effects: erroneous calculations, disabled functionality, inability to save changes, even information leakage. Depending on the context, such effects can go unnoticed or put people’s lives at risk.
For this reason, defect management (stay tuned for an upcoming publication on this topic) is a fundamental process in software development. Part of this management consists of analyzing the origin and causes of a defect (see Root Cause Analysis, or RCA) to understand how it was created, what impact it had, how it was resolved, and how similar errors can be avoided in the future.
In my experience with software projects across various industries, architectures, technologies, platforms, and magnitudes, I have compiled and developed a classification of defects by their origin that provides the following advantages:
- Equalizes defects: it groups defects by causes, regardless of their consequences or the magnitude of their impact.
- Aids in prioritization: it allows a comparison of volumes between categories.
- Allows trend or pattern identification: for example, segregation into categories can make defect clustering more evident in modules and code segments, in data collections, or in architecture components.
- Helps leverage lessons learned: categorization serves as a starting point for the analysis and inception of both corrective and preventive countermeasures, for example those related to the best practices associated with the category or those previously applied to defects of the same category.
- Promotes process improvement: it serves as a basis for generating or referencing additional corrective or preventive measures on a proven, common foundation.
- Democratizes defect management: it recognizes that defects are produced by the interaction of team members, not just one particular role.
The proposed categories are based on three fundamental premises:
- Causality: address the causes of the defect, not its consequences
- Segregation: generate as little overlap as possible between categories
- Universality: aim to cover all defects
As we will see in the subsequent discussion, there are cases where it is not straightforward to place a defect in only one category; mainly for this reason, the proposed list is not intended to be exhaustive or definitive, but to serve as a guide for each team to develop its own categorization scheme, one that fits the context and conditions of its projects.
Categories for defect classification
Defects caused by data: this type of defect originates in the source where the information is stored, or in its extraction, subsequent processing, or display. For example: data stored in an incorrect format (perhaps for internationalization purposes) or with inappropriate values (non-compliant with business rules due to a lack of input validation), wrong or unavailable digital resources (images, documents, fonts, styles), etc.
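As a minimal, hypothetical illustration of a data defect (the date value and field names are invented), consider a date stored as text in a locale-specific format and later parsed under the wrong assumption:

```python
from datetime import datetime

# Hypothetical data defect: a date stored as text in "DD/MM/YYYY" format
# (common in internationalized systems) but parsed as if it were "MM/DD/YYYY".
stored = "03/04/2021"  # intended meaning: April 3rd, 2021

wrong = datetime.strptime(stored, "%m/%d/%Y")  # defect: assumes MM/DD/YYYY
right = datetime.strptime(stored, "%d/%m/%Y")  # correct interpretation

print(wrong.date())  # 2021-03-04 (silently read as March 4th)
print(right.date())  # 2021-04-03
```

Note that both parses succeed without raising an error, which is exactly what makes this kind of data defect easy to miss: the value looks valid, it is simply wrong.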
Defects caused by programming logic: these defects lead to problems in the functionality of the software and are usually caused by a bad implementation of the algorithm or process behind that functionality. They can also be related to insufficient analysis of the input data, an incorrect interpretation of the requirement (data processing and process output), coding errors (uninitialized variables, misplaced start or end delimiters of code blocks, etc.), incorrect use of loops or conditionals, etc.
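To make the "coding errors" above concrete, here is a deliberately minimal sketch (the function and data names are invented) of a classic loop-boundary defect:

```python
def total_sales_buggy(daily_sales):
    """Intended to sum every entry, but the loop bound skips the last one."""
    total = 0
    for i in range(len(daily_sales) - 1):  # defect: off-by-one in the bound
        total += daily_sales[i]
    return total

def total_sales_fixed(daily_sales):
    """Corrected version: the range covers every index."""
    total = 0
    for i in range(len(daily_sales)):
        total += daily_sales[i]
    return total

sales = [100, 200, 300]
print(total_sales_buggy(sales))  # 300 (the last element is silently dropped)
print(total_sales_fixed(sales))  # 600
```

The buggy version produces plausible output for any non-trivial input, which is why this class of defect often survives superficial testing.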
Defects caused by SDLC processes: these defects are caused by the failure to follow (or the absence of a definition of) processes related to the generation and deployment of software code; for example, poor adherence to coding standards, missing or incorrect deployment of related elements (such as additional code, configuration files, associated assets (images, fonts, PDFs), or changes in the database schema), or failure to properly merge changes into the code base (code overridden during integrations), etc.
Defects caused by incomplete requirements: these defects are related to ambiguities or omissions in the requirements that leave important aspects of the implementation and desired functionality open to the reader’s interpretation. For example, in non-mature agile processes it is common that the “Definition of Done” for user stories is not met, causing conflicts between whoever develops the functionality and whoever tests it. One way to identify these defects is when you start hearing phrases like “I thought”, “what I had understood is”, “since it did not say, I assumed”, “it’s obvious it should”, or “it’s always been this way” during discussions between developers and quality engineers.
Defects caused by SDLC tools or environment: there are times when a defect arises even though both the code and the data are correct and the processes were followed. Many of these are caused by inconsistencies in the environments where the code is deployed, perhaps due to misconfigurations, conflicting versions of required software (library dependencies), the use of non-standardized tools for developing or promoting code, etc.
Invalid defects: this category groups the “defective defects”: those that, after review, turn out to be expected functionality, or that are disregarded for not being part of a requirement or specification, or even for being trivial in the current context of the project. In many cases these defects are removed from the log; however, it is important to keep them as evidence of the effort invested by the person who recorded them, to preserve project history, and to document the cases where the test engineers made a mistake (so that we keep defect metrics for all roles, not just developers).
Removed defects: this category groups defects that will not be addressed by the team that recorded them (whatever the reason). On many occasions, teams lose visibility of this type of defect, so it is important to maintain evidence of their existence and (in most cases) ensure follow-up so that their status is correctly updated (for example, whether they are still open, in progress, or closed).
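As a sketch of how these categories could support the volume comparisons and trend identification mentioned among the advantages earlier (the enum, field names, and sample log are all hypothetical), a team could tally its defect log per category:

```python
from collections import Counter
from enum import Enum

# Hypothetical encoding of the categories proposed above.
class DefectCategory(Enum):
    DATA = "data"
    PROGRAMMING_LOGIC = "programming logic"
    SDLC_PROCESSES = "SDLC processes"
    INCOMPLETE_REQUIREMENTS = "incomplete requirements"
    SDLC_TOOLS_OR_ENVIRONMENT = "SDLC tools or environment"
    INVALID = "invalid"
    REMOVED = "removed"

def category_volumes(defects):
    """Tally defects per category to compare volumes and spot clusters."""
    return Counter(d["category"] for d in defects)

# Hypothetical defect log:
log = [
    {"id": 1, "category": DefectCategory.DATA},
    {"id": 2, "category": DefectCategory.PROGRAMMING_LOGIC},
    {"id": 3, "category": DefectCategory.PROGRAMMING_LOGIC},
]
volumes = category_volumes(log)
print(volumes[DefectCategory.PROGRAMMING_LOGIC])  # 2
```

Grouping by a dedicated field (rather than free-text labels) is what makes the comparisons repeatable; real trackers would also record the module or component to surface the clustering mentioned earlier.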
As mentioned earlier, defect classification can be tough (especially if you agree with the phrase “everything comes down to mathematics”). The categories could be reduced to just “processes, processes, processes”; however, in practice, software development processes can become very complex, and this reductionist view assumes that the “establishment and consequent monitoring of processes” prevents subsequent failures, so that when a defect occurs it is due to the lack of monitoring or definition of a process, which is not necessarily true in all cases.
There are several software development models that work under the premise of exhaustive process definition, but they require both those processes and the teams themselves to be very mature. Consequently, the number of injected defects would be expected to decrease in proportion to the number of defined processes; but there is no verifiable evidence of this relationship. On the other hand, in the agile world of software development, the formal and total definition of processes is simply not feasible.
The unique context of each software project means that classifying a defect into one of the proposed categories can be relatively simple or very complex. Take the following case as an example:
Imagine a scenario where a web page shows an error in a legal text: some words contain non-alphanumeric symbols. Since it is a relatively static text, at first it may be thought to be a content defect, one related to the data; but when the storage (or the source) of the data is checked directly, the mentioned symbols are nowhere to be found.
Further investigation reveals that the text was extracted from the requirements management tool and inserted into a file via copy-paste; however, the text editor used to update the file proved incompatible with UTF-8 encoding, so the text appeared visually correct, but its binary representation in the stored file caused stray symbols to appear when the web server extracted the text to render the page.
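The mismatch described above can be reproduced in a few lines (the legal text here is invented; the mechanics are the standard UTF-8 vs. Latin-1 confusion):

```python
original = "Términos y condiciones"  # correct text, with one accented character

utf8_bytes = original.encode("utf-8")   # the bytes that actually land in the file
misread = utf8_bytes.decode("latin-1")  # an incompatible tool reads them as Latin-1

print(misread)  # TÃ©rminos y condiciones (the stray symbols appear)

# The bytes themselves are intact, so decoding them correctly recovers the text:
print(misread.encode("latin-1").decode("utf-8"))  # Términos y condiciones
```

This is why the defect hides from a naive inspection: the source data is fine, and the corruption only materializes at the point where a tool applies the wrong decoding.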
If the team has a defined policy on which tools to use for text extraction processes, then the defect is caused by not following SDLC processes; otherwise, it is a defect caused by SDLC tools or environment, and the aforementioned policy would be established as a mitigation measure.
It is important to mention that there is a wide variety of non-functional defects (for example: security, performance, scalability, stability) that are not contemplated in the current classification but could arguably be incorporated into any scheme by following the premises proposed earlier.
The dynamics of the software development universe are actually more chaotic and impenetrable than they seem, so (at least for me) any element of guidance or support can make the difference between success and failure.