In the past, it was difficult to gather reliable data about our software development process without interfering with the process itself. This created three types of problems:
First, the process of actively gathering data adds an addition layer of friction to the software development process. This additional friction impedes team productivity and often entails duplicating effort (e.g. we perform a task and then have to record data about the task that was just performed).
Second, the process of actively gathering data introduces human biases into the process. I’ve seen numerous occasions where developers, aware that they were being observed, began to game the system in ways that make the data being collected appear more favorable to them. In addition, I’ve also seen project managers skew numbers to make their teams look better to their direct reports.
Third, these data being collected about the software development process are often used in a punitive fashion. Essentially, data are being used to reprimand individual team members or whole teams rather than to provide the teams with the information they need to continuously improve over time.
With the emergence of the DevOps movement, there has been an increasing emphasis on automating the software-delivery pipeline. The primary purpose of this automation is to facilitate CI/CD/CD (Continuous Integration, Continuous Delivery, and Continuous Deployment). However, this automation produces a continuous stream of actionable data as a byproduct of this automation.
In data science, we refer to data generated as a byproduct of a process as data exhaust. This data exhaust has significant secondary value beyond its primary purpose of facilitating the automated software-delivery process.
Essentially, as a byproduct of the application of DevOps to our Agile software development process, we now have a constant stream of reliable low-cost data about our process that we can begin to analyze. These data are low-cost, because we’ve switched from high-cost active data collection to low-cost passive data generation. In addition, these data are more reliable because they have been generated by the process itself, which makes them less sensitive to human biases created by a manual data-collection process.
As a result, we now have the reliable low-cost data necessary to begin applying the practice of data science to our software development practices. As a result, I believe we are beginning to see the emergence of a new software development movement, which for lack of an existing term of art, I call DevSci (i.e. Software Development Science).
DevSci is the application of the practice of data science to the Agile software development process, made possible by DevOps. Essentially, once we have our automated DevOps pipeline in place, we can use data generated by DevOps to continuously improve our Agile software development practices.
Now, questions that would have been difficult to answer a decade ago, are simply a matter of querying the data exhaust from our DevOps pipeline. For example:
But it goes way beyond this… I’m beginning to see data science and machine learning applied to various aspects of the software development process in very novel ways. I don’t think we’ve even begun to scratch the surface of how data science will transform software development.
I think we’ve already seen this same transition occur in other industries, like manufacturing. First we had our craftsmen building high-quality products in small teams and one-man shops. Then we automated the manufacturing value stream via factories and assembly lines. Finally, we’re now applying data science to the manufacturing process and radically transforming the way products are being made. I believe this is analogous to the software industry’s transition from Agile, to DevOps, to DevSci.
Ultimately, what I’m predicting is that the next movement in software development is the application of data science to our Agile software development practices, made possible by DevOps. In addition, I believe that this DevSci phenomena will fundamentally transform the way that we create software, from the inception of new features to the continuous delivery of those features.
Whether we call this new-fangled approach “DevSci”, “Software Development Science”, just an extension of Lean Software Development, or something else entirely, I’m relatively confident that there is a real phenomenon beginning to emerge in our industry. Giving it a name just helps us to have a rally-point that we can gather around. In addition, it gives us a search term to help coalesce this set of practices and a community of like-minded individuals.
Please keep in mind, however, that Agile values individuals and interactions over processes and tools. So it is critical that we do not strip away the human element from the practice of software development in our attempts to quantify, analyze, and improve our software development practices. These data science practices must be created to enable teams to continuously improve themselves; not for management to continuously monitor and punish the Orwellian cogs in their system.
DevSci needs to be about humans using data science to create better software… not about software using data science to constrain humans.