Below is an overview of the release maintenance process. The complete process is documented under QM35 Maintenance Release Process in Azure DevOps.
The maintenance process begins when a bug is reported by any credible source and a Bug work item is created. Issues may be found internally, by a researcher, or by a customer who is acutely experiencing the problem. The Operations Information R&D department - every development team, every product, and every program - is partnered with our Support department and has adopted a single, streamlined maintenance process.
Three-Step Maintenance Process
When bugs are reported, the Engineering Manager of the responsible Development Team documents the bug in a work item that is added to the Product Backlog, just as any other product work would be defined. Tag these work items with Maintenance. This is required by Engineering Program Managers for work management and reporting purposes.
Once the bug item is created, its severity is assessed based on the actual and potential disruption the bug brings to customers’ business. The Technical Product Manager works with the Engineering Manager to determine the severity of the bug. Severity is set at one of four levels - Low, Medium, High, or Critical. Any bug deemed Critical gets the highest priority, warranting immediate allocation of resources.
R&D and Support evaluate an issue on two common factors - impact and workaround availability. Their assessments diverge on a third factor: affected customers. For a customer who calls Support about an issue, the likelihood of encountering it is already 100%. Support's third factor for criticality is therefore Time Sensitivity.
For Support, an issue is critical if:
For R&D, we must consider the likelihood of the user base encountering the problem.
Criticality Score = (Impact + Workaround Availability) * Likelihood
Based on that formula, work items are marked with the following severity:
Score each of the following factors on the scale shown. Then determine the Criticality Score using the given formula, and assign a Severity Level to the bug in the relevant Backlog Items.
Impact (0-3) – The extent of the impact to customers’ data infrastructure if they hit the bug
High (0) | Medium (1) | Low (3) |
---|---|---|
Crash | Performance | Usability |
Deadlock | Efficiency | Documentation |
Wrong data (collected or calculated) | Scalability | Consistency |
Wrong search results | Calculation/Logic | Workflow |
Data loss | Grammatical | |
Archive corruption | Cosmetics | |
Data integrity | | |
Exploitation of a security vulnerability | | |
Remediation Level (0-1) – Availability of a workaround that lowers or eliminates the probability or impact of the software bug, without any fix/change to the code
Likelihood (1-4) – The indicator of how probable and easy it is for the users to hit the bug. It will be presented as the number/percentage of customers who have hit the bug already (associated number of support cases) or are anticipated to hit it in the future (anticipated behavior of a new AVEVA Operations Information technology). This could be estimated across all verticals or could be focused on one specific industry.
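The scoring arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not the tooling R&D actually uses: the function name `criticality_score` is hypothetical, and it assumes that the "Workaround Availability" term in the formula is the Remediation Level (0-1) factor described above. The mapping from score to Severity Level is not reproduced in this section, so the sketch stops at the numeric score.

```python
def criticality_score(impact: int, remediation_level: int, likelihood: int) -> int:
    """Criticality Score = (Impact + Workaround Availability) * Likelihood.

    Hypothetical helper. Assumes "Workaround Availability" in the formula
    is the Remediation Level factor, scored 0-1.
    """
    # Validate each factor against the scale stated in the process text.
    if not 0 <= impact <= 3:
        raise ValueError("Impact must be between 0 and 3")
    if not 0 <= remediation_level <= 1:
        raise ValueError("Remediation Level must be 0 or 1")
    if not 1 <= likelihood <= 4:
        raise ValueError("Likelihood must be between 1 and 4")
    return (impact + remediation_level) * likelihood


# Example: Impact 2, Remediation Level 1, Likelihood 3.
print(criticality_score(2, 1, 3))  # (2 + 1) * 3 = 9
```

With the stated scales, the score ranges from 0 (Impact 0, Remediation Level 0) to 16 (Impact 3, Remediation Level 1, Likelihood 4).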
Occasionally, understanding the root cause of an issue is not possible without providing an instrumented development build to a user. This Maintenance Release Process allows for these builds when assessing severity or developing a hot fix. These builds may be distributed on a limited basis to users in accordance with QM04 Product Release Types & Versioning.
When sharing an instrumented development build with a customer, QT07 Release Artifact Register must be completed and sent to the Business Unit Leadership Team and Product Quality Owners. The Business Unit Leadership Team needs to be aware of potentially critical issues and the reason the build is being sent. Product Quality Owners need evidence of which builds were sent to customers.
Operations Information R&D has structured the development of all products to be able to quickly address critical problems. Ideally, we are able to produce a fix quickly and make that fix generally available just as quickly. Practically, we often need to work with customers to determine the root cause and provide them with a Hot Fix sooner than an update can be made generally available.
Hot Fixes are Limited-Availability releases that are made when:
Due to the critical nature of the problem and impact on customer operations, Product Quality Owners do not require a full release submittal or proof of compliance with all procedures. However, all Hot Fixes and Instrumented Development Builds are subject to all R&D Policies. Evidence of compliance with those policies must be provided via QT07 Release Artifact Register.
Once a Hot Fix is completed, work immediately begins on General Availability releases for all actively developed versions, which typically are the main/master branch, the current version, and any long-term serviced versions, such as those included in the PI System for Critical Operations. The main/master branch will release the fix with the next General Availability product release (e.g., a feature-rich, new version). For the current and serviced versions, the fix is made as a General Availability Update (e.g., a Revision, Patch, or Service Pack).
Our final responsibility is to ensure that no other customers experience this problem. R&D is committed to fixing critical issues as Job 1. Customers are entitled to a fix for critical issues, now. We do not ask them to wait for an upcoming feature release to get problems resolved. Any issue that receives a Hot Fix, or has been deemed Critical by R&D (using the Criticality Scoring System), will be fixed immediately in all actively developed versions of a product - typically the most recent release and any version Product Management has included in the PI System for Critical Operations. This must be an update to all actively developed General Availability Releases.
The Engineering Manager (of the responsible team), Technical Product Manager, and a Staff Developer create the work items needed to fix the issue and release the affected software. Specific, written consideration must be given for how this fix affects related products – and the need to update and release those products.
The Engineering Manager, Technical Product Manager, and Staff Developer for the related products must create work items in the Product Backlog for those products.
Once all the work for all the affected products is defined, it is estimated (in story points) by the responsible Development Teams. Those estimates are converted to developer-hours based on the Development Teams’ established velocity.
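As a worked illustration of that conversion, here is a minimal Python sketch. The exact conversion rule is not given in this overview, so the approach is an assumption: it treats velocity as story points completed per sprint and scales by the team's developer-hours available per sprint. The function name and both parameters are hypothetical.

```python
def story_points_to_hours(
    story_points: float,
    velocity_points_per_sprint: float,
    capacity_hours_per_sprint: float,
) -> float:
    """Convert a story-point estimate to developer-hours.

    Assumption (not from the process document): one sprint's velocity in
    points corresponds to one sprint's capacity in developer-hours.
    """
    if velocity_points_per_sprint <= 0:
        raise ValueError("velocity must be positive")
    hours_per_point = capacity_hours_per_sprint / velocity_points_per_sprint
    return story_points * hours_per_point


# Example: 26 points of work, for a team that completes 40 points per
# sprint with 400 developer-hours available per sprint.
print(story_points_to_hours(26, 40, 400))  # 260.0
```

Because each Development Team has its own velocity, the same story-point estimate can translate to different developer-hour totals for different teams.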
When the estimate is completed, an executive summary of the problem, the required work, and the estimate of that work is provided to Engineering Program Management via QT08 Critical Issue Report.
Implementation of critical fixes takes priority over meeting a sprint goal, but for large estimates, it may be necessary to take a more measured approach. Engineering Program Management will consult with the Business Unit Leadership Team and provide direction on whether the responsible development teams will begin work immediately, thereby cancelling and re-planning the current sprint, or if the current sprint should continue and the work for the critical fix should be implemented as part of an “Emergency Program.”
For all non-critical bugs, the work is defined and estimated the same as all other work. Once that work is defined, the Engineering Manager, Technical Product Manager, and a Staff Developer decide on the priority of the fix.
Implementation of non-critical bugs is done in the normal course of work. This maintenance process does not mandate when Updates are developed and released. This is left to the discretion of Product Management with courteous inclusion of Program Management and other Software Development Leadership. Every R&D Program budgets development time and resources for technical improvement, technical debt reduction, and fixing non-critical bugs.
We're leveraging the existing ADO hierarchy to see how well our actions align with our priorities. With a few changes, we can use an evergreen Program to get the insight we need.
A placeholder Program, PI System Updates & Hot Fixes (No Program), will live alongside usual programs in the Program Backlog. This familiar structure allows management to explore what's in development using existing ADO and Power BI dashboards, queries, reports & tools.
Product Updates & Hot Fixes are customer-facing. Accordingly, we're adopting the marketing product segmentation for the PI System as themes for this new Program.
For example:
A new work item type, "Maintenance Release", has been added to ADO. It lives at the same hierarchy level as Epics, but exclusively in Product Backlogs, while Epics continue to live in the Program Backlog.