Summary
Responsibility for building and maintaining AVEVA Data Hub is shared between the software development teams with codebase responsibilities for ADH core services and the Service Operations team. This platform consists of the Service Fabric, Azure resources, and the supporting core services and libraries.
Service Operations
The Service Operations team is a product-focused team, responsible for continued operations and uptime of the environment and substrate of AVEVA Data Hub. Specifically, this includes the following responsibilities:
- Identify and enforce best practices for pipelines deploying AVEVA Data Hub services and resources.
- Monitor resource usage of the Data Hub production clusters and respond as necessary. This can include configuration changes or working with development teams to prioritize a product code change.
- Build additional Data Hub monitoring capabilities as needed.
- Create and maintain Azure resources (subscriptions, VPNs, certificates, etc.) used for dev/test Data Hub clusters as well as work with Global R&D Cloud Dev Ops (CDO) to ensure the production environment has its necessary Azure resources (e.g., overflow subscriptions).
- Enable development teams to create and destroy temporary or long-running dev/test clusters for Data Hub; build and maintain deployment automation tools as needed.
- Ensure uninterrupted use of limited-lifetime artifacts (certificates, keys, etc.), for example by rolling them before they expire.
- Perform regular operations exercises (e.g., quarterly cluster disaster recovery, weekly namespace recovery, etc.).
- Maintain the deployment test framework (PIT). Development teams maintain the actual test suites.
- Track and enforce deployment test uptime policies.
- Report to stakeholders on Data Hub operation KPIs: uptime, cost, and deployment metrics (successes, failures, time to recover on test issues, outstanding test issues).
- Govern access to privileged Data Hub roles, using PIM when appropriate, for temporary Data Hub cluster roles and Azure roles.
Development Teams
Development teams are assigned responsibility for the development and operation of core ADH technologies and for ensuring best practices in the infrastructure that supports the cloud and on-premises environments. Where Service Operations is responsible for Azure resources, specific development teams are responsible for AVEVA Operations Information software identified as core to cloud operations. These teams provide the core technologies (identity, provisioning, secrets, logging, telemetry) that program teams can use, regardless of feature or component. One example is logging and telemetry: these teams own the strategies and the libraries and services that support them, and all cloud development teams are responsible for using current versions of those libraries and following the strategies. Another example is the use of limited-lifetime artifacts: development teams devise the strategy for consuming the artifacts and maintain any supporting source, all cloud development teams follow the strategy, and Service Operations supports rolling those secrets.
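The limited-lifetime artifact example can be illustrated with a minimal sketch of a check that flags artifacts due to be rolled. Everything here is hypothetical and for illustration only: the `Artifact` model, the `artifacts_due_for_rotation` helper, the 30-day window, and the sample inventory are assumptions, not part of any AVEVA Data Hub tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Artifact:
    """A limited-lifetime artifact such as a certificate or key (hypothetical model)."""
    name: str
    expires_at: datetime


def artifacts_due_for_rotation(artifacts, now, window=timedelta(days=30)):
    """Return artifacts that expire within `window` of `now`, soonest first.

    Artifacts that have already expired are included as well.
    The 30-day default is an assumed value, not a documented policy.
    """
    due = [a for a in artifacts if a.expires_at - now <= window]
    return sorted(due, key=lambda a: a.expires_at)


if __name__ == "__main__":
    now = datetime(2024, 1, 1, tzinfo=timezone.utc)
    # Hypothetical inventory; real names and expiry dates would come from
    # wherever the artifacts are actually tracked.
    inventory = [
        Artifact("example-tls-cert", now + timedelta(days=10)),
        Artifact("example-signing-key", now + timedelta(days=90)),
    ]
    for artifact in artifacts_due_for_rotation(inventory, now):
        print(artifact.name, artifact.expires_at.date())
```

A check of this shape could run on a schedule so that the team rolling the secrets sees upcoming expirations well before consumers are affected.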
All development teams are responsible for the operation and health of any services and pages they deploy to any environment of AVEVA Data Hub (production, staging, etc.). They consume frameworks and libraries often provided by infrastructural programs, and they use tooling from Service Operations. The services for which they are responsible are assigned by the Software Development Department Leads or implied based on their current Program assignments.
Responsibilities for Development Teams include the following:
- Service Performance and Health Monitoring
- Alerts (provide a runbook for AVEVA Cloud Dev Ops and respond to requests for troubleshooting in a timely manner)
- Ensure Service Health (proactively monitor health and address related issues promptly)
- Incident Response (Platform Health and Tooling Validation)
- Disaster Recovery and Business Continuity ownership relative to the services they maintain.
- Beyond monitoring, dev teams are expected to push “hotfix” releases to resolve performance or other operational issues.
- Troubleshooting when service-specific issues arise, e.g., acting as stewards of the alerts and notifications coupled to their services.