- Oleksandr Sytnik
- 5 min
We are a bank, and we strive to be a technology company with a banking license. Our engineering teams are empowered and have a lot of freedom in making their daily decisions. We have managed to conquer and rule our continuous delivery and deployment processes. How can we gain more control over our releases? That is the question we dive into in this blog post, where we share a few of the key results of this exploration.
Engineering culture in action
We had another edition of our Engineer’s week in October 2021. It was an action-packed week with a number of great presentations spanning technical and non-technical lanes, inspiring keynotes, lots of knowledge sharing and lots of fun. The takeaway message was loud and clear: we want to be a technology company with a banking license! We want to code a better world together!
The controlled rollout workgroup
Nevertheless, we cannot be just another technology company with a banking license.
We are a bank. And we need to comply with regulatory requirements.
We are a bank. And we need to provide quality services to our customers.
We are a bank. And we are secure and compliant, but we also want to be fast and competitive.
This year, we set up the controlled rollout workgroup with the goal of raising awareness within teams of the great power and great responsibility that development teams are entrusted with. We aimed to provide developers with insights into monitoring, observability, best practices and tools for creating resilient and compliant software. In addition, we wanted to facilitate collaboration between teams within the scope of the controlled rollout process.
We held a couple of community sessions where we talked about how teams can set up a controlled rollout with feature flags and canary releases, and how they can get additional insights into their software by using advanced observability solutions and predictive monitoring.
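To make the feature-flag idea concrete, here is a minimal sketch of a percentage-based rollout check. It is not the setup discussed in the sessions and not the API of any feature-flag product; the function names `bucket` and `is_enabled` and the flag name `new-dashboard` are hypothetical. The sketch assumes that each user can be identified by a stable string ID, which is hashed into one of 100 buckets so that the same user always gets the same answer.

```python
import hashlib


def bucket(user_id: str, flag_name: str) -> int:
    """Map a user to a stable bucket in the range 0..99 for a given flag."""
    # Hashing flag name and user ID together gives each flag its own
    # independent split of the user base.
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def is_enabled(user_id: str, flag_name: str, rollout_percent: int) -> bool:
    """Return True if the user falls inside the current rollout percentage."""
    if not 0 <= rollout_percent <= 100:
        raise ValueError("rollout_percent must be between 0 and 100")
    return bucket(user_id, flag_name) < rollout_percent


if __name__ == "__main__":
    # Hypothetical usage: expose the feature to roughly 10% of users.
    users = [f"user-{i}" for i in range(1000)]
    enabled = sum(is_enabled(u, "new-dashboard", 10) for u in users)
    print(f"{enabled} of {len(users)} users see the new feature")
```

Because the bucket of a user never changes, raising the percentage only adds users to the exposed group, and setting it to 0 switches the feature off for everyone without a new deployment.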
Later, we conducted a series of interviews with engineers, solution architects, and business analysts. We identified the following ideas that would help to achieve our goals:
- Reduce the human factor through automation
- Reduce the impact of changes by making them more atomic
- Reduce the impact of changes through controlled rollout
- Prefer partial degradation of services over complete denial
- Develop with resilience in mind
- Leverage predictive analysis and AI-assisted operations
What is the best way to encourage a group of engineers to unleash their creativity? A hackathon! We prepared the hackathon with a focus on the following themes:
- Automatic rotation of keys and certificates
- Rollout with automatic rollback based on automatically collected metrics and user feedback
- Data-driven decision making and AI-based anomaly detection
- Usage of split.io for experimentation and controlled rollout
- NexusIQ: automated decision making and approval process
In addition, there was a free lane allowing teams to come up with their own ideas.
More than 50 engineers from multiple squads in two areas (Digital Platform Business and Consumers) participated in the event. The hackathon started with an engaging kick-off session on Thursday morning, ran over the weekend and ended with a grand finale on Tuesday morning.
Quite a few solutions were presented during the grand finale session, ranging from proofs of concept to almost production-grade solutions. The ideas that we exchanged are going to set the stage for some really valuable changes in the way we roll out our releases. This is just the beginning!
The winning entry: rollout with automatic rollback
We use continuous integration (CI) and continuous delivery (CD). The CI part is completely automated, but the CD part of our current CI/CD pipeline requires manual approvals by production environment owners, enforcing the four-eyes principle. This step is a safeguard and a formal requirement for delivering new functionality to the production environment. The CD part also requires verification and manual decision making to scale up the traffic. The winning team simplified this process by implementing a release pipeline with automatic rollback, so that some of the manual approvals can be removed without introducing additional risk.
By propagating the relevant metrics to SignalFx and configuring a detector there that feeds back into the pipeline, we can automate scaling decisions for any new release. It is also possible to use an AI-driven anomaly detection system, which is already being used by a couple of teams. This system combines a hierarchical model with a Vector Quantized Variational Autoencoder (VQ-VAE), which will be the subject of another Tech blog article, so stay tuned!
The best thing about the proposed solution is that it uses a technology stack that is already available (SignalFx, Java, Spring, Azure pipeline libraries) and widely used by many teams. It was also provided as a template, which makes it readily usable by other teams.
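The feedback loop described above can be sketched roughly as follows. This is an illustrative model only, not the winning team's implementation and not the SignalFx or Azure Pipelines API. The names `run_rollout`, `set_traffic` and `read_error_rate` are hypothetical: `set_traffic` stands in for whatever shifts traffic to the new release, and `read_error_rate` stands in for the signal that a detector would send back to the pipeline.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RolloutResult:
    succeeded: bool
    final_percent: int
    history: List[str] = field(default_factory=list)


def run_rollout(
    steps: List[int],
    set_traffic: Callable[[int], None],
    read_error_rate: Callable[[], float],
    max_error_rate: float,
) -> RolloutResult:
    """Increase traffic step by step; roll back as soon as the metric degrades."""
    history: List[str] = []
    for percent in steps:
        set_traffic(percent)
        history.append(f"shifted to {percent}%")
        # A real pipeline would wait here for a bake period before
        # evaluating the metric; the sketch reads it immediately.
        error_rate = read_error_rate()
        if error_rate > max_error_rate:
            # Automatic rollback: send all traffic back to the old release.
            set_traffic(0)
            history.append(
                f"rolled back: error rate {error_rate:.3f} > {max_error_rate:.3f}"
            )
            return RolloutResult(False, 0, history)
    return RolloutResult(True, steps[-1] if steps else 0, history)


if __name__ == "__main__":
    # Hypothetical run: the third step pushes the error rate over the limit.
    observed = iter([0.001, 0.002, 0.200])
    result = run_rollout(
        steps=[10, 50, 100],
        set_traffic=lambda p: None,
        read_error_rate=lambda: next(observed),
        max_error_rate=0.05,
    )
    print(result.succeeded, result.history)
```

Passing the traffic switch and the metric source in as plain callables keeps the decision logic independent of any specific monitoring or deployment tool, so the same loop could be driven by a simple threshold or by an anomaly detection system.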
Key advantages of using an automated solution for the rollout process are:
- Reducing manual verification in production and making production deployments hassle-free;
- Reducing outage time;
- Automating threat detection based on metrics during deployment.
What is next?
Armed with all these cool, innovative ideas, we are now encouraging teams to continue working on a couple of selected solutions for their own features and to bring them to production. As a community, we will keep striving to get more teams involved and to generate more ideas.
Overall, we may conclude that our first controlled rollout hackathon was a success! We would like to thank all the teams for their enthusiastic participation in the hackathon.
About the author
Oleksandr is a software engineer with over 16 years of experience in the IT industry. He works on a team that is responsible for providing business customers with actionable insights into their transactions. He has a passion for automation and AI/ML technologies, and a keen interest in bridging the gap between maximizing engineering velocity and addressing cyber security concerns.