Unveiling success through platform engineering
Platform Engineering Primer, Part 1. The first in a series of articles exploring how developer platforms benefit modern businesses.
Join me in examining the intricacies of platform engineering. Let's take it apart and find the effective strategies, the most practical ways of working and the technologies that fit your needs and circumstances. Above all, I want to explore how internal developer platforms are the gateway to modern enterprise success.
Since this is the beginning of our journey, I feel the need to say a little bit about myself and what motivates me to write. After all, some of you may be sticking around for the whole series, so it's only fair that I introduce myself, right?
I've spent seven years immersed in the world of platform engineering. From embracing new technologies and building teams to refining workflows and streamlining tech stacks, it's been quite a ride. I've had my fair share of tough moments and sub-optimal decisions that have almost left me and my clients sinking in technological quicksand! But the thing is, we've always managed to stay afloat. These experiences have given me a rather unique perspective. They have sharpened my critical thinking and strategic skills and enabled me to see many of the pitfalls and opportunities that lie ahead.
Working with brilliant minds across various sectors has broadened my perspective. I am unwilling to keep the knowledge I've acquired to myself. My objective has always been to use technological advancements to enhance our lives rather than let them get in the way. Platform engineering offers a unique opportunity to do just that, which is why I write these articles and eagerly await your feedback.
One final note: I'm all about keeping it simple. I write in plain, everyday language because I want this to be a smooth ride for everyone. My goal is to make sure that even if you're not knee-deep in technical stuff, you can still dive into this series and come out with something useful.
So, with that out of the way, let’s begin.
Where are we and what got us here?
It’s as good a place to start as any ;)
Well, it's safe to say that DevOps is pretty much the standard these days. Most of us in the industry have jumped on the bandwagon.
DevOps came about as a response to the growing burden on engineering teams. It's been the method of choice lately, with the whole "you build it, you run it" concept becoming the norm. The only catch is that it hasn't scaled as seamlessly as we'd hoped. Unfortunately, it requires specialised skills that are difficult and expensive to find. The bigger the organisation, the more teams duplicate time and effort by doing the same thing in different ways. In response to this problem, organisations are increasingly focusing on engineering enablement as a way to increase developer productivity. They're providing the right tools, streamlining processes, and diving into automation and self-service offerings. And guess what? Internal developer platforms are coming into play to make the whole cloud journey smoother, especially at scale. It's about giving developers the freedom to deploy and manage software without breaking a sweat.
Developer platforms streamline existing processes and reduce time to production. They help organisations address the challenges of infrastructure complexity, which has continued to grow under the DevOps approach.
You may disagree, and that’s OK ;)
If you have a different opinion, or feel that I have missed something important, please leave a note in the comments. It's my aim to answer every question people leave there.
What if you look at it from a different angle? You might think of platform engineering as a niche hobby that only engineers and techies really understand. But what about the real business impact? And how can those of us with technical backgrounds make these potentially valuable results crystal clear to everyone in our organisations?
Well, fear not! There is a section in the middle of this article containing the answers to these questions and more…
What is the cost of not having a developer platform in place?
In what ways do developer platforms benefit organisations that implement them?
What are the potential pitfalls?
How do we communicate all of this clearly?
If you want to jump to that section, scroll down to “Is platform engineering worth the investment?”
Otherwise, let’s continue by looking at the state of platform engineering today.
Platform engineering: the current stage of evolution
It was a great moment in my career: we implemented a centralised platform that reduced lead time to production from around six months to less than two weeks. In other words, we cut project ramp-up time by more than an order of magnitude: six months is roughly 26 weeks, so the improvement was at least 13 times.
This approach has huge implications for development teams, who have traditionally struggled in this area due to a lack of time or expertise to deal with the burden of a modern cloud infrastructure.
The split between time spent on infrastructure and applications will vary depending on team structure and organisational expertise. It's always been a trade-off, as it's increasingly common for teams to spend more time managing infrastructure than developing new features. Organisations that adopt a platform engineering approach can dramatically free up delivery teams to focus on their core business, while leaving the configuration and management of the infrastructure in the reliable hands of dedicated experts. This is illustrated in the diagram below.
As you can see, with a platform engineering approach, infrastructure-related work becomes more predictable and there is far less variance caused by unexpected activities.
When it comes to the actual data, platform engineering has been classified as an innovation trigger in the Gartner Software Engineering Hype Cycle. The chart below shows a two-to-five-year horizon for platform engineering to emerge and be fully accepted by the mainstream market. Beyond the fevered lens of hype, platform engineering is starting to be taken seriously for its direct benefits and enabling qualities. This shows that organisations are aware of the need to improve developer efficiency and address the increasing cognitive load of engineering.
Building a centralised platform is also a viable option as part of a company's digitalisation journey. Modernising existing IT systems and adopting a new engineering mindset on a large scale is difficult. That's why you need some level of consolidation and a central place to facilitate change. For me, platform engineering has always been an impetus to change culture and processes first, and technology second. When we talk about digital transformation, this may be a more tangible implementation to consider.
Platform engineering emerges
Let's take a look at two problem statements that express the concerns that eventually led to the emergence of platform engineering. These should be familiar to you if you're in a digitally intensive business.
The statements come from a top 10 global grocery and general merchandise retailer that processes thousands of transactions per second, supports the delivery of more than 1 billion items per year, has nearly 200 development teams, and an annual European turnover in excess of £50 billion.
"It takes about two months of initial effort to set up your own infrastructure. In addition, it takes around two weeks of effort per month from an experienced DevOps engineer to maintain the infrastructure."
"Development teams reported up to three weeks to complete day one activities such as onboarding, setting up access and CI/CD before they even start coding."
These concerns, echoed by many organisations, have led to wider adoption of cloud-native technologies and modern development practices, improving time to market. The idea of building internal developer platforms came from the same desire for change.
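To get a feel for what the first statement above means at scale, here is a rough back-of-the-envelope sketch in Python. The setup and upkeep figures come from the quote; the four-week month, the 12-month window and the function names are my own simplifying assumptions, so treat the output as an illustration rather than a measurement.

```python
# Back-of-the-envelope estimate of do-it-yourself infrastructure effort.
# Assumption: a month is treated as 4 working weeks for simplicity.
WEEKS_PER_MONTH = 4


def team_effort_weeks(months_running: int,
                      setup_months: float = 2.0,
                      upkeep_weeks_per_month: float = 2.0) -> float:
    """Engineer-weeks one team spends on setup plus upkeep of its own infrastructure."""
    setup_weeks = setup_months * WEEKS_PER_MONTH
    upkeep_weeks = upkeep_weeks_per_month * months_running
    return setup_weeks + upkeep_weeks


def organisation_effort_weeks(teams: int, months_running: int) -> float:
    """Total engineer-weeks if every team builds and runs its own infrastructure."""
    return teams * team_effort_weeks(months_running)


if __name__ == "__main__":
    print(team_effort_weeks(12))               # one team, 12 months of running
    print(organisation_effort_weeks(200, 12))  # hypothetical: 200 teams doing the same
```

Under these assumptions, a single team spends 32 engineer-weeks on infrastructure over 12 months of running, and 200 teams working the same way would spend 6,400 between them.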
How are internal platforms for developers actually built?
When it comes to building internal platforms for developers, let's look at two scenarios that companies often find themselves in.
Platform engineering: an evolutionary approach
In 2016, three colleagues and I embarked on a new client engagement. Our main goal was to implement a new retail fraud detection service. Shortly after the start of the engagement, we noticed that the developers in this area were struggling with the stability of the cloud infrastructure, resulting in frequent accidental downtime. Each week, the Technical Programme Manager (TPM) would present a spreadsheet detailing the revenue loss associated with service interruptions. Thousands of pounds were being lost each time.
We assessed the current situation and, rather than build yet another application, we began to build a shared infrastructure for the five teams in this area. The cloud-native technology landscape was gaining traction. It was the perfect time for a bit of innovation. We launched the first instalment of self-hosted Kubernetes on AWS, creating a shared environment that supported full infrastructure automation, data storage, secure access and monitoring. After this had been in production for a year, the engineering director responsible for the company's global infrastructure took notice. He had previously struggled to comply with regulations and manage the costs associated with the public cloud. To address these challenges, we created our first platform engineering group. Over the next year, we scaled our solution to support hundreds of developers.
I'd say this is the most common scenario for how companies build platforms. It starts small: a few engineers build a solution that multiple teams adopt. Over time, they come up with the idea of sharing it with the rest of the company. This is a perfect example of how real innovation can come from a handful of people working under some form of constraint. A key consequence of this approach that we've seen is that it's hard to scale your existing platform to a larger number of teams because it was never designed to do so, both in terms of architecture and support.
Platform engineering: a top-down approach
The second scenario is more top-down: a company's leadership recognises that it is facing challenges with its current operating model and decides to make way for a new platform. The problems they see are typically:
Inefficient infrastructure lifecycle management, leading to unintended downtime or teams struggling to keep up with upstream changes.
Long lead times to production, where the infrastructure takes longer to spin up than expected because of manual hand-offs or time-consuming technology integrations.
Compliance and security gaps that organisations find difficult to close quickly, because it takes time to coordinate changes and fine-tune standards.
High infrastructure operating costs, which push engineering teams to make suboptimal architecture or technology choices just to stay under the radar.
From what I've seen, most companies facing these challenges don't have the technical capability to build an internal platform. Naturally, they start looking at off-the-shelf products, most of which are not a good fit for the specific way their organisation works. This is particularly true in terms of existing processes, integrations, ways of working or the level of customisation required. In fact, most companies, especially larger ones, are still building their own internal platforms.
The way I see it, the issue of building vs buying platforms is part of a larger discussion that will be covered in a separate article.
The transition from DevOps to platform engineering
Taking a holistic view of how organisations can move from a DevOps approach to platform engineering, the steps could be as follows.
Question expectations. Invent something new without insisting that it meets outdated or ineffective standards. Do it within a team or smaller group, as doing it on a large scale straight away is tough. Sometimes, shadow projects become the new normal. You never change things by fighting the existing reality. Instead, build a new model that makes the old one obsolete.
Introduce automation by standardising the technology stack so it can scale later. Do it in a single team and focus on fixing bugs. Once things are stable, you can extend it to other teams in the same area and across the organisation.
Distribute automation, provide code and usage guidance, and support developers as contributors. Build a community of developers around your solution to scale it exponentially. Many companies engage their engineers in collaborative work using the inner source approach.
When you are confident in the current design, create reference architectures and have them tested by multiple teams. Provide a baseline implementation for other teams who are not domain experts. To make it work for your organisation, think about the overall structure and design, and take into account any known trade-offs.
Offer a managed platform to help developers improve their release management and host workloads. Make it secure and reliable, and build it on top of the existing solution. Create a dedicated team of experts to run the platform and support other engineers.
Popularise best practices by sharing documentation, samples and success stories. If budget allows, consider forming a dedicated team to help other teams use the platform and solve problems. This team does not need to maintain a separate backlog, as it can embed itself in other groups as a service. Read more on this in Team Topologies by Manuel Pais and Matthew Skelton.
Is platform engineering worth the investment?
Let's briefly look at the benefits and returns you can expect from investing in a platform engineering approach.
Decades ago, a few bigger technology companies identified the critical factors:
Self-service developer UIs
The right level of abstraction
A dedicated team to manage everything as a platform.
This approach enables developers to build, deploy and operate applications more efficiently and with higher quality.
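To illustrate what "the right level of abstraction" can look like in practice, here is a minimal Python sketch. A developer supplies only a handful of fields, and the platform expands them into a full deployment specification using its own defaults. Every name here (`ServiceRequest`, `expand_request`, the size presets and the default values) is hypothetical and not taken from any particular product.

```python
from dataclasses import dataclass

# Hypothetical size presets owned by the platform team.
SIZE_PRESETS = {
    "small": {"cpu": "250m", "memory": "256Mi", "replicas": 2},
    "medium": {"cpu": "500m", "memory": "1Gi", "replicas": 3},
}


@dataclass(frozen=True)
class ServiceRequest:
    """The only things a developer has to decide."""
    name: str
    image: str
    size: str = "small"


def expand_request(request: ServiceRequest) -> dict:
    """Turn a short developer request into a full spec using platform defaults."""
    if request.size not in SIZE_PRESETS:
        raise ValueError(
            f"unknown size {request.size!r}; choose from {sorted(SIZE_PRESETS)}"
        )
    spec = {
        "name": request.name,
        "image": request.image,
        # Illustrative defaults the platform applies for everyone.
        "monitoring": True,
        "run_as_root": False,
    }
    spec.update(SIZE_PRESETS[request.size])
    return spec


if __name__ == "__main__":
    request = ServiceRequest(name="fraud-detector", image="registry.example/fraud:1.0")
    print(expand_request(request))
```

The design choice worth noticing is that the developer never touches CPU, memory or security settings directly. Those stay with the dedicated platform team, which can change a preset once and have it apply everywhere.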
And here's the bottom line: Companies that nurture their developers perform five times better. The main reason is that it's increasingly common for teams to spend more time managing infrastructure than building new features. Take a look at the chart below, which clearly shows this performance gap.
By adopting a platform engineering approach, organisations are successfully bridging the gap between infrastructure and development teams, resulting in increased collaboration and productivity. Employees have a sense of a broader developer community, which boosts their agency. It also helps to retain key talent.
Ensure the platform engineering initiative has a sound footing
There are two fundamental questions that need to be answered before committing to a platform engineering initiative.
First, consider whether platform engineering is the right approach for your organisation based on the challenges you face:
The size and complexity of your business. Would economies of scale benefit your organisation? Would centralising resources create efficiencies?
Operational excellence. Are service outages, undetected errors or problem resolution times an issue in your organisation?
Goals and direction. What do you want to achieve? Standardised infrastructure? Improved security? Reduce time to production? Optimise cloud spend?
Current operating model. Are there differences in the way teams within your organisation approach software quality, security or deployment? Do your teams face bottlenecks or slowdowns due to a lack of standardised tools or processes?
Let's face it. In some cases, the do-it-yourself approach is better, costs less and gives your developers the freedom of choice that really works for them. Putting people through another migration is not cheap either. I've seen many examples of platform development being treated as a pet project and failing because the platform wasn't built for clear reasons. As a result, the platform team lacked direction and platform adoption was disappointing. At the other extreme, I have seen heavy and opinionated platforms that limited rather than empowered users. I have also seen platforms scale to meet growing usage simply by adding operational staff, many of whom lacked a software engineering mindset, which meant that the initiative was not successful in the long term.
Second, make sure you hire people who can execute the vision of the initiative. If you're considering building a platform, it's highly likely that your organisational routines and existing processes will inadvertently cause it to fail. Don't give up! Building a platform at scale requires people with multidisciplinary skills and a very different mindset from doing it on a smaller scale. These people need to be passionate about solving other people's problems, customer-focused and not afraid to take responsibility for running a large number of services in production under some form of SLA, not to mention streamlining existing processes and tackling drudgery and friction. Identify the go-to people in your organisation. You know who they are: everyone wants their time.
Platform engineering is an organisational effort that requires significant up-front investment. It must be approached strategically to realise its value. It typically takes up to a year to start seeing tangible results. To deliver the results, you need to have a consistent set of guiding principles for the platform, as well as executive buy-in. This is where a mission, vision and roadmap come into play.
Now that you've identified a solid opportunity for your company, you'll probably want to talk to your colleagues to get them on board, and help them understand what's at stake. The next section will help you prepare for this.
Communicating your platform’s mission and goals
A mission statement helps to guide our day-to-day decisions and actions, from making architectural changes and shaping the product roadmap to designing end-user interactions.
When it comes to platform engineering, I think it’s fair to say that most platforms will share a basic mission statement similar to this:
We aim to make software delivery easier and quicker for developers by offering a complete and standardised environment that's also secure, compliant and cost-effective.
The commonality of this mission doesn't mean it's not valuable. Each platform must meet the objectives of cost, reliability, security, technology and developer efficiency in slightly different ways. Managing the platform requires a delicate balance between the needs of the developers and the realities of the organisation.
There are many factors that go into balancing these goals. Let's examine the relationship between these goals in the context of creating a mission statement for your platform.
Keeping costs under control is the most important thing in most cases, especially today, as we all face an economic downturn and almost every company has started to look at cost optimisation. Some of them are trying to control public cloud costs, while others are thinking about cloud repatriation and moving workloads back on-premises. In both cases, a well-designed platform can reduce costs by improving resource utilisation, limiting software licences, reducing accidental downtime, or creating shared services that can be used by multiple applications or teams. It is important to identify the key cost drivers in your organisation and to include them in your mission statement or goals.
Ultimately, we all want to run production services in a secure and reliable environment that meets SLA requirements. However, it's often the case that developers don't keep the infrastructure up to date. Lifecycle management is a complex process and can lead to downtime. As a result, Mean Time to Recovery (MTTR) becomes less predictable. All of these issues lead to production incidents. Having a dedicated team of experts to keep the lights on definitely helps to mitigate the risk.
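For readers who have not worked with the metric, MTTR is simply the average time between an incident starting and service being restored. Here is a minimal sketch in Python, using made-up incident timestamps and a hypothetical `mean_time_to_recovery` helper:

```python
from datetime import datetime, timedelta


def mean_time_to_recovery(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    """Average of (recovered - started) across all incidents."""
    if not incidents:
        raise ValueError("need at least one incident")
    total = sum((recovered - started for started, recovered in incidents), timedelta())
    return total / len(incidents)


# Made-up data for illustration only: (started, recovered) pairs.
INCIDENTS = [
    (datetime(2023, 1, 5, 9, 0), datetime(2023, 1, 5, 9, 30)),     # 30 minutes
    (datetime(2023, 2, 11, 14, 0), datetime(2023, 2, 11, 16, 0)),  # 2 hours
    (datetime(2023, 3, 2, 22, 15), datetime(2023, 3, 2, 22, 45)),  # 30 minutes
]

if __name__ == "__main__":
    print(mean_time_to_recovery(INCIDENTS))  # 1:00:00
```

Notice how the single two-hour outage doubles the average compared with the other two incidents. That sensitivity to outliers is one reason MTTR is hard to predict when lifecycle management is handled ad hoc.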
Most developers are not security experts. Home-grown infrastructure (network, access rights, workloads) is not continuously tested for info-sec requirements. With platform engineering, every workload can be compliant at a non-functional level from day one. With platform engineering, it's much easier to introduce new policies and compliance requirements without disrupting the user experience, and it doesn't require as much effort as a do-it-yourself approach.
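One way to picture "compliant from day one" is a set of baseline checks that the platform runs against every workload before it is admitted. The sketch below is a simplified illustration in Python; the rule names, the workload fields and the `check_workload` function are assumptions made for this example, not a real policy engine.

```python
from typing import Callable

# Each hypothetical rule maps a readable name to a predicate over a workload description.
# A missing field is treated as non-compliant, so teams must state their intent explicitly.
BASELINE_RULES: dict[str, Callable[[dict], bool]] = {
    "must not run as root": lambda w: not w.get("run_as_root", True),
    "must declare an owner team": lambda w: bool(w.get("owner")),
    "must not expose a public endpoint without TLS": lambda w: (
        not w.get("public", False) or w.get("tls", False)
    ),
}


def check_workload(workload: dict) -> list[str]:
    """Return the names of the baseline rules the workload violates."""
    return [name for name, passes in BASELINE_RULES.items() if not passes(workload)]


if __name__ == "__main__":
    compliant = {"run_as_root": False, "owner": "payments", "public": True, "tls": True}
    risky = {"owner": "", "public": True}
    print(check_workload(compliant))  # []
    print(check_workload(risky))      # all three rules are violated
```

Because the rules live in one place, introducing a new policy is a change to the platform rather than to every team's home-grown infrastructure.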
Technological advancement has always played a significant role in building a competitive advantage in the marketplace. It's fair to say that platform engineering facilitates the adoption of new techniques, helps keep pace with emerging technologies and brings new team topology patterns. However, organisations need to be prepared for such drastic changes. Often, new practices can be introduced without considering the time and effort required by people to integrate them into existing processes and working habits. That's why defining the enablement plan is one of the key success factors of platform engineering. This involves a lot of work to raise awareness of the value of platform engineering, to train people and to show them how to navigate a new landscape.
When it comes to efficiency, bridging the gap between infrastructure and application space is becoming a top priority. To give you a more tangible perspective, the platform provides the necessary interfaces and processes for developers to serve themselves and work in a consistent way, which ultimately reduces cognitive load in engineering.
By now you probably have a mission statement in mind that fits your business pretty well. Before we go any further, let's look at two examples from my own experience to keep the inspiration flowing.
"Our mission is to converge disparate infrastructure solutions to improve our technology efficiency, economies of scale, technology portability and security.
To do this, we're creating an opinionated, standardised, enterprise-integrated architecture and sharing it using internal open source software. We are also hosting this infrastructure for teams and providing infrastructure operations support for teams that have traditionally struggled in this area."
"Building a deployment platform for production and non-production environments that spans multiple cloud providers and data centres, freeing engineering from day zero and day one activities to deliver solutions for customers and developers, and making it a joy to use."
How do internal developer platforms mature over time?
Based on my own journey in platform development, it's pretty clear how platforms come to life, grow and eventually find their footing. From my perspective, the patterns of platform evolution tend to follow a familiar path. Building platforms often means encountering the same iterations over and over again.
People start to realise that the problem exists and needs to be addressed. The do-it-yourself approach is no longer scalable or cost-effective. Organisations begin to explore the problem space, either on their own or by engaging external consultants.
This phase involves discovery and interviews with teams and their managers. The result is a synthesis of the existing tech stack, identification of bottlenecks and friction points in day-to-day operations, and recommendations on how to address them.
Mission and vision
This is the time when product and engineering work together to build a strong case for why senior management should invest in the platform engineering idea. This usually results in full executive buy-in or enough funding to build the prototype.
Mission and vision are treated as guiding principles during the platform development process. Having clarity on this helps to push back unnecessary work and avoid the situation where the platform is "all things to all people".
Now comes the execution. The platform engineering team starts with the initial architecture design, technology choices and documentation of the concept model, which can be validated with technical leadership. This results in a prototype with a limited set of capabilities to demonstrate value and trigger thinking about actual deployment. Typically, if successful, this is enough to release funding for the next project milestone.
From an engineering perspective, this phase focuses on overcoming scaling issues, implementing key capabilities and achieving higher maturity in terms of processes and non-functional aspects such as disaster recovery, incident management, security or documentation. Most of the gaps and friction points are addressed during this time by working closely with the application teams.
From a leadership perspective, this is the time to start assessing the value of the platform, typically based on the number of teams or applications on board. The financial viability of running a platform engineering team is also often explored.
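Here is a minimal sketch of the kind of calculation leadership tends to ask for at this stage. The function names and all input figures are hypothetical and serve only to show the shape of the calculation.

```python
def adoption_rate(teams_onboarded: int, teams_total: int) -> float:
    """Share of teams that have been onboarded to the platform."""
    if teams_total <= 0:
        raise ValueError("teams_total must be positive")
    return teams_onboarded / teams_total


def cost_per_onboarded_team(annual_platform_cost: float, teams_onboarded: int) -> float:
    """Annual platform cost spread across the teams that actually use it."""
    if teams_onboarded <= 0:
        raise ValueError("no teams onboarded yet")
    return annual_platform_cost / teams_onboarded


if __name__ == "__main__":
    # Hypothetical inputs: 40 of 200 teams onboarded, platform costs 2,000,000 a year.
    print(adoption_rate(40, 200))
    print(cost_per_onboarded_team(2_000_000, 40))
```

With these made-up inputs the adoption rate is 20% and the cost per onboarded team is 50,000 a year. Tracking both over time shows whether the platform is becoming cheaper per team as adoption grows.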
Eventually, the nature of the platform engineering work becomes more operational. The platform engineering team keeps the lights on, and does no substantial product work. Sooner or later, the platform is seen as old, part of a routine or less strategic. There are several reasons why platforms become contained. The most common are:
Leadership decides to build something new to keep up with the ever-changing technology landscape
Mismatch with current strategy
The platform is too costly to operate or adoption is declining.
Once the platform is contained, the natural next step is to plan for decommissioning. This phase is mainly focused on migrating the platform and untangling dependencies, which always takes longer than originally planned. The work involves coordinating and discussing migration timelines and, in some cases, supporting technology integration activities. Finally, everything is removed and the platform no longer exists.
Building an internal platform for developers is a rollercoaster!
Platform development is a very different beast from your typical application development project. It can be quite a bumpy ride, dealing with existing bottlenecks and resistance. Trying to automate these clunky, manual processes to deliver a smooth user experience is a real test, not least because people in the organisation tend to raise their shields when they encounter the unfamiliar.
Your success will depend on others, and you'll work hard to create the visibility needed to drive adoption of your platform, often failing to convince some very opinionated people. You will gain trust by helping other engineers through difficult situations. This will often take you out of your comfort zone. It is about understanding a problem space beyond the boundaries of the platform itself, which may include technologies in use, integrations, processes, custom code, and so on.
Defining the boundaries of the platform and diplomatically pushing back on incompatible ideas will be inevitable and difficult, so there will always be a risk of scope creep.
Now, let's look at some of the pitfalls you and your colleagues are likely to encounter and should be aware of.
Platform engineering pitfalls
While it's easy to come up with success stories that are truly dazzling, let's not overlook the importance of critical thinking in delivering a healthy new platform. The following pitfalls can act as useful reminders of critical thinking’s value.
Platform engineering is not about optimising problems that should not exist in the first place. Inefficient processes or broken services need fixing at the source of the problem. For example, you can't fix underlying DNS resolution issues by shifting them to the platform engineering team. Doing this leads to suboptimal platform design, poor developer experience and wasted engineering potential.
Choosing the right time to build a platform is an important decision. If the technology stack isn't standardised enough, or the current operating model hasn't been fully assessed, it's too early to build a platform. If most teams have already modernised their infrastructure or recently completed migration efforts, it's too late. In either case, there's a risk that a newly built platform won't solve the right problems or will be too expensive to deploy.
Multiple platform engineering initiatives often lead to internal competition and politics. People tend to protect their castles. It's an unhealthy situation to have two platforms competing for the same user base. To address this, create a robust workload allocation process. If you still face the complexity of running multiple platforms, there is an excellent opportunity to specialise each platform engineering effort around a distinct requirement profile, for example public versus private cloud.
So, why do successful companies invest in platform engineering?
Without platforms, every team ends up solving the same problems independently, over and over again. This inefficient approach is all the more likely when you consider that no team can afford to spend a lot of engineering time collaborating with other teams when their focus is on delivering business value.
The other key observation in today's world is that organisations are starting to develop true hybrid strategies to support the ever-growing data and number of applications, and to optimise previously overlooked cloud spend.
Platform engineering is the solution for efficiently running mission-critical enterprise software in production without compromising the pace of innovation.
As digitally-intensive organisations explore the promise of platform engineering, they begin to see the broader implications for their entire business. They see how internal developer platforms and the agility they bring act as a gateway to modern business success.
Now that you've read this article, I hope you have a more complete view of the platform engineering landscape and its advantages. Everything in this article is based on my own experience and reflects my individual point of view, which is why I want to ask you:
What's your perspective on platform engineering?
Do you have any interesting real-life stories to share?
What new dimensions did you discover while reading?
Feel free to write to me directly or leave a comment here. I will make sure all your comments and questions receive a thorough response.