Navigating platform engineering: the big picture

Platform Engineering Primer, Part 5. The fifth in a series of articles exploring how developer platforms benefit modern businesses.

May 08, 2024

Join me in examining the intricacies of platform engineering. Let's take it apart and find the effective strategies, the most practical ways of working and the technologies that fit your needs and circumstances. Above all, I want to explore how internal developer platforms are the gateway to modern enterprise success.

In our day-to-day work, we may interact with people who don't have platform engineering expertise. Or you or I may find ourselves in a situation where we are about to join a platform engineering initiative without a sufficient understanding of the subject. How do we navigate this space so that we do not slow down the engineering team or even unconsciously sabotage their efforts?

I remember several times when it was almost impossible to conclude a discussion because the people leading the project lacked a sufficient understanding of the specific domain. We wasted a lot of time talking about basic concepts that everyone could have been briefed on earlier, yet this wasn't made part of the plan. People tended to rely on their initial viewpoints rather than considering broader perspectives. This often resulted in frustration, inefficient use of time, and eventually a lack of trust. A more structured approach could have brought about a collaborative and trusting environment.

As in any field, it takes time to develop the right judgement to contribute in a meaningful way. Platform engineering is a relatively new concept, so we often need to educate people about it. I strongly believe that platform engineering success is directly related to how close key people in the organisation are to core work at the beginning of the platform engineering journey. While getting these people on board, upgrading your own orientation within the platform engineering landscape can help immensely.

Let’s admit it: when it comes to building an internal platform for developers, there are many approaches, and none of my articles can fully convey what it will really take in your context. However, I do see value in giving you a mental model to use while implementing an internal developer platform. This is something I find essential and use when educating others around me.

If you are an engineering manager, architect or team leader, my work is intended to help you in this educational endeavour. This article will be useful when you want to:

Get everyone in your organisation on the same page
Improve your clarity so you can get others on board
Gain an understanding that allows you to make a meaningful contribution to the initiative

Finally, if you are still exploring whether or not platform engineering is a good fit for your organisation, have a look at my earlier article, Unveiling success through platform engineering, which explores this topic.

Let's start by familiarising ourselves with some basic concepts.

Key concepts around platform engineering

Building a clear mental model around platform engineering is an essential first step which brings confidence and enables people to contribute. Every platform will have a different set of priorities, architecture considerations, and unique people behind it. Personally, I start by understanding basic concepts and then gradually zoom in on specific areas that are closely related to the work itself.

What is platform engineering?

How platform engineering is understood is highly contextual and depends on perspective: adopting a platform as a product, using platform engineering services, or developing a platform within the company. As organisations perceive it in slightly different ways from each other, here is the interpretation I use in my daily work as Head of Cloud Engineering at VirtusLab.

Platform engineering involves designing and building toolchains and workflows that enable self-service capabilities for teams of software developers. The aim is to deliver an integrated product, commonly known as an 'internal developer platform', that provides the operational requirements for applications throughout their lifecycle.

Often building on cloud capabilities, platform engineering initiatives are there to support organisations throughout the lifecycle of their platform. The platform engineering team takes a holistic approach to ensure that the product is aligned with the business strategy and objectives. They also make sure they fully understand the organisation's operating model, the technology landscape, and the people who deliver the software the business depends on.

Building an engineering team to design the platform and deliver the capabilities can take several forms, depending on your priorities: improved developer experience and productivity, efficiency in building data-intensive applications, faster time to market, cost optimisation, security, or all of the above.

The typical objectives of a platform engineering initiative are:

Centralise infrastructure and drive adoption of standard architectures
Facilitate full compliance in a dynamic security and regulatory environment
Reduce time to production by eliminating developer friction points
Control costs, encourage code reuse, and replace multiple 'do-it-yourself' solutions

Let's acknowledge the fact that each interpretation has its virtues and limitations. Please note this is the perspective of a software engineering services company. It will be slightly different from the perspective of product vendors and end-user organisations.

Internal developer platform architecture

From a 10,000-foot view, most internal developer platforms consist of similar areas. The diagram below shows just that.

To effectively navigate platform engineering, it’s best to understand these concepts as your starting point:

Platform boundary: This is a fundamental aspect of platform engineering that determines how far the platform extends: what’s in scope and what’s out of scope. Boundaries are typically formed around external systems and domains, such as compute, storage, deployment tooling, observability or security tooling. This is also a good indication of the problem space and how much the platform engineering team needs to take on their shoulders in terms of maintenance. The platform boundary is sacred and should be protected to avoid scope creep. It’s worth asking yourself: What’s the approach to defining platform boundaries? Is it based on the team's skills or organisational topology? What’s the level of alignment / overlap with the rest of the organisation?
Operating model: This explains how developers work in the organisation. In platform engineering, it’s usually presented in a self-service fashion using web portals, direct API calls, or developer tooling. It’s important to understand how many deployment and lifecycle activities are handled as part of the platform, and what the application team is responsible for (this is called the “shared responsibility model”). It also speaks to the level of autonomy vs control. A good understanding of these will help determine the best market fit for the platform in the organisation. Ask yourself a few simple questions: What’s the path to production? How do teams deploy and operate their software? How much support do they need from managed service teams or platforms?
Platform capabilities: What are the key capabilities of the platform? This could be compute, databases, or deployment as a service. What is the value proposition of the platform?
Engineering principles and guardrails: How does the platform engineering team approach the solution architecture? How do they keep the quality bar high? Familiarising yourself with these aspects provides a better understanding of existing architectural tradeoffs and how technical decisions are made.
Platform technology landscape: What are the key technologies used in the platform; what are its building blocks and components? How do they interact with each other? Are they open-source or proprietary? How many of them are reused internally vs just operated as part of the platform only?
Level of abstraction: The level of abstraction is the degree of complexity at which a system is viewed or programmed. The higher the level, the less detail and control; the lower the level, the more detail and control. The right level is related to the engineering maturity of the organisation. Typically, the more experienced the engineers, the more control they need. Is the solution an Infrastructure as a Service, Platform as a Service or Deployment as a Service? How much control does it provide in terms of service management aspects, custom configuration, and observability?
Team enablement: Is the platform engineering team actively involved in helping application teams overcome obstacles and speed up migration efforts, or is there an external team doing this work? Can the platform engineering team focus on building and maintaining the platform, or does it also act as an enablement team?
Operations and Site Reliability Engineering: Complex initiatives require a wide range of skills and processes. Is the current team capable of addressing existing issues? Is this approach sustainable over time? What’s needed to achieve operational excellence in the platform team? Platform engineering mostly requires a software engineering mindset, and experience shows that it is good to introduce SRE capabilities early in a project and scale them out when the platform gets closer to general availability.

Finally, it is important to understand where platform engineering sits in the organisation topology. It may sit under infrastructure, application development, or even a dedicated developer experience business domain.
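
To make the concepts above a little more tangible, here is a minimal Python sketch of how a team might write down a platform's boundary, capabilities, level of abstraction and shared responsibility model in one place. The class name PlatformProfile, its fields and all of the example values are my own illustrative assumptions, not a standard or a tool you need to adopt.

```python
from dataclasses import dataclass, field
from enum import Enum


class Abstraction(Enum):
    """Hypothetical labels, ordered from most control to least control."""
    IAAS = "Infrastructure as a Service"
    PAAS = "Platform as a Service"
    DAAS = "Deployment as a Service"


@dataclass
class PlatformProfile:
    name: str
    abstraction: Abstraction
    # Platform boundary: the domains the platform team agrees to own.
    in_scope: set[str] = field(default_factory=set)
    # Platform capabilities: capability name -> domain it belongs to.
    capabilities: dict[str, str] = field(default_factory=dict)
    # Shared responsibility model: activity -> who owns it.
    responsibilities: dict[str, str] = field(default_factory=dict)

    def scope_creep(self) -> list[str]:
        """Capabilities whose domain lies outside the declared boundary."""
        return sorted(
            name for name, domain in self.capabilities.items()
            if domain not in self.in_scope
        )

    def owner_of(self, activity: str) -> str:
        """Who is responsible for an activity, or 'undefined' if nobody agreed."""
        return self.responsibilities.get(activity, "undefined")


# Example values are invented purely for illustration.
profile = PlatformProfile(
    name="example-idp",
    abstraction=Abstraction.PAAS,
    in_scope={"compute", "deployment tooling", "observability"},
    capabilities={
        "container runtime": "compute",
        "deployment as a service": "deployment tooling",
        "managed databases": "storage",
    },
    responsibilities={
        "cluster upgrades": "platform team",
        "application configuration": "application team",
    },
)

print(profile.scope_creep())
print(profile.owner_of("cluster upgrades"))
print(profile.owner_of("application on-call"))
```

In this made-up example, "managed databases" shows up as scope creep because storage was never declared part of the boundary, and "application on-call" comes back as undefined, which is exactly the kind of gap worth discussing with application teams early.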

Implementation process

In an earlier article, I already explained how platforms come to life and mature over time. This is a good place to start. However, when it comes to the actual implementation process we need to break it down into the most important things that happen at each stage. This section provides ways to think about them.

Step 1. Understanding fundamentals

At this stage you need to explore existing technology and processes so you know how to adapt and what can be potentially reused. Conduct interviews with a few development teams to better understand their operating model and the challenges they face on a daily basis.

Introducing platform engineering has organisational implications. Communicate platform engineering initiatives company-wide early on, and check for potential competition or overlapping initiatives; this helps avoid friction in the future. Based on this knowledge, define the platform boundaries and key areas, taking into consideration the size and capabilities of your platform engineering team. Often the fewer the better.

You should now have a good level of understanding of your organisation’s needs and can make technology choices based on the existing technology landscape and engineering principles. Carefully consider whether it makes sense to introduce new technologies, as in many cases introducing a new practice is costly and requires additional training and procurement. Not every organisation is a high performer; sometimes it’s better to set realistic expectations and use well-understood, established technologies.

The next step is to define the right abstractions that fit your organisation’s current operating model. Decide which one fits your needs: Infrastructure as a Service, Platform as a Service or Deployment as a Service. Each has unique benefits and tradeoffs. In addition, you should look at this phase holistically and assess how it fits into the private vs. public cloud vs. edge landscape if your organisation has that scale.

Now you are ready to start designing the solution architecture by first defining key design considerations. Architectural guardrails help the team make platform engineering decisions.

Step 2. Design the platform’s concept model

At this stage, it’s better to focus on a single area because it shortens the lead time to build and deliver the platform POC. You can’t build a platform that satisfies everyone. Aim for 80% of use cases; the remaining 20% are edge cases that are probably too costly to address. Do not jeopardise the platform based on the opinions of a vocal minority.
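
One simple way to reason about the 80% target is to rank use cases by how often teams ask for them and stop once the target share is covered. The sketch below is only an illustration in Python: the function name smallest_covering_set, the use case names and the request counts are all hypothetical, and in practice the numbers would come from your own interviews and surveys.

```python
def smallest_covering_set(demand: dict[str, int], target: float = 0.8) -> list[str]:
    """Pick the most requested use cases until `target` share of demand is covered."""
    total = sum(demand.values())
    if total == 0:
        return []
    chosen: list[str] = []
    covered = 0
    # Walk through use cases from most to least requested.
    for use_case, count in sorted(demand.items(), key=lambda kv: kv[1], reverse=True):
        if covered >= target * total:
            break
        chosen.append(use_case)
        covered += count
    return chosen


# Invented request counts, for illustration only.
demand = {
    "stateless web service": 45,
    "scheduled batch job": 22,
    "event consumer": 15,
    "stateful database workload": 10,
    "GPU training job": 5,
    "legacy desktop service": 3,
}

print(smallest_covering_set(demand))
```

With these made-up numbers, the first three use cases already account for 82 of 100 requests, so the remaining three would be treated as edge cases for now.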

Run a series of spikes to see if the initial assumptions make sense. You may want to adopt the Request for Comments (RFC) approach, which is useful for gathering feedback in a more scalable and asynchronous way.

As a result, you should have at least a basic solution design in the key platform areas, as it may be increasingly difficult to undo some decisions later. Some example areas might be access control, onboarding, networking, security, scaling or multi-tenancy.

Write everything down in the form of a design document. You’ll be presenting and returning to it many times in the future. Ensure overall architectural alignment with enterprise architects, if applicable. Discuss the design with decision makers and various development teams, and finally get executive buy-in for the early implementation work.

Step 3. Build and validate the platform prototype

First, you need to agree on the outcome, scope and timeline. It’s important to set and manage expectations upfront. It’s almost certain that you will face external dependencies and need to manage the risk.

Finding an early adopter team is also crucial. The ideal team should be:

The right size so that you are motivated to look after them during this phase, preferably as part of a customer journey.
Able to participate in an early adoption programme.

Invite them to participate in a customer journey, ask them for feedback often, and listen to their pain points.

Deliver the first prototype, preferably within six months. You want to validate assumptions and fail fast. Pivoting is less expensive at this stage.

Start advocating for the platform at an early stage to build the adoption funnel. Planning for migration (e.g., from already existing platforms) always takes more time than you expect. It also helps identify additional requirements and potential blockers.

Step 4. Mature the platform for general availability

Putting the platform into production and achieving operational excellence always depends on your unique organisational goals and what you want to achieve with platform engineering.

At this stage you may want to consider:

Optimising costs and deciding whether it makes sense to implement some form of cross-charging model to hold teams accountable.
Establishing a release management process and achieving a higher level of automation in general. From now on, releases cannot be handled on a case-by-case basis; they have to be reliable.
Achieving operational excellence. Depending on the circumstances, this could mean building a Site Reliability Engineering capability, improving lifecycle management processes or introducing 24/7 support.
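
As a rough illustration of the cross-charging idea mentioned above, the Python sketch below splits a shared platform bill between teams in proportion to their measured usage. The function name cross_charge, the team names and the figures are assumptions made up for this example; real models often add fixed fees, tiers or reserved capacity on top.

```python
def cross_charge(total_cost: float, usage: dict[str, float]) -> dict[str, float]:
    """Split `total_cost` between teams in proportion to their usage."""
    total_usage = sum(usage.values())
    if total_usage <= 0:
        raise ValueError("usage must contain at least one positive value")
    # Each team's share is rounded to two decimals for reporting.
    return {
        team: round(total_cost * used / total_usage, 2)
        for team, used in usage.items()
    }


# Invented monthly figures: usage could be CPU hours, requests, or any agreed unit.
usage = {"payments": 500, "search": 300, "reporting": 200}
charges = cross_charge(10_000.0, usage)
print(charges)
```

Because each share is rounded independently, the parts may differ from the total by a cent or two with less convenient numbers, so a real implementation would need to decide who absorbs the remainder.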

Given all of the above, the existing platform engineering team structure may no longer be appropriate for this phase, and the team may require additional staffing.

What your platform team needs to move forward

Empowering your platform engineering team is the best leverage you can get. This creates a ripple effect throughout your organisation.

Platform engineering is about delivering value to developers, and the best people to understand their needs are platform engineers. They operate at the same level of context. Nothing will replace human-to-human relationships, so as a leader you need to create favourable circumstances for your platform engineering team to have the time and space to build those relationships. If the work is process-heavy, if people need approval at every stage, or if they spend too much time in non-engineering meetings, they won’t have the space to look beyond the current scope. In my experience, the bounded engagement model, where platform engineers work directly paired with developers on some fixed-time, fixed-scope work, is the most efficient approach.

In my experience, empowering the platform engineering team includes:

Having an enabler/disruptor/innovator role in leadership that supports moving the needle, managing friction, and driving change outside of the platform domain. Historically, people who keep their head down are not the best fit for this type of role.
Enabling the team to contribute directly to the roadmap.
Letting the team process, prioritise and propose implementation of feature requests alongside product management (taking ownership).
Getting the team actively involved in support activities, migration and onboarding so that they are constantly exploring the problem space.
Giving the team the final say on when to move to the next stage of platform maturity.
Helping them advocate for the platform in the organisation so they have a chance to prove its value. Focusing on both top-down and bottom-up approaches has the greatest impact.

These things may sound obvious to most of you, and that’s fine if you already understand them. But platform engineering efforts that are not yet aligned with innovation programmes will have to emerge in other ways until they are more widely recognised. Understanding and adapting to how people in your organisation think about platform engineering is an essential first step.

Now that you've read this article, I hope you have a more complete view of the platform engineering landscape, its pitfalls and advantages. Everything in this article is based on my own experience and reflects my point of view, which is why I want to ask you:

What's your perspective on platform engineering?
Do you have any interesting real-life stories to share?
What new dimensions did you discover while reading?

Feel free to write to me directly or leave a comment here. I will make sure all your comments and questions receive a thorough response.
