The nature of work in platform engineering
Platform Engineering Primer, Part 3. The third in a series of articles exploring how developer platforms benefit modern businesses.
Join me in examining the intricacies of platform engineering. Let's take it apart and find the effective strategies, the most practical ways of working and the technologies that fit our needs and circumstances. Above all, I want to explore how internal developer platforms are the gateway to modern enterprise success.
Although I started my career in traditional business application development, I quickly discovered my passion for distributed systems, security and the complexities of the non-functional side. I enjoyed looking under the hood to connect and automate as much as possible. Even on a new project, breaking down business processes into programming logic became monotonous, and I wanted more excitement. All in all, this was the catalyst for my move into cloud engineering.
Thanks for reading Engineering Primer! Subscribe for free to receive new posts and support my work.
As my career shifted direction, a few things struck me that I had not seen before.
The cloud-native technology landscape was still in its infancy; most of the technology we were using at the time was neither stable nor production-ready. At the same time, there was a huge appetite across the software development industry to adopt new cloud-native technologies and stay ahead of the curve. Unfortunately, there were no reference models for how to use these technologies, little information on best practices and, most importantly, no clear ways to enable solutions at scale (today we call them golden paths). This created an exciting opportunity to pave the way for others facing the same uncertainties.
As a result of the rapid evolution of the cloud-native landscape, people working in the software development industry experienced increasing cognitive load. This created a challenge for developer efficiency that is still being addressed today.
In infrastructure, a software engineering mindset had begun to take hold. This shift was a response to the increasing complexity of most emerging technologies, which were based on distributed systems principles. It also meant that I could leverage my previous programming experience, for example to build Kubernetes operators and CLI tools, or to achieve higher levels of automation using various SDKs.
As I worked through these challenges, I realised that it wasn't just about the technical stuff, but also about understanding the human side. Solving these combined puzzles made my work more satisfying and opened doors to impacting the organisation as a whole.
So, naturally, my journey shifted from cloud engineering to platform engineering. Today, everything I've mentioned above is the bread and butter for most platform engineering teams.
In this article, we'll look at the distinctive aspects of platform engineering and the unique nature of the work in this space. We'll also explore how having the right people in place can have a significant impact across the business. Finally, we'll briefly touch on key considerations when hiring and, in the long term, scaling the platform engineering team.
Overall, it will give you a better perspective on how people working in platform engineering think, what they value and how to interact with them effectively.
How platform engineering differs from traditional application development
Before we get into the intricacies of the nature of the work and my observations, let's lay the groundwork for how platform engineering differs from typical application development.
The distinction between platform engineering and application development is both subtle and significant. Application developers focus primarily on front-end and back-end components, working closely with business processes. Platform engineering, by contrast, operates behind the scenes. Its roles are internal rather than customer-facing, dedicated to streamlining software development lifecycle processes within the organisation, and its main users are developers.
Platform engineering is about solving problems at scale. This means that platform systems often need to be even more scalable and reliable than those serving a single application. The bigger the organisation, the more challenging the work. For example, it might involve enabling a new technology, considering how it will be used in the context of your organisation, writing documentation, defining patterns and building self-service processes around it. At the end of the day, the approach should be well understood, standardised, easy to automate and a good fit for the use cases of most users in the organisation.
Here are the key areas of focus in platform engineering:
Infrastructure automation and development tools: Ensures that infrastructure spin-up is fully automated using an Infrastructure as Code approach. This often includes developing custom tools in general-purpose programming languages that wrap common operations behind a simple, user-friendly interface.
Infrastructure baselining: Provides golden paths to production through deployment patterns, environment promotion, code generation and other reusable blueprints.
Infrastructure lifecycle management: Keeps the infrastructure up to date, well integrated and accessible to end users so they don't have to worry about it.
Scalability and reliability: Takes responsibility for providing end users with a fully functional environment that is stable and scales as the user base grows or during traffic spikes.
Security and compliance: Drives standardisation efforts and works to minimise potential attack vectors across the technology stack.
Monitoring and continuous improvement: Continuously monitors the health of systems, automates work in this area and actively looks for ways to do things better.
Support and incident management: Establishes a formal process to help end users with their day-to-day issues and to resolve production incidents in a timely manner, often outside typical working hours.
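To make "golden paths" and "code generation" a little more concrete, here is a minimal sketch in Python of a hypothetical self-service blueprint. It assumes a platform team that owns a set of baseline defaults (replica count, resource limits, a non-root security context) and lets application teams supply only a few fields plus a restricted set of overrides; everything else comes from the baseline. The names BASELINE, ENVIRONMENTS, ServiceRequest and render_deployment are invented for illustration, and the output is a plain dictionary shaped like a Kubernetes apps/v1 Deployment, not the output of any particular tool.

```python
import json
from dataclasses import dataclass, field

# Hypothetical platform baseline: defaults every service inherits.
BASELINE = {
    "replicas": 2,
    "cpu": "250m",
    "memory": "256Mi",
    "run_as_non_root": True,
}

# Hypothetical set of environments the platform supports.
ENVIRONMENTS = ("dev", "staging", "prod")


@dataclass
class ServiceRequest:
    """What an application team supplies; everything else comes from BASELINE."""
    name: str
    team: str
    image: str
    environment: str = "dev"
    overrides: dict = field(default_factory=dict)


def render_deployment(req: ServiceRequest) -> dict:
    """Render a Kubernetes-style Deployment manifest from the baseline plus overrides."""
    if req.environment not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {req.environment!r}")
    # Only keys the baseline already defines may be overridden.
    unknown = set(req.overrides) - set(BASELINE)
    if unknown:
        raise ValueError(f"unsupported overrides: {sorted(unknown)}")
    settings = {**BASELINE, **req.overrides}
    labels = {"app": req.name, "team": req.team, "env": req.environment}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": req.name, "labels": labels},
        "spec": {
            "replicas": settings["replicas"],
            "selector": {"matchLabels": {"app": req.name}},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "securityContext": {"runAsNonRoot": settings["run_as_non_root"]},
                    "containers": [
                        {
                            "name": req.name,
                            "image": req.image,
                            "resources": {
                                "limits": {
                                    "cpu": settings["cpu"],
                                    "memory": settings["memory"],
                                }
                            },
                        }
                    ],
                },
            },
        },
    }


if __name__ == "__main__":
    request = ServiceRequest(
        name="orders",
        team="checkout",
        image="registry.example.com/orders:1.4.2",
        environment="staging",
        overrides={"replicas": 3},
    )
    print(json.dumps(render_deployment(request), indent=2))
```

Rejecting overrides the baseline doesn't know about is one way to keep the path paved without letting every team drift into its own configuration. In practice, many teams would reach for existing templating or policy tools rather than hand-written code like this; the sketch only illustrates the shape of the idea.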
I'd say platform engineering tends to attract individuals who want to make a bigger organisational impact. This applies both to individual contributors and to decision makers who inspire change throughout the organisation. They tend to be collaborative and efficiency-focused.
It takes me back to the early days when we were expanding our platform team. My colleagues and I came up with the following candidate profile. And you know what? Not much has changed since those days! I still refer to it whenever I recruit someone for the platform engineering team.
The ability to tackle new challenges that come with a lot of uncertainty, to adapt and learn quickly, and to take initiative remains essential in platform engineering.
Typically, platform engineers…
Solve other people’s problems
As part of the platform engineering team, your ultimate goal is to build an internal solution for application teams that enables their day-to-day work. This starts with day-one activities such as managing access, configuring infrastructure, setting up tools and providing out-of-the-box technology integrations. But it goes much further, to helping teams deploy and operate production services in a safe and reliable way.
You have to look at the whole picture, taking into account both the user's perspective and the organisation's realities: what is possible and what is not. At the same time, you need to actively communicate limitations, explain the do's and don'ts and facilitate the flow of work. Moreover, it takes a specific skill to understand exactly where the cognitive load lies and how to reduce it.
Engineers in this space need to be passionate about solving other people's problems, often supporting developers whose experience varies enormously, from someone who barely understands the tech landscape to seasoned experts who expect full customisation and control. They act as a bridge between technology and its users, ensuring that solutions are not only effective but also user-friendly.
Automate themselves out of a job
That’s the only way to handle the complexity and scale. People scale linearly, while the right automation scales exponentially. It’s essential to build an automation-first mindset in the platform team from day one.
People building platforms need to strive to automate themselves out of their jobs in order to scale out. Engineers in this area are supposed to streamline inefficient processes so that other people experience less friction. Traditional system administration or old-fashioned operations roles are not equipped for this work; it requires software engineering skills applied with a DevOps mindset.
Are ready to take the blame
Depending on the existing integrations and the degree of coupling with other systems, a platform you build and operate will inevitably aggregate failures from the systems around it. You often put your reputation at risk because someone else broke something in a place that you merely abstracted. Now you are accountable for fixing it and making it right going forward. Platform engineers don't step away; they get to the bottom of the problem.
In the context of incidents, accidental downtime or other production issues, we often talk about the importance of blameless culture in the organisation, but the reality is that we’ll always have stressful conversations. You need to be okay with this kind of situation. The blast radius in platform engineering is always bigger than in the application scope alone.
Always keep the lights on
Platform engineering involves running large-scale infrastructure in production and supporting it 24/7, 365 days a year. This often means resolving issues outside your comfort zone while managing communication under pressure from the people who are affected.
People in this space need to embrace the responsibility for the entire lifecycle of the infrastructure. This often involves being on call to address unexpected issues, which means being prepared to dive into problem-solving at any hour, not just during conventional working hours. People who favour a stable 9-to-5 schedule may find that platform engineering is not the ideal choice for them.
Their work is often invisible when everything runs smoothly; however, it becomes critical when the unexpected happens.
Community is essential in platform engineering
Platform engineering is a cultural change, and to be successful, it always needs a developer community as its foundation. Developers create value by using and building on the platform. You are scaling the platform organically by creating a dedicated space for engineers to collaborate, share knowledge, and help each other. Being actively involved in building the community is an integral part of the platform engineering team’s role.
Platform engineers play a role in building a community by hosting product demos and show-and-tell sessions and advocating platform use. They focus on engineering partnerships and one-to-one relationships to extend existing offerings or ensure interoperability with other systems. In larger organisations, they work with enterprise architects to ensure consistency in technology and architecture decisions. This helps to promote the platform and increase its use.
Building a developer community is a one-way door decision: it's hard and it takes time to get started. However, if done right, it can empower the entire organisation. For more detail on this topic, I highly recommend listening to the podcast by Amir Shevat and Mikeal Rogers.
Platforms change constantly
While a platform's lifecycle extends well beyond that of a typical application, the platform itself is constantly evolving and requires adaptation. Changes are driven by organisational and user needs, and they intensify as more teams adopt the platform.
"The platform is not trying to anticipate everything. It's trying to enable people to build better solutions." - Platform Strategy: Innovation Through Harmonization - by Gregor Hohpe
Imagine you are building a cloud platform, whether on Azure, GCP or AWS. You need to constantly keep up with upstream changes in API versions, access control, security and more. You may find that functionality you've already put a lot of effort into is now provided out of the box by the cloud provider itself. Don't be afraid of this; it's good! It means you are on the right track, and you just need to adapt.
Part of a platform engineering team's role is to guide the organisation through the ever-changing technology landscape. The team continuously monitors new and changing technologies and the benefits they may provide. Evaluating and testing these technologies turns this awareness into guidance and actionable knowledge. I plan to write more on this subject in a future article.
As adoption of the platform increases, so do the needs that the platform has to meet. Things may have been overlooked in the initial design, or scalability may have become an issue. Either way, you must be prepared to absorb a lot of new information on unfamiliar topics.
Why multidisciplinary expertise is vital
From a technical perspective, platform engineering requires multidisciplinary expertise to navigate between different platform domains and emerging processes, ranging from a broad understanding of the field to proficiency in specific tasks.
Platform engineers are typically involved in every stage of platform development, from problem discovery and prioritisation through solution design and implementation to technical documentation and support. To give a more concrete example, in addition to the standard implementation work, they need to be able to do the following:
Understand non-functional requirements such as reliability, disaster recovery, scalability, security and user experience.
Work in a product-led engineering fashion and actively participate in product discovery, customer engagement, prioritisation and value measurement.
Write clear and concise user-facing documentation, including architectural concepts, how-tos and troubleshooting guides.
Platform engineers pave the way to production for other developers, connecting the dots and making sure other people don't reinvent the wheel. Given what's expected, this is not a junior role because you can easily get disoriented without solid fundamentals. Do you agree?
Rather than a fictional unicorn, platform engineers are more like real-life octopuses. They are highly intelligent in the way they coordinate a number of actions that take place in parallel.
The nature of leadership in platform engineering
Depending on the organisational context, a person leading a platform engineering effort may have the title of Engineering Director, Head of Technology, Engineering Manager or Product Manager. Whatever the title, this is a very demanding role, requiring a lot of stakeholder management, diplomacy, internal sales and general bridging. It requires holistic thinking, strong systems design skills and hands-on operational experience. In other words, they need to be the CEO of this initiative, always looking ahead and thinking big.
I always look for someone with skin in the game. Somebody who can play politics at the right level to get the organisation to actually build something that solves the developers' problems. On the other hand, people who spend all their time on paper without actively engaging with the platform engineers are not ideal for this role.
In my view, platform engineering is about building the structure and the bridge to efficiency according to existing corporate standards. But it is also about breaking the rules. Sometimes you have to ignore standards and think independently. Many people struggle with this and prefer to play nice to avoid confrontation.
“When you know the rules, you’re allowed to break the rules.” Breaking the Rules – @ASmartBear
I have seen a lot of vague standards in my career. It's frustrating when you really want to make things better, but the current way of doing things is sub-optimal and no one is willing to look at it. Platform teams need leaders who empower and encourage them to challenge the status quo.
Scaling a platform engineering team
Sooner or later you will face the bottleneck of scaling your platform engineering team. To judge whether that point has come, work through the following factors:
Consider the size and number of development teams that your platform supports.
Define what you are trying to enable your internal customers to do and assess your level of involvement in supporting them.
Clearly identify the boundaries of the platform and the desired level of integration with external systems.
Evaluate the current team workload, including the level of context switching between engineering, non-engineering and support activities.
Assess the time available for innovation and experimentation beyond the standard roadmap.
You don't want your platform team to scale linearly with the organisation or the number of developers it supports. In fact, I would say that adding a new platform team member shouldn't be the default choice. It always changes the dynamics of the team and may not be the optimal solution at the time. Instead, track adoption and other metrics, such as Net Promoter Score (NPS) or measures from a developer productivity framework like SPACE, to guide the growth of your platform team.
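As an illustration of the metrics mentioned above, here is a small Python sketch that computes NPS and a simple adoption rate. NPS follows its standard definition: the percentage of promoters (scores of 9 or 10 on a 0 to 10 scale) minus the percentage of detractors (scores of 0 to 6). The adoption metric, defined here as the share of development teams onboarded to the platform, is an assumption for illustration, as are the function names; the survey answers and team counts in the example are made up.

```python
from typing import Iterable


def net_promoter_score(scores: Iterable[int]) -> float:
    """NPS from 0-10 survey answers: % promoters (9-10) minus % detractors (0-6)."""
    answers = list(scores)
    if not answers:
        raise ValueError("no survey responses")
    if any(not 0 <= s <= 10 for s in answers):
        raise ValueError("scores must be between 0 and 10")
    promoters = sum(1 for s in answers if s >= 9)
    detractors = sum(1 for s in answers if s <= 6)
    # Passives (7-8) count towards the total but not towards either group.
    return 100.0 * (promoters - detractors) / len(answers)


def adoption_rate(onboarded_teams: int, total_teams: int) -> float:
    """Hypothetical adoption metric: percentage of development teams on the platform."""
    if total_teams <= 0:
        raise ValueError("total_teams must be positive")
    if not 0 <= onboarded_teams <= total_teams:
        raise ValueError("onboarded_teams must be between 0 and total_teams")
    return 100.0 * onboarded_teams / total_teams


if __name__ == "__main__":
    # Made-up survey answers and team counts, purely for illustration.
    survey = [10, 9, 8, 7, 6, 3, 10, 9, 5, 9]
    print(f"NPS: {net_promoter_score(survey):.0f}")  # prints "NPS: 20" (5 promoters, 3 detractors)
    print(f"Adoption: {adoption_rate(18, 24):.0f}%")  # prints "Adoption: 75%"
```

A single snapshot says little on its own; comparing these numbers over time is what would show whether demand on the platform is outgrowing the team.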
Typical symptoms of team scaling issues include:
Too much work in progress, which is not really caused by poor project management, but by the amount of ad-hoc work, end-of-life activities or simply end-user support. Before you consider adding more people to the team, try to put more emphasis on a healthy balance and a reasonable level of pushback. In other words, consider a sustainable pace first.
There is limited collaboration between team members as they struggle to understand the full scope of the problem. You start to see less active participation in meetings because people simply can't digest everything on their plate, so they keep quiet. This could be an indication of knowledge gaps in the team, so a greater focus on knowledge sharing and documentation could be a solution for the time being.
The growth of the platform creates gaps in skills and responsibilities. Maintaining a sufficient level of multidisciplinary expertise across all areas of the platform becomes too difficult, and you'll need to start thinking about specialising in particular areas, such as release management or observability. Understand the preferences of your team members before deciding; if you have a blind spot, ad hoc involvement of experts may also be an option.
Extending the platform boundary or supporting additional integrations requires more responsibility from the platform team. This includes both maintenance and support overheads. This is where you should push back by default, unless you have very specific reasons to expand.
If the problem space becomes too large for the platform team to handle, it's a good idea to consider extracting a piece of the platform into a separate module and delegating the responsibility to another, more suitable team in the organisation. For example, if you are considering adding support for data stores but that expertise is not strongly represented in the team's current skill set, it may be better to delegate that responsibility to others while still treating it as a key platform capability.
In most cases, organisations start small with just one platform team. However, in large organisations, a platform may eventually require multiple teams to build and operate it. Based on my observations, there are two common ways to scale the platform team.
Organise by subdomains
This method is based on splitting the existing team into smaller sub-teams, each focusing on a particular area of the platform, while increasing the number of people. For example, a common subdivision looks like this:
Site reliability engineering: Essentially responsible for lifecycle management of the platform, continuous improvement, scaling and performance monitoring.
Support: Answering other people's questions, helping with enablement and onboarding.
Implementation: Building the platform and automating processes. This can be broken down into specific technology areas such as networking, security, observability, compute, storage and more.
Consulting: Working across the product and platform teams, together with the company's technology strategist. This role helps to inform platform strategy, build roadmaps and understand the business-specific problem space.
Migration: Supporting development teams with migrations or with merging other platforms.
There is, however, one important caveat:
“Generally speaking, teams composed only of people with single functional expertise should be avoided if we want to deliver software rapidly and safely.” Team Topologies by Matthew Skelton, Manuel Pais.
Organise by value streams
Some people I have spoken to in the industry don't necessarily agree that scaling the platform team through smaller, more autonomous sub-teams is the right thing to do. In their view, sooner or later it creates coordination overhead and knowledge gaps, and becomes too hard to maintain.
Instead, they suggest aligning the team around the entire value stream, so that it takes ownership of a specific outcome from start to finish.
The way it works is that they form a fixed-time squad focused on enabling a specific product or team, frequently reconfiguring and evolving in their own space. The value stream in this case relates to services and products within the scope of the platform. It's based on a layered approach where one team provides the base platform and the second team focuses on supporting the service on top of the platform.
Which approach is better? Well, asking that is a bit like choosing between a road trip and a scenic train ride. Both have their charms. Let me leave this as an open question, as the answer depends on the scale and scope of the work and on how your team topologies are structured. In my humble opinion, organising by subdomain is more popular and typically happens organically. The value stream approach, on the other hand, requires careful planning and a different way of budgeting.
Let's leave it there for now
We've explored the unique aspects of platform engineering and how having the right people on board can have a significant impact on your business. We've talked about traits to look for in platform engineers and ideas for scaling your team. Anything I missed?
Now that you've read this article, I hope you have a more complete view of the platform engineering landscape, its pitfalls and its advantages. Everything in this article is based on my own experience and reflects my individual point of view, which is why I want to ask you:
What's your perspective on platform engineering?
Do you have any interesting real-life stories to share?
What new dimensions did you discover while reading?
Feel free to write to me directly or leave a comment here. I will make sure all your comments and questions receive a thorough response.