Crafting Platforms' Book
Chapter 04

Segmentation

Good fences make good neighbors.

— Robert Frost
Story

It’s 3:15 PM on a Tuesday, and every dashboard at Mountain Lab is flashing red.

Marta is on a Zoom call with Javi and the tech lead from the Integrations team. The site has been unresponsive for ten minutes. Customer Support is reporting a major incident in Jira.

“Javi, what’s going on?” Marta asks, watching the incoming request error rate climb to 90%.

“I’m looking into it,” Javi replies, typing furiously. “The database CPU is pegged at 100%. Connections are maxed out. But there hasn’t been a deployment to production today.”

“Wait,” says the Integrations lead. “My team just kicked off a massive load test on the new payment gateway integration, but that’s in the staging environment. It shouldn’t affect production.”

Javi sighs heavily. “Are they hitting the mountain-lab-primary-db cluster?”

“Yes, but the staging schema,” the lead says defensively. “We’re just testing the transaction throughput before the Q2 launch.”

Marta closes her eyes and rubs her temples. “There is only one cluster. Different schemas, sure, but it’s the exact same physical resources.”

The staging load test has consumed all of the database’s compute, starving the production schema of resources. The entire business is down because of a simulated test.

By 4:00 PM, the load test has been killed and the site has recovered. But the incident review the next day is a tough conversation with Diego, the CTO.

“Why are staging workloads anywhere near our production data?” Diego asks.

“Because when we were twenty people, setting up a second database cluster was expensive and complex,” Javi explains. “We have one AWS account. We have one massive Kubernetes cluster with namespaces for dev, staging, and prod. We have one main database cluster. It’s a miracle this hasn’t happened sooner.”

Marta nods. “He’s right. As we build this internal developer platform, the foundation has to change. We can’t just give developers a golden path if that path leads to a shared highway where a fender bender brings traffic to a complete halt. We need isolation. We need boundaries.”

“So, what’s the plan?” Diego asks.

“We need to define our blast radius,” Marta says. “We’re going to segment the platform.”

Before we can start building CI/CD pipelines or provisioning databases, we need to answer a fundamental question: how do we structure the underlying environments so that teams can work autonomously without interfering with each other?

This is where segmentation comes in.

Core Concepts

A segment is a generic term for any physical or logical boundary around a set of workloads, resources, and users within a platform. We can create segments at very broad levels—such as dividing the entire company’s internal tools from its customer-facing applications—or at highly specific levels, like an isolated execution environment for one product team.

The most important aspect of a segment is that it defines a specific set of capabilities, guardrails, and constraints that are relevant to your organization. These characteristics serve as a guide: when you need to deploy a new platform resource, you look at its requirements and match them against the characteristics of your available segments to know exactly where it should land.

Regardless of its size, creating a segment is simply the act of drawing a fence to separate different parts of our infrastructure, allowing us to apply distinct policies and configurations to each area.

Why Segment?

Segmentation solves a few critical problems in modern infrastructure:

  1. Blast Radius Reduction: The “blast radius” is the scope of impact when something goes wrong. If you have a single, monolithic infrastructure, a compromised credential or a runaway process affects everything. Segmentation contains failures to specific, isolated boundaries.
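  2. Security and Compliance: Regulatory frameworks such as PCI DSS and HIPAA expect systems that process sensitive data to be tightly controlled. Isolating those systems from the ones that do not handle such data keeps the rest of the infrastructure out of scope. Under PCI DSS, for example, segmentation is the main way to shrink the Cardholder Data Environment.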
  3. Cognitive Load: By narrowing the scope of an environment, you reduce the number of resources a developer has to think about. They only see the infrastructure relevant to their context.
  4. Different Rules for Different Contexts: The policies that govern a production environment (strict access, heavy auditing) are counterproductive in a development environment where engineers need freedom to experiment.
  5. Isolation by Design: If all workloads share one unified infrastructure, you must rely on complex configurations to prevent cross-contamination, such as nested IAM rules, Role-Based Access Control (RBAC), and intricate network policies. Structural segmentation relies on hard boundaries, like separate cloud accounts. You don’t need a perfectly crafted policy to block traffic between two environments if they live in separate accounts and networks with no route between them. The architecture inherently enforces the isolation.

However, segmentation is a balancing act. Too little, and you risk a catastrophic failure. Too much, and you introduce crippling operational complexity: suddenly your team is managing fifty separate Kubernetes clusters or a hundred nested cloud accounts.

To get it right, we must approach segmentation across four dimensions.

The Four Dimensions

While there are many ways to slice an infrastructure, effective platforms segment along four primary axes: Sector, Tier, Region, and Tenant. Using the Platform Notation introduced in the previous chapter, we can formally define these dimensions as types that form our coordinate system.

Sector

The first and most critical dimension separates the internal platform context from the business context.

Type Definition: Sector(Name)

I use the term Sector to define these high-level boundaries. While “Domain” might be technically more appropriate from a Domain-Driven Design perspective, it is a heavily overloaded term in our industry. By using Sector(Name), we establish an unambiguous vocabulary: a sector is a distinct area of the infrastructure with its own governance and purpose.

  • Internal Sector: Hosts the shared tooling, enablement systems, and security services that support the entire ecosystem. Frameworks such as AWS landing zones and the Azure Cloud Adoption Framework usually describe this as the foundational or platform layer. It provides the “management plane” for the rest of the organization.
  • Business Sector: Hosts the actual customer-facing applications and revenue-generating workloads. It is a common mistake to think that product teams own the business sector. In a well-crafted platform, the platform team owns and provisions the sector, while the product teams use it to deploy their applications.

Most organizations start with a single business sector, but there are two scenarios where adding more becomes necessary. First, Acquisitions: when a company acquires another, it often keeps the acquired systems in a separate business sector to avoid premature integration while maintaining a unified governance model. Second, Diversification: a company launching a completely new line of business (e.g., a retail company starting a logistics arm) might use a new sector to ensure that a failure or security incident in the new venture cannot impact the core retail operations.

Note

Sector examples in this book

To keep our examples focused, we use a single internal Sector("platform") and a single business Sector("ecommerce"). I chose “platform” because it is the most intuitive name for developers, even though it carries a slight risk of being confused with the entire IDP. In some organizations, you might see this internal sector named Foundation, Core, or ControlPlane to distinguish it from the broader platform ecosystem.

This separation is vital for both compliance and resilience. Keeping your CI/CD runners and observability tools in an internal sector, physically separated from the business logic, ensures that your “eyes and hands” remain operational even if a massive traffic spike or a resource exhaustion incident brings the business sector to its knees.

Tier

The second dimension separates workloads based on their criticality and the data they handle.

Type Definition: Tier(Name)

I prefer to call these tiers rather than environments, because development teams often use terms like dev, qa, staging, and prod arbitrarily. Instead, think of tiers as strict boundaries governing access and data. Throughout this book I use two tiers: Tier("sandbox") and Tier("live").

  • Tier("sandbox") (Non-Production): The realm of innovation. Here, engineers use synthetic or heavily anonymized data. Product teams have the freedom to try new ideas, experiment, and break things without impacting the business.
  • Tier("live") (Production): The realm of revenue. This tier processes real customer data. Access is strictly controlled, heavily audited, and monitored. Direct human access to resources here should be the exception, with all changes flowing through automated pipelines.

Creating distinct Tier("sandbox") and Tier("live") segments, often as completely separate cloud accounts or subscriptions, ensures that a developer writing a messy script in a sandbox can never accidentally drop a live database.

Region

The third dimension organizes workloads by their physical location in the world.

Type Definition: Region(Name)

Region segmentation involves deploying separate instances of your platform in different geographies. You might deploy in the US and Europe to reduce latency for local users. Another reason is data residency: some requirements keep European customer data in Europe. GDPR, for example, restricts transfers of personal data outside the European Economic Area.

Tenant

The final dimension isolates workloads based on the owner.

Type Definition: Tenant(Name)

In a large engineering organization, you segment by internal Team. Using Tenant("payments") ensures the team gets their own isolated space, separate from Tenant("recommendations"). This limits the blast radius between different microservices, tightens security through least-privilege access, and enables accurate cost attribution.

The Coordinate System

Every resource in our platform is assigned a specific tuple that we call its coordinate. Following the grammar of Platform Notation, we always use a four-dimensional format: (Sector, Tier, Region, Tenant).

This coordinate acts as the “address” for every resource. If a dimension is not relevant for a specific resource, we use an underscore (_) to indicate it is ignored.

  • (Sector, Tier, _, _): A coordinate for resources that span all regions and tenants (e.g., a cloud account).
  • (Sector, Tier, Region, _): A coordinate for resources that apply to all tenants in a specific region (e.g., a VPC).
  • ("ecommerce", "live", "eu01", "payments"): A fully resolved address for a specific tenant workload.
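The notation is easier to see as data. Below is a minimal Python sketch of a coordinate as an immutable value. The `Coordinate` class and the `WILDCARD` constant are illustrative names, not part of Platform Notation itself. The only rule taken from the text is that `_` marks an ignored dimension.

```python
from dataclasses import dataclass

WILDCARD = "_"  # an underscore marks a dimension that is ignored


@dataclass(frozen=True)
class Coordinate:
    """A (Sector, Tier, Region, Tenant) address; "_" means the dimension is ignored."""
    sector: str
    tier: str = WILDCARD
    region: str = WILDCARD
    tenant: str = WILDCARD

    def as_tuple(self) -> tuple[str, str, str, str]:
        return (self.sector, self.tier, self.region, self.tenant)

    def dimensions(self) -> int:
        """Count the dimensions that are actually set (not wildcards)."""
        return sum(1 for value in self.as_tuple() if value != WILDCARD)

    def __str__(self) -> str:
        # Quote concrete values and leave wildcards bare, as in the text.
        parts = [v if v == WILDCARD else f'"{v}"' for v in self.as_tuple()]
        return "(" + ", ".join(parts) + ")"


account = Coordinate("ecommerce", "live")                        # all regions and tenants
vpc = Coordinate("ecommerce", "live", "eu01")                    # all tenants in one region
workload = Coordinate("ecommerce", "live", "eu01", "payments")   # fully resolved

print(account)               # ("ecommerce", "live", _, _)
print(workload.dimensions())  # 4
```

Making the value frozen means it can be used as a dictionary key. That is handy when policies or accounts are looked up by coordinate.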

Platform Segments

By applying one or more of these four dimension types, we create a segment. In our structural model, any combination of dimensions defines a valid segment. We don’t just think of segments as a linear progression from broad to specific; any cut across the platform’s coordinate system creates a boundary with its own characteristics.

While we usually define segments in the order of (Sector, Tier, Region, Tenant), this is not a requirement. You can combine any number of dimensions in any order that suits your infrastructure needs:

  • One-Dimension Segments: Define massive organizational boundaries. For example, ("platform", _, _, _) represents the entire internal management plane, while ("ecommerce", _, _, _) encompasses all business-related workloads.
  • Two-Dimension Segments: Add another layer of classification. For example, ("ecommerce", "live", _, _) defines the production environment for the E-Commerce sector.
  • Three-Dimension Segments: Create more focused infrastructure or organizational scopes. ("ecommerce", "live", "eu01", _) represents the European production network, but we might also have an AWS account at ("ecommerce", "live", _, "payments") that spans all regions for a specific tenant.
  • Four-Dimension Segments: The most granular boundary, identifying a specific owner in a specific context. ("ecommerce", "live", "eu01", "payments") defines the isolated execution environment for the Payments team in the European production environment.
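A segment is any cut across the coordinate system rather than a rung on a ladder, and that idea fits in a simple membership test. This is a minimal sketch. It assumes coordinates are plain 4-tuples and `_` is the wildcard; `in_segment` and `specificity` are hypothetical helper names.

```python
WILDCARD = "_"

Coord = tuple[str, str, str, str]  # (Sector, Tier, Region, Tenant)


def in_segment(segment: Coord, coordinate: Coord) -> bool:
    """True if every dimension the segment fixes matches the coordinate."""
    return all(s == WILDCARD or s == c for s, c in zip(segment, coordinate))


def specificity(segment: Coord) -> int:
    """How many dimensions the segment applies (1 to 4)."""
    return sum(1 for value in segment if value != WILDCARD)


payments_eu = ("ecommerce", "live", "eu01", "payments")

segments = [
    ("ecommerce", WILDCARD, WILDCARD, WILDCARD),   # one dimension
    ("ecommerce", "live", WILDCARD, WILDCARD),     # two dimensions
    ("ecommerce", "live", "eu01", WILDCARD),       # European production network
    ("ecommerce", "live", WILDCARD, "payments"),   # tenant account spanning regions
    ("platform", WILDCARD, WILDCARD, WILDCARD),    # a different sector
]

for seg in segments:
    print(specificity(seg), in_segment(seg, payments_eu))
```

Both three-dimension segments contain the Payments workload even though they fix different dimensions. This is the "any combination, any order" property described above.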

The more dimensions we apply, the more specific the segment’s purpose becomes. As we’ll see in the next section, segments that do not include the Tenant dimension typically define the Core Space, while the fully resolved four-dimensional segment is what we call a Tenant Space.

Spaces

The true power of segmentation emerges when we move beyond simple labels and look at the architectural boundaries where workloads actually live. This model relies on a critical distinction, heavily inspired by how operating systems separate Kernel Space from User Space. We divide our infrastructure into Core Space and Tenant Space.

A fundamental reason for this separation is that these spaces are managed in completely different ways, particularly when it comes to access control. Working on the Core Space usually requires a broad set of permissions, because it entails setting up the core infrastructure, cloud accounts, global configuration, and so on. In contrast, working in the Tenant Space only requires write permissions within the specific tenant’s segment, and never beyond it. That is deliberate, and it is one of the main reasons for segmentation. We will go deeper into permissions for both spaces in Chapter 5.

flowchart TD
    subgraph CoreSpace [Core Space]
        direction TB
        Network[Networks]
        DNS[DNS Zones]
        Accounts[Cloud Accounts]
        Config[Global Configuration]
        K8s[Kubernetes Clusters]
    end

    subgraph TenantSpace [Tenant Space]
        direction TB
        subgraph PlatformTenant [Platform Tenant]
            Observability[Observability Stack]
            VCS[Version Control]
            CICD[CI/CD]
        end
        subgraph BusinessTenant [Product Tenant]
            App[Virtual Machines]
            DB[Databases]
            Storage[Storage]
        end
    end
    
    CoreSpace -->|Governs| TenantSpace

Core Space

In an operating system, the Kernel has unrestricted access to the underlying hardware and manages shared resources (memory, CPU, networking). In our platform, Core Space encompasses the underlying infrastructure boundaries. It includes everything “under the hood”—the shared resources and administrative boundaries that do not have a Tenant as a dimension.

The Core Space is not only the management plane. Sure, we have services here to manage everything else, but it is also where we host the core infrastructure: networks, DNS, accounts, subscriptions, global configuration, and so on.

This doesn’t mean tenants won’t use these resources. For example, an AzureSubscription(Sector, Tier) belongs to the Core Space and is managed exclusively by the platform team, but all tenant workloads within that sector and tier will run inside it. When tenants are placed inside these spaces, they inherit all the characteristics of the space—like strict IAM boundaries or relaxed networking—but they do not manage them. The fences are built and maintained by the platform team.

Tenant Space

User Space in an OS is strictly sandboxed; applications run there and must use APIs to request resources from the kernel. Similarly, Tenant Space represents the fully resolved coordinate that includes the Tenant dimension. This is the isolated execution environment provisioned for a product team.

In practice, a Tenant Space looks like a combination of a Kubernetes Namespace, a dedicated Cloud Resource Group, and scoped IAM roles. Tenants have high autonomy within this boundary—they can deploy their applications and manage their internal configurations. However, their blast radius is strictly contained. They interact with the Core Space via APIs or an Internal Developer Portal, just as user applications make system calls to an OS kernel.
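The names of a Tenant Space's building blocks can be derived from its coordinate, so nobody invents them by hand. This is a minimal sketch. The naming scheme (`rg-...`, `...-deployer`) is a hypothetical convention. It also assumes one Kubernetes cluster per (Sector, Tier, Region), so the namespace only needs the tenant name. The only external rule it relies on is that Kubernetes namespace names must be DNS-1123 labels.

```python
import re

# Kubernetes namespace names must be DNS-1123 labels: lowercase alphanumerics
# or "-", starting and ending with an alphanumeric, at most 63 characters.
DNS1123_LABEL = re.compile(r"^[a-z0-9]([-a-z0-9]*[a-z0-9])?$")


def tenant_space_names(sector: str, tier: str, region: str, tenant: str) -> dict[str, str]:
    """Derive names for the pieces of a Tenant Space (hypothetical convention)."""
    if "_" in (sector, tier, region, tenant):
        raise ValueError("a Tenant Space needs a fully resolved coordinate")
    # Assumption: one cluster per (Sector, Tier, Region), so the tenant alone is unique.
    namespace = tenant
    if len(namespace) > 63 or not DNS1123_LABEL.match(namespace):
        raise ValueError(f"invalid namespace name: {namespace!r}")
    return {
        "namespace": namespace,
        "resource_group": f"rg-{sector}-{tier}-{region}-{tenant}",
        "iam_role": f"{sector}-{tier}-{region}-{tenant}-deployer",
    }


print(tenant_space_names("ecommerce", "live", "eu01", "payments"))
```

Rejecting wildcards keeps Core Space coordinates from accidentally being provisioned as tenant resources.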

The Platform Tenant

Given that the platform team has broad access to the Core Space, it might be tempting for them to deploy their own services—like CI/CD runners, version control systems, or observability stacks—directly into it.

However, the platform team should not use the Core Space to run these services. They are platform services, but they are not core infrastructure. Anything running with Core Space privileges becomes a path to the entire infrastructure if it is compromised.

Instead, we manage those services with a Platform Tenant, operating entirely within the Tenant Space (e.g., Compute(Sector = "platform", Tier = "live", Region = "eu01", Tenant = "platform")). We play by the same rules that other tenants play by in the platform, with the same constraints and the same bureaucracy. This strict separation not only ensures the Core Space remains pure infrastructure and networking, but it also helps validate firsthand what the experience is like working on that side of the platform.
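One way to enforce "same rules for everyone" is an admission check that every deployment passes through. The platform team's own services pass only because they carry Tenant = "platform". This is a minimal sketch: `space_of` and `check_workload_placement` are hypothetical names. Treating any coordinate with a Tenant as Tenant Space is a simplifying assumption.

```python
WILDCARD = "_"
Coord = tuple[str, str, str, str]  # (Sector, Tier, Region, Tenant)


def space_of(coordinate: Coord) -> str:
    """Segments without a Tenant belong to Core Space; anything with one is Tenant Space."""
    return "core" if coordinate[3] == WILDCARD else "tenant"


def check_workload_placement(coordinate: Coord) -> None:
    """Hypothetical guardrail: every workload must land in a fully resolved Tenant Space."""
    if space_of(coordinate) != "tenant":
        raise PermissionError(f"{coordinate} is Core Space; deploy workloads into a tenant")
    if WILDCARD in coordinate:
        raise PermissionError(f"{coordinate} is not fully resolved")


# The platform team's CI/CD runners go through the same check as any product team.
check_workload_placement(("platform", "live", "eu01", "platform"))
check_workload_placement(("ecommerce", "live", "eu01", "payments"))
```

A request to run the CI/CD stack directly at ("platform", "live", "eu01", _) fails the check. That is exactly the shortcut the Platform Tenant is meant to prevent.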

Segmentation vs. Landing Zones

If you’ve read about cloud architecture, you’ve almost certainly encountered the term landing zone. AWS, Azure, and GCP all use it to describe a pre-configured, secure, scalable cloud environment that serves as the foundation for all workloads.

This is worth saying plainly: the segmentation strategy of your platform is the concept—the model you use to decide how your platform is divided. A landing zone is the implementation—the specific cloud account structure, policies, and guardrails that enforce that model using a cloud provider’s native primitives.

The same segmentation model can produce very different landing zone designs. One organization might create a single production account for all tenants at ("ecommerce", "live", _, _) and use resource groups for tenant isolation. Another might create one subscription per tenant, mapping directly to ("ecommerce", "live", _, "payments"). The underlying architectural principle of separating spaces stays the same. The implementation varies based on blast radius tolerance, compliance requirements, and operational capacity.
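The two landing zone designs in the previous paragraph can be expressed as two mapping functions over the same coordinates. This is a minimal sketch. The account-naming scheme, including the `-shared` suffix for resources without a tenant, is hypothetical.

```python
WILDCARD = "_"
Coord = tuple[str, str, str, str]  # (Sector, Tier, Region, Tenant)


def shared_account(c: Coord) -> str:
    """Design A: one account per (Sector, Tier); tenants isolated by resource groups."""
    sector, tier, _region, _tenant = c
    return f"{sector}-{tier}"


def account_per_tenant(c: Coord) -> str:
    """Design B: one account per (Sector, Tier, Tenant), spanning all regions."""
    sector, tier, _region, tenant = c
    if tenant == WILDCARD:
        return f"{sector}-{tier}-shared"  # Core Space resources that have no tenant
    return f"{sector}-{tier}-{tenant}"


resources = [
    ("ecommerce", "live", "eu01", "payments"),
    ("ecommerce", "live", "us01", "payments"),
    ("ecommerce", "live", "eu01", "recommendations"),
]

for strategy in (shared_account, account_per_tenant):
    print(strategy.__name__, sorted({strategy(r) for r in resources}))
```

The coordinates never change. Only the mapping function does, which is why the segmentation model can outlive a landing zone redesign.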

Every design decision we make in later chapters regarding infrastructure, networking, CI/CD, and observability will reference the coordinates where resources “land” within these spaces.

The Isolation Spectrum

Not all boundaries are created equal. To design an effective platform, we must understand the Isolation Spectrum. This determines how “close” two workloads are to each other and what layers of protection stand between them.

The key is to identify what is being shared. A separate cloud account is a powerful administrative boundary, but it is a common misconception to assume it guarantees physical isolation. In reality, multiple cloud accounts often share the same physical server. If your goal is to prevent a “noisy neighbor” or a side-channel attack on the CPU, the account boundary alone isn’t enough.

Level | Boundary       | Mechanism                     | What is Shared?
------|----------------|-------------------------------|-------------------------------------------
1     | Physical       | Air-gap / Disconnected DC     | Nothing
2     | Hardware       | Dedicated Server (Bare Metal) | Facility (Power, Cooling, Rack)
3     | Administrative | Cloud Account / Subscription  | Physical Hardware, Cloud Provider APIs
4     | Network        | VPC / VNet / SDN              | Management Plane, Physical Hardware
5     | Runtime        | Virtual Machine / Instance    | Network, Management Plane, Hardware
6     | Logical        | Namespace / Container         | OS Kernel, Network, Management, Hardware

Understanding the Depth

  • Level 1 (Physical): The “gold standard.” Used for air-gapped systems where no physical connection exists between segments.
  • Level 2 (Hardware): You have a dedicated physical host (Bare Metal). Even if managed via a cloud API, you are not sharing CPU, RAM, or local I/O with any other customer or account.
  • Level 3 (Administrative): This is the baseline for cloud segmentation. Your CloudAccount(Sector, Tier) typically maps here. It isolates identity, billing, and API quotas. However, at this level, your workloads still share physical hardware (and the hypervisor) with strangers in the cloud.
  • Level 4 (Network): Multiple networks exist within the same administrative boundary. They share the same management plane but have no direct path to talk to each other.
  • Level 5 (Runtime): Workloads share the same network and management plane, but are isolated by a hypervisor. This is the standard “separate VM” isolation.
  • Level 6 (Logical): The most lightweight isolation. Workloads share the same Operating System kernel. A breach or a resource leak here is the most likely to impact neighbors.
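Encoding the spectrum as data makes one property explicit. Two workloads are only as isolated as the strongest boundary between them, which is the lowest level number. This is a minimal sketch. The `Level` enum and `effective_isolation` are illustrative names, and the shared-layer labels are transcribed from the table with the wording normalized.

```python
from enum import IntEnum


class Level(IntEnum):
    """Isolation Spectrum levels: a lower number means fewer shared layers."""
    PHYSICAL = 1        # air-gap / disconnected data center
    HARDWARE = 2        # dedicated bare-metal server
    ADMINISTRATIVE = 3  # cloud account / subscription
    NETWORK = 4         # VPC / VNet / SDN
    RUNTIME = 5         # virtual machine / instance
    LOGICAL = 6         # namespace / container


# The "What is Shared?" column of the table.
SHARED: dict[Level, set[str]] = {
    Level.PHYSICAL: set(),
    Level.HARDWARE: {"facility"},
    Level.ADMINISTRATIVE: {"hardware", "cloud provider APIs"},
    Level.NETWORK: {"management plane", "hardware"},
    Level.RUNTIME: {"network", "management plane", "hardware"},
    Level.LOGICAL: {"OS kernel", "network", "management plane", "hardware"},
}


def effective_isolation(boundaries: list[Level]) -> Level:
    """The strongest (lowest-numbered) boundary between two workloads decides."""
    if not boundaries:
        raise ValueError("no boundary at all between the workloads")
    return min(boundaries)


# Pods in different namespaces that also live in different cloud accounts.
level = effective_isolation([Level.LOGICAL, Level.ADMINISTRATIVE])
print(level.name, sorted(SHARED[level]))
```

Even with an account boundary, the result still lists shared hardware. That is the noisy-neighbor caveat from the start of this section.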

The Design Heuristic

The rule of thumb is: use the Administrative boundary (Level 3: Account) as the minimum isolation for Sector and Tier.

If you have extreme security, performance, or regulatory requirements that forbid sharing hardware with strangers, you must move up to Dedicated Hardware (Level 2). Within those hardened segments, use Network and Logical boundaries (Levels 4–6) to isolate Tenants and Regions based on your risk tolerance and budget.
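The heuristic can be checked mechanically against a proposed design. This is a minimal sketch: `max_allowed_level` and `violations` are hypothetical names, and the `hardened` flag stands in for "sharing hardware with strangers is forbidden". Because a lower level number is stronger, a design violates the rule when its level number is higher than the allowed one.

```python
from enum import IntEnum


class Level(IntEnum):
    PHYSICAL = 1
    HARDWARE = 2
    ADMINISTRATIVE = 3
    NETWORK = 4
    RUNTIME = 5
    LOGICAL = 6


def max_allowed_level(dimension: str, hardened: bool = False) -> Level:
    """Weakest boundary the heuristic tolerates for a dimension."""
    if dimension in ("sector", "tier"):
        # Account boundary at minimum; dedicated hardware for hardened segments.
        return Level.HARDWARE if hardened else Level.ADMINISTRATIVE
    if dimension in ("region", "tenant"):
        return Level.LOGICAL  # Levels 4-6, chosen by risk tolerance and budget
    raise ValueError(f"unknown dimension: {dimension}")


def violations(design: dict[str, Level], hardened: bool = False) -> list[str]:
    """List every dimension whose boundary is weaker than the heuristic allows."""
    return [
        f"{dim}: {level.name} is weaker than {max_allowed_level(dim, hardened).name}"
        for dim, level in design.items()
        if level > max_allowed_level(dim, hardened)
    ]


# Tiers separated only by namespaces, as in Mountain Lab's single shared cluster.
print(violations({"sector": Level.ADMINISTRATIVE, "tier": Level.LOGICAL, "tenant": Level.LOGICAL}))
```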

Warning

Management Plane

An administrative boundary (Level 3) is a powerful fence, but it is also a single point of failure. If an attacker compromises your Cloud Account credentials, they can bypass almost every boundary below it—deleting your VPCs, accessing your VM disks, or shutting down your physical hosts. High-level segmentation is as much about protecting the management plane as it is about isolating the data plane.

Warning

Tags Are Overlays, Not Boundaries

Resource tags and labels should never be used for access management or isolation. Tag-based access control (a form of attribute-based access control, or ABAC) is supported inconsistently across cloud providers, and even across services within AWS. Worse, a misconfigured or missing tag silently exposes resources. Use tags for what they’re good at: data classification, cost attribution, ownership tracking, and IaC references. They enrich the structural model with metadata. They do not enforce boundaries.

We will explore how to map this architectural model to concrete cloud structures (like AWS Accounts, Azure Subscriptions, and GCP Projects) in detail in Chapter 6: Infrastructure.

Microsegmentation

The segmentation model we’ve built so far—Sector, Tier, Region, Tenant—handles the coarse boundaries. But what happens inside a segment?

Even within a single Tenant Space, services communicate with each other. Traditional segmentation focuses on north-south traffic: the traffic crossing the perimeter (user requests hitting a load balancer, API calls entering from the internet). But in modern architectures, the majority of traffic is east-west: service-to-service communication within a segment. A payment service calling a fraud detection service. A webhook receiver querying a database.

Microsegmentation controls this east-west traffic. It is not a fifth dimension of our model—it’s an enforcement mechanism within segments, operating at the finest granularity. Think of it as the last layer of defense.

The platform team’s role is to provide microsegmentation as infrastructure. For example, it can deploy a Container Network Interface (CNI) plugin that enforces network policies, or a service mesh that enforces mutual authentication.

In many platforms, the starting point within a segment (especially in a Sandbox tier) is a default-allow policy, where all services in that segment can freely communicate. Microsegmentation is then implemented to layer in restrictions where they matter most. This can happen at the tenant level—where teams explicitly restrict which other tenants can reach their services—or even at the application level, where a single tenant restricts traffic between their own individual microservices to follow a zero-trust model.
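On Kubernetes, a tenant-level restriction is usually expressed as NetworkPolicy objects, and it is enforced only if the CNI plugin supports them. Below is a minimal sketch that generates two such manifests as JSON, which `kubectl apply -f` accepts. The first denies all ingress into a namespace. The second allows traffic only from selected namespaces. The namespace names `payments` and `checkout` are hypothetical. The selector relies on the `kubernetes.io/metadata.name` label that Kubernetes (1.21+) sets on every namespace.

```python
import json


def default_deny_ingress(namespace: str) -> dict:
    """Select every pod in the namespace and allow no ingress (no ingress rules listed)."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-ingress", "namespace": namespace},
        "spec": {"podSelector": {}, "policyTypes": ["Ingress"]},
    }


def allow_from_namespaces(namespace: str, sources: list[str]) -> dict:
    """Allow ingress only from pods in the listed namespaces (entries are OR-ed)."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "allow-selected-tenants", "namespace": namespace},
        "spec": {
            "podSelector": {},
            "policyTypes": ["Ingress"],
            "ingress": [{
                "from": [
                    {"namespaceSelector": {"matchLabels": {"kubernetes.io/metadata.name": src}}}
                    for src in sources
                ]
            }],
        },
    }


manifests = [default_deny_ingress("payments"), allow_from_namespaces("payments", ["checkout"])]
print(json.dumps({"apiVersion": "v1", "kind": "List", "items": manifests}, indent=2))
```

Because NetworkPolicies are additive, the default-deny policy flips the namespace from default-allow to default-deny, and each allow policy then opens one specific door.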

This is particularly important in regulated environments. PCI DSS v4.0 accepts segmentation, including microsegmentation, as a way to reduce the scope of the Cardholder Data Environment (CDE). If you can prove that only specific services within a segment can reach your payment processor, the rest of your infrastructure can fall outside the compliance scope.
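"Proving that only specific services can reach the payment processor" is, at its core, a reachability question over the allowed east-west connections. Below is a minimal sketch that walks the allowed-connection graph backwards from a target. It assumes `allowed` is exported from your network policies. The service names are made up. Real PCI DSS scoping also considers systems that can affect the CDE's security, so treat this as one input to scoping, not the whole answer.

```python
from collections import deque


def services_that_can_reach(target: str, allowed: dict[str, set[str]]) -> set[str]:
    """Return every service with a direct or transitive path to `target`.

    `allowed` maps a source service to the destinations its policies permit.
    """
    # Reverse the graph so we can walk from the target back to its callers.
    callers: dict[str, set[str]] = {}
    for src, dsts in allowed.items():
        for dst in dsts:
            callers.setdefault(dst, set()).add(src)

    reached: set[str] = set()
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for caller in callers.get(node, set()):
            if caller != target and caller not in reached:
                reached.add(caller)
                queue.append(caller)
    return reached


allowed = {
    "checkout": {"payment-api"},
    "payment-api": {"payment-processor", "fraud-detection"},
    "recommendations": {"catalog"},
    "webhook-receiver": {"orders-db"},
}
print(sorted(services_that_can_reach("payment-processor", allowed)))
```

Here only `checkout` and `payment-api` have a path to the processor. `fraud-detection` is called by the payment service but cannot call back into the processor.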

We won’t go deeper into implementation here—networking, CNI choices, and service mesh patterns belong in Chapter 6 (Infrastructure) and Chapter 9 (Security and Compliance). But the concept matters for segmentation design: your segments define the walls, and microsegmentation controls the doors within those walls.

Patterns and Anti-Patterns

When designing your platform’s segmentation model, keep these strategic traps and best practices in mind.

Anti-Pattern: Tier Mixing (Environment Pollution)

The most common segmentation failure is running workloads with different criticality or data classifications within the same segment. Mixing sandbox and live workloads—even with logical separation—is a recipe for disaster. As we saw in Mountain Lab’s opening story, a load test in a non-production space can easily starve production resources if they share the same underlying boundaries.

Anti-Pattern: Sector Pollution

This occurs when internal platform tools (CI/CD runners, monitoring agents, security scanners) are deployed into the business sector. This creates a circular dependency: if the business sector experiences a massive failure, the tools you need to diagnose and fix it might be down as well. Keep your Internal and Business sectors strictly separate.

Anti-Pattern: Billing-Driven Hierarchy

Structuring your segmentation model to match corporate cost centers rather than security and isolation needs. This creates weak boundaries where you need strong ones (between environments handling different data classifications) and strong boundaries where you don’t (between cost centers that share the same security posture). Cost attribution is a metadata overlay (tags/labels), not a structural dimension.

Pattern: Bridge Isolation Model

Not every resource at a specific coordinate needs the same level of isolation. This is the most common real-world pattern: you choose the isolation level per resource type, per coordinate. Tenant("payments") might warrant a dedicated, siloed database for compliance. Meanwhile, the workloads in its ("ecommerce", "live", "eu01", "payments") Tenant Space can safely run on a compute pool shared with other tenants in ("ecommerce", "live", "eu01", _).
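A bridge model boils down to an isolation choice keyed by (coordinate, resource type), with a fallback default. This is a minimal sketch. The `ISOLATION` and `DEFAULTS` tables and the `"dedicated"`/`"shared"` values are hypothetical.

```python
Coord = tuple[str, str, str, str]  # (Sector, Tier, Region, Tenant)

# Hypothetical explicit choices: only where a coordinate needs to deviate.
ISOLATION: dict[tuple[Coord, str], str] = {
    (("ecommerce", "live", "eu01", "payments"), "database"): "dedicated",
}

# Hypothetical defaults per resource type.
DEFAULTS: dict[str, str] = {"database": "shared", "compute": "shared", "storage": "shared"}


def isolation_for(coordinate: Coord, resource_type: str) -> str:
    """An explicit per-coordinate choice wins; otherwise use the resource type default."""
    return ISOLATION.get((coordinate, resource_type), DEFAULTS[resource_type])


payments = ("ecommerce", "live", "eu01", "payments")
print(isolation_for(payments, "database"))  # dedicated
print(isolation_for(payments, "compute"))   # shared
```

Keeping the exceptions in a small explicit table makes them easy to audit. That matters because each "dedicated" entry usually has a compliance reason behind it.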

Pattern: Dedicated Security Sector

Isolate security observability—log archives, audit systems, SIEM—into a dedicated internal sector. By separating these from both the platform and business workloads, you ensure that your security trail remains intact even if the primary infrastructure is compromised.

Decision Frameworks

How do you choose your level of segmentation? Consider your organization’s maturity, industry, and resources.

  • Startups (Pre-Product Market Fit): Keep it simple. Two tiers: Tier("sandbox") and Tier("live"), each in its own cloud account or subscription. One sector is fine. Don’t over-engineer. You’re at the bottom of the isolation spectrum (Levels 5–6 for most resources), and that’s appropriate. Focus on finding market fit.
  • Scale-ups (Growth Phase): Introduce Sector("platform") to pull shared tools out of the product boundaries. Add the Tenant dimension logically—namespaces and resource groups—to track costs and manage access as headcount grows. Start moving critical resources up the isolation spectrum to Levels 3–4.
  • Enterprise / Heavily Regulated: Implement all four dimensions with hard administrative boundaries: CloudAccount(Sector, Tier) at minimum, possibly CloudAccount(Sector, Tier, Tenant) for sensitive workloads. Add microsegmentation within segments, plus strict network isolation and policy enforcement to satisfy auditors and compliance frameworks. You’re operating at Levels 2–4 across the board.

Cross-Cutting Concerns

The four dimensions define the structural boundaries of your platform. But several concerns cut across those boundaries as policy overlays—they influence how you configure segments, but they aren’t dimensions that create new infrastructure.

Data Classification

Not all data is equal. A common model classifies data into four levels: Public, Internal, Confidential, and Restricted. This overlays the Tier dimension. Your Live tier handles Confidential and Restricted data, but not every Live workload touches PCI cardholder data or medical records. Only a subset of coordinates needs the strictest isolation. Data classification helps you decide which boundaries warrant moving up the isolation spectrum.
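Used as an overlay, the classification decides how far up the spectrum a particular workload must move. This is a minimal sketch. The mapping from each classification to an isolation level is a hypothetical example policy, not a standard. The key rule is that a workload must meet the requirement of the most sensitive data it touches.

```python
from enum import IntEnum


class Classification(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


# Hypothetical policy: the weakest Isolation Spectrum level allowed for each
# classification (lower level number = stronger isolation).
WEAKEST_ALLOWED_LEVEL: dict[Classification, int] = {
    Classification.PUBLIC: 6,
    Classification.INTERNAL: 5,
    Classification.CONFIDENTIAL: 4,
    Classification.RESTRICTED: 3,
}


def weakest_allowed_level(data: list[Classification]) -> int:
    """The most sensitive data a workload touches sets its requirement."""
    return WEAKEST_ALLOWED_LEVEL[max(data, default=Classification.PUBLIC)]


print(weakest_allowed_level([Classification.INTERNAL, Classification.RESTRICTED]))  # 3
```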

Identity Alignment

Your identity and access management should align with the segmentation model. That means separate identity providers (or separate identity directories) for Tier("sandbox") and Tier("live"), and separate service principals per Sector. It also means short-lived tokens instead of long-lived credentials. As zero trust architecture gains traction, identity, not the network, is increasingly the primary enforcement boundary. We explore this in detail in Chapter 5: Identity and Access Management.

Cost Attribution

Cost attribution is a metadata overlay, not a structural dimension. The structural model naturally enables it: every resource within the ("ecommerce", "live", "eu01", "payments") Tenant Space is attributable to the Payments team. Use tags and labels to enrich this further, but resist the urge to restructure your cloud hierarchy around billing. The hierarchy should serve security and isolation first; cost tracking follows through metadata.
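Because every billed resource already carries a coordinate, attribution to tenants falls out of the structure, and tags only add detail. This is a minimal sketch. The line items and figures are made up, and the `core-space` bucket for resources without a tenant is a hypothetical reporting choice.

```python
from collections import defaultdict

# Made-up billing line items: the coordinate is structural, tags are optional detail.
line_items = [
    {"coordinate": ("ecommerce", "live", "eu01", "payments"), "cost": 120.0, "tags": {"service": "payment-api"}},
    {"coordinate": ("ecommerce", "live", "us01", "payments"), "cost": 80.0, "tags": {}},
    {"coordinate": ("ecommerce", "live", "eu01", "recommendations"), "cost": 50.0, "tags": {}},
    {"coordinate": ("ecommerce", "live", "eu01", "_"), "cost": 30.0, "tags": {}},  # shared network
]


def cost_by_tenant(items: list[dict]) -> dict[str, float]:
    """Sum costs by the Tenant dimension of each item's coordinate."""
    totals: dict[str, float] = defaultdict(float)
    for item in items:
        tenant = item["coordinate"][3]
        # Resources without a tenant are Core Space and are reported separately.
        totals["core-space" if tenant == "_" else tenant] += item["cost"]
    return dict(totals)


print(cost_by_tenant(line_items))
```

A missing `service` tag only loses detail here. The Payments team is still charged correctly, because the tenant comes from the coordinate rather than from the tag.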

CI/CD Pipeline Isolation

Build and deploy pipelines should respect segmentation boundaries. Tier("sandbox") build runners cannot access Tier("live") secrets. Pipeline credentials should be scoped per (Sector, Tier, _, _) at minimum. We’ll cover this in depth in Chapter 7, but the principle is simple: your CI/CD system is only as secure as its weakest credential scope.
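The scoping rule fits in a single authorization check. This is a minimal sketch, assuming runner credentials and secrets are both labeled with coordinates. `can_read_secret` is a hypothetical name, and a real system would enforce this in the secret store's policy engine rather than in application code.

```python
WILDCARD = "_"
Coord = tuple[str, str, str, str]  # (Sector, Tier, Region, Tenant)


def can_read_secret(runner: Coord, secret: Coord) -> bool:
    """A runner may read a secret only inside its own (Sector, Tier) scope.

    If the runner is further scoped to a region or tenant, the secret must match those too.
    """
    if WILDCARD in runner[:2]:
        raise ValueError("runner credentials must be scoped to at least (Sector, Tier)")
    if runner[:2] != secret[:2]:
        return False  # never cross Sector or Tier, e.g. sandbox -> live
    return all(r == WILDCARD or r == s for r, s in zip(runner[2:], secret[2:]))


sandbox_runner = ("ecommerce", "sandbox", "_", "_")
payments_live_runner = ("ecommerce", "live", "_", "payments")
live_secret = ("ecommerce", "live", "eu01", "payments")

print(can_read_secret(sandbox_runner, live_secret))        # False
print(can_read_secret(payments_live_runner, live_secret))  # True
```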

Defining Your Segmentation Policy

Designing a segmentation model is only half the battle. To make the coordinates meaningful, you must define the policy and guardrails that apply to each value in your dimensions.

As you move through the upcoming chapters on IAM, Infrastructure, and CI/CD, the nuances of these policies will determine how you configure every cloud primitive. Before moving on, you should be able to answer these questions for your own platform:

The Sector Policy

What differentiates your Internal (Sector("platform")) from your Business (Sector("ecommerce"))?

  • Governance: Who is the administrative owner? (Hint: In both, it’s usually the platform team, but the users change).
  • Security Controls: Does the Internal sector have stricter egress filtering because it handles secrets? Does the Business sector require more intense WAF rules for public traffic?
  • Audit: Are the logs for the Internal sector archived in a separate, immutable bucket for forensics?

The Tier Policy

What are the “rules of the road” for Tier("sandbox") vs. Tier("live")?

  • IAM Policy: Is JIT (Just-In-Time) access required for Tier("live") but not for Tier("sandbox")?
  • Infrastructure: Are you using spot instances in Tier("sandbox") to save costs, while requiring reserved instances and multi-AZ in Tier("live") for resilience?
  • Data Policy: Is production data strictly forbidden in Tier("sandbox")? Is anonymization mandatory?

The Region Policy

Where is your platform “allowed” to exist?

  • Availability: Which regions are enabled for each Tier? You might allow any global region for Tier("sandbox"), but restrict Tier("live") to only those with specific low-latency connectivity to your headquarters.
  • Compliance: Are there specific regions that cannot share data due to sovereignty laws (e.g., EU vs. US)?

The Tenant Policy

How do you treat different Teams?

  • Default Isolation: Do all tenants start with a “default-allow” network policy within their namespace, or are they isolated from neighbors by default?
  • Quota Templates: What is the “Small” vs. “Large” compute quota available to a tenant?

Dimensional Combinations

The true power of the model emerges when you define policies for combinations:

  • (Sector, Tier, _, _): What is the root IAM boundary in the Core Space? This is often the level where you provision your CloudAccount.
  • (Sector, Tier, Region, _): What is the network routing policy in the Core Space? Can ("ecommerce", "live", "us01", _) talk to ("ecommerce", "live", "eu01", _)?
  • (Sector, Tier, Region, Tenant): What are the specific deployment guardrails for the Tenant Space? Can Tenant("payments") deploy to Tier("live") without a security scan?
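Once policies are attached to segments, the effective policy for any coordinate can be resolved by layering every matching segment from least to most specific. This is a minimal sketch. The policy keys and values in `POLICIES` are hypothetical examples drawn from the questions above.

```python
WILDCARD = "_"
Coord = tuple[str, str, str, str]  # (Sector, Tier, Region, Tenant)

# Hypothetical policies; more specific segments override broader ones.
POLICIES: dict[Coord, dict[str, object]] = {
    ("ecommerce", "_", "_", "_"): {"waf": "strict"},
    ("ecommerce", "live", "_", "_"): {"jit_access": True, "security_scan_required": True},
    ("ecommerce", "sandbox", "_", "_"): {"jit_access": False, "spot_instances": True},
    ("ecommerce", "live", "eu01", "_"): {"allowed_peer_regions": []},
    ("ecommerce", "live", "eu01", "payments"): {"quota": "large"},
}


def matches(segment: Coord, coordinate: Coord) -> bool:
    return all(s == WILDCARD or s == c for s, c in zip(segment, coordinate))


def effective_policy(coordinate: Coord) -> dict[str, object]:
    """Merge every matching segment's policy, from least to most specific."""
    applicable = [seg for seg in POLICIES if matches(seg, coordinate)]
    # A stable sort keeps definition order for segments with equal specificity.
    applicable.sort(key=lambda seg: sum(v != WILDCARD for v in seg))
    merged: dict[str, object] = {}
    for seg in applicable:
        merged.update(POLICIES[seg])
    return merged


print(effective_policy(("ecommerce", "live", "eu01", "payments")))
```

Two segments with the same number of dimensions (say, ("ecommerce", "live", "eu01", _) and ("ecommerce", "live", _, "payments")) can disagree. This sketch silently lets definition order win. A real policy engine should define an explicit tie-break or reject such conflicts.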

Defining these characteristics now will simplify every implementation decision you make later in the book.

Summary

  • Segmentation protects the business: Effective segmentation is the primary defense against cascading failures. By defining strict boundaries, you ensure that an incident in one area—whether a runaway load test or a security breach—remains isolated, protecting the stability of the entire organization.
  • The Platform Notation: The coordinate tuple (Sector, Tier, Region, Tenant) provides a unified language for the entire organization. It moves infrastructure management away from arbitrary labels and toward a structural model where every resource has a precise, policy-driven address.
  • Four-Dimensional Architecture:
    • Sector separates the management plane (Internal) from the revenue plane (Business).
    • Tier separates experimental environments (Sandbox) from high-criticality environments (Live).
    • Region addresses geographical residency, latency, and high-availability requirements.
    • Tenant isolates teams and customers, enabling autonomous delivery and granular cost attribution.
  • Core Space vs. Tenant Space: A professional platform mirrors the separation of concerns found in operating systems. The platform team manages the Core Space (the kernel), providing the underlying hardware, networking, and governance. Developers operate within the Tenant Space (user space), enjoying autonomy within secure, pre-configured fences.
  • The Isolation Spectrum: Not all boundaries require the same mechanism. Match your isolation strategy—from logical namespaces to administrative cloud accounts and physical hardware—to the actual risk and regulatory requirements of the workload.
  • Bridge Isolation Models: Resilient platforms leverage “bridge” models, where different resources at the same coordinate use different levels of isolation. This allows you to optimize for both security and cost, providing dedicated databases where compliance demands it while sharing compute pools for standard services.
  • Evolve Through Policy: Start with the minimum dimensions needed for your organization’s scale, but define the policies and guardrails for each dimension early. This foresight ensures that as your infrastructure grows, your segments remain consistent and your blast radius stays contained.

By establishing these boundaries, we’ve built the “land” where our platform will live. However, a fence is only useful if we know who has the keys to the gate. In the next chapter, we will build on this structural model to define how users and services authenticate and gain access across these segments.

Skills for This Chapter

AI Skill

design-segmentation — An AI skill that guides you through designing a segmentation strategy for your platform.

It asks about your organization’s scale, cloud providers, and regulatory requirements, then produces a segmentation design document defining Sector(Name), Tier(Name), Region(Name), and Tenant(Name) as part of your structural model.
