The True Cost of Cloud Waste in Portfolio Companies
Cloud waste is one of the most pervasive and least addressed sources of value destruction in PE-backed technology companies. Industry analysts consistently estimate that 30-40% of all cloud spending is wasted. In my direct experience auditing portfolio companies, that number is often conservative for companies that have never undergone systematic optimization.
But "waste" is a broad term. To build an actionable remediation plan -- and to accurately model the financial impact -- you need to understand the distinct categories of cloud waste, their typical magnitude, and the confidence level with which you can identify and capture savings in each category.
Category 1: Zombie Resources (90% Confidence)
Zombie resources are cloud assets that are running and incurring charges but serving no productive purpose. They are the low-hanging fruit of cloud optimization, and they exist in virtually every environment I have ever audited.
Common examples include:
- Orphaned EC2 instances from decommissioned projects that were never terminated. I once found 47 instances in a portfolio company's environment that had been running for over 18 months with zero network traffic -- costing $23,000 per month.
- Unattached EBS volumes left behind when instances were terminated but their storage volumes were not. These accumulate silently and can represent thousands of dollars in monthly spend.
- Old EBS snapshots retained indefinitely because no one established a lifecycle policy. I have seen snapshot libraries exceeding $8,000 per month with data from three-year-old development branches.
- Idle load balancers provisioned for services that no longer exist, each costing $20-$50 per month regardless of traffic.
- Unused Elastic IPs -- AWS charges for allocated but unattached Elastic IPs, a cost many teams overlook.
Why 90% confidence: Zombie resources are binary -- either a resource is serving traffic and being used, or it is not. The identification methodology is straightforward (analyze CloudWatch metrics for zero utilization over a 30-day window), and the savings are immediate upon termination. The only risk is misidentifying a disaster recovery or failover resource as unused, which is why I always validate with the engineering team before terminating anything.
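For teams that want to run this check themselves, here is a minimal sketch of the identification pass, assuming boto3 and the standard CloudWatch metrics. The 30-day window and zero-traffic test mirror the methodology above; every candidate still gets validated with engineering before anything is terminated.

```python
# Sketch: flag running EC2 instances with zero network traffic over 30 days,
# plus unattached EBS volumes. Assumes boto3 credentials are configured.
# The per-instance CloudWatch call is slow at scale but fine for an audit pass.
import boto3
from datetime import datetime, timedelta, timezone

ec2 = boto3.client("ec2")
cloudwatch = boto3.client("cloudwatch")

def zombie_candidates(days=30):
    """Return running instance IDs with no inbound network traffic in the window."""
    end = datetime.now(timezone.utc)
    start = end - timedelta(days=days)
    candidates = []
    paginator = ec2.get_paginator("describe_instances")
    for page in paginator.paginate(
        Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
    ):
        for reservation in page["Reservations"]:
            for instance in reservation["Instances"]:
                stats = cloudwatch.get_metric_statistics(
                    Namespace="AWS/EC2",
                    MetricName="NetworkIn",
                    Dimensions=[{"Name": "InstanceId", "Value": instance["InstanceId"]}],
                    StartTime=start,
                    EndTime=end,
                    Period=86400,  # one datapoint per day
                    Statistics=["Sum"],
                )
                # An empty datapoint list also sums to zero -- itself worth reviewing.
                total_bytes = sum(p["Sum"] for p in stats["Datapoints"])
                if total_bytes == 0:
                    candidates.append(instance["InstanceId"])
    return candidates

# Unattached EBS volumes are even simpler: status "available" means no instance.
orphaned_volumes = ec2.describe_volumes(
    Filters=[{"Name": "status", "Values": ["available"]}]
)["Volumes"]
```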
Typical impact: 5-15% of total cloud spend. For a company spending $150K/month on AWS, zombie resources typically account for $7,500-$22,500 in monthly waste.
Category 2: Over-Provisioned Instances (70% Confidence)
Over-provisioning is the practice of running compute instances (and databases) that are significantly larger than the workload requires. It is endemic in cloud environments because engineers naturally err on the side of more capacity, and because there is rarely any organizational incentive to right-size after initial deployment.
The pattern is remarkably consistent: An engineer provisions a c5.4xlarge instance (16 vCPUs, 32 GB RAM) for a new service because they are unsure of the resource requirements. The service launches, and average CPU utilization settles at 8-12%. No one ever revisits the sizing decision. The company pays for 16 vCPUs when 4 would suffice -- a 4x overspend on that instance.
Multiply this pattern across dozens or hundreds of instances, and the aggregate waste is substantial.
Why 70% confidence: Right-sizing requires analyzing utilization patterns over time and understanding application performance requirements. Unlike zombie resources, which are clearly unused, over-provisioned instances are serving real workloads -- the question is whether they could serve those workloads on smaller, cheaper instances. There is also a performance risk: some workloads have bursty requirements that average utilization metrics do not capture. The savings potential is meaningful, but each recommendation requires validation against actual performance requirements.
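To make the burstiness caveat concrete, here is a hedged sketch of the screen I run before recommending a downsize. It checks peak CPU alongside the average, so a service that idles at 10% but spikes to 90% during batch runs is not flagged. The thresholds are illustrative assumptions, not universal rules, and memory pressure (which requires the CloudWatch agent to report) should be checked before acting on any candidate.

```python
# Sketch: flag right-sizing candidates using both average and peak CPU,
# so bursty workloads are not misclassified. Thresholds are illustrative.
import boto3
from datetime import datetime, timedelta, timezone

cloudwatch = boto3.client("cloudwatch")

def looks_over_provisioned(instance_id, days=30, avg_threshold=15.0, peak_threshold=50.0):
    """Return True if an instance looks over-provisioned on CPU alone."""
    end = datetime.now(timezone.utc)
    stats = cloudwatch.get_metric_statistics(
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        StartTime=end - timedelta(days=days),
        EndTime=end,
        Period=3600,  # hourly datapoints
        Statistics=["Average", "Maximum"],
    )
    points = stats["Datapoints"]
    if not points:
        return False  # no data: make no recommendation
    avg_cpu = sum(p["Average"] for p in points) / len(points)
    peak_cpu = max(p["Maximum"] for p in points)
    # Low average AND low peak: a smaller instance likely suffices.
    # Low average but high peak: bursty -- leave it alone, or consider burstable types.
    return avg_cpu < avg_threshold and peak_cpu < peak_threshold
```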
Typical impact: 10-20% of compute spend. For a company spending $100K/month on EC2, right-sizing opportunities typically range from $10,000-$20,000 in monthly savings.
Category 3: Idle Capacity (70% Confidence)
Idle capacity is related to but distinct from over-provisioning. Where over-provisioning refers to individual resources that are too large, idle capacity refers to entire environments or infrastructure tiers that are running when they do not need to be.
The most common offender is non-production environments running 24/7. Development, staging, QA, and demo environments are typically used only during business hours -- roughly 50 hours per week out of 168. Yet they run continuously because no one has implemented scheduling.
The math is compelling: shutting down non-production environments outside business hours and on weekends reduces their compute costs by approximately 70%. For a company with $30K/month in non-production environment costs, that is $21,000 in monthly savings.
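Implementation is typically a small scheduled function, not a platform project. The sketch below assumes an Environment tag convention and a scheduler such as EventBridge invoking it at the close of business hours; both are assumptions to adapt, and any environment running overnight pipelines should be excluded from the tag values.

```python
# Sketch: a scheduler-invoked function that stops instances tagged as
# non-production. The "Environment" tag values are assumed conventions.
import boto3

ec2 = boto3.client("ec2")
NON_PROD_ENVS = ["dev", "staging", "qa", "demo"]  # assumed tag values

def stop_non_production(event=None, context=None):
    """Stop all running instances whose Environment tag marks them non-production."""
    instance_ids = []
    paginator = ec2.get_paginator("describe_instances")
    for page in paginator.paginate(
        Filters=[
            {"Name": "tag:Environment", "Values": NON_PROD_ENVS},
            {"Name": "instance-state-name", "Values": ["running"]},
        ]
    ):
        for reservation in page["Reservations"]:
            instance_ids += [i["InstanceId"] for i in reservation["Instances"]]
    if instance_ids:
        ec2.stop_instances(InstanceIds=instance_ids)
    return instance_ids

# A mirror-image start function, scheduled for the start of business hours,
# completes the loop.
```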
Other examples of idle capacity include:
- Oversized database instances running in Multi-AZ configuration for non-production workloads (no need for high availability in dev/test)
- NAT Gateways in environments with minimal outbound traffic
- Provisioned IOPS storage for workloads that could run on general-purpose SSD
- Always-on GPU instances for ML training jobs that run intermittently
Why 70% confidence: The scheduling and shutdown approach is straightforward for standard environments, but some non-production environments support automated testing pipelines, overnight batch processing, or teams in different time zones. Each environment needs to be evaluated for its actual usage patterns before implementing scheduling. The savings are real but require some operational adjustment.
Typical impact: 8-15% of total cloud spend, heavily dependent on the ratio of non-production to production resources.
Category 4: Architectural Inefficiency (50% Confidence)
This is the most complex category and the one with the highest potential savings -- but also the highest implementation cost and risk. Architectural inefficiency refers to fundamental design decisions that result in unnecessarily expensive cloud consumption.
Examples include:
- Lift-and-shift migrations that moved on-premises architectures to the cloud without redesigning for cloud-native patterns. A monolithic application running on a cluster of large EC2 instances could be refactored into containerized microservices running on AWS Fargate at 40-60% lower cost.
- Inefficient data architectures -- storing frequently accessed data on expensive storage tiers, running full-table scans on large RDS databases instead of using caching layers, or maintaining data warehouses with uncompressed, unpartitioned datasets.
- Poor data transfer design -- routing traffic through unnecessary hops, transferring large datasets across regions or availability zones, or using expensive API Gateway calls for internal service-to-service communication.
- Monolithic database instances handling OLTP and OLAP workloads simultaneously, when separating them would allow each to be optimized independently.
Why 50% confidence: Architectural changes require engineering effort, carry deployment risk, and have variable outcomes. A projected 40% cost reduction from containerization might end up being 25% after accounting for container orchestration overhead and the engineering time invested. The savings estimates in this category are directional, not precise, which is why I assign lower confidence.
Typical impact: 10-25% of total cloud spend, but realization requires 3-12 months of engineering investment.
Aggregating the Opportunity
When I present waste findings to PE operating partners, I use a weighted model that accounts for confidence levels:
| Category | Gross Savings | Confidence | Weighted Savings |
|---|---|---|---|
| Zombie Resources | $15,000/mo | 90% | $13,500/mo |
| Over-Provisioned | $18,000/mo | 70% | $12,600/mo |
| Idle Capacity | $12,000/mo | 70% | $8,400/mo |
| Architectural | $20,000/mo | 50% | $10,000/mo |
| Total | $65,000/mo | | $44,500/mo |
In this example, based on a company spending $200K/month on cloud, the weighted savings of $44,500/month represent a 22% reduction in spend -- or $534,000 in annual EBITDA improvement. The unweighted potential is $780,000 annually.
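For operating partners who want to adapt this model to their own figures, the arithmetic is simple enough to express directly (same illustrative numbers as the table above):

```python
# Sketch: the confidence-weighted savings model from the table above.
categories = {
    "Zombie Resources": (15_000, 0.90),
    "Over-Provisioned": (18_000, 0.70),
    "Idle Capacity":    (12_000, 0.70),
    "Architectural":    (20_000, 0.50),
}

monthly_spend = 200_000
gross = sum(g for g, _ in categories.values())         # $65,000/mo
weighted = sum(g * c for g, c in categories.values())  # $44,500/mo

print(f"Gross:    ${gross:,.0f}/mo (${gross * 12:,.0f}/yr)")
print(f"Weighted: ${weighted:,.0f}/mo (${weighted * 12:,.0f}/yr)")
print(f"Weighted reduction: {weighted / monthly_spend:.0%}")
```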
This is real money, and it is available in nearly every portfolio company that has not been through a rigorous optimization process. The question is not whether the waste exists -- it is whether you have the visibility and expertise to find it.
Ready to evaluate cloud economics in your next deal? Book a free discovery call to discuss your specific situation.