Testing Pyramid:
An Evolutionary Tale

Change is ubiquitous in software development. New languages, tools and frameworks are being constantly invented and old ones toppled. Yet the one thing unwilling to move is the Testing Pyramid. It seems to survive technological erosion and the attacks of armies of unit test haters. Let's explore its origins, impact, and how breakthroughs in testing reimagine the pyramid for the modern testing era.

The inception of the testing pyramid

Flashback to 2009: At the time, testing consisted mainly of two types of tests: unit and UI. The latter are often also called end-to-end (E2E) tests. However, Mike Cohn, in a blog article and in his book 'Succeeding with Agile', argued that something was missing:

'Testing [through UI] would certainly work but would be brittle, expensive, and time-consuming.'
Mike Cohn (2009)

He postulated that not all tests need to go through the UI, so his 2009 solution was the service test (later called integration test). It's a new layer that tests the services separately from the UI.

Figure: Mike Cohn added a new service layer to the testing pyramid.

Based on this insight, Mike Cohn introduced the Testing Pyramid in his book 'Succeeding with Agile.' The pyramid emphasized a hierarchy of tests based on their granularity, speed, and costs. At the base stood unit tests, which were numerous but quick and cheap. The service/integration tests that balance granularity and scope were in the middle. At the top were the UI / e2e tests: fewer, slower, and costlier but broader in scope, thus giving you more confidence.

In their 2015 blog post, Google agreed with Mike Cohn and outed themselves as integration test fans. They were not alone. Many advocates followed:

Figure: tweet by Guillermo Rauch on integration testing. For the Vercel founder and JS legend, integration tests should be in the majority, and swyx agrees.

In the same blog post, Google suggested an ideal distribution of 70% unit tests, 20% integration tests, and 10% e2e tests. This allocation was not a strict rule but a guideline reflecting each testing level's confidence (leftmost arrow in the image below), cost, and speed (right side).

Figure: effect of the test layers and their distribution on confidence, cost, and speed (from left to right).
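As a rough illustration of how such a guideline could be compared with a real suite, here is a minimal sketch. The function name `distribution_gap` and the sample counts are assumptions made up for this example; they do not come from Google's post or from any testing tool.

```python
# Target split from the 70/20/10 guideline, treated here as a rule of thumb.
TARGET_SHARE = {"unit": 0.70, "integration": 0.20, "e2e": 0.10}


def distribution_gap(test_counts: dict[str, int]) -> dict[str, float]:
    """Return actual share minus target share per layer, in percentage points."""
    total = sum(test_counts.values())
    if total == 0:
        raise ValueError("the suite contains no tests")
    return {
        layer: round(100 * (test_counts.get(layer, 0) / total - target), 1)
        for layer, target in TARGET_SHARE.items()
    }


# Hypothetical suite with 500 tests in total.
example_counts = {"unit": 300, "integration": 150, "e2e": 50}
gap = distribution_gap(example_counts)
print(gap)
```

A negative value means the layer has a smaller share than the guideline suggests, a positive value a larger one. Since the split is a guideline rather than a rule, such a gap is a prompt for discussion, not a failure.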

Questioning the pyramid - alternate shapes emerge

As software development practices evolved and tools improved, the rigidity of the pyramid began to erode. It gave birth to a slew of alternative shapes:

Test Diamond
Emphasizes integration and system tests, squeezing unit and E2E tests to the corners, arguing that integration tests offer a more cost-effective way to catch bugs.
Test Ice Cream Cone
A satirical take that suggests that many organizations put more emphasis on UI and manual tests (the top), with a decreasing focus on integration and unit tests, leading to a fragile testing base. In 2015, Google called it an anti-pattern.
Test Crab
Emphasizes wide-ranging UI tests and fewer unit, component, and API tests (making up the crab's legs). This reflects scenarios where user interfaces have complex interactions.
Test Trophy
Adds a base of static checks (such as linting and type checking) beneath the unit tests and gives integration tests the largest share.

Figure: four (un)popular testing shape alternatives: pyramid, ice cream cone, crab, and trophy. Source: @leichteckig

The testing world is in the midst of a paradigm shift. The journey has been enlightening, from the early days of relying heavily on unit tests to today's emphasis on end-to-end (e2e) tests. Let's walk through this evolutionary tale and understand each testing layer's significance, strengths, and challenges.

A sturdy foundation: unit tests

Unit tests serve as the bedrock of the testing pyramid. Their benefits include:

Precision - unit tests can identify issues in specific code segments
Speed - they're quick to write and even quicker to run
Good practices - they encourage modular, decoupled code and are the foundation of the widely accepted Test Driven Development (TDD) approach
Documentation - they effectively serve as documentation, making it easier for developers to understand the purpose and behavior of code segments
Regression detection - they spot regressions early in the development cycle, which saves considerable time and effort

Yet, it's not all sunshine and rainbows. Unit tests can be tightly coupled to the implementation: a minor code change can result in many failing tests, which makes maintaining them tedious. A passing unit test suite can also give developers a false sense of security, leading to potential oversights.

Figure: example of good practice for unit tests.
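A minimal sketch of what such a unit test might look like, using Python's built-in `unittest` module. The function `apply_discount` is a hypothetical example invented for illustration. Each test checks one observable behavior through the public function rather than its internals, which is one way to reduce the tight coupling mentioned above.

```python
import unittest


def apply_discount(price: float, percent: float) -> float:
    """Return the price reduced by the given percentage (hypothetical example)."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)


class ApplyDiscountTest(unittest.TestCase):
    # Every test asserts on inputs and outputs only, so refactoring the
    # body of apply_discount should not break these tests.
    def test_reduces_price_by_percentage(self):
        self.assertEqual(apply_discount(200.0, 25), 150.0)

    def test_zero_percent_leaves_price_unchanged(self):
        self.assertEqual(apply_discount(99.99, 0), 99.99)

    def test_rejects_percent_above_100(self):
        with self.assertRaises(ValueError):
            apply_discount(10.0, 101)
```

Saved in a file, these tests can be run with `python -m unittest`. The descriptive test names double as documentation of the expected behavior.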

The under- and over-appreciated prodigy: integration tests

Integration tests can be seen as the middle child of the testing family – pivotal yet often under- or overvalued depending on the organization you work for. They ensure that various parts of your application, such as controllers, databases (both in-memory and dockerized), and UI components, work harmoniously.

Figure: integration tests can be used to test many things.

Integration tests have earned their place in the testing pyramid for several reasons:

Higher confidence - they offer more confidence in the functionality than unit tests
Contract adherence - they're instrumental in ensuring that APIs function as detailed in their documentation, like Swagger docs
Cost-efficiency - they balance the detail of unit tests and the breadth of E2E tests, offering a favorable cost-benefit ratio

However, the challenges include complicated setups involving test databases or external services, slower execution times due to reliance on external components, and greater flakiness due to the nature of the more complex execution environment (think HTTP timeouts, etc).

Figure: integration tests often require a complicated setup (test database, external services, etc.).
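To make the in-memory database variant concrete, here is a minimal sketch of an integration test that runs a data-access class against a real SQLite engine via Python's standard `sqlite3` module. The `UserRepository` class and its schema are hypothetical and exist only for this example. A dockerized database would follow the same pattern with a different connection.

```python
import sqlite3
import unittest
from typing import Optional


class UserRepository:
    """Hypothetical data-access class that stores users in a SQL database."""

    def __init__(self, connection: sqlite3.Connection) -> None:
        self.connection = connection

    def create_schema(self) -> None:
        self.connection.execute(
            "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE NOT NULL)"
        )

    def add(self, email: str) -> int:
        cursor = self.connection.execute(
            "INSERT INTO users (email) VALUES (?)", (email,)
        )
        self.connection.commit()
        return cursor.lastrowid

    def find_email(self, user_id: int) -> Optional[str]:
        row = self.connection.execute(
            "SELECT email FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return row[0] if row else None


class UserRepositoryIntegrationTest(unittest.TestCase):
    def setUp(self):
        # A fresh in-memory database per test keeps the tests isolated.
        self.connection = sqlite3.connect(":memory:")
        self.repository = UserRepository(self.connection)
        self.repository.create_schema()

    def tearDown(self):
        self.connection.close()

    def test_added_user_can_be_read_back(self):
        user_id = self.repository.add("ada@example.com")
        self.assertEqual(self.repository.find_email(user_id), "ada@example.com")

    def test_duplicate_email_is_rejected_by_the_database(self):
        # The UNIQUE constraint lives in the database, not in Python code,
        # so only a test against a real engine can exercise it.
        self.repository.add("ada@example.com")
        with self.assertRaises(sqlite3.IntegrityError):
            self.repository.add("ada@example.com")
```

The second test shows where the extra confidence comes from: a unit test with a mocked connection would never notice a missing or misconfigured constraint.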

The rise of e2e testing

End-to-end tests simulate user behaviors and interactions, covering the gamut from frontend to backend to databases. But they haven’t always been the darling of the testing community.

To what end? We consider e2e tests to run from the browser all the way to the backend. Historically, they faced criticism for:

Lack of precision - difficulties in pinpointing specific issues
Maintenance woes - a single code change often leads to multiple e2e test failures
Flakiness - numerous uncontrollable variables made them unreliable

However, the tide has started to turn, especially with the advent of modern e2e testing tools like Playwright. Open-sourced by Microsoft and comparable to Cypress, Playwright resembles a modern Selenium: it offers an intuitive API and lets you replay test runs through recorded traces. The testing arena is shifting towards testing user behavior, which makes e2e testing increasingly important.

And the end-to-end camp is getting louder. Some go as far as proposing an upside-down arrangement of the Testing Pyramid. But I wouldn’t go that far. A solid base of unit tests has its rationale: it gives you the quick feedback and precise fault localization needed for fast iterations.

The Testing Hourglass is a promising way out of the testing shape agony. It was once described as an anti-pattern, but thanks to the progress in end-to-end (especially UI) testing, some integration-level checks can now be executed from the top. We use it ourselves, with promising results. We will continue to blog about our findings.

Figure: the hourglass test distribution, heavy on e2e and unit tests and light on integration testing. Image by Alan Myrvold.

Only testing gives you the necessary confidence

While the classic testing pyramid might seem slightly antiquated, given the advancements in e2e tests, it's essential to avoid getting trapped in dogmatic thinking. Use the right mix of unit, integration, and e2e tests depending on your project's nature, the tools at your disposal, and the risks you're willing to accept.

In any scenario, one golden rule remains steadfast - always write tests and design them to benefit the team. The goal is to deliver quality code with confidence, irrespective of the testing mix.

Also, since the cost-benefit analysis for e2e testing is changing significantly, stay informed about the latest in e2e testing. Combine different testing forms intelligently, and be discerning about what and how you test – after all, test coverage is a tool, not a target.

Marc Mengler
Co-Founder & CEO at octomind