The testing pyramid should look more like a crab (changelog.com)
44 points by jerodsanto on Nov 3, 2020 | 63 comments


This article seems to fundamentally misunderstand why the testing pyramid is a pyramid in the first place. You shouldn't have more unit tests because they're easier to write. You should have more unit tests because they're easier to maintain. By definition, they focus on a small part of the software and should not break if an unrelated part of the software changes.

Of course, just because each individual part of a machine works, it doesn't mean the machine itself works. That's where integration tests come in. Turns out it's usually fairly easy to make sure all the parts work together once you've verified all of the parts individually, so you usually need fewer integration tests.

Once you've verified all the parts work and that they work together correctly, now all you need to do is make sure you actually made the right thing. Enter functional tests and E2E tests. These are usually especially costly to maintain, but they're also so high-level that you usually need very few of them.

Writing tests is easy, but they're only useful if you run them. It's not enough for a test to work when you write it - it has to continue working into the future. Maintenance is a recurring cost and, in my experience, is where automated testing can get really expensive. The testing pyramid helps minimize that cost.

In an ideal world, you would maximize testing at every level - but in the real world that's often just too expensive.


Unit tests are not intrinsically easier to maintain. The lower-level the test, the more it reflects implementation rather than specification, and hence the more changes it demands when refactoring.

The practical upshot of these types of tests is often that they catch very few actual bugs but nonetheless demand constant and expensive maintenance when code is changed.


I think of it as double entry bookkeeping. You have to say what you mean two different ways, and those two phrasings have to be reconcilable. This is still a useful guard against careless errors! But it is a different problem from checking whether your software actually does the right thing.

The magnitude of change needed in unit tests while changing code is definitely maddening. For example: add an argument to a method. It's called in only 3 places in production code, but each of those sites has 10 test cases, so there are 30 mock expectations to update now. And the IDE doesn't understand those, so there's no automated support.
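
A minimal sketch of that churn (hypothetical names, Jest-style mocks assumed):

    // every test pins the exact call shape of the collaborator
    const billing = { sendInvoice: jest.fn() };

    test('placing an order notifies billing', () => {
      placeOrder({ customerId: 'cust-42' }, billing);  // hypothetical function under test
      expect(billing.sendInvoice).toHaveBeenCalledWith('cust-42');
    });

    // add a currency argument to sendInvoice, and every such expectation
    // across 30 test cases must be edited by hand:
    //   expect(billing.sendInvoice).toHaveBeenCalledWith('cust-42', 'EUR');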

On the other hand, integration and end-to-end tests will go red when (seemingly) nothing at all has changed, because of drift in tangentially related components from other codebases. Usually the fixes are simple, but it requires a willingness to drop what you're doing and investigate when they happen. Otherwise they just get ignored.


I think of it as writing everything twice to appease the unit testing gods. If done indiscriminately, it really is more of a ritual to ward off the bug gremlins than an effective development practice.

Integration tests do suffer from a whole set of problems that unit tests do not, but with one fundamental difference: they're usually problems that are soluble with better engineering techniques.

Integration tests that go red when something other than the code changes often provide useful feedback, so I wouldn't necessarily rule them out.


This is true if your units under test are bad abstractions. If you don't know yet what you will build, or if your requirements change all the time, good abstractions may rarely emerge, and then unit tests make less sense.

However, I've worked on a number of applications where there were very well-defined units that were quite stable in terms of their abstraction, and then unit tests are a joy. "I have this tree structure and I need to transform it into another structure" and stuff like that.

It's also worth keeping in mind that "unit testing" doesn't have to mean "class (or method) testing". Sometimes the class is the right level of abstraction but sometimes it can also be a small module of classes working in concert etc.

I agree that tests shouldn't simply reflect implementation, but there are functional requirements that can be specified at a lower level.


I agree. Good, stable abstractions covering complicated logic are where unit tests shine.

Unfortunately I find this scenario is also more the exception than the rule.

Big ball of mud apps with few effective tests are the norm, and apps which do very little complex logic under the hood (focused instead mainly on hooking together services) are also not at all uncommon.


If I design tests to capture the spirit of the class/module, the line coverage is only 50-60%. In Go, each method needs a lot of individual attention to exercise all the error-return branches up to a 90%+ coverage standard.


I find that unit tests are much less useful than integration tests.

When we write unit tests, we ask the person writing the code to spend a lot of effort documenting how they expect the system they are integrating with to work. What happens when that system behaves differently? Or changes in the future? And the way mocks are set up is very sensitive to implementation details, which is a maintenance burden when you have to change them.

As a simple example, I had a function that was loading data into a database with a series of insert statements. I sped it up by several orders of magnitude by using a COPY to send CSV data. Guess how well that worked with the mock database interface in the unit tests? (Long story short, it didn't. I replaced that unit test with an integration test.)
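
For what it's worth, the failure mode looks roughly like this (a hedged sketch; hypothetical loadRows over a node-postgres-style client):

    // the unit test pinned the implementation to row-by-row INSERTs
    const client = { query: jest.fn().mockResolvedValue({ rowCount: 1 }) };

    test('loads all rows', async () => {
      await loadRows(client, rows);  // hypothetical loader under test
      expect(client.query).toHaveBeenCalledTimes(rows.length);
      expect(client.query).toHaveBeenCalledWith(
        'INSERT INTO readings VALUES ($1, $2)', expect.anything());
    });

    // switch loadRows to stream CSV through COPY and the behavior is
    // identical, but every assertion above is now simply wrong.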

And yes, I've heard all of the lectures from test first folks. But somehow I've never had the experience of things working that way with my projects. (Possibly because most of what I do is close to the software/database interface.)


If it's talking to the database, it's not a unit test.

https://blog.metaobject.com/2014/05/why-i-don-mock.html


I'm aware of this.

If it is talking to a database, it's not a unit test.

If you've mocked out the database, it is a unit test.

My point is that the unit test version is a lot of work to create and tends to be useless. (Which is similar to the point that your blog link made.)


If you write your code in such a way that business logic and database access are separated, you won't have to mock out the database. Business logic is where the Good Stuff is. We can trust that the database access package is working because we probably didn't write it and it probably has its own set of tests.
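
A minimal sketch of that separation (hypothetical invoice domain, TypeScript):

    interface Invoice { id: string; paid: boolean; dueDate: Date }

    // pure business logic: no database in sight, trivially unit-testable
    function overdueInvoices(invoices: Invoice[], today: Date): Invoice[] {
      return invoices.filter(i => !i.paid && i.dueDate < today);
    }

    // thin shell doing the I/O, covered by a few integration tests
    // instead of a wall of database mocks
    async function remindOverdue(repo: { all(): Promise<Invoice[]> },
                                 notify: (i: Invoice) => Promise<void>,
                                 today: Date) {
      for (const inv of overdueInvoices(await repo.all(), today)) {
        await notify(inv);
      }
    }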


If your business application deals with large volumes of data, trying to abstract out the database is a recipe for performance problems. Instead you want to push logic to the database.

To give a real example, I'm working with time series generated by sensors on factory equipment. They can easily have millions of records per day.


No. Stubbing out the database does not turn a test into a unit test; it turns it into a stubbed-out integration test. (Mocking is something different.)

And since the point of an integration test is to test the integration, having a stubbed out version of that is fairly pointless, as you have discovered.

If you create actual unit tests, and allow those to drive your design, you will find much more joy.


It's not just that the article misses that unit tests are easier to maintain (if well written, at least); it also doesn't seem to discuss the problem of combinatorial explosion that plagues end-to-end tests.

There are only two ways of dealing with that problem: ignoring it and simply giving up on comprehensive testing, or realising that comprehensive testing can only be done by writing mostly unit tests.


End to end tests are less precise to triage when they break, and they break more often than unit tests. As the author said, they show all bugs, so they break a lot.

When the test is broken, it's not providing coverage until it's fixed. Which is why I advocate for less assertive testing in that case: https://assertless.org/

The priority of fixing end to end tests becomes critical, and so test maintenance is more important. With a large test suite of functional tests, that effort is not sustainable in my experience. I've been there. An image comparison suite of tests has the same issue, except it may defer all assertions, which is good, but it offloads a huge amount of analysis to the engineer consuming all the output. You really do need a human to check those tiny style changes it finds, a human with extensive knowledge of the product and its ever-changing visual quirks. I also maintain a large visual testing suite like that.

Cypress is not different from Selenium in that aspect. (There's not much real difference at all that I can see.)

I think end to end tests have more value than the traditional testing pyramid would indicate, but there are pitfalls and benefits to each type. A suite of concise unit tests will find obvious bugs with precise granularity incredibly quickly on the code they cover, which can never be complete coverage, but it doesn't need to be. This can be used for things like commit hooks that end to end tests just can't.


Unit tests don't just fail to find bugs because they don't give complete coverage (all testing has that problem, and the author is mistaken to say that end-to-end testing shows all bugs), they fail to find bugs resulting from faulty assumptions about how the units will interact. Boeing's faulty Starliner test demonstrated how that goes [1].

The true value of unit testing is that it reveals the bugs it can find early in the process. One benefit of this is that there will likely be fewer to be found in the end-to-end testing (or on deployment).

There is no value in debating which is better. The important issue is how to allocate your testing budget effectively, because there is never even close to enough time to exhaustively test.

[1] https://www.engadget.com/2020-02-29-boeing-starliner-failed-...


Pyramid approach, inverted pyramid, crab-shaped: none of them makes sense, because each tries to be a catch-all guideline. Same with any 'maturity models' for software projects.

People are forgetting there is such a thing as a 'risk-based' approach to testing. This is the only way to allocate a budget effectively: stuff that can drop half of your database should be tested well, and stuff that will make your UI look funny probably less.


I hate to say this, but I do think CMMI's model is a good idea at its core. The model itself is actually pretty solid. That is, everything it describes is something a good organization should do, and the requirement to document the processes + train people is a sound one.

But the appraisals and how most businesses treat it are, well, moronic.

The appraisals are easily gamed, so if you do everything on paper you're pretty much going to get your target level (3 is good enough for 99% of contracts, if it matters at all). And businesses try to treat the model as a process, when it's not.

The former means that the appraisal results are useless. You cannot judge anyone by being level 3 or 5. That often just means they played the game better, not that they're actually better.

The latter, though, is the worst part. CMMI is broken into a number of process areas (I'm happy to report I no longer recall the number of areas). It describes them in pseudo-process or procedural terms. That is, you could take what they describe and make a process, but it would be a very linear and simple process. What's intended is for you to say how your process maps to their model.

But organizations don't do that, they do moronic things instead like scrap the above and write a whole new document that's 40 pages of nothing but plagiarizing the CMMI book saying how they do things, but it's almost entirely a lie because no one (except the sacrificial team sent to be appraised) actually does it that way.


> and they break more often than unit tests.

That isn't my experience: unit tests break all the time and have to be rewritten after simple refactorings, while larger tests pass as long as you keep behavior the same. This means that the bigger tests are way better for refactoring and safely making changes to your code. If most of your tests are unit tests and you don't have bigger tests covering the same things, then refactoring safely is hell.


If your unit tests are breaking all the time, try testing less at the unit level and more at the integration level. There are a lot of dogmatists out there who preach that unit tests should have "X % coverage" no matter what, but that's a ridiculous claim. Different projects will require different mixes of testing strategy, and part of the engineering process is arriving at your verification and validation methods, codifying them, ensuring that they provide valid output, and making sure you're using your time efficiently. If most of your time is spent fixing tests, re-evaluate whether those tests are worth doing that way. Your time might be better spent testing at a higher level or even manually.


Right, I never write traditional unit tests; I just test the API boundaries, as you said. Sometimes it makes sense to draw a new API boundary in the code for some complex logic that is unlikely to change, but most of my tests look pretty close to other people's integration tests.


Hum... Depending on exactly what you test, it's not only the coverage that increases with a larger scope; the tests also break less.

If you engineer things for testing¹, e2e tests need much less maintenance than unit tests. Yes, their failures are also more severe, but overall they are a large gain in that area.

But then, you may discover that your current project just can not be adapted to enable e2e tests, like some projects can not be adapted to enable unit or integration tests. The one-size-fits-all preaching is completely flawed.

1 - Not in the way the unit testing fans use that phrase, obviously, but in the sense of keeping your (human and machine) interfaces stable and generic enough to survive some software evolution.


It is easy to triage end to end tests: the ONE change you made must have broken something.

You run into trouble when the test suite takes long enough to run that you forget about the ONE change and make a lot of them before testing. If that is the case, you are right: the more the test covers, the harder it is to triage.


I work at Testim.io, and I want to say it's much easier to triage and fix e2e tests than unit tests in CI when using such a platform.

Most people just don't, but lots of large companies like Microsoft and Salesforce do.


> There's not much real difference at all that I can see

Not with a tickbox list of features, except that Cypress is insanely nice to use, whereas Selenium is fiddly and unpleasant.


> "And when we look at what Cypress allows you to do (which is write many useful tests that have very little flake) then you wanna write more end-to-end tests."

Very little flake? That's not my experience with Cypress. The Cypress e2e tests on our build server sometimes fail for no discernible reason. Then you run them locally, and different tests fail, sometimes for understandable reasons (but then why didn't they fail on the server?) or sometimes also for no clear reason.

I'm spending a lot of time maintaining and fixing our Cypress tests, and the work they add seems far worse than the help they provide. Most of the time, a failing e2e test is not a problem with the code, but a problem with the test.

Honestly, I find myself longing for the days of Protractor. Horrible to set up perhaps, but in my memory of it, it was reliable once you got it working.


I've found two reasons that e2e tests are flaky (whether it be Selenium, Puppeteer, or Cypress):

1. Poorly developed tests. There are nuances that go into writing browser interaction tests that the less experienced will have to work through.

2. Weird functionality implemented in such a way that makes testing difficult. This could be something like a modal that has a fly-in animation: if you try to click it while it's animating, it won't fire the onClick action.

If you want e2e tests to be successful, you have to write testable front end code, which is pretty dang hard for a lot of reasons.
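
One common mitigation for case 2 (sketched with real Cypress commands but a hypothetical selector): kill CSS animations up front so a fly-in modal can't swallow the click.

    // in a beforeEach, after cy.visit(): disable transitions/animations
    cy.document().then((doc) => {
      const style = doc.createElement('style');
      style.textContent =
        '*, *::before, *::after { transition: none !important; ' +
        'animation: none !important; }';
      doc.head.appendChild(style);
    });

    // and still wait for visibility instead of clicking blind
    cy.get('[data-test=confirm-modal]').should('be.visible').click();

Cypress also has its own animation handling (the waitForAnimations and animationDistanceThreshold settings), but the explicit kill switch is more deterministic.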


I’ve seen the same at several different companies. There seems to be just too much that can go wrong during an e2e test like this: network failures, slow CPU causing timeouts, etc. At one company, the entire suite of 700 tests was re-run 3 times nightly, taking 5+ hours, and some tests still failed randomly.

I have found that the majority of errors can be caught by unit and integration tests on a single service, and that the extra burden of maintaining Cypress just isn’t worth the small amount of coverage.


> The Cypress e2e tests on our build server sometimes fail for no discernible reason.

You'll probably be interested to hear that Cypress are... uhm... addressing this by adding "flaky test detection" and some amount of automatic retrying in v5.

We use Cypress at work, we have flaky test problems, but I'm not convinced much of it is actually Cypress's fault. As others have mentioned, writing front-end code to be testable by E2E is not necessarily straightforward, and when you're trying to add tests to a big existing codebase I think you kinda have to accept some flakiness and retrying.

What I've been wondering about recently though is: if you construct applications in a certain way do you still need real E2E tests? For example, React is declarative, and with a state-management system like Redux shouldn't you be able to get the same kind of assurance that E2E gets you without needing an actual browser? Maybe things like animations, visibility etc are where that comes unstuck?
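
To make that concrete, here's the kind of assurance I mean, as a sketch (hypothetical reducer, Jest-style test): the state transition an E2E test would drive via clicks gets asserted directly.

    type State = { items: string[] };
    type Action = { type: 'add'; item: string } | { type: 'clear' };

    function cart(state: State, action: Action): State {
      switch (action.type) {
        case 'add': return { items: [...state.items, action.item] };
        case 'clear': return { items: [] };
      }
    }

    test('adding then clearing empties the cart', () => {
      const s = cart(cart({ items: [] }, { type: 'add', item: 'x' }),
                     { type: 'clear' });
      expect(s.items).toEqual([]);
    });

What this can't cover is exactly the animation/visibility end of things, which may be where a thin layer of real browser tests still earns its keep.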


As I remember it Protractor was just a wrapper around WebDriver with some glue code to wait for Angular doing its thing after data changes. If you're using e.g. React or anything else that does not do dirty checking, you should still be able to use WebDriver.

(That said, that's certainly still flaky enough.)


The cool part was that all of the extra JS Protractor used (sometimes for native WebDriver tasks) made things _less_ reliable.


> Well, the unit tests can find you a few logical errors, which is great. I write unit tests for that all the time. But all possible sources of error are discovered by end-to-end tests.

Citation needed, sounds anecdotal. If this really is how your app behaves then you're going to run into problems eventually because end-to-end tests are slow as well as difficult to write+maintain, and you'll need hundreds (if not thousands) of them to cover all the important business logic in your app. This will slow down your developer productivity (PRs take long to merge because people wait for CI to finish) and ability to deploy quickly.

If your code is well-structured, your important business logic should be easily testable with unit and/or integration tests, and you don't need an end-to-end test to discover that, for example, a calculation is wrong. The pattern of functional core/imperative shell embodies this really well, I think.
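
A minimal sketch of what functional core/imperative shell buys you (hypothetical VAT helper, Jest-style test): the calculation is a pure function, so no browser, server, or database is needed to discover it's wrong.

    function vatInclusive(net: number, rate: number): number {
      return Math.round(net * (1 + rate) * 100) / 100;
    }

    test('applies 21% VAT', () => {
      expect(vatInclusive(100, 0.21)).toBe(121);
    });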


> end-to-end tests are slow as well as difficult to write+maintain

Citation needed, sounds anecdotal.

End-to-end tests should run as fast as any given use case (for most software, that means relatively quick), plus some set-up time as testing overhead. But that testing overhead is often self-inflicted and can be managed. I have anecdotes to prove it.

For the speed to write end-to-end tests, shouldn't they be the easiest to capture? I need the system inputs and I need to capture the system outputs. If you imagine usage-driven development, then the last step is to capture your initial state, use your new code, capture the final state and voila, here's your new end-to-end test.

> your important business logic

Consider the reality that no one cares about your backend. No one. "Is it correct enough for the application?" is the only business question concerned with the backend (and most applications have a HIGH fault tolerance).

What does everyone care about? Your frontend. The colors, dropdown box or search field. That column should be over here. Focusing your testing on the parts of your application people could shrug about is not sound reasoning.


I'm pretty sure more people care about the VAT being calculated correctly than about whether it's displayed on the left or on the right.

You're taking a reasonable point of view - that code matters only inasmuch as it has an effect on users - and taking it to an absurd extreme.


Then you misunderstand. You are right that the VAT or similar financial matters require a very high degree of precision and accuracy. Physics simulations and similar computations even more so. These applications have a very LOW tolerance for error and -- rightfully so -- tend to have a lot of attention paid to their backends.

I contend that most applications are not these. I'll put it forward as professional experience that even financial-adjacent systems have a surprising degree of error tolerance. Our engineers/scientists are constantly striving to improve correctness, because that's what engineers do, but it's plainly not a priority for our bottom line. Countless errors have been caught in the last year that made individual results quite questionable. But on the whole, the system and its results are accurate enough that we have plenty of customers and business keeping us busy.


> If your code is well-structured

You've alluded to the real benefit of end to end tests here, IMHO. They make very few demands about how your code is structured. This is fantastic if you've just been handed a big ball of mud.

There is a definite trade off between faster running tests and tests which don't need to be rewritten every time you change some code.


Is that a benefit of end-to-end tests? Seems more like encouraging more unit tests would discourage bad coding habits if anything.


As always in these discussions, I like to remind everyone that you can have good and bad implementations of everything, and just doing unit testing doesn't make your engineers suddenly understand how to write testable code. So you can well end up with a reasonable amount of code coverage - not quite as high as you might like - and when someone invariably digs into why, they find that while parts of the system are testable, other parts aren't. Now you're left with a large test suite that provides marginal value, because a passing run doesn't give you confidence the code can be deployed, while the shite engineering practices heavily test implementation details, which makes maintenance slow to a crawl.

Good engineering is what creates good engineering. What you actually do, I suspect, matters a lot less.


Yeah there's no silver bullet for any of these problems probably.

That said an end-to-end test seems just as likely to merely test implementation details. You could have all kinds of changes in behaviour without necessarily doing anything incorrect.

I suppose the point is that you need to test where you can define what correct behaviour is. And if this is only possible for end-to-end tests then something is screwy.


I don't find it does discourage bad coding habits. It just amplifies the problems they cause.


Well I suppose any form of discouragement is pointless if it is not recognised and acted upon.


I prefer to think of the vertical dimension of the testing pyramid as time, not test type. Nobody can agree on what the difference between a unit test and an integration test is, much less end to end. Time is objective, though.

At the bottom of my pyramid I have the tests that I always run - even if there is no way they could break (I just changed the IP network stack, there is no way the add() function could break - but I'll run that test anyway).

Then I have the tests that run pretty fast, but I find it worth some effort to add some dependency analysis to ensure I only run them when it is possible my code broke them.

Next there are tests that are slow, so people working in the area they cover run them, but everybody else disables them. These tests are just useful enough that the group that runs them doesn't stop running them, but not useful enough for any other group working on a different area of the code.

Then there are tests that are very, very slow, so I only run them when I am specifically worried I might have broken something that the test covers (typically never, because you forget about them when you should, but there are just enough exceptions to merit a mention).

Last are tests that I only look at after they have failed 3 times in a row on the CI system, and then I'll spend a week wildly/blindly throwing possible fixes at the CI system before I break down and run them myself.

Note that the last two are for the CI system to run, not humans. In every case I've seen, they take long enough that you have to combine the changes of several people before running them just so the CI system can keep up. They are still valuable to have because some important things cannot be tested any other way. However, where possible you should find a different way to test that functionality.
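
One way to encode those tiers - sketched as a hypothetical Jest setup, though any runner with tagging or multiple configs works - is to split the suite by runtime rather than by test 'type':

    // jest.config.ts: tiers named by speed, not by unit/integration
    export default {
      projects: [
        { displayName: 'fast', testMatch: ['**/*.fast.test.ts'] },
        { displayName: 'slow', testMatch: ['**/*.slow.test.ts'] },
      ],
    };

    // package.json scripts (illustrative):
    //   "test":      "jest --selectProjects fast"   // every run
    //   "test:slow": "jest --selectProjects slow"   // CI only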


By moving the 'meat' of the tests up high into the hierarchy, the author has just re-invented the testing ice cream cone with a different flavor.

In software, we can make pretty much any process work for 18 months, before it starts to fall apart. If you don't stay at a company for at least a couple of years after they start doing testing in earnest, you won't really see that what you're doing doesn't scale/isn't resilient to changing requirements.

I have watched so many people try to rescue deeply coupled integration or E2E tests and it's just painful to watch. It's a deadly cocktail of cargo culting and Sunk Cost Fallacy - we don't know for sure all the corner cases these tests cover, so we aren't going to delete them and lack the confidence to rewrite, so we'll spend all day trying to fix them, and if that doesn't work, we'll pair with someone for day 2 to get it fixed. That's 3 man days, for a handful of tests. I've seen it many times, on different teams, rarely are there enough other people noticing how crazy this is to stage an intervention. It's crazy.

A contributory reason to why fixing such tests takes so long is that they're so slow. Slow tasks have poor feedback loops. Testing, at least when done as part of CI/CD, is meant to provide fast feedback, and E2E tests fail at this (most especially at 18 months and beyond, where your E2E test is one of hundreds).

There's a physics and a psychology to the tiers in the testing pyramid that I meant to write up publicly but I don't think ever escaped a corporate wiki. Here are the Cliff's Notes, based on my own metrics but corroborated by the handful of people who've inspired my testing journey:

1) Moving tests down a tier reduces the power of the test, so you need more of them (about 5x)

2) Moving tests down a tier makes them much faster. (8x common, 10x best case, depending on framework)

3) The simplest tier of tests will be rewritten or deleted when requirements change. All other tests will be 'rescued', sometimes at great expense in time and energy.

Rules 1 & 2 create a pseudo-rule, the 5/8ths Rule. If you move a functional test to units, the same coverage will run about 30% faster in aggregate, and you will not have to provide ongoing support for those tests. That's a huge win. If you pull a test down 2 levels, they'll run 60-75% faster.
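
(Spelling out the arithmetic: 5x the tests at 8x the speed means the aggregate runtime is 5/8, about 62% of the original; two tiers down is 25/64, about 39%, which is where the 60-75% range comes from.)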


What we really need is a giant flow chart that factors in every single thing that should go into the calculus of how to structure your testing.

Blanket strategies and approaches are just never going to be right for all codebases and companies. There probably are aggregate rules which kind of generalize, but I think the actual system under test and the organization it sits inside is an incredibly important consideration that needs to be taken into account.

I'd love to see how everyone's different advice pans out for all possible permutations of org and codebase they could be tried against.


This got so long I decided to split it in two:

It is important to note you can't pull all of the tests down. I tend to use a plumbing analogy that I probably have stolen from the literature or a mentor - It's likely that QA already tested all of the pipes at the factory before they were installed in your house. But every plumber will turn on the taps and flush the toilet at once before they leave, to make sure the stuff going in one end actually comes out the other (and only the other). Test all the bits in isolation, but always leave at least one happy and one unhappy path test going through the whole stack to make sure good and bad answers end up in the right place. I think here the pyramid analogy is a bit wrong, because in the pyramid there's a geometric reduction at each level, but I have a suspicion that in a best case scenario, you can achieve a log scale reduction (because your E2E tests can potentially reflect the height of the decision tree, rather than the surface area of a module)

Tests higher up the pyramid rarely if ever suggest, let alone demand, API improvements. If the first person writes elaborate mocks instead of simplifying the design to not need them, they've locked in the implementation in a way that unit tests don't (Sunk Cost). This can turn a masochistic programmer into a sadist.

And that is the trap most of us fall into the first few times, and some never outgrow. It's easier to write greenfield E2E tests, but they're hell to maintain, and they create a codependent testing relationship, where the existence of the tests dissuades you from engaging in healthier activities. I almost think that you should ban E2E testing frameworks until the team has gotten comfortable with unit tests because of this. With E2E, you can get really far down the wrong road and find yourself in a blind alley that you now lack the imagination or conviction to escape, because everything else feels remedial. That is the main problem with unit tests - they feel like some sort of grade school activity. Doing something so simple feels like doing multiplication tables, which is kids' work. I am a smart and functional adult, I shouldn't be doing something so boring. This is making me feel dumb.

It takes a special disposition to enjoy a simple task for what it is, and many software developers - myself included - were lured into software by the idea of automating any task that is so simple. Rarely do I let production code be so dominated by idioms with so little meat, so why do I enjoy that in tests? Honestly, I don't have an answer, except "better in the tests than in the production code," which is hardly an answer at all.

And importantly, all of these uneasy feelings are used as evidence by anti-testing people that it's too painful and too little gain and we should stop doing it. A form of learned helplessness.


> If the first person writes elaborate mocks instead of simplifying the design to not need them, they've locked in the implementation in a way that unit tests don't (Sunk Cost). This can turn a masochistic programmer into a sadist.

You mostly use elaborate mocks for unit tests. There is much less need for mocking in integration tests.


Don't do 'unit tests' at all. Your sweet spot in testing should be the fastest possible 'semi-integrated' (real clients, faked services) tests that can be done reproducibly and hermetically, testing specific end-goal functionality.
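
A sketch of what that can look like in a Node codebase (hypothetical route; supertest driving a real in-process HTTP server, with the downstream payments service faked):

    import express from 'express';
    import request from 'supertest';

    // real client + real route; only the downstream service is faked
    const fakePayments = { charge: async () => ({ ok: true, id: 'p1' }) };

    const app = express();
    app.post('/checkout', async (_req, res) => {
      const r = await fakePayments.charge();
      res.status(r.ok ? 200 : 502).json(r);
    });

    test('checkout reaches its end goal', async () => {
      const res = await request(app).post('/checkout');
      expect(res.status).toBe(200);
      expect(res.body.id).toBe('p1');
    });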


I would agree, because I see my devs writing unit tests for plumbing code that does not have any logic. I am annoyed by that, so I am the 'party pooper' when it comes to unit testing. But I would like to have unit tests only for things that involve calculations or non-trivial logic.

The other thing: I would really like a 'risk-based' approach, as in, what is the worst thing that can happen with this code? If it can drop half of the records in a table, yes, do checks and unit testing. If it is going to be something in the interface not displayed, or displayed incorrectly, then meh.


There is some value in unit tests for plumbing. Something I see a lot is a set of values being transformed across several domains, losing data at each step. A very simple test that inputs values on one end of the pipe, and checks the values on the other end of the pipe, makes it very easy to be sure that you didn't drop something.
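
A tiny sketch of that kind of pipe test (hypothetical toDto/fromDto transforms): put values in one end, check the other end, and a silently dropped field fails immediately.

    test('nothing is lost crossing the domain boundary', () => {
      const input = { agencyId: 7, name: 'Acme', region: 'EU' };
      // toDto and fromDto are the hypothetical per-domain transforms
      expect(fromDto(toDto(input))).toEqual(input);
    });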


If you transform values, then it has logic, so it does not fit in my definition of simple plumbing.

If you check that an integer value AgencyId is passed to another object as an integer value with the same name, AgencyId, that is basic plumbing that should not be covered by a unit test.

Of course someone might forget to wire that up somewhere, but writing a test and then maintaining it costs more than just fixing the bug when you find it, and once it is fixed there is no real reason to keep a regression test for it.


> real clients, faked services

I'm curious what you think this looks like in practice. I've had success with these kinds of tests, but only for very simple external services. For example, writing a fake elasticsearch backend for a few known inputs and outputs might be a good fit. But what about an RDBMS? You can't realistically write a mock backend that can talk to a real client.


As much as it's reasonable, of course. If it's impossible to spin up a mock service or a DB, then client fakes can be used at the cost of narrowing the test's coverage and lowering authoritativeness.
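
For the RDBMS case specifically, a real throwaway database per test run is increasingly viable; a sketch with the Node Testcontainers library (treat module name and API details as assumptions to verify):

    import { PostgreSqlContainer } from '@testcontainers/postgresql';
    import { Client } from 'pg';

    // start a disposable Postgres in Docker and run real SQL against it
    const container = await new PostgreSqlContainer().start();
    const client = new Client({ connectionString: container.getConnectionUri() });
    await client.connect();

    const res = await client.query('SELECT 1 AS one');
    console.log(res.rows[0].one);  // real client, real server, no mock

    await client.end();
    await container.stop();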



I'm the co-founder of an end-to-end testing product, but I don't actually agree that better end-to-end testing coverage means you should create fewer unit tests. The reason is that there's a class of things you want to test that are much better served at the unit testing level. Unit tests are always going to be faster to execute and by definition have a much smaller scope, which is helpful in reasoning about what you're testing.

Our product is really focused on making end-to-end tests much easier to create and maintain. Here's an example video I recorded yesterday of automating Monday.com's sign-up flow using our product: https://www.loom.com/share/1336a5bfa9f54f2190269961991802b5 - looking at this I don't really see what unit tests this would replace, rather I think it gives coverage for things that you'd often have no automated coverage for - i.e. the interplay between disparate components along with interacting with email - basically the steps that form a complete workflow from the user's perspective.


The author talks about how many other sources of bugs there might be outside of your code - like integration, assumptions, compilers, etc. - that e2e tests will help catch.

Well, at least a large portion of those bugs might be caught by unit tests! No reason to write fewer of those; having good unit and integration tests still makes you more confident about your application. And because they are faster to run, the feedback loop when they break is also faster.


Carcinisation truly knows no limits


Science Twitter: "Everything is crab."

HN two weeks later: "What if testing were crab?"


For those not on Twitter, this went viral a couple weeks ago: https://en.wikipedia.org/wiki/Carcinisation


I write a bunch of e2e tests, and not many unit tests. I'm on the fence about whether this is a good idea or not. I feel like when you are starting at zero, you need more e2e tests, before writing more unit/service tests as your product solidifies.

I wish e2e tests were faster. I feel like this should be achievable, but it's a problem way out of my depth. I feel like you could run a browser within a docker container at 2x or 5x speed with enough virtualization. I've been told this would not work in practice.


e2e testing is the way to go, but our current tech stack is really bad for testing. Testing in browsers seems an afterthought: browsers were first designed for features, and debugging was added later on (quite good nowadays), but testing is out of scope. Until we get easy-to-test browsers, everything will continue being fragile and bug-prone.


Tests should document behavior and aid in refactoring and maintaining boundaries. E2e tests don't document modules (functions/types), only the highest interface, and they don't maintain any separation of layers, etc.

All kinds of tests are needed, but I find the pyramid arises from the fact that there are a lot more low level units than high level behaviors.


This is why humans need to test end-to-end when there is a UI component. Trying to automate tests on a GUI is the most torturous aspect of software with no long term success unless your GUI is primitive and never changes. It’s easier to just send it off to a low-cost center and have them hand test various scenarios and create a report.


Don't forget that generally e2e tests take longer to run and typically require more test assets.

For example, if e2e tests involve simulations running in real time, then by increasing the number of distinct e2e tests you also increase the requirement on test assets and/or overall CI duration.



