
No, three-nines reliability for a single server.

1 - 0.001^3 = 0.999999999, which works out to well under a second of expected downtime per year; the client will never notice that even with good monitoring tools, and therefore will never invoke the contract.
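The back-of-the-napkin math above can be sketched as follows. This is only an illustration of the independence assumption being made: each server is taken to fail independently with probability 0.001 (three nines), so all three must be down at once for the system to be down.

```python
# Sketch of the availability math, assuming fully independent failures.
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000

def combined_availability(per_server_failure: float, n_servers: int) -> float:
    """System is down only if every server fails simultaneously."""
    return 1 - per_server_failure ** n_servers

avail = combined_availability(0.001, 3)
downtime_seconds = (1 - avail) * SECONDS_PER_YEAR

print(avail)             # 0.999999999 (nine nines)
print(downtime_seconds)  # roughly 0.03 seconds of expected downtime per year
```

Of course, as the replies below point out, this only holds if the failures really are independent.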



You're assuming independence to a degree that does not exist. Consider a Y2K-style bug in the OS that could take down all servers for an extended period of time. Or someone could write a virus that uses a zero-day exploit, etc.


I think it's more likely that the programmer screws up in keeping the separate servers independent through database migrations.


I am making no such assumption. See [1] in my original post. I already addressed the intersection of failures. Feel free to add Y3K to the list alongside nuclear war, Chinese hackers, etc. The intersection is incredibly small, and not something that I am going to include in my back-of-the-napkin calculation.


Your failover code never has bugs?

See the linked discussion: only 3 of the 20 top sites had five nines.


With enough time and money spent on code auditing and hiring smart people, no. Not that most companies should do that, but it is possible if you want to pay for it. Most companies (rightly) prioritize innovation, scalability, and profit margins over absolute reliability.

How many of those top sites actually prioritized reliability? Is it even justifiable for their business models? I bet you can find much better reliability engineering in banking, credit, and stock systems. For example, when was the last time the Visa credit network crashed (as a whole, not localized outages)? Nasdaq?


Nasdaq states that their system has four nines: http://www.nasdaqtrader.com/trader.aspx?id=tradingusequities
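For a sense of scale, here is a quick check of what "four nines" (99.99% availability) allows in annual downtime; the figure is simple arithmetic, not anything stated by Nasdaq:

```python
# What a 99.99% availability budget means per year.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

downtime_minutes = (1 - 0.9999) * MINUTES_PER_YEAR
print(round(downtime_minutes, 1))  # about 52.6 minutes of downtime per year
```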



