Been through this recently at a fairly large enterprise.

We have some in-house software which runs in k8s. Total throughput peaks at about 1 Mbit/s of control traffic (it's controlling some other devices which are on dedicated hardware). Total of 24 GB of RAM.

The software team say it needs to run across 3 different servers for resilience purposes.

The VM team want to use Nutanix as their VM platform, so they can live-migrate VMs from one host to another.

They insist on 25 Gbit networking, and for resilience purposes that needs to be MLAG'd.

The network team also have to have multiple switches and routers, again for resilience.

So rather than having three $1000 laptops running bare-metal k8s hanging off a pair of $500 1G switches, eating maybe 200 W, we have a $140k BOM sucking up 2 kW.
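
Taking the figures in this comment at face value (the breakdown of the $140k BOM isn't given), the back-of-envelope arithmetic looks roughly like this. On these numbers the cost ratio comes out nearer 35x than 30x, and the power ratio is 10x:

```python
# Back-of-envelope comparison using only the numbers quoted in the comment.
LAPTOP_COST = 1_000       # per laptop, USD
SWITCH_COST = 500         # per 1G switch, USD
ENTERPRISE_BOM = 140_000  # USD, as quoted

simple_cost = 3 * LAPTOP_COST + 2 * SWITCH_COST  # three laptops + a pair of switches
cost_ratio = ENTERPRISE_BOM / simple_cost

SIMPLE_WATTS = 200        # "maybe 200W"
ENTERPRISE_WATTS = 2_000  # "sucking up 2kW"
power_ratio = ENTERPRISE_WATTS / SIMPLE_WATTS

# Extra energy per year if both setups run 24/7 at those draws.
extra_kwh_per_year = (ENTERPRISE_WATTS - SIMPLE_WATTS) * 24 * 365 / 1_000

print(f"simple setup: ${simple_cost}, cost ratio {cost_ratio:.0f}x")
print(f"power ratio {power_ratio:.0f}x, extra energy {extra_kwh_per_year:,.0f} kWh/year")
```

This deliberately leaves out cooling, rack space and support contracts, none of which are quantified in the comment.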

When something goes wrong, all those layers of resilience will no doubt fight each other. The hardware drops, so the VM freezes as it's restored onto another host, so k8s moves the workloads; then the VM comes back and k8s gets confused (maybe? I don't know how k8s works).

It's all needlessly overspecced costing 30 times as much as it should.

But from each individual team it makes sense. They don't want to be blamed if it doesn't work, they don't have to find the money. It's different departments.



One of my favorite bits of hardware is a UPS. I’ve played with several over the years, from fancy server-grade rack-mount APC stuff to inexpensive edge stuff. Without exception, downtime is increased by use of a UPS. I used to plug a server with redundant PSUs into the UPS and the wall so it could ride out UPS glitches.

Even today, a UPS that turns itself back on when power is restored after an outage long enough to drain the battery is somewhat exotic. Amusingly, according to forum posts, even the new UniFi UPSes, which are clearly meant to be shoved in a closet somewhere, supposedly turn off and stay off when the battery drains. There are no official docs, of course.


Sounds like crappy UPSes. Even the cheap old used eBay Eaton UPSes I have in my homelab have a setting for "Auto restart" and the factory default setting is "enabled".

But even rackmount UPSes are more of an "edge" sort of solution. A data center UPS takes up at least a room.


I assume that datacenter UPSes are better, but I’ve never used one except as a consumer of its output.

But I’ve had problems with UPSes that advertise auto-restart but don’t actually ship with it enabled. And that fancy APC unit was sold by fancy Dell sales people and supported directly by real humans at APC, and it would still regularly get mad, start beeping, and turn off its output despite its battery being fully charged and the upstream power being just fine (and APC’s techs were never able to figure it out either).


> I assume that datacenter UPSes are better [...]

I don't know about specific datacenter models, but in our colocation facility there are humans available 24/7. So the UPS might not start after a failure, but there's a human to figure it out.


Most (all?) decent datacenters also have generators on site, and the intent is that the UPS will never run out of charge. So the fully-discharged case is an error and it might be intentional to require intervention to recover.


Yeah, some people treat UPSes as "backup power" but that's not really what they're intended for. Their intended purpose is to bridge the gap during interruptions... either to an alternative power source, or to a powered-off state.


Sure, but when you stick a UPS in the closet to power your network or security cameras or whatever for a little while if there is a power interruption, you expect:

a) If the power is out too long for your UPS (or you have solar and batteries and they discharge overnight or whatever) that the system will turn back on when the power recovers, and

b) You will not have extra bonus outages just because the UPS is in a bad mood.


I completely agree with B. But alas, people love buying shitty cheap UPSes.

But A is along the lines of the misconception that I'm referring to... There should be no such thing as "the power being out too long for your UPS". A UPS isn't there to give you a little while to ignore the problem, it's there to give you time to address it. Either by switching to another source of power, or to power off the equipment.

Now, the reason every UPS that supports auto-restart makes it a configurable option is that there are many reasons you might not want it, e.g.:

* a low SOC battery could not guarantee a minimum runtime for safe shutdown during a repeated outage

* a catastrophic failure (because the battery shouldn't be dead) could be an indication of other issues that need to be addressed before power on

* powering on the equipment may require staggering to prevent inrush current overload
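
The three reasons above can be read as conditions an auto-restart policy checks before re-enabling output. A minimal sketch, assuming a hypothetical controller; the names (`RestartPolicy`, `should_restart`, `outlet_schedule`) and the thresholds are made up for illustration, and real UPSes expose this through their own firmware settings rather than anything like this code:

```python
from dataclasses import dataclass


@dataclass
class RestartPolicy:
    # Hypothetical knobs; values are illustrative, not vendor defaults.
    min_soc_percent: float  # charge needed to ride out another outage and shut down safely
    stagger_seconds: float  # delay between outlet groups to limit inrush current


def should_restart(policy: RestartPolicy, soc_percent: float, fault_latched: bool) -> bool:
    """Decide whether to re-enable output after utility power returns."""
    if fault_latched:
        # Unexpected failure (e.g. the battery shouldn't have been dead): wait for a human.
        return False
    # Too little charge to guarantee a safe shutdown if power drops again.
    return soc_percent >= policy.min_soc_percent


def outlet_schedule(policy: RestartPolicy, groups: list[str]) -> list[tuple[float, str]]:
    """Return (start offset in seconds, outlet group) pairs, one group at a time."""
    return [(i * policy.stagger_seconds, group) for i, group in enumerate(groups)]


policy = RestartPolicy(min_soc_percent=50, stagger_seconds=10)
print(should_restart(policy, soc_percent=30, fault_latched=False))  # False: wait for charge
print(should_restart(policy, soc_percent=80, fault_latched=False))  # True
print(outlet_schedule(policy, ["network", "servers", "cameras"]))
```

Staggering by outlet group is one way to spread inrush current; the ordering (network first) is just an example.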

The whole use case of "I'm using the UPS to run my equipment during an outage" is kind of an abuse of their purpose. It's commonly done, and I've done it myself. But it's not what they're for.

But also, if you want a UPS that auto-restarts -- they exist -- but you get what you pay for.


Some of these are a bit silly, IMO:

> a low SOC battery could not guarantee a minimum runtime for safe shutdown during a repeated outage

A lot of devices are unconditionally safe to shut down. Think network equipment, signs, exit lights, and well designed computers.

> a catastrophic failure (because the battery shouldn't be dead) could be an indication of other issues that need to be addressed before power on

This is such a weird presumption. Power outages happen. Long power outages happen. Fancy management software that triggers a controlled shutdown when the SOC is low might still leave nonzero remaining load. In fact, if you have a load that uses a UPS to trigger a controlled shutdown, it’s almost definitional that a controlled shutdown is not a catastrophe and that the system should turn back on eventually.

All of your points are valid for serious datacenter gear and even for large server closets, but for small systems I think they don’t apply to most users, and I’m talking about smaller UPSes.


> > a low SOC battery could not guarantee a minimum runtime for safe shutdown during a repeated outage

> A lot of devices are unconditionally safe to shut down.

Yeah, but that doesn't mean you want to expose them to brownout conditions when your UPS is depleted. If the power is continuing to flip on and off, it's better to just leave it off if you don't have the battery to prevent even short interruptions. A good UPS can do this automatically for you. A cheap one will just stay off and let you respond to the outage.
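
The "leave it off while the power keeps flipping" behaviour amounts to a hold-off timer: only restart once utility power has been continuously good for some period. A minimal sketch over a list of timestamped utility-power samples; the function name and the 30-second hold-off in the example are hypothetical:

```python
def utility_stable(samples: list[tuple[float, bool]], now: float, hold_off_s: float) -> bool:
    """True if utility power has been continuously OK for at least hold_off_s seconds.

    samples: (time in seconds, utility_ok) pairs sorted by time.
    Any 'not OK' sample resets the timer, so a flapping supply never qualifies.
    """
    ok_since = None
    for t, ok in samples:
        if t > now:
            break
        if not ok:
            ok_since = None  # outage or brownout: start over
        elif ok_since is None:
            ok_since = t  # start of the current good run
    return ok_since is not None and now - ok_since >= hold_off_s


flapping = [(0, False), (10, True), (15, False), (20, True)]
print(utility_stable(flapping, now=25, hold_off_s=30))  # False: good for only 5 s
print(utility_stable(flapping, now=60, hold_off_s=30))  # True: good since t=20
```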

> This is such a weird presumption.

It wasn't a presumption I was making for all users -- but an example of why some users might not want auto-restart as a feature. Of course, if you want auto-restart as a feature, you can buy a UPS that has it as a feature and turn it on.

> they don’t apply to most users, and I’m talking about smaller UPSes.

Yeah, I know the situation: Someone has a network closet on a budget with a UPS they've sized to get them a few minutes of runtime. They put a UPS on the BOM because it checks a box. So they buy a low-end UPS that either doesn't have the feature, or it doesn't work right.

The solution is just to buy the right UPS for the thing they were trying to do... and test it.


The funniest thing about huge enterprises is that they often have processes so convoluted and restrictive for everything, that getting stuff done by the book is basically impossible, so people get creative with the limitations and we often end up with the sketchiest solutions in existence.

I hope the words 'web server hosted in Excel VBA' illustrate the magnitude of horrors that can emerge in these situations.


Raspberry Pi on a network-controlled power supply to rebroadcast UDP broadcast traffic across subnets
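
For anyone who hasn't seen this hack: routers normally don't forward broadcasts between subnets, so a small box with an address on each subnet can listen for the broadcasts and re-send them. A rough sketch, assuming the relay host has an interface on every subnet involved; the function names, port and subnets are hypothetical. It skips the subnet a datagram came from and ignores its own re-sent copies so it doesn't loop:

```python
import ipaddress
import socket


def directed_broadcast(cidr: str) -> str:
    """Broadcast address of a subnet, e.g. '192.168.20.0/24' -> '192.168.20.255'."""
    return str(ipaddress.ip_network(cidr, strict=False).broadcast_address)


def targets_for(src_ip: str, subnets: list[str]) -> list[str]:
    """Broadcast addresses of every subnet except the one the datagram came from."""
    src = ipaddress.ip_address(src_ip)
    return [directed_broadcast(c) for c in subnets
            if src not in ipaddress.ip_network(c, strict=False)]


def relay(port: int, subnets: list[str], own_addrs: set[str]) -> None:
    """Receive UDP broadcasts on `port` and re-send each one to the other subnets."""
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    rx.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    rx.bind(("", port))  # listen on all interfaces
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    tx.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    while True:
        data, (src_ip, _src_port) = rx.recvfrom(65535)
        if src_ip in own_addrs:
            continue  # our own re-sent copy; don't relay it again
        for addr in targets_for(src_ip, subnets):
            tx.sendto(data, (addr, port))
```

It would be started with something like `relay(9999, ["192.168.10.0/24", "192.168.20.0/24"], {"192.168.10.2", "192.168.20.2"})`. Note that re-sent datagrams come from the relay's own address, so any protocol that replies to the sender will reply to the relay, not the original host.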


I saw an entire physical switch configured for bridging VLANs. It was even labeled as such. 802.1Q is hard and confusing if you don't know what you're doing.
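
For reference, what 802.1Q adds to an Ethernet frame is small: a 4-byte tag right after the source MAC address, made of the TPID 0x8100 and a 16-bit TCI (3-bit priority code point, 1-bit drop eligible indicator, 12-bit VLAN ID). A minimal sketch that reads the tag and rewrites the VLAN ID, which is the net effect a "VLAN bridging" switch has on a tagged frame; the function names are illustrative, and double tagging (802.1ad) is ignored:

```python
import struct

TPID_8021Q = 0x8100


def parse_vlan_tag(frame: bytes):
    """Return (pcp, dei, vid, inner_ethertype), or None if the frame has no 802.1Q tag."""
    if len(frame) < 18:
        return None
    # Bytes 0-11 are destination + source MAC; the tag (if any) starts at byte 12.
    tpid, tci, inner_ethertype = struct.unpack("!HHH", frame[12:18])
    if tpid != TPID_8021Q:
        return None
    pcp = tci >> 13  # priority code point, 3 bits
    dei = (tci >> 12) & 0x1  # drop eligible indicator, 1 bit
    vid = tci & 0x0FFF  # VLAN ID, 12 bits
    return pcp, dei, vid, inner_ethertype


def retag(frame: bytes, new_vid: int) -> bytes:
    """Return a copy of a tagged frame with its VLAN ID replaced (priority bits kept)."""
    tag = parse_vlan_tag(frame)
    if tag is None:
        raise ValueError("frame is not 802.1Q-tagged")
    pcp, dei, _, _ = tag
    tci = (pcp << 13) | (dei << 12) | (new_vid & 0x0FFF)  # out-of-range IDs are masked
    return frame[:14] + struct.pack("!H", tci) + frame[16:]


# A tagged IPv4 frame header: zeroed MACs, VLAN 100, priority 5.
frame = bytes(12) + struct.pack("!HHH", TPID_8021Q, (5 << 13) | 100, 0x0800)
print(parse_vlan_tag(frame))  # (5, 0, 100, 2048)
print(parse_vlan_tag(retag(frame, 200)))  # (5, 0, 200, 2048)
```

A real switch would also recompute the frame check sequence; this sketch works on frames without an FCS, so it skips that.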


which is exactly why this being different departments makes no sense

one infra team - provides the entire platform

any other approach and you’re dicking around
