Hacker News | bombela's comments

Indeed, Linux allows anything but "/" and "\0" in filenames. These days it's reasonable to refuse non-UTF-8 filenames. But one must always validate first!
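A minimal sketch of the "validate first" step, assuming names are read as raw bytes (as Linux stores them). The helper names `is_acceptable_filename` and `list_invalid_names` are hypothetical, and rejecting "." and ".." is an extra assumption, not something stated above.

```python
import os


def is_acceptable_filename(raw: bytes) -> bool:
    """Return True if raw is a single path component that is strict UTF-8."""
    # Empty names and the special entries "." and ".." are rejected (assumption).
    if not raw or raw in (b".", b".."):
        return False
    # The two bytes Linux itself forbids inside a filename.
    if b"/" in raw or b"\0" in raw:
        return False
    try:
        # Strict decoding rejects truncated sequences, overlong forms and surrogates.
        raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


def list_invalid_names(directory: bytes = b".") -> list:
    """List directory entries whose names fail the check above."""
    # Passing a bytes path makes os.listdir return undecoded bytes names.
    return [name for name in os.listdir(directory) if not is_acceptable_filename(name)]
```

Working on bytes rather than `str` avoids Python's surrogateescape round-tripping, which would otherwise hide invalid names.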


> Indeed, Linux allows anything but "/" and "\0" in filenames.

For what it’s worth, NT allows any 16-bit quantity but L'\\' (0x005C) in filenames (even nulls); it’s the Win32 layer on top of it that imposes all the other weird restrictions and mappings.


The NT Object Namespace itself indeed has no restrictions on filename characters except for "\", but once you reach a real filesystem like NTFS or FAT, the forbidden characters continue to be blocked.

https://projectzero.google/2016/02/the-definitive-guide-on-w...


In summary, Unicode code points (characters) are 32 bit. JavaScript manipulates Unicode in UTF-16 for historical reasons, because at some point before Unicode, 16 bit was deemed enough (UCS-2). UTF-16 run length encodes Unicode 32 bit code points into one or two code units. Splitting in the middle of a code point produces one invalid half string, and one semantically different half string.

Emojis are often a sequence of Unicode code points producing a single grapheme. Splitting in the middle of a grapheme will produce two valid strings, but with some funky half-baked emoji. So for a text editor it makes sense to split at grapheme boundaries.
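To make the two kinds of split concrete, here is a small sketch in Python (not JavaScript, so the UTF-16 code units are produced explicitly). The helper names are hypothetical. It shows that cutting a surrogate pair leaves an invalid half, while cutting a multi-code-point emoji leaves valid but different strings.

```python
def utf16_code_units(text: str) -> list:
    """Return the UTF-16 code units of text as integers."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def to_surrogate_pair(code_point: int) -> tuple:
    """Encode a code point above U+FFFF as a (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need a surrogate pair")
    offset = code_point - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


def is_valid_utf16(units) -> bool:
    """Check whether a sequence of code units decodes as well-formed UTF-16."""
    data = b"".join(unit.to_bytes(2, "little") for unit in units)
    try:
        data.decode("utf-16-le")
    except UnicodeDecodeError:
        return False
    return True


grinning = "\U0001F600"                  # one code point, two UTF-16 code units
units = utf16_code_units(grinning)       # [0xD83D, 0xDE00]
half_is_valid = is_valid_utf16(units[:1])  # False: a lone high surrogate

# A family emoji: three emoji code points joined by ZERO WIDTH JOINER (U+200D).
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"
left, right = family[:1], family[1:]     # both halves are valid strings
```

Finding real grapheme boundaries needs the Unicode segmentation rules, which the Python standard library does not provide, so that part is left out here.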


> Unicode code points are 32 bit

21-bit, actually. It was supposed to be 32-bit, but UTF-16 caps out at 21-bit, so they lopped eleven bits of potential from Unicode (and UTF-8, so no more six-byte encoding).

> at some point before Unicode

No, in the early days of Unicode.

> run length encodes

Um… what? RLE is a data compression thing, UTF-16 has nothing to do with it.


> 21-bit, actually. It was supposed to be 32-bit, but UTF-16 caps out at 21-bit, so they lopped eleven bits of potential from Unicode (and UTF-8, so no more six-byte encoding).

Although, conveniently, this means that UTF-8 bytes 0xF8 through 0xFF are always nonsense, so the third-party Rust type `ColdString` uses leading bytes 0xF8 through 0xFF in its 8 bytes of representation to indicate "I am an inline UTF-8 string, but the UTF-8 starts in the next byte with a total length of N bytes", where N = byte - 0xF8.

This leaves the continuation marker bits alone, so ColdString can use those in that front byte to indicate "I am not actually inline data, I'm a pointer, rotate me so these indicator bits are my LSB and zero them out to make me a 4 byte aligned pointer".

That leaves all other 8-byte values for valid UTF-8 strings, which all begin with either an ASCII byte or a byte between 0xC2 and 0xF4 inclusive.
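The claim about leading bytes can be checked by brute force. This is only a sketch that encodes every Unicode scalar value and collects the first byte; it says nothing about how `ColdString` itself is implemented. The function name `utf8_lead_bytes` is hypothetical.

```python
def utf8_lead_bytes() -> set:
    """Collect the first byte of the UTF-8 encoding of every Unicode scalar value."""
    leads = set()
    for code_point in range(0x110000):
        # Surrogates are not scalar values and cannot be encoded as UTF-8.
        if 0xD800 <= code_point <= 0xDFFF:
            continue
        leads.add(chr(code_point).encode("utf-8")[0])
    return leads


leads = utf8_lead_bytes()
non_ascii_leads = sorted(b for b in leads if b >= 0x80)
lowest, highest = non_ascii_leads[0], non_ascii_leads[-1]  # 0xC2 and 0xF4
```

The loop covers a little over a million code points, so it takes a moment to run.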


>> Unicode code points are 32 bit

> 21-bit, actually

Less than that. https://en.wikipedia.org/wiki/Code_point#In_character_encodi...:

“The Unicode code space is divided into seventeen planes (the basic multilingual plane, and 16 supplementary planes), each with 65,536 (= 2¹⁶) code points. Thus the total size of the Unicode code space is 17 × 65,536 = 1,114,112”

That makes it log(1,114,112)/log(2) bits. That’s about 20.09.

(https://www.unicode.org/versions/Unicode17.0.0/ assigns 159,801 of them to characters)
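The arithmetic above can be reproduced in a few lines. A small sketch, with variable names chosen only for illustration:

```python
import math

PLANES = 17
CODE_POINTS_PER_PLANE = 2 ** 16

code_space = PLANES * CODE_POINTS_PER_PLANE   # 1_114_112 code points
max_code_point = code_space - 1               # 0x10FFFF

# Fractional number of bits needed to distinguish every code point.
bits_fractional = math.log2(code_space)       # about 20.09

# Whole bits needed to store the largest code point in a fixed-width field.
bits_whole = max_code_point.bit_length()      # 21
```

So "21-bit" is the fixed-width answer and "about 20.09" is the information-theoretic one. Both describe the same range, 0 to 0x10FFFF.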


Don't know why you are being downvoted (or my grandparent comment, for that matter). You are very correct.


Sorry, I was thinking of 0x1FFFFF as the end, but it’s 0x10FFFF. Forgetful.

If you are going to be pedantic, go all the way. 2^21 is 0 to 2_097_151. Unicode codepoint range is 0 to 1_114_111, slightly more than 2^20 (0 to 1_048_575).

I would argue that Unicode v2 onward, circa 1991 (the Unicode Consortium and ISO/IEC working together), is what anybody knows as Unicode, with the 0 to 1_114_111 code points easily manipulated as a 32-bit value.

I meant variable-length encoding. RLE indeed encodes runs of successive repetitions.


How much does it cost and where to buy? If I have to call, I cannot afford it.


Okay so click on the first link I gave you.

And then scroll down to the model you want.

The second to last column is price. Click on that (Or click on the first column where it has the model number. Either one works.).

Then click "add to cart".


Ah, only a single column for me. No ad blocker. The html source doesn't even include the word price! I must be getting a different page than you somehow.

I can click on the picture to get the details and volume pricing. The 300W model has no price and a minimum order of 30 units. So I thought it ended there.

But it recommends to check out https://www.powerstream.com/DC-PC-24V-400.htm where I ultimately found the add to cart for $300.


I find SMD the easiest. Using a pinecil soldering iron, or a cheap hot air station. Then through hole, because it's annoying to hold and flip the PCB around without everything falling. And finally soldering two wires is just an exercise in frustration. Helping hands or not, it's just plain annoying.


Another option for simplicity in dual stack is to assign visually similar addresses:

    - ipv4: 192.168.0.42
    - ipv6: prefix:192:168:0:42

I only do this for static/server machines, configuring Linux with a fixed IPv4 address and appending the fixed IPv6 host part to the Router Advertisement prefix.
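The mapping can be sketched with Python's `ipaddress` module. The function name `lookalike_ipv6` and the documentation prefix `2001:db8:1:2::/64` are assumptions for illustration. Each decimal octet is reused verbatim as a hex group, so "192" becomes the group 0x192.

```python
import ipaddress


def lookalike_ipv6(prefix: str, ipv4: str) -> ipaddress.IPv6Address:
    """Build an IPv6 address whose last four groups look like the IPv4 octets."""
    network = ipaddress.IPv6Network(prefix)
    if network.prefixlen > 64:
        raise ValueError("need at least 64 host bits for four 16-bit groups")
    # Parsing through IPv4Address validates the input before it is split.
    octets = str(ipaddress.IPv4Address(ipv4)).split(".")
    host = 0
    for octet in octets:
        # Read the decimal digits as hex so the text form stays identical.
        host = (host << 16) | int(octet, 16)
    return network.network_address + host


# Example prefix from the documentation range, not a real deployment.
address = lookalike_ipv6("2001:db8:1:2::/64", "192.168.0.42")
```

With these inputs `str(address)` is `2001:db8:1:2:192:168:0:42`. The result is only a visual mnemonic. The groups do not carry the numeric value of the IPv4 address.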


If I hadn't put my long-running machines' -er- ULA-derived [0] SLAAC addresses into my local DNS ages ago, I'd either do exactly that, or slice off the "redundant" parts of the IPv4 address, so that I could choose to assign sixteen additional bits of addresses to each host. That is:

  - ipv6: prefix:192:168:0:42

would become

  - ipv6: prefix::0:42:[0-ffff]

[0] I'm really not sure how to succinctly say "The autonomously-configured addresses on my LAN's ULA prefix".


The almost imperceptible sliding took me a while to figure out too. My phone was always on the floor any time I wasn't watching it. And if it vibrated, it was all of a sudden in a hurry to plunge off tables. Active obsolescence, I am telling ya!

Every morning I had to wake up fast enough, before the vibrating alarm made it jump off my bedside table.

Of course the back cracked quickly with the constant falling. It eventually met its demise during a bike accident. I landed on a tiny rock that pushed through my pocket, exploded the glass, and ultimately broke the charging circuitry. You could see the hole through the front glass! And it was still playing music. At least until the battery died.


> Typography pet peeve, how do I disambiguate that dot?

I have resorted to "0.0.0.0".


Reasonable.

I wish there was a common convention for logical grouping like we have in math when disambiguating operator precedence with parentheses, but those are already taken for asides in regular prose. Maybe curly braces?


`0.0.0.0`.


I still got corrupted metadata with metadata raid1c3 on btrfs on a power loss. I never had this happen with ext4 alone or atop Linux raid.

I want to be clear that losing (meta)data in flight during a power loss is expected. But a broken filesystem after that is definitely not acceptable.

Some PostgreSQL databases ended up soft corrupted. PostgreSQL could not replay its log because btrfs threw IO errors on fsync. That's just plain not acceptable.


I had metadata corruption with metadata raid1c3 (raid1, 3 copies) over 4 disks. It happened after an unplanned power loss during a simulated disk failure replacement. Since the manual cleanup of the filesystem metadata (list all files, get IO errors, delete IO errored files), the btrfs kernel driver segfaults in kernel space on any scrub or device replacement attempt.

Honestly the code of btrfs is a bit scary to read too. I have lost all trust in this filesystem.

Too bad because btrfs has pretty compelling features.


There is nothing special about Roman concrete compared to modern concrete. Modern concrete is much better.

The difference is that they didn't have rebar. And so they built gravity stable structures. Heavy and costly as fuck.

A modern steel and concrete structure is much lighter and much cheaper to produce.

It does mean a modern structure doesn't last as long, but also the Roman stuff we see is what survived the test of time, not what crumbled.


> There is nothing special about Roman concrete compared to modern concrete. Modern concrete is much better.

Roman concrete is special because it is much more self-healing than modern concrete, and thus more durable.

However, that comes at the cost of being much weaker, setting much more slowly, and requiring rare ingredients. Roman concrete also doesn’t play nice with steel reinforcement.

https://en.wikipedia.org/wiki/Roman_concrete


I think you are incorrect. Compared to modern concrete, roman concrete was more poorly cured at the time of pouring. So when it began to weather and crack, un-cured concrete would mix with water and cure. Thus it was somewhat self healing.

Modern concrete is more uniform in mix, and thus it doesn't leave uncured portions.


We have modern architecture crumbling already, less than 100 years after it was built. I know engineering is about tradeoffs, but we should also acknowledge that, as a society, we are far too used to treating direct economic cost as the main and sometimes only metric.


You would be very unhappy if you had to live in a house as built 100 years ago. Back then electric lights were rare. Even if you had them, the wiring wasn't up to running modern life. My house is only 50 years old and it shows signs of the major remodel 30 years ago, and there are still a lot of things that a newer house would do differently that I sometimes miss.


I've lived in a 100 year old house and in a brand new house, and they both had issues. They both had advantages too. Oddly, the older house had a better designed kitchen. Our lives change over time and our housing has to adjust to that too.

