Building the next generation file system for Windows: ReFS

ComputerGuru · on Jan 16, 2012

The NTFS features we have chosen to not support in ReFS are: named streams, object IDs, short names, compression, file level encryption (EFS), user data transactions, sparse, hard-links, extended attributes, and quotas.

Of these, I'm sorry to see the demise of sparse files. This was, IMHO, the single most under-utilized feature of NTFS, and I was able to integrate support for sparse files into a number of clients' applications (I'm a low-level consultant and developer) to great effect. While the increasing size of volumes along with the sub-par utilization of this feature makes it an obvious victim when creating a new filesystem and looking for features to drop, sparse files can be amazing for other reasons.

One of the advantages of sparse files is that they can be used to naively support certain seek-related behaviors. If you create the file right, you can save yourself a lot of code and complexity in any applications consuming that data.

The biggest advantage of sparse files though is speed. For instance, you can create a container file of X size filled with zero bytes, and only use as much space as the end application requests (for example, creating a virtual disk of 2TB that only takes up 100MB on disk).

I, for one, am sad to see this feature go. For anyone interested in this amazing feature, have a read here: http://www.flexhex.com/docs/articles/sparse-files.phtml

mbell · on Jan 17, 2012

I think i'll miss hard links the most. I recently switched from linux to windows due to better display management and it seems that many programs fail at handling symlinks in a reasonable way, only hard links make them act rationally.

ComputerGuru · on Jan 17, 2012

I was going to mention that in my comment, but I refrained for a number of reasons: a) old non-symlink-aware Windows software will treat symlinks as hardlinks (by and large), b) it's very possible that hardlinks will be present in ReFS in one form or the other as Microsoft has a long history of using a dozen different names and implementations to create hardlink-like behavior over the years, and c) the only apps that can mishandle symlinks are those that were written with symlink support in mind, but they mangled it and did a bad job pulling that off - which, while very much lamentable, cannot be blamed on MS.

Honestly, any serious application that doesn't take symlinks into account in 2010s is a joke. Unfortunately, in the Windows world, there's a lot of them. Even hard-core backup applications (I've consulted for a few backup companies) mess this one up.

Hardlinks were the precursor to softlinks, in this day and age their only purpose is to let you say "I've given up on the software I use handling soft links properly," and while we can wish for such a feature, I don't think it's that bad of a decision to drop them.

mbell · on Jan 17, 2012

>Honestly, any serious application that doesn't take symlinks into account in 2010s is a joke.

I agree, but it doesn't stop me from having to use these programs where they clearly kill their competitors in terms of features or usability.

Not to start a fight but windows programs take a serious step back in ease of use (for me) vs their linux counterparts. I work from home so my home PC = my work PC.

Unfortunately I'm a massive fan of multiple displays (5 currently) in various orientations and linux epicly fails at this. I used linux as my primary OS for 7 years but finally gave up over this singular issue.

keeperofdakeys · on Jan 22, 2012

Softlinks aren't just a different version of hardlinks, they have totally separate semantics. Hardlinks allow you to link multiple files without worrying if the original is destroyed, softlinks link to a file location. Take backups for example, I can have versioned backups and hardlink identical files to a previous version. The advantage of hardlinks is that I can treat the backups as fully independent, and use them with any program. If I were to use softlinks, or even deltas, I would need to use the original software to delete, or even extract the contents. With hardlinks, I can delete and copy the backups with any application. (Rsync and time machine actually implement this mechanism).

CPlatypus · on Jan 17, 2012

I agree that it's a strange set of choices. Eliminating quotas and compression might be OK for something positioned as a basic filesystem, but for something billed as a "next generation" filesystem it seems odd. Eliminating extended attributes is an even bigger step backward, because they're such a useful building block for other OS features (e.g. look at their use to in Linux to support ACLs and security labels).

Failing to support sparse files, though . . . man, that's just insane. That would relegate ReFS to the status of a toy in most filesystem developers' minds, even before you consider their increasing usefulness when storing virtual-machine disk images in a shared filesystem to support migration, etc. It's hard to imagine that none of the many people who must be involved in this at MS raised the red flag. What seems more likely, from what I know of MS culture, is that some people did raise it but then some idiot dictator with a reputation built on some long-irrelevant project ignored or dismissed their objections.

mappu · on Jan 17, 2012

I'm pretty sure Microsoft's VHD file format has built-in support for sparse volumes... maybe it was a case of since they didn't need it, they could trim everyone else's advantage.

Since ReFS doesn't look to be in the client version of Windows 8, though, i don't think it'll make much difference to applications - and i expect they'll add some of these features back before Windows 9. Hopefully they have a more transparent versioning system than NTFS this time around

sfoskett · on Jan 17, 2012

If ReFS is really in the Server version of 8, and if it really doesn't support sparse files, consider the implication of built-in deduplication. It does sparse files at a layer above the filesystem, along with dedupe and compression. Or, as mappu points out, ReFS could assume it's running on VHD/CSV, punting those features to a lower level.

Perhaps Microsoft is making a decision to focus the filesystem on being a simple storage engine and moving features into other modules (primarily) above or even below it?

CPlatypus · on Jan 17, 2012

Dedup isn't quite the same as sparse files. If the filesystem is unaware of dedup, then it still has to allocate its own structures corresponding to the dedup'ed space, seeking to the next hole or next allocated block won't work (not that many applications are smart about holes or that MS didn't already suck in that area), etc. Dedup - even in its most absolutely simple form of just detecting zero blocks - does at least avoid the fatal problem of allocating actual disk space uselessly, but sparse files are still basically a filesystem problem and need to be treated as such.

Craiggybear · on Jan 16, 2012

I'd agree with you about sparse files. I find it hard to imagine a modern file system without support for them.

Someone · on Jan 17, 2012

Some loose remarks:

- named streams are out => it becomes unlikely that we will see these become popular on any OS (because being incompatible with the market leader is problematic; see Mac OS X, .DS_Store). I find that a pity.

- I guess quotas are out because there will be something else replacing it?

- Can anyone explain why a modern filesystem should have a limitation on path length? For APIs I can understand it because the standard C library thinks paths are fixed-length, but for file systems? I would think this complicates the implementation, as every directory would need to know the length of the deepest path below it (in case one attempts to rename it). Aggregating that info upwards whenever a file is created or renamed (let alone deleted) cannot come for free, can it?

wcoenen · on Jan 17, 2012

> I guess quotas are out because there will be something else replacing it?

If it works like ZFS, then you will be able to create as many filesystem volumes as you want on top of a storage pool, each with their own size limit. You could create separate file systems for each user.

> Can anyone explain why a modern filesystem should have a limitation on path length?

The 32K limit doesn't really matter, because we're effectively stuck forever with the much lower and older limit of 260 characters. Longer paths would crash existing applications who use 260 byte buffers to store file system paths.

http://serverfault.com/questions/163419/window-256-character...

wvenable · on Jan 17, 2012

Named streams were always problematic especially in a connected heterogeneous computing environment. I'm not sad to see them go.

Someone · on Jan 17, 2012

But that is chicken and egg. I would rather see a world where we worked on solving those problems, instead of giving up on them. On the good side, they are dropping support for 8.3 filenames (but it will be interesting to see how they solve the 'copy from this new file system to FAT' problem)

keeperofdakeys · on Jan 22, 2012

> Can anyone explain why a modern filesystem should have a limitation on path length?

Performance and verbosity of code. Having a limit allows the creation of fixed-sized structures, which makes the code much simpler (remember, code has bugs, so more code is more bugs). It also makes the generated machine code much simpler, leading to better performance.

jensnockert · on Jan 17, 2012

I guess it is the classic "You can create files with any name but don't ever try to access it" that happens in Windows with extremely long names.

adgar · on Jan 17, 2012

> Aggregating that info upwards whenever a file is created or renamed (let alone deleted) cannot come for free, can it?

Not free, but it's not asymptotically significant.

tlb · on Jan 17, 2012

Even when you reparent a directory? Reparenting a directory to a longer prefix would have to sometimes fail.

blibble · on Jan 16, 2012

sounds very much like the "current generation" to me, ZFS has done just about everything that article covers for a while, and it supports most of this too:

"The NTFS features we have chosen to not support in ReFS are: named streams, object IDs, short names, compression, file level encryption (EFS), user data transactions, sparse, hard-links, extended attributes, and quotas."

zdw · on Jan 17, 2012

You can also actually boot from ZFS (the code for this is actually GPL'ed in grub), and many of the caching features like ZIL and L2ARC are "third party opportunties" whereas they're built into ZFS.

I wonder if/when this will actually take off - most Microsoft "edge case" solutions have trouble gaining adopters. If it gets boot off mirror support, it has a chance.

1010011010 · on Jan 17, 2012

Too bad they can't just use ZFS, but, NIH and all.

icebraining · on Jan 17, 2012

I think Microsoft's legal department would freak out about having a (even if "slightly") copyleft licensed software in the Windows core.

Game_Ender · on Jan 16, 2012

He closes with: "We believe this significantly advances our state of the art for storage."

I don't think that's true at all. As others have mentioned, it appears they are matching the state of art achieved by ZFS.

rcthompson · on Jan 17, 2012

He says our state of the art. I guess by that he means the state of the art of storage in Windows file systems.

daniel02216 · on Jan 16, 2012

I'm not sure I see the difference between a log-structured file system and what they have proposed for their robust disk update strategy, especially when you add integrity streams into the picture. Anyone with more filesystems knowledge than me want to clarify this?

obtu · on Jan 17, 2012

Their approach looks a lot like BtrFS: everything in a BTree, append-only, with checksums.

jensnockert · on Jan 16, 2012

Seems very cool, the only problem seems to be that it isn't bootable. I hope that this might get the Linux folks a bit more serious about modern resilient filesystems.

mbell · on Jan 17, 2012

Most or all of what was discussed is supported by ZFS (which you can use via FUSE in linux) and the vast majority is either already available or part of the planned feature set for btrfs, which will probably be the primary FS of linux moving forward.

The 'Linux folks' have been working on this for quite awhile, if it weren't for licensing incompatibilities with ZFS, they'd likely already have it.

ComputerGuru · on Jan 17, 2012

The 'Linux folks' have been working on this for quite awhile, if it weren't for licensing incompatibilities with ZFS, they'd likely already have it.

If you'll pardon my saying so, that's because it's not 'the Linux folks' that have been working on it - it's other non-'Linux folk' companies (cough RIP, Sun cough) that took it upon themselves to make a better filesystem for their (coincidentally, and nothing to do with Linux) open source operating systems. The closest the 'Linux folks' got was ReiserFS under Hans Reieser, who's work was largely rejected by the mainstream 'Linux folks' working on the kernel... until the months just before his arrest and conviction for the murder of his wife.

ZFS has as much to do with Linux as NTFS has to do with Linux - developed wholly outside of the Linux community by people not in the Linux community nor associated with the Linux community, with implementations available for Linux that are not redistributable with the kernel for patent- or licensing-related reasons.

But, yes, BtrFS (developed by Oracle) is indeed a 'Linux folk' attempt at creating a modern filesystem. And BtrFS does indeed predate ReFS.

mappu · on Jan 17, 2012

Note that booting from BtrFS is pretty difficult, if at all possible, putting it on similar footing to ReFS/Protogon.

mbell · on Jan 17, 2012

The GRUB update in May 2011 added support for booting both ZFS and btrfs. If your having issue all you need to do is update grub to get proper support.

icebraining · on Jan 17, 2012

which you can use via FUSE in linux

There's also a native port: http://zfsonlinux.org/

detay · on Jan 17, 2012

Quote: "With this in mind, we will implement ReFS in a staged evolution of the feature: first as a storage system for Windows Server, then as storage for clients, and then ultimately as a boot volume. This is the same approach we have used with new file systems in the past."

So it will be a bootable FS eventually.

obtu · on Jan 17, 2012

Grub can boot off BtrFS filesystems. There is one caveat remaining (bug 759772: grub's core.img needs to be embedded in the correct part of the btrfs partition). When using GPT (required with large disks) that part is already safely located on a 1MB GPT boot partition.

tlb · on Jan 17, 2012

If the boot partition has to be some other non-resilient FS, the whole system isn't very resilient.

rbanffy · on Jan 17, 2012

Wasn't it supposed to arrive in Vista?

Now, seriously, if I got a dollar for every new Windows filesystem announced for every next version of Windows and canned before launch, I'd be at least five dollars richer. By the time they deliver it, IF they deliver it, BtrFS will be widely used in Windows servers. ZFS already is way more advanced than what they propose.

The only major change I saw was when Microsoft ditched HPFS to go with NTFS.

contextfree · on Jan 17, 2012

"Wasn't it supposed to arrive in Vista?"

No. WinFS was not a filesystem, it was an application layer on top of NTFS (and SQL).

In fact the changes to the underlying filesystem originally motivated by WinFS (i.e. TxF) did make it into Vista.

4ad · on Jan 17, 2012

NT was designed to use NTFS from the start. Because NT was originally NT OS/2, it was also supposed to be able to use HPFS, but NTFS was always the primary filesystem.

BTRFS on Windows?

And yes, ZFS is more advanced, at least from what can be deduced from this article.

rbanffy · on Jan 17, 2012

> BTRFS on Windows?

Sorry. Editing accident. Can't correct it anymore. I meant Linux servers, of course.

kirrmann · on Jan 16, 2012

Now finally Windows FS gets some love.

catch23 · on Jan 17, 2012

Seems like the opposite of love -- they're removing a large swath of features in order to bring some new ones in.

justncase80 · on Jan 17, 2012

Frankly, we don't see feature removal nearly enough these days. A move towards simplicity seems like a win.

catch23 · on Jan 17, 2012

Simplicity is not so useful in operating systems or filesystems though. There are enough "simple" filesystems out there -- the useful filesystems are the ones that aren't so simple. There's lots of great educational "simple" OSes, but nobody uses them for real work.