This video is really cool. I’ve been following dataflow approaches for a while, including some of Frank McSherry’s (usually enjoyable) articles. None of the comments mention https://materialize.io, so I may as well: it’s a commercial open-source offering built on these concepts.
Watching this explanation, I’m slightly curious whether things like Materialize and Noria are a bit limited, in that this could be a paradigm for an actual functional reactive programming language rather than specifically a “data” thing. It appears to have the structure of nested contexts (loops, scopes, etc.) advocated by structured programming (i.e., “goto considered harmful”). It can reliably calculate an answer at each point in time for each state of input, concurrently and with parallelism, even if there are multiple inputs with their own notion of time (not covered in the video). That’s, like, the holy grail of PLT these days, isn’t it? Or am I missing something?
Yes, we consider ourselves an “open core” company. Timely and differential, the core compute engines, are fully open source projects, but the Materialize layer atop them is licensed under the “Business Source License” (BSL).
We think the BSL strikes a good balance between giving back to the community—four years after every release, the code is automatically relicensed under Apache 2—and ensuring we can build a viable business. And you’re free to use Materialize for any purpose in a non-distributed (i.e., single node) deployment without paying for an enterprise license.
In my opinion, dataflow is the only true representation of computation. Unlike ordinary code, it makes data dependencies, and hence the available parallelism, explicit. Because of this, it is also a great basis for a hardware implementation.
Although we’re more used to how branching and looping are done in conventional control-flow languages, both can be expressed straightforwardly in dataflow without introducing any non-dataflow constructs: just graphs of vertices connected by edges.
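To make that concrete, here’s a toy Python sketch (my own example, not taken from any particular dataflow system or from my language): branching is a vertex with two out-edges that routes a value by a predicate, and looping is just a back-edge that feeds values around a cycle until they exit.

    from collections import deque

    class Graph:
        def __init__(self):
            self.ops = {}      # vertex name -> operator function
            self.edges = {}    # (vertex, output port) -> downstream vertex

        def vertex(self, name, op):
            self.ops[name] = op

        def edge(self, src, dst, port=0):
            self.edges[(src, port)] = dst

        def run(self, start, value):
            # Push values along edges until one leaves the graph.
            queue = deque([(start, value)])
            while queue:
                name, val = queue.popleft()
                port, out = self.ops[name](val)
                nxt = self.edges.get((name, port))
                if nxt is None:
                    return out  # no out-edge on this port: value exits
                queue.append((nxt, out))

    g = Graph()
    # "Loop body": double the value. "Switch": branch on a predicate.
    g.vertex("double", lambda x: (0, x * 2))
    g.vertex("switch", lambda x: (0, x) if x < 100 else (1, x))
    g.edge("double", "switch")          # body output -> branch test
    g.edge("switch", "double", port=0)  # back-edge: loop while x < 100
    # port 1 of "switch" has no out-edge, so the value exits there.

    print(g.run("double", 3))           # 192

The point is that the loop and the branch live entirely in the graph’s shape; the interpreter knows nothing about control flow.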
BTW, in case anyone's wondering why there have been no recent updates to the pages on my visual dataflow language, it's because the many improvements I've been making, particularly big changes to the type system, have required a lot more work than I expected. I haven't abandoned work on it, but it will still be some time before it's ready for release.
Those show that it’s possible, not necessarily that it’s a good interface for those parts of programs. Houdini’s shader language has had branching in a dataflow graph for a long time, and TouchDesigner has a kind of looping construct too. You might want to take a look at these domain-specific interfaces if you’re building your own graph; they’re well done.
Fundamentally, though, text expressions are far denser than a dataflow graph. If what’s being expressed isn’t fundamentally a directed acyclic graph, a graph visualization becomes harder to absorb than the same logic as text.
I looked at the dataflow paradigm a couple of years ago. Back then I thought that the difference from ordinary functions isn’t that big, and that for performance (which is important for my data work) you don’t want to deviate too far from the traditional way of doing things.
Has anyone felt the same, or can you provide a real-world problem where dataflow actually works better than other solutions?
It’s like a cache, except it keeps itself in sync with the database automatically. It generates “materialised views” using dataflows built from the queries asked of it, and it will automatically build a new dataflow if someone issues a query it doesn’t already have one for. Parts of dataflows can also be shared across views.
The paper linked from the GitHub repo goes into detail about the performance gains; it easily outperforms straight database queries and typical caching setups.
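A hand-rolled Python sketch of the core idea (my own toy, not the actual Noria or Materialize API): a materialised count-per-key view that updates itself incrementally from a stream of deltas, instead of re-running the query on every read.

    from collections import defaultdict

    class CountView:
        def __init__(self):
            self.counts = defaultdict(int)   # the materialised view

        def apply(self, key, delta):
            # The "dataflow": each base-table change flows through and
            # adjusts only the affected row of the view.
            self.counts[key] += delta
            if self.counts[key] == 0:
                del self.counts[key]

        def read(self, key):
            # Reads are cache-speed lookups; no query re-execution.
            return self.counts.get(key, 0)

    view = CountView()
    view.apply("story_42", +1)    # a vote is inserted
    view.apply("story_42", +1)
    view.apply("story_42", -1)    # a vote is retracted
    print(view.read("story_42"))  # 1

The real systems generalize this to joins, aggregations, and shared sub-dataflows, but the shape is the same: work scales with the size of the change, not the size of the data.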
If your functions are transforming chunks of data into other formats/types, you’re already doing what dataflow graphs do. Generalizing that can give you much more structured concurrency.
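For example (my own sketch, assuming nothing about any particular framework): once each step declares what it consumes, steps with no dependency between them can be scheduled concurrently straight from the dependency graph.

    from concurrent.futures import ThreadPoolExecutor

    def parse(raw):      return raw.split(",")
    def to_ints(fields): return [int(f) for f in fields]
    def checksum(raw):   return sum(map(ord, raw))

    raw = "1,2,3"
    with ThreadPoolExecutor() as pool:
        fields = pool.submit(parse, raw)
        # checksum depends only on `raw`, so it runs in parallel with parse.
        chk = pool.submit(checksum, raw)
        # to_ints depends on parse's output, so it waits for that edge only.
        nums = pool.submit(to_ints, fields.result())
        print(nums.result(), chk.result())   # [1, 2, 3] 238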