That statement is overly vague and sounds like marketing BS.
I worked on a project where we scored streaming data in R. The biggest bottleneck was getting the data into and out of the R session. We started out with disk-based I/O and ended up using rJava so our streaming system could communicate with R directly. In that case we did get a 100X speedup between our first iteration and the final version, which used rJava to serialize the data.
So basically, the major bottleneck was not R itself; it was the communication with R. In the article, R is installed on the same hardware as SQL Server, which should automatically give it a speedup with streaming data.
If Microsoft also has an optimized way to get data from SQL Server to R, I can see how they got a 100X speedup. In certain cases the MKL libraries can give you that as well through faster scoring, but I suspect the speedup mostly comes from improving the data transfer method.
> If Microsoft also has an optimized way to get data from SQL Server to R I can see how they got a 100X speedup. In certain cases using the MKL libraries can give you that as well, but I suspect the speedup just comes from improving the data transfer method.
The optimized method is that you can run R inside the database in the latest version of SQL Server.
I've actually installed Windows again, just to play with this feature (though I can't claim to have put it to good use yet).
No need to install Windows to get R in a database: you can use PL/R on Postgres (unless you have a particular desire to run it within SQL Server, of course, but doesn't that run on Linux now?).
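For anyone who hasn't seen PL/R, a minimal sketch (assuming the plr extension is installed, and a hypothetical table `measurements` with a float8 `value` column):

```sql
-- Enable the PL/R language extension (ships separately from Postgres)
CREATE EXTENSION IF NOT EXISTS plr;

-- Define a UDF whose body is plain R; unnamed SQL arguments
-- arrive on the R side as arg1, arg2, ...
CREATE OR REPLACE FUNCTION r_median(float8[]) RETURNS float8 AS '
  median(arg1)
' LANGUAGE plr STRICT;

-- Call it like any other SQL function
SELECT r_median(array_agg(value)) FROM measurements;
```

The R interpreter runs inside the Postgres backend process here, which is what avoids shipping the data out to a separate R session.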
Ah, okay. It wasn't clear to me that Microsoft had written its own algorithms. That does seem very useful, then (though those algorithms and data structures could presumably be used outside of the SQL Server environment, I imagine Microsoft is using them to encourage people to use SQL Server rather than another solution).
I believe the second link is referring to being able to initialize and share data between functions within the R runtime (rather than having to transfer back and forth between Postgres and the runtime). Is that not what you were referring to?
>being able to initialize and share data between functions within the R runtime
That's right.
>rather than having to transfer back and forth between Postgres and the runtime
But that's not. There's a difference between being able to use data from outside the database (via the R runtime) in my UDFs (executed in Postgres) on one hand, and being able to attach 2 TB of data straight from an SQL table in the R runtime on the other. I don't even care that much about the algorithms. Moving the data is the bottleneck most of the time. And Microsoft is actually late to the party (but better late than never). Oracle, Netezza, Vertica and HANA have been able to do this for quite a while now.
You are spot on about being able to use the algorithms outside of SQL Server. You can use them on Teradata or Hadoop, rent your own VMs on Azure to run them, or buy standalone licenses too.
So by "attach 2 TB of data straight from an SQL table into the R runtime" you mean that Microsoft taught R to interact directly with SQL Server's storage engine? If so, I agree: data movement is almost always the bottleneck for large data sets, and I don't think PL/R can do that (though I'm not sure whether that's an inherent limitation of the way Postgres's language plugins work, or something that could be done with enough effort).
However, if all you mean is that SQL Server can transfer the data a tuple at a time to R on the same server (in memory), I believe PL/R and Postgres already interact like that (again, maybe I'm wrong). And I don't know how much extra overhead that adds over talking directly to the storage engine, anyway.
>Microsoft taught R to interact directly with SQL Server's storage engine
They have created two new services for SQL Server 2016, BxlServer and SQL Satellite, which facilitate the communication and data exchange. They obviously have additional speedups for the proprietary runtime (fast data access to several RDBMSes was one of the main selling points of the company they acquired), but it's plenty fast for regular R too.
"When you select this feature, extensions are installed in the database engine to support execution of R scripts, and a new service is created, the SQL Server Trusted Launchpad, to manage communications between the R runtime and the SQL Server instance."
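Concretely, invoking R in-database in SQL Server 2016 looks something like this (a minimal sketch; assumes R Services is installed and "external scripts enabled" is configured, and the table `sales` with an `amount` column is a placeholder):

```sql
-- Runs the R script in-process on the SQL Server machine.
-- InputDataSet / OutputDataSet are the conventional data.frame
-- names on the R side of the boundary.
EXEC sp_execute_external_script
    @language = N'R',
    @script = N'OutputDataSet <- data.frame(avg_amount = mean(InputDataSet$amount));',
    @input_data_1 = N'SELECT amount FROM sales'
WITH RESULT SETS ((avg_amount float));
```

The query in `@input_data_1` feeds the R runtime directly, without the data ever leaving the box, which is presumably where much of the claimed speedup comes from.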
So basically SQL Server is talking to the R session. The speedup comes from R being installed locally and from the "communication" layer, which I've yet to figure out.