Read on for our path to making JS compute over GBs and eventually TBs of data in subsecond time. We’d love for even more people to get involved, whether by contributing code or, for enterprises, by engaging with us!
BACKGROUND: GPUs, Arrow, and JS
From phones to servers, GPUs are everywhere. The top supercomputers are made from them: Nvidia’s new DGX-2s run at a jaw-dropping 2 petaflops and have 512 GB of GPU RAM. Modern frameworks like TensorFlow and NVGraph already leverage the heck out of these. (… Contact us if you want to experiment with us on NVGraph!)
The Apache Arrow data format solves interop and is already being adopted by Spark, Pandas, and other projects. Once frameworks agree on Arrow, passing data between them requires no data conversions. For framework makers, that means writing fewer connectors; for users, it means more interop at faster speeds.
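To make "no data conversions" a bit more concrete, here is a minimal sketch of the idea behind Arrow's columnar layout, written with plain typed arrays rather than the apache-arrow library. The helper names (`makeInt32Column`, `isValid`, `sumValid`) are ours for illustration and not part of any Arrow API, and real Arrow buffers also carry padding, alignment, and metadata that we skip here.

```javascript
// Illustrative only: an Arrow-style nullable Int32 column is a validity
// bitmap (1 bit per row) plus one contiguous buffer of values.
function makeInt32Column(values) {
  const length = values.length;
  const validity = new Uint8Array(Math.ceil(length / 8));
  const data = new Int32Array(length);
  for (let i = 0; i < length; i++) {
    if (values[i] !== null) {
      validity[i >> 3] |= 1 << (i & 7); // least-significant-bit ordering
      data[i] = values[i];
    }
  }
  return { length, validity, data };
}

// A row is non-null when its bit in the validity bitmap is set.
function isValid(column, i) {
  return (column.validity[i >> 3] & (1 << (i & 7))) !== 0;
}

// A consumer that knows the layout reads the buffers directly:
// no parsing, no per-row objects.
function sumValid(column) {
  let total = 0;
  for (let i = 0; i < column.length; i++) {
    if (isValid(column, i)) total += column.data[i];
  }
  return total;
}

const counts = makeInt32Column([1, null, 3, 4]);
const total = sumValid(counts); // 1 + 3 + 4 = 8

// A second "framework" can wrap the same bytes in its own view.
// Nothing is copied, so a write through one view is visible in the other.
const view = new Int32Array(counts.data.buffer, counts.data.byteOffset, counts.length);
view[0] = 10;
const totalAfterWrite = sumValid(counts); // 10 + 3 + 4 = 17

console.log(total, totalAfterWrite);
```

Because both sides agree on where the bits and values live, handing over a column amounts to handing over a reference to its buffers.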
New layers do even more. With Plasma, we don’t even have to copy Arrow data, just pass a pointer. With the GOAI project, those pointers can be to GPU memory. 2018 is nuts: end-to-end GPU computing is becoming the new normal.
- Open infrastructure. When Graphistry brings a third-party dependency to our customers, we’re wary of embedding anything closed source and even single-source open core. We were delighted when the GOAI startups joined up with Apache Arrow, and soon after, we donated our first code drop.
- Rally around standards. We can achieve outsized impact by identifying interop points framework builders can target. When they do, users of every other tool in the ecosystem benefit. For example, we’re starting to figure out an Arrow-aware ODBC variant. With it, all Arrow-aware BI tools could have out-of-the-box fast data support for any Arrow-compliant database, even without the many vendors coordinating with one another.
Reference architectures for JS GOAI bridges to ML, GPU, and Big Data frameworks
TECHNOLOGY & ROADMAP
We’ve started with node and reference JS implementations. We expect mobile can follow in node’s footsteps, and standard browsers after. We’re progressing through several areas:
- JS IO: A JS apache-arrow reader & writer, for async batch & streaming interop with Arrow format data, complete with examples for Pandas and MapD. A node-plasma binding for zero-copy sharing of CPU and GPU memory (100GB+). A node-goai buffer library for easily sharing data between the CPU and GPU.
- Zero-copy nodejs<>python GPU web services. Data is passed via node-plasma. Reference GOAI-capable PyData Docker, and helper library for web requests in a GOAI-aware node web framework (express?) and GOAI-aware python web framework (flask?).
- JS dataframes and graph compute: Symbolic compute for leveraging dataframe tech like PyGDF and Ray, upcoming graphframe tech like NVGraph, and upcoming Arrow-aware database tech like a more native Turbodbc. Ultimately, something like a js-linq for pushing code, not just data.
- SQL & Cypher/Tinkerpop: Beyond the direct JS project, there has been pressure to make database wire protocols work out-of-the-box with Arrow, especially given that modern systems are increasingly columnar. As a BI-for-investigations company, we are connecting teams here to develop ideas like an Arrow-aware ODBC that gives everyone a single target.
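As a rough illustration of "pushing code, not just data", here is a hypothetical sketch of what a js-linq style API could look like. Everything in it (`Query`, `col`, `lit`, `gt`, `runPlan`) is made up for this example and is not an existing library. The builder only records operations as plain data, and a small local interpreter stands in for the remote dataframe engine that would really execute the plan.

```javascript
// Hypothetical sketch: expressions are plain objects, so a whole query
// can be serialized and shipped to wherever the data lives.
const col = (name) => ({ type: 'col', name });
const lit = (value) => ({ type: 'lit', value });
const gt = (left, right) => ({ type: 'gt', left, right });

class Query {
  constructor(source, steps = []) {
    this.source = source;
    this.steps = steps;
  }
  filter(predicate) {
    return new Query(this.source, [...this.steps, { op: 'filter', predicate }]);
  }
  select(...columns) {
    return new Query(this.source, [...this.steps, { op: 'select', columns }]);
  }
  // The JSON plan is what would travel over the wire instead of the rows.
  toPlan() {
    return JSON.stringify({ source: this.source, steps: this.steps });
  }
}

// Stand-in for a remote engine: evaluates one expression against one row.
function evaluate(expr, row) {
  switch (expr.type) {
    case 'col': return row[expr.name];
    case 'lit': return expr.value;
    case 'gt': return evaluate(expr.left, row) > evaluate(expr.right, row);
    default: throw new Error(`unknown expression type: ${expr.type}`);
  }
}

// Stand-in for a remote engine: runs a serialized plan over in-memory tables.
function runPlan(planJson, tables) {
  const plan = JSON.parse(planJson);
  let rows = tables[plan.source];
  if (!rows) throw new Error(`unknown table: ${plan.source}`);
  for (const step of plan.steps) {
    if (step.op === 'filter') {
      rows = rows.filter((row) => evaluate(step.predicate, row));
    } else if (step.op === 'select') {
      rows = rows.map((row) =>
        Object.fromEntries(step.columns.map((name) => [name, row[name]])));
    } else {
      throw new Error(`unknown step: ${step.op}`);
    }
  }
  return rows;
}

const plan = new Query('events')
  .filter(gt(col('bytes'), lit(1000)))
  .select('src', 'bytes')
  .toPlan();

const tables = {
  events: [
    { src: 'a', dst: 'b', bytes: 500 },
    { src: 'c', dst: 'd', bytes: 2000 },
  ],
};

const result = runPlan(plan, tables);
console.log(result); // one row: src 'c', bytes 2000
```

In a real setup, only the small plan string would cross the process boundary, and the engine holding the data would return just the filtered result.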
For open source coders, there’s a lot possible, and we welcome new and out-of-order efforts. Anything in the above roadmap is fair game! The GitHub project is a great place to get started, as is emailing someone on our team.
For industry partners, we are happy to engage in experimental projects using these technologies. The Graphistry team already regularly engages with the US government and the F500 on problems in security, fraud, and fintech, and on how they use tools like Splunk, Elastic, and Hadoop. We are especially excited about connecting JS to PyData/PyGDF and, given the nature of our visual investigation product, building up the tooling and usability of NVGraph. We’re often in SF, Austin, DC, and NYC and are happy to catch up. Feel free to reach out!