Updated: 6.24.2020 to reflect PyGraphistry now supporting the 2.0 Upload API
The Graphistry team has been quite busy with v2.29. We’ve been quiet on the tool side while directing attention to our volunteer efforts like ProjectDomino.org for medical anti-misinformation, but as you’ll see, a lot has been happening! As usual, get the full release notes on our support site, directly download the enterprise release, and one-click launch from the AWS & Azure Marketplaces as soon as the update is approved.
We’re especially excited for a few efforts around the v2.29.5 update:
- 2.0 API that can already handle 100X+ bigger data & up to 11X faster
- RAPIDS 0.13, including many updates to BlazingSQL, cuDF (regex), cuGraph, and cuML
- LearnRAPIDS.com launch for introductory and advanced resources for Python GPU analytics across various techniques and use cases
- Fixes and tweaks: Especially around better page load and filtering experiences
We recommend that most users upgrade to 2.29, and especially any 2.27.X and RAPIDS users.
Figure: Comparing loading JSON (api=1) vs. RAPIDS Arrow (api=3)
Check out the official RAPIDS 0.13 release notes.
We’re especially excited about the bug fixes, lower latency, and new features such as GPU regex and more graph algorithms.
RAPIDS 0.14 is just around the corner as well. It will introduce exciting out-of-core memory upgrades from BlazingSQL, so we’ll be rushing the Graphistry 2.30 release for it as well.
Graphistry is helping lead the RAPIDS effort at the new RAPIDS Academy to improve training and onboarding for analyst and developer teams getting into the Python GPU ecosystem. We’re kicking it off with a live session and instructor-led lab on GPU security analytics later this month. Registration is free, but sign up now to secure one of the live lab slots!
Figure: RAPIDS Academy launched!
Fixes & tweaks
There are quite a few little fixes and tweaks. Of most note:
- Page loads should be faster!
- For 2.27.X users, loading bugs around the “75%” loading stage and encoding initialization (sometimes impacting network map layout, colors, and icons), as well as issues reloading a viz after a restart, should be mostly gone
Graphistry 2.0 Upload API: 100X Bigger and up to 11X Faster!
Initially for direct HTTP REST users, the new Upload API adds support for a variety of formats: CSV, Parquet, Arrow, ORC, and, for JSON, multiple encoding styles. We recommend experimenting with Arrow for in-memory data and Parquet for on-disk data. In the current release, both of those, as well as the CSV format, benefit from GPU acceleration.
Bigger graphs: In roughly the 1 second it takes to process a JSON upload of 10K edges, the new API can handle 100M edges. That’s 1000X bigger graphs while staying just as interactive! The rest of the system handles about 7M nodes and edges, depending on the client hardware, and the new ingest component represents the beginning of our efforts to increase this number overall.
Figure: Upload 100X+ more data and 11X+ faster!
More properties: In areas like log analytics and machine learning, we often have many columns, and the new uploader can handle many more of them. For datasets with many columns, JSON users can upload numbers, bools, and dates about 18X faster. The initial string support is about 6X faster, and we expect it to go up more as we add dictionary-encoding support. In all many-column cases, the experience feels much more interactive.
Figure: Load many graph columns 10X+ faster!
The rough idea for the new upload API is:
- Use your account credentials to get a short-lived (1hr) JWT API token
- Upload the JSON metadata file with optional details like visual encodings
- Upload an edges table and optionally a nodes table using various formats, with Arrow (in-memory) and Parquet (on-disk) as the most recommended
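The three steps above can be sketched in plain Python. Treat this as a rough illustration rather than the official client: the endpoint paths (`/api-token-auth/`, `/api/v2/upload/datasets/...`) and the response fields (`token`, `data.dataset_id`) are assumptions about the REST protocol and should be checked against the main docs. The helper names `http_post` and `upload_graph` are made up for this example.

```python
import json
import urllib.request
from typing import Callable, Optional

# Transport signature: (url, headers, raw body) -> parsed JSON reply.
Post = Callable[[str, dict, bytes], dict]


def http_post(url: str, headers: dict, body: bytes) -> dict:
    """POST raw bytes and parse the JSON reply using only the standard library."""
    req = urllib.request.Request(url, data=body, headers=headers, method="POST")
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def upload_graph(
    base_url: str,
    username: str,
    password: str,
    metadata: dict,
    edges: bytes,
    edges_format: str = "parquet",
    nodes: Optional[bytes] = None,
    nodes_format: str = "parquet",
    post: Post = http_post,
) -> str:
    """Run the token -> metadata -> tables flow and return the dataset id."""
    json_headers = {"Content-Type": "application/json"}

    # Step 1: trade account credentials for a short-lived JWT.
    # ASSUMPTION: endpoint path and the "token" response field.
    creds = json.dumps({"username": username, "password": password})
    auth = post(f"{base_url}/api-token-auth/", json_headers, creds.encode("utf-8"))
    bearer = {"Authorization": f"Bearer {auth['token']}"}

    # Step 2: upload the JSON metadata (bindings, optional visual encodings).
    # ASSUMPTION: endpoint path and the "data.dataset_id" response field.
    created = post(
        f"{base_url}/api/v2/upload/datasets/",
        {**bearer, **json_headers},
        json.dumps(metadata).encode("utf-8"),
    )
    dataset_id = created["data"]["dataset_id"]

    # Step 3: upload the edges table, then the optional nodes table.
    # ASSUMPTION: the table kind and format are encoded in the URL path.
    tables_url = f"{base_url}/api/v2/upload/datasets/{dataset_id}"
    post(f"{tables_url}/edges/{edges_format}", bearer, edges)
    if nodes is not None:
        post(f"{tables_url}/nodes/{nodes_format}", bearer, nodes)
    return dataset_id
```

In real use, `edges` would be the bytes of a Parquet file read from disk or an Arrow table serialized in memory. The transport is passed in as a parameter so the flow can be exercised with a stub before pointing it at a live server.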
We created a gist sharing how to use Python to directly call the new HTTP methods. Edit: PyGraphistry now automatically supports the 2.0 API (enable via graphistry.register(api=3, username=…, password=…)) and the underlying REST protocol is specified in the main docs.
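For PyGraphistry users, a minimal sketch of opting into the 2.0 API might look like the following. The `register(api=3, ...)` call is the one described above, and the `edges` / `bind` / `plot` chain is ordinary PyGraphistry usage. The helper name `plot_edges_api3` and the `src` / `dst` column names are hypothetical, chosen only for illustration.

```python
from typing import Any


def plot_edges_api3(
    edges_df: Any,
    username: str,
    password: str,
    source: str = "src",
    destination: str = "dst",
) -> Any:
    """Register against the 2.0 Upload API and plot an edge table.

    `edges_df` is expected to be a DataFrame (for example pandas) that has
    the two columns named by `source` and `destination`.
    """
    # Imported lazily so the helper can be defined even where the
    # graphistry package is not installed.
    import graphistry

    # api=3 selects the 2.0 Upload API, as described in the note above.
    graphistry.register(api=3, username=username, password=password)

    # Bind the edge columns and upload; plot() hands back the visualization
    # (a URL or an embedded frame, depending on the environment).
    return (
        graphistry.edges(edges_df)
        .bind(source=source, destination=destination)
        .plot()
    )
```

A call would then be something like `plot_edges_api3(df, "my_user", "my_pass")`, where `df` has `src` and `dst` columns.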