Graphistry v2.37.43: Recommended minor update over 2.37.40

Posted by Graphistry Staff on October 6, 2021

The latest version of Graphistry is now available for self-hosting users and has been submitted to the AWS and Azure cloud marketplaces. The main reason for the release is to fix an issue reported by some users self-hosting version 2.37.40, in which logged-in users were unable to access visualizations.

Read More

GPU Graph intelligence for everyone with Graphistry Hub Pro

Posted by Leo Meyerovich on September 23, 2021

On behalf of the Graphistry team, I am excited to officially announce: Hub Pro (Graphistry Hub for Professionals) is the next step of our vision of bringing GPU-accelerated visual graph analysis to everyone! The launch is one step of many. One side of bringing GPU graph capabilities is […]

Read More

Graphistry 2.37.40: Azure Trialing & Individuals, Share your graphs, JupyterLabs with GPU monitoring, Gremlin/Neptune/Cosmos connectors, and more!

Posted by Graphistry Staff on September 4, 2021

The latest Graphistry release makes GPU-accelerated visual graph analysis easier for data scientists and teams and grows where you can use it. The current release hits many themes of our mission of bringing 100X investigations through GPU visual graph intelligence to all analysts: Affordability – Azure: Azure: Experiment […]

Read More

Explore relationships in tricky CSVs with the no-code file uploader, now with multi-column mode via hypergraphs!

Posted by Graphistry Staff on May 11, 2021

Enabling quick no-code and low-code investigations of relationships in data is priority #1 for Graphistry this year. Our recent no-code file uploader first solved a simple top scenario: when a table has 2 columns to link, like src_ip and dest_ip, or say user_id and page_id, explore it as a graph. Within a few clicks, you […]

Read More

Graphistry 2.37.11: No-code graph visualization, airgapping, big Excel files, internationalization, RAPIDS 0.19, and more!

Posted by Graphistry Staff on May 8, 2021

Good news: Graphistry 2.37.11 brings across-the-board advances for visual graph analysis for UI users, GPU analysts, embedding developers, and improvements for both on-prem (air-gapped) and Hub cloud users. It also includes RAPIDS 0.19 and associated GPU ecosystem features and upgrades. Read on for our favorite improvements below, and the release notes for additional update details. […]

Read More

Visually explore the relationships in any CSV/XLS with GPU graph analytics.. and no code!

Posted by Graphistry Staff on February 18, 2021

We just made exploring the relationships in your data that much easier.  Already available for free Graphistry Hub accounts and Enterprise (Docker) users in v2.35, you can easily drop any CSV/XLS export into Graphistry and explore its relationships as a graph! Edit 2/19/2021: … And if your data looks good, think about joining the Web […]

Read More

Graphistry 2.33.17: Graph-App-Kit, RAPIDS 0.16, and more

Posted by Graphistry Staff on November 8, 2020

Graphistry 2.33.17 introduces a powerful piece in our mission to bring powerful visual investigations technology: Dashboarding. With Streamlit, embedded in our open-source graph-app-kit project, you can quickly create interactive graph dashboards and share them. In addition, 2.33 includes the big RAPIDS 0.16 release and continued GPU infrastructure improvements. Read on for these below, and check […]

Read More

Graphistry 2.32.4: Badges and RAPIDS 0.15!

Posted by Graphistry Staff on October 14, 2020

Graphistry 2.32.4 is now live! The most requested features are around icons, badges, and RAPIDS. If you are new to Graphistry, try exploring relationships in your data with GPU visual graph analytics for free on Graphistry Hub and one-click launch a private server in your cloud provider (including our new GovCloud mode). As usual, check […]

Read More

The Future of GPU Analytics Using NVIDIA RAPIDS and Graphistry

Posted by Leo Meyerovich on October 22, 2018

When everything runs on GPUs, we can fundamentally shift the way we experience data analysis much like video moving to HD or shifting from black-and-white to color. What if you could load your full dataset, ask whole-table questions like what are the patterns, and get the answers… immediately? What if you could do that visually, replacing writing queries with simple infinite zoom and direct manipulations down to the level of individual data points? Core analytics areas like security, fraud, operations, and customer 360 are entering this sci-fi-level world of rapid hypothesis iteration.

Running analytics end-to-end on GPUs, all the way from the data warehouse to what’s on screen in your browser, is not easy. Graphistry first brought that experience to investigating event and graph data. Starting from before the RAPIDS team was even officially formed, we have been collaborating with them on how to get these techniques into the hands of all analysts. With the official project announcement of RAPIDS, we thought it would help to share our promising early experiences.

Enter Apache Arrow & GoAi

RAPIDS is one of NVIDIA’s biggest contributions to the GPU Open Analytics Initiative (GoAi), and is poised to become its computational backbone. (We previously overviewed GoAi for the web and visual analytics.) Big data framework developers are shifting to fast data: handling more data at millisecond levels. Similar to how many SQL analytics tasks moved to distributed Hadoop, and then Hadoop moved to in-memory Spark, we are seeing the rise of in-GPU GoAi. Contributors already include most GPU database developers (OmniSci, BlazingDB, FastData, …), visual analytics developers (Graphistry), and broader data ecosystem OSS companies like Conda.

To make the set of tools work together, GoAi members rallied around Apache Arrow. It is a file format and set of protocols that support in-memory typed dataframes with zero-copy data transfers between tasks and libraries. Clouds let you rent instances with multiple GPUs that have 16GB GPU RAM each, and NVIDIA DGX nodes already store 512GB+ in-GPU RAM. This unlocks running most tasks entirely in the GPU, and as streaming frameworks emerge, nearly everything is fair game.

For a taste of what happens when you switch to streaming of Arrow files between GPUs, the following videos show a before/after of the Graphistry 2.0 engine. The first video shows our original hand-written visual analytics engine: GPUs in the browser, GPUs in the data center, and optimized networking. This year, we rewrote our interop code into Arrow (forming the core of Apache Arrow[JS]): the result is our new visual analytics engine — which runs in any browser — takes much less code, handles about 5X more data, and runs visibly faster:

Graphistry 1.0 Engine


Graphistry 2.0 Engine


NVIDIA RAPIDS

Apache Arrow unlocks and speeds up interoperability between analytics tools, and RAPIDS provides convenient GPU IO and compute layers. This can help all the way across the data pipeline: sending data from CPU Spark to GPU frameworks, converting untyped CSVs to typed Arrow, performing tabular operations like filtering, and, under the same family, supporting additional analytics areas like ML and graph. Enterprise-grade GPU analytics tools like Graphistry (visual analytics) and BlazingDB (warehouse interop) are incorporating it as part of a common core that is better than CPU alternatives but does not fundamentally differentiate one analytic tool category from another: RAPIDS is part of the GoAi rising tide.

RAPIDS is still early, but the numbers already look great. As a few examples of core data tasks, on a Titan V single GPU with 12GB GPU RAM and 32GB CPU RAM, similar to a cloud device, we see significant speedups on loading data and a simple cross-filtering task (filtering followed by histogramming). The result is 100M-1B row datasets become interactive!

Test setup:

  • Titan V single GPU, 12GB GPU RAM, 32GB CPU RAM machine
  • Representative of a $1.0/hr AWS P3.2 preemptible
  • IO: Load 100M rows (x 6 floats) or 1.5B rows (x 1 float) of data, as CSV and Arrow
  • Compute: Cross-filter (filter + histogram)
  • Compare CPU (Pandas) with GPU (PyGDF for filtering and Numba for histograms)
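For readers unfamiliar with the term, the cross-filter task in this setup can be sketched on the CPU side roughly as follows. This is an illustration only, not the benchmark code: the function name `cross_filter`, the column names, the filter range, and the 100,000-row random table are all assumptions, far smaller than the 100M-row test, and no timing is implied.

```python
import numpy as np
import pandas as pd


def cross_filter(df, filter_col, lo, hi, hist_col, bins=10):
    """Keep rows where lo <= filter_col < hi, then histogram hist_col."""
    selected = df[(df[filter_col] >= lo) & (df[filter_col] < hi)]
    counts, edges = np.histogram(selected[hist_col].to_numpy(), bins=bins)
    return counts, edges


# Small stand-in table: 6 float columns, mirroring the "x 6 floats" shape.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((100_000, 6)), columns=list("abcdef"))

# "Brush" column a to the range [0.25, 0.75), then recompute b's histogram.
counts, edges = cross_filter(df, "a", 0.25, 0.75, "b", bins=10)
print(counts)
```

In an interactive dashboard, every change to the brushed range reruns this filter-then-histogram step, which is why its latency decides whether a dataset feels interactive.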

The early results are spectacular — 20-30s computations become subsecond, 100M-1B row datasets become easy… and that is when bursting on just one GPU.

Graphistry + NVIDIA RAPIDS

Think of Graphistry as a UI for accessing RAPIDS tech without coding. Graphistry is the only full-stack GPU visual analytics platform, meaning we use GPUs all the way from your browser to the data center. Over the last year, we have been rearchitecting the platform to use Arrow end-to-end in the pipeline and helping bring similar Arrow-based workflows to the web, and RAPIDS has been a big motivator for that. As new RAPIDS functionality becomes available, it becomes a drop-in replacement along our pipeline. The result is that visual analytics users get to leverage RAPIDS, and broader GoAi frameworks, without writing code.

Our results around GoAi have been raising eyebrows all the way from operational analysts to bank executives. NVIDIA RAPIDS has been a key investment for us, especially in marching toward a multi-node, multi-GPU future, which is not discussed here. Hard tech startups have to be targeted in the bets they make, and Graphistry is excited to welcome RAPIDS into the GoAi community!

Read More

Graphistry in the Verizon DBIR

Posted by Leo Meyerovich on April 11, 2018

2018/4/11 10:00

Read More