It feels like eye-popping times for those deep into building the future of visual data experiences. With Looker exiting (to Google for $3B), Tableau exiting (to Salesforce for $16B), and, less publicly, Periscope and ZoomData exiting, the Graphistry team is experiencing good feelings and key reflections. One of them is that the $16B exits are just a prelude to the next $160B in opportunities.
When everything runs on GPUs, we can fundamentally shift the way we experience data analysis much like video moving to HD or shifting from black-and-white to color. What if you could load your full dataset, ask whole-table questions like what are the patterns, and get the answers… immediately? What if you could do that visually, replacing writing queries with simple infinite zoom and direct manipulations down to the level of individual data points? Core analytics areas like security, fraud, operations, and customer 360 are entering this sci-fi-level world of rapid hypothesis iteration.
Running analytics end-to-end on GPUs, all the way from the data warehouse to what’s on screen in your browser, is not easy. Graphistry first brought that experience to investigating event and graph data. Since before the RAPIDS team was even officially formed, we have been collaborating with them on how to get these techniques into the hands of all analysts. With the official project announcement of RAPIDS, we thought it would help to share our promising early experiences.
Enter Apache Arrow & GoAi
RAPIDS is one of NVIDIA’s biggest contributions to the GPU Open Analytics Initiative (GoAi), and is poised to become its computational backbone. (We previously overviewed GoAi for the web and visual analytics.) Big data framework developers are shifting to fast data: handling more data at millisecond levels. Similar to how many SQL analytics tasks moved to distributed Hadoop, and then Hadoop moved to in-memory Spark, we are seeing the rise of in-GPU GoAi. Contributors already include most GPU database developers (OmniSci, BlazingDB, FastData, …), visual analytics developers (Graphistry), and broader data ecosystem OSS companies like Conda.
To make the set of tools work together, GoAi members rallied around Apache Arrow. It is a file format and set of protocols that support in-memory typed dataframes with zero-copy data transfers between tasks and libraries. Clouds let you rent instances with multiple GPUs that have 16GB GPU RAM each, and NVIDIA DGX nodes already store 512GB+ in-GPU RAM. This unlocks running most tasks entirely in the GPU, and as streaming frameworks emerge, nearly everything is fair game.
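To make the zero-copy idea concrete, here is a minimal sketch using only the Python standard library. It is not Arrow itself: a typed `array` stands in for one Arrow column (an assumption for illustration), and a `memoryview` plays the role of a second tool reading the same buffer without copying it.

```python
from array import array

# A typed, contiguous column of 64-bit floats (a stand-in for one Arrow column).
prices = array("d", [10.5, 20.0, 7.25, 99.0, 3.5])

# A memoryview exposes the same buffer to another consumer without copying it.
view = memoryview(prices)

# Slicing a memoryview is also zero-copy: the window shares the parent's memory.
window = view[1:4]

# A write through the window is visible in the original column,
# which shows that both names refer to the same underlying bytes.
window[0] = 21.0

shared = prices[1] == 21.0
window_values = window.tolist()
```

Because the column is typed and contiguous, handing it to another task is a matter of sharing a pointer and a length rather than re-parsing or re-serializing rows, which is the property the Arrow format is built around.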
For a taste of what happens when you switch to streaming Arrow files between GPUs, the following videos show a before/after of the Graphistry 2.0 engine. The first video shows our original hand-written visual analytics engine: GPUs in the browser, GPUs in the data center, and optimized networking. This year, we rewrote our interop code into Arrow (forming the core of Apache Arrow[JS]). The result is our new visual analytics engine, which runs in any browser, takes much less code, handles about 5X more data, and runs visibly faster:
Graphistry 1.0 Engine
Graphistry 2.0 Engine
Apache Arrow unlocks and speeds up interoperability between analytics tools, and RAPIDS provides convenient GPU IO and compute layers. This can help all the way across the data pipeline: sending data from CPU Spark to GPU frameworks, converting untyped CSVs to typed Arrow, performing tabular operations like filtering, and, under the same family, supporting additional analytics areas like ML and graph. Enterprise-grade GPU analytics tools like Graphistry (visual analytics) and BlazingDB (warehouse interop) are incorporating it as part of a common core that is better than CPU alternatives but does not fundamentally differentiate between specific analytic tool categories. RAPIDS is part of the GoAi rising tide.
RAPIDS is still early, but the numbers already look great. As a few examples of core data tasks, on a single Titan V GPU with 12GB GPU RAM and 32GB CPU RAM, similar to a cloud device, we see significant speedups on loading data and on a simple cross-filtering task (filtering followed by histogramming). The result is that 100M-1B row datasets become interactive! The benchmark setup:
- Titan V single GPU, 12GB GPU RAM, 32GB CPU RAM machine
- Representative of a roughly $1.0/hr AWS p3.2xlarge spot (preemptible) instance
- IO: Load 100M rows (x 6 floats) or 1.5B rows (x 1 float) of data, as CSV and Arrow
- Compute: Cross-filter (filter + histogram)
- Compare CPU (Pandas) with GPU (PyGDF for filtering and Numba for histograms)
The early results are spectacular: 20-30s computations become subsecond, and 100M-1B row datasets become easy… and that is when bursting on just one GPU.
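For readers unfamiliar with the cross-filter task benchmarked above, the sketch below shows its shape: filter rows on one column, then histogram another column over the surviving rows. It is a plain-Python, CPU-only stand-in rather than the Pandas or PyGDF/Numba code we measured, and the column names, values, and bin width are made up for illustration.

```python
from collections import Counter

def cross_filter(filter_col, hist_col, lo, hi, bin_width):
    """Keep rows whose filter_col value is in [lo, hi), then histogram hist_col."""
    counts = Counter()
    for f, h in zip(filter_col, hist_col):
        if lo <= f < hi:
            # Bucket index is the floor of value / bin_width.
            counts[int(h // bin_width)] += 1
    return dict(sorted(counts.items()))

# Two hypothetical columns of the same table.
latency_ms = [5.0, 12.0, 48.0, 51.0, 7.5, 33.0]
bytes_sent = [100.0, 250.0, 900.0, 120.0, 260.0, 910.0]

# Filter on latency in [5, 50), then histogram bytes_sent into 250-wide bins.
histogram = cross_filter(latency_ms, bytes_sent, 5.0, 50.0, 250.0)
```

Every row is touched independently, which is why this kind of workload maps so well onto thousands of GPU threads once the columns already live in GPU memory.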
Graphistry + NVIDIA RAPIDS
Think of Graphistry as a UI for accessing RAPIDS tech without coding. Graphistry is the only full-stack GPU visual analytics platform, meaning we use GPUs all the way from your browser to the data center. Over the last year, we have been re-architecting the platform to use Arrow end-to-end in the pipeline and helping bring similar Arrow-based workflows to the web, and RAPIDS has been a big motivator for that. As new RAPIDS functionality becomes available, it becomes a drop-in replacement along our pipeline. The result is that visual analytics users get to leverage RAPIDS, and broader GoAi frameworks, without writing code.
Our results around GoAi have been raising eyebrows all the way from operational analysts to bank executives. NVIDIA RAPIDS has been a key investment for us, especially in terms of marching toward a multi-node, multi-GPU future, which we do not discuss here. Hard tech startups have to be targeted in the bets they make, and Graphistry is excited to welcome RAPIDS into the GoAi community!
Graph visualization has proven to be powerful for investigating almost any type of data, and most recently the team at Graphistry was able to help uncover a massive Ethereum heist on two of the world’s most popular DApps (decentralized applications).
AnChain.ai and Graphistry recently partnered to investigate the world’s first publicly identified BAPT (Blockchain Advanced Persistent Threat). The investigation identified the BAPT-F3D hacker group, which was responsible for stealing 12,948 ETH (~ $4 million) between July and August 2018 from various vulnerable smart contract DApps. As of today, BAPT-F3D is still actively attacking.
Fomo3D and the Airdrop Vulnerability
AnChain.ai, which specializes in security for the blockchain ecosystem, analyzed the wildly popular game “Fomo3D” (the #1 DApp in July 2018) and its copycat “Last Winner” (the #5 DApp in August 2018). These games are DApps based on Ethereum Solidity smart contracts and operate quite openly as Ponzi schemes or exit scams. At a high level, the game works as a lottery, with players buying keys that reset the timer for a round. Keys continue to get more expensive over time, and eventually, when the time runs out, the player who bought the last key wins the entire pot.
Additionally, the game included a side-betting opportunity: when players buy their keys, they have a percentage chance to win an “airdrop” and instantly win ETH from a growing side pot. The more a player gambles on their chance, the more they stand to win. And this airdrop function is where things got interesting. The airdrop function contained a vulnerability, which allowed coordinated attackers to steal the equivalent of more than $4 million USD across both games in just a few days.
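The key-and-timer mechanic described above can be sketched as a toy model. This is an illustration only, not the actual Solidity contract: the timer length, starting price, and price step are hypothetical, and the airdrop side pot and its vulnerability are deliberately left out.

```python
from dataclasses import dataclass

@dataclass
class Round:
    """Toy model of one round: all numbers here are hypothetical."""
    time_left: int = 3          # ticks until the round ends
    reset_to: int = 3           # timer value after each key purchase
    key_price: float = 1.0      # price of the next key
    price_step: float = 0.5     # how much each purchase raises the price
    pot: float = 0.0
    last_buyer: str = ""

    def buy_key(self, player: str) -> None:
        if self.time_left <= 0:
            raise RuntimeError("round is over")
        self.pot += self.key_price         # the purchase feeds the pot
        self.key_price += self.price_step  # keys get more expensive
        self.time_left = self.reset_to     # buying resets the timer
        self.last_buyer = player

    def tick(self) -> None:
        if self.time_left > 0:
            self.time_left -= 1

    def winner(self):
        # The last buyer wins the pot once the timer has run out.
        if self.time_left == 0 and self.last_buyer:
            return self.last_buyer
        return None

game = Round()
game.buy_key("alice")   # pays 1.0
game.tick()
game.buy_key("bob")     # pays 1.5 and resets the timer
for _ in range(3):
    game.tick()
```

Even in this toy form, the incentive is visible: every purchase grows the pot and postpones the payout, so the round only ends when nobody is willing to pay the rising key price.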
Finding the Industry’s First Blockchain APT
Combining Graphistry’s industry-leading GPU-powered investigation platform with the AnChain.ai Situational Awareness Platform (SAP), AnChain.ai gained a holistic view of the millions of events and over 30,000 addresses related to the games. As a result, the AnChain team was able to identify the first known Blockchain Advanced Persistent Threat (BAPT) in blockchain history, dubbed BAPT-F3D. Further bytecode artifact similarity analysis by SECBIT Labs confirmed that the 5+ addresses in this BAPT group are strongly correlated, as likewise seen in the visualization.
Figure: Center white node – main contract; intermediate money sinks seen on path to APT accounts identified by anomalous high-volume behavior. Paths with many edges (transactions) are either killchain or benign use that are visually separated by their operational behavior.
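As a rough illustration of what "anomalous high-volume behavior" can mean on a transaction graph, the sketch below totals inbound value per address from an edge list and flags addresses far above the typical amount. It is not the AnChain.ai method: the addresses, amounts, and the 10x-median threshold are all assumptions chosen for the example.

```python
from collections import defaultdict
from statistics import median

# Hypothetical transaction edges: (sender, receiver, amount in ETH).
edges = [
    ("contract", "sink_a", 40.0),
    ("contract", "sink_a", 55.0),
    ("contract", "user_1", 0.5),
    ("contract", "user_2", 0.7),
    ("contract", "user_3", 0.4),
    ("sink_a", "apt_1", 90.0),
    ("user_1", "user_2", 0.1),
]

def inbound_volume(edge_list):
    """Sum the value received by each address."""
    totals = defaultdict(float)
    for _, dst, amount in edge_list:
        totals[dst] += amount
    return dict(totals)

def flag_high_volume(totals, factor=10.0):
    """Return addresses receiving more than `factor` times the median volume."""
    typical = median(totals.values())
    return sorted(a for a, v in totals.items() if v > factor * typical)

volumes = inbound_volume(edges)
suspects = flag_high_volume(volumes)
```

In the visualization, the same signal shows up without any threshold tuning: the money sinks and the accounts behind them stand apart from ordinary players by the weight and shape of their paths.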
The AnChain.ai SAP was able to identify the following traits related to BAPT-F3D:
- Advanced: Leverages a massive scale of sophisticated attack contracts to exploit a vulnerability in the “airdrop” feature; anti-forensics capability that self-destructs the blockchain artifacts; coordinated crime.
- Persistent: Well planned, and operating continuously for weeks; constantly upgrading attack contracts from V1 to V3; moving from target to target.
- Threat: Financially motivated threat targeting specific smart contract DApps with similar vulnerabilities, stealing $4 million worth of ETH and counting.
Impacts and Conclusions
Using knowledge graphs, AnChain.ai was able to document a new type of threat facing DApp owners, exchanges, and the growing blockchain ecosystem. For Graphistry, the analysis proved to be very similar to our work in anti-fraud and money-laundering investigations, although with a very new and interesting twist. But most importantly, it shows the power of knowledge graphs and GPU-powered graph investigations to quickly expose the important connections and relationships across millions of pieces of data.
We think of this as the user interface for a world increasingly dependent on data, machine-learning, and AI. Analysts have similar needs whether investigating malware or phishing incidents, tracking the flow of illicit funds, fraud within a healthcare system, or hundreds of other data driven projects. Humans need to be able to see and understand what is in their data. They need AI and ML models to not be impenetrable black boxes. By bringing an interactive and investigative front end to these technologies, we hope to make them more accessible, usable, and ultimately deliver far more impactful analysis and applications.
Learning to Whitebox the SOC-in-a-Box
Even as organizations automate their security operations with orchestration and AI, some of the most important parts of security investigations continue to depend on human analysis and talent. These critical moments in the investigation remain frustratingly slow, and need categorically different technologies that are optimized for human-in-the-loop analysis.
A balanced security strategy requires us to augment and extend human skills and abilities for the many daily tasks that we cannot trust to bots. This is one of the key goals at Graphistry, and we described the fuzzy data aspect of the problem in our previous article, “Security in the Age of Maybe”. Orchestration and AI are important parts of modern security strategies, but we have to remember that analysts need to deal with them. This article digs into our experiences around the challenges and opportunities presented when orchestration and AI meet critical human-in-the-loop phases of an investigation.
Hurry Up and Wait
Security investigation workloads have outpaced the ability of organizations to hire analysts, so it is no surprise that teams are replacing people with programs for low-level and low-risk tasks. The interesting part, as in most things, is where automation stops short.
Security-critical workflows still often end in or depend on human-in-the-loop (HITL) analysis, and for good reason. Distinguishing real threats from false positives, understanding the true scope of an infection or intrusion, or pulling the thread to expose a hidden attacker are just a few examples where human analysis remains essential. The outcome of these investigations determines the real security of an organization, so tickets and projects remain a daily reality.
Unfortunately, these investigations often remain slow and laborious, and are where efficiency and insight can go to die. As soon as tools make the handoff to the human analyst, the process regresses by 15 to 20 years. We go from an automated process to an analyst squinting at dashboards and writing command-line style search queries. In order to make security operations run faster, we need to bring the same ethos of automation, orchestration, and intelligence to the messier, more complicated, iterative work of human-in-the-loop analysis. If we don’t, then much of the anticipated benefit of investing in those tools could be lost in a case of “hurry up and wait”. This means that the speed, visibility, and reliability we gained through automation could be lost at the moment it matters the most!
Augmenting Human Analysis
If we want to improve a human outcome, it makes sense that we design for and try to extend natural human skills. That is why Graphistry has made unprecedented investments into building best-in-class visual technology. Unlike programs, people understand information visually. Humans deal with enormous amounts of data and complexity every day when it is shown visually, and this is why we convert virtually any data into visual graphs. Using graphs, we literally see the connections and relationships between our events, entities, and metadata. That could be seeing the progression of an attack along the kill chain or it could be seeing the layers of obfuscation within a money laundering scheme. In either case, a picture instantly reveals what would be relatively impenetrable if analyzed in a table of data.
Analysts are also wrestling with new types of data that may not always be intuitive. Machine learning and AI have become central to all types of analysis. The problem for many analysts is that the algorithms driving these models are often a black box that the analyst simply has to take on faith. Graph visualization has the power to provide analysts with the human UI into machine learning insights. Instead of looking at a generic alert reporting anomalous behavior, an analyst can actually see clusters, outliers, and complex relationships in the data. Likewise, the graph provides a direct visual interface for easily driving these systems, such as steering machine learning towards different parts of the dataset, and triggering actions on identified regions.
Leveraging Scale Without Letting It Get in the Way
The team at Graphistry has created a variety of core GPU technologies, which let us unlock the needed flexibility to visually interact with large amounts of data. That includes simply seeing and understanding 100X+ more of our data in context. But since the final answer that we are looking for is often small, we also need to easily remove the noise and drill down or pivot to follow the intuitive flow of the investigation.
The goal is that we never want to limit the scope of an investigation because we can’t see all of the important data, but at the same time we need to make sure the data doesn’t get in the way of seeing what’s really important. This is frankly where most see the difference between having a pretty picture and having a truly interactive investigation. Analysts need the ability to pivot across data sources on the fly, view events in the context of a timeline, or view data in the context of the network. Being able to do this without changing screens or writing new queries is critical for making sure analysts can investigate intuitively, creatively, and actually leverage the skills that make human analysts so valuable.
Automating the Human Workflow
In the previous topic, we focused on improving our analysts’ vision: enabling them to see more information, see deeper into relationships, and adapt on the fly. To close the loop, we need to focus on the speed of the workflow and how we accelerate those insights. Just because a workflow involves a human doesn’t mean that we can’t speed it up by orders of magnitude. This is why Graphistry has pioneered the use of investigation templates and visual playbooks as a highly interactive investigation environment rather than rigid and hard-to-edit software.
First, a template allows an investigation to automatically begin with all the data that an analyst will need. With a trigger as simple as a single SIEM alert, Graphistry can automatically connect to and query any and all data sources to pull in the relevant context. This could be logs from other tools in the SIEM, NetFlow stored in a Spark cluster, and a variety of metadata from Bro logs in Elasticsearch. Without writing a single query, the analyst can right click on an incident, and all the necessary data is queried and prepared for analysis.
Crucially, that data is delivered through a highly interactive and visual workflow. Each step or pivot can have its own unique visualization setting tied to the needs of the analyst. Instead of being rigidly predefined, the analyst can tweak settings such as to look at a wider time range or find out more about a specific entity of interest, thus remaining fully interactive and explorable.
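To illustrate the template idea described above, here is a minimal sketch in which a single alert fans out to several data-source queries, and the analyst can widen the time range without rewriting anything. It is not Graphistry's implementation: `Alert`, `Template`, and the two connector functions are hypothetical names, and the connectors return canned records where real ones would query a SIEM, a Spark cluster, or Elasticsearch.

```python
from dataclasses import dataclass

@dataclass
class Alert:
    src_ip: str
    timestamp: int   # seconds since epoch

# Hypothetical connectors: each returns records for an IP within a time window.
def query_siem_logs(ip, start, end):
    return [{"source": "siem", "ip": ip, "window": (start, end)}]

def query_netflow(ip, start, end):
    return [{"source": "netflow", "ip": ip, "window": (start, end)}]

@dataclass
class Template:
    steps: list            # ordered list of connector functions
    window_s: int = 3600   # default look-back/look-ahead in seconds

    def run(self, alert, window_s=None):
        # The analyst may override the default window for this run only.
        w = self.window_s if window_s is None else window_s
        start, end = alert.timestamp - w, alert.timestamp + w
        results = []
        for step in self.steps:
            results.extend(step(alert.src_ip, start, end))
        return results

template = Template(steps=[query_siem_logs, query_netflow])
alert = Alert(src_ip="10.0.0.5", timestamp=10_000)

default_run = template.run(alert)               # one click on the alert
wider_run = template.run(alert, window_s=7200)  # analyst widens the time range
```

The design choice worth noting is that the template holds defaults rather than fixed values, so each run stays editable: the same steps can be re-fired with a different window or entity as the investigation evolves.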
Organizations face a similar challenge when bringing orchestration into human-in-the-loop scenarios. Scripts should not be a blackbox that only other scripts can use. The visual graph and templates solve the human side of orchestration: analysts can simply click-and-fire!
This is just the beginning of what Graphistry does, but it hopefully serves to illustrate the path forward for security organizations. Analysts are some of the most critical assets in the enterprise, and it doesn’t make sense to simply automate around them. They need to be in the process. This is what we call turning the blackbox into a whitebox. To do so, we need to give analysts tools that augment their skills and close the loop around automated workflows spanning data lakes, AI, and orchestration. At Graphistry, that is our mission.