Explore relationships in tricky CSVs with the no-code file uploader, now with multi-column mode via hypergraphs!

Posted by Graphistry Staff on May 11, 2021

Enabling quick no-code and low-code investigations of relationships in data is priority #1 for Graphistry this year. Our recent no-code file uploader first solved a simple top scenario: when a table has 2 columns to link, like src_ip and dest_ip, or say user_id and page_id, explore it as a graph. Within a few clicks, you could explore the generated visualizations with all the features of Graphistry’s core platform: rich interactive visuals, no-code controls, GPU acceleration, web-friendly sharing, and more.

Get started now! Try on a CSV

Most data tables have more than 2 columns, which is a challenge for many graph and relationship analysis tools, so it is important to make it easy to explore 3 or more columns. For example, it should be easy to simultaneously explore relationships across columns like src_ip, dest_ip, user_id, alert, and country. Before, people had to generate multiple 2-column graphs or turn to a coding solution like our PyGraphistry Python API’s hypergraph transform (now with multi-GPU support!). The latest update to Graphistry’s file uploader brings hypergraph capabilities to the UI. Hypergraphs make many common cases a lot easier by automatically turning multi-column tables into graphs!

Any CSV/XLS should work. For this article, we’ll revisit the case of a device activity log from a security honeypot (credit: Mike Sconzo’s SecRepo.com).

1. Drag-and-drop your data… and that’s it!

After you have a Graphistry account (self-host or free for non-sensitive data on Graphistry Hub), head to the new file uploader.

Drag-and-drop your file. Almost by magic, that is enough to start visually exploring the relationships in your data directly as a graph! Scroll down to the bottom of the page: Graphistry waits a few seconds to make sure you are done, and then shows a visualization you can start exploring. (If using a multi-sheet XLS, Graphistry defaults to the first sheet, which you can switch later.)

In this case, we quickly see 2 IP addresses are at the heart of the honeypot activity. Furthermore, we see both have frequently recurring vulnerabilities near them: MaxDB and NetAPI. Savvy readers may recognize those 2 IP addresses, “172.*.*.*”, as internal devices, so those are our honeypots luring in hackers. With almost no work, we instantly see the top entities, patterns, and outliers across multiple data columns.

Figure: Drag-and-drop a CSV to automatically generate a graph visualization

Under the hood, Graphistry’s hypergraph transform enables our UI to skip any initial manual configuration. Hypergraphs are a fancy way of saying a table row links multiple entities (columns). For example, if there are two IP addresses mentioned on a table row, they turn into nodes and get linked together. Likewise, if another row mentions one of the IPs again, this time with some alert name, the alert gets extracted as an entity too and gets linked to the existing node for the IP. Hypergraphs automatically turn data tables into knowledge graphs in an intuitive and visual way.

Instead of simply linking all the table cells together, which can be overwhelming, Graphistry is a bit smarter. It will pick the first table in your dataset and inspect the columns to guess which ones to use, and how to link them together. For example, columns with words like “name”, “ID”, and “alert” are likely to be entities. Even more subtle, columns like “src_abc” and “to_xyz” are also likely referring to entities worth showing and linking. Upon inferring this configuration, you receive an interactive visualization preview at the bottom of the screen, and can use the subsequent UI controls to modify the auto-filled settings.
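To make the name-based guessing concrete, here is a minimal sketch of that kind of heuristic. It is not Graphistry’s actual inference logic: the function name `guess_entity_columns`, the hint lists, and the sample column names are all assumptions made for illustration, seeded only with the examples mentioned above.

```python
import re

# Hypothetical hint lists, seeded from the examples in this article.
# The uploader's real rules are not documented here.
ENTITY_WORDS = ("name", "id", "alert")
ENDPOINT_PREFIXES = ("src", "dest", "from", "to")

def guess_entity_columns(columns):
    """Return the columns whose names suggest they hold entities."""
    picked = []
    for col in columns:
        # Split names like "src_abc" or "user id" into lowercase tokens.
        tokens = [t for t in re.split(r"[^a-z0-9]+", col.lower()) if t]
        if not tokens:
            continue
        has_entity_word = any(t in ENTITY_WORDS for t in tokens)
        # A prefix such as "src_" or "to_" followed by something else.
        has_endpoint_prefix = tokens[0] in ENDPOINT_PREFIXES and len(tokens) > 1
        if has_entity_word or has_endpoint_prefix:
            picked.append(col)
    return picked

sample_columns = ["src_abc", "to_xyz", "alert", "user_id", "first_seen", "bytes"]
print(guess_entity_columns(sample_columns))
```

In this sketch, measurement-style columns such as `first_seen` and `bytes` are left out, which mirrors the idea of not linking every table cell.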

One simple example of a table =hypergraph=> graph transform is the case of hacker 128.0.0.1 scanning honeypots 172.0.0.2 and 172.0.0.3 for vulnerability vuln123:

Event table:


| src_ip | dest_ip | vuln |
| 128.0.0.1 | 172.0.0.2 | vuln123 |
| 128.0.0.1 | 172.0.0.3 | vuln123 |

=>

Inferred entities (hypergraph nodes):


| node | category | type |
| 128.0.0.1 | n | src_ip |
| 172.0.0.2 | n | dest_ip |
| 172.0.0.3 | n | dest_ip |
| vuln123 | n | vuln |

Inferred relationships (hypergraph edges):


| source | destination | src_ip | dest_ip | vuln |
| n:128.0.0.1 | n:172.0.0.2 | 128.0.0.1 | 172.0.0.2 | vuln 123 |
| n:128.0.0.1 | n:vuln123 | 128.0.0.1 | 172.0.0.2 | vuln 123 |
| n:172.0.0.2 | n:vuln123 | 128.0.0.1 | 172.0.0.2 | vuln 123 |
| n:128.0.0.1 | n:172.0.0.3 | 128.0.0.1 | 172.0.0.3 | vuln 123 |
| n:128.0.0.1 | n:vuln123 | 128.0.0.1 | 172.0.0.3 | vuln 123 |
| n:172.0.0.3 | n:vuln123 | 128.0.0.1 | 172.0.0.3 | vuln 123 |
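
The transform above can be sketched in a few lines of plain Python. This is an illustrative reimplementation rather than Graphistry’s code: the helper name `direct_hypergraph` is an assumption, and it simply links every pair of entity columns within each row, merging all values under the single category `n` as in the tables.

```python
from itertools import combinations

def direct_hypergraph(rows, entity_cols, category="n"):
    """Link every pair of entity columns within each row (illustrative sketch)."""
    nodes = {}  # node id -> node record; a dict merges repeated values
    edges = []
    for row in rows:
        for src_col, dst_col in combinations(entity_cols, 2):
            for col in (src_col, dst_col):
                node_id = f"{category}:{row[col]}"
                # "type" records the first column in which the value appeared.
                nodes.setdefault(
                    node_id,
                    {"node": row[col], "category": category, "type": col},
                )
            edges.append({
                "source": f"{category}:{row[src_col]}",
                "destination": f"{category}:{row[dst_col]}",
                **row,  # keep the original row values as edge attributes
            })
    return list(nodes.values()), edges

events = [
    {"src_ip": "128.0.0.1", "dest_ip": "172.0.0.2", "vuln": "vuln123"},
    {"src_ip": "128.0.0.1", "dest_ip": "172.0.0.3", "vuln": "vuln123"},
]
nodes, edges = direct_hypergraph(events, ["src_ip", "dest_ip", "vuln"])
for edge in edges:
    print(edge["source"], "->", edge["destination"])
```

Two input rows yield four merged nodes and six edges, three per row, because each row contributes one edge for every pair among its three entity columns.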

2. Inspect & clean

We can immediately start using the visualization, both to gain insights and, often, to make data wrangling fast, smart, and easy. Graphistry’s UI will automatically highlight the top entities, relationships, and outliers, and we can use the point-and-click controls for actions like filtering, coloring, clustering, and inspecting.

We may want to tune the graph’s shape. We can control the data’s shape in-tool, but when sharing with others, it’s better to perform many of the actions in the file uploader by tweaking the inferred settings.

Data preparation may be needed for issues like invalid file formats or missing data. For that, you can inspect the data tables Graphistry found in your upload(s) and fix them in your favorite tool like Excel or Jupyter.

The honeypot logs look fine so we can continue: 220 rows, and the IP address columns are strings (object) and the first/last seen columns are timestamps (datetime64[s]).

Reshape your visualization

We can do some explorations in-tool, such as coloring by time, to see patterns in when entity interactions first started. In this case, we see waves that are earlier, later, and on-going:

Likewise, we may want an additional entity type (column) beyond those that were automatically picked by default. In this case, we want to see how the ports used are correlated. We quickly see one honeypot is uniquely active on port 445, and secondarily on port 139:

We can also specify edges. In this case, we can make each row have the src and dest IPs point to the ports and vulnerabilities:
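
As a rough sketch of what such an edge specification means, the snippet below maps each source column to the columns it should point at. The column names (`src_ip`, `dest_ip`, `port`, `vuln`), the sample row, and the helper `edges_from_spec` are assumptions for illustration, not the uploader’s internals.

```python
def edges_from_spec(rows, edge_spec):
    """Create one edge per (source column, destination column) pair per row."""
    edges = []
    for row in rows:
        for src_col, dst_cols in edge_spec.items():
            for dst_col in dst_cols:
                edges.append({
                    # Prefix ids with the column so a port and an IP never collide.
                    "source": f"{src_col}::{row[src_col]}",
                    "destination": f"{dst_col}::{row[dst_col]}",
                    "edge_type": f"{src_col}::{dst_col}",
                })
    return edges

# Hypothetical honeypot-style row.
log_rows = [
    {"src_ip": "128.0.0.1", "dest_ip": "172.0.0.2", "port": "445", "vuln": "vuln123"},
]
# Both IP columns point at the port and the vulnerability.
edge_spec = {
    "src_ip": ["port", "vuln"],
    "dest_ip": ["port", "vuln"],
}
spec_edges = edges_from_spec(log_rows, edge_spec)
for edge in spec_edges:
    print(edge["source"], "->", edge["destination"])
```

Compared with linking every pair of columns, this spec yields four edges per row and leaves out direct IP-to-IP and port-to-vulnerability links.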

You can fine-tune much more, such as which columns should merge their repeated values into shared nodes. (By default, values are merged across all columns, though here we’d likely want a single IP category for src/dst, with separate Port and Vulnerability categories.) Upcoming articles will share more, so stay tuned.
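
The effect of merge categories can be illustrated with a small sketch. The helper `node_ids`, the column names, and the sample rows are hypothetical; the idea is only that columns sharing a category share nodes, while columns in different categories stay separate.

```python
def node_ids(rows, entity_cols, categories=None):
    """Build node ids as "<category>::<value>" (illustrative sketch)."""
    # Invert {category: [columns]} into {column: category}.
    col_to_cat = {}
    for cat, cols in (categories or {}).items():
        for col in cols:
            col_to_cat[col] = cat
    ids = set()
    for row in rows:
        for col in entity_cols:
            # In this sketch, columns without a category only merge with themselves.
            cat = col_to_cat.get(col, col)
            ids.add(f"{cat}::{row[col]}")
    return sorted(ids)

# Hypothetical rows in which 172.0.0.2 appears as both destination and source.
traffic = [
    {"src_ip": "128.0.0.1", "dest_ip": "172.0.0.2", "port": "445", "vuln": "vuln123"},
    {"src_ip": "172.0.0.2", "dest_ip": "128.0.0.1", "port": "139", "vuln": "vuln123"},
]
cols = ["src_ip", "dest_ip", "port", "vuln"]

per_column = node_ids(traffic, cols)
with_ip_category = node_ids(traffic, cols, {"ip": ["src_ip", "dest_ip"]})
print(len(per_column), "nodes without merging across columns")
print(len(with_ip_category), "nodes with a shared ip category")
```

With a shared `ip` category, each address becomes a single node regardless of whether it appeared as a source or a destination, while ports and vulnerabilities keep their own namespaces.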

Happy graphing, and swing by our Slack channel for ideas and help.
