Graphistry 2.30.26: Graph gallery, edge weighting, a faster release schedule, and more

Posted by Graphistry Staff on July 12, 2020

Figure: Clustering animation when neighbor edges weigh more than non-neighbors

 

Graphistry v2.30.26 is a fast and worthwhile follow-up to our big v2.29 introduction of the 2.0 API for quickly uploading big graphs (Apache Arrow + RAPIDS!) and the 2.30.11 launch of Graphistry Hub for free, open visual investigations on managed Graphistry GPU instances. Every release continues to push the edge of what is possible with visual graph analysis!

Our latest release introduces several big things:

  • 1. Weighted edges for more control of graph clustering
  • 2. A graph gallery of your 2.0 uploads
  • 3. Accelerated release schedule
  • 4. Details: Improved REST docs, 5X+ faster migrations, 2.0 API bugfixes

As a reminder, check out RAPIDS Academy (LearnRAPIDS.com) to register for some of our free upcoming instructor-led trainings, including an intro to graph and security GPU analytics, and an intro to multi-GPU Python computing that was recently used to win the TPCx-BB industry benchmarks.

For more release information, such as on migrations, see our official v2.30.26 release notes.

Get started now!

 

1. Edge weights for influencing clustering

With the 2.0 API, you can now use edge data to influence clustering. Weighted edges are useful in many clustering scenarios where one edge should “count” more than others. Some examples:

  • Data model: Some edges may inherently count more than others, such as friend vs. acquaintance relationship edges in a social network, and might have an attribute like strength
  • Aggregates and multiedges: When an edge between two nodes represents many edges, such as when many transactions are summarized as one, there may be an edge count
  • Algorithmic scores: In algorithmically generated networks, such as in dimensionality reductions and nearest neighbor rankings, there are often edge scores such as distance
  • Annotations: Some edges may be added for purely presentational purposes, so setting their weight to 0 prevents them from influencing clustering

To use weighted edges, bind a float32 edge column using the edge_weight binding. The values are automatically normalized between 0 and 1. Optionally, also set the global scaling factor (URL parameter edgeInfluence) or tweak it live in the layout settings panel:

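import graphistry

# edges: a DataFrame with columns 's', 'd', and a float32 column '1_if_neighbor'
graphistry\
    .edges(edges).bind(source='s', destination='d')\
    .bind(edge_weight='1_if_neighbor')\
    .settings(url_params={
        'play': 5000,          # how long to run the live layout (milliseconds)
        'edgeInfluence': 1.5,  # global scaling factor for edge weights
        'pointSize': 0.3})\
    .plot()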

For more information, try the edge weight notebook tutorial.
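As a rough illustration of the "aggregates and multiedges" case above, the sketch below collapses repeated transactions into one edge per node pair with a count, then previews how such counts might look once rescaled to the 0 to 1 range. The transaction data, the helper names summarize_edges and min_max_scale, and the choice of min-max scaling are all assumptions for illustration; the release only states that values are normalized between 0 and 1, not how.

```python
from collections import Counter

# Hypothetical raw transaction log: one (source, destination) pair per event.
transactions = [
    ("alice", "bob"), ("alice", "bob"), ("alice", "bob"),
    ("alice", "carol"),
    ("bob", "carol"), ("bob", "carol"),
]

def summarize_edges(pairs):
    """Collapse repeated (source, destination) pairs into one edge with a count."""
    counts = Counter(pairs)
    return [{"s": s, "d": d, "count": float(n)} for (s, d), n in counts.items()]

def min_max_scale(values):
    """Rescale values to [0, 1] (assumed scheme); a constant column maps to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

edges = summarize_edges(transactions)
scaled = min_max_scale([e["count"] for e in edges])

# Print each summarized edge next to its rescaled preview value.
for edge, weight in zip(edges, scaled):
    print(edge["s"], edge["d"], edge["count"], weight)
```

In practice, the summarized rows would be loaded into a DataFrame, the count column cast to float32, and that column bound with edge_weight='count'. Since the normalization happens automatically, the rescaling step here is only a preview and is not needed before uploading.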

 

 

Figure: New graph gallery for managing uploads

2. Graph gallery for managing data

You can now explore and manage your uploads! In the gallery page, you can browse your past uploads, edit their names and descriptions, and delete graphs individually or in bulk.

Developers can also programmatically manage data via the new 2.0 REST API. For example, PyGraphistry users can set the dataset name and description as part of their code:

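import pandas as pd
import graphistry

# Name and describe the dataset so it is easy to find in the gallery
graphistry\
  .edges(pd.read_csv('edges.csv'))\
  .bind(source='user', destination='ip')\
  .name('Visitors').description('Map user to IP')\
  .plot()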

3. Accelerated release schedule

With the launch of Graphistry Hub and our new internal cloud infrastructure, we have accelerated our release schedule. The new schedule is roughly:

  • Graphistry Hub: Daily
  • Self-hosted releases (minor): Every 2-3 weeks
  • Platform upgrades (major): Every 6 weeks, synchronized with the official RAPIDS GPU community release schedule (cuDF, BlazingSQL, …)

As a reminder, all self-hosted users get support for their current version as well as assistance in upgrading to the latest. Just reach out for pointers or to schedule an appointment.

The result is that self-hosted users get a predictable experience that they control, including the ability to stay on the leading edge, and we can work with all users on new features via the Graphistry-managed Hub instances. Best of all, our steady improvements mean more features for everyone, with more reliability.

Getting to this point has been a feat of engineering both by Graphistry and the broader end-to-end GPU computing community. Look forward to an article on what this looks like underneath!

4. Details: They matter.

Thank you to SK, Jan, and other users who shared bug reports, especially on the new 2.0 API. We fixed a bunch, including:

  • Improved handling of reserved column names such as src and dst
  • Improved handling of nodes not in the edge table, and vice versa
  • Filter generation over dates

A variety of smaller features and documentation improvements are now also available. Our internal favorite is a 5X+ speedup in data migrations across versions! See the full release document for more information.