Graphistry 2.27.8: RAPIDS 0.11, NodeXL imports, speedups, & hardening

Posted by Graphistry Staff on February 11, 2020

 

We recommend upgrading to Graphistry 2.27.8 to improve your existing flows and plug into more data sources such as NodeXL for social media analysis. Read on for an overview of this release. For a full breakdown and upgrade considerations, see the detailed release notes.

Binaries are now available for enterprise users, and JS/Python clients are available on GitHub/npm/pip. AWS Marketplace and Azure Marketplace may take 4 days to 2 weeks for 2.27 to become available.

 

Take performance to 11!

RAPIDS ecosystem upgrades bring us to 0.11: cuDF, cuGraph, cuML, and BlazingSQL. These upgrades improve performance and fix a variety of bugs, including some that surfaced to Graphistry users, and bring a variety of new features when you use the libraries directly. Graphistry's notebook environment ships with them already enabled: enjoy! See the individual announcements by Nvidia and BlazingSQL.
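Because cuDF closely mirrors the pandas API, code written against pandas typically ports with little more than an import swap. A minimal sketch, using pandas here as a CPU stand-in (in the GPU notebook environment you could instead `import cudf as pd`):

```python
import pandas as pd  # in the GPU notebook environment: import cudf as pd

# Build a tiny edge list and compute per-node out-degree,
# the same dataframe-style operation cuDF accelerates on GPU.
edges = pd.DataFrame({
    "src": ["a", "a", "b", "c"],
    "dst": ["b", "c", "c", "a"],
})
out_degree = edges.groupby("src").size()
print(out_degree.to_dict())  # {'a': 2, 'b': 1, 'c': 1}
```

The same edge list is the natural input for cuGraph algorithms and for Graphistry visualizations.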

The 0.11 upgrade includes Apache Arrow 0.15, which is not backwards compatible with earlier Arrow versions. This should be transparent to Graphistry end-users, especially on fresh uploads.
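If you maintain API clients outside the bundled environments, one way to sanity-check them before uploading is to compare the installed Arrow version against the 0.15 cutoff. The helper below is a hypothetical illustration, not part of the Graphistry API:

```python
def arrow_upload_compatible(pyarrow_version: str) -> bool:
    """Hypothetical client-side check: Arrow 0.15 changed the IPC wire
    format, so clients on earlier Arrow versions cannot interoperate
    with an Arrow 0.15+ server."""
    major, minor = (int(part) for part in pyarrow_version.split(".")[:2])
    return (major, minor) >= (0, 15)

print(arrow_upload_compatible("0.15.1"))  # True
print(arrow_upload_compatible("0.14.0"))  # False
```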

 


Social media network analysis with NodeXL imports

You can now easily level up your NodeXL social media analysis with Graphistry’s smart and GPU-accelerated visual graph analytics! This is a great way to explore data such as from Twitter.

Many users of NodeXL enjoy its ability to pull in data from social networks, but then wonder how to look at it all: Graphistry's visual scale and visual analytics solve that. You can now more easily get the best of both worlds! Graphistry 2.27 shares notebooks for taking NodeXL exports and viewing them in Graphistry. You can use these directly, or embed them as Python in your own apps. Colors, images, and links are loaded automatically!

 

To use the new feature, either publish your NodeXL analysis to NodeXL’s public Graph Gallery or export the XLS, and then put the path into the following snippet. Optionally, if you know the data source (e.g., ‘twitter’), include it; otherwise, omit that parameter:

 

import graphistry
# graphistry.register(...)  # connect to your Graphistry server first

graphistry.nodexl(
    'https://www.nodexlgraphgallery.org/Pages/Workbook.ashx?graphID=220124',
    'twitter').plot()

 

We encourage you to try it out. Reach out if you have further ideas or improvements! The notebook comes preloaded in Graphistry’s Jupyter environment, and you can see it online in the public GitHub demo folder.

Figure: NodeXL XLS file export loaded into Graphistry

 

Deploy big, small, and *just* right

We are increasingly seeing Graphistry used on resource-constrained developer instances (e.g., our recent AWS g4dn reduced-cost Marketplace offering) and on massive yet shared multi-GPU systems. New controls enable you to specify CPU/GPU replication factors across all Graphistry services via simple environment variables in your custom.env. Whether you have 1 GPU with 2GB GPU RAM, or 16 GPUs with 512GB GPU RAM and 5 GPUs already claimed by a GPU DB, you can now explicitly right-size Graphistry.

For example, if there is one GPU with 16GB RAM, and 8 CPU cores with 64 GB RAM, we may want 4 GPU workers (2 of each type) to slightly oversubscribe them (more concurrency), and 32 CPU workers for the same reason. Note that some services, such as the Caddy reverse proxy, remain as one Docker service with natively-determined resource use:

data/config/custom.env:

# Mixed GPU/CPU workers
STREAMGL_NUM_WORKERS=2
FORGE_NUM_WORKERS=2


# CPU workers
STREAMGL_CPU_NUM_WORKERS=16
PM2_MAX_WORKERS=16
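The sizing rule above (roughly two workers of each GPU type per 16GB of GPU RAM, and ~2x oversubscription of CPU cores) can be sketched as a small helper. The function and its ratios are illustrative assumptions for planning, not a Graphistry-supplied formula:

```python
def suggest_worker_counts(gpu_mem_gb: int, cpu_cores: int) -> dict:
    """Illustrative sizing heuristic (assumed, not official):
    ~2 workers of each GPU type per 16 GB of GPU RAM, and
    ~2x oversubscription of CPU cores across the two CPU pools."""
    gpu_per_type = max(1, (gpu_mem_gb // 16) * 2)
    cpu_per_pool = max(1, cpu_cores * 2)
    return {
        "STREAMGL_NUM_WORKERS": gpu_per_type,
        "FORGE_NUM_WORKERS": gpu_per_type,
        "STREAMGL_CPU_NUM_WORKERS": cpu_per_pool,
        "PM2_MAX_WORKERS": cpu_per_pool,
    }

# The example from the text: 1 GPU with 16 GB RAM, 8 CPU cores
print(suggest_worker_counts(gpu_mem_gb=16, cpu_cores=8))
```

For the 16GB/8-core machine above, this reproduces the custom.env values shown.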

Blue/Green updates from the same node!

We are starting to facilitate same-node migrations! We encourage most admins to follow a more traditional blue/green deployment strategy in order to minimize downtime: upgrade by creating new servers and simply switching DNS once validated. (For our cloud users, this also means an OS update!) However, new hardware is not always practical, so we are starting to introduce local blue/green migration scripts: they incur more downtime, but can run same-node. We recommend experimenting with them in a sandbox before using them in production. Local migrations look quite similar to remote ones:

/var/g_new $ FROM_PATH="/var/g_old" TO_PATH="/var/g_new" DOCKER_SUDO=sudo ./migrate-local.sh

 

For cloud users, we generally recommend live migrating to a fresh server, which is now much faster, resumable, and retryable:

/var/g_new/etc/scripts $
    sudo \
    FROM="ubuntu@old.site.com" FROM_PATH="/var/graphistry" \
    TO_PATH="/var/graphistry" \
    ./migrate.sh

 

2.0 Speedups & Fixes

We’re continuing to speed up and fix issues in the 2.0 RAPIDS-based engine.

  • Performance: Many large datasets will load ~2x faster, and we have several more (big) tricks up our sleeves.
  • Data brushing and time-value fixes improve several components
  • When loading visualizations from the pivot tool, we fixed a race condition that could prevent visual styles from loading

For the full list, see the main release notes.

Looking Ahead into March & April

  • Cloud users & developers: We have some major announcements coming throughout February, March, and April. Especially if you’re an analyst or a developer, stay tuned!
  • Admins: We are testing improved performance controls and monitoring approaches, and are happy to share previews
  • Connectors! We are especially interested in engaging with Azure Sentinel and Kusto teams. Let us know about those, or others you care about!
  • GraphThePlanet II will be on February 27th (RSA week in San Francisco). The event is sold out, but we may be able to help; please contact your closest Graphistry rep for info. We look forward to catching up with everyone, meeting new folks, and spreading the graph love!
  • Our founder Leo launched The Data Ride Alongs! The first big episode was on February 4th with Stanford Medicine’s Julie Wu and Nvidia’s Corey Nolet on exploring cancer gene and protein mutations across thousands of patients using GPUs, graphs, and UMAP for accelerated high-dimensional visual analytics. Tune in for future ones, and snag the open source code after!

 

Get started now!