Graphistry 2.40.53: Introducing GFQL – The first graph dataframe query language

Posted by Graphistry Staff on December 13, 2023

Get started now! Try on a CSV

The latest Graphistry release introduces GFQL, the first graph dataframe query language. The release also adds infrastructure features such as initial OpenTelemetry support. We strongly recommend the upgrade for all Graphistry API users, as it also includes fixes for load-time handling of predefined encodings. It is available for on-prem enterprise users, is live on Graphistry Hub, and is being prepared for cloud marketplaces.

Key release features include:

  • PyGraphistry GFQL: GraphFrame query language: Run dataframe-native multi-hop graph search queries without a database or separate compute system; just “pip install graphistry”. Check out the graph search tutorial notebook (ipynb, also runnable live on Binder). We will also be posting a longer article about it.
  • OpenTelemetry: One by one, the services are gaining OpenTelemetry logging, metrics, and tracing support
  • Admin tools: The CLI now supports examining account resource utilization and transferring data ownership between accounts
  • Bugfix: Encodings: Following up on an earlier Google Chrome upgrade that broke Graphistry, color/icon/size encodings defined by API users now take effect more reliably at page load.

See the main release notes for additional features and fixes.

Hello GFQL: The first graph dataframe query language

GFQL is the codename for our new kind of dataframe-native graph query language. We are releasing GFQL under the BSD-3 open source license because it is friendly for commercial use. GFQL brings the expressive power of the popular Cypher family of property graph query languages to working with dataframes. The initial single-threaded version handles many-hop computations in interactive time on graphs with millions of edges. Unlike typical graph query languages, GFQL runs in its embedding Python process, and as we will discuss in future articles, enables hardware acceleration.

Our community started using GFQL for the same reason we built it: GFQL runs embedded, so we can easily use it from our application & compute tiers. Like Spark GraphFrames, GFQL offers easy native interop and optimization with dataframe code, but GFQL adds property graph search and requires no external infrastructure. For example, GFQL can run within a Jupyter notebook, a Python web app, or a Spark/Dask task, with no new database or compute engine to manage.

Consider the following typical scenario of running graph queries within an interactive dashboard. Just as Python Pandas is a convenient library for tabular in-memory wrangling without needing to go back to a database at every step, GFQL simplifies in-process wrangling of graph data. We’re already using GFQL when working with CSVs, Parquet, Pandas, cuDF, SQL, and graph databases. Here we use GFQL to cleanly express a property graph query over dataframes that finds all the senators strongly connected to both Sen. Chuck Schumer and Rep. Nancy Pelosi:

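import graphistry, pandas as pd
from graphistry import n, e_forward, e_reverse

# Bind the node and edge dataframes: 'id' is the node key, 's'/'d' are the edge endpoints
g1 = (graphistry
  .nodes(pd.read_csv('senators.csv'), 'id')
  .edges(pd.read_csv('relns.csv'), 's', 'd'))

# Schumer -> senator <- Pelosi, keeping only edges with weight > 0.5
g2 = g1.chain([
    n({"id": "Schumer"}),
    e_forward(edge_query="weight > 0.5"),
    n({"type": "Senator"}),
    e_reverse(edge_query="weight > 0.5"),
    n({"id": "Pelosi"})
])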
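To build intuition for what a single step of the chain above does, one way to picture a forward hop in plain dataframe terms is a filtered lookup on the edge table. The following is only an illustrative sketch and not GFQL's implementation: the helper `hop_forward`, the toy edges, and the people `A` and `B` are all hypothetical, and the column names `s`, `d`, and `weight` are assumed to match the snippet above.

```python
import pandas as pd

def hop_forward(frontier_ids, edges_df, edge_query=None, src='s', dst='d'):
    """Hypothetical helper: one forward hop from a set of node ids.

    Returns the edges that were traversed and the set of node ids reached.
    """
    # Apply the edge predicate first, mirroring the role of edge_query above
    edges = edges_df.query(edge_query) if edge_query else edges_df
    # Keep only edges that start at a node in the current frontier
    hit = edges[edges[src].isin(frontier_ids)]
    return hit, set(hit[dst])

# Made-up toy edges for illustration only
edges_df = pd.DataFrame({
    's': ['Schumer', 'Schumer', 'Pelosi', 'Pelosi'],
    'd': ['A', 'B', 'A', 'B'],
    'weight': [0.9, 0.3, 0.8, 0.9],
})

hit, reached = hop_forward({'Schumer'}, edges_df, edge_query='weight > 0.5')
print(reached)
```

On this toy data only the Schumer-to-A edge passes the weight filter, so the hop reaches just `A`. A reverse hop can be pictured the same way by swapping the `src` and `dst` arguments.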

GFQL already covers a large subset of OpenCypher, which can be seen by comparing the above query with an OpenCypher version:

MATCH
  (n1 {id: "Schumer"})
  -[e1]->
  (n2 {type: "Senator"})
  <-[e2]-
  (n3 {id: "Pelosi"})
WHERE
  e1.weight > 0.5 AND e2.weight > 0.5
RETURN n1, e1, n2, e2, n3

Without GFQL, we see users either round-tripping this query through a graph database or translating it into equivalent tabular code, such as with raw Python Pandas dataframes:

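# Assumes people_df has columns person_id, type
# and relationships_df has columns start_person_id, end_person_id, weight

# n1 -[e1]->: attach outgoing edges to each starting person
merge1 = pd.merge(
  people_df, relationships_df,
  left_on='person_id', right_on='start_person_id')
# -> n2: attach the person at the end of e1 (n1 columns get _x, n2 columns get _y)
merge2 = pd.merge(
  merge1, people_df,
  left_on='end_person_id', right_on='person_id')
# n2 <-[e2]-: attach edges pointing into n2 (e1 columns get _x, e2 columns get _y)
merge3 = pd.merge(
  merge2, relationships_df,
  left_on='person_id_y', right_on='end_person_id')
# <- n3: attach the person at the start of e2 (n3 keeps the unsuffixed columns)
final_merge = pd.merge(
  merge3, people_df,
  left_on='start_person_id_y', right_on='person_id')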

result = final_merge[
    (final_merge['person_id_x'] == 'Schumer') &
    (final_merge['type_y'] == 'Senator') &
    (final_merge['person_id'] == 'Pelosi') &
    (final_merge['weight_x'] > 0.5) &
    (final_merge['weight_y'] > 0.5)
]
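As a sanity check, the merge-based query above can be run end to end on a tiny in-memory dataset. The sketch below wraps the same merges and filter in a hypothetical helper, `two_hop_merge`, and feeds it made-up toy data (the people `A` and `B`, their types, and all weights are invented for illustration). It assumes the column layout implied by the snippet: `person_id` and `type` for people, and `start_person_id`, `end_person_id`, and `weight` for relationships.

```python
import pandas as pd

def two_hop_merge(people_df, relationships_df):
    """Hypothetical wrapper around the merge chain shown above."""
    merge1 = pd.merge(people_df, relationships_df,
                      left_on='person_id', right_on='start_person_id')
    merge2 = pd.merge(merge1, people_df,
                      left_on='end_person_id', right_on='person_id')
    merge3 = pd.merge(merge2, relationships_df,
                      left_on='person_id_y', right_on='end_person_id')
    final_merge = pd.merge(merge3, people_df,
                           left_on='start_person_id_y', right_on='person_id')
    # Same filter as above: Schumer -> senator <- Pelosi, both edges strong
    return final_merge[
        (final_merge['person_id_x'] == 'Schumer') &
        (final_merge['type_y'] == 'Senator') &
        (final_merge['person_id'] == 'Pelosi') &
        (final_merge['weight_x'] > 0.5) &
        (final_merge['weight_y'] > 0.5)
    ]

# Made-up toy data for illustration only
people_df = pd.DataFrame({
    'person_id': ['Schumer', 'Pelosi', 'A', 'B'],
    'type': ['Senator', 'Representative', 'Senator', 'Senator'],
})
relationships_df = pd.DataFrame({
    'start_person_id': ['Schumer', 'Pelosi', 'Schumer', 'Pelosi'],
    'end_person_id': ['A', 'A', 'B', 'B'],
    'weight': [0.9, 0.8, 0.9, 0.2],
})

result = two_hop_merge(people_df, relationships_df)
print(result['person_id_y'].tolist())  # middle nodes of the matched paths
```

On this toy data the only match is the path through `A`; `B` is dropped because its edge from Pelosi has weight 0.2. Note how the suffixed column names (`person_id_x`, `type_y`, `weight_y`) must be tracked by hand across four joins, which is the bookkeeping the GFQL chain avoids.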

A big benefit of GFQL is that it is already integrated into PyGraphistry. As a result, GFQL works with a variety of accelerated Python dataframe, graph computing, and data science libraries. For example, we can enrich the subgraph with algorithmic findings and plot it:

(g2
  .compute_igraph('pagerank')
  .encode_point_color('pagerank', ['blue', 'yellow', 'red'], as_continuous=True)
).plot()

We will be writing more about GFQL in the coming days. Of note, we designed GFQL for GPU acceleration. If this is relevant to your team, please reach out.

To start, you can

pip install graphistry

and check out the tutorial. You can also run it live on Binder; just remember to `pip install graphistry igraph` first. For the optional plot calls, register using the API token from your free Graphistry Hub account.

Unreported updates

Some features from other recent releases that we have not reported include:

  • Health check tool: From their profile menu, users can open a health check tool that reports on system status, such as WebGL support and internet connection quality
  • Plotly: Notebooks now ship with Plotly charts
  • SSO: Support for more SSO environments, plus a variety of bugfixes
  • Docs: Improvements around graph algorithms, graph APIs, and other areas

Learn more in our public release notes.
