Accelerating an analyst’s “time-to-graph” for investigating the relationships in their data has always been a top benchmark at Graphistry. We are now bringing the accelerated graph experience to the full team with graph-app-kit, our open-source graph framework built around Streamlit for low-code dashboarding, which now ships by default with Graphistry installations. In this tutorial, we will walk through how team members can use, create, and share Streamlit dashboards.
Architecture: Low-code Jupyter authoring for interactive Streamlit graph dashboard apps
A Graphistry account is the only thing required to use graph dashboarding with Streamlit. To save time, self-hosted Graphistry servers, which can be quick-launched on AWS & Azure, come with graph-app-kit (Streamlit) and Jupyter notebooks built in and preintegrated for shared editing. Alternatively, you can manually self-host graph-app-kit from our source tutorial, or use a streamlit.io account and plug in your Graphistry account.
Part 1: Make a new dashboard view
First, we will make a new dashboard based on an existing dashboard, change some settings, and then inspect the result. Dashboards are in a folder called “views/”, so we also call them views as a naming convention.
To copy your first view, go to the Jupyter notebooks file browser (located on the left-side panel) and:
Step 1. Copy (ctrl-c / command-c) into your computer’s clipboard the contents of /graph-app-kit-private/views/demo_04_simple/__init__.py
Step 2. Go to folder /graph-app-kit-private/views
Step 3. Create folder demo_testing/
Step 4. Inside demo_testing/, create a .py file using the option “Python File” and name it __init__.py
Note: Be sure not to select “Python 3 (ipykernel)” or “Python 3.8 (RAPIDS)”, which create .ipynb-format notebook files
Step 5. Paste in the Python code from your clipboard
Video 1: Using Jupyter to create a new private view folder and __init__.py copying contents from /graph-app-kit-private/views/demo_04_simple/__init__.py
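If you prefer to script the copy instead of using the file browser, the steps above can be sketched in a few lines of standard-library Python. This is a minimal sketch, not part of graph-app-kit: the helper name copy_view is hypothetical, and the example call assumes the folder layout described in Steps 1 to 3.

```python
from pathlib import Path
import shutil


def copy_view(views_dir, source_view, new_view):
    """Copy source_view/__init__.py into a new view folder; return the new file path.

    Hypothetical helper for illustration; not part of graph-app-kit.
    """
    views = Path(views_dir)
    source_file = views / source_view / "__init__.py"
    if not source_file.is_file():
        raise FileNotFoundError(f"No view file at {source_file}")

    target_dir = views / new_view
    # Fail loudly rather than overwrite an existing view folder
    target_dir.mkdir(parents=False, exist_ok=False)

    target_file = target_dir / "__init__.py"
    shutil.copyfile(source_file, target_file)
    return target_file


# Example call, using the paths from the steps above:
# copy_view("/graph-app-kit-private/views", "demo_04_simple", "demo_testing")
```

The new file is a byte-for-byte copy, so it still carries the original view's ID and name until you edit them.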
Next, we will customize how the new dashboard appears in the dashboard view selector and double-check that our changes show up live. Each view controls how it appears by defining the method def info(): in its __init__.py, which you can customize.
Step 6. In Jupyter, find the def info(): at the top of your new /graph-app-kit-private/views/demo_testing/__init__.py
Step 7. Change field ‘id‘ by editing variable app_id to be a new, unique url-friendly value identifying the view, like “demo_testing”
Step 8. Change field ‘name‘ to a more user-friendly value you’d like to be shown in the dashboard picker, like “Demo testing”
Video 2: Configuring a view to appear in the dashboard picker and using the live reload feature to see the results of your changes
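After Steps 6 to 8, the top of the new view might look roughly like the sketch below. Only the app_id variable and the ‘id‘ and ‘name‘ fields come from the steps above; the view you copied may define additional fields in info(), which you can leave untouched.

```python
# Unique, URL-friendly identifier for this view (Step 7)
app_id = "demo_testing"


def info():
    # Metadata read by the dashboard picker; a copied view may
    # contain more fields than the two shown in this sketch.
    return {
        "id": app_id,
        "name": "Demo testing",  # label shown in the dashboard picker (Step 8)
    }
```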
To see the results of your changes, which take effect immediately:
Step 9. Go to the private dashboards area
Step 10. On the top right, a button asks whether to refresh the cache. Click yes to load in your changes
Step 11. Select “Demo testing” from the dashboard picker on the top left
Your dashboard should now be loaded, and the URL should include the ID of your dashboard.
You can edit the dashboard using regular Python/Pandas/Streamlit/Graphistry from here – see Part 4 for some common tips when ready.
Part 3: System Administrators – Toggling Public and Private Views
Regular Graphistry users can skip this step.
Optionally, when self-hosting Graphistry or graph-app-kit, a system administrator may want to control whether the public and/or private dashboard links appear in the navigation menu.
Video 3: Using the system administrator panel for controlling public/private dashboard functionality visibility
Step 1: Go to Admin Portal
Step 2: From ‘django-waffle’ select ‘Flags’ and choose your public/private dashboard.
Step 3: Enable the flag for the users and/or organizations that should see the dashboard.
Step 4 (Optional): Configure Organizations and People by selecting Users
Part 4: The execution pipeline convention & useful tips
A useful convention in graph-app-kit is to use sidebar_area(), run_filters(), main_area(), and st.cache(). Together, they give low-code dashboards a simple structure that makes them easier to debug and maintain, both for yourself and for others.
The sidebar area typically contains most of the UI input widgets. The docs detail many built-in input widgets. The sidebar area returns a Python dictionary of the user’s input settings for use later in the pipeline.
More advanced sidebars may load data like database schemas to provide smart autocomplete, or dynamically change the input controls using standard “if … then … else” statements.
Using the user input values from the sidebar, run_filters() will typically query a database, apply filters on a file, and use Pandas, RAPIDS, and PyGraphistry to wrangle the data. Advanced users may even run ML and AI models using PyGraphistry or other tools. The result is typically a Python dictionary with Pandas/RAPIDS dataframes and other values.
The main_area() is where visualizations typically go. As input, it gets both the sidebar_area() user input values and the run_filters() output.
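The flow between the three stages can be sketched as below. This is a Streamlit-free skeleton so that it runs anywhere: the widget, query, and rendering calls are replaced by stand-ins, and the sample edge list, the min_weight setting, and the returned summary string are all assumptions made for illustration. In a real view, the commented spots would call Streamlit widgets, Pandas/RAPIDS, and PyGraphistry.

```python
# Tiny in-memory stand-in for a database table or CSV file
EDGES = [
    {"src": "a", "dst": "b", "weight": 1},
    {"src": "b", "dst": "c", "weight": 5},
    {"src": "c", "dst": "a", "weight": 9},
]


def sidebar_area():
    # Real view: read the value from a sidebar input widget
    min_weight = 4  # stand-in for the user's input
    return {"min_weight": min_weight}


def run_filters(min_weight):
    # Real view: query a database or filter a dataframe
    edges = [e for e in EDGES if e["weight"] >= min_weight]
    return {"edges": edges}


def main_area(min_weight, edges):
    # Real view: render charts and the graph visualization
    return f"{len(edges)} edges with weight >= {min_weight}"


def run_all():
    # Each stage hands a plain dictionary to the next one
    sidebar_filters = sidebar_area()
    filter_results = run_filters(**sidebar_filters)
    return main_area(**sidebar_filters, **filter_results)
```

Passing plain dictionaries between stages keeps each function easy to call and inspect on its own, for example from a notebook cell.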
Tip 1: st.cache()
We strongly recommend caching data across usage to speed up load times and decrease system load. Caching is great for tasks where you expect the same result to be needed multiple times in the same session or across sessions, such as loading CSVs, running database queries, and computing filter or machine learning results.
More on Streamlit’s cache decorator can be found here.
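To illustrate what caching buys you without requiring Streamlit, the sketch below uses the standard library's functools.lru_cache as a stand-in for the Streamlit cache decorator. The function slow_query, its sample rows, and the call counter are hypothetical. Note also that newer Streamlit releases replace st.cache with st.cache_data and st.cache_resource, so check which one your installed version expects.

```python
import functools

# Counts how often the expensive body actually runs
CALL_COUNT = {"slow_query": 0}


@functools.lru_cache(maxsize=32)  # in a view: Streamlit's cache decorator instead
def slow_query(min_weight):
    CALL_COUNT["slow_query"] += 1
    # Stand-in for a slow CSV load or database query
    rows = [("a", "b", 1), ("b", "c", 5), ("c", "a", 9)]
    return tuple(row for row in rows if row[2] >= min_weight)


first = slow_query(4)   # runs the body
second = slow_query(4)  # same argument: served from the cache
```

Only a call with a new argument value runs the function body again, which is the behavior you want for repeated dashboard reruns with unchanged inputs.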
Tip 2: Wrap – Keep things local and protect them with a try/except
For convenience, you may be tempted to make some data and operations global, like running pd.read_csv() at the top-level. This may cause two common issues you want to work around via wrapping with methods and try/except blocks:
- Wrap initialization and method code with try/except. If there is any error, that may cause the dashboard not to be loaded. So wrap everything with try/except to signal this is happening. Writing exceptions to the screen via st.write() can help debug without having to look at the command line.
- Defer work to an initial run_all() call. You can still store the results using the Python keyword global. An important gotcha to avoid is a dashboard loading a large dataset or running a slow query on system load, as that will significantly slow down all other dashboards… even when the slow one isn’t open! By deferring code to only be called by the run_all() method, these costs will only be incurred upon actual use of the dashboard.
Tip 3: Handling larger files
To handle bigger files quickly and efficiently, both for initial parsing (ex: 500MB CSV) and manipulating (transforms, analytics), Graphistry servers support GPU dataframes & GPU M. You can swap your import pandas as pd for import cudf, which largely mirrors the pandas API. For datasets bigger than GPU memory, you can likewise use dask_cudf. An example of this can be found here, or in our Further readings.
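If the same view has to run on machines with and without a GPU, one option is to pick the dataframe library at runtime. The sketch below is an assumption-laden illustration, not a graph-app-kit feature: the helper name load_dataframe_library is hypothetical, and it only checks which package is installed, not whether a usable GPU is present.

```python
import importlib
import importlib.util


def load_dataframe_library(preferred=("cudf", "pandas")):
    """Return (name, module) for the first installed library, else (None, None).

    Hypothetical helper for illustration; not part of graph-app-kit.
    """
    for name in preferred:
        if importlib.util.find_spec(name) is not None:
            return name, importlib.import_module(name)
    return None, None


# Example use inside a view (the file name is a placeholder):
# name, xdf = load_dataframe_library()
# df = xdf.read_csv("edges.csv")
```

Code written against the shared subset of the two APIs, such as read_csv and basic filtering, can then stay the same regardless of which library was picked.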
Happy graphing, and be sure to check out some of our other tutorials and docs listed below.