Many graph visual intelligence investigations start by loading an edges table and a nodes table, such as an events table (edges) that connects items in an accounts table (nodes). Our new no-code file uploader is great for exploring files, even for tricky configurations. However, many of our users don’t realize they can use it with two files!
For this tutorial, we look at finding which soccer matches would be the most fun to attend. We take a classic network perspective on FiveThirtyEight Club Soccer Predictions. For our soccer-curious Americans, there are many unknowns, such as which teams play one another and which leagues are most central. Likewise, which teams are the leaders and outliers in each league, and which teams are truly international? Using FiveThirtyEight’s tables of matches and team rankings, we can explore how different teams compete against one another in practice. To find the liveliest match, we look for teams that have the highest offense scores and are also likely to compete with one another.
For completeness, we also include the PyGraphistry version for low-code notebooks and dashboards.
Part 1: Setup edges table
We start by loading the edges table (game matches) into Graphistry’s free File Visualizer:
- Drag and drop file spi_matches.csv
- Hit next and confirm that the loaded data columns `team1` and `team2` have the same data type “object” (meaning text)
- Hit next and configure the graph to only connect the columns `team1` and `team2`, and, further down, select “Merge values duplicated across all columns”
See screenshots below for each step
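For readers who would rather run the same type check in a notebook, here is a minimal pandas sketch. It uses a tiny inline table with placeholder team and league names instead of the real spi_matches.csv, so the rows are assumptions for illustration only; the column names `team1`, `team2`, and `league` are the ones used in this tutorial.

```python
import pandas as pd

# Placeholder rows standing in for spi_matches.csv (not real data)
matches_df = pd.DataFrame({
    'team1': ['Team A', 'Team B', 'Team C'],
    'team2': ['Team B', 'Team C', 'Team A'],
    'league': ['League X', 'League X', 'League Y'],
})

# Both endpoint columns should hold text and share one dtype;
# a mismatch could keep the same team from matching across columns
endpoint_dtypes = matches_df[['team1', 'team2']].dtypes
same_dtype = bool(endpoint_dtypes['team1'] == endpoint_dtypes['team2'])
both_text = all(
    pd.api.types.is_string_dtype(matches_df[col]) for col in ['team1', 'team2']
)

print(endpoint_dtypes)
print('OK to connect team1 -> team2:', same_dtype and both_text)
```

Depending on the pandas version, text columns may be reported as `object` or as a dedicated string type, which is why the sketch compares the two columns to each other rather than to a fixed type name.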
A network diagram connecting teams should appear at the bottom of the page. Each edge represents a match: clicking on an edge will show the details of the corresponding match. The layout automatically positions teams (nodes) based on who they compete against most, creating competitive rings that span multiple official leagues. We can see this visually by creating a histogram for “edge:league” and using its “Set Coloring” button to color edges by league.
For example, we see the Spanish teams are divided into a primary and secondary division. The primary division competes more internationally, though in practice, we can clearly separate primary division teams like Barcelona into international-level teams vs others that compete in practice more like second division teams. Likewise, while none of the second division Spanish teams act like international-level ones, Eibar is more like a solid primary division team than any second division team, at least in terms of who they compete with.
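One rough way to put a number on “international-level” in a notebook is to count how many distinct leagues each team appears in. The sketch below does this with pandas on placeholder rows: the team and league names are invented for illustration, and the league count is an assumed proxy rather than anything the visualization itself computes.

```python
import pandas as pd

# Placeholder rows (not real data); column names follow the matches table
matches_df = pd.DataFrame({
    'team1':  ['Team A', 'Team A', 'Team B', 'Team C'],
    'team2':  ['Team B', 'Team D', 'Team C', 'Team B'],
    'league': ['Domestic 1', 'International Cup', 'Domestic 1', 'Domestic 1'],
})

# Stack both endpoint columns so each row is one (team, league) appearance
appearances = pd.concat([
    matches_df[['team1', 'league']].rename(columns={'team1': 'team'}),
    matches_df[['team2', 'league']].rename(columns={'team2': 'team'}),
], ignore_index=True)

# Teams that appear in more than one league are candidates for "international"
leagues_per_team = (
    appearances.groupby('team')['league']
    .nunique()
    .sort_values(ascending=False)
)
print(leagues_per_team)
```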
Part 2: Setup nodes table
To really dig into the data, we want to load in the data for inspecting individual team properties and visually correlating them against other teams:
- Without leaving the page, switch back to the Upload stage by clicking on its name or using the left arrow buttons
- Drag and drop file spi_global_rankings.csv
- Hit next and confirm that the loaded data column `name` has the type “object” (meaning text), which matches that of `team1` and `team2` in the edges table
- Hit next and, at the bottom, enable the “Shape node file” toggle. Select the newly uploaded file and detected table, and select column `name` for “node id”.
See screenshots below for each step
The diagram at the bottom of the page should refresh. Now, when you click on a team node, it should include scores like “off” (offense) and “def” (defense).
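If some team nodes come up without scores, one thing worth checking is whether every team name in the matches table has a row in the rankings table. The following is a minimal pandas sketch of that check, using placeholder rows and made-up scores rather than the real files.

```python
import pandas as pd

# Placeholder rows and scores (not real data); column names follow the two CSV files
matches_df = pd.DataFrame({
    'team1': ['Team A', 'Team B'],
    'team2': ['Team B', 'Team C'],
})
teams_df = pd.DataFrame({
    'name': ['Team A', 'Team B'],
    'off': [2.5, 1.8],
    'def': [0.4, 0.9],
})

# Every team that appears as an edge endpoint
edge_teams = set(matches_df['team1']) | set(matches_df['team2'])

# Teams in the matches table with no row in the rankings table
missing_attrs = sorted(edge_teams - set(teams_df['name']))
print('Teams without node attributes:', missing_attrs)
```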
We can now further inspect the leagues. For example, for livelier games, we might prefer teams with high offense scores. To explore this, we can add a histogram for “point:off” and hover over its high-value bars to see what lights up. There are 1-2 offensive teams in each international-grade division, and a Spanish cluster shows up well. Using the data brush, we can more easily move a selection window around the Spanish primary division teams, and see that Barcelona vs. Real Madrid would be aggressive. But Atletico Madrid scores quite high too! Can you spot any others?
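The same question can be approximated in pandas by attaching each team's offense score to the matches and ranking the pairings. This is only a sketch under stated assumptions: the rows and scores are placeholders, and scoring a pairing by the weaker of its two offense scores (`min_off`) is a hypothetical heuristic chosen here so that both sides have to be strong.

```python
import pandas as pd

# Placeholder rows and scores (not real data)
matches_df = pd.DataFrame({
    'team1': ['Team A', 'Team A', 'Team B'],
    'team2': ['Team B', 'Team C', 'Team C'],
})
teams_df = pd.DataFrame({
    'name': ['Team A', 'Team B', 'Team C'],
    'off': [3.0, 2.5, 1.0],
})

# Look up each endpoint's offense score by team name
off_by_team = teams_df.set_index('name')['off']
pairs = matches_df[['team1', 'team2']].drop_duplicates().copy()
pairs['off1'] = pairs['team1'].map(off_by_team)
pairs['off2'] = pairs['team2'].map(off_by_team)

# Score a pairing by its weaker offense so both sides must be strong
pairs['min_off'] = pairs[['off1', 'off2']].min(axis=1)
ranked = pairs.sort_values('min_off', ascending=False).reset_index(drop=True)
print(ranked)
```

Restricting the ranking to pairs that already appear in the matches table is what keeps it to teams that actually compete with one another.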
Part 3: Histogramming and coloring
A useful technique in Part 2 is highlighting, filtering, and coloring based on node/edge attributes like team league and offense/defense scores.
To use a data attribute for highlighting and filtering:
- On the bottom left, add a histogram for a property of choice, like “edge:date” or “point:league”
- Hover over a histogram bar to visually highlight the matching nodes/edges, such as showing all teams with a high offense score
- Click on a histogram bar to filter for nodes (edges) with that value
- To undo the filter, click again, or open the top filter panel and disable / remove the filter
See screenshots below for each step
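A notebook analogue of clicking a histogram bar is a boolean filter on the nodes table, followed by dropping edges whose endpoints no longer exist. The sketch below assumes placeholder rows and a hypothetical cutoff `min_off`; it approximates the idea of the filter and is not a description of how the visualization implements it.

```python
import pandas as pd

# Placeholder rows and scores (not real data)
teams_df = pd.DataFrame({
    'name': ['Team A', 'Team B', 'Team C'],
    'off': [3.0, 2.5, 1.0],
})
matches_df = pd.DataFrame({
    'team1': ['Team A', 'Team A', 'Team B'],
    'team2': ['Team B', 'Team C', 'Team C'],
})

# Hypothetical cutoff playing the role of the clicked histogram bar
min_off = 2.0
kept_teams = teams_df[teams_df['off'] >= min_off]

# Keep only matches whose two endpoints both survive the node filter
kept_names = set(kept_teams['name'])
kept_matches = matches_df[
    matches_df['team1'].isin(kept_names) & matches_df['team2'].isin(kept_names)
]
print(kept_teams)
print(kept_matches)
```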
To use a data attribute for coloring and sizing:
- As above, add a histogram for a property of choice, like “edge:date” or “point:league”
- In the histogram, use the buttons like “Set coloring” or “Set size” to use that attribute
- For colors, try using gradient vs categorical to see the difference in effect. Attributes like time, money, and score work great with gradient palettes, while unique IDs and names work better with categorical ones.
- For a gradient coloring on an attribute like “edge:date”, try toggling “reverse” as well. This achieves effects like “cold to hot” coloring.
See screenshots below for each step
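To make the difference between categorical and gradient coloring concrete, here is a small plain-Python sketch of the two ideas. The helper names, palette values, and league names are all hypothetical, and this illustrates the concept only; it is not the visualization's actual color logic.

```python
def categorical_colors(values, palette):
    """Give each distinct value its own palette entry, cycling if needed."""
    distinct = sorted(set(values))
    return {v: palette[i % len(palette)] for i, v in enumerate(distinct)}


def gradient_color(value, lo, hi, start_rgb, end_rgb, reverse=False):
    """Linearly blend two RGB colors based on where value sits in [lo, hi]."""
    t = 0.0 if hi == lo else (value - lo) / (hi - lo)
    if reverse:
        t = 1.0 - t
    return tuple(round(s + t * (e - s)) for s, e in zip(start_rgb, end_rgb))


# Categorical: names and IDs each get a distinct, unordered color
leagues = ['League X', 'League Y', 'League X']
league_colors = categorical_colors(leagues, ['#1f77b4', '#ff7f0e'])
print(league_colors)

# Gradient: ordered values such as scores or dates blend between two colors
blue, red = (0, 0, 255), (255, 0, 0)
hot = gradient_color(3.0, 1.0, 3.0, blue, red)                 # top of range
cold = gradient_color(3.0, 1.0, 3.0, blue, red, reverse=True)  # same value, reversed
print(hot, cold)
```

The `reverse` flag flips which end of the range gets which color, which is the effect the “reverse” toggle is used for above.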
Part 4: Optional – Python
We can also automate these steps with Python:
```python
import graphistry
import pandas as pd

# Replace '###' with your Graphistry account credentials
graphistry.register(api=3, username='###', password='###')

# Edges: one row per match; nodes: one row per team
matches_df = pd.read_csv('https://projects.fivethirtyeight.com/soccer-api/club/spi_matches.csv')
teams_df = pd.read_csv('https://projects.fivethirtyeight.com/soccer-api/club/spi_global_rankings.csv')

(
    graphistry
    .edges(matches_df, 'team1', 'team2')  # connect team1 -> team2
    .nodes(teams_df, 'name')              # team attributes keyed by name
).plot()  # set plot(render=False) to get the URL instead
```
Happy graphing, and swing by our Slack channel for ideas and help!