The sharing paradox: To scale collaborative investigations, Graphistry has been locking things down

Posted by Leo Meyerovich on August 27, 2021

Collaboration is a critical way modern data analysts scale their insights and impact. It represents the kind of capability gap the Graphistry team aims for in our broader mission to 100X the investigation process. Our team has been working hard on releasing the first wave of new features to scale graph-empowered investigation collaborations across team and organization boundaries. These features are informed by a concept for secure collaborative software that I’ve informally thought about for years but never saw clearly described.

We’re calling it the Sharing Paradox: By making it easier to limit access, the counter-intuitive result is we’re increasing sharing and collaboration.


The big idea: Scaling collaboration by restricting access

Graphistry already builds social computing into our visual graph intelligence technology because collaboration helps analysts look further and impact more people. Analysts can share live URLs of their rich no-code GPU-accelerated graph visualizations, low-code investigations, and Python data science notebooks with embedded interactive GPU sessions. Unlike in most data exploration tools, analysts can even collaborate across successive investigations by low-coding a team playbook of visual investigation automation templates.

However, Graphistry reached a point where enabling more people to share a data experience is, paradoxically, often less about who can access it and more about who cannot. Tools by growth hackers get attention through promotions like Twitter threads, Slack alerts, and email feeds. Yet no data owner wants to be in the news as the next big data leak, and we shouldn’t be building tools that encourage that kind of mistake. That rightful fear of data exposure means collaboration tools that optimize for scaling reach often end up used for promoting watered-down analyses with limited utility. The Sharing Paradox tells us that, for more impactful sharing, the energy that goes to 100X’ing reach should also go to 100X’ing security restrictions. By simultaneously increasing reach and restricting access, we can make it safe for team members to connect more useful data sources and share more meaningful investigations with more people.

In the spirit of 100X investigations, it now makes sense to think about 100X collaborations. Secure collaboration has been a top 3 priority for the launch of our new cloud-native tier, Graphistry Hub, and the next 6 months of its roadmap. Many of our users work with sensitive enterprise, government, user, financial, security, and otherwise proprietary data, so access control features like being able to run on air-gapped networks have been necessary for successful adoption since our early days. However, the Sharing Paradox is a design principle that makes us think about flexible attenuation as a deep part of how we unlock richer analyst workflows. Starting with the recent v2.37.30, we have begun exposing our new foundational layers for secure collaborative investigations.

 

Figure: Sharing panel in action
This article walks through our first wave of new access control features and, counter-intuitively, how they open up new collaboration scenarios:

  • Secure defaults: YouTube-like unlisted web key URLs
  • Sharing for professionals: Introducing the sharing panel & access control lists
  • Notifications as secure delegations
  • API UX: Sharing for data scientists and developers
  • Corollary: Improve collaboration to improve security
  • Looking ahead: Profit centers, teams & going beyond RBAC

Secure by default: YouTube-like “unlisted URLs” are web keys


secret_url = g.plot(render=False)

You’ve likely used secure defaults like those of the Graphistry free tier on sites like YouTube. New uploads default to publicly visible + unlisted: they are only editable by the creator, and until explicitly shared, only the creator knows the URL. Importantly, unlike most social media sites that continuously promote content, Graphistry keeps unlisted uploads effectively private. The only people who can see your data are those with whom you’ve shared a link, so we take care in our UIs and APIs not to leak which links exist. This pattern is known as web keys, a form of object capabilities (ocaps). For example, if you paste the URL in a private Slack chat, only the people in that chat, and anyone they share it with, can open it.

When an analyst makes a useful interactive plot in a notebook or our no-code file uploader, they can then send the link to someone just as they would a regular webpage… because it is one. It’s about as easy as you can get: everyone already does it all the time! By taking care that the links are actually securely shareable web keys, we’ve combined easy collaboration with true data security, and our users have collaborated on all sorts of headline news investigations like on US voter suppression campaigns.

Securely sharing unlisted items as web keys is a brilliant way to meet the Sharing Paradox. Web-based collaboration is already based around the idea of people sharing links, and passing around view or edit links is about as simple as you can get.
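As a rough illustration of why an unlisted web key is safe to pass around, the sketch below mints a URL whose dataset ID is a long random token, so knowing one link reveals nothing about any other. The `mint_web_key` helper is hypothetical and is not Graphistry's actual implementation; only the URL shape follows the example format used in this article.

```python
import secrets

# URL shape borrowed from the article's example; everything else is illustrative.
BASE_URL = "https://hub.graphistry.com/graph/graph.html"


def mint_web_key(num_bytes: int = 32) -> str:
    """Return an unlisted URL whose dataset ID is an unguessable token (hypothetical helper)."""
    token = secrets.token_urlsafe(num_bytes)  # cryptographically strong randomness
    return f"{BASE_URL}?dataset={token}"


url_a = mint_web_key()
url_b = mint_web_key()
print(url_a)
```

Because possession of the URL is the permission, the token has to come from a cryptographic source such as `secrets` rather than a counter or a timestamp.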

Investigations need more than this kind of one-size-fits-all default security policy. A subtle example where simple web keys break down for some of our users is how TLS solves the man-in-the-middle attacks that most teams care about, but not some heightened threat models. Consider investigators collaborating on the above elections report before it reached ABC and others. Clicking a Graphistry HTTPS URL lets snooping internet service providers see you are going to "https://hub.graphistry.com", but encrypts the rest, including the secret path "/graph/graph.html?dataset=UnguessableSecret123". The ISP doesn’t get to see the actual magic link. This kind of protection is good enough for most teams. However, if the investigators share the link over a tool like Slack, iCloud, or Google Documents, they are deciding on a trade-off: is the benefit of collaborating over those third-party services worth the risk of the services getting access to their data? For some of our users, that’s too risky. For example, corruption investigations are at risk when communication tool providers are fine bending to regional government officials, and customer data agreements may prohibit sharing data with other companies.
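To make the HTTPS point concrete, here is a small sketch that splits a link into the part a network observer can typically learn (the hostname) and the part that travels encrypted (the path and query holding the secret). The `tls_visibility` helper is an illustrative name, and the split is a simplification of what TLS exposes in practice.

```python
from urllib.parse import urlsplit


def tls_visibility(url: str) -> dict:
    """Split a URL into what an on-path observer typically sees vs. what TLS encrypts."""
    parts = urlsplit(url)
    hidden = parts.path
    if parts.query:
        hidden += "?" + parts.query
    return {
        "visible_to_network": parts.hostname,  # typically exposed via DNS lookups and the TLS handshake
        "encrypted_in_transit": hidden,        # path and query string, including the web key
    }


view = tls_visibility(
    "https://hub.graphistry.com/graph/graph.html?dataset=UnguessableSecret123"
)
print(view)
```

None of this helps once the full link is pasted into a third-party service, which is exactly the trade-off described above.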

Simple web keys got us far, but we clearly needed to do more. One option was to support fancier links, such as expiring ones and password-protected ones. This can go quite far as, in the extreme, links can encode full-fledged continuations and arbitrarily fine-grained capabilities. However, from a user interface perspective, we wanted to focus more directly on the typical policies, and working backwards, more direct ways of interacting with them for different types of users.
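For readers curious what a "fancier link" might look like, the following is a minimal sketch of an expiring link that carries its own expiry time plus an HMAC signature, so the server can reject links that are expired or tampered with. The parameter names (`expires`, `sig`), the helper functions, and the key handling are all assumptions for illustration; the article does not describe how Graphistry would implement such links.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit

SECRET_KEY = b"server-side-secret"  # hypothetical; a real service would load this from secure config


def _sign(dataset: str, expires: int) -> str:
    """HMAC over the dataset ID and expiry so neither can be changed by the link holder."""
    msg = f"{dataset}:{expires}".encode()
    return hmac.new(SECRET_KEY, msg, hashlib.sha256).hexdigest()


def make_expiring_link(base_url: str, dataset: str, ttl_seconds: int,
                       now: Optional[float] = None) -> str:
    now = time.time() if now is None else now
    expires = int(now) + ttl_seconds
    query = urlencode({"dataset": dataset, "expires": expires, "sig": _sign(dataset, expires)})
    return f"{base_url}?{query}"


def is_link_valid(url: str, now: Optional[float] = None) -> bool:
    now = time.time() if now is None else now
    params = parse_qs(urlsplit(url).query)
    try:
        dataset = params["dataset"][0]
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False  # missing or malformed parameters
    if not hmac.compare_digest(sig, _sign(dataset, expires)):
        return False  # signature mismatch: the link was altered
    return now < expires


link = make_expiring_link("https://hub.graphistry.com/graph/graph.html", "abc", 60, now=1000)
print(is_link_valid(link, now=1030))
```

Encoding policy in the link keeps the server stateless, but every new policy needs new link syntax, which is part of why a panel-based UI over typical policies is attractive.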

Sharing for Professionals: Introducing the sharing panel & access control lists

Our new sharing panel makes additional typical collaboration patterns safe, easy, and quickly configurable. If you’ve shared anything like a Google Document or Dropbox link, it should feel familiar. The designers of those tools successfully navigated the Sharing Paradox.

Figure: Sharing panel invite list

Directly from your visualization or private gallery, pressing a visualization’s share button opens our new sharing panel. From there, you can flip the sharing policy to private and include your colleagues. Now, viewers must be logged in to get access. You can configure each invitee as a Viewer or Editor, and even give one of them ownership. As you share with more people, the autocomplete feature becomes quite handy.

The sharing panel is a simple collaboration UI that combines sharing with powerful concepts around authorization. Sharing panels are a great example of respecting the Sharing Paradox by putting you just a few clicks away from targeted sharing:

  • Control: Visualization owners can now individually configure their own custom discretionary access controls
  • Reach: To make configuration easy, our panel is initially exposing one of the simplest discretionary access control policies: an access control list of invited users, with a dropdown for whether each can view or edit. DAC ACLs aren’t enough in team settings, so we’ve built in another layer of expressivity, which will be the focus of Part II.
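The policy behind the panel can be pictured with a small model: a visualization has an owner, a public-unlisted or private mode, and an access control list mapping invited users to view or edit. The sketch below is a simplified illustration under those assumptions; the `Visualization` class and `Access` levels are hypothetical names, not Graphistry's data model.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Dict, Optional


class Access(IntEnum):
    NONE = 0
    VIEW = 1
    EDIT = 2
    OWNER = 3


@dataclass
class Visualization:
    owner: str
    private: bool = False  # False = unlisted web key: anyone holding the link can view
    acl: Dict[str, Access] = field(default_factory=dict)  # invited user -> access level

    def access_for(self, user: Optional[str]) -> Access:
        """Resolve access for a logged-in user, or for an anonymous visitor when user is None."""
        if user is not None and user == self.owner:
            return Access.OWNER
        if user is not None and user in self.acl:
            return self.acl[user]
        return Access.NONE if self.private else Access.VIEW

    def can_view(self, user: Optional[str] = None) -> bool:
        return self.access_for(user) >= Access.VIEW

    def can_edit(self, user: Optional[str] = None) -> bool:
        return self.access_for(user) >= Access.EDIT


viz = Visualization(owner="alice@example.com")
viz.private = True                          # flip the policy to private
viz.acl["bob@example.com"] = Access.VIEW    # invite Bob as a Viewer
viz.acl["carol@example.com"] = Access.EDIT  # invite Carol as an Editor
print(viz.can_view("bob@example.com"), viz.can_edit("bob@example.com"))
```

Ordering the levels as integers keeps each check to a single comparison, at the cost of assuming that every higher level includes the lower ones.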

Similar to how Stripe made payments easy for websites, we’ve built in our Share button to make secure collaboration easy for investigations, starting with the visualization URLs.

Notifications as secure delegations

When sharing with collaborators asynchronously, it helps to use the notify option so that they’ll receive an email invite right in their inbox. They do not need a Graphistry account before you share with them: they can sign up on-the-fly when receiving their private link, and will be redirected as needed. Depending on a quick dropdown setting, they can get edit rights, or the secure default of view-only rights.

To make collaboration even faster and scale further, invited users can not only view/edit, but also invite people from their own network. Supporting delegation is important for scenarios like meetings where a Zoom presenter will be busy talking and needs another teammate to handle additional invites, and sharing across different teams.

The access control policy is an important attack vector that requires its own secure design. Three notable ways we had to do that for the invite system are:

  • Secure delegation: When a user invites someone else, they can only give as much access as they already have. If a user can only view, then they can invite a friend to view, but not edit. Otherwise, the privilege escalation would violate integrity.
  • Attenuated invites: When inviting a collaborator, whether for direct access or delegated, the inviter can choose the invitee’s access level
  • Sharing network confidentiality: Autocomplete of invitees makes it a lot easier to share, but everyone hates spam, and if you have multiple projects, you may want to protect your other collaborators’ identities. So, we show immediate collaborators and provide autocomplete… but only up to your personal sharing network.
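The three rules above can be sketched in a few lines. This is an illustrative model only: the `invite` and `autocomplete` functions, the `Access` levels, and the shape of the sharing network are hypothetical and are not taken from Graphistry's codebase.

```python
from enum import IntEnum
from typing import Dict, List, Set


class Access(IntEnum):
    VIEW = 1
    EDIT = 2


def invite(acl: Dict[str, Access], inviter: str, invitee: str, requested: Access) -> None:
    """Grant `requested` access to `invitee`, never exceeding what `inviter` already holds."""
    held = acl.get(inviter)
    if held is None:
        raise PermissionError(f"{inviter} has no access to delegate")
    if requested > held:
        # Secure delegation: a Viewer cannot mint an Editor.
        raise PermissionError(f"{inviter} cannot grant more than {held.name}")
    current = acl.get(invitee)
    # Attenuated invite: use the chosen level, but never downgrade an existing grant.
    acl[invitee] = requested if current is None else max(current, requested)


def autocomplete(prefix: str, user: str, sharing_network: Dict[str, Set[str]]) -> List[str]:
    """Suggest invitees only from the user's own sharing network."""
    contacts = sharing_network.get(user, set())
    return sorted(c for c in contacts if c.startswith(prefix))


acl = {"alice": Access.EDIT, "bob": Access.VIEW}
invite(acl, "bob", "carol", Access.VIEW)  # allowed: view delegates view
network = {"bob": {"carol@x", "alice@x"}, "alice": {"chris@x"}}
print(autocomplete("c", "bob", network))
```

Scoping autocomplete to the caller's own network means Bob never learns that Alice collaborates with Chris, even though both names start with the same letter.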

Yet again, we see the Sharing Paradox rear its head: To encourage wider sharing, not only do we need to make it easy to send invites, but we must also help restrict what the invitees can see and do so the item owners are willing to make those invitations.

API UX: Sharing for data scientists and developers


graphistry.privacy()
# or equivalently:
# graphistry.privacy(mode='private', invited_users=[], notify=False, message='')

shareable_and_embeddable_url = g.plot()

Figure: Private-mode one-liner

As usual, Graphistry exposes our new sharing capability as APIs streamlined for easy and flexible use in both data science notebooks and custom dashboards & embedded analytics apps. The Sharing Paradox is not just for point-and-click end-users, but developers and data scientists too!

For data science notebooks, the typical desirable security policy is the same as in our unlisted URL scenario. Data scientists can generate plots in their notebook and share links to the notebook or embedded visualizations, and with no additional work, get the same great security guarantees of unlisted web keys. (Notebook kernels often have heightened resource access, so we limit live ones to logged-in Staff-level users.) Likewise, if data scientists or developers want to switch to another security mode, such as default-private (non-public) or some preferred sharing group, they are just one line away: see the above code snippet. One-offs can even be quickly done with no code. Just go into the sharing panel in the generated visualization and flip the needed toggles.
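As a hedged sketch of what a preferred sharing group might look like in code, the helper below assembles keyword arguments matching the `graphistry.privacy(mode=..., invited_users=..., notify=..., message=...)` call shown above. The shape of each `invited_users` entry and the `"10"`/`"20"` action codes are assumptions that should be checked against the sharing tutorial, and `private_share_settings` is a hypothetical helper, not part of the library.

```python
from typing import Dict, List

VIEW = "10"  # assumed action code for view-only invites; verify against the sharing tutorial
EDIT = "20"  # assumed action code for edit invites; verify against the sharing tutorial


def private_share_settings(viewers: List[str], editors: List[str], message: str = "") -> Dict:
    """Build keyword arguments for graphistry.privacy() (hypothetical helper)."""
    invited = [{"email": email, "action": VIEW} for email in viewers]
    invited += [{"email": email, "action": EDIT} for email in editors]
    return {
        "mode": "private",
        "invited_users": invited,
        "notify": bool(invited),  # only send email invites when someone is invited
        "message": message,
    }


settings = private_share_settings(
    viewers=["analyst@example.com"],
    editors=["lead@example.com"],
    message="Draft investigation for review",
)
print(settings)

# In a notebook with PyGraphistry installed and registered, the settings would be applied as:
#   import graphistry
#   graphistry.privacy(**settings)
#   url = g.plot(render=False)
```

Keeping the policy in a plain dictionary makes it easy to review or log before any plot is uploaded.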

More advanced notebooks, and custom apps and dashboards, may want to go even deeper. To see example one-liners for handling a variety of different use cases, you can explore the new sharing tutorial.

Corollary: Improve collaboration to improve security

Edit 09/23/2021: Added thanks to Devdatta Akhawe (Head of Security, Figma)

Just as the Sharing Paradox suggests increasing security can increase collaboration, we can go in the reverse direction: increasing collaboration can increase security. This is not obvious, as one of the most basic principles of security is Least Privilege, which biases against sharing.

Dev notes that most people do not use managed tools like Graphistry (or his previous company, Dropbox), and instead collaborate over more legacy or ad-hoc methods… which are insecure. Graphistry is building the ability to run managed investigations. Typical investigations are instead unmanaged. For example, many analysts will email executable scripts and notebooks to be run in privileged environments, which is risky. So the mere act of using a platform like Graphistry provides a level of security controls on investigation data over more ad-hoc approaches.

Not all collaboration software is designed to be secure. However, when it is, the Sharing Paradox suggests increasing collaboration will increase security, as it heads off more unmanaged alternatives.

Looking Ahead: Profit centers, teams, & going beyond RBAC

The new sharing panel is already piloting with select Graphistry Hub users and will be made globally available for all Hub Pro and self-hosted users next week. For groups like data science consultancies, it has already been important for workflows like sharing read-only views with different external collaborators. Whether you want to explore the relationships in your data, or give the new sharing panel a go, we encourage you to take a look!

Bigger groups and richer data require rich sharing modes. For example, a table stakes feature for security compliance is Graphistry’s existing support for role-based access control, which enables our separate policies for public users vs. staff (team) vs. admins. Likewise, we are working to bring a more flexible Teams tier to Graphistry Hub. For early access, feel free to reach out.

Looking further ahead, we are firm believers that security should not be thought of as the “Department of No”, but a profit center for the broader organization. The Sharing Paradox is an important case of making security a profit center. Our note’s focus has mostly been about unlocking analyst sharing patterns: the marketing industry loves that just as well. We can even attach dollar tags to it: Companies like Facebook, Google, and Microsoft pay billions for content sharing networks because they value strengths like network effects. Product teams are beginning to use Graphistry as part of accelerating their embedded visual analytics strategy (it’s easy!) and improving client deliverables, so we have been getting a lot of enthusiasm for the new sharing features. Expect more articles on that.

On an engineering note, we have been integrating a new policy engine that enables modes like discretionary ABAC, not just RBAC/ACLs. The Sharing Paradox rears its head again: Most popular policy systems struggle with modern realities like data fusion and multitenancy, so we find graph teams slowed down on achieving reach when they have to manually wrap their data planes and UIs with bespoke data security controls. Google and various VC-backed startups are proposing different kinds of solutions here, and after careful evaluation and prototyping, we were surprised to find that only a few seemed viable long-term. Stay tuned for Part II, which will accompany the launch of Hub Teams, for what we have been doing.

On behalf of the Graphistry Team, happy graphing, and swing by our Slack channel for ideas and help!
