NDA Biomedical AI · Internal SaaS · 2023 — 2024

Abstra

How I helped 180 researchers stop losing time to bad data

Overview

Role Lead Product Designer
Period 2023 — 2024
Teams 3 squads · 4 to 6 people
Type Biomedical Internal SaaS

Abstra is the internal data cataloguing platform I designed from scratch at Owkin. It took 18 months, 52 interviews, and one consequential pivot to turn chronic institutional friction into something researchers actually trusted and used every day.

Working alongside the CPO, a PM, and three squads, I led discovery, facilitated workshops, designed the full product, and built the design system from the ground up, all concurrently.

0→1 · Product Design · UX Research · Design System · Workshop Facilitation · Atomic UX Research · Opportunity Solution Tree
Abstra — data catalogue overview · [Replace with platform screenshot]

“We want to unite healthcare researchers and data experts, powering collaboration to maximise the value of their data and accelerate scientific discovery.”

Thomas Clozel — CEO, Owkin
01 Problem

The situation

Researchers couldn't find the data they already had

Imagine you're a researcher at a biomedical AI company and you need data. Not a concept — actual research data. A specific dataset that might already exist somewhere in the company, processed by a colleague last quarter, sitting in a folder you don't have access to, under a name you don't know.

So you do what everyone at Owkin did: you search a bit, spot a potential dataset on the web, dig through whatever documentation exists, find nothing useful, then open Slack and post in #data-questions. You wait. Maybe someone replies. Maybe they point you to a Google Sheet last updated eight months ago. Maybe it's the wrong dataset anyway.

This was happening dozens of times a week, for 180 researchers across five squads. The problem wasn't that Owkin had bad data. The problem was that no one could find it, and no one could trust it once they did.

~40% of a researcher's time spent hunting for datasets
5+ disconnected knowledge sources per project
180 researchers affected across 5 squads

Four issues kept surfacing

73%
All knowledge lived in silos
A dataset cleaned by one team last quarter was invisible to the next. No record, no signal. The organisation had no shared memory.
62%
Finding the right data was a job in itself
Even when researchers knew exactly what they needed, there was no real way to find it. Access rights, ownership, availability — all opaque.
44%
The data existed; the tools to use it didn't
Annotation, visualisation, federated analysis across sites — none of it was in-platform. Researchers improvised, workaround after workaround.
37%
Built for today, not tomorrow
Every tool was solving the current problem, not the next one. None of it was designed to scale with research complexity.
02 Research

Discovery

52 interviews and a few workshops later, we had 603 findings

When we started this initiative, there was no product and no brief — just a sentence: “researchers need to find data better, and better data.” My first task was understanding what “better” actually meant.

Over two months, I conducted 52 interviews with 19 distinct users: Data Engineers, ML Researchers, Business Developers, and Project Leads. Using an Atomic UX Research methodology, I clustered every observation into insights, then mapped them to opportunities. The result: 603 structured findings, each fully traceable to evidence. Nothing floated on intuition alone.

To synthesise that volume of input, we used a custom template combining Atomic UX Research with the Opportunity Solution Tree framework — breaking feedback into facts, insights, and opportunities. This moved us beyond anecdotes and toward recurring, high-impact patterns.

52 interview sessions conducted
19 distinct users across 4 roles
603 structured findings, all evidence-traced
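
To show what "fully traceable" means in practice, here is a minimal sketch of the atomic structure as data. The chain is the point: every opportunity resolves through insights back to raw, sourced facts. All type and field names are illustrative, not the actual research-repository schema.

```typescript
// Illustrative model of the Atomic UX Research chain. Every opportunity
// traces back through insights to verbatim, sourced facts. Hypothetical names.

interface Fact {
  id: string;
  quote: string;     // verbatim observation from an interview
  sessionId: string; // which of the 52 sessions it came from
  role: "Data Engineer" | "ML Researcher" | "Business Developer" | "Project Lead";
}

interface Insight {
  id: string;
  statement: string; // pattern synthesised from clustered facts
  factIds: string[]; // evidence backing the insight
}

interface Opportunity {
  id: string;
  statement: string; // framed as an unmet need, per the Opportunity Solution Tree
  insightIds: string[];
  criticality: "high" | "medium" | "low"; // derived from frequency and severity
}

// Trace an opportunity all the way back to its raw evidence.
function evidenceFor(
  opp: Opportunity,
  insights: Map<string, Insight>,
  facts: Map<string, Fact>
): Fact[] {
  return opp.insightIds
    .flatMap((id) => insights.get(id)?.factIds ?? [])
    .map((id) => facts.get(id))
    .filter((f): f is Fact => f !== undefined);
}
```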

To bridge research and delivery, we ran a User Story Mapping workshop with the PM and development leads. The exercise forced a shared language across disciplines. Mapping the full journey — from receiving a project request to qualifying results — revealed not just what was broken, but where in the workflow each friction lived.

Criticality levels weren't assigned by gut feel. They were derived directly from the frequency and severity of friction patterns surfaced in research. When we entered prioritisation conversations with engineering, we weren't debating opinions — we were reading from a common document everyone had helped build.

Atomic UX Research board — observations clustered into insights · [Replace with artefact screenshot]
User Story Mapping workshop output — shared journey map · [Replace with workshop photo]
03 Users

Personas

Two personas, one shared problem

The research surfaced two distinct profiles operating within the same broken system — but failing in different ways, at different stages, for different reasons. Building for one at the expense of the other would have solved half the problem and created new ones.

Data Scientist
Jonathan Braumer
Bioinformatics Researcher · PhD in Bioinformatics
3–5 Active projects in parallel
40% Time lost searching & cleaning data

“When I start a new project, I want to quickly identify reliable datasets so I can spend less time validating data and more time on bioinformatic analysis.”

Key frustrations
Knowledge lost across teams and tools
Outdated or poorly documented datasets
Lack of standardisation and versioning
Difficulty reproducing past experiments
Biomedical Researcher
Dr. Sophia Jensen
Senior Biomedical Researcher · PhD in Molecular Biology
3–4 Active clinical studies
30% Time on data reconciliation

“I need access to comprehensive, current datasets without manual hunting — and standardised formats to reduce the integration friction that slows every project.”

Key frustrations
Multiple sources require constant manual reconciliation
High-quality datasets are hard to access and poorly documented
Preprocessing consumes time that should go to analysis
No collaborative layer to preserve data context across teams
04 Pivot

The strategic decision

The original brief was wrong. So we pivoted.

The initial vision for Abstra was ambitious: an open research network connecting the entire biomedical ecosystem — a kind of LinkedIn for research data. It sounded compelling in a deck.

The research told a different story. The problem wasn't connecting Owkin to the outside world. The problem was that Owkin's own researchers couldn't connect with each other. The pain was internal, not external — and solving for the wrong scope would have built the wrong thing.

This was the most consequential design decision of the entire project — and it happened before a single screen was drawn. It required aligning leadership on a smaller, sharper scope, which meant presenting the research clearly enough that the trade-off felt obvious rather than controversial. Narrowing the scope didn't shrink the ambition. It focused it.

Original brief
“An open research network for everyone in the biomedical ecosystem”
Revised scope
“An internal data cataloguing platform for Owkin’s research teams”
05 Design System

Foundation

One visual language for five squads shipping simultaneously

A 0→1 product at Owkin meant building the foundation as the building went up. Five squads were shipping features concurrently — without a shared system, visual drift wasn't a risk. It was a certainty.

I built the Abstra design system alongside the product, starting from Owkin's existing brand palette and extending it into a component library every squad could use. Owkin's core colours weren't accessible out of the box, so I spent time refining them against WCAG standards before a single component was built on top of them.
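
The accessibility pass was mechanical rather than aesthetic: every foreground/background pair had to clear the WCAG 2.1 contrast thresholds before it could become a token. A minimal sketch of that check (the actual palette values are omitted):

```typescript
// WCAG 2.1 relative luminance and contrast ratio, used to validate
// each foreground/background pair before it became a design token.

function channel(c: number): number {
  const s = c / 255;
  return s <= 0.03928 ? s / 12.92 : Math.pow((s + 0.055) / 1.055, 2.4);
}

function luminance(hex: string): number {
  const n = parseInt(hex.replace("#", ""), 16);
  const r = channel((n >> 16) & 0xff);
  const g = channel((n >> 8) & 0xff);
  const b = channel(n & 0xff);
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// Contrast ratio is (L1 + 0.05) / (L2 + 0.05), with the lighter colour on top.
function contrastRatio(fg: string, bg: string): number {
  const [hi, lo] = [luminance(fg), luminance(bg)].sort((a, b) => b - a);
  return (hi + 0.05) / (lo + 0.05);
}

// WCAG AA: 4.5:1 for body text, 3:1 for large text.
const passesAA = (fg: string, bg: string, largeText = false) =>
  contrastRatio(fg, bg) >= (largeText ? 3 : 4.5);
```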

The result: zero visual regression across five squads over 18 months. Engineers stopped rebuilding components from scratch. The system held — not because it was enforced, but because it made the right thing the easiest thing. When the foundation is solid, product velocity compounds on top of it.

60+ components built and maintained
5 squads shipping from the same system
0 visual regressions over 18 months
Abstra design system — component library in Figma · [Replace with design system screenshot]
06 Solutions

Four principles for one platform

Every solution traces back to a documented pain

The four principles — Trust, Reusability, Efficiency, Collaboration — mapped directly to the four recurring failure modes identified in research. Nothing was added because it seemed like a good idea. Everything was added because someone had told us, in their own words, that its absence was costing them time.

01 Trust
Key insights must be legible at a glance. Quality checks, explicit ownership, and freshness indicators became first-class UI citizens.
02 Reusability
Every documented dataset reduces future friction for the whole team. Documentation had to feel lightweight — not a tax on already stretched researchers.
03 Efficiency
Time-to-find is the metric that matters. Every unnecessary click between "I need data" and "I found it" is a design failure.
04 Collaboration
Research doesn't happen in silos. Communities, shared projects, and explicit ownership make Abstra a social layer over data — not just an index.
Solution 1 of 4 Data Catalog

From blindness to awareness

A dataset page that makes trust legible at a glance

Interviews revealed that researchers weren't just unable to find datasets — they didn't know whether to trust them once they did. Ownership, freshness, and provenance were the three recurring unknowns. The research reframed the design question entirely: it stopped being about displaying a dataset and became about what a researcher needs to know before deciding to use one.

We mapped what researchers check in the first five seconds versus what they need to dig into later. That sequence became the page architecture: availability above the fold, detail in tabs, context in a persistent column.

01
Surfacing the trust gap
From interviews: researchers weren't just unable to find datasets — they didn't know whether to trust them once they did.
02
Structuring the hierarchy of need
Mapped what researchers check first vs. what they dig into. That became the page: availability above the fold, detail in tabs, context in a persistent column.
03
Translating complexity into clarity
The Availability section went through more iterations than any other — not to make it comprehensive, but to make it unambiguous across every user type simultaneously.
Data Catalog — dataset detail page with trust indicators · [Replace with product screenshot]
Dataset page — Lineage tab showing full processing graph · [Replace with screenshot]
Dataset page — History tab with full audit trail · [Replace with screenshot]

“Trust isn’t claimed, it’s demonstrated. Every update. Every version. Every person who touched it and when.”
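
For illustration, a minimal sketch of the trust metadata a dataset page surfaces, with hypothetical field names; the categories themselves (ownership, freshness, provenance, quality checks, audit history) come straight from the research:

```typescript
// Illustrative shape of the trust metadata on a dataset page. Field names
// are hypothetical stand-ins, not the real schema.

interface QualityCheck {
  name: string;
  status: "passed" | "failed" | "stale";
  lastRun: Date;
}

interface AuditEvent {
  actor: string;  // every person who touched it
  action: string; // e.g. "updated schema", "re-ran quality checks"
  at: Date;
}

interface DatasetTrustMetadata {
  owner: string;                 // explicit ownership, never blank
  lastUpdated: Date;             // freshness indicator, above the fold
  derivedFrom: string[];         // provenance: upstream dataset ids (Lineage tab)
  qualityChecks: QualityCheck[]; // quality status legible at a glance
  history: AuditEvent[];         // full audit trail (History tab)
}

// Freshness surfaced as a first-class indicator rather than buried metadata.
const isStale = (d: DatasetTrustMetadata, maxAgeDays = 90) =>
  (Date.now() - d.lastUpdated.getTime()) / 86_400_000 > maxAgeDays;
```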

Solution 2 of 4 Advanced Search

From searching to finding

Built around scientific vocabulary — not file system logic

The problem wasn't a missing search bar. It was that the mental model of researchers and the data model of the system had no shared language. Across 52 interviews, the same terms appeared repeatedly: therapeutic area, indication, modality, cohort size. We catalogued that vocabulary before designing a single filter.

Each of the seven filter dimensions maps to a criterion that appeared in at least 40% of interviews. Nothing was added because it seemed useful — only because researchers said its absence was blocking them. Moving aggregate metrics above the results fundamentally changed the research workflow: researchers assess fit before they explore.

01
Mapping scientific vocabulary
Catalogued recurring terms across 52 interviews before designing a single filter. The vocabulary shaped the interface, not the other way around.
02
Defining filters from evidence
Each of 7 filter dimensions maps to a criterion that appeared in ≥40% of interviews. No filter added on assumption.
03
Elevating the evaluation moment
Moving aggregate metrics above results changed the workflow: researchers assess landscape fit before exploring individual datasets.
Advanced Search — results page with faceted filters and aggregate metrics · [Replace with screenshot]
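
A minimal sketch of the faceted query model, for illustration. The first four dimensions are the ones named in interviews; the field names, the remaining dimensions, and the aggregate choices are hypothetical stand-ins for the real schema:

```typescript
// Illustrative faceted search over the catalogue. Therapeutic area,
// indication, modality, and cohort size come from the interview vocabulary;
// everything else here is a hypothetical placeholder.

interface DatasetRecord {
  id: string;
  therapeuticArea: string;
  indication: string;
  modality: string; // e.g. histology, omics, clinical
  cohortSize: number;
  [key: string]: unknown; // remaining facet dimensions omitted
}

interface SearchQuery {
  therapeuticArea?: string[];
  indication?: string[];
  modality?: string[];
  minCohortSize?: number;
}

// Apply facets, then compute the aggregate metrics shown above the results.
function search(records: DatasetRecord[], q: SearchQuery) {
  const hits = records.filter(
    (r) =>
      (!q.therapeuticArea || q.therapeuticArea.includes(r.therapeuticArea)) &&
      (!q.indication || q.indication.includes(r.indication)) &&
      (!q.modality || q.modality.includes(r.modality)) &&
      (q.minCohortSize === undefined || r.cohortSize >= q.minCohortSize)
  );
  return {
    hits,
    aggregates: {
      datasetCount: hits.length,
      totalCohortSize: hits.reduce((sum, r) => sum + r.cohortSize, 0),
      modalities: [...new Set(hits.map((r) => r.modality))],
    },
  };
}
```

Computing the aggregates over the filtered set, rather than the whole catalogue, is what lets researchers judge landscape fit before opening a single dataset.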
Solution 3 of 4 Integrated Tooling

From scattering to integrating

The hidden cost was leaving the platform to do the actual work

Context-switching wasn't visible in the interview data — it emerged from observation. Researchers had normalised it. Mapping their actual workflows revealed how much time was lost between discovery and use. Rather than building a full analysis environment, we asked: what is the smallest integration that eliminates context-switching for the most frequent cases?

For histology teams, slide inspection was the clear answer. Routing the viewer through the file tree kept discovery and analysis separated while making the tooling accessible. The catalogue's purpose stayed intact. The workflow improved.

01
Quantifying the invisible cost
Context-switching emerged from observation, not interview data. Researchers had normalised it — until we mapped the actual time lost.
02
Defining minimum viable integration
Not what could we build, but what eliminates the most friction for the most users without expanding the product's surface area.
03
Embedding tools in the data structure
Routing tooling through the file tree kept catalogue purpose intact while making analysis immediately accessible.
In-app slide viewer — full-resolution WSI with layer navigation · [Replace with screenshot]
Analytical tools catalogue — launch directly against any dataset · [Replace with screenshot]
Solution 4 of 4 Projects & Collaboration

From isolated to collaborative

Extending the foundation — not duplicating it

After the catalogue shipped, a new friction emerged: researchers were finding datasets but losing the shared context around them — working hypotheses, prior attempts, who else was involved. The decision to scope Projects out of the initial release was as consequential as the decision to build them. Sequencing was itself a design decision.

Extension, not duplication. Projects had to reference the catalogue's trusted data — not copy it, not replace it. That constraint kept the product coherent and the catalogue's integrity intact. The bidirectional model emerged from watching how different researchers actually started their work: neither flow was more "correct", and both needed to be first-class.

01
Identifying the post-launch gap
Researchers found data but lost shared context. Working hypotheses, prior attempts, collaborators — all invisible.
02
Defining the boundary
Projects reference catalogue data — they don't replicate it. One source of truth. Collaboration layered on top, not beside it.
03
Two entry points, one connection
Start from the dataset and push to a project, or start from a project and pull datasets in. Both directions are first-class — neither is a workaround.
Projects workspace — shared environment with bidirectional dataset connection · [Replace with screenshot]
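
A minimal sketch of the reference-not-copy constraint and the two entry points, again with hypothetical names:

```typescript
// Illustrative reference model: projects point at catalogue datasets by id,
// so the catalogue remains the single source of truth. Hypothetical names.

interface Project {
  id: string;
  title: string;
  hypothesis: string;   // the shared context that used to get lost
  datasetIds: string[]; // references into the catalogue, never copies
}

// Entry point 1: from a dataset page, push it into a project.
function addDatasetToProject(project: Project, datasetId: string): Project {
  return project.datasetIds.includes(datasetId)
    ? project
    : { ...project, datasetIds: [...project.datasetIds, datasetId] };
}

// Entry point 2: from a dataset, find every project that references it,
// powering the reverse link on the dataset page.
function projectsUsing(projects: Project[], datasetId: string): Project[] {
  return projects.filter((p) => p.datasetIds.includes(datasetId));
}
```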
07 Results

Impact

The impact was measurable and, more importantly, felt

Within six months of launch, Abstra had become the default entry point for data discovery across all five research squads. Adoption wasn't driven by mandate — it spread because the researchers who used it first told the ones who hadn't.

The metrics confirmed what we already knew from qualitative feedback: the friction hadn't just been reduced. For most workflows, it had been eliminated. Abstra was described internally as “an unexpected success.”

180 researchers using Abstra daily
60+ design system components in active use
5 squads aligned on one shared system
0→1 product built from scratch in 18 months

“The friction hadn’t just been reduced. For most workflows, it had been eliminated.”

08 Stack

Tools used

  • Figma
  • Atomic UX Research
  • Opportunity Solution Tree
  • Storybook
  • Notion
  • Airtable
  • Fullstory
  • Miro
