Agentic tables and chat differentiation for scientists' workflows
Led a prototype exploring how AI could reshape the way scientists interact with research data. Discovered an unexpected usage pattern during testing, validated the concept with scientists and commercial partners, and shaped it into a meaningful direction for the platform.
Establishing the interface model in the Methods hybrid chat-and-table MVP for BenchSci's Experiment Validation value stream gave us clarity and resolved our most urgent needs.
However, ongoing questions remained about how our experience could differentiate from the many other chat products. That led me to take my observations of the MVP so far and ask:
What would it mean if the table itself were agentic?
The observation
This question emerged while designing the core hybrid interface. During user testing, I kept noticing scientists asking questions about the table.
My hypothesis had been that chat and table would split cleanly along diverge/converge lines: chat for broadening, table for deciding. But that's not what happened.
Scientists' follow-up questions were almost always table-specific: filtering rows, adding columns, making comparisons, or asking questions triggered by what they were seeing in the results. Often it wasn't a new overarching question, but rather a question rooted in the evidence in front of them.
Scientists were diverging from within the evidence.
Where scientists actually diverge
Scientists diverge in chat
Follow-up questions happen after reaching a decision:
Query (e.g. TLR4 IHC protocols) → Response (AI directs) → Tool (evidence table) → Decision (e.g. protocol conditions) → Follow-up in chat: "What primary antibodies would work best?"
Scientists diverge at the table
Follow-up questions happen before deciding — driven by what they see in the evidence:
Query (e.g. TLR4 IHC protocols) → Response (AI directs) → Tool (primary antibody performance table) → Diverge here → Follow-up at the table: "What secondary antibodies would work best?" → Decision (primary antibody + conditions)
This behaviour revealed a mismatch between the mental model I had designed for and the one scientists brought as they engaged. It was something neither our initial hypothesis nor our chat supported.
I discovered that for scientists, the process we had assumed was 'evaluation' was actually exploratory. Scientists don't stop asking questions when they reach the evidence. Instead, because of their depth of knowledge, the evidence tended to be where the questions most impactful to their work actually start.
It also presented a unique differentiation opportunity that would further take advantage of our data moat.
The opportunity
Forcing scientists to context-switch to chat every time they wanted to ask a question about the table imposed a cognitive cost and created an expectation gap for a chat that didn't work that way. The chat could, at best, run a new query that might return a new table, leaving the challenge of re-anchoring solely on the scientist.
Scientists scanning a table are holding a lot in working memory: which results they've already assessed, which criteria matter most for this particular decision, which rows look promising and why. Shifting the visual anchor (specific data on that table) would make that scientifically rigorous evaluation much more challenging.
The question became: what would it look like if the table itself was agentic — if scientists could extend, filter, and interrogate their evidence without ever leaving it?
Differentiation opportunity → Agentic Tool
Approach: Build to think
I reasoned that this was ultimately a question about the interaction model and agentic orchestration, neither of which would be well demonstrated or tested in a static or even clickable prototype.
A Figma prototype of an agentic table would have simulated what the interaction looked like. A coded prototype had to actually behave, which meant I'd be testing real LLM response parsing, latency, response variability, and real confusion points rather than an idealized happy path.
Built with v0 (initial UI and scaffolding), Claude (functionality support), Gemini (API for speed and cost), and Cursor (code editing and iteration).
This interaction model depended on the quality of the feedback loop between user action and system response; that dependency is what ultimately led me to build a functional prototype with v0, Claude, and Cursor rather than Figma.
Step 1: Orchestration via a Context Bus
Building with AI taught me to plan dataflow, structure, and scalability upfront, and to clearly define what the prototype needed to prove versus what could stay rough.
For this concept, I was testing interaction depth, not visual fidelity, so I deliberately avoided over-aligning to the BenchSci DS to prevent context bloat. I mapped out the user flow, dataflow, and agentic orchestration early, connecting conceptually to our Search APIs or a bespoke MCP, but using synthetic data to keep complexity manageable.
My original scope was just divergence from a table, but I quickly hit a core challenge: how does an agent maintain context of both the table and the user's original intent? This pushed me to anchor the entire orchestration around the scientist's origin query, surfaced from the start of their session.
Since LLMs are probabilistic, reasoning chains drift. A context bus solved this by preserving the original query as persistent context that all downstream agents could reference.
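As a minimal sketch of the idea — names and structure are my own illustration, not the prototype's actual code — the context bus can be a small object that pins the origin query and prefixes it onto every downstream agent call:

```python
class ContextBus:
    """Pins the scientist's origin query so downstream agents can't drift from it."""

    def __init__(self, origin_query: str):
        self.origin_query = origin_query   # persistent; never overwritten mid-session
        self.steps: list[dict] = []        # audit trail of pipeline steps

    def record(self, step: str, output: object) -> None:
        # Each pipeline step logs its output alongside the preserved intent.
        self.steps.append({"step": step, "output": output})

    def prompt_context(self) -> str:
        # Every agent prompt is prefixed with the original research question.
        return f"Original research question: {self.origin_query}"


bus = ContextBus("TLR4 IHC protocols")
bus.record("processQuery", "TLR4 IHC protocols")
```

Because the bus, not any single agent, owns the origin query, a drifting reasoning chain can always be re-grounded by reading it back.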
From there, "adding a column" became the interaction to test: it keeps the table as a visual anchor while enabling meaningful exploratory divergence.
Chat and raw data views were considered but ruled out for adding complexity without serving the core hypothesis.
Mini data pipeline — original intent preserved at every step
Pipeline steps → Outputs
processQuery(query) — the scientist's natural-language question enters and is pinned to the bus for every downstream step. Output: query.
From a UX perspective, the context bus is what makes the table feel aware of what the scientist is doing. Staying oriented to their original intent means less re-anchoring and freer exploration.
The tradeoff is that persistent context can be limiting for scientists who genuinely want to pivot mid-session. Ideally, an agent would recognize that shift automatically, but in this prototype, a new search handles it, which is already a familiar mental model.
Step 2: Functional Prototype
Aside from keeping the visuals deliberately rough, I also decided to under-tune the prompts to allow greater flexibility of input as I socialized this interaction model.
This meant that non-scientific product, commercial, and engineering partners could input a query from a field they're interested in, and experience the agentic table with relevant information.
The initial query sets up the context bus and creates the initial table. The visible pipeline gave users clear progress, countering the perception of latency.
Adding a column generates information based on the row reference and the context bus, updating the table with the new column and, in parallel, refreshing the summary.
Viewing a cell generates an insight that is specific to that table and query context. This enables a more personalized experience.
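The add-column interaction above can be sketched roughly as follows. This is a simplification under stated assumptions: `llm_generate` is a hypothetical stand-in for the real model call, and the row/column names are invented for illustration. Each new cell is generated from the row's identity plus the persistent origin query:

```python
def llm_generate(prompt: str) -> str:
    # Hypothetical stand-in for the real LLM API used in the prototype.
    return f"[generated for: {prompt}]"


def add_column(table: list[dict], column: str, origin_query: str) -> list[dict]:
    """Fill a new column cell by cell, anchored to each row and the origin query."""
    for row in table:
        prompt = (
            f"Original research question: {origin_query}. "
            f"For '{row['name']}', provide: {column}"
        )
        row[column] = llm_generate(prompt)
    return table


# Synthetic rows, mirroring the prototype's use of synthetic data.
table = [{"name": "Antibody A"}, {"name": "Antibody B"}]
add_column(table, "validated species", "TLR4 IHC protocols")
```

Generating cell values row by row is what lets the column stay grounded in the evidence already on screen rather than spawning a fresh, disconnected query.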
Initial user feedback
I tested this with 10 scientists and 5 internal non-scientists. Given the new interaction model and the deliberate lack of polish and visual feedback, I expected scientists to pause and ask questions; this was intentional, so I could dig into how users thought.
Instead, all of the users used the table efficiently, only stopping with a "Wow!" after seeing what adding a column did.
I had underestimated how much expert users love their tables.
In the prototype, scientists immediately scrolled to orient around the columns, and when adding one, instinctively scrolled right expecting it to appear there. The column's dynamic, flexible nature caught them off guard, but as a delightful surprise.
Adding columns, though unfamiliar, fit naturally into their thinking and workflow.
Commercial partners resonated so strongly that they surfaced a new opportunity: template views that could accelerate demos and help drug discovery teams orient around consistent information. This would be similar to a shared spreadsheet, but more dynamic.
An oncology scientist prioritizing reproducibility across cell lines needs different dimensions surfaced than a neuroscientist focused on reagent specificity.
This led to a template button running queries through a simplified context bus, which also opened the door to latency improvements and pre-caching.
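One way to sketch the template idea — discipline names, column sets, and the caching approach are all my illustrative assumptions — is a fixed set of columns routed through a simplified bus, with results cached per query so repeat demo runs skip the pipeline:

```python
from functools import lru_cache

# Hypothetical discipline templates: which dimensions to surface by default.
TEMPLATES = {
    "oncology": ["cell line reproducibility", "assay type", "citation count"],
    "neuroscience": ["reagent specificity", "species", "dilution"],
}


@lru_cache(maxsize=128)
def run_template(template: str, query: str) -> tuple:
    # Keying the cache on (template, query) makes repeated demos near-instant.
    columns = TEMPLATES[template]
    return tuple(f"{query}: {c}" for c in columns)


first = run_template("oncology", "TLR4 IHC protocols")
again = run_template("oncology", "TLR4 IHC protocols")  # served from cache
```

The same mechanism is what opens the door to pre-caching: popular template-and-query pairs can be warmed ahead of a demo or onboarding session.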
Step 3: Re-Converging with Charts
As I learned from the previous Methods MVP, evidence was essential to scientist trust.
Tables were an excellent surface for that evidence, but dense. They were good for comparing attributes across rows, yet poor at revealing patterns across the full result set: distributions, clusters, correlations, outliers.
Scientists were doing that pattern detection manually.
The agentic table helped scientists diverge while staying anchored in context, but it didn't help them converge. To address this, I added chart visualizations matched to that cognitive task, starting with four charts for different comparative purposes.
The added visual complexity also required closing the polish gap to keep the overall experience feeling cohesive.
Filtering
Built on the same data as the table, charts could be filtered — and extended to support account-specific filters.
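Because the charts draw on the same rows as the table, a filter can be a single predicate applied once and reflected in both surfaces. A minimal sketch, with field names invented for illustration:

```python
def apply_filter(rows: list[dict], field: str, allowed: set) -> list[dict]:
    """One shared filter feeds both the table and any chart built on it."""
    return [r for r in rows if r.get(field) in allowed]


# Synthetic rows standing in for the table's evidence data.
rows = [
    {"antibody": "A", "species": "mouse", "citations": 12},
    {"antibody": "B", "species": "human", "citations": 40},
    {"antibody": "C", "species": "mouse", "citations": 7},
]

mouse_only = apply_filter(rows, "species", {"mouse"})
# The same filtered rows can then back a distribution chart of citations.
citation_values = [r["citations"] for r in mouse_only]
```

An account-specific filter would be the same predicate sourced from account settings rather than user input, which is why extending the mechanism was cheap.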
Step 4: Table-aware chat
Our platform's primary interface was chat; this prototype's was a table. Chat's utility was clear, but how, or whether, it should interact with the table wasn't.
I wanted to answer that question, which meant prototyping a chat contextually aware of the table and the context bus. This led me to create a separate context bus for chat.
Table-aware chat — table state carried as context at every step
Pipeline steps → Outputs
captureTableState(columns, rows, filters) — snapshots the current table (active columns, visible rows, applied filters) into the bus before any chat operation begins. Output: tableState.
I added chat as a table action to keep focus anchored there, intentionally limiting scope while still showing how chat could affect the table.
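The capture step described above can be sketched as a snapshot function — a simplification of the prototype, with the dictionary shape assumed for illustration. The current columns, visible rows, and filters are frozen into the chat's bus before any chat turn:

```python
def capture_table_state(columns, rows, filters) -> dict:
    """Snapshot the live table so chat answers stay grounded in what's on screen."""
    return {
        "columns": list(columns),
        "visible_rows": [r["name"] for r in rows],
        "filters": dict(filters),
    }


state = capture_table_state(
    columns=["antibody", "species"],
    rows=[{"name": "Antibody A"}, {"name": "Antibody B"}],
    filters={"species": "mouse"},
)
# `state` is prepended to the chat prompt so replies can reference actual rows.
```

Snapshotting, rather than letting chat read the live table, keeps each chat turn reproducible even as the table continues to change underneath it.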
The original chat directed scientists to evidence. Table-aware chat extended that to help them work within it.
Validation
I tested the prototype with my team, scientists, and commercial stakeholders. The clearest signal came from the column-add mechanic: scientists immediately reached for dimensions specific to their work, despite knowing the table used synthetic data.
They were mentally modeling it as a real research tool, despite it being a prototype!
Commercial stakeholders noted the UX differentiation from competing chat products, and the combination of templates and an agentic table also addressed a service design gap in demos and onboarding, ultimately hinting at an opportunity for a more bespoke enterprise UI model altogether.
Result
What started as an exploration solidified into a coherent paradigm for evidence-first agentic interfaces. With the interaction model and differentiation case validated, it's now on the platform roadmap as an active exploration as the data model matures.
Reflection
Building in code was the most impactful decision I made. A Figma prototype simulates behavior. A coded prototype, however imperfect, actually lets you test the interaction model.
AI coding tools compressed that loop further, letting me follow my instincts quickly, though every iteration was ultimately shaped by watching what scientists tried to do.
If I did this again, I'd bring commercial stakeholders in earlier to better understand the service challenges alongside the user ones.
The bigger lesson was learning that designing AI products is only partly about the interface.
The dataflow, pipeline, schema, and agentic architecture are equally essential design decisions: they shape how the AI should behave, what it should know, and what it should communicate about itself.