Aneesh Sathe


Why Every Biotech Research Group Needs a Data Lakehouse

July 29, 2025

Start tiny and scale fast without vendor lock-in

All biotech labs have data, tons of it. The problem is the same across scales: accessing data across experiments is hard. Often data simply gets lost on somebody’s laptop, with a pretty plot on a poster as the only clue it ever existed. The problem is almost insurmountable if you try to track multiple data types. Running any kind of data management used to carry a large overhead. New technology like DuckDB and its new data lakehouse infrastructure, DuckLake, tries to make it easy to adopt and to scale with your data, all while avoiding vendor lock-in.


The data dilemma in modern biotech #

High-content microscopy, single-cell sequencing, ELISAs, flow-cytometry FCS files, lab-notebook PDFs—today’s wet-lab output is a torrent of heterogeneous, PB-scale assets. Traditional “raw-files-in-folders + SQL warehouse for analytics” architectures break down when you need to query an image-derived feature next to a CRISPR guide list under GMP audit. A lakehouse merges the cheap, schema-agnostic storage of a data lake with the ACID guarantees, time-travel, and governance of a warehouse—on one platform. Research teams, whether at the discovery or clinical-trial stage, can enjoy faster insights, lower duplication, and smoother compliance when they adopt a lakehouse model.

Lakehouse super-powers for biotech #

  • Native multimodal storage: Keep raw TIFF stacks, Parquet tables, FASTQ files, and instrument logs side-by-side while preserving original resolution.
  • Column-level lineage & time-travel: Reproduce an analysis exactly as of “assay-plate upload on 2025-07-14” for FDA, EMA, or GLP audits.
  • In-place analytics for AI/ML: Push DuckDB/Spark/Trino compute to the data; no ETL ping-pong before model training.
  • Cost-elastic scaling: Store on low-cost S3/MinIO today; spin up GPU instances tomorrow without re-ingesting data.
  • Open formats: Iceberg/Delta/Hudi (and now DuckLake) keep your Parquet files portable and your exit costs near zero.

DuckLake: an open lakehouse format to prevent lock-in #

DuckLake is still pretty new and isn’t quite production-ready, but the team behind it is the same one that builds DuckDB, and I expect they will deliver high quality as 2025 progresses. Data lakes, or even lakehouses, are not new at all. Iceberg and Delta pioneered open table formats, but they still scatter JSON/Avro manifests across object storage and bolt on a separate catalog database. DuckLake flips the design: all metadata lives in a normal SQL database, while data stays in Parquet on blob storage. The result is simpler, faster, cross-table ACID transactions—and you can back the catalog with Postgres, MySQL, MotherDuck, or even DuckDB itself.
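To make the shape of this concrete, here is a minimal sketch of what a laptop-scale DuckLake setup could look like from Python. The catalog path, data path, assay table, and CSV file are all invented, and DuckLake is still young enough that the exact syntax may shift, so treat this as an illustration to check against the current docs rather than a recipe:

```python
import duckdb

con = duckdb.connect()
con.execute("INSTALL ducklake;")
con.execute("LOAD ducklake;")

# Metadata lives in a small SQL catalog file; data lands as Parquet files
# under the data path (a local folder here, object storage later).
con.execute("ATTACH 'ducklake:lab_catalog.ducklake' AS lab (DATA_PATH 'lab_data/');")

# Register an assay table like any other SQL table.
con.execute("""
    CREATE TABLE lab.assay_results AS
    SELECT * FROM read_csv_auto('plate_42_readings.csv');
""")

print(con.sql("SELECT count(*) FROM lab.assay_results").fetchall())
```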

Key take-aways: #

  • No vendor lock-in: Because operations are defined as plain SQL, any SQL-compatible engine can read or write DuckLake—good-bye proprietary catalogs.
  • Start on a laptop, finish on a cluster: DuckDB + DuckLake runs fine on your MacBook; point the same tables at MinIO-on-prem or S3 later without refactoring code.
  • Cross-table transactions: Need to update an assay table and its QC log atomically? One transaction—something Iceberg and Delta still treat as an “advanced feature.”
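Here is a self-contained sketch of what that atomic assay-plus-QC update could look like. Every table, column, and path is made up for illustration, and whether a given DML statement is supported may depend on the DuckLake release you are on, so verify against the current documentation:

```python
import duckdb

con = duckdb.connect()
con.execute("INSTALL ducklake;")
con.execute("LOAD ducklake;")
# Swap the local path for an 's3://...' bucket (plus the httpfs extension and
# credentials) when you outgrow the laptop; the tables and queries stay the same.
con.execute("ATTACH 'ducklake:lab_catalog.ducklake' AS lab (DATA_PATH 'lab_data/');")

con.execute("""
    CREATE TABLE IF NOT EXISTS lab.assay_results (
        plate_id TEXT, well TEXT, signal DOUBLE, flagged BOOLEAN
    );
""")
con.execute("CREATE TABLE IF NOT EXISTS lab.qc_log (plate_id TEXT, note TEXT);")

# Flag a plate and record why, atomically: either both writes land or neither does.
con.execute("BEGIN TRANSACTION;")
con.execute("UPDATE lab.assay_results SET flagged = TRUE WHERE plate_id = 'P042';")
con.execute("INSERT INTO lab.qc_log VALUES ('P042', 'edge-well evaporation, re-run');")
con.execute("COMMIT;")
```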

Psst… if you don’t understand or don’t care what ACID, manifests, or object stores mean, assign a grad student; it’s not complicated.


Work or Play? Ludic Feedback Loops

July 28, 2025

In his Substack post today, Venkatesh Rao wrote about reading and writing in the age of LLMs as playing with toys and making toys, respectively. In one part he writes about how the dopamine feedback loop from writing drove his switch from engineering to writing. For him, writing has ludic, play-like qualities.


I have made almost all my “career” decisions as a function of play. I originally started off with a deep love of plants: how to grow them and their impact on the world. I was convinced I was going to have a lot of fun. I did have some. My wonderful undergrad professor literally hand-held me through my first experiments growing tobacco plants from seeds. But that was about it. My next experiment was with woody plants, and growing the seeds alone took 6 months; by the end I had 4 measly leaves to experiment with. I quickly switched to cell biology.

This one went a bit better, and I stayed with the medium through my PhD. Although I was having sufficient aha moments, I knew in the first year that it was still a bit slow. What rescued me was my refusal to do manual analysis. I loved biology, but I refused to sit and do analysis by hand. Luckily, I had picked up sufficient programming skills.

I could reasonably automate the analysis workflow. It was difficult at first, but the error messages came at the rate I needed them to. I found new errors viscerally rewarding; it was now in game territory. The analysis still held meaning; it wasn’t for some random A/B testing or some LeetCode thing. No, this mattered.

Machine learning, deep learning, LLMs, and their applications in bio continue to enchant me. I can explore even more with the same effort and time. I interact with biology at the rate of dopamine feedback I need. I have found my ludic frequency.


On Protocols, Wagons, and Associated Acrobatics

July 27, 2025

Years ago, maybe a decade even, I fell in love with this software called Scrivener. I could never justify buying it because I didn’t actually write. But having that software would represent a little bit of the identity I would like to have: a writer. The Fourth of July long weekend gave me a running start. The plan was to write every day for a month. If I did, I would buy Scrivener. This was going quite well; then I couldn’t write for two days.

I had fallen off the wagon. But hey, I have a wagon. Writing for twenty days isn’t nothing. Like David Allen says, getting back on the wagon is what it’s all about. Falling off happens because life happens. And life happens to everybody. So, hey, I’m back.

I almost wasn’t. I almost said oh well. Then I watched the Summer of Protocols (SoP) town hall talk by Robert Peake: The Infinite Game of Poetry - Protocols for Living, Listening, and Transcending the Rules. The infinite game of poetry is the infinite game of writing. The important bit is to keep playing*.

Since this is part of the SoP, the question, of course, is: what is the protocol? Robert goes much deeper than just the protocol of writing poetry and being a poet. He gives two equations, one for doing your life’s work and one for building the self. I won’t reproduce those equations here; you should watch the talk.
Here’s the gist of the poeting/writing protocol though:

  • To be a poet is to observe the change in self: even when you are not writing, you are noticing your inner environment, your outer environment, and what you have read.
  • When you start to write, the change in self produces the writing: synthesis.
  • The writing is now part of the change in the self.
  • The sum of all noticing and synthesis is your life’s work.
  • The self is constructed on the last day, Robert says, but I think it’s constructed continuously, through all the iterations of work.

Tyler Cowen, who is, if nothing else, prolific, wrote a similar, though not as compact, set in 2019.

Zooming out, this applies to all work, not just writing. Showing up and getting back on the wagon is where it all coalesces. But where am I going? To me, building wagons is as important as going somewhere with potential for something new, even if the path is uncertain. Pointing in the direction of maximal interestingness.

This need for exploration and the support from constancy are captured well in the song Life in a Wind:
“One foot in front of the other, all you gotta do, brother
[…]
Live life in the wind, take flight on a whim”


* The Scrivener team seems to understand this well. Their trial isn’t a consecutive thirty days, but thirty days of use :)


Briefing: The State of Explainable AI (XAI) and its Impact on Human-AI Decision-Making

July 24, 2025


This post is a sloptraption, my silk thread in the CloisterWeb. The post was made with the help of NotebookLM. You can chat with the essay and the sources here: XAI NotebookLM Chat


I. Executive Summary #

The field of Explainable AI (XAI) aims to make AI systems more transparent and understandable, fostering trust and enabling informed human-AI collaboration, particularly in high-stakes decision-making. Despite significant research efforts, XAI faces fundamental challenges, including a lack of standardized definitions and evaluation frameworks, and a tendency to prioritize technical “faithfulness” over practical utility for end-users. A new paradigm emphasizes designing explanations as a “means to an end,” grounded in statistical decision theory, to improve concrete decision tasks. This shift necessitates a human-centered approach, integrating human factors engineering to address user cognitive abilities, potential pitfalls, and the complexities of human-AI interaction. Practical challenges persist in implementation, including compatibility, integration, performance, and, crucially, inconsistencies (disagreements) among XAI methods, which significantly undermine user trust and adoption.


II. Core Concepts and Definitions #

  • Explainable AI (XAI): A research area focused on making AI system behaviors and decisions understandable to humans, aiming to increase trustworthiness, transparency, and usability. The term itself gained prominence around 2016, though the need for explainability in AI has existed for decades.
  • Contextual Importance and Utility (CIU): A model-agnostic, universal foundation for XAI based on Decision Theory. CIU extends the traditional linear notions of “importance” (of an input) and “utility” (of an input value toward an outcome) to non-linear AI models. It explicitly quantifies how the importance of an input and the utility of its values change based on other input values (the “context”).
  • Contextual Importance (CI): Measures how much modifying a given set of inputs in a specific context affects the output value.
  • Contextual Utility (CU): Quantifies how favorable (or unfavorable) a particular input value is for the output in a given context, relative to the minimal and maximal possible output values. (A toy calculation of CI and CU follows this list.)
  • Distinction from Additive Feature Attribution Methods (e.g., LIME, SHAP): CIU is theoretically more sound for non-linear models as it considers the full range of input variations, not just local linearity (partial derivatives). Additive methods lack a “utility” concept and might produce misleading “importance” scores in non-linear contexts.
  • Decision Theory: “A branch of statistical theory concerned with quantifying the process of making choices between alternatives.” It provides clear definitions of input importance and utility, intended to support human decision-making.
  • Human Factors Engineering (HFE): An interdisciplinary field focused on optimizing human-system interactions by understanding human capabilities and limitations. It aims to design systems that enhance usability, safety, and efficiency, and is crucial for creating human-centered AI.
  • Key HFE Principles: User-Centered Design, Minimizing Cognitive Load, Consistency and Predictability, Accessibility and Inclusivity, Error Prevention and Recovery, Psychosocial Considerations, Simplicity and Clarity, Flexibility and Efficiency, and Feedback.
  • Explainability Pitfalls (EPs): Unanticipated negative downstream effects from adding AI explanations that occur without the intention to manipulate users. Examples include misplaced trust, over-estimating AI capabilities, or over-reliance on certain explanation forms (e.g., unwarranted faith in numerical explanations due to cognitive heuristics). EPs differ from “dark patterns,” which are intentionally deceptive.
  • Responsible AI (RAI): A human-centered approach to AI that “ensures users’ trust through ethical ways of decision making.” It encompasses several core pillars:
  • Ethics: Fairness (non-biased, non-discriminating), Accountability (justifying decisions), Sustainability, and Compliance with laws and norms.
  • Explainability: Ensuring automated decisions are understandable, tailored to user needs, and presented clearly (e.g., through intuitive UIs).
  • Privacy-Preserving & Secure AI: Protecting data from malicious threats and ensuring responsible handling, processing, storage, and usage of personal information (security is a prerequisite for privacy).
  • Trustworthiness: An outcome of responsible AI, ensuring the system behaves as expected and can be relied upon, built through transparent, understandable, and reliable processes.
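As promised above, here is a toy numeric sketch of CI and CU for a single input of a small non-linear model. The model, context values, and grid resolution are all invented for illustration; the arithmetic simply follows the CI/CU definitions above, with a score-like output in [0, 1] so the global output range is 1:

```python
import numpy as np

def model(x1, x2):
    # An arbitrary non-linear scoring function with outputs in [0, 1].
    return 1 / (1 + np.exp(-(3 * x1 * x2 - 1.5 * x2 + 0.5)))

# Context: the instance to explain; inputs assumed to live in [0, 1].
context = {"x1": 0.8, "x2": 0.3}

# Vary x1 over its full range while holding x2 at its context value.
x1_grid = np.linspace(0, 1, 101)
outputs = model(x1_grid, context["x2"])
cmin, cmax = outputs.min(), outputs.max()
out_here = model(context["x1"], context["x2"])

absmin, absmax = 0.0, 1.0  # global output range for a probability-like score

ci = (cmax - cmin) / (absmax - absmin)  # how much x1 can move the output in this context
cu = (out_here - cmin) / (cmax - cmin)  # how favorable the current x1 value is

print(f"CI(x1) = {ci:.2f}, CU(x1) = {cu:.2f}")
```

A full CIU analysis would also vary sets of inputs jointly and sample more carefully; this only shows the shape of the two quantities.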

III. Main Themes and Important Ideas #

A. The Evolution and Current Shortcomings of XAI Research

  • Historical Context: The need for explainability in AI is not new, dating back to systems like MYCIN in 1975, which struggled to explain numerical model reasoning. Early efforts focused on “intrinsic interpretability” or “interpretable model extraction” (extracting rules from models), while “post-hoc interpretability” (explaining after the fact) was proposed as early as 1995 but initially neglected.
  • Modern Re-emergence and Limitations: The term “Explainable AI (XAI)” was popularized around 2016, but current research often “tends to ignore existing knowledge and wisdom gathered over decades or even centuries by other relevant domains.” Most XAI work relies on “researchers’ intuition of what constitutes a ‘good’ explanation, while ignoring the vast and valuable bodies of research in philosophy, psychology, and cognitive science of how people define, generate, select, evaluate, and present explanations.”
  • Focus on Technical Metrics over User Utility: Many XAI papers prioritize “internal validity like deriving guarantees on ‘faithfulness’ of the explanation to the model’s underlying mechanisms,” rather than focusing on how explanations improve human task performance. This can lead to methods that are “non-robust or otherwise misleading.”
  • The “Disagreement Problem”: A significant practical challenge where different XAI methods (e.g., SHAP, LIME) generate “conflicting explanations that lead to feature attributions and interpretability inconsistencies,” making it difficult for developers to trust any single explanation. This is reported as the most severe challenge by practitioners, despite being less frequently reported as an initial technical barrier.

B. The “Means to an End” Paradigm for XAI

  • Explanations as Decision Support: A core argument is that “explanations should be designed and evaluated with a specific end in mind.” Their value is measured by the “expected improvement in performance on the associated task.”
  • Formalizing Use Cases as Decision Problems: This framework suggests representing tasks as “decision problems,” characterized by actions under uncertainty about the state of the world, with a utility function scoring action-state pairs. This forces specificity in claims about explanation effects.
  • Value of Information: Explanations are valuable if they convey information about the true state to the agent, either directly (e.g., providing posterior probability) or indirectly (helping the human better integrate existing information into their decision).
  • Three definitions of explanation value (a toy numeric illustration follows this list):
  1. Theoretic Value of Explanation (∆E): The maximum possible performance improvement an idealized, rational agent could gain from accessing all instance-level features (over no information). This acts as a sanity check: if this value is low, the explanation is unlikely to help boundedly rational humans much.
  2. Potential Human-Complementary Value of Explanation (∆Ecompl): The potential improvement the rational agent could gain from features beyond what’s already contained in human judgments.
  3. Behavioral Value of Explanation (∆Ebehavioral): The actual observed improvement in human decision performance when given access to the explanation, compared to not having it (measured via randomized controlled experiments).
  • Critique of Idealized Agent Assumption: While explanations offer no additional value to an idealized Bayesian rational agent (as they are a “garbling” of existing information), they are crucial for imperfect human agents who face cognitive costs or may be misinformed or misoptimizing.
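To make the decision-problem framing concrete, here is a toy example of the expected-utility bookkeeping behind these ∆E-style quantities. The states, actions, utilities, prior, and signal likelihoods are all invented numbers; the point is only that the value of an explanation-like signal is the gain in best achievable expected utility it enables:

```python
import numpy as np

states = ["benign", "disease"]
actions = ["dismiss", "escalate"]
prior = np.array([0.8, 0.2])          # P(state)

# utility[action, state]
utility = np.array([
    [ 0.0, -10.0],   # dismiss: fine if benign, costly if disease
    [-1.0,   5.0],   # escalate: small cost if benign, big win if disease
])

def best_expected_utility(belief):
    # Expected utility of each action under the belief; pick the best action.
    return float((utility @ belief).max())

# Baseline: act on the prior alone.
eu_prior = best_expected_utility(prior)

# With a signal (e.g. an explanation surfacing a decisive feature):
# rows are signal values, columns are states, entries are P(signal | state).
likelihood = np.array([
    [0.9, 0.2],   # signal = "looks benign"
    [0.1, 0.8],   # signal = "looks suspicious"
])

eu_signal = 0.0
for s in range(likelihood.shape[0]):
    p_signal = likelihood[s] @ prior
    posterior = likelihood[s] * prior / p_signal
    eu_signal += p_signal * best_expected_utility(posterior)

print(f"EU without signal: {eu_prior:.2f}")
print(f"EU with signal:    {eu_signal:.2f}")
print(f"Value of the signal (analogue of ∆E): {eu_signal - eu_prior:.2f}")
```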

C. The Critical Role of Human Factors and Human-Centered AI

  • Bridging Algorithmic Complexity and Human Understanding: HFE is essential to “bridge algorithmic complexity with actionable understanding” by ensuring AI systems align with human cognitive abilities and behavioral patterns.
  • Addressing Unintentional Negative Effects (EPs): HFE provides strategies to anticipate and mitigate EPs, such as designing for “user reflection (as opposed to acceptance)” by promoting “mindful and deliberative (system 2) thinking.”
  • Case Study (Numerical Explanations): A study revealed that both AI experts and non-experts exhibited “unwarranted faith in numbers” (numerical Q values for robot actions), perceiving them as signaling intelligence and potential actionability, even when their meaning was unclear. This demonstrates an EP where well-intentioned numerical transparency led to misplaced trust.
  • Seamful Design: A proposed HFE design philosophy that “strategically reveal relevant information that augments system understanding and conceal information that distracts.” This promotes reflective thinking by introducing “useful cognitive friction,” for example, through interactive counterfactual explanations (“what-if” scenarios).
  • Iterative Design and Stakeholder Engagement: Addressing EPs requires an “iterative approach that allows insights from evaluation to feedback to design,” involving “users as active partners” through participatory design methods.
  • Reframing AI Adoption: HFE advocates for a mindset shift from uncritical “acceptance-driven AI adoption” to “critical reflection,” ensuring AI is “worthy of our trust” and that users are aware of its capabilities and limitations. This resists the “move fast and break things” mentality.
  • Human-AI Relationship in Decision-Making: For high-stakes decisions, AI systems should be seen as “empowerment tools” where the human decision-maker retains responsibility and needs to “justify their decision to others.” XAI is key to making the AI’s role clear and building trust.
  • “Justification” vs. “Explanation”: Some differentiate explanation (understanding AI’s intrinsic processes) from justification (extrinsic information to support AI’s results, e.g., patient history, contrastive examples). Both are crucial for human decision-makers.
  • Mental Models: Effective human-AI collaboration relies on humans developing appropriate mental models of the AI system’s capabilities and limitations. XAI should facilitate this “human-AI onboarding process.”

D. Practical Challenges in XAI Adoption and Solutions

  • Catalog of challenges (from Stack Overflow analysis):
  1. Model Integration Issues (31.07% prevalence): Difficulty embedding XAI techniques into ML pipelines, especially with complex models.
  2. Visualization and Plotting Issues (30.01% prevalence): Problems with clarity, interpretability, and consistency of visual XAI outputs.
  3. Compatibility Issues (20.36% prevalence): XAI techniques failing across different ML frameworks or hardware due to mismatches.
  4. Installation and Package Dependency Issues (8.14% prevalence): Difficulties in setting up XAI tools due to conflicts or poor documentation.
  5. Performance and Resource Issues (6.78% prevalence): High computational costs and memory consumption.
  6. Disagreement Issues (2.11% prevalence, but most severe): Conflicting explanations from different XAI methods.
  7. Data Transformation/Integration Issues (1.50% prevalence): Challenges in formatting or merging data for XAI models.
  • Perceived Severity vs. Prevalence: While Model Integration and Visualization/Plotting are most prevalent as technical hurdles, Disagreement Issues are perceived as the most severe by practitioners (36.54% rank highest), as they undermine trust and effective decision-making once tools are implemented.
  • Recommendations for Improvement: Practitioners prioritize:
  • Better Documentation and Tutorials (55.77% strongly agree): Clear, structured guides.
  • Clearer Guidance on Best Practices (48.07% strongly agree): Standardized methodologies.
  • Simplified Configuration and Setup (40.38% strongly agree): Easier onboarding.
  • User-Friendly Interfaces and Improved Visualization Tools: More intuitive and interactive tools.
  • Enhanced Integration with Popular ML Frameworks and Performance Optimization.
  • Addressing Disagreement and Consistency: Acknowledge disagreements and guide users in selecting reliable explanations.

IV. Gaps and Future Directions #

  • Lack of Standardization: XAI still lacks standardized definitions, metrics, and evaluation frameworks, hindering consistent assessment and comparison of methods.
  • Limited Empirical Validation: More situated and empirically diverse human-centered research is needed to understand stakeholder needs, how different user characteristics (e.g., expertise, background) impact susceptibility to EPs, and how explanations are appropriated in unexpected ways.
  • Beyond “Accuracy”: Future research should go beyond basic performance metrics to holistically evaluate human-AI relationships, including reliance calibration, trust, and understandability.
  • Taxonomy of EPs: Developing a taxonomy of explainability pitfalls to better diagnose and mitigate their negative effects.
  • Longitudinal Studies: Needed to understand the impact of time and repeated interaction on human-AI decision-making and trust dynamics.
  • Interdisciplinary Collaboration: Continued and enhanced collaboration among HFE, cognitive science, and AI engineering is crucial to develop frameworks that align AI decision-making with human cognitive and operational capabilities, and to address ethical and accountability challenges comprehensively.
  • Benchmarking for Responsible AI: Creation of benchmarks for various responsible AI requirements (ethics, privacy, security, explainability) to quantify their fulfillment.
  • “Human-in-the-loop”: Further development of this concept within responsible AI, emphasizing the human’s role in checking and improving systems throughout the lifecycle.
  • Trade-offs: Acknowledge and manage inherent trade-offs between different responsible AI aspects (e.g., robustness vs. explainability, privacy vs. accuracy).

V. Conclusion #

The transition of AI from low-stakes to high-stakes domains necessitates a robust and human-centric approach to explainability. Current XAI research must evolve beyond purely technical considerations to embrace principles from Decision Theory and Human Factors Engineering. The development of frameworks like CIU and the rigorous evaluation of explanations as “means to an end” for specific decision tasks are critical steps. Addressing practical challenges identified by practitioners, especially the pervasive “disagreement problem” and the occurrence of “explainability pitfalls,” is paramount. Ultimately, achieving Responsible AI requires a dynamic, interdisciplinary effort that prioritizes human understanding, trust, and ethical considerations throughout the entire AI lifecycle, ensuring AI serves as an effective and accountable partner in human decision-making.


AI: Explainable Enough

July 23, 2025

“They look really juicy,” she said. I was sitting in a small room with a faint chemical smell, doing one of my first customer interviews. There is a sweet spot between going too deep and asserting a position. Good AI has to be just explainable enough to satisfy the user without overwhelming them with information. Luckily, I wasn’t new to the problem.


Coming from a microscopy and bio background with a strong inclination towards image analysis, I had picked up deep learning as a way to be lazy in the lab. Why bother figuring out features of interest when you can have a computer do it for you? That was my angle. The issue was that in 2015 no biologist would accept any kind of deep-learning analysis, and definitely not if you couldn’t explain the details.

What the domain expert user doesn’t want:

  • An explanation of how a convolutional neural network works. Confidence scores, loss, and AUC are all meaningless to a biologist, and also to a doctor.

What the domain expert desires: 

  • Help at the lowest level of detail that they care about. 
  • An AI that identifies features A, B, and C, and says that when you see A, B, and C together it is likely to be disease X.

Most users don’t care how deep learning really works. So if you start giving them details like the IoU score of the object-detection bounding box, or whether it was YOLO or R-CNN that you used, their eyes will glaze over and you will never get a customer. Draw a bounding box, heat map, or outline with the predicted label, and stop there. It’s also bad to go to the other extreme. If the AI just states the diagnosis for the whole image, it might be right, but the user does not get to participate in the process. Not to mention that regulatory risk goes way up.

This applies beyond images; consider LLMs. No one with any expertise likes a black box. Today, why do LLMs generate code instead of directly doing the thing that the programmer is asking them to do? It’s because the programmer wants to ensure that the code “works,” and they have the expertise to figure out if and when it goes wrong. It’s the same reason that vibe coding is great for prototyping but not for production, and why frequent readers can spot AI patterns, ahem, easily. So, in a Betty Crocker cake-mix kind of way, let the user add the egg.

Building explainable-enough AI takes immense effort. It is actually easier to train AI to diagnose the whole image or to give exhaustive detail. Generating high-quality data at that just-right level is very difficult and expensive. However, do it right and the effort pays off. The outcome is an AI-human causal prediction machine, where the causes, i.e. the mid-level features, inform the user and build confidence towards the final outcome. The deep-learning part is still a black box, but the user doesn’t mind because you aid their thinking.

I’m excited by some new developments like REX, which sort of retrofit causality onto standard deep-learning models. With improvements in performance, user preferences for detail may change, but I suspect the need for AI to be explainable enough will remain. Perhaps we will even have custom labels like ‘juicy’.