Aneesh Sathe


Beyond the Dataset

July 11, 2025

On the recent season of the show Clarkson’s Farm, J.C. goes to great lengths to buy the right pub. As with any sensible buyer, the team does a thorough teardown followed by a big build-up before the place is open for business. They survey how the place is built, located, and accessed. In their refresh they ensure that each part of the pub is built with purpose. Even the tractor on the ceiling. The art is in answering the question: How was this place put together?

A data scientist should be equally fussy. Until we trace how every number was collected, corrected, and cleaned (who measured it, what tool warped it, what assumptions skewed it), we can’t trust that the next step in our business will flourish.

Two load-bearing pillars

While there are many flavors of data science, I’m concerned with the analysis that is done in scientific spheres and startups. In this world, the structure is held up by two pillars:

  1. How we measure — the trip from reality to raw numbers. Feature extraction.
  2. How we compare — the rules that let those numbers answer a question. Statistics and causality.

Both of these relate to having a deep understanding of the data generation process, each from a different angle. A crack in either pillar and whatever sits on top crumbles. Plots, significance, and AI predictions mean nothing.

How we measure

A misaligned microscope is the digital equivalent of crooked lumber. No amount of massage can birth a photon that never hit the sensor. In fluorescence imaging, the point-spread function tells you how a pin-point of light smears across neighboring pixels; noise reminds you that light itself arrives with, and is recorded with, at least some randomness. Misjudge either and the cell you call “twice as bright” may be a mirage.
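
To make the “twice as bright” mirage concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: a one-dimensional sensor, a Gaussian stand-in for the point-spread function, and a made-up 1000-photon source. The helper names `gaussian_psf` and `blur` are hypothetical, not from any imaging library. It images the same source once in sharp focus and once slightly blurred, then compares the brightest pixel.

```python
import math

def gaussian_psf(sigma, radius):
    """Discrete 1-D Gaussian kernel, normalised so total light is conserved."""
    weights = [math.exp(-(x * x) / (2 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur(signal, psf):
    """Spread each pixel's light over its neighbours according to the PSF."""
    radius = len(psf) // 2
    out = [0.0] * len(signal)
    for i, value in enumerate(signal):
        for k, w in enumerate(psf):
            j = i + k - radius
            if 0 <= j < len(out):  # light falling off the sensor is lost
                out[j] += value * w
    return out

# Assumed scene: one point source of 1000 photons in the middle of 21 pixels.
scene = [0.0] * 21
scene[10] = 1000.0

sharp = blur(scene, gaussian_psf(sigma=1.0, radius=5))    # well-aligned optics
blurred = blur(scene, gaussian_psf(sigma=2.0, radius=5))  # slightly defocused

peak_ratio = max(sharp) / max(blurred)
total_ratio = sum(sharp) / sum(blurred)

# Shot noise: for Poisson photon counts the standard deviation is sqrt(mean).
peak_noise = math.sqrt(max(blurred))

print(f"peak pixel ratio (sharp / blurred): {peak_ratio:.2f}")
print(f"total light ratio (sharp / blurred): {total_ratio:.2f}")
print(f"shot-noise std on the blurred peak: {peak_noise:.1f} photons")
```

Under these assumptions the brightest pixel differs by close to a factor of two between the two images, while the total light is identical. Whether a cell is “twice as bright” depends on whether you measure the peak or the integrated signal, and on knowing the optics that produced it.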

In this data generation process the instrument nuances control what you see. Understanding this enables us to make judgements about what kind of post-processing is right and which kind may destroy or invent data. For simpler analysis the post-processing can stop at cleaner raw data. For developing AI models, this process extends to labeling and analyzing data distributions. Andrew Ng’s data-centric AI approach insists that tightening labels, fixing sensor drift, and writing clear provenance notes often beat fancier models.

How we compare

Now suppose Clarkson were to test a new fertilizer, fresh goat pellets, only on sunny plots. Any bumper harvest that follows says more about sunshine than about the pellets. Sound comparisons begin long before data arrive. A deep understanding of the science behind the experiment is critical before conducting any statistics. The wrong randomization, the wrong controls, and lurking confounders eat away at the foundation of statistics.
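
The goat-pellet problem can be simulated in a few lines. This is a sketch under loudly stated assumptions: the effect sizes, the noise level, the number of plots, and the function names `harvest` and `estimated_effect` are all invented for illustration. The same fields are analysed twice, once with pellets applied only to sunny plots and once with a coin flip deciding each plot.

```python
import random
from statistics import mean

SUN_EFFECT = 5.0     # assumed extra yield on a sunny plot
PELLET_EFFECT = 1.0  # assumed true effect of the goat pellets

def harvest(sunny, treated, rng):
    """Yield of one plot: baseline + sunshine + pellets + random noise."""
    return 10.0 + SUN_EFFECT * sunny + PELLET_EFFECT * treated + rng.gauss(0.0, 1.0)

def estimated_effect(assign, n_plots=4000, seed=0):
    """Mean yield of treated plots minus mean yield of untreated plots."""
    rng = random.Random(seed)
    treated_yields, control_yields = [], []
    for _ in range(n_plots):
        sunny = rng.random() < 0.5   # half the plots get sun
        treated = assign(sunny, rng)  # the experimental design lives here
        group = treated_yields if treated else control_yields
        group.append(harvest(sunny, treated, rng))
    return mean(treated_yields) - mean(control_yields)

# Design 1: pellets only on sunny plots (treatment is tied to sunshine).
confounded = estimated_effect(lambda sunny, rng: sunny)

# Design 2: a coin flip per plot, independent of sunshine.
randomized = estimated_effect(lambda sunny, rng: rng.random() < 0.5)

print(f"true pellet effect:  {PELLET_EFFECT:.2f}")
print(f"sunny-only estimate: {confounded:.2f}")
print(f"randomized estimate: {randomized:.2f}")
```

In the sunny-only design the estimate absorbs the whole sunshine effect on top of the pellet effect. The randomized design lands near the assumed true value because sunshine is spread evenly across both groups. Nothing in the resulting table of yields tells you which design produced it.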

This information is not in the data. Only understanding how the experiment was designed and which events preclude others enables us to build a model of the world of the experiment. Taking this lightly has large risks for startups with limited budgets and smaller experiments. A false positive result leads to wasted resources while a false negative presents opportunity costs.

The stakes climb quickly. Early in the COVID-19 pandemic, some regions bragged of lower death rates. Age, testing access, and hospital load varied wildly, yet headlines crowned local policies as miracle cures. When later studies re-leveled the footing, the miracles vanished.
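
A small worked example shows how an age mix alone can crown a false winner. The counts below are entirely made up for illustration and do not describe any real region; `crude_rate` and `standardized_rate` are hypothetical helper names. The sketch compares the crude death rate with a rate standardized to a common reference age structure.

```python
# Hypothetical counts per age band: (population, deaths). Not real data.
region_a = {"under_65": (9000, 9), "over_65": (1000, 50)}
region_b = {"under_65": (5000, 5), "over_65": (5000, 200)}

def crude_rate(region):
    """Total deaths divided by total population, ignoring age."""
    population = sum(pop for pop, _ in region.values())
    deaths = sum(d for _, d in region.values())
    return deaths / population

def standardized_rate(region, reference_shares):
    """Age-specific rates re-weighted to a shared reference age structure."""
    return sum(reference_shares[band] * deaths / pop
               for band, (pop, deaths) in region.items())

# Assumed reference population: half under 65, half over 65.
reference = {"under_65": 0.5, "over_65": 0.5}

for name, region in (("A", region_a), ("B", region_b)):
    print(f"region {name}: crude {crude_rate(region):.4f}, "
          f"standardized {standardized_rate(region, reference):.4f}")
```

With these invented numbers, region A has the lower crude rate only because its population is younger. Within each age band it does no better than region B, and once both are weighted to the same age structure, A comes out worse.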

Why the pillars get skipped

Speed, habit, and misplaced trust. In 2001 Leo Breiman described a split between analysts who start from a model of how the data were generated and those who chase predictive accuracy. He called it the “two cultures.” Today’s tooling tempts us even more to skip the question of where the data came from: auto-charts, one-click models, pretrained everything. They save time, until they cost us the answer.

The other issue is the lack of a culture that communicates and shares a common language. Only in academic training is it possible to train a single person to understand the science, the instrumentation, and the statistics sufficiently that their research may be taken seriously. Even then we prefer peer review. There is no such scope in startups. Tasks and expertise must be split. It falls to the data scientist to ensure clarity and to collect information horizontally. It is the job of the leadership to enable this or accept dumb risks.

Opening day

Clarkson’s pub opening was a monumental task with a thousand details tracked and tackled by an army of experts. Follow the journey from phenomenon to file, guard the twin pillars of measure and compare, and reinforce them with careful curation and open culture. Do that, and your analysis leaves room for the most important thing: inquiry.


Nothing Ventured

July 10, 2025

The wave towered over me. Then the sound filled my ears. Not the calm breath of the waves; it was surf music. I was maybe 3. Song names and artist names were beyond me. There was only the blue-green wave and the twang of the guitar.

I have chased music all my life. Just had to figure out the tools. The record player and the giant speakers taught my first lesson: pressing buttons was joy. In my pursuit I learned about records, tapes, CDs, mp3, flac, streaming, Napster, torrents, Winamp, VLC, blanks, CD-R/RWs, compression, bit rates, conversion, transfer, backups, VPN, networking, impedance matching, DACs, amplifiers, calibration, ARC, fibre, buying, licensing, and streaming, in approximate order.

I discovered that they were called The Ventures by accident. Late in the college years I watched Pulp Fiction and wanted all the music. This one wasn’t quite home but it was the right street. It was surf music.

The hunt was on. Only a notion of the song and the confidence that I would know it when I heard it. I didn’t know the name of the album, only that it had a big wave on the cover. It took me the better part of 6 months, on slow DSL, trawling all the sources I knew. Listening for that drum fade-in. Then one day I found it.

It’s been decades since the record player stopped spinning. I’ve moved a dozen times, the records were lost. I am the default A/V guy and love the role. Now I live in one of the surfiest places on the planet, the current still pulls but I walk, don’t run.


Rejected In Paris

July 9, 2025

I got told off by The Paris Review today. Maybe it wasn’t necessarily directed at me, but as they say in the, now old, new lingo, I felt attacked. You see, recently, drawing on the well of inspiration that is history, I succeeded in writing a poem, but not just any poem. I wrote a ghazal.

Those who know me for any amount of time are made aware of my taste for writing poetry. It’s usually pretty bad but I persist, cause why not. The OG is long gone anyway. The ghazal is an especially ambitious type of poetry to be taken up by someone with my modest talents. To make matters worse, as I learned today, the ghazal is really well suited to Urdu. For all practical purposes, I know only English.

For anyone with even a little interest in love and romance, being born in South Asia is a special kind of blessing. We are lucky to have had Urdu poetry reach its peak here. Urdu is perhaps the perfect medium to transmit mischief, passion, pain, longing, and the myriad other emotions which are handmaidens to big Love. I am not any kind of expert, but all my life I’ve consumed shayari, sher, and ghazals, whether in mainstream Bollywood or in sparkling corners of the internet.

Armed with the internet, full of inspiration, and with my trusty editor, Mir ChatGPT, in the other tab, I decided it was time to go all in. The ghazal was to be written. It was. It follows all the rules, and I even make a self-reference in the last couplet as is the tradition, but it lacks oomph. A good sher, a good ghazal, should pierce you and make you blush for its andaaz, mischief, and audacity.

Mine… well, you can read it here yourself, don’t forget to play the tiny desk concert, it is lovely.

Definitely read The Paris Review article, for it’s a great take from a writer who transfers the styles of poetry from one language to another.


Social Internet – Lost and Hungry

July 8, 2025

When printing was invented, Europe suddenly had access to all the books that had existed until that point in history. This included everything from mystical texts to astronomical observations. Having no guides to judge quality, some people went off the deep end. Giordano Bruno is sometimes referred to as the forefather of modern cosmology. He was not. An extreme case, he took mystical click-bait, mixed it with the then-contemporary Copernican theories, and, without any data, invented the infinite universe. Eventually, culture adapted and people started to compare and organize all the data. This act of orienting and place-making led to the scientific revolution. Printing created too much information and we had to learn how to handle it. Today we are in a similar position.

Still in the early days of the internet, we sometimes lose the ability to tell signal from noise. Recently Hank Green posted this video where he lays out his thesis that we aren’t addicted to content, but are instead starving for information. This strikes me as true.

The companies behind the social internet drown us in noise with just enough signal to keep us coming back. That signal, that hit, is a hint at information that provides orientation. Opportunities for conversation and belief-challenging interactions are difficult to come by. As explored in a previous post, humans are geographical creatures. Phones and the internet are a real part of our environment. Without sufficient places for orientation, we are left glassy-eyed, lost. To see why that ‘information hunger’ feels so visceral, consider the simple ladder that links raw signal to a basic survival drive:

Signal → Information → Orientation → Biology

  1. Signal is any pattern in the environment—visual, auditory, textual—that stands out from background noise. On social platforms this might be a headline, a notification badge, or an unexpected data point.
  2. Information is signal that has been parsed and interpreted. Your brain (or a community) attaches meaning and relevance: “This headline matters to my work,” or “That data point contradicts my belief.”
  3. Orientation is what information enables: a clearer, updated internal map of “where I stand and what to do next.” It answers “How does this fit with what I already know?” and “Which way should I move—intellectually, emotionally, physically?”
  4. Biological need is the evolutionary pressure behind all of this: organisms that build accurate mental maps survive. Humans feel discomfort when our maps are fuzzy (disorientation) and relief or pleasure when new information sharpens them.

A few years ago, my corner of the internet got into waldenponding and promptly logged off. Just kidding. The failure of modern waldenponding makes it clear that turning away from the social internet is not the answer. That would be like giving up on books because there were too many of them. The internet and the social internet in general do provide opportunities. Instead, engaging with curiosity allows us to orient ourselves. Having an information-shaped content diet opens up a path to a healthier mind. While society learns to put on the right kind of controls, as we have on sugar and tobacco, how can we learn to have fun on the internet?

The hunt for knowledge and discovery, even of trivia, is immensely enjoyable. Socratic problem solving is a team sport. Everyone has narrow views of the world and our thinking may be based on shaky knowledge. The social internet has so far made our eagerness to win the top emotion in online discourse; Socratic inquiry can transform that into collaborative inquiry. To arrive at better knowledge we must be willing to talk, listen, challenge, and accept. It is only by comparing notes that we open up a topic, a space, for exploration. Each of us and our thoughts are a place in the world. Places create orientation and orientation has the potential to create progress. While progress may not be guaranteed, not engaging in inquiry guarantees disorientation and formlessness.

While printing turned information into data, the social internet has turned information into noise. Social internet companies have tuned our culture to produce low signal-to-noise “content”. As Hank Green put it, we do hunger for information. We hunger because information is orientation. Orientation is a primal biological need that helps us navigate our physical-virtual environment. The internet is a place where people share freely and welcome warm interactions. To turn away from the internet because of the culture tuning is the wrong move. The internet has too much to give; engaging from a posture of inquiry is the way. Inquiry satisfies that inner need for place creation and orientation.

The dialogue is the real post.


The Mind as Semi-Solid Smoke

July 7, 2025

This post continues the series on Socratic Thinking, turning the space-and-place lens inward to examine the mind itself. The human mind can be thought of as an imperfect place with the ability to create its own insta-places to navigate ambiguity.

Exploration in any real or conceptual space needs navigational markers with sufficient meaning. Humans are biologically predisposed to seek out and use navigational markers. This tendency is rooted in our neural architecture, emerges early in life, and is shared with other animals, reflecting its deep evolutionary origins [1, 2]. Even the simplest forms of life, performing chemotaxis, use the signal field of food to navigate.

When you’re microscopic, the territory is the map; at human scale, we externalise those cues as landmarks—then mirror the process inside our heads. Just as cells follow chemical gradients, our thoughts follow self-made landmarks, yet these landmarks are vaporous.

From the outside our mind is a single place; it is our identity. Probe closer and our identity is nebulous and dissolves the way a city dissolves into smaller and smaller places the closer you look. We use our identity to create the first stable place in the world and then use other places to navigate life. However, these places come from unreliable sources: our internal and external environments. How do we know the places are even real, and do we have the knowledge to trust their reality? Well, we don’t. We can’t judge our mental landmarks false. Callard calls this normative self-blindness: the built-in refusal to saw off the branch we stand on.

Normative self-blindness is a trick to gloss over details and keep moving. Insta-places are conjured from our experience and are treated as solid no matter how poorly they are tied down by actual knowledge. We can accept that a place was loosely formed in the past (an error), or is not yet well defined in the future (an unknown). However, in the moment, the places exist and we use them to see.

Understanding and accepting that our minds work this way is a key tenet of Socratic Thinking. It makes adopting the posture of inquiry much easier. Socratic inquiry begins by admitting that everyone’s guiding landmarks may be made of semi-solid smoke.


[1] Chan, Edgar, Oliver Baumann, Mark A. Bellgrove, and Jason B. Mattingley. “From Objects to Landmarks: The Function of Visual Location Information in Spatial Navigation.” Frontiers in Psychology 3 (2012). https://doi.org/10.3389/fpsyg.2012.00304.

[2] Freas, Cody A., and Ken Cheng. “The Basis of Navigation Across Species.” Annual Review of Psychology 73, no. 1 (January 4, 2022): 217–41. https://doi.org/10.1146/annurev-psych-020821-111311.