WatchTree-19

WatchTree-19/README.md

Hi, I'm Sandeep

This is my open research page. Feel free to reach out by email if you think I could help. I am a Columbia University alumnus, UK/US-based.

I position myself as a methodological researcher with an interest in the ideological expansion of emerging areas.

What I'm building

Independent writing on AI evaluation methodology, observability, and the structural overlap between quant trading and LLM eval.
Asymmetric-information solutions in ML evaluation, surfacing what labs know internally about benchmark noise and drift.
Calibration tooling for benchmark drift, distinguishing genuine model improvement from eval movement.

Currently working on

A foundational essay on production observability for LLM agents.
A weekly paper digest series on alignment, evaluation methodology, and AI safety research.
"Benchmark crowding": mapping factor decay in quant finance to benchmark saturation in LLM evaluation.

Around the web

Google Scholar: scholar.google.co.uk/citations?user=GF8g3_QAAAAJ
Email: sandeeprai_dsp@hotmail.com
Substack: substack.com/@sandeeprai1

Pinned

llm-judge-calibration Public

Measure how much your LLM judges actually agree. Inter-judge agreement metrics for LLM-as-a-judge evaluations.

Python

Tracer-Cloud/opensre Public

Build your own AI SRE agents. The open source toolkit for the AI era.

Python 7.7k 1k

UKGovernmentBEIS/inspect_ai Public

Inspect: A framework for large language model evaluations

Python 2.3k 583

EleutherAI/lm-evaluation-harness Public

A framework for few-shot evaluation of language models.

Python 13.1k 3.4k

pola-rs/polars Public

Extremely fast Query Engine for DataFrames, written in Rust

Rust 38.9k 2.9k

pixie-io/pixie Public

Instant Kubernetes-Native Application Observability

C++ 6.5k 503