This is a description of my work on some data science projects, lightly obfuscated and fictionalized to protect the confidentiality of the organizations I handled them for (and also to make it flow better). I focus on the high-level epistemic/mathematical issues, and the lived experience of working on intellectual problems, but gloss over the timelines and implementation details.

**The Upper Bound**

One time, I was working for a company which wanted to win some first-place sealed-bid auctions in a market they were thinking of joining, and asked me to model the price-to-beat in those auctions. There was a twist: they were aiming for the low end of the market, and didn't care about lots being sold for more than $1000.

"Okay," I told them. "I'll filter out everything with a price above $1000 before building any models or calculating any performance metrics!"

They approved of this, and told me [...]

---

Outline:

(00:27) The Upper Bound
(02:58) The Time-Travelling Convention
(05:56) The Tobit Problem
(06:30) My Takeaways

The original text contained 3 footnotes which were omitted from this narration.

---

First published: October 1st, 2024

Source: https://www.lesswrong.com/posts/rzyHbLZHuqHq6KM65/three-subtle-examples-of-data-leakage

---

Narrated by TYPE III AUDIO.