I've been wrestling with sensor data pipelines for months, and every approach I tried ended up eating RAM like it was going out of style. The usual path was pandas into some columnar format, but I kept hitting a wall where loading a week of high-frequency data would balloon the process into swap territory.

I looked at the benchmarks everyone quotes, and honestly, most of them are designed to make the author's choice look good: they pick operations that favour one library, then act shocked when it wins. So I actually sat down and modelled the real cost of my typical workflow (rolling windows, resampling, some aggregations) against three stacks. DuckDB with Parquet files came out ahead on memory by a factor of four, and the query times were competitive enough that I wouldn't be waiting around.

The setup cost was minimal (maybe six hours refactoring my pipeline), and now I'm offloading computation to SQL instead of pulling everything into RAM first. The trade-off is that I lose some of pandas' flexibility for quick exploratory work, but that's fine because I do exploration on a sample anyway. The total time investment feels worth it when you measure it against the infrastructure headaches I was creating.

I'm running the same test suite now, and the memory profile is stable. Maybe this finally fixes it.
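For anyone curious what the refactor looks like in practice, here's a minimal sketch of the query shape I moved to. The `ts` and `reading` column names and the `sensor_data/*.parquet` path are placeholders for whatever your schema looks like, not my actual setup:

```python
import duckdb

con = duckdb.connect()  # in-process; no server to run

# Resample raw readings to 1-minute means, then take a 10-minute rolling
# average over the resampled series. DuckDB streams over the Parquet files
# rather than loading them into memory first; only the aggregated result
# is materialised by the final .df() call.
# ('ts', 'reading', and the path are hypothetical names.)
df = con.execute("""
    WITH resampled AS (
        SELECT
            date_trunc('minute', ts) AS minute,
            avg(reading)             AS mean_reading
        FROM read_parquet('sensor_data/*.parquet')
        GROUP BY 1
    )
    SELECT
        minute,
        mean_reading,
        avg(mean_reading) OVER (
            ORDER BY minute
            ROWS BETWEEN 9 PRECEDING AND CURRENT ROW
        ) AS rolling_10min
    FROM resampled
    ORDER BY minute
""").df()

# For the exploratory work I still do in pandas, a random sample is cheap:
sample = con.execute("""
    SELECT *
    FROM read_parquet('sensor_data/*.parquet')
    USING SAMPLE 1%
""").df()
```

Peak memory now tracks the size of the aggregated output rather than the size of the raw data, which is the whole trick.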