Arcana's First Overnight Run: 83 Million Trades and Counting
Last night, Arcana ran its first real overnight test. Twenty hours of processing. Six months of ETH-USD trade data from Coinbase. Over 83 million trades ingested, persisted, and ready for analysis.
The framework did exactly what I designed it to do.
Data backfilling worked well: persistent saving, idempotent operations, no duplicate processing. The ingestion pipeline pulled historical trades, saved state along the way, and could resume cleanly if interrupted. That’s not flashy, but it’s the kind of engineering that separates a weekend project from a real system.
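To make "idempotent" concrete, here is a minimal sketch of the pattern, not Arcana's actual code. It assumes each trade carries a unique integer trade_id and uses SQLite as a stand-in store; the table layout and the function names (open_store, ingest, resume_point) are hypothetical.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a trade store keyed by trade_id."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS trades ("
        " trade_id INTEGER PRIMARY KEY,"  # assumption: unique per trade
        " ts TEXT NOT NULL,"
        " price REAL NOT NULL,"
        " size REAL NOT NULL,"
        " side TEXT NOT NULL)"
    )
    return conn


def ingest(conn, batch):
    """Insert a batch of trades and return how many rows were new.

    Rows whose trade_id is already stored are skipped, so replaying
    the same batch after a crash leaves the table unchanged.
    """
    before = conn.total_changes
    with conn:  # commits on success, rolls back on an exception
        conn.executemany(
            "INSERT OR IGNORE INTO trades (trade_id, ts, price, size, side)"
            " VALUES (:trade_id, :ts, :price, :size, :side)",
            batch,
        )
    return conn.total_changes - before


def resume_point(conn):
    """Highest trade_id stored so far, or None if the store is empty."""
    return conn.execute("SELECT MAX(trade_id) FROM trades").fetchone()[0]
```

Feeding the same batch twice stores it once, and resume_point tells an interrupted backfill where to pick up. The primary key does the deduplication, so the ingestion loop itself does not need to track what it has already seen.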
Where things got interesting was bar construction. Arcana builds 11 different bar types, ranging from standard time-based bars (5-minute, 1-hour) to information-driven bars inspired by Marcos Lopez de Prado’s work: tick imbalance bars, volume imbalance bars, dollar imbalance bars, and their run-bar counterparts.
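For readers who have not met information-driven bars, the sketch below shows the general idea behind a tick imbalance bar: sign each trade with the tick rule, accumulate the signs, and close a bar when the absolute imbalance reaches the expected bar length times the expected per-tick imbalance. It is a simplified illustration, not Arcana's implementation. The seed values (e0_ticks, e0_imbalance), the smoothing factor alpha, and the choice to update expectations from each closed bar's mean sign are all assumptions made for this example.

```python
def tick_signs(prices):
    """Tick rule: +1 on an uptick, -1 on a downtick, carry the last sign on no change."""
    signs, sign = [], 1  # assumption: the first trade is signed +1
    for i, price in enumerate(prices):
        if i > 0:
            diff = price - prices[i - 1]
            if diff > 0:
                sign = 1
            elif diff < 0:
                sign = -1
        signs.append(sign)
    return signs


def ewma(previous, observed, alpha):
    """One exponentially weighted update step."""
    return alpha * observed + (1 - alpha) * previous


def tick_imbalance_bars(prices, e0_ticks=10.0, e0_imbalance=0.5, alpha=0.1):
    """Return the inclusive end index of each tick imbalance bar.

    e0_ticks and e0_imbalance seed the expected bar length and the
    expected per-tick imbalance; both are refreshed after every bar.
    """
    exp_ticks, exp_imbalance = e0_ticks, e0_imbalance
    theta, start, ends = 0, 0, []
    for i, sign in enumerate(tick_signs(prices)):
        theta += sign
        if abs(theta) >= exp_ticks * abs(exp_imbalance):
            ends.append(i)
            length = i - start + 1
            exp_ticks = ewma(exp_ticks, length, alpha)
            exp_imbalance = ewma(exp_imbalance, theta / length, alpha)
            theta, start = 0, i + 1
    return ends
```

The seed values matter because the first threshold depends entirely on them, and every later threshold inherits from the bars they produce. A seed that is too large delays the first bar and can keep the bar count low for the rest of the dataset.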
The standard bars came through clean. Time bars processed in seconds. The auto-calibrated tick, volume, and dollar bars each produced thousands of bars in under nine minutes. But the information-driven bars, particularly the run bars (TRB, VRB, DRB), exposed computational errors in the E0 calibration formulas that set the expected bar length. The expected lengths came out wrong, so the dataset yielded too few bars.
Five patches went in to address this:
The core fix was recalibrating E0 using geometric run length instead of the imbalance formula, a closer implementation of Lopez de Prado’s methodology. The other four: a --rebuild flag for clean bar reconstruction, auto-calibration for tick and volume thresholds, a TOML config system for bar specs, and smarter E0 initialization across all information-driven bar types.
The best part, and this is the real payoff of building the system right, is that all 83 million trades are already saved. The backfill doesn’t need to re-run. Only data from after last night’s successful ingestion needs to be caught up. The bar construction patches get applied, bars get rebuilt from persisted data, and we move forward. That’s the scientific approach: preserve your dataset, iterate on your analysis.
The daemon is running now, pulling new trades every 15 minutes, though I’m already looking at tightening that down to every minute. But we move things brick by brick. Arcana is alive.