
Information Bars, Part 1: Why Standard Market Data Falls Short

Arcana, Data Engineering, Trading

This is the first post in a four-part series about information-driven bars: what they are, how they work, why they broke in production, and how I fixed them. If you’ve been following the Arcana project, you already know I’m processing millions of trades from Coinbase. This series gets into the weeds of what happens after the data lands.

What Are Bars, Anyway?

Imagine you’re watching every single Ethereum trade on Coinbase. Thousands per minute. That’s far too much data to analyze directly. So we group trades into bars (think of them like candles on a stock chart). Each bar summarizes a chunk of trades: the opening price, closing price, highest price, lowest price, and total volume.
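Here’s a minimal Python sketch of collapsing one chunk of trades into a bar. The `Trade` and `Bar` types and the `make_bar` function are hypothetical names for illustration, not Arcana’s actual code. Each trade is reduced to a price and a size in ETH.

```python
from dataclasses import dataclass


@dataclass
class Trade:
    price: float
    size: float  # quantity of ETH traded


@dataclass
class Bar:
    open: float
    high: float
    low: float
    close: float
    volume: float


def make_bar(trades: list[Trade]) -> Bar:
    """Summarize a non-empty, time-ordered chunk of trades as one OHLCV bar."""
    if not trades:
        raise ValueError("a bar needs at least one trade")
    prices = [t.price for t in trades]
    return Bar(
        open=prices[0],                      # first trade in the chunk
        high=max(prices),
        low=min(prices),
        close=prices[-1],                    # last trade in the chunk
        volume=sum(t.size for t in trades),  # total ETH traded
    )


trades = [Trade(3000.0, 0.5), Trade(3002.5, 1.0), Trade(2999.0, 0.25), Trade(3001.0, 2.0)]
print(make_bar(trades))
# Bar(open=3000.0, high=3002.5, low=2999.0, close=3001.0, volume=3.75)
```

Every bar type in this series ends up producing this same summary. The only thing that differs between them is where one chunk stops and the next begins.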

The question is: when do you close one bar and start the next?

The Simple Way: Standard Bars

The obvious approaches are the ones everyone uses:

  • Time bars: new bar every 5 minutes (this is what normal stock charts show you)
  • Tick bars: new bar every N trades
  • Volume bars: new bar every N units of ETH traded
  • Dollar bars: new bar every $N of notional value traded
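Tick, volume, and dollar bars all work the same way. Each trade adds something to a running total, and the bar closes once that total reaches a fixed threshold. Time bars bucket trades by timestamp instead. Below is a minimal sketch. It assumes each trade is a hypothetical `(timestamp_seconds, price, size)` tuple, and the function names are illustrative rather than taken from Arcana.

```python
from collections import defaultdict
from typing import Callable, Iterable, Iterator

# Hypothetical trade layout for illustration: (timestamp_seconds, price, size_in_eth)
Trade = tuple[float, float, float]


def tick(t: Trade) -> float:
    return 1.0          # tick bars: every trade counts once


def volume(t: Trade) -> float:
    return t[2]         # volume bars: units of ETH traded


def dollar(t: Trade) -> float:
    return t[1] * t[2]  # dollar bars: notional value (price * size)


def threshold_bars(trades: Iterable[Trade], metric: Callable[[Trade], float],
                   threshold: float) -> Iterator[list[Trade]]:
    """Close a bar as soon as the accumulated metric reaches `threshold`."""
    bar: list[Trade] = []
    total = 0.0
    for t in trades:
        bar.append(t)
        total += metric(t)
        if total >= threshold:
            yield bar
            bar, total = [], 0.0
    # A trailing partial bar stays open and is not emitted.


def time_bars(trades: Iterable[Trade], interval_s: float) -> list[list[Trade]]:
    """Bucket trades into fixed-length windows (300 s for 5-minute bars).
    Windows that contain no trades produce no bar in this sketch."""
    buckets: dict[int, list[Trade]] = defaultdict(list)
    for t in trades:
        buckets[int(t[0] // interval_s)].append(t)
    return [buckets[k] for k in sorted(buckets)]


trades = [(0.0, 3000.0, 1.0), (10.0, 3001.0, 2.0), (400.0, 3002.0, 1.0), (420.0, 2999.0, 3.0)]
print([len(b) for b in threshold_bars(trades, volume, 3.0)])  # [2, 2]
print([len(b) for b in time_bars(trades, 300)])               # [2, 2]
```

The threshold functions never look at which side of the market is trading. They count activity, and that limitation comes back later in this post.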

These work fine. In the pipeline, they all produce roughly 50 bars per day. No drama, no surprises. Time bars are especially popular because they’re easy to think about. Everyone understands “5-minute candles.”
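One arithmetic note: a market that trades around the clock produces 288 five-minute bars per day (1,440 minutes / 5). Hitting roughly 50 bars per day therefore takes a coarser setting. One common way to pick thresholds for a target rate is to divide an average daily total by the desired bar count. The sketch below assumes that approach; it is not a description of how Arcana configures its bars. The `calibrate` function is hypothetical and the daily figures are made up.

```python
MINUTES_PER_DAY = 24 * 60  # crypto markets trade around the clock


def calibrate(avg_daily_trades: float, avg_daily_volume: float,
              avg_daily_dollars: float, target_bars_per_day: int = 50) -> dict[str, float]:
    """Pick fixed thresholds so each standard bar type averages the target rate.
    The inputs would be averages over some lookback window (assumed here)."""
    return {
        "time_minutes": MINUTES_PER_DAY / target_bars_per_day,
        "ticks": avg_daily_trades / target_bars_per_day,
        "volume_eth": avg_daily_volume / target_bars_per_day,
        "dollars": avg_daily_dollars / target_bars_per_day,
    }


# Made-up daily averages, purely for illustration.
print(calibrate(avg_daily_trades=500_000, avg_daily_volume=200_000,
                avg_daily_dollars=600_000_000))
```

With those invented inputs, 50 bars per day works out to 28.8-minute time bars, 10,000-trade tick bars, 4,000-ETH volume bars, or $12,000,000 dollar bars.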

But there’s a problem hiding in that simplicity.

The Blind Spot

Standard bars treat all market conditions the same. A 5-minute window during a flash crash gets exactly one bar, just like a 5-minute window at 3am when nothing is happening. That’s like photographing a car race at fixed intervals: you’d get crisp shots of the parked cars and blurry nonsense during the actual racing.

What you really want is a camera that shoots faster when things get interesting and slower when they don’t.

Standard bars use fixed intervals. Information bars adapt to what the market is actually doing.

The Smart Way

Marcos López de Prado wrote a textbook called Advances in Financial Machine Learning that changed how I think about this problem. His argument is straightforward: when smart money is moving the market, you want more bars (finer resolution), and when nothing interesting is happening, you want fewer bars (less noise).

His solution: watch the imbalance between buyers and sellers. When one side outweighs the other by more than you’d normally expect, that’s a signal. Close the bar.
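To make “imbalance” concrete: in the book, each trade gets a label from the tick rule. An uptick counts as a buy (+1), a downtick as a sell (−1), and an unchanged price repeats the previous label. The bar’s imbalance is the running sum of those labels. Here’s a deliberately simplified Python sketch of a tick imbalance bar. The function names are hypothetical, and the first trade’s label is assumed to be +1. The threshold is also a fixed number, whereas the book compares the imbalance against an expected value that adapts as bars form.

```python
from typing import Iterable, Iterator


def tick_rule(prices: Iterable[float]) -> Iterator[int]:
    """Label each trade +1 (buy) or -1 (sell): uptick -> +1, downtick -> -1,
    unchanged price -> repeat the previous label. The first trade has no
    previous price, so this sketch assumes +1 for it."""
    prev_price = None
    prev_sign = 1
    for p in prices:
        if prev_price is not None and p != prev_price:
            prev_sign = 1 if p > prev_price else -1
        yield prev_sign
        prev_price = p


def tick_imbalance_bars(prices: list[float], threshold: float) -> list[int]:
    """Return the indices of trades at which a (simplified) tick imbalance bar closes.
    Simplification: a fixed threshold instead of an adaptive expected imbalance."""
    closes = []
    theta = 0  # signed running sum of tick-rule labels for the current bar
    for i, b in enumerate(tick_rule(prices)):
        theta += b
        if abs(theta) >= threshold:
            closes.append(i)
            theta = 0  # start a new bar
    return closes


# A one-sided rally, then back-and-forth chop, then one-sided selling.
prices = [100.0, 101.0, 102.0, 103.0, 102.0, 103.0, 102.0, 103.0, 102.0, 101.0, 100.0, 99.0]
print(tick_imbalance_bars(prices, threshold=3))  # [2, 11]
```

The steady rally closes a bar after three trades. The chop that follows keeps netting back to zero, so no bar closes until the selling turns one-sided.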

This produces what he calls information-driven bars. There are six flavors, split into two families:

Imbalance Bars:

  • TIB (Tick Imbalance Bars)
  • VIB (Volume Imbalance Bars)
  • DIB (Dollar Imbalance Bars)

Run Bars:

  • TRB (Tick Run Bars)
  • VRB (Volume Run Bars)
  • DRB (Dollar Run Bars)
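As I read the book’s definitions, the two families measure different things. The tick/volume/dollar prefix sets how much each trade counts: 1, the trade size, or price × size. An imbalance bar tracks the signed sum of the weighted labels, so buys and sells cancel out. A run bar tracks the larger of the buy-side and sell-side totals, so the two sides never cancel. The sketch below computes only those per-bar statistics, using hypothetical function names. It leaves out the adaptive thresholds the statistics are compared against, and it assumes the trade labels have already been assigned.

```python
from typing import Sequence


def trade_weights(prices: Sequence[float], sizes: Sequence[float], kind: str) -> list[float]:
    """How much each trade counts: 1 (tick), size (volume), or price * size (dollar)."""
    if kind == "tick":
        return [1.0] * len(prices)
    if kind == "volume":
        return [float(s) for s in sizes]
    if kind == "dollar":
        return [p * s for p, s in zip(prices, sizes)]
    raise ValueError(f"unknown kind: {kind}")


def imbalance(signs: Sequence[int], weights: Sequence[float]) -> float:
    """TIB/VIB/DIB statistic: signed sum, so buys and sells offset each other."""
    return sum(b * w for b, w in zip(signs, weights))


def run(signs: Sequence[int], weights: Sequence[float]) -> float:
    """TRB/VRB/DRB statistic: the larger one-sided total; the sides never offset."""
    buys = sum(w for b, w in zip(signs, weights) if b > 0)
    sells = sum(w for b, w in zip(signs, weights) if b < 0)
    return max(buys, sells)


# Four trades labeled buy, buy, sell, buy (labels assumed).
signs = [1, 1, -1, 1]
prices = [100.0, 101.0, 100.0, 101.0]
sizes = [1.0, 2.0, 4.0, 1.0]
for kind in ("tick", "volume", "dollar"):
    w = trade_weights(prices, sizes, kind)
    print(kind, imbalance(signs, w), run(signs, w))
# tick 2.0 3.0
# volume 0.0 4.0
# dollar 3.0 403.0
```

The volume row shows the difference. One large sell exactly offsets three smaller buys, so the volume imbalance is zero. The volume run still registers 4 ETH of one-sided buying.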

All six of these are implemented in Arcana. And all six of them broke in production. But I’m getting ahead of myself.

Why This Matters

If you’re building any kind of automated trading system, the bars are your foundation. Everything downstream (feature engineering, labeling, model training) inherits the properties of whatever bar type you chose. Feed a model time bars and it learns from a dataset that oversamples boring periods and undersamples interesting ones. Feed it information-driven bars and the signal-to-noise ratio should improve dramatically.

That’s the theory, at least. In the next post, I’ll walk through exactly how imbalance and run bars work: the actual math, step by step, without the hand-waving you usually see in blog posts on this topic.