Arcana: From Raw Data to Real-Time Decision Making

Arcana · Data Engineering · Open Source

About a week ago, I soft-launched the Arcana GitHub repository. It’s been through a few iterations over the years, but ever since I picked up Advances in Financial Machine Learning by Marcos López de Prado, I’ve been combing through each chapter and digesting his models. The most important piece of this multi-phase project is data acquisition and data enrichment.

What does that mean? Think of it like a restaurant. If you have about 20-30 customers a day, you can manage everything pretty well with a small staff. But let’s say some superstar is doing a show in town, and the doors bust open with a line of hundreds of people. You would have to attend to every guest at the door, give them a wait time, seat them when a table opens up, and ensure every order is precisely recorded, cooked, and delivered without a mistake. That’s the scale problem Arcana was built to solve.

The idea is that retrieving and storing large quantities of data needs to be fault tolerant in case of outages or crashes. The pipeline should be able to pick back up from where it left off and ensure that the next order is ready to go. That part performs well now, and I’m very excited to see it thriving. I processed roughly 83 million records going back six months from Coinbase, covering the ETH-USD trading pair.
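To make "pick back up from where it left off" concrete, here is a minimal sketch of checkpointed ingestion. It is an illustration under assumptions, not Arcana's actual implementation: `fetch_batch`, `store`, the `trade_id` field, and the JSON checkpoint file are all hypothetical names chosen for the example.

```python
import json
import os
import tempfile
from typing import Callable, Dict, List, Optional

Trade = Dict[str, float]  # hypothetical record shape, e.g. {"trade_id": 1, ...}


def load_checkpoint(path: str) -> Optional[int]:
    """Return the last committed trade id, or None if no checkpoint exists yet."""
    if not os.path.exists(path):
        return None
    with open(path, "r", encoding="utf-8") as fh:
        return json.load(fh)["last_trade_id"]


def save_checkpoint(path: str, last_trade_id: int) -> None:
    """Write to a temp file, then rename, so a crash never leaves a half-written checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump({"last_trade_id": last_trade_id}, fh)
    os.replace(tmp_path, path)  # atomic replacement of the old checkpoint


def ingest(
    fetch_batch: Callable[[Optional[int]], List[Trade]],
    store: Callable[[List[Trade]], None],
    checkpoint_path: str,
    max_batches: int,
) -> int:
    """Pull batches that come after the checkpoint and commit progress per batch.

    Returns the number of records stored during this run.
    """
    last_id = load_checkpoint(checkpoint_path)
    total = 0
    for _ in range(max_batches):
        batch = fetch_batch(last_id)
        if not batch:
            break  # caught up with the source
        store(batch)  # persist the data first...
        last_id = int(batch[-1]["trade_id"])
        save_checkpoint(checkpoint_path, last_id)  # ...then record progress
        total += len(batch)
    return total
```

Because the checkpoint is written only after a batch is stored, a crash between the two steps means that batch is fetched again on restart. In this sketch `store` would therefore need to be idempotent, for example by upserting on the trade id.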

Since then, Arcana’s secondary function has been to slice and dice these raw ingredients and organize them for Position5’s next phase, “Sigil.” Sigil will cover the end of chapter 2 through chapter 5 of the source material. This is the data enrichment phase, which ensures that all of the data provided meets the standards required to consume the dataset for near real-time decision making.
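As one example of what slicing raw trades can look like, chapter 2 of the book describes dollar bars: instead of sampling on a clock, a bar closes once a fixed dollar value has traded. The sketch below is a simplified illustration and should not be read as Arcana's own bar builder. The `Trade` and `Bar` classes, the `dollar_bars` function, and the threshold value are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class Trade:
    price: float
    size: float


@dataclass
class Bar:
    open: float
    high: float
    low: float
    close: float
    volume: float
    dollar_value: float


def dollar_bars(trades: Iterable[Trade], threshold: float) -> Iterator[Bar]:
    """Yield a bar each time the accumulated traded value reaches `threshold`.

    A trailing partial bar (one that never reaches the threshold) is not emitted.
    """
    bar: Optional[Bar] = None
    for t in trades:
        if bar is None:
            # Start a new bar at the first trade after the previous bar closed.
            bar = Bar(t.price, t.price, t.price, t.price, 0.0, 0.0)
        bar.high = max(bar.high, t.price)
        bar.low = min(bar.low, t.price)
        bar.close = t.price
        bar.volume += t.size
        bar.dollar_value += t.price * t.size
        if bar.dollar_value >= threshold:
            yield bar
            bar = None


if __name__ == "__main__":
    sample = [Trade(100.0, 1.0), Trade(110.0, 1.0), Trade(90.0, 1.0), Trade(100.0, 5.0)]
    for b in dollar_bars(sample, threshold=200.0):
        print(b)
```

Because the function is a generator, it can consume a stream of trades one at a time without holding the full history in memory, which matters at tens of millions of records.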

Ultimately, this type of data is very expensive to obtain, in both compute and storage. One of the main focuses of the project was open sourcing Arcana so that anyone interested in creating an automated trading algorithm (or ATA) could at least start off where Arcana leaves off. Sigil and subsequent projects will not be open sourced due to the sensitive nature of these implementations. I do not want to be responsible for anyone bold enough to build their own trading algorithm and end up losing money. I will, however, continue developing, delivering updates, and publishing case studies on this project.

Since going live, I’ve been fine-tuning the processing step to ensure the command line interface (CLI) is clean and easy to understand from a UX standpoint, and making sure users can run the same scripts I use to jumpstart their own projects. One useful thing to know here is that you don’t actually need years of data to begin a project like this. Six months is sufficient to test theories and implement the techniques from the source material.

I think I’m on my last iteration of the Arcana bar-building steps. The final piece will be summoning the daemon so that it processes data continuously, with minimal impact on performance and the runtime environment, leaving room for some of the heavier and more expensive processes down the line. Stay tuned.