Listen in on Jane Street’s Ron Minsky as he has conversations with engineers working on everything from clock synchronization to reliable multicast, build systems to reconfigurable hardware. Get a peek at how Jane Street approaches problems, and how those ideas relate to tech more broadly.
In Young Cho thought she was going to be a doctor but fell into a trading internship at Jane Street. Now she helps lead the research group’s efforts in machine learning. In this episode, In Young and Ron touch on the porous boundaries between trading, research, and software engineering, which require different sensibilities but are often blended in a single person. They discuss the tension between flexible research tools and robust production systems; the challenges of ML in a low-data, high-noise environment subject to frequent regime changes; and the shift from simple linear models to deep neural networks.
Welcome to Signals and Threads, in-depth conversations about every layer of the tech stack from Jane Street. I’m Ron Minsky. All right, it is my pleasure to introduce In Young Cho. In Young has been a researcher and trader here at Jane Street for almost a decade now, and we’re going to talk a little bit about what that experience has been like and also about In Young’s more recent move to join the research group here, where she’s leading a lot of the work that we’re doing in the machine learning space. Thanks for joining.
Thank you for having me.
To kick things off, it would be interesting to talk about how you got to Jane Street in the first place.
My path to Jane Street is actually pretty standard for a lot of the traders and researchers that we have here. In undergrad, I was very confused about what I wanted to do.
Like many of us,
I was originally convinced that I wanted to be a doctor, following my mom's path, but I had also been quite active in competition math in high school. So I had a very strange blend of biology, organic chemistry, and advanced math classes in my freshman year of college, and for various reasons that are probably not that interesting to go into, I realized that medical school was not going to be a great fit for me, and I had a bit of an identity crisis where I tried to figure out what I wanted to do. So maybe that's not that standard. Fast forward a little bit, with some time off in Korea where I explored a variety of different internship paths via some short-ish internships in various industries, where I concluded that I absolutely wanted to never consider industry. I ended up back in sort of the recruiting cycle in college, where a lot of my smart friends told me that they really enjoyed the application and the interview process at Jane Street and that I should just check it out. The interview process was extremely challenging. I remember emerging from the onsite day that I had where my brain was just completely dead. I don't know if you ever have this experience where, if you've been thinking really hard, your head overheats. I had that in the middle of the day.
Literally overheats.
It just literally overheats. It's like a computer. I kind of strangely liked that experience, and even though I didn't really know what finance was or what to expect from this mysterious proprietary trading company, I wanted to give the internship a shot. So I guess that was pretty long-winded. In short, I came straight out of undergrad after interning at Jane Street in my sophomore year, and despite not knowing anything about quantitative finance when I came in, I've really enjoyed the everyday of the job here. I feel like I learn something new every day.
I’m kind of curious about the internship itself. What was it about the character of the internship that made you think, oh yeah, this is the thing I want to do now?
Just how dynamic it is. I would just come in every day and I was basically tasked with a new type of puzzle or a new type of skillset that I needed to learn. There were people who were learning these new things with me and I think there was just a tremendous support network of people who were very invested in making sure that I was having a good experience and that I was making the most of my own human capital. So that says nothing to the actual work itself. I mean the work itself was pretty interesting as well, and I think more interesting than I had thought just coming into the internship, but the majority of what drove me was just how well-meaning and how smart and also intellectually humble the other interns were and also what I thought would be the support network that I had once I joined full-time.
So when you started at Jane Street full-time, what kind of work did you end up doing? Where did you land in the organization and what was your job?
The title that I had was quantitative trader, and that does not adequately capture it at all. I think the job of a trader is very varied, and so in my first year I did all sorts of things, ranging from learning how to use OCaml, which is the language that we use in-house; learning how to use VBA as the backend programming language to Excel…
Another programming language we've used a lot in-house. That's true.
Learning all about bash and SQL and about financial terms, learning how to pick up the phone and respond to brokers, and that was terrifying. I had many, many mishaps trying to use the phone at work. A funny story: there was a call coming in from an external client, and folks on my desk said, you've been here for a month, you can pick up the phone. And I picked up the phone and I froze, and eventually I said, "ah, this is Jane Street" rather than "Jane Street, this is In Young," and I needed to grab hold of somebody else, so I basically needed to pass the phone to somebody else on the desk. I was told that I was supposed to park the phone and I had no idea what that meant. And at that point I basically went with it and thought park was a magic code word. I put down the phone and yelled out the word park.
So you hung up on them?
No, no, no, because yelling out the word park doesn’t do anything. And my mentor sitting next to me kindly said, there’s a button on your phone called Park, that’s what you press. I said, okay, I will go and do that. So there’s that kind of stuff. Learning about the basics of trading.
Can we go back to the broker thing? I feel like people’s mental model of Jane Street is often very focused around having heard things about the technology. And so they imagine a room full of people where all they’re doing is sitting down and writing programs and then the programs do trading. And so the whole thing of getting a call from a broker, why do we take broker calls? What is even going on here at the business level?
So that was not very standard for Jane Street. It was a new business that I think came about in 2013. I think prior to around 2013, the main way that we interacted was anonymously through electronic exchanges. Of course we would call people up or if we had to talk to our prime brokers or to various people where we needed their services, we would of course talk to people in person, but it wasn’t the case that we were explicitly interfacing with customers or clients or anything like that. So one model that you might have of the financial markets is that everything is kind of anonymous and everything happens on exchange and that should lead to fair and efficient outcomes. And for the most part I think this is true, but there are cases where it’s helpful to be able to know who your end counterparty is and to be able to negotiate the trade or the service that they would like from the market prior to them having to go out into the market and buying a stock or selling an ETF themselves.
So one example is if you have a pension service for some state and they have a quarterly rebalance that they need to do, pension services can be quite large. And so even if what they're doing is a pretty vanilla rebalance of buying an index fund, the dollar quantities involved might be quite large. And you can imagine as a market maker on an electronic exchange that being on the opposite side of a flow that is millions of dollars can be quite disconcerting. Your computer program that is designed to interact on the exchange will have sold a share of XYZ stock, and then you'll continue to sell on the exchange and you'll basically say, what am I missing? How large is the eventual size going to be? And in that uncertainty, a thing that you might say is, well, it might just be the case that I'm getting something very wrong and I'm going to make sure that I do something a little bit more conservative while I try to figure things out.
And this I think ends up with less good execution for the retirement service or the pension service who’s trying to do a trade that is quite benign and quite innocuous. So if we know from the get-go who we’re interacting with, it’s much easier for us to have confidence and to also give financial advice of how you might go about executing a trade in the index fund. So I think back in 2013 we started more earnestly building up a business where we were onboarding clients and in the flow of being able to make a market directly to the end client, there’s many types of people that we would pick up the phone for, but you can imagine that most phone trading is with these counterparties that we know and have onboarded where there’s some bilateral agreement that we will be out there making markets for them and also in non-live cases offering advice on how to think about the executions that they might want.
All of this highlights the odd role of information in trading. You might think, oh, the world's anonymous electronic markets, that's the best way to execute things because it's a fair and equal platform. But the pension service is a great example where there's an extra piece of information which is actually useful to all involved to know, which is that the pension company is not a highly informed trader who's making a trade with you because they know stuff that you don't know. It's just someone who's making a big transaction because it's part of their ordinary business. It doesn't reflect some information that's private to them about the markets. And so there's a weird way in which knowing where it's coming from can allow you to provide better services, in particular just a better price: in a very simple way, you can charge them less money because of who's coming and the size of the trade that they're doing. Okay. So that says a little bit about why this whole system of people calling each other on the phone even exists in the world's modern markets, and why we took part in it. But your whole work wasn't just answering the phone for brokers. What other kinds of things were you working on?
Yeah, as I kind of mentioned, I learned how to code poorly and wrote a lot of different systems and apps on the desk that were very useful to my day-to-day work. It might have to do with making it easier to visualize the types of trades that I'm doing on a day-to-day basis, whether that's on the market open, the market close, or during the day, or tools that will allow me to sort of go back and analyze the trades that we've done or the opportunities that we had, to see if there's more trades that we could propose based off historical patterns that we've seen. And so at a high level, a way that you could describe these two kinds of workflows is like software engineering and research, and very crudely, the picking up the phone I think people might describe as live trading, but in terms of the day-to-day these would all sort of blend together, and it was pretty cool that I got to basically hone my skills and learn a lot about each of these individual items and see how it all ties together.
And on that point about live trading, part of it is about literally answering the phone and talking to a broker, but I think of it a little more broadly as about live, realtime decision making. Some of that has to do with some individual trade with an individual concrete person, but a lot of it has to do with thinking about all of the systems and all of the different ways that we have of trading, understanding what's going on in the markets, and figuring out how to adjust and influence the behavior of those systems. The intermixing of those three things is, I think, a really critical part of what makes Jane Street good at trading and of what it means to trade in a modern way in the markets.
Yeah, it’s very confusing that we conflate all three of these things in the same role and title of trader and it is I think a very, very common source of confusion for clients interfacing with Jane Street for what a trader is here. I think we’ve embraced it and think that the fuzziness allows us to make sure that we’re placing individuals in the role that they’re going to most shine at or be best at. But it’s certainly very confusing if you’re just trying to get a sense of what Jane Street is before you’re familiar with us.
And it’s not like the terminology we use aligns with the terminology of the rest of the industry. When people say trader, they often mean very different things.
It’s not just us though. I think a lot of places have very bespoke and confusing names and titles for all of these roles and honestly it does make it pretty hard to navigate the overall landscape.
So a lot of your work today is focused on research and today you’re actually on the research desk, but I think research, as you said, has been a part of your work the whole way through. Can you give us a little more texture of what does it mean to do research on a trading idea?
Yeah, so I’m going to attempt to break down the process of research into a few steps. So I think there are roughly four stages of research. One is the initial exploration stage. So what might be important here, you need to have highly interactive tooling that you’re familiar with that can display information in a very rich, performant and easily understandable way such that you can rapidly iterate through various hypotheses and ideas that you might want to explore further. And
When you say tools, tools for what? Should I imagine something like a Bloomberg terminal for pulling up the candlesticks on a stock? What are we doing with these tools?
Let me set the stage as maybe you want to be able to do trading research where you are trying to look at how to best interact with this pension service that we mentioned. And so one of the things that you might want to know is, what are some of the trading ideas that you might have? What is the appropriate market that you should show to the orders that this client might have, and what can we do infrastructurally, in terms of our understanding of financial markets, to better equip ourselves to be able to quote this order? For the initial exploration stage here, I guess this would correspond to having a Bloomberg terminal and basically saying, who is this counterparty? What does their portfolio look like? Can we get a sense of how often their portfolio churns? And one thing that is pretty handy on a Bloomberg terminal is that you can look at people's 13F or 13G or 13D filings and just get a sense, quarter over quarter, of how people's positions have changed. And so you have some sense of the order flow that you might be seeing from this pension service. You're probably not seeing all of it, but you can also, from the data that is available from tooling like a Bloomberg terminal, get more data points for what happened, for instance, on days where we weren't the people who were showing the end order to the client.
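A minimal sketch of that quarter-over-quarter comparison in Python with pandas, using a hypothetical 13F-style holdings table; the tickers, share counts, and the crude turnover definition are all invented for illustration:

```python
import pandas as pd

# Hypothetical 13F-style snapshots of one filer's holdings (shares held),
# taken at the end of two consecutive quarters. All values are made up.
q1 = pd.DataFrame(
    {"ticker": ["AAA", "BBB", "CCC"], "shares": [1_000_000, 500_000, 250_000]}
).set_index("ticker")
q2 = pd.DataFrame(
    {"ticker": ["AAA", "BBB", "DDD"], "shares": [1_000_000, 750_000, 300_000]}
).set_index("ticker")

# Align the two snapshots; a missing ticker means the position was zero.
holdings = q1.join(q2, how="outer", lsuffix="_q1", rsuffix="_q2").fillna(0)
holdings["change"] = holdings["shares_q2"] - holdings["shares_q1"]

# A crude turnover estimate: how much of the portfolio changed hands.
turnover = holdings["change"].abs().sum() / holdings["shares_q1"].sum()
print(holdings)
print(f"Approximate share turnover: {turnover:.1%}")
```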
And you basically want to know what their positions are and what the flow might be like. That tells you something about the likely market impact of the trading. The bigger the impact is, the more you have to move the price and essentially charge them for the fact that all of this trading is going to move the market a lot.
Yeah, that’s right.
Great.
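To make the "bigger impact means a wider price" reasoning concrete, here is a toy pricing sketch in Python. Nothing here describes how Jane Street actually quotes; the square-root impact shape is just a common stylized model, and every number and parameter name is made up:

```python
def quoted_spread_bps(order_value, adv_value, informed=True,
                      base_bps=2.0, impact_coeff_bps=10.0):
    """Toy pricing sketch: charge a base spread plus an impact term that
    grows with the order's size relative to typical daily volume (ADV).
    The square-root shape is a common stylized impact model, not a claim
    about how any real desk prices trades."""
    participation = order_value / adv_value
    impact = impact_coeff_bps * participation ** 0.5
    # If we know the counterparty is uninformed (e.g. a routine pension
    # rebalance), we can be less defensive and quote tighter.
    adverse_selection = 5.0 if informed else 0.0
    return base_bps + impact + adverse_selection

# A $50M order in a name that trades $500M a day:
print(quoted_spread_bps(50e6, 500e6, informed=False))  # tighter quote
print(quoted_spread_bps(50e6, 500e6, informed=True))   # wider quote
```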
Cool. So a second, interrelated step is maybe you have a hypothesis that the majority of the trading that this pension service does is pretty benign. It doesn't move markets a lot, but they have one position in their portfolio that is a little bit weird, and whenever they trade it, it moves markets a lot. And a thing that you might want to explore as a hypothesis is whether, when Jane Street isn't the person who receives this order, there's more outsize impact, or whether we might be able to anticipate, without getting the order itself, when the client might be executing, and to basically seek to give a better end price to the pension service. And so an important step here is data collection. Things that you need are the dates that the client might have traded, the sizes that they might have traded, as well as the times that they traded and the returns during those times where there were large trades. There are obviously other pieces of information that you might need, and you need a way to robustly collect this information in a way where you're going to be able to do a study or build a model on it.
Right? There’s a ton of infrastructure you need around this to do it well because the underlying data sets kind of suck. There’s all sorts of weird things. There’s a stock split and a ticker name change and just all sorts of other problems and errors in the data that you need to figure out how to adjust for and paper over as part of generating these high quality data sets that let you do research. And if it’s just full of these mistakes, then garbage and garbage out, your research isn’t going to get you anything good.
Absolutely, and you can start with the simplest possible case where in the data generation phase here, you could do it all manually and you could check things very, very carefully. And I think that’s actually a great way to build intuitions about all the ways that you could experience errors instead of starting from the get go from a dataset with millions if not billions of rows, where it’s difficult to get an intuition for what sorts of checks matter. But I agree that data collection is a very tricky and very, very important task. So once you have the data, you kind of get to do the step that I think most people think of as research and is kind of the most fun. You basically say, I have this data, there is a thing that I’m trying to predict. And you also have to be kind of creative about what sorts of data or features matter and what sorts of things you’re trying to predict. I guess at Jane Street we often refer to these as responders. I think in the external world these might be sort of referred to as dependent variables. I think responders is kind of a term that is not often used outside of Jane Street’s walls.
I think predictors and responders is just older statistical lingo. I was using predictors and responders 20 years ago when I was doing bad trading-related research, and I do not think I made it up, but
Cool. There’s a fun mailing list at Jane Street that I refuse to subscribe to called social polls and it was very divisive like what you should refer to these predictors or features, what is the proper nomenclature? So I think at Jane Street there are many camps of beliefs on what the correct terminology is. So you have the data and you want to make some predictions and you have to basically figure out the way that you want to transform the data in order to make predictions. You have a lot of options available to you here for the toy case that I mentioned where you’re working with a hand scraped dataset with a handful of data points in terms of the complexity of the model and the expressability of the models versus how much you’re likely to overfit. You probably want to go as simple as possible.
And so a technique that you might want to have is something like a linear regression, but for different types of data sets, and especially as the feature set grows or the interaction effects that you're trying to model get increasingly more complex, a linear regression is probably not going to cut it, and you're going to have to rely on more sophisticated machine learning methods, including the deep learning methods that I think comprise the majority of my puzzling and thinking these days. I mentioned that it's four steps for research, and so I think I'm cutting some steps out, but I would be remiss if I didn't mention a very, very important step in research, which is the productionization step. Once you have a model and predictions, that's great, and it's really cool that you've been able to get the idea that you had all the way to the stage of having predictions that hopefully you'll believe out of sample, but in a lot of ways the battle is only halfway done at that point.
You need a way to make sure that you have a way to express these predictions in the markets. So you need to kind of think about what is the most effective way to express these beliefs, and you also need to make sure that you have the tech available in order to reliably execute the trade that you thought that you were executing. And so there's a lot of pitfalls along the way here, and the more complex the trade gets, the more work you need to put in to make sure that you don't run into various issues in this productionization process.
Especially if what you’re doing is you’re doing research using one set of tools and you’re doing your actual production implementation with a completely different set of tools. The research toolsmight be an Excel spreadsheet or a Python notebook or some other thing, and then your production system is maybe some trading system that’s written in OCaml that has a totally different way of doing the mathematics and expressing the configuration and all that. I feel like this kind of from the very beginning, we’ve had two different styles of research that we’ve used. One has been one where you do the productionization almost from the beginning where you have more or less a kind of automated trading system. You have the code for that, and then the process of research really looks like simulation where you go and build some kind of simulated framework to run something that looks like that final study and you can sort of see what at least the simulated behavior looks like. And then the other approach, which is mostly what you’re talking about is where you have some kind of backed off abstracted simplified version of the kind of data and problem from what you would get in real time and trying to build and validate your hypothesis there. And then later you figure out how to do the transformation into the thing that you run in production. I’m kind of curious how you think about the trade-off between these two different approaches.
I think it’s difficult if you’re going into a completely new area of trading to have conviction on what the end production system should look like and to build something with all the bells and whistles and all the features that you need that you’re ultimately going to need in order to be able to trade at scale. And in those cases where you’re doing a lot of exploratory work or the various ideas that you have are somewhat low conviction and unlikely to succeed, it just makes sense to prioritize flexibility and versatility of iterating through hypotheses above having robust production systems. Now of course, this bites us, has bitten us in many ways where a lot of the studies that we’ve done where it’s been fast markets and quick and dirty studies that were with tooling that was never meant to be used for production has led to challenges in terms of how much robustness we can expect in production.
And this is versus cases where we have a very mature system and the types of research that we're doing are more what you might expect from robust data science. You're running experiments, you're trying to add a couple parameters to the hundreds of parameters that control a very complex trading system that you have. And oftentimes when you are operating in this way, it can be pretty frustrating, because you'll have to learn a whole library ecosystem that exists for the production system in order to make even a very small change. And I wouldn't say that one style of research or one style of work is inherently better or worse. I think it just kind of depends on the maturity of the work that you are adding onto, as well as the newness of the market that you're exploring, and also kind of the style of research that you're doing. Yeah, I mean, I think that's been a pretty difficult tension to navigate at Jane Street, because there isn't an obvious answer to whether you should prefer a very, very robust system that's going to be a little bit hard to learn and have a steep learning curve, or a system that's going to be a nightmare to administrate but gives a lot of flexibility to the people who are doing the exploratory analysis.
And it’s something that I think different desks just operate differently on. There are some desks that are highly algorithmic and have under their hands every day all the time the code of the systems they’re working on. And then for at least some of the kinds of research you want to do, doing an incremental change to that system is the easiest most natural step to evaluating some new idea. If you have control over the trading system and the feature engineering tool set, I want to add a new feature, I’m going to write the code that actually extracts that information in a direct way. And then you have a thing which sort of has that tight connection to reality there. Even in that world, I think doing all of your research by simulation has a lot of downsides. This is a pure performance problem of there’s an enormous amount of raw data that needs to be clawed through in real time.
And so having ways of taking that complex dataset and distilling it down to something which you can just in a more lightweight way operate with is really important. I think a thing that’s maybe obvious but I think really essential to research going well is having a short iteration time. You really need a short time to joy, whether it’s like you have some idea you can try it out and just at runtime, it has to happen reasonably fast so you can try new ideas and see how they work and discard the bad ones. In research, most of the ideas are bad. And you’ve been involved in research as part of the trading work you’ve done for a long time now. I’m curious how you feel like the character of that research work has evolved over the 10 years that you’ve seen it here.
Yeah, so the image that I have in mind is, I don't know if you've seen that meme of the, well, it's not really a meme. You have various stages of drawing, and you might be trying to draw a horse, and at the initial stages it's like a very bad outline of a horse and you can recognize it but it's not filled in. And as you go you'll add some colors and textures, and by the end there's a beautiful painting of a horse, and you just need to kind of go through all the stages. And I can't say that I've been through it all, but I feel like I've been through many of the stages that we have, where when I started at Jane Street, there were, I think, sufficiently few people, and our infrastructure, while developed, was still developing enough that we were kind of in a stage where the types of studies that we were equipped to do were a lot more crude.
But as a result, you could actually understand all stages of the research process very well. You would understand what ideas you're trying to explore. You would have a very, very expert command of the data analysis tools that you had on hand, which was, I think at the time, mostly just a Bloomberg terminal. You would understand very deeply what each column in a dataset meant and what the issues might be, and make sure that you have as high quality of a dataset as possible. You would know precisely how to implement a model or what was going into the regression technique that you're using. And if push came to shove, I think you could just implement that model from scratch, and for the production process you would be intimately involved in writing the code to be able to monetize the trade and be able to trade in the market.
And for me, I think it was actually extremely educational and nice to be able to be so hands-on with all parts of the process. Obviously I did not do a very good job with any of the parts of this process, and it’s pretty embarrassing if I go back and look at some of the work that I was doing versus what we’re capable of doing now. But just kind of understanding at a very, very deep level what each of these steps takes I think was a very, very good framework for how to understand things as things have gotten a lot more complex and I think capable at Jane Street.
So I’m curious now that you just said there are ways in which you learned and the in some sense brutal simplicity of the tool set and the bounds on the complexity of the things that you could do that made it a more useful kind of educational experience for understanding at an intuitive level what was going on with all of this. How do you deal with that when you are responsible for training new researchers and traders who now have a much richer and more complex tool set at their disposal? How do you help them develop that kind of tactile understanding of the markets?
I think it’s hard. I do think that in the training and education program that we have, we do try to build it up from the basics as much as possible. I think it’s just very, very difficult for you to understand things at the maximal level of generality from the get-go. My opinion has always been, if you’re trying to understand the concept, the best way to do that is not to read a textbook or read a paper. Oftentimes the notation that you see for math papers is like the most condensed form that you might have or an abstraction of a concept that might have taken many examples to motivate that will allow you to remember that concept if you see this formula. But it’s not the right way to experience things from the get-go. So when we’re teaching new traders about how to think about trading, we intentionally don’t throw them into a, here’s some simulated dollars, please go trade in the market and try to make a good strategy.
We intentionally try to distill things down into components and to build up an understanding of why trading dynamics might arise and then incrementally add more parts. So similarly for training up new researchers, anytime that we’re introducing a new topic, we’ll say, Hey, this is a data visualization library that we have. Here is an example plot. Here are some interesting features that we think you should be able to observe with the plot. Really spend 30 minutes with us and try to build some intuition for how to interpret the curves that you might be seeing from this data visualization tool. And I mean, I think that initial trying to orient yourself with a plot can feel pretty slow and very confusing, but down the line when you have to read hundreds and hundreds of these plots rapid fire later on, you kind of need everything to be second nature in terms of how you interpret the results. Having a slowed down fundamentals-based introduction to the tooling is really critical for you to have a fighting chance at keeping up with the rapid stream of information that we have here.
I think in general Jane Street is a place that has absorbed the lesson that tools are really important. In my own work, this has shown up a lot over the years in thinking about tools around software development, and we've spent an ungodly amount of time just building tools that we think make it easier and better for people to write the software they need to write. You sort of talked about some of the limitations of the early tools, of Bloomberg terminals and Excel, which we still use, of course; those are still important tools. What do you think of tools that have gained more prominence more recently? What are the central tools around research these days?
Oh wow. I feel like in every step of the research process that I mentioned, we just have so much more robust tooling that has so many cool features, and I just feel very spoiled every time I try to do a research study. Anything that I could have done in a week 10 years ago I think would take 10 minutes now. It's just wild. Obviously this comes with a complexity burden, but things like, if you're trying to collect data, the types of data that you might want in a financial study include things like returns or price data where you have thought carefully about corporate actions and what might happen when a company changes its business objective or its ticker or it decides to undergo a stock split. What happens? How do you know what the set of stocks that existed 15 years ago was? A classic trap that you could fall into is survivorship bias, where you basically say, I'm just going to look at the stocks that exist today, nix out the ones that IPO'd between 15 years ago and now, and just assume that's the universe. And that would be a pretty bad way to go about it.
But if you were told, I want to know all of the stocks that were trading on the US stock exchanges in 2005, that’s actually a pretty non-trivial task that would take you a while. And we have tooling that can just get it for you with all the annotations and notes that you might want and with some guarantees about accuracy. That’s very cool. You might also want to be able to annotate your dataset with other metadata. What was the market capitalization of this company? What is the volatility of this company? You might also want weird other data that we have that we’re ingesting into our walls that you want associated at the right times with the right symbols.
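A minimal sketch of the point-in-time universe idea and of the survivorship-bias trap, using a hypothetical security-master table with listing and delisting dates; the real tooling being described is presumably far more careful than this:

```python
import pandas as pd

# Hypothetical security master: one row per listing, with the dates the
# ticker was actually trading. NaT in delist_date means it still trades.
listings = pd.DataFrame({
    "ticker":      ["AAA", "BBB", "CCC", "DDD"],
    "list_date":   pd.to_datetime(["1998-01-02", "2001-06-15", "2010-03-01", "1995-05-20"]),
    "delist_date": pd.to_datetime(["2009-11-30", pd.NaT, pd.NaT, "2016-08-12"]),
})

def universe_as_of(date):
    """Tickers that were actually trading on `date` (point-in-time universe)."""
    d = pd.Timestamp(date)
    live = (listings["list_date"] <= d) & (
        listings["delist_date"].isna() | (listings["delist_date"] >= d)
    )
    return set(listings.loc[live, "ticker"])

# Survivorship-biased version: whatever happens to still be listed today.
survivors_today = set(listings.loc[listings["delist_date"].isna(), "ticker"])

print(universe_as_of("2005-06-30"))  # {'AAA', 'BBB', 'DDD'} -- includes later delistings
print(survivors_today)               # {'BBB', 'CCC'} -- the biased universe
```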
Right. For example, we have our own historical data: we've computed what we think the values of things are, and we've just recorded that and made it available for joining into the data sets.
Yeah, exactly. And I think we just have a robust network of tooling to be able to do all of these things where the vast majority of traders and researchers here just don’t need to think about the details, and that’s very scary. I think people maybe put a little bit too much faith into the idea that all these tools have been vetted and these are blessed values and people always say the mantra of trust but verify. You should assume that people are doing the best to give you the highest quality data possible, but you should also do your own checks and help make the tooling even better if you find anything that’s a little bit suspect or off with the data that you’re getting.
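In the spirit of "trust but verify," here is a sketch of the kind of cheap sanity checks one might run on a returns dataset before building on it; the column names and thresholds are arbitrary illustrations, not real data-quality rules:

```python
import pandas as pd

def sanity_check(df: pd.DataFrame) -> list[str]:
    """Cheap checks on a (date, ticker, close, return) dataset. These
    thresholds are arbitrary illustrations, not real data-quality rules."""
    problems = []
    if df.duplicated(subset=["date", "ticker"]).any():
        problems.append("duplicate (date, ticker) rows")
    if (df["close"] <= 0).any():
        problems.append("non-positive prices")
    if df["return"].isna().mean() > 0.01:
        problems.append("more than 1% missing returns")
    # Huge single-day moves are often unadjusted splits rather than real returns.
    if (df["return"].abs() >= 0.5).any():
        problems.append("suspiciously large single-day returns (check corporate actions)")
    return problems

# Example usage on a tiny made-up frame:
df = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-02", "2024-01-03"]),
    "ticker": ["AAA", "AAA"],
    "close": [100.0, 50.0],
    "return": [0.01, -0.5],
})
print(sanity_check(df))  # flags the suspicious -50% day
```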
So a lot of the tooling you talked about there is about dataset generation and some of that’s about sourcing data from the outside world and this enormous amount of work we do around that. And some of it’s around infrastructure for slicing and dicing and combining that data in useful ways. There’s also a bunch of stuff in the form of visualization tools. What kind of capabilities are important there?
Just things like what events happened in the market, at a fidelity and granularity that is greater than what you might get on Bloomberg, which is already pretty granular, but you're just never going to be able to inspect things at the sub-second level in a particularly satisfying way on Bloomberg. And so we have tools that help us visualize market data, and that's pretty cool. There are other types of visualization tooling. Graphs and plots are great. I think they're a lot more easily digestible than numbers on a spreadsheet a lot of the time. And a thing that you might want to be able to do is to make arbitrary plots, both historical and real time, of anything that you might ever want to plot. I think having a flexible toolkit for this is an example of a general framework and a tool that exists that I think people have used in very, very creative ways.
I think old school Jane Street was like, you'd get your data in some Excel spreadsheet and you better not have more than a million data points. You could only fit a million rows in an Excel spreadsheet, and then you literally use the insert-a-graph button after you've manipulated the data. And we had all sorts of clever recipes that people would learn about: oh, if you wanted this particular kind of plot to visualize this thing, here's the good way of setting it up. And then you do that and it takes a bunch of manual stuff. And now we have all sorts of really snappy, highly interactive tools where you can dynamically change the parameters and see the plots move in front of your eyes, and it just makes it really simple and really lightweight, and it embeds, without people having to know it explicitly, a bunch of sensible decisions about how to set these kinds of visualizations up.
Yeah, absolutely. It’s pretty cool.
So there’s data tools, there’s visualization tools, there’s also programming tools. Why is Excel great? One of the reasons Excel is great is it’s a data arrangement tool and it’s a programming tool and it’s a visualization tool, but even so it does all those things, but also with some exciting and terrible problems about how it handles each part of that triad. What does programming around research look like these days?
An example of, I think things that we’ve sort of built up over time is when you’re trying to do modeling with a dataset, you might want easy access to different types of off-the-shelf models. Maybe these are tree based models or maybe they’re deep learning models or maybe they’re just some artisanal regressions that you’ve cooked up with
Carefully biased.
Sure. And with various techniques, hacks, things in the loss function that other people at Jane Street have thought about. And so we’ve built up a pretty large and robust library of functions and classes that you can just kind of import and just apply to a dataset that you have that’s in a regularized format for you to be able to iterate through lots of hypotheses of what is the best modeling technique to use on this particular dataset. And that’s pretty cool
Where our tool for this used to be mostly Excel. Now our tool for this is largely like everybody else, Python and Python notebooks.
Yes. And I think Python notebooks are really great. I think the interactivity and the ability to annotate various cells with things like why the code was written the way that it was written, or dividing things up into sections, has just made it a lot easier to share research. It's also kind of not a framework that was meant for a production system. And so there's been a lot of, I think, friction in terms of how to bridge people's enthusiasm for using Jupyter notebooks with the extreme difficulty of wrangling them in the end productionization case.
I find Python notebooks kind of great and also super frustrating. One of the things that I understand why people like, but that also drives me bonkers, is the evaluation model of Python notebooks. Things get evaluated in just whatever damn order you click the buttons, and that's really bad for reproducibility. It means that you can have a notebook that looks like it did a thing, but if you do it in the order in which things are listed, it just doesn't work. How did that get there? It's like, oh, somebody wrote stuff in some reasonable order and at first did things in the right order, and then they, well, made a change to part of it. They didn't want to rerun the whole thing, so they just went back to the part where they changed things and clicked that button. And it's just easy to get the notebook into an inconsistent state, and there's no tooling for really knowing whether your notebook is in an inconsistent state.
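A tiny illustration of the hidden-state trap being described, written as a single script with the hypothetical cell boundaries marked in comments; keeping cells as pure functions and doing "restart and run all" before trusting results is one common discipline, not necessarily what's used internally:

```python
# --- Cell 1 ---
threshold = 0.5

# --- Cell 2 ---
def count_signals(scores, threshold):
    # Pure function: everything it depends on is passed in explicitly,
    # so the result doesn't depend on which cells ran in which order.
    return sum(s > threshold for s in scores)

# --- Cell 3 ---
scores = [0.2, 0.6, 0.9]
print(count_signals(scores, threshold))  # 2

# The trap: if you later edit Cell 1 to `threshold = 0.8` and re-run only
# Cell 3, you get 1 -- but the notebook on disk still *looks* like it was
# produced with 0.5. "Restart kernel and run all" before trusting or
# sharing results is the cheap way to catch this.
```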
That stuff is really awkward. And actually, when I think about things that feel pretty painful about our tooling, but maybe research tooling in general, it's having really good stories around reproducibility. That in some ways feels much better set up on the software engineering side of things: oh, you've got a big repo and there are tests organized in a particular way, and every time you run your continuous integration, it makes sure that all the tests still run. It's not that I think the people designing this stuff are dumb, or that the Python world doesn't know what it's doing. There are fundamental reasons why it's harder. On the software side, when we do tests in this kind of reproducible way, generally speaking the tests are pretty fast. It's okay to run them over and over. And as we do more and more trading research and do it at larger scales, you just can't lightly rerun all the different pieces. It sort of conflicts directly with this desire to have things be lightweight and easy to use. I find the whole ecosystem frustrating, but I also don't know what to do to fix it.
It helps when there’s people who kind of straddle the roles. I feel like a caricature that people might have is researchers are these bad scientists with no sense of organization just putting ingredients here and there and throwing things on the wall, and eventually something sticks and then we say, Hey, we want to be able to productionize this thing. And that comes off as extremely frustrating to the folks who at the end of the day help make sure that we get an exciting strategy out into the real world. On the flip side, from the researcher’s point of view and interacting with the developers, it might be like, why is this thing that seems like it should be so easy? I mean, I can just add a column to my dataframe. Why can’t I do this in this production system? And it might feel just weirdly inflexible in a way that just doesn’t really make sense. And I think it’s the tension that we alluded to earlier of having flexibility in being able to explore hypotheses versus having a performant and robust system and that tension is going to remain.
Although I think you need that robustness for sure. You need it when you’re going from the, I did some research to I want to productionize it, but also I claim you need it for the research too, because you get to the point of we built this model and we trained it up and we saw what its predictive power was. Oh, and now I want to make one small change to it to see if this change makes a difference. And if you don’t have a good reproducibility story, it’s often very hard to know if you made one small change or if you made six changes because other stuff changed elsewhere in the code base, somebody threw in some more columns and data sources mutated under your feet. So even just purely for the research process, I think this kind of discipline around reproducibility is really important.
Totally. And I think this is the type of lesson that needs to be, well, it doesn’t need to be, but it seems like empirically it needs to be learned through hard-won experience. And I think it’s also helpful from an empathy point of view and just us having better tooling to have people who straddle the boundary between doing this research and this robust productionization. And I feel like that’s kind of one of the strengths of Jane Street, which is that we don’t distinguish between these roles in the hard and fast way, and we do have a number of individuals that straddle these roles and help bridge the gap, so to speak, and make sure that things don’t get into the territorial or tribal kind of state.
Sure. So part of the story we're talking about here is the transformation of the kinds of research we've done over time, where early on it tended to be brutally simple techniques, certainly lots of linear regression, and then over time we've ended up moving in a direction that looks a lot more like the modeling techniques you see in other places in many ways, and certainly where you're doing lots of neural nets and tree-based approaches and things like that. I'm curious, in that environment, when we fast forward to the kind of modeling that we're doing today, what are the challenges you see that are different in the trading context versus what someone who's doing ML in some other context might see?
That’s a good question. I think there’s actually two ways to answer things. So one thing that’s pretty difficult from the point of view of a practitioner is as a person who built up research intuitions from these brutally simple sort of frameworks, there’s some intuitions that you build up about what sorts of things are important. You learn that having priors is important and in cases where you have very low amounts of data, which is often the case when you’re doing some of these simplistic studies, the trade-off that you have is because you are so data limited, you would prefer to not leave that many data points out of sample and just be very careful about the number of hypotheses that you test such that you have a result that you’re going to believe later on that utilizes as much of the data that you have available as possible.
In some ways, you can think of this as sloppy data science, but it’s kind of imperative in a lot of the cases of high noise, low data regimes that we’re often operating under in finance. As you go into sort of these larger data sets and more complex modeling techniques, being sloppy with data science is fatal. And so a lot of times people might also be lazy and say, Hey, I understand how these models work in a scenario where this external event happens. Well, we don’t have that many data points to figure out what we would expect to happen as a result of that event. I guess that’s maybe a little bit too vague. Maybe you have a model of how the world is going to work and then something like the global pandemic happens, and how does your model adapt to that? If your model is simple enough and you really understand the story behind why the effect exists, you might have some directional thoughts on how your model should change.
And you might even be confident enough to change some of the parameters or coefficients in your model to keep up with this macro event that has happened. And the more you’re dealing with very complex models where you can’t understand the data that you’re putting in, much less sort of like the interaction effects that you are modeling with these very capable and high capacity models, you basically just need to be extremely disciplined about how you approach holding a data set out of sample, having a train set versus eval set and making sure that you’re also not duplicating hypotheses across groups. You might be doing very good and robust data science and believing that you are only believing out of sample results, but if you have a lot of folks at Jane Street who are exploring similar ideas, you might inadvertently be leaking or not doing proper data science in a way that’s kind of scary.
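A minimal sketch of one form that discipline can take: splitting data by time rather than shuffling, with an embargo gap between the train and eval periods so overlapping observations can't leak. The dates, gap, and column names are arbitrary:

```python
import pandas as pd

def time_split(df, train_end, eval_start):
    """Split a time-indexed dataset into train and eval sets, leaving an
    embargo gap between `train_end` and `eval_start` so that features or
    labels that overlap in time can't leak across the boundary."""
    train = df[df["date"] <= pd.Timestamp(train_end)]
    evaluation = df[df["date"] >= pd.Timestamp(eval_start)]
    return train, evaluation

# Made-up daily data: fit on everything through 2021, embargo 2022 Q1,
# evaluate on 2022-04 onwards -- and decide that split *before* looking
# at any eval results.
df = pd.DataFrame({"date": pd.date_range("2018-01-01", "2023-12-31", freq="B")})
df["y"] = 0.0  # placeholder responder
train, evaluation = time_split(df, train_end="2021-12-31", eval_start="2022-04-01")
print(len(train), len(evaluation))
```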
You described two approaches. I’m not sure one is less disciplined than the other, it’s just the discipline is in different places. One of them is disciplined in the model space where you’re like Occam’s razor, you want to restrict the universe of hypotheses you’re willing to contemplate in exchange for being able to operate in a reasonably reliable way in very low data circumstances. And then in the high data case now you have much less understanding and introspectability of what’s going on in the models. And so you need to use discipline around data and evaluation in order to control the system.
Yeah, that’s a really good way to put it. And one approach is not fundamentally more disciplined than another, but it’s a pretty hard switch for people to make. To the bones I think traders and researchers at Jane Street really want a model to make sense and to be able to have intuitions for how the models are going to change as a result of some event. And unfortunately, the more complex the models get, the more difficult this task is. And I think that’s just kind of like an uncomfortable, I dunno, realization that we’ve had to sort of navigate as our models get more and more complex over the years.
And this comes down in some sense to a question of what people often talk about in machine learning context is interpretability, right? And people sometimes talk about interpretability as do I understand what the model is doing? Do I understand why it’s making the decision it’s making? And that’s important. I like your point about going in and maybe you by hand adjust the parameters, which is what I feel like should be called something like interventional interpretability, understanding things well enough so that you can look at a circumstance in the world and look at the model and figure out how do I change the model in order to update to this event? Pandemic’s kind of a great example, like a once in a lifetime, hopefully,
Hopefully
Thus far once in my lifetime kind of thing. And therefore you don’t get lots of shots. You have an N of one where you have to come in and figure out how to kind of make that change. I mean, in some sense writ large, it’s not really an N of one because there’s a family of different deeply unusual events and you can get good at the process of figuring out how to respond to these deeply unusual events, the need for realtime decision making, being layered into the modeling process. That’s a good example of a kind of unique challenge in the context of trading because trading is so much about realtime decision making and you need the modeling to cooperate.
Yeah, I guess we can just go back to your original question of how machine learning is different in a financial context versus in other settings. And I dunno what to tell you, but machine learning in a financial context is just really, really hard. A lot of the things that you might learn in school or in papers or something, they just don't work. And why doesn't that work? I think ultimately it boils down to a few pathologies or quirks of financial data. I guess one mental model that I have, that I'm borrowing from a coworker, that I really liked, is you can think of machine learning in finance as similar to building an LLM or text modeling, except that instead of having, let's say, one unit of data, you have 100 units of data. That sounds great. However, you have one unit of useful data and 99 units of garbage, and you do not know what the useful data is and you do not know what the garbage or noise is. And you need to basically be able to extract a signal in this extremely high noise regime with techniques that were designed for much higher signal-to-noise ratio application domains.
Why does financial data have so much noise?
I think it’s just a very complex system. It’s kind of like millions of people interacting in a simultaneous way to determine what the price of thousands, if not tens of thousands of products is going to be. And so the idea that for any given set model or set of features that you have that you’re only going to be able to explain a very small amount of the variance that you’re trying to explain in which case in this case would be kind of like the returns of a particular instrument. I think it just shouldn’t come as a surprise to you.
Isn’t there also though a structural thing where the structure of markets basically incentivizes people? When you see regularities in the behavior of prices, you’re incentivized to trade against those and that beats them out of the market. And as things look more and more like martingales or whatever, they look more and more like pure noise. And so in some sense the whole system is designed, the pressures of the system are squeezing out the data and sort of inherently what you have to do in a financial context is look through these enormous streams of data that are mostly now random and find a little bit of remaining signal that you can use. It’s kind of like an efficient markets thing.
Yeah, absolutely. And I think you’ve referred to this as the anti inductive property of markets, and I thought that was a great way to describe it
Where you sort of go and take the things about the past and remove them from the future actively.
Yeah, exactly. I mean, another thing that is quite challenging: if you're trying to do something like image recognition or digit recognition, the fundamental task that you have there, or the dataset that you have, doesn't really change over the decades. In financial markets, I guess one thing that I've been shocked by is that a financial crisis seems to occur roughly every year. Sorry. People refer to black swan events, or say that these stock returns are not normally distributed, and there are, I think, lots of events after which the distribution of features or the returns that you might see in your data just kind of changes, and you have to deal with those regime changes. And it's not just dealing with days where you're suddenly dealing with 10 times more data, although you have days like that. It's just cases where the market works differently. The world is different. I think being able to deal with those changes while also being able to robustly train a very large model, that's a challenge that seems pretty unique to the financial domain.
The large model thing also connects to a different kind of challenge we have, which is just that there's a lot of data. I mean, I think I was looking before, we get a few tens of terabytes of market data every day. As you said, in some sense, most of it is kind of noisy garbage, but you also have to look over all of it. And being able to process all of that stuff, both realtime and historically, is challenging. And the realtime aspect of this is interesting too, because you have to make predictions at various different time horizons. There are trading systems we build where we sweat the performance details down to a range where 10 nanos, more or less, is a material part of our overall time budget. And there are decisions we make which involve humans in the loop thinking about a thing, and those are on a much coarser grain. So you have many, many different orders of magnitude at which you have to make predictions. How does that complicate the modeling process?
Especially at the shortest timescales, the importance of the eventual inference system that you have, and just the physical limitations that you have from the speed of light, starts to matter a lot. And so one thing that's been a learning experience for me over the last couple of years is that a paradigm that existed, I think, for research before me was that roughly the difficult part was the modeling, and that eventually you'd be able to figure out a system where you're going to be able to produce the predictions that you need. You don't need to think that hard about the limitations of the hardware that you have. That is just patently not true for some of the models that we're training these days, both due to the latency requirements as well as throughput requirements. And I think there are more and more cases where the way that we approach modeling is not, what is the best model for the job, but first understanding what are the limitations of the hardware that we have and the hardware serving options that we have, and, given these constraints, what is the best model that can satisfy these constraints and best model the data that we have?
I think it’s been fascinating as a person who is somewhat computer illiterate, hardware illiterate to learn about all these systems and to be kind of forced to think about all the interaction effects that matter a lot.
Yeah. And the diversity of hardware you’re thinking about is quite high. You might be thinking about evaluating a model on a CPU or on an FPGA or on a GPU, or in fact many different kinds of GPUs and using multiple different kinds of compute units within the GPUs and within the GPU, multiple different kinds of memory and different kind of interconnects between the memory and the host memory and interconnects between the GPUs and each other and the storage systems that you’re using, like the constraints show up all over the place. In some sense, I think of this as a thing that is different in the details in trading, certainly we care about latency at just very different timeframes than the vast majority of people who think about modeling problems, but the general spirit of the hardware matters an enormous amount. I think that’s actually become very broad over the last decade in the kind of machine learning world.
When you think about what’s important about the transformer architecture, one version of the story you can tell is that the really important thing is that it was the architecture that evaluated really efficiently on the TPUs and GPUs of the time, because we learned all these things about scaling laws: as you scaled up the amount of data, the size of the model, and the amount of training time, you could just keep increasing the predictive power of the resulting model. And so, more than any individual thing about how I would carefully set up and bias and use my priors to make the model match the domain, the thing that matters is: can you scale up? There’s the famous essay about the Bitter Lesson in machine learning, which is about how all of this stuff about carefully setting things up just kind of stops mattering at scale.
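For reference, the scaling laws being alluded to are usually written as a power law in model size and dataset size. Schematically (the constants here are placeholders; the published papers fit them empirically):

$$
L(N, D) \approx \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}} + L_{\text{irreducible}}
$$

where $N$ is the number of parameters, $D$ is the amount of training data, and $L_{\text{irreducible}}$ is the loss floor you can’t train away. The practical upshot is the one described above: grow $N$ and $D$ together and the loss keeps falling in a predictable way.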
I’m not fully “Bitter Lesson”-pilled when it comes to all of the things that we do in research around trading. I still think there are lots of places where you’re relatively data-poor, and where thinking about priors and the setup of the models really matters. And in some sense, in the places where you’re very data-poor, you’re less data-poor than you think, in that you’re drawing intuitions and ideas from other trading situations that you’ve seen over time. The ability to do that kind of transfer learning, where you’ve learned in other places how markets work and you apply it to a new domain, is super important. But there are more and more places where we can really make use of scale. And in those cases, you suddenly have a completely different optimization problem: how am I going to shape the model to match the hardware, and even how do I shape the hardware to match the kind of modeling that I want to do?
It’s a complicated back and forth, and I feel like the things you were saying earlier, that there’s lots of fluidity between the different domains here, that individuals switch-hit and operate in multiple different modalities, and that there are close connections between the people thinking about different parts of this, become really important to doing this at a high level. So we’ve talked a bunch about what research was like a long time ago and what research feels like today. I’m kind of curious: what are you thinking about and looking forward to in terms of the future directions of trading-related research at Jane Street?
It’s very hard to predict. In a lot of ways I feel like the next couple of years are just going to be especially tumultuous, given the developments in AI and in the external world. And we’ve had developments that are somewhat orthogonal to, but running in parallel with, the developments that you’re going to hear about in the major news outlets. So I fully acknowledge that whatever predictions I have about 2025, much less 2026, are just going to be absolutely wrong. That said, I’m very excited about the roadmap that we have for machine learning and research at Jane Street. Over the last few years, we’ve made tremendous strides in our machine learning capabilities, our understanding of the models and what works in the financial context, our understanding of the hardware that’s required, and in amassing the fleet of GPUs that we have, which is upwards of the mid thousands of very high-end GPUs that we have access to. And that’s very exciting.
And by the way, that whole thing requires its own predicting of the future, right? One thing that’s hard about buying hardware is that you have to make all these guesses about what hardware is going to exist in a few years and what our models are going to be shaped like in a few years. And one thing I’ve found exciting about this area is how kind of incompetent we are all the time. Five years ago we were an organization that didn’t know what a GPU was, and it’s just dramatically different now. It’s kind of an exciting spot to be in, where you’re both landing models in our systems that are making a real difference, driving a significant amount of P&L, and also you can tell that you don’t know very much, because every week you learn a bunch that you didn’t know last week. The learning rate’s incredibly high.
It’s very interesting and also very humbling for sure.
Yeah,
We’re building our modeling capabilities. We continue to voraciously consume and record tens of terabytes of market data, as well as other metadata, per day, which I think helps set the stage for the scale of models that we can fit. The story of the last couple of years is that we’ve really advanced our ability to make predictions with complex deep learning models, deep neural networks trained on a variety of data including market data. And I imagine that we will continue to grow in our capacity to make predictions from these models. I’m pretty excited to see that trend through, at least for the remainder of this year, if not further. I think there are a lot of challenges in terms of domains: what are the ways that we can transfer the understanding we’ve developed in the asset classes where we have a lot of experience and competence, and how do we expand to other domains where, due to the relative scarcity of data, the same techniques might not apply?
How do we apply transfer learning there, and how do we work with different modalities of data? How do we interleave things like images or text or market data, or what have you, to really make use of all the information that we have, and ultimately make the highest-quality price predictions in a financial setting? The focus of Jane Street, I think, is in a first-class way going to be on how we can be better at trading, how we can provide a better service as a liquidity provider and a participant in the financial markets. But it’s also been cool to see the lessons we’ve learned doing research on financial data and how they translate to some of the efforts you see in next-token prediction, LLM-style modeling, in the world at large. And I’m also curious to see how that’s going to dovetail over the next couple of years.
It’s maybe worth saying that the way we use machine learning models varies depending on the kind of thing we’re doing. There’s some stuff where we’re trying to extract data that influences trading from the kinds of data sources that people in the outside world routinely build models for and publish about. So you might use LLMs or BERT models or whatever to extract data from text, different kinds of models for getting stuff out of geospatial data, and other things to get information out of images. In all of these cases, you can extract data and find ways of making it relevant to your trading decision-making process. And then sometimes what we’re doing is looking at the unique time-series-structured data that comes out of the markets themselves, where there aren’t that many papers published about how to do it, because mostly the people who know how to do it don’t publish papers about it.
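To give a flavor of the text-extraction side, here is a minimal sketch of the kind of thing an off-the-shelf model can do. The model name (a publicly available finance-tuned BERT checkpoint on Hugging Face) and the feature mapping are illustrative choices, not a description of what actually runs in production.

```python
# Minimal sketch: turn a news headline into a crude sentiment score that a
# downstream model could consume as a feature. Illustrative only.
from transformers import pipeline

# "ProsusAI/finbert" is a public finance-tuned BERT model; any sentiment
# checkpoint would do for the sake of this example.
classifier = pipeline("text-classification", model="ProsusAI/finbert")

headline = "Chipmaker beats earnings expectations, raises full-year guidance"
result = classifier(headline)[0]  # e.g. {"label": "positive", "score": 0.97}

# Map the predicted label to a signed score in [-1, 1].
sign = {"positive": 1.0, "negative": -1.0, "neutral": 0.0}[result["label"].lower()]
sentiment_feature = sign * result["score"]
print(sentiment_feature)
```

The interesting work is everything around a snippet like this: deciding which text sources matter, how quickly the signal decays, and how such a feature should be weighed against everything else a model sees.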
And yeah, there’s a lot of excitement these days around the new stuff coming out in the LLM space, all this new work on high-compute RL and reasoning-style models, and being able to take that stuff and see, in a kind of multimodal way, how it can fit together into models that more comprehensively uncover more of the overall trading process. I think the thing that’s maybe not obvious to someone on the outside is how all-inclusive trading can sometimes be. Sometimes you are staring at time series of market data, but you’re also sometimes just trying to think about how the world works, because how the world works affects what asset prices should be, and that’s going to affect what our behavior should be. And so as we get more and better models that can do more and bigger things, a bunch of opportunities open up for how we could pour all of this into that process.
Another thing I’m excited about, which is not about trading exactly: there’s a ton of work that we’re doing around leveraging ML to just make people more efficient in general, right? We have lots of bizarre and unusual programming languages we use and all sorts of unique internal tools and datasets, and being able to build assistants and agents that can operate on this stuff and help people here be more efficient and get more done is something that’s very exciting from, I think, all of our perspectives, just because the single most important resource here is the people who are coming up with new ideas and doing things, and anything you can do to grow their scope and what they can achieve is incredibly valuable. So there’s a lot of ML work of that kind, which is not exactly the thing we’re thinking about in the context of trading-related research, but which I think helps the organization as a whole a lot.
I mean, I think the potential value there is on par with the potential value of some of the other stuff that we’ve mentioned. And the pathway to how we’re going to leverage this tooling, to really put the most powerful toolkit possible into the hands of Jane Streeters, that path itself is unclear: what is the best way to leverage some of these foundation models, or to fine-tune them? What is the set of tooling that we develop in-house versus where we partner with others? What do we build first? What is the data that we need in order to build these tools? These are all very interesting, I think, and also complex problems that we’re thinking through.
And in the trading space, we are the people producing the models we’re using, whereas it’s a more complicated story in this assistant space, where we get to see the cool things that vendors on the outside are doing. A lot of the work we’re doing in that space can often just be about figuring out how to best leverage, integrate, and drive value from those things. And it’s kind of exciting seeing all the new things that are coming down the pipeline from the upstream people working on these models as well. Definitely. Alright, well maybe that’s a good place to stop. Thank you so much.
Thanks for having me again.
You’ll find a complete transcript of the episode along with show notes and links at signalsandthreads.com. One thing I wanted to mention is that, as you may have noticed, the pace of releasing new episodes has slowed down a bit. Don’t worry, we’re not going anywhere. In fact, we’ve got a bunch of episodes planned that I’m really excited about. Things have been busy, and I do expect the pace to be a bit slower going forward. Anyway, thanks for joining us. See you next time.