Listen in on Jane Street’s Ron Minsky as he has conversations with engineers working on everything from clock synchronization to reliable multicast, build systems to reconfigurable hardware. Get a peek at how Jane Street approaches problems, and how those ideas relate to tech more broadly.
Ella Ehrlich has been a developer at Jane Street for close to a decade. During much of that time, she’s worked on Gord, one of Jane Street’s oldest and most critical systems, which is responsible for normalizing and distributing the firm’s trading data. Ella and Ron talk about how to grow and modernize a legacy system without compromising uptime, why game developers are the “musicians of software,” and some of the work Jane Street has done to try to hire a more diverse set of software engineers.
Welcome to Signals and Threads, in-depth conversations about every layer of the tech stack from Jane Street. I’m Ron Minsky. It is my pleasure to introduce Ella Ehrlich, who is a software engineer here at Jane Street, and who’s a real Jane Street lifer, she’s been here for about a decade. And we’re gonna talk today about the task of engineering legacy systems and a particular system that Ella has worked on for a long time called Gord, which has been here for about five years longer than she has. And we’re gonna talk about some of the challenges and interesting work that comes out of all of that.
But to start with, Ella, can you tell us a little bit about how you came here?
Hi. Yeah, I actually took an internship at Jane Street after my junior summer. I had no interest in finance. I did not think I was going to come here full-time. I thought I wanted to do game design. And I was deciding on my internship between Jane Street and going to EA Games, and a combination of two things happened. One, is I was like, “Well, I’m definitely gonna be in the Bay Area after I graduate from college, so let’s spend the summer in New York and I’ll go to some Broadway shows, I’ll eat at some good restaurants.” Like, “This Jane Street thing, whatever.” And I came and I had such a great summer, and I learned so much. And I felt like I grew so much during that summer that I decided, “You know what? This seems like a cool place to start my career, and, you know, maybe in a couple of years I’ll go out and do some game design stuff. But, like, that’s fine, I’ll go back to school and get a master’s.” And then, uh, you know, 10 years later.
Do you still spend real time on games these days?
I spend a lot of time playing games, and thinking about games, and watching games. I am a big pro-League of Legends fan. Go EG! But it’s now, I think, a thing that I’ve learned is that I’m very happy that it is my hobby, and I think the more I’ve learned about the gaming industry, the more I’m like, “Jane Street is the right job for me.” It is the place I’m very, very happy to have spent the past decade of my career, and a place I continue to be happy to come to work every day. And I think that gaming is a great hobby, but I wouldn’t want to do that full-time because, you know, I think to some extent, based on talking to friends who are in game development and in that space, to some extent I think game developers are a little bit the musicians of software. Most places at software companies, like, it’s, like, you know, a nice, cushy job. And game development is such a competitive environment where there are so many companies and they work long hours. The amount of growth and investment that Jane Street has been willing to put into me, and the people in Jane Street have been willing to put into me, I think it is harder to find that in the gaming industry. Again, based on the experience of talking to my friends who are in that space.
Yeah. Hopefully people in the gaming world aren’t too angry at us. I’ve heard similar things, that I feel like being a game developer is a little bit like being a violinist. It’s a hard thing and lots of people love it and want to do it, so it can be very competitive.
That being said, they make amazing things and I am so grateful that so many people are passionate about that because I benefit from it.
It is indeed very impressive. Okay, so let’s talk a little bit more about your actual work life here at Jane Street. I mentioned the system Gord before, can you tell us a little more about Gord and what it does and how it fits into Jane Street’s ecosystem?
Yeah. Gord is the system that takes in all of the trading data of our firm’s activity that we do on trading on platforms and exchanges all around the world, normalizes it, and then distributes it as a service within the firm so that we can do things like calculate our positions, upload our trades to clearing firms, settle our trades and make sure, you know, that the money and stuff actually change hands at some point in the future.
So you just said a bunch of words that are probably unfamiliar to people. Just to explain one of them, what do you mean when you say position?
Yeah. So the idea is, right, suppose that you buy a hundred shares of Apple. You can think of it as: now, instead of having some number of dollars, you have that hundred shares. And it’s important to know that because you need to be able to understand, like, the value of all the stuff you’re holding, which means you need to know all the stuff you’re holding. If you buy a hundred shares of Apple, you’d say your position is a hundred shares. And if you sell 20, your position is 80. So it’s just tracking all of the stuff that we’re holding that’s, like, not just the money.
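To make the bookkeeping concrete, here is a minimal sketch of a position as a running sum of signed fill quantities per symbol. It is not Gord's code: the `Fill` record and the `apply_fills` helper are hypothetical names invented for this illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Fill:
    """A hypothetical, simplified trade record (not Gord's real schema)."""
    symbol: str
    side: str      # "buy" or "sell"
    quantity: int


def apply_fills(fills):
    """Return the net position per symbol: buys add shares, sells remove them."""
    positions = defaultdict(int)
    for fill in fills:
        signed = fill.quantity if fill.side == "buy" else -fill.quantity
        positions[fill.symbol] += signed
    return dict(positions)


# The example from the conversation: buy a hundred shares, then sell 20.
fills = [Fill("AAPL", "buy", 100), Fill("AAPL", "sell", 20)]
positions = apply_fills(fills)
print(positions)
```

A position that goes negative in this scheme would mean more shares were sold than were held.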
Right. And maybe a thing that’s not obvious to someone who isn’t in this world is there are many different things that you need to do with this data. So you might in real-time on a trading desk want to know, like, what your positions are, what your risks are, how much money you have made, all of that, as you’re going throughout the day. At the same time, you might want to do firm-oriented worst-case risk analysis, based on the same data. Also, you need to make regulatory decisions. It turns out, for example, a common one is that when we sell something, a weird thing that maybe people don’t realize is that sometimes you sell things you have, and sometimes you sell things that you’ve only borrowed, and that latter thing is called a “short sale.” And you have to know what your positions are to know whether a given sale is a regular sale or a short sale. And there are regulatory marking requirements.
So your moment-to-moment, real-time interactions with the exchange are driven by and affected by this data because you have to change what you say to the exchange based on positions. And then, there’s all this often end-of-the-day reporting, uploading, synchronizing your information with various other players in the market. And all of these things are driven off of this one central source of data.
Yes.
Okay. So that’s what it does. From an engineering perspective, like, what are the requirements out of that system?
So it has a bunch of interesting requirements that other systems might not have. So one of the ones that obviously matters is performance. We are very actively trading all of the time. And it’s very important for Gord to be performing to its peaks because when trading is the busiest, that is when it is most important for us to have accurate information to make good decisions. And so we need to make sure that we are able to perform to, you know, 2 or 3× what we’ve seen on the most recent busiest day ever, because when the next one comes along, our busiest days tend to blow our previous busiest day out of the water by large margins, rather than, like, you know, “Oh, we get a little bit bigger, and a little bit bigger, and a little bit bigger over time.” It tends to be the case that busy days are significantly busier than what we’ve seen in the past, and so we need to be always thinking about how much overhead we have in the system, and can we keep up with much busier than what we’ve currently seen.
And what’s, like, the rough multiplier between a regular day and, like, the busiest day of a year?
I think that it could be as much as, you know, 5× is pretty reasonable. I’m trying to think back to some historical ones. One of the things that’s interesting is over the last five years, our standard days have become what our busiest day ever was when I started at the firm, or something like that. Even probably more than that. And so the most recent busiest day ever we saw was about 2×, but also previous comparison points have done 5× or sometimes more than that. And so we are always waiting with bated breath for what the next multiplier will be, but we try and keep in mind at least 2× and probably more than that as a regular thing.
2× more than the worst thing you’ve ever seen is something you want to be able to tolerate gracefully.
Yes.
And maybe a thing just to keep in mind is, like, all of these services that are back-ended off of Gord means if Gord is down, the firm isn’t trading.
Yeah. And in addition to that, or, I guess, as a result of that, we have very hard uptime guarantees. As you said, if Gord goes down, the entire firm stops trading automatically because we can’t make good decisions if we have no idea what trading we’ve done. So we work incredibly hard to make sure that does not happen basically ever. The system is designed to be able to fail in pieces in constrained ways that will hopefully avoid, you know, any sort of catastrophic failure. An individual exchange might have problems, but the system is designed to make it so that if that exchange were to have problems, it would not affect the rest of our trading.
In addition to that, whenever we’re rolling the system, we have extensive and somewhat arduous testing procedures that we go through to ensure that we haven’t made any of these sort of fatal flaws that might cause a catastrophic outage. We also always run a standby: in addition to the real production version of the system, called the primary, we have a standby, which we also consider production. It is processing all the same activity and, you know, keeping up with everything in real-time, but it is in an entirely different data center and, you know, fully separate from the primary, as much as we can make them. That way, if we were to have a network partition and lose a data center, or just lose a box, which, with the number of actual physical machines we have, happens with surprising frequency, we could gracefully fail over as quickly as possible to continue trading and keep things up and running.
So when I think about Gord, one of the things that I think makes it challenging is that the system is old, right? As I said, you’ve been here about a decade, it’s been here for five years longer than that, and it’s, you know, to say kind of frankly, it was written when we were less good at writing software and knew less what we were doing. And it’s kind of a sign of that. Like, I was involved in a bunch of the early design decisions around Gord, some of which I continue to regret to this day, and so a lot of the kind of story of the last few years of working on Gord has been about extending, and growing, and hardening the system and figuring out how to engineer out of problems that were built in and baked into the original design. So maybe as a kind of way of looking at that, it’s worth talking about what Gord looked like when you first came to the project.
When I started, the core of what Gord is was very similar. So the pipeline we have is there are sort of sources in the world which are, like, a data feed from an exchange. And there’s a process called a parser that gets a data feed by SSHing to the box and tailing a file on disk that is in a pipe-separated, number-equals-value format. Maybe someday we’ll change that. But, uh, that’s not one we’ve had, uh, actually a huge amount of success changing thus far, though we do have some plans on that front.
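As a rough illustration of what parsing a pipe-separated, number-equals-value line can look like, here is a minimal sketch. The exact file format is not described in the conversation, so the sample line, its tag numbers, and the `parse_line` helper are all assumptions made up for this example.

```python
def parse_line(line):
    """Parse 'number=value|number=value' into a {number: value} dict.

    Assumes '|' separates fields and the first '=' splits tag from value.
    """
    fields = {}
    for part in line.strip().split("|"):
        if not part:
            continue  # tolerate a trailing separator
        tag, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"field without '=': {part!r}")
        fields[int(tag)] = value
    return fields


# An invented sample line; the tag numbers carry no real meaning.
sample = "1=order|2=AAPL|3=100|"
parsed = parse_line(sample)
print(parsed)
```

Values stay as strings here; deciding what each numbered tag means would be the job of a later step.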
But…so we SSH to the boxes, we tail files. And then we need to take that data. And one decision that I think was made early on was that Gord would deliver, quote unquote, correct data, all the time. And what this means is that Gord actually does a lot of making up of data when we get bad data from exchanges. So for instance, there was a promise that was made that Gord would deliver an order before it saw a fill. A fill is another word that we might use for trade. You might also hear me use exec interchangeably. I will try to be consistent. But, uh, we use fill and exec very interchangeably within the team, and so sometimes you may hear that.
So when we get a fill, if we have not seen an order, Gord makes one up. We think, in retrospect, this was a mistake. But this is baked into a ton of things, and a lot of things assume that when they see a fill from Gord they will have previously seen an order. So we’ve been sort of dealing with the process of unwinding all sorts of legacy decisions like this.
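The "order before fill" promise can be sketched as a small stream transformation: if a fill arrives for an order that has not been seen, a made-up order is emitted first. This is only an illustration of the idea under assumed message shapes; the dictionary keys and the `with_synthesized_orders` name are hypothetical, not Gord's actual types.

```python
def with_synthesized_orders(messages):
    """Yield messages, inserting a made-up order before any fill
    whose order id has not been seen yet."""
    seen_orders = set()
    for msg in messages:
        if msg["type"] == "order":
            seen_orders.add(msg["order_id"])
        elif msg["type"] == "fill" and msg["order_id"] not in seen_orders:
            seen_orders.add(msg["order_id"])
            # Downstream consumers can rely on an order preceding every fill.
            yield {"type": "order", "order_id": msg["order_id"], "synthesized": True}
        yield msg


incoming = [
    {"type": "order", "order_id": "A"},
    {"type": "fill", "order_id": "A", "quantity": 10},
    {"type": "fill", "order_id": "B", "quantity": 5},  # no order was seen for B
]
delivered = list(with_synthesized_orders(incoming))
for msg in delivered:
    print(msg)
```

The cost of this convenience is the one described above: consumers come to assume the invariant, which makes it hard to remove later.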
In addition to that, Gord is going to always deliver normalized symbology. And so maybe you have the stock Apple US, and you’re trading that on the New York Stock Exchange. The way they represent that might be something called a Reuters code, which is something that might look like “AAPL.P” or some other thing, depending on the extension they need. You might be trading this on an exchange that uses Bloomberg, which will look something like “AAPL US”. Bloomberg, when I say Bloomberg, I mean, like, Bloomberg symbol, one of the standard symbology formats in the industry.
Yeah. And it’s maybe worth saying, Bloomberg, not just the Mayor of New York, also this company that makes an extremely pervasive terminal that all sorts of financial professionals, including us, use to access all sorts of data. And they’re kind of an excellent data provider of last resort. Like, you might want to get direct data from a particular exchange to get, like, the most granular or most timely version of something, but Bloomberg has everything. And so they’re a kind of foundational data supplier. And they have their own symbology. And, like, actually this is, like…I think summarizes a kind of terrible thing about this whole financial world that’s a big and weird source of complexity, is that just understanding what the names of things are is actually incredibly complicated because you have different sources of different names from different places. The exchange names things one way, data providers name things other ways. And actually, the US is relatively tame, US equities markets in particular are relatively tame in this regard. But if you look at foreign exchanges, you look at all sorts of different asset classes, it gets exponentially more complicated. The task of just figuring out what things are called is a surprisingly complicated normalization job that Gord is involved in.
Yeah. It’s one of those standards problems that everyone has some way of describing a thing, and they look at all of the ways people describe symbology in the past and they say, “Oh, I’m gonna do something better.” And then you have, you know, 18 million standards and every one of them is bespoke, and every one of them has edge cases and corner cases that are weird and hard to deal with.
But yes. So, one of the things we have to do is normalize symbology. We also normalize a ton of other information, like usernames are an easy thing where you probably want to know which trader at Jane Street was the person who did the trade. And if they’re trading on a different platform, it might, you know, represent that with their email address. Or they might have a specialized log in. So we need to take all of those different bits of information, have a mapping that turns them into the actual Jane Street username that we use internally, and do this for a really large number of fields.
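A minimal sketch of this kind of field normalization is a lookup keyed by where the value came from. The two Apple codes are the examples mentioned in the conversation; the internal symbol, the platform names, the logins, and the username are invented for illustration.

```python
# Hypothetical mapping tables: (source convention, source value) -> internal name.
SYMBOL_MAP = {
    ("reuters", "AAPL.P"): "AAPL",
    ("bloomberg", "AAPL US"): "AAPL",
}
USER_MAP = {
    ("platform_a", "jdoe@example.com"): "jdoe",
    ("platform_b", "JD-4417"): "jdoe",
}


def normalize(raw):
    """Translate source-specific symbol and user fields into internal names.

    Raises KeyError for an unmapped value rather than guessing.
    """
    return {
        "symbol": SYMBOL_MAP[(raw["symbology"], raw["symbol"])],
        "user": USER_MAP[(raw["platform"], raw["user"])],
        "quantity": raw["quantity"],
    }


from_a = normalize({"symbology": "reuters", "symbol": "AAPL.P",
                    "platform": "platform_a", "user": "jdoe@example.com",
                    "quantity": 100})
from_b = normalize({"symbology": "bloomberg", "symbol": "AAPL US",
                    "platform": "platform_b", "user": "JD-4417",
                    "quantity": 100})
print(from_a == from_b)
```

Two differently spelled trades from two platforms come out identical, which is the point of normalizing in one place.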
One of the things that we’ve been sort of working towards over the years is moving more of this normalization into the sources where it is…that information is more naturally contained, but this is still a problem that we absolutely have to deal with. And, as with sort of all of the things, because it’s a legacy system which has a lot of reliability guarantees, a common thread will be we’re moving slowly in this direction, but it’s difficult to make progress because you can’t break the system out from under anybody.
And this…by the way, this particular bit of complexity is very old because it goes back to the original design and purpose of Gord, where you said, “Oh, we have all these data sources that we’re changing so that they do the normalization first, and Gord doesn’t have to.” But when Gord was first written, that was not an option because the original version of Gord was just reading out…like, we would get data that we just, morally speaking, just downloaded from exchanges and brokers, and dumped into a file. And then the task of Gord was to slurp all this data up together and normalize it. And there was no other intermediary that could do that.
Now, today, most of the trading activity we get is not something that comes from some third-party system. We have some system that’s directly interfacing with and understands the details of the particular place that’s being traded with. And the software developers and people who are working with that system are well-positioned to understand what’s going on and think about the normalization. So, that’s a kind of shift that you’re talking about, of going from one system in the middle that has to get everything right, to a kind of more distributed thing where people in different parts of the organization understand what they’re doing and can solve the normalization problem locally, rather than, like, shoving all this crazy normalization on one small harried team.
Yes. There is still a surprising amount of trading that we do that comes from these sort of weird direct third-party sources, but the majority of our trading is now done through systems we write. And we’ve introduced another step that sits in the process between just that direct download of data from the weird third-party place that can also add in a normalization layer there that can, and hopefully, get that, you know, specialization into a better spot for it.
Got it. So it sounds like one of the core responsibilities of Gord is normalization, and some of that normalization is what you might call kind of field normalization. Of, like, there is some piece of data and we have to say it in the right language. And some of it is kind of more almost, like, protocol-level enforcing of invariants. Making sure that messages come in the right order and have the right meanings, and translating the transactions you see from the other side into a single consistent language of transactions.
Yeah. And I think a number of choices that were made on that second one especially, are ones that we’re trying to unwind or avoid, or fix going forward. Easy example of this is, like, the sort of core types that Gord knew about when I joined the team. There were orders, there were outs, which is basically we’re saying an order’s closed, and there were fills. And if we wanted to encode anything else, we had to shove it into one of these types, often in ways that didn’t fit well. So for instance, maybe you send a message to an exchange, and rather than getting filled, you get a reject. Rather than adding a first-class reject type to represent what just happened, this was represented in Gord by making up an order when we saw a reject, and then making up a closed message. And so there was a field inside the order that was like, “It’s a synthesized from a reject,” and to figure out, you know, “Are we getting a lot of rejects?” You had to go look at the internals of the fields, as opposed to having a first-class type that represented this.
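The difference between the two encodings can be sketched as follows. The class and field names here are hypothetical, chosen only to contrast "a reject hidden inside a made-up order" with "a reject as its own type"; they are not Gord's actual message definitions.

```python
from dataclasses import dataclass, field


@dataclass
class Order:
    order_id: str
    fields: dict = field(default_factory=dict)


@dataclass
class Out:          # "this order is closed"
    order_id: str


@dataclass
class Reject:       # the first-class alternative
    order_id: str
    reason: str


def legacy_encode_reject(order_id, reason):
    """Legacy style: a made-up order flagged in its fields, then a close."""
    flagged = Order(order_id, {"synthesized_from_reject": "true", "reason": reason})
    return [flagged, Out(order_id)]


def count_rejects_legacy(messages):
    # Consumers must inspect the internals of each order's fields.
    return sum(
        1 for m in messages
        if isinstance(m, Order) and m.fields.get("synthesized_from_reject") == "true"
    )


def count_rejects_first_class(messages):
    # With a dedicated type, the question is answered by the type alone.
    return sum(1 for m in messages if isinstance(m, Reject))


legacy_stream = [Order("A"), *legacy_encode_reject("B", "bad price")]
modern_stream = [Order("A"), Reject("B", "bad price")]
print(count_rejects_legacy(legacy_stream), count_rejects_first_class(modern_stream))
```

In the legacy stream, a consumer that only counts orders would report two orders even though only one real order was ever sent.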
And this sort of thing happened over and over, where it was, “Oh, we need to represent a new thing, let’s just figure out how we can kind of shove it into the existing data model.” And one of the reasons that we had to do it this way is because the original way that clients connected to Gord was directly over TCP, and so the thing that we did is we had basically a type that represented our messages, and we used a protocol called bin_io, which just turned it into a binary format. And they had to be able to read that on the other side.
And the versioning story for this was: there wasn’t one. The way that you added fields to Gord was we had a string map of fields because if you wanted to add a new field at the sort of top level type here, you would actually need to make all clients roll their system forward to be able to understand that.
Just to jump in there, when you say a string map, what you’re saying is the basic data representation, instead of being something that has a fixed clear type, it was just like a bag of fields. Key value pairs, and you threw them together so you could freely add new fields, and take away fields, and change the composition of fields without having to change the format. On the face of it, what that description sounds like, “Ah, this is great.” You want to evolve the schema, you want to evolve the way in which you represent the data. There’s nothing in the data format that gets in your way, so what got in your way?
Yeah. So, first of all, things that are difficult here are, it makes it harder for people who are using things to know what fields they should be using and how they should be using them. Again, we were able to represent new things by being like, “Well, here are more things we can put into our bag of fields,” but we couldn’t actually change the top-level representation. So when we wanted to do something new, we had to simply figure out how to represent it as just more fields attached to a thing that was increasingly less representative of what the original design of, like, an order was. When you see something that’s an order, surely that’s an order. And the answer is, “No, sometimes it’s a reject,” which is a confusing one to explain to people.
Another example of this is when you see a fill. A thing that might happen to your fill is it might be corrected or you might get something called an allocation, which is basically when we do the trade initially we don’t necessarily know what account it’s going into for some complicated reasons. And later you get a message saying, “Put that in this account,” and that’s called an allocation. And, again, all of these things were just represented as adjustments to the fill by sending the full fill message again, so you would say, “cancel this fill and rebook it.” You had no ability to do, like, smaller adjustments on top of your data in any way.
Got it. So, in some sense, there was this, like, gooey, extremely flexible dynamic representation of the data. Which you might think would free you up, but actually it locked you down because what happened was people on the other side who were consuming the data just implicitly in their code depended on the structure that was embedded in there. And so, despite the fact that the file format, or the message format, lets you change things however you want, in practice, you couldn’t change things without breaking your clients.
That’s right. And because of where Gord sits in the firm and the role that we play, we have hundreds of clients and many of them are incredibly critical to trading, and so breaking our clients is not really an option for us. A thing that we used to do in the previous version, which I’ll explain what we do now, but when we wanted to do a fundamental change to the version protocol, we had to basically add upgrading or downgrading inside of Gord and know the version our clients were connected to, and then do the downgrading ourselves for them. And this is actually just kind of expensive. It turns out, one of the most expensive parts of Gord is serializing information. And when you’re sort of downgrading things for clients, again, the way this was initially done was clients would open up a direct TCP connection to Gord, and they would get the messages sent to them. And the downgrading would happen server-side. And there was some stuff baked in to try and make it, you know, reuse stuff appropriately. But if you had four versions of your clients, you’re now serializing the message four times. And then when you’re doing this and trying to send data to hundreds of clients, this actually gets very expensive.
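The cost described here, one serialization per client version, can be sketched with a per-version cache. This is an illustration only: JSON stands in for the real binary format, and the version table, field names, and `fan_out` helper are assumptions invented for the example.

```python
import json

# Hypothetical protocol versions: each one knows a superset of the previous fields.
KNOWN_FIELDS = {
    1: {"order_id", "symbol"},
    2: {"order_id", "symbol", "quantity"},
    3: {"order_id", "symbol", "quantity", "account"},
    4: {"order_id", "symbol", "quantity", "account", "venue"},
}


def downgrade(msg, version):
    """Drop the fields that clients on an older version do not understand."""
    return {k: v for k, v in msg.items() if k in KNOWN_FIELDS[version]}


def fan_out(msg, client_versions):
    """Serialize once per distinct version, then reuse the bytes per client."""
    cache = {}
    serializations = 0
    sent = {}
    for client, version in client_versions.items():
        if version not in cache:
            payload = json.dumps(downgrade(msg, version), sort_keys=True)
            cache[version] = payload.encode()
            serializations += 1
        sent[client] = cache[version]
    return sent, serializations


msg = {"order_id": "X1", "symbol": "AAPL", "quantity": 100,
       "account": "acct", "venue": "v"}
clients = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 4, "f": 2}
sent, serializations = fan_out(msg, clients)
print(serializations)
```

Even with reuse, the work grows with the number of versions in use, and the server still has to push bytes to every connected client.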
So we avoided this by trying to make the type as gooey and flexible as possible, and not really having the ability to change it at all. And this, to be fair, worked for a very long time. But over time it expanded to cover more things, and we, as a firm, have gotten into more new and different kinds of trading where we wanted to be able to represent more interesting message types.
One thing you said in there was that this gooey representation didn’t let you make changes at all. But that’s not quite right, in the sense that there are some kinds of changes that you can make freely: anything where you’re just like, “Here’s an extra optional piece of information, and I just want to add it on.” With the gooey representation, I just have a collection of key-value pairs. Yeah, you can just add a new key-value pair, and anyone who didn’t know about it will continue to not know about it, and anyone who wants to see the data can and can interpret it. So some kinds of changes were easy. And I guess some kinds of changes were hard. And maybe an example of the hard kind of change is if you want to add a new kind of transaction: well, that’s a thing that everyone who consumes the feed needs to understand, and that’s the kind of thing that has to be one of these breaking changes.
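The contrast between the easy and the hard kind of change can be shown with a small sketch of a client written against the old message kinds. The message layout and the `old_client` function are hypothetical, invented for this illustration.

```python
def old_client(msg):
    """A consumer written when only orders and fills existed."""
    kind = msg["kind"]
    fields = msg["fields"]  # the string map: a bag of key-value pairs
    if kind == "order":
        return f"order for {fields['symbol']}"
    if kind == "fill":
        return f"fill of {fields['quantity']} {fields['symbol']}"
    raise ValueError(f"unknown message kind: {kind}")


# Easy change: an extra optional key is simply ignored by old clients.
with_extra_field = {"kind": "order",
                    "fields": {"symbol": "AAPL", "new_optional_key": "x"}}
easy_result = old_client(with_extra_field)

# Hard change: a new kind of transaction breaks every client that predates it.
new_kind = {"kind": "reject", "fields": {"symbol": "AAPL"}}
try:
    old_client(new_kind)
    hard_result = "handled"
except ValueError as exc:
    hard_result = str(exc)

print(easy_result)
print(hard_result)
```

This is why a flexible field bag makes additions cheap while leaving the set of top-level message kinds effectively frozen.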
Yeah. That’s true. And the sort of core types we were representing things with were relatively large. And so, one of the things that’s expensive about Gord is serializing all this information. And so, as we were starting to do new kinds of trading that required many more messages, but most of the information in them was very, very small, we did not want to encode that in our existing types because it would be very, very wasteful, in terms of the amount of overhead we would have to include. So we made a new format that we used to represent things.
We’ve talked a lot about the data and kind of at a…in some sense, what was Gord’s responsibility, in terms of getting, and normalizing, and distributing that data? But what was the system architecture itself like? You said this was…Gord is like a…the global order database, which isn’t exactly an order database, though it is global. But what is it? How is data distributed in Gord?
Yeah. So I mentioned we had a parser, and then there was a system called the DB that basically collected the information from the parser and maybe made up some extra fake information: if we had a fill without an order, it would add in that order. Then, it would distribute it to a normalizer in each office over TCP. And the normalizer’s job was to apply all of those normalization passes, in terms of, you know, fixing symbology, fixing the users, all the things we kind of mentioned there. And then, it would distribute it to clients, again, directly over TCP. And so one of the biggest problems that Gord had at the time was fan-out because we had hundreds of clients, and so we were sending the same messages for our full data stream out to all of them. And that was a pretty expensive process.
So you talked before about the contract that Gord had with its clients, i.e. the things that were consuming data from it. What did the contract between Gord and the data sources it had, the things that were feeding data into it, the exchange connections and things like that, what did that contract look like?
That contract was mostly, “You will give us things, and we will keep up, and we will never fail.” Um… (laughs).
(laughs)
…it was a hard contract to maintain. So, yeah, we didn’t actually have a spelled out contract with our sources of, like, how fast it could give us information. We had things that, you know, hand waves, like, maybe a Gord dev had agreed with a friend dev when they first wrote it. Friend being one of the automated systems that writes activity to Gord. But maybe two developers had at some point agreed to some value, but that was certainly not written down anywhere and there was nothing that enforced it in any way. And if Gord ever fell behind, again, we would halt things, so our policy was we didn’t fall behind. This was obviously kind of an impossible thing to maintain as we grew, especially with a lot of the parts of the architecture.
So we had to figure out, like…when I started on the team, there was no performance testing that we did on a regular basis. I had to build performance tests to know what our maximum rates were and how close to them we were. We were lucky at the time that the firm’s activity was small enough that we could keep up with it on a regular basis, but we had to learn, like, at what point do we require our upstream sources to shard their activity into multiple sources. Because one of the things that is really nice about Gord’s architecture is it parallelizes incredibly well. A single pipeline of parser, DB, and normalizer can keep up with about…it can do 50,000 messages per second, but there’s some things on the end now that cause that to be slower, for good reasons. So we’re gonna say 25,000 messages per second is, like, allowable.
But, we can parallelize that across many boxes and many copies of this pipeline, so the actual throughput rate of Gord scales arbitrarily. And we have not run into real issues with, you know, just kind of continuing to expand horizontally. But what the throughput of one single-source pipeline actually was, that was a thing we did not know, and had no promises about, at the time.
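The sharding arithmetic can be written down as a back-of-the-envelope calculation. The 25,000 messages per second and the 2× headroom come from the conversation; the busiest-day rate used below is a made-up number, and `pipelines_needed` is a hypothetical helper.

```python
import math

PER_PIPELINE_MSGS_PER_SEC = 25_000  # the "allowable" single-pipeline rate
HEADROOM = 2.0                      # plan for at least 2x the busiest day so far


def pipelines_needed(busiest_rate, headroom=HEADROOM,
                     per_pipeline=PER_PIPELINE_MSGS_PER_SEC):
    """Number of parallel pipelines needed to absorb headroom * busiest_rate."""
    return math.ceil(busiest_rate * headroom / per_pipeline)


# Invented figure: suppose the busiest day so far peaked at 60,000 messages/s.
hypothetical_busiest_rate = 60_000
needed = pipelines_needed(hypothetical_busiest_rate)
print(needed)
```

Because each pipeline is independent, raising the headroom target only changes how many copies to run, not the design of any single one.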
I guess one of the things that backs this is that the basic architecture is what you might call, in another context, eventually consistent, meaning you have all these data sources that are providing data, and the ordering guarantee when you consume data is that the data from a particular source will come in the order that it was entered into the system. But you can get a kind of shearing between different data sources. Things might come in different orders for different consumers, and that allows you to build a system where there aren’t a lot of tight dependencies. In some sense, in exchange for providing lighter guarantees to the users, you make the scaling story simpler and easier.
Yes, that is a fundamental thing that we kind of can’t build the system without, because trying to keep up with the actual full throughput of Gord in a single thread or a single process would be well beyond what our OCaml architecture, especially at the time, could handle. Even now that we have put a lot more work into understanding how to make OCaml fast, it would still be a very difficult task to handle the sort of entire stream in one single thread, or in one single process, or even on one single box. But it is the case that basically none of our users actually care about hard ordering guarantees between things that happen at different exchanges. Frankly, you kind of can’t care about that: because the events happen on different exchanges, and the delay might come from the exchange sending it to us rather than from our own internals, you can’t actually guarantee that you’re going to see every event in the order that it happened in the real world, ’cause that’s kind of not a meaningful statement. So we just make promises within a single upstream flow through.
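The guarantee described here, ordering within a source but not across sources, can be checked mechanically. In this sketch each message is a `(source, sequence_number)` pair; the source names and the helper functions are assumptions made up for illustration.

```python
def per_source_view(stream):
    """Group a consumer's stream by source, keeping arrival order within each."""
    view = {}
    for source, seq in stream:
        view.setdefault(source, []).append(seq)
    return view


def respects_source_order(stream):
    """True if, within every source, sequence numbers arrive in increasing order."""
    return all(seqs == sorted(seqs) for seqs in per_source_view(stream).values())


# Two consumers see different interleavings of the same two sources.
consumer_1 = [("exchange_a", 1), ("exchange_b", 1), ("exchange_a", 2), ("exchange_b", 2)]
consumer_2 = [("exchange_b", 1), ("exchange_b", 2), ("exchange_a", 1), ("exchange_a", 2)]

# A stream that reorders messages within one source would violate the promise.
broken = [("exchange_a", 2), ("exchange_a", 1)]

print(respects_source_order(consumer_1), respects_source_order(consumer_2))
print(per_source_view(consumer_1) == per_source_view(consumer_2))
print(respects_source_order(broken))
```

Both consumers are valid under this contract even though their global orderings differ, which is what lets the pipelines run independently.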
Even if, like, the ordering isn’t externally meaningful, you could have a thing where the system agrees upon a single ordering and everyone always sees things in the same order. And I think that has some upside. It lets you do replication and things like that in a simpler way. But it also has a massive downside because it limits the degree to which you can recover. Like, one thing that I remember us thinking about a lot is, well, it turns out we are a distributed operation, we have offices in New York, and London, and Hong Kong. And if we lose a connection or have a degraded connection between two different offices, we might want to be able to kind of continue operation concurrently in both of those offices, even though the information from the other office is somewhat degraded, right?
Yeah.
And so that’s the example where you, in some sense, are relying on the fact that you have this kind of softer guarantee to give you better availability than you could get otherwise.
Yeah. Like I said, very, very, very useful for us. If we gave up on that guarantee, we would be very sad. Or, that lack of guarantee, perhaps.
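A minimal sketch of the ordering guarantee discussed above, assuming messages are tagged with a source name and a per-source sequence number. The source names and the per_source_order helper are hypothetical and are not Gord's actual interface. Two consumers may see different interleavings across sources, but each source's messages arrive in the order they were written.

```python
from collections import defaultdict

def per_source_order(stream):
    """Group one consumer's view of the stream by source, keeping arrival order."""
    by_source = defaultdict(list)
    for source, seq in stream:
        by_source[source].append(seq)
    return dict(by_source)

# Two hypothetical consumers receive the same messages, interleaved differently.
consumer_a = [("nyse", 1), ("lse", 1), ("nyse", 2), ("lse", 2)]
consumer_b = [("lse", 1), ("lse", 2), ("nyse", 1), ("nyse", 2)]

# The global orders differ (the "shearing" between sources)...
assert consumer_a != consumer_b
# ...but within each source, both consumers see the same order.
assert per_source_order(consumer_a) == per_source_order(consumer_b)
```

Because nothing forces agreement on a single global order, each source's pipeline can run on its own box without coordinating with the others.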
Okay. That’s kind of a tour of what Gord was like when you first came to it. What are the things that you’ve worked on since then to kind of grow the system and kind of repair some of the problems that were there in the original design as you saw it?
One of the big things that was being worked on when I joined the team was trying to change this process of all of our clients connecting to us directly over TCP. This was a very difficult scaling problem for us that was causing real problems because every time you added a new client, you had to serialize the data to them. And so the amount of work you had to do as the sort of Gord service scaled with the number of clients. And as the firm was doing more things and getting bigger, this was just unsustainable. So what we did is we actually looked externally. We saw Kafka, which is a thing developed by LinkedIn, which is a distributed message queue that has really good properties for scalability. And in addition to that, one of the properties that was really, really nice is rather than being a push-based system, it is a pull-based system for clients.
So one of the things is when Gord was delivering activity to its users over TCP, it had to basically just, like…clients would open up a pipe, and then Gord would just send them messages whenever it had them. And if the client was slow, Gord would have to buffer data because it has to get buffered somewhere. And so because we did not want to hold arbitrary amounts of messages and memory per client, if a client was too slow, we would just kick them off. And, it turns out, while many systems do care about being highly performant, there are a lot of other systems where their performance just doesn’t have to be that high. For instance, if you’re an end of the day regulatory reporting thing, you don’t actually need to keep up with Gord at its busiest. You just need to make sure that by the end of the day, you’ve been able to process everything.
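The push model described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the original implementation: the PushServer class, the max_buffered limit, and the client names are all hypothetical. The server holds a buffer per client and disconnects any client whose backlog grows past the limit.

```python
from collections import deque

class PushServer:
    """Push model: the server buffers per client and drops clients that lag."""

    def __init__(self, max_buffered):
        self.max_buffered = max_buffered  # hypothetical per-client limit
        self.buffers = {}                 # client name -> pending messages

    def connect(self, client):
        self.buffers[client] = deque()

    def publish(self, message):
        """Queue the message for every client; return the clients kicked off."""
        kicked = []
        for client, pending in self.buffers.items():
            pending.append(message)
            if len(pending) > self.max_buffered:
                kicked.append(client)
        for client in kicked:
            del self.buffers[client]
        return kicked

    def drain(self, client, count):
        """Simulate the client reading up to `count` messages off its connection."""
        pending = self.buffers[client]
        return [pending.popleft() for _ in range(min(count, len(pending)))]

server = PushServer(max_buffered=2)
server.connect("trading_system")
server.connect("eod_reporting")

kicked = []
for i in range(3):
    kicked += server.publish(i)
    server.drain("trading_system", 1)  # the fast client keeps up

# The slow client never read anything and was dropped on the third message.
assert kicked == ["eod_reporting"]
```

Note that the work in publish, and the memory held, both grow with the number of connected clients, which is the scaling problem described above.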
And so, this notion of people having to be able to perform to Gord’s peaks to write a Gord client was very painful. And so, Kafka has the property that clients ask for messages when they’re ready for them, and it just stores them in memory and can give you an arbitrary message out of the stream that you want if you have the right sort of pointer to it. And so, this allowed us to sort of change the dynamic, which meant that if a client was slow, they would fall behind the tip of the stream, and we might be able to monitor and alert them about that, but we wouldn’t actually make them die. And, again, if you’re a process that has to actually process every message and you can’t keep up, well, when you die, all you’re gonna be able to do is restart and just be sad. So —
And make the upstream service sad again.
And make the upstream service sad again.
You’re just creating more work.
Yeah.
There’s something confusing you said before, where you said, “Ah, we had this problem where we opened all these TCP/IP connections, we had to do a bunch of work per client as a result of that, of sending the data to them over TCP/IP, and then we switched to Kafka, and that made things better.” But Kafka also uses TCP. So, like, where’s the magic?
Yeah. Really what was going on before is not just that we were using TCP, but that we were doing a very naïve thing with our use of TCP. So when a client connected to us, we would simply buffer the data for them, and if they fell behind, we would just continue buffering the data in memory.
And just to interrupt for a second, when you said, “We would,” right? Because this is…like, a part of Gord that is a very old mistake, we just, you know, like morons, picked up the operating system APIs and used them without thinking very hard. We just said, “Ah, there’s a client, they have a TCP connection. We send them a message, we send them a message, we send them a message.” And the buffering is then done by, like, some combination of the operating system and the low-level systems code that manages the individual TCP/IP connection. So, in some sense, separated from what you think of as the application layer of Gord itself, there’s all this behind-the-scenes work in buffering that occurs.
Yeah. And we weren’t really thinking about that. We did think a little bit about the fact that serialization is expensive, so we tried to serialize only once, but there was still a lot of stuff with this buffering that we were just accepting what the operating system did and not really thinking about it. And this turned out to get just, you know, more and more expensive as we had more clients.
Kafka solves this problem a lot more intelligently because it’s in the sort of architecture of the system. So when you write the message into Kafka, it, like, stores it on disk, and when a client is connected, they ask for messages when they’re ready to handle them, rather than having messages pushed to them. And so if a client is not keeping up, then Kafka is not trying to buffer the messages in memory that they’re going to need eventually. They’re there on disk if the client wants to ask for them, but it’s not holding them and trying to be ready and also give them to the client actively.
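A minimal sketch of the pull model, assuming a single shared append-only log in which each reader tracks its own offset. The MessageLog class is hypothetical and is not Kafka's client API; an in-memory list stands in for storage on disk.

```python
class MessageLog:
    """Pull model: one shared append-only log; each reader tracks its own offset."""

    def __init__(self):
        self._messages = []  # stands in for storage on disk

    def append(self, message):
        self._messages.append(message)
        return len(self._messages) - 1  # offset of the new message

    def read(self, offset, max_messages):
        """Return up to max_messages starting at offset, plus the next offset."""
        batch = self._messages[offset:offset + max_messages]
        return batch, offset + len(batch)

    def lag(self, offset):
        """How far a reader at `offset` is behind the tip of the stream."""
        return len(self._messages) - offset

log = MessageLog()
for i in range(5):
    log.append(f"msg-{i}")

# Each client asks for messages when it is ready for them.
fast_offset = slow_offset = 0
_, fast_offset = log.read(fast_offset, 5)
slow_batch, slow_offset = log.read(slow_offset, 2)

assert log.lag(fast_offset) == 0
assert log.lag(slow_offset) == 3  # behind the tip, but still connected
```

The server keeps no per-client buffer here. A slow reader simply has a smaller offset, which can be monitored and alerted on instead of causing a disconnect.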
As we were saying, this question of what’s push and what’s pull is kinda subtle. At a low level, TCP is kind of doing something like this, where there’s, like…there are receive buffers that the operating system maintains, and when the receive buffer is full, it is in fact going to stop pulling data in. But, at least from the APIs that we surface to programmers, we gave people an API, we’re like, “Oh, here’s a TCP/IP connection. Just send data that you want to get there, and eventually we’ll make sure it gets there.” From the point of view of the sending program, it was push oriented. And we kind of weren’t keeping track and weren’t thinking intelligently about, like, how the resources and buffering of data was happening in between in that case.
And at the level of the TCP/IP connection, it’s kind of too low-level to do anything smart. Like, it can’t understand that they’re sharing at that level without having something that’s explicitly binding all those things together and understanding them as a kind of single logical message queue.
Yeah. One of the other things that was really expensive when a client connected to Gord that I haven’t really touched on here, but also Kafka made much better for us, was the idea that when someone connects to Gord, they often want to know everything that had happened up until that point. So we would generate a snapshot for them of all of the activity. And that generation of a snapshot had to happen per client because, you know, people connect with it at random times. And so, again, Kafka allowed us to basically just have a sidealong process that was consuming the stream and occasionally snapshotting, because it is nice for people to be able to catch up more quickly. But, again there, the snapshots could be stored in a ready-to-go format, and then you would just read from the latest snapshot and then pick up some point in the pipe after that. As opposed to snapshotting per client that connected.
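The snapshot-then-replay pattern described above might look roughly like this. It is a sketch assuming the stream is a list of (symbol, quantity) fills and that the state is a position table; the function names and the snapshot layout are hypothetical.

```python
def apply_message(state, message):
    """Fold one (symbol, quantity) fill into a position table."""
    symbol, qty = message
    new_state = dict(state)
    new_state[symbol] = new_state.get(symbol, 0) + qty
    return new_state

def take_snapshot(log, upto):
    """Side process: fold log[0:upto] once and record where it stopped."""
    state = {}
    for message in log[:upto]:
        state = apply_message(state, message)
    return {"state": state, "next_offset": upto}

def catch_up(log, snapshot):
    """New client: start from the shared snapshot, then replay only the tail."""
    state = snapshot["state"]
    for message in log[snapshot["next_offset"]:]:
        state = apply_message(state, message)
    return state

log = [("AAPL", 100), ("MSFT", 50), ("AAPL", -30), ("MSFT", 25)]
snapshot = take_snapshot(log, upto=3)  # computed once, shared by all clients

# A late-joining client reaches the same state as a full replay would.
assert catch_up(log, snapshot) == take_snapshot(log, upto=len(log))["state"]
```

The snapshot is built once by the side process, so the cost of connecting no longer includes generating a fresh snapshot per client.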
In some sense, the answer to my question of where is the magic, is there is no magic. There’s just a bunch of engineering that you need to do in order to build a message queue. And rather than have this kind of message queue functionality be sitting in a deadly embrace with the rest of the functionality of Gord, what you did was separate them out. Use some standard, well-engineered, already existing piece of infrastructure for doing the message queue part of the work. And then, focusing the work that we did on Gord on the actual part where there was a unique value-add, which is the core part of the pipeline of, like, consuming and normalizing all of this data.
Yes.
So that’s, like, one big move you guys made to improve the architecture of the system. What else have you done?
One of the other ones that was a pretty big deal I mentioned briefly. So, the way that our upstream sources wrote us messages was using this protocol called FIX, which we represented internally, in the files that Gord was consuming, as a, like, tag-equals-value, pipe-separated thing. And there were no constraints on this. So a common problem that we ran into is someone wanted to write a new source to Gord, and they would have to contact us and be like, “So, there’s, like, a hundred different fields I could provide, which ones do I need to provide, and why, and how, and, you know, what’s relevant?” And we, as Gord devs, kinda had to know, “Oh, you know, transaction time is required for all execs. And for bonds execs, you must provide a settlement date. But for other kinds of execs, that’s actually an optional field.” And there were a number of these invariants in how our upstream sources had to write us information, so that Gord would interpret it properly, that were not written down anywhere, and we sort of provided no help to people in creating these.
So, we’ve made a new library that we are currently in the process of working on and continuing to iterate on. It is a slow process to get it into a polished state, where if someone wants to write a message to Gord, they have a library where they can say, like, “Here’s the type I want to fill out,” which is, you know, a record with all the fields that are required for a bond exec, or a wholesaling fill, or, you know, a different kind of message that they might care about, and we tell them what fields they need to provide to us. And then, they don’t have to think about the translation into how Gord is actually going to consume it, which also gives us the ability to hopefully someday change that format from being this, you know, files-on-disk thing. Which, you know, we have some stuff in the process we’re hoping to do there. But, that’s a big one where it’s, like, making it easier for people to write correct data to Gord with less iteration on our part.
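As an illustration of the idea, here is a sketch of typed records that encode the required and optional fields mentioned above and are translated into a pipe-separated tag=value line. The class names, field names, tag names, and the to_wire function are all assumptions for the example; the real library is written in OCaml and its types are not shown in this conversation.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class BondExec:
    symbol: str
    quantity: int
    transaction_time: str  # required for all execs
    settlement_date: str   # required for bond execs

@dataclass(frozen=True)
class EquityExec:
    symbol: str
    quantity: int
    transaction_time: str                  # required for all execs
    settlement_date: Optional[str] = None  # optional for other execs

def to_wire(message):
    """Render a typed message as pipe-separated tag=value pairs (hypothetical tags)."""
    fields = [("kind", type(message).__name__)]
    fields += [(name, value) for name, value in vars(message).items()
               if value is not None]
    return "|".join(f"{name}={value}" for name, value in fields)

equity = EquityExec("AAPL", 100, "2021-06-01T14:30:00Z")
bond = BondExec("UST10Y", 5, "2021-06-01T14:30:00Z", "2021-06-03")
equity_line = to_wire(equity)
bond_line = to_wire(bond)
```

Constructing a BondExec without a settlement date fails immediately with a TypeError, so the producer learns about the missing field from the type rather than from a conversation with the Gord team. Producers also never touch the wire format directly, which leaves room to change it later.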
So one way of thinking about that is, in some sense, the lack of types or schemas, or something like that, in the story, right? Both when people are handing data over to you and when people are consuming it. Gord is acting as this very complicated rendezvous point. There’s all these people who are handing data over to it, there’s all these different sets of people and hundreds of applications, hundreds of code bases that are consuming and reacting to that data. And the Gord team is sitting in the middle, needing to understand important things about all the consumers and all the producers, and trying to guide all of those pieces together. And you’re kind of in this middle step where what you’re doing is providing a library which is at least a place where Gord devs and the people who are providing data can collaborate and think at the level of logical and highly structured transactions, right? They can say, “Here is the type that represents the thing that’s happening.” And then, the Gord dev team can think about how to translate that into the, like…the gooey internal representation and think about its place in the rest of the pipeline.
And the longer term game plan is to actually get rid of the gooey internal representation and go to something that has more structure the whole way through. Because the amazing thing about type systems, in some sense, and there are type systems that sit inside of programming languages, and there are type systems that you have that are on message types that span across different applications, but they have this lovely property of, like, locking systems together and translating force across them. If someone wants to make a change and they need to modify the types, and you see how those type changes flow through, you can understand by that process of flowing the type changes through the effect on a broad set of the infrastructure. And it helps that we have everything in this big monorepo world where all of the different pieces of code, at least the most up to date versions that you would get if you rolled a new one, are all sitting together and compiled together. And you can have these invariants that cross widely separated systems. In some sense, the world you’re moving towards is one where you get to leverage types kind of more consistently in the way that you build systems.
Yeah, yeah, for sure. Another big one that we’ve been, again, working towards over the course of many years, is when I first started on the team, you could consume the entire Gord stream. Or you could consume just the fills. And those were kind of your two options. Around when I was joining the team, we added a new stream called monitoring, which was a place where people could put sources that didn’t affect our bookings and our positions. But, like, it turns out this process Gord was providing of normalizing data was more useful than just for the things that we were making sure we had to actually track in our position to upload to clearing firms. There were more use cases people might want to have for it, so we introduced the monitoring stream.
And it turned out that these two options, either the real thing (or sort of all of the real thing), or all of the fake thing, were actually insufficient as we continued to grow and do more things. It was useful for people to be able to subscribe to all of the information about just options, or all of the information about some of our client-facing trading, and some other things like that. And so we’ve been starting to be able to express meaningful subunits of activity people might want to subscribe to, and give people pointers to options data or other things like that. Rather than, if you want the options data, you have to subscribe to the full stream and just filter out the stuff you care about.
So is this a kind of static segmentation of the flow? Or a more dynamic system where you get to express, “Here’s, like, a predicate that represents the data that I want,” or, “Here’s some kind of logical specification that I want,” and you get that? Or is it just, like, a kind of physical breakdown of it into different sub-streams?
It is currently relatively static, in that we have to know ahead of time where these things are going to end up. However, we have some dreams about making that a little bit more dynamic over time. But that is something that is still quite a ways out. But even so, the pre-ordained breakdown still provides a lot of value to people and still makes it much easier for people to consume parts of the information in smaller subsets. But obviously we’d like to get to a world where you could be like, “Give me information about things with this symbol,” which would be very nice. But that is still a ways out.
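A rough sketch of what a static breakdown could look like, assuming each message carries a kind field and that the mapping from kind to sub-stream is fixed ahead of time. The stream names, the kind values, and the route function are hypothetical.

```python
# Hypothetical static routing table, decided ahead of time: kind -> sub-stream.
STREAM_FOR_KIND = {
    "option": "options",
    "client_facing": "client-trading",
}
DEFAULT_STREAM = "everything-else"

def route(messages):
    """Split messages into pre-declared sub-streams, keeping arrival order."""
    streams = {}
    for message in messages:
        name = STREAM_FOR_KIND.get(message["kind"], DEFAULT_STREAM)
        streams.setdefault(name, []).append(message)
    return streams

messages = [
    {"kind": "option", "symbol": "SPY"},
    {"kind": "equity", "symbol": "AAPL"},
    {"kind": "option", "symbol": "QQQ"},
]
streams = route(messages)

# An options consumer reads only its sub-stream instead of filtering everything.
option_symbols = [m["symbol"] for m in streams["options"]]
```

A dynamic version would instead accept a predicate from the consumer, such as "things with this symbol," which is the direction described above as still a ways out.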
So you’ve talked a bunch about ways in which you’ve worked on extending Gord, in some sense kind of at, like, the core functional level. Changing the APIs and changing the way data flies around. Are there things that you have had to do at the engineering process level to deal with just the extreme combination of growth and criticality of the system, right? As it grows in complexity in the number of things you have to support and the amount of data, and yet is still the thing that the entire firm depends on for all of its trading. What have you done at the level of trying to build a process that lets you reliably make changes and efficiently make changes?
When I started, there were no in-code tests. Not quite none. There was, like, maybe one or two files of very small amounts of things, but there were virtually no in-code tests for anything in the system. In addition to that, the way that we verified that we had not broken anything was we would roll our system to dev and we would just look at the diffs in the output from production and the thing we were testing over the course of many days, to just try and make sure we had seen enough variety of things that we’d be confident that we weren’t going to, like, fundamentally break anything.
In addition to that, there was a really wide range of testing procedures that we did, and many of them we still do, when we’re rolling production, to ensure that we don’t break the system in ways that we…breaking the system would, again, halt all the trading. We don’t want to do that. So we had a lot of arduous procedures. When I joined the team, we rolled the system a few times a year.
(laughs).
Yes. One of the things that we are now at is we roll about once a month, which is still quite slow. And indeed, for a firm that does as many new things as Jane Street, and on such a frequent basis, this can be a pain point. We work really hard to try and make sure we know what’s coming in advance and that we’re well-prepared so we get things out in time, but it is still a painful thing. And we would love to get to a spot where we can roll the system more frequently. And we’re doing a lot of work in that direction, and have done a lot of work.
One of the early things I did, I mentioned a…we had no performance test. I added a performance test so that we could measure performance of the system every time we rolled, to make sure we hadn’t introduced a concerning performance regression. In addition to that, we built an in-unit test that tested the entire architecture of the system so that we could, you know, have our input message types all the way through to our output types. The way that we tested things when I started was you would literally spin up an entire dev version of the system, and then you would manually construct the activity that you wanted to run through it, and then you would run it through. And this process was, like, slow and pretty painful to set up. So you basically just had to trust that all the other developers on the team were following the procedure and were doing all the things appropriately. And you would look at, like, a few sketch notes in their testing of the feature. You know, they checked all the right things. Once we had in-unit in-code tests, we could actually add a test for the behavior change you intended and demonstrate that it had the effect you intended. This allowed you to make changes a lot more confidently and faster on that note.
There are a number of things about how we roll that were relatively painful processes that we have done a lot of work as a team to improve. When I started, the way that we rolled Gord was there was a symlink in basically our virtual file system that pointed to the binary of the current production Gord. And so the way that you flipped Gord to make a new version primary was you went in and you manually changed that symlink. Honestly, there are things that could be worse than that. But the thing that made this really painful was Gord is global, so today is not today everywhere. So what would happen is around 3:00 PM New York time, which is around 3:00 AM Hong Kong time, Gord would start up for tomorrow. So you had to change the symlink for Hong Kong between when Hong Kong Gord shut down, which was 9:00 AM, and 3:00 PM New York time, when it started up again. And then, at around 5:00 PM New York time, which is around 10:00 PM in London, London Gord would start up, so you’d have to change the symlink in London before 5:00 PM. And then, when Gord shut down at 9:00 PM, you had to go in and change the symlink in New York.
And if you failed to do any of these things, then Gord would be considering different Gords primary in different offices. And so clients would be getting weird data, and, like, this would cause many, many major problems. And, indeed, this was a thing that had survived because the people on the team were exceedingly careful and thoughtful, and, like, it was, you know, a thing that survived right up until my second roll.
(laughs)
(laughs) Um, I have ADHD, and remembering to do specific tasks at very controlled time windows is a thing that I am bad at. And so, I think this was actually my second roll on the team that I was doing. I rolled Hong Kong at 10:00 AM, and then I went home for the day, and at 8:00 PM realized in a moment of horror that I had not changed the symlink for London. And so, at 8:00 PM, London had already started up. So we then had to do a relatively painful rollback procedure, and it turned into a relatively large production incident, all over forgetting to change a symlink in a particular two-hour window. So we fixed that.
Now, it is the case you basically can stage a change and none of the symlink stuff is used anymore. There’s a process that tracks which Gord is primary on which day. And it’s just a config file that you can change in advance. You can say, “I would like Gord to flip three days from now,” and it will just do it for you. Everything is fine. It’s very automated. There’s none of this sort of…it understands a notion of date, and then Gords just figure out what date they’re currently running on and use that to decide, “Am I primary or not?” As opposed to the thing we had before, which was not awesome (laughs).
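The date-driven flip described above could be sketched like this, assuming the config is a list of (effective trading date, version) pairs and that each instance knows its own version and trading date. The schedule format, the version names, and both function names are hypothetical.

```python
from datetime import date

# Hypothetical config, edited in advance: the version that becomes primary
# starting on each effective trading date.
PRIMARY_SCHEDULE = [
    (date(2021, 6, 1), "gord-v41"),
    (date(2021, 6, 10), "gord-v42"),
]

def primary_version(trading_date, schedule=PRIMARY_SCHEDULE):
    """Return the version with the latest effective date not after trading_date."""
    current = None
    for effective, version in sorted(schedule):
        if effective <= trading_date:
            current = version
    return current

def is_primary(my_version, trading_date):
    """Each running instance asks this about its own version and trading date."""
    return primary_version(trading_date) == my_version

# On the afternoon of June 9 in New York, Hong Kong starts up for the June 10
# trading date while New York is still running June 9.
assert is_primary("gord-v41", date(2021, 6, 9))   # New York, still "today"
assert is_primary("gord-v42", date(2021, 6, 10))  # Hong Kong, already "tomorrow"
```

Because the decision is keyed on the trading date rather than on wall-clock time, every office flips at its own start of day with no manual step in a narrow time window.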
Right. Any time you have something that depends on people just being repeatedly careful over and over, you’re eventually gonna run into a problem. The story also highlights just a weird thing about our infrastructure, and I think, in some sense, about the trading world, which is the notion of day, right? Like, you have a trading day, and lots and lots of applications start up at the beginning of the trading day and shut down at the end of the trading day. And that’s historically, early on, how we did kind of everything. And the complexity of Gord is it was kind of straddling those two worlds of, like, we have basically three regions in which we operate, centered in New York, and London, and Hong Kong for North America, and Europe, and Asia. And each one of those Gord instances had its own trading day, and it would start at one point and end at another point. But also, they communicated with each other and shared information, and so you ended up with this kind of weird handoff and overlap, and shutting down and bringing back up. And this is increasingly not how we operate.
More and more, Jane Street just operates 24 hours a day. And more of our systems have moved over to this model. But it’s just kind of a good example of the kind of complexities that are, like, somewhat unique to the historical story about how the system evolved.
Yeah. One of the things that we want to do is make it so that Gord is more 24/7 available. But there’s also a lot of really nice things that come out of being able to shut your system down, like having reboot windows and being able to archive things at a clean time when nothing new is being written into the system. We are working towards a world where we will be able to set up our boxes dynamically and fold boxes sort of in and out of being used on a dynamic basis. And then eventually hopefully get to a point where we actually can have Gord be 24/7 without all of these sort of pain points. But because Gord wasn’t initially architected with this as the plan, it is, again, a relatively arduous process with many steps along the way to get from point A to B.
Okay, so let’s switch gears. We spent a lot of time talking about software engineering, which is the core role that you have. You also do a lot of other things at Jane Street. And one thing you’ve been involved a lot in over the years is with recruiting and with our internship program, and a lot of things surrounding that. And you’ve also done a lot of work specifically on recruiting of underrepresented groups of various kinds. I’d be curious if you could tell me a little bit more about, like, the kind of efforts you’ve been involved in in that part of the world?
Yeah. When I started at Jane Street, I was the second female developer at the firm. Jane Street had already been around for many years, and so this is kind of sad. And people recognized that this was sad, but they didn’t really know what to do about it. And so I, uh, relatively early on, got involved in helping us figure out how to solve that problem.
One of the things that was a big issue is that there is a view in the world that finance is a pretty bro-ey and unpleasant place to work as a woman. And while I have not found this to be true at Jane Street, I think the culture is excellent and I’ve had really great experiences with all of my colleagues. As I mentioned at the very beginning, I had no plans on working in finance when I graduated. And had I not done an internship at Jane Street, there’s no way I would have even considered it. So a big part of what we’ve done is we’ve done a ton of outreach programs aimed at finding folks that might be actually a really good fit here and might really enjoy their time at Jane Street, and giving them little tastes of what it’s like to work with us. What the culture is like. Getting them to meet a bunch of people.
A lot of the way we’ve done this is through a bunch of external recruiting programs we’ve built. One of the first of these that was created actually happened the year after I was an intern, and I started helping out with it relatively early on when I came back. This was called Women in STEM, which was aimed at bringing a bunch of folks between their senior year of high school and freshman year of college just to come see Jane Street and meet some of the people, and hear a little bit about what we do. And put it in people’s brains that this was, you know, maybe a thing to consider when they were applying for internships down the road.
Indeed, my very first intern was someone that we found through Women in STEM, a woman named Haohang who now helps me run a bunch of the programs that I’m going to further describe. But it really was the case a lot of this is about finding folks that just would never have applied to us in the first place and getting them kind of into our pipeline.
One of the things that really struck me about Women in STEM when we started doing it is it was a good example of playing the long game. Trying to put together a program and tell them about us, but also tell them about finance and teach them interesting things about technology. But this was planning out pretty far in advance, right? These are people who were seniors in high school who were about to go to college. And it was gonna be some time before they were people who might apply for jobs here. But we’ve seen it work out, like there are plenty of people who went through that program who ended up eventually coming here. And it’s also…it sort of worked I think directly on people who we’re trying to recruit, and also has, like, a larger brand-building effect.
Yeah. One of the things is a lot of the folks that we might be really excited to have come to Jane Street, if they have a friend who does an internship in finance and has a bad experience, if they haven’t had sort of something good to put against it, they just won’t even necessarily consider a place like us. We found starting early and also having stuff, again, spanning sort of all of folks’ college years has been really effective at getting people in the door as often as possible.
So one of the programs that we built relatively early on that has been one of our most effective is called INSIGHT. And what this program is, is we bring a bunch of sophomore women to Jane Street for a week, we teach them a bit about finance, we teach them some OCaml, we just have them spend a lot of time interacting with full-time Jane Streeters. And through this, we are able to get a lot of those folks to apply to our internship, and then hopefully, you know, people come as interns, and then come back full-time, and the whole pipeline goes from there.
INSIGHT has been sort of so successful we’ve actually expanded it. We have expanded this because the number of women at Jane Street has grown a ton since I’ve been here. I went from being developer number two, to having so many female developers, I no longer know them all. I don’t even know the number anymore, which is, like, again-
(laughs)
… a thing I was tracking for so long ‘cause, you know, it was like, “Okay, we’re at two. Okay, we got another female developer, amazing. We’re at three. We’re at five. We’re at eight.” And, like, you know, I cared about the individual numbers so much. And now, it really does feel like the case that we are actually pretty good at recruiting women into our internship.
Our dev intern class this summer is about 25% women, which is, frankly, not where I’d want it to be. I’d really love that number to be higher. But at the same time, compared to where we started, it’s a massive improvement. And I really want to celebrate the progress that we’ve made, but we’re also not satisfied. We’re going to continue doing more programs, and more events, and trying to recruit more women to Jane Street, and make sure that Jane Street continues to be an excellent place for women to work.
Yeah. Even if it’s not perfect, we’ve made a lot of progress.
That’s right.
So when you run a program like INSIGHT, how do you think about the kind of dual problem of how to attract people, how to get people to apply to it? And then also, among the people who apply, how do you pick people in a way that will maximize the likelihood of success?
Yeah. A lot of this is amazing work done by our recruiting team to build connections on campus with women in CS groups, and building connections with some of the bigger women in tech organizations like the Anita Borg Foundation. So many years I went to Grace Hopper, I had a booth, and, like, just would grab anyone walking by and be like, “Hey, let me talk about Jane Street and come hear about this thing.” And so a lot of it was done through sort of the work of the recruiting team of finding people and getting our name on to campuses.
And the thing that we’ve found very effective is once we kind of get our foot in the door at a school, like, we can get our reputation to spread there reasonably effectively. So a lot of it is figuring out what club we need to reach out to, or how we kind of find an avenue into a school. And this is one of the things that our recruiting team is really, really excellent at. They are excellent at many things, but this is one.
That points at another problem, which is maybe a different kind of diversity, which is diversity in terms of the schools that you reach out to. Can you talk about how you get in and you get a reputation at a school, and suddenly now you have access to a really interesting stream of candidates, but that requires specific investment in a particular school. And there’s a ton of schools, and there are great people scattered across all of these schools. Is there anything that we’ve done in this to try and reach out to, like, people in the long tail of schools that we don’t have time to build particular relationships with that school, but would still be interested in seeing at least some subset of the people there?
I think the answer is the recruiting team does a lot of stuff here, but I don’t actually know.
I don’t know what the answer is in the context of INSIGHT. I know in general we try to do all sort…I mean, just to give a dumb example, like this podcast, is a way of trying —
Yeah.
…to reach out in a way that’s orthogonal to the particular efforts at school. And also, for that matter, reaches out to people who already have jobs and are already employed in places.
Yes. I mentioned the recruiting pipeline for our interns, being things like INSIGHT, and some of the other programs in that space. But one of the things that we are still working on trying to make better is our recruiting pipeline for lateral women. Our intern class is 25% women, but only about half of dev hiring is done through the internship, and the other half is done through laterals, which are, you know, folks who have been in the industry working at places. And the number of lateral women we hire is significantly lower, in terms of percentages. And this is a thing that we’re very actively working on. We’ve been trying lots of different things in this space. I would say we haven’t found a single silver bullet that works, but we’ve tried lots of different things, and some of them had some success, and some we’re sort of still iterating on.
One of the ones that is my favorite that we put together, and I really hope that we can now bring back as the pandemic recedes, is shortly before the pandemic we started hosting brunches for women in tech in the city, just to hang out with a bunch of Jane Street developers and have brunch. Because what happened was we were throwing out ideas for how we might do some reach outs to women across the city who might be interested in a job at Jane Street. And we were like…people were talking about doing a tech talk, or doing a day where we brought people to Jane Street. And I told them at the end… Casey, another female developer, and I were like, “We wouldn’t go to that. What would we go to? We’d go to brunch.” (laughs) So —
(laughs)
…so we tried that. And things like that, where it’s just kind of reputation building, we didn’t think we would want to go to something that was incredibly sponsored and, like, you know, very pitchy and whatever. And so we were like, “Let’s actually just approach this from a relatively altruistic angle. Let’s try and help women in tech in the city, build a community, and approach that as the angle. And then, if it is the case that, great, maybe they’re not looking for a job, but they have a friend who they know who is, this now might be a thing that they might consider being like, ‘Oh, have you heard of this place?’” Again, playing the long game of, like, we’re not doing this as a, “Oh, you have to be looking for a job,” or, “Oh, you have to be, you know, having this thing.” It’s like, “Actually, let’s just try and do a thing that we think would be fun for women in the city in tech. And hopefully, there are good knock-on effects from that over the long haul.”
And, you know, frankly, I just really enjoy meeting a bunch of other female software developers around the city. That was a really fun part of the brunches, and I think was achieving the goal of helping build community, helping build connection. And I think that is a thing that I hope that we get to bring back now that people are maybe willing to be indoors at a brunch again. Or maybe we’ll do them outdoors.
(laughs)
I don’t know.
That’s amazing. Yeah, I actually hadn’t heard about these brunches at all. Sounds like a great idea and something we should totally start up again. And I think your point about trying to generate something that’s legitimately helpful to the people involved is, like, a really important thing and a thing that showed up in a lot of the recruiting programs that we do. Like, we have a tech blog. And it’s obviously, like…the subterranean purpose behind all of that is, like, we want to hire great people, but we also try really hard to make the things that we write and publish, like, legitimately interesting and provide real value for those who are reading.
And I think that’s kind of a thing we’ve done in general. Like, Women in STEM similarly, like, we had this program where the long-term goal is hiring people, but we also tried to make sure it was legitimately helpful and taught people things that would be useful to them, whether or not they came. That kind of playing the long game and trying to make investments that are gonna pay off down the line, and doing it in a way that feels legitimate, and connected, and useful to the people involved, has been core to how we’ve approached this for a long time.
Yeah. One of the things sort of in this vein that I’m incredibly excited about, that we’re running this summer for the first time — I’ve talked a lot about our recruiting of women to Jane Street — another area I’m involved in is trying to recruit more underrepresented minorities in tech in general to Jane Street. This is also an area where we are woefully underrepresented and we want to get better.
So a program that we’re running this summer is for folks who are from underrepresented backgrounds in tech between their freshman and sophomore year. And we built a program that was designed to be purely educational. Part of the thinking about this was how to help people who come from a place where Jane Street doesn’t have a strong recruiting pipeline to begin with. At schools where Jane Street already does a lot of recruiting, there are people who have gone through our internship or our interview process, and they know how to prep for that and know how to prepare for the interviews. And that just does give you a leg up in the interview process. So our goal was to build something that could help level the playing field a little bit for folks coming from places where they just don’t have that preparation. We’re trying to find a whole group of folks who are relatively early in their careers, and build a program that is designed to be 100% educational.
We’re gonna teach them OCaml for the first several weeks of the program, and then we’re gonna work with them on an opensource project so that they’ll have something they can put on their resume. Because, again, it is often hard to get an internship if you haven’t had something on your resume, like an internship, already. So kind of building a thing aimed to both hopefully help these folks in their sort of… furthering their CS education, and also getting them prepared to be able to get internships during their sophomore and junior years, hopefully a bunch of them with us. But also just in general, our aim of the program is to provide a service and provide a program that will hopefully help all of these folks. And, you know, again, we always have the sort of long game of, like, hopefully some of those people become Jane Street developers. But our aim in building the program is hopefully they’ll get something out of it, whether they come to Jane Street in the long term or not.
One of the interesting things about that program is it’s really connected to opensource work that we do, right?
Yeah.
Because one of the ways we’re making that happen is by having them do projects where they don’t have to, like, come inside Jane Street’s walls. They can work on stuff that we’ve open sourced already, and that simplifies and smooths out that story.
Yeah. And it makes it much easier. Again, if the goal is kind of helping them have something that they can talk with the world about, it’s much easier if there’s something they can point to. One of the things that is interesting about Jane Street is we don’t have a product that you can, like, point to and show to your mom. You can’t be like, “I did this thing.” And, you know, maybe looking at a bunch of opensource code is not necessarily something everyone’s mom would understand, but it is a thing that, like, you can put on your resume and a recruiter can look at and be like, “Oh, there’s a GitHub link. Okay, cool, I can see a thing that this person did.” And I think those things are hopefully really helpful in helping them, again, as they’re progressing in their CS education, and also in their careers.
Do you have any kind of feedback or a sense of how effective these programs have been?
So I think INSIGHT, and also the version of it we run for underrepresented minorities in tech, called IN FOCUS, have been incredibly effective. The percentage of women who go through this program who then end up in one of our internships or then come back full-time is something like 20%, which is, for a one-week program we’re running, quite high. If we could have our interviews be 20% effective, that would be awesome.
Are you saying 20% of the people who go through the program eventually come to the Jane Street internship?
Something like that.
That is astonishing.
Yeah. One of the things that’s amazing is that a group of people who are not me have taken over the program since I was involved, so I’m not sure what the all-in numbers are these days. But at least when we were involved in the program, from our very first one, I think it was 25 students, and five of them ended up in the Jane Street internship over the next few years.
That’s amazing. That’s really a lot of impact to have on the firm’s hiring.
It’s nuts. From the very first IN FOCUS we ran, we had, I think, something like 15 students, and three of them ended up at Jane Street that next summer. Again, we were running this for the first time in the fall, so a lot of people who came to it already had internships. So our hope is that some of those people will be in, like, this year’s internship and next year’s. So getting that from just one year of turnaround has been awesome. And, again, we would love all of these numbers to be larger, and we’re hoping to continue to make them so. But the impact of these things is really large. And the number of people who come through these programs who say, “I never would have considered applying to Jane Street if not for them,” is also very large.
You know, I think one of the women who helps run the programs now is Grace Zhang, who was actually in that first class of INSIGHTers. And, yeah, she said when she was applying, like, “Yeah, like, this isn’t a thing that I would have done before.” And, like, I’m so grateful we have her ’cause she is amazing. I think the best selling point Jane Street has is just introducing full-time Jane Streeters to people because we are, in general, a very inclusive, welcoming, and friendly place to work. And I think that the people here are fantastic. When people are, you know, asked in interviews, “What is your favorite thing about the place?” People have to struggle to come up with an answer that isn’t, “The people,” because everyone’s answer is, “The people.” And so, that gets a little repetitive, but it’s just so true.
All right. Thanks so much for coming and joining me. This has been great.
Yeah, it’s been awesome. Thank you.
You’ll find a complete transcript of the episode, along with show notes and links at signalsandthreads.com. One thing I wanted to mention is that, as you may have noticed, the pace of releasing new episodes has slowed down a bit. Don’t worry, we’re not going anywhere. In fact, we’ve got a bunch of episodes planned that I’m really excited about. But things have been busy and I do expect the pace to be a bit slower going forward. Anyway, thanks for joining us, and see you next time.