Listen in on Jane Street’s Ron Minsky as he has conversations with engineers working on everything from clock synchronization to reliable multicast, build systems to reconfigurable hardware. Get a peek at how Jane Street approaches problems, and how those ideas relate to tech more broadly.
A conversation with Laurent Mazare about how your choice of programming language interacts with the kind of work you do, and in particular about the tradeoffs between Python and OCaml when doing machine learning and data analysis. Ron and Laurent discuss the tradeoffs between working in a text editor and a Jupyter Notebook, the importance of visualization and interactivity, how tools and practices vary between language ecosystems, and how language features like borrow-checking in Rust and ref-counting in Swift and Python can make machine learning easier.
Welcome to Signals and Threads, in-depth conversations about every layer of the tech stack from Jane Street. I’m Ron Minsky.
Today, we’re going to have a conversation about the use of Python and OCaml at Jane Street with Laurent Mazare. Jane Street is pretty widely known for the fact that we use OCaml, which is a statically typed, functional programming language similar to languages like Scala and F# and Swift and Haskell, but the story’s more complicated than that. We don’t only use OCaml, we use some other languages, too, and one that we use that’s pretty important is Python, and we mostly use Python for data analysis and machine learning tasks, but it also extends somewhat beyond those domains. So, the topic of the conversation this morning is about how Python fits in with OCaml and how it fits into the broader infrastructure at Jane Street. One of the reasons I’m excited to have this conversation with Laurent is that he has done a lot of work inside of Jane Street, both working with Python and on making the Python tooling itself better, but he also has broader experience working with different languages in different contexts, and so has some perspective on this from more than just his time at Jane Street. So, Laurent, to start us off, can you tell us a little bit more about your experience outside of Jane Street and how that led you to working here?
Yeah. So, my first work experience outside of academia was at a small company called LexiFi, where we were using OCaml to model financial contracts. So, in that context, OCaml was used mostly as a domain specific language to represent complex financial payoffs. I spent a couple years working on the overall infrastructure that we were selling to banks and asset managers. After that, I started working in 2010, at Goldman Sachs in London, as an equity strategist, and there I was using the in-house programming language called Slang, which one can think of as a variant of Python. It’s mostly a scripting language, it was fairly nice to actually quickly see the output of what you were doing, but of course, efficiency was a bit of a concern. I spent a couple years working at Goldman, and then in 2013, I joined Jane Street as a software developer. I worked mostly on trading systems at Jane Street, so back to using OCaml on mostly everything, and there I enjoyed a lot the functional programming aspects and strong type system, when it comes to designing production critical stuff. After four years and something at Jane Street, I actually left to go work for DeepMind, mostly because I was passionate about machine learning at that point, and I wanted to work at the leading edge of the domain. At DeepMind, I used mostly Python to do machine learning and with of course, TensorFlow, and I also got to use Swift a bit as a replacement, potentially, for Python. I spent a year working there, and then I came back to work at Jane Street, where I focus more on researchy bits nowadays. So, I would say that I spend half of my time working in OCaml and half of my time working in Python, but these two halves are very different.
Another part of your background is you’ve also done some interesting work on the open source side, can you say a little bit about that.
So, I try to be active in the open source community, and through that I’ve had the opportunity to work on a couple of packages. The most notable one is probably ocaml-torch, which provides OCaml bindings for the PyTorch framework. PyTorch is this amazing thing developed by Facebook, that lets you write some deep-learning algorithms in Python, and leverage your GPU and the power of auto-differentiation. Via these OCaml bindings, you can do that from OCaml, too – so, you add type safety to the mix. I also worked on Rust bindings for it, so, it’s kind of the same thing; in this context, you have bindings in Rust, so rather than writing Python code, you end up writing Rust. I also worked a bit on Clippy, which is a Rust static analyzer, and there, you try to analyze Rust code and find legitimate errors. There are plenty of other OCaml packages that I looked at – bindings for TensorFlow, bindings for Apache Arrow, I tried writing a data frame library for OCaml… and a couple other things. All of them can be found on GitHub.
So, this all reflects that you’ve spent a bunch of time working in a bunch of different places using different languages and ecosystems. You mentioned along the way that here, about half of your time is spent working in Python and half is spent working in OCaml, and those two halves are very different. Maybe you can say a little bit more, how are those two different?
Yeah, the first half of my time is spent using Python, and that’s mostly for research purposes. In that context, I will use Python in a notebook. So, for people that don’t know what a notebook is, it’s a kind of web UI, where you have some cells where you input some lines of your programming language, most of the time Python, and you can evaluate each cell in order. The development mode is very interactive in this context. So, you look at the actual output, tweak your code a bit, then evaluate it again, then tweak your code again a bit because you notice that you’ve made some mistakes, and start over again, and again. So, it’s some kind of interactive development. What is fairly nice is that you have a very quick feedback loop between editing the code and seeing the actual outcome. And of course, it plugs very nicely with the very large Python ecosystem of plotting libraries. So, you can actually run your algorithm on some data, and then plot the output, check that it matches your intuition, if not, try to debug, tweak your code, and start over. So, that will be mostly what I use Python for, and on the other side, there is OCaml, and there, it’s for more [general] development work and production critical development, I would say, where you tend to build systems that are fairly resilient. What I’m often amazed by is that you write some OCaml job, you kind of spend a week on it, deploy it, and then you come back to it a few years later and maybe even you’ve left and come back along the way, and you notice that your job is actually still there and still working, which is pretty amazing. Of course, on the Python side, it’s a bit more rare that things would keep on working for a very long period of time.
Say more about why that is. What do you think it is about OCaml, that makes it easier for writing these kinds of more robust tasks, for catching bugs, all of that?
Yeah. So, I think there are two aspects actually to it. One is about OCaml, and the other one is about the general engineering practice that we have around OCaml. So, it turns out that when you write OCaml code, you usually write proper tests for your things, you think a lot about all the error cases. Again, part of it is because the type system will enforce that, and each function will tell you, oh, actually, I might return a result, but I might also return an error, and you have to think about what the error is and how you want to handle it. Your goal is to build something that is a bit resilient, so you also are in a state of mind where you think a lot about corner cases, whereas in the Python world, it’s more about runtime exceptions. So, you don’t spend much time thinking about what the function actually outputs, and most of the time you use a library for which you don’t really know what the function outputs, it turns out that you kind of guess it, try it, most of the time it works, and you go on with that, you just check kind of interactively that it’s what you would expect.
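To make the contrast between "a result or an error" and runtime exceptions concrete, here is a minimal Python sketch. The names `Ok`, `Error`, `parse_price` and `parse_price_raising` are hypothetical and chosen only for illustration. Unlike OCaml, Python does not force the caller to handle the error case; the sketch only shows how the two styles look side by side.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Ok:
    value: float


@dataclass
class Error:
    message: str


# Hypothetical result type: either a value or a description of what went wrong.
Result = Union[Ok, Error]


def parse_price(text: str) -> Result:
    """Return an explicit result, so the error case is visible in the signature."""
    try:
        price = float(text)
    except ValueError:
        return Error(f"not a number: {text!r}")
    if price < 0:
        return Error(f"negative price: {price}")
    return Ok(price)


def parse_price_raising(text: str) -> float:
    """The more common Python style: failures surface as runtime exceptions."""
    return float(text)


good = parse_price("101.5")
bad = parse_price("abc")
```

In the first style the caller sees from the return type that an error is possible; in the second, the failure only shows up when a bad input actually arrives.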
I’m really struck by your description here. There’s a mix of differences between the two languages, and some of them are fundamental things about the language. Some of them are things about the ecosystem around the language, and some of them are issues about the practices that Jane Street as an organization has around the different languages. So, maybe we could just dig into those somewhat separately. Let’s first talk about the ecosystem. What is it about the Python ecosystem and the OCaml ecosystem that make them different?
Python, I think, nowadays, is kind of the de facto standard for everything that is about machine learning and data analysis. So, if you want to be able to plot your results, again, in a notebook, you have plenty of libraries that do that, plenty of tutorials online that will help you in finding the right way to do it, and if you run into any kind of issue, again, you can just Google things, you will get some results super easily. There is also a wide variety of machine learning libraries available, all the major modern deep-learning frameworks have Python frontends and are kind of released in a Python first manner, so the API that the developers will focus on is the Python one. I don’t think that there originally was any very good reason for Python to be more successful than another language that’s also a bit similar [like Ruby], in that domain, but it turns out that the more you have an ecosystem, the more it attracts people and it snowballs at some point. On the other side, when it comes to OCaml, there are a bunch of libraries. I think that, in general, libraries tend to have a bit of a better quality but you have far fewer of them. One way to get around this is that you can bind to other languages, so that’s what we did for TensorFlow and PyTorch, but you can even bind to Python. So, we actually have some Matplotlib bindings: when you want to plot something in OCaml, this will call Python, which itself will use this very well-known Matplotlib library to render those plots.
And I think it’s really notable that the first thing you talked about when you talked about the ecosystem advantages of Python is you talked about visualization, and I think the visualization part is incredibly important, and I think fundamental to why notebooks are such a valuable way of working. The basic mode in doing research is exploratory, you’re going off and trying something and looking at the results and trying something else, and looking at the results again, and having a quick workflow that lets you do that, and lets you embed the visualizations in your workflow in a straightforward way… I think that’s one of the reasons why notebooks are such a compelling way to work, and it’s very different from the traditional workflow in which software engineers work, which involves plain text and text editors, and just doesn’t have the same visualization component.
Indeed, though, I would say that these two modes are a bit similar in a way, that on both sides, you want to do fast iteration. In one case, you will have this fast iteration in the notebook. So, you will iterate quickly over the data, get some plot out, and get to actually visualize the output of your algorithm, check a bit some corner cases and iterate again. And in the OCaml world, you tend to do the same thing about fast iteration, but it’s about type safety. So, you save your file, and your build system will tell you, oh, this type doesn’t match, this type doesn’t match, this type doesn’t match. In the end, you would expect your OCaml code to pretty much work once you’ve managed to get things to compile. Whereas, on the Python side, you don’t have this type aspect. So, you’re kind of relying more on experiments to tell you if things are working properly.
I wonder if there’s more of this experimental back and forth on the OCaml world than maybe you’re giving it credit for. A place where I’ve encountered this a lot is in the work that we’ve done into building what are called expect tests. So, expect tests, which is a thing that’s not all that much known outside of the Jane Street context, although actually it was originally stolen from an external idea, the basic idea actually came from the testing tools that the Mercurial version control system uses – but anyway, the basic idea of expect test is you have this kind of unified way of writing code and tests, which in some ways feels a lot like a notebook. You write some piece of code, and then you write a special annotation, which means capture the output of the code up until here and insert it into the text of the code, and then you write some more code and have another place where you capture output. And then when the test runs, it regenerates all of the output and if there are any changes, it says, “oh, the output is changed, do you want to accept the new version?” and then you can hit a key in your editor and load those new changes in, and we use that a lot for all sorts of different kinds of testing, but also for various kinds of exploratory programming. So, for example, a thing that I’ve done using this is to do web scraping: there’s some bit of web data you want to grab and analyze and transform and pull some information out of. And that’s very exploratory. You’re often not trying to get it right in a kind of universal way for all time, you’re trying to process some particular kind of documents, and so, you write some transformations, and you apply them and you look at the output and see if it makes sense, and the build system actually gives you a relatively good and fast loop, where you go and you make a change and the build completes and you get to see the new output. One of the big differences is that there’s no graphical component, everything is plain text.
One of the big advantages you get out of the notebook style workflow is the ability to be in a webpage and actually display graphical stuff.
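The expect-test loop described above can be approximated in a few lines of Python. This is only a rough sketch of the idea (capture what a piece of code prints, compare it with the recorded text, and hand back the new text so a tool could offer to accept it). It is not how Jane Street's OCaml tooling is implemented, and `capture_output`, `check_expectation` and `extract_titles` are hypothetical names.

```python
import contextlib
import io


def capture_output(block):
    """Run `block` and return everything it printed to stdout."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        block()
    return buffer.getvalue()


def check_expectation(block, expected):
    """Compare printed output with the recorded text.

    Returns (matches, actual) so that a tool could offer to accept
    `actual` as the new expectation when the two differ.
    """
    actual = capture_output(block)
    return actual == expected, actual


def extract_titles():
    # Hypothetical exploratory transformation over a scrap of scraped HTML.
    page = "<h1>Clock sync</h1><h1>Multicast</h1>"
    for chunk in page.split("<h1>")[1:]:
        print(chunk.split("</h1>")[0])


# The recorded expectation is up to date.
matches, actual = check_expectation(extract_titles, "Clock sync\nMulticast\n")
# The recorded expectation is stale; `new_output` is what would be accepted.
stale_matches, new_output = check_expectation(extract_titles, "Clock sync\n")
```

As in the conversation, everything here is plain text: the sketch has no way to show a plot, which is the gap the notebook workflow fills.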
That’s right. I kind of agree that expect tests give you some kind of interactive experiments. There, I think you focus more on corner cases, and hard bits of your algorithm, whereas, most of the time on the notebook experiment in Python, you’re not writing some code that is there for a very, very long time. So, you actually want it to work on your data rather than on any potential data. So, even if expect tests are a bit interactive, I think that it’s still a bit of a different experiment.
So, I’m curious, to what degree you think this is an issue of the programming language or an issue of the tools and ecosystem around the language. If, for a minute, we imagine a world where the tooling for using OCaml in a notebook was all really well worked out, you had all of the nice things you’re used to, from the editor integration experience we have with a language like OCaml, (so like, you can navigate to a piece of code and look up the type of a given expression or navigate to the definition of a given value, all of those things, all of those IDE-like features worked really well), and at the same time, you had all the benefits of using Python in a notebook in terms of the interactivity, and the ability to visualize things, and you had all of the machine learning and data analysis libraries at your disposal… At that point, is there still a reason that you’d want to use Python? Is there something about the language itself that makes it a better tool for some of this kind of exploratory research work?
Yeah. It’s a very interesting question. It’s, of course, a bit harder to answer definitively, because you don’t know what it will look like if you add everything possible on the OCaml side. Still, I feel a bit that the dynamic aspects of Python have some advantages that you will not have on the OCaml side. A typical thing that you will do in Python is you write your small algorithm, it actually works for integers, and you’re happy with it, but it turns out that later, you actually want to use floats. So, the fact that in OCaml, to make the type system happy, you will have to modify all the operations will be kind of annoying to you. Whereas, in Python, you would expect things to work most of the time. You will still have this problem that if there are some errors, you might not detect them, but on most reasonable outputs, the checks that you’ve done should bring you some confidence that it’s working reasonably well. And all these dynamic aspects are fairly nice. It lets you try pretty much any kind of algorithm on any data, relying on duck typing: as long as your object implements the proper method, things will work out without you actually knowing what’s taking place under the hood.
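A small Python sketch of the integers-then-floats point: the function below only assumes that its elements support `+` and division by an integer, so the same code runs unchanged on ints, floats, or `Fraction` values. In OCaml the integer and float versions would use different operators (`+` versus `+.`), which is the kind of rewrite being described. The function name `running_mean` is hypothetical.

```python
from fractions import Fraction


def running_mean(values):
    """Cumulative mean; relies only on `+` and `/` being defined for the elements."""
    total = None
    means = []
    for count, value in enumerate(values, start=1):
        total = value if total is None else total + value
        means.append(total / count)
    return means


int_means = running_mean([1, 2, 3])
float_means = running_mean([0.5, 1.5])
fraction_means = running_mean([Fraction(1, 2), Fraction(1, 3)])
```

The flip side, as noted above, is that nothing checks the assumption ahead of time: passing a list of strings fails only when the division is actually attempted.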
I guess one of the big upsides of working in a language with a static type system is that types provide a lot of guarantees, guarantees that you can’t get from any single run of the program, you can’t get by any amount of testing, because the guarantees of type systems are essentially universal. They apply to every possible execution of your program. So, that’s an incredibly powerful thing, but it comes at a cost. Getting those guarantees requires you to effectively fit your program in to the structure required by the type system. Type systems are good at capturing some kinds of rules and not good at capturing others. So, there’s essentially a kind of negotiation step where you figure out how to play nicely with the type system. And in a language like OCaml, that actually works out pretty nicely. The type system is quite lightweight, it doesn’t require lots of type annotations, and it’s flexible. So, you don’t have to bend your program too far out of shape to make it fit in and to get the benefits of types, but the nice thing about dynamically typed languages is that you get to skip that negotiation step entirely. If your program works in the particular context on the data that you happen to be running on, then you don’t have to think about how it works in a more general context, and when you’re doing exploratory research, writing little bits of code that you get information from and then maybe never run again. I guess the tradeoffs are really different. At that point, being able to operate without types has a lot of appeal.
Yeah. So, the type system in OCaml would indeed give you a proof that your code is correct, and that the code is correct for all possible inputs, but in Python, it’s kind of the default that you don’t get any proof that your code would run correctly. You will only get exceptions at runtime. It turns out that the fragment of inputs that you’re caring about might be very small. So, the times that you would spend looking for corner cases on the OCaml side and fixing them, is actually not necessary on the Python side, because yourself, you know that you won’t have that kind of inputs. Whereas, of course, if you have a typer and a compiler to ensure type safety, the compiler cannot assume anything about that. So, you will have to teach it, “oh actually I don’t really care about this variant, nor about this variant” and so on.
I feel like we should be a little less grand in the sense that it’s not that the type system gives you a proof of the overall correctness of the program, but it does capture certain elements of correctness, and nails them down pretty well. And there’s certain kinds of programs where that gets you alarmingly far, where large classes of bugs just totally go away under the scrutiny of the type system. A thing that’s also maybe worth mentioning is that type systems like OCaml’s actually do relatively little to capture bugs in numerical code. One of the hard things about numerical programs is when you get the algorithm wrong, things look a little worse, in fuzzier ways. If you didn’t do your linear algebra decomposition just right, then it’s not that you get a totally crazy answer, you get an answer that’s less optimal, or doesn’t converge as fast or it doesn’t converge at all. And separating out those bugs, finding those bugs is actually quite hard, and I suspect if OCaml had a type system that was really good at finding that kind of bug, people would be really eager to use it in data analysis contexts, but in practice, the hardest bugs in numerical analysis often just kind of slip by without the type system helping you all that much at all.
Indeed, the bread and butter of Python is numerical algorithms, and in that case, pretty much everything is a float or an array of floats or matrix of floats. So, having a type system that checks that the types are correct might not buy you that much, as you mentioned, as problems are more likely to be the float value is not the correct one rather than the float is not a float, but it’s actually a string. So, in that context, the OCaml type system definitely helps less. You mentioned that you might want some specific type system to catch that kind of error, but that turns out to be quite tricky, even in being able to decide if a float is going to be a NaN, or is going to be an actual float, is not a straightforward problem, and yeah, that’s the kind of thing that by experiment, you will actually catch by looking at the data that you’re taking as input and trying your algorithm, but if you have to enter every possible data, you probably don’t want to check for floats, at any point in your function, it would be too much of a pain.
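The point that "a float is a float" tells you little about numerical correctness can be seen in a few lines of Python. In this sketch (with a hypothetical `mean_return` helper), both results have the same type, but one of them has been silently poisoned by a single NaN in the input.

```python
import math


def mean_return(prices):
    """Average of simple returns between consecutive prices."""
    returns = [(b - a) / a for a, b in zip(prices, prices[1:])]
    return sum(returns) / len(returns)


clean = mean_return([100.0, 110.0, 121.0])
poisoned = mean_return([100.0, float("nan"), 121.0])

# Both values are floats, so a type check cannot tell them apart.
same_type = type(clean) is type(poisoned)
# Only inspecting the value reveals the problem.
is_bad = math.isnan(poisoned)
```

This is why looking at the data, for instance by plotting it in a notebook, catches a class of problems that a conventional type checker does not.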
Right, and I feel like machine learning is split between two kinds of work, some of which is all about fast iteration, and some of which is all about painfully slow iteration, which is to say some of the work you do is taking some big model that you want to train and shipping it off to a farm of computers, and GPUs and such to do a bunch of heavy lifting, and then get the results back at the end, and that seems like a case where you would really, really like to not discover that after like three days of chomping on your numbers, that some trivial bug at the end causes the whole thing to fail, and you realize you were just wrong. That feels like a case where you’d actually rather have more checking up front, if you could get it. Is that a thing that comes up? And is that something where you think, OCaml can be useful in a machine learning context?
Yeah. I think that’s the kind of place where OCaml could be nice. Turns out that what you will tend to do in Python is you run your model first with very small input sizes, so that it runs quickly, and you run the whole thing, and you check that it’s able to write the model file in the end, or to write some intermediary snapshot files. But you can still miss some branching in your code, or whatnot, and so you don’t see that, at the end, your model would explode, and you might just not get anything out of multiple hours/days of compute, and in that case, you’re pretty sad. So, having a type system that tells you “oh actually here you say that the filename was supposed to be a string, it turns out that you’ve given me an int somehow,” it’s pretty good to be able to detect that kind of failure. Then it raises a bit the question of what is the ideal system, maybe the actual model should be in Python, because it’s the numerical bits, and you want to iterate quickly on building this model, but once it’s come to more either production usage or training for real, which is kind of a production thing, then you want an actual infrastructure around it, that is resilient about lots of things, so about some computer being down in your compute farm, about the file system having issues, and of course, about silly mistakes that you might have had in your code.
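The "run it small first" habit can be sketched as follows. The training loop here is a toy stand-in (one weight, plain gradient descent), and `train`, `smoke_test` and the snapshot format are hypothetical. The point is only that the tiny run goes through the same save path that a multi-day run would reach at the very end.

```python
import json
import os
import tempfile


def train(num_steps, snapshot_path):
    """Toy training loop: fit `weight` so that weight * 2.0 approaches 4.0."""
    weight = 0.0
    for _ in range(num_steps):
        # Gradient of the squared error (weight * 2.0 - 4.0) ** 2.
        gradient = 2.0 * (weight * 2.0 - 4.0) * 2.0
        weight -= 0.05 * gradient
    # The step that tends to fail only at the very end of a long run.
    with open(snapshot_path, "w") as handle:
        json.dump({"weight": weight, "steps": num_steps}, handle)
    return weight


def smoke_test():
    """Run the whole pipeline on a tiny budget to exercise the save path early."""
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "snapshot.json")
        train(num_steps=3, snapshot_path=path)
        with open(path) as handle:
            return json.load(handle)


snapshot = smoke_test()
```

As the conversation points out, a small run like this still only covers the branches it happens to take, which is where static checking would add something.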
Are you suggesting in this world, you would keep the modeling in Python the whole way through, but just build the infrastructure around it in OCaml? Because taking the thing you wrote in Python and rewriting it again in OCaml doesn’t sound like an enormous amount of fun.
Yeah, and you might have discrepancies between the two things. I think it’s a bit clearer even when it comes to productionizing things. So, you write your model in Python, and even if you train it in Python, at some point, you’re probably going to want to use it on real life things. For Jane Street, it will mean that you might have a trading system that is actually using your thing, and this trading system is likely to be written in OCaml because it has to be resilient to lots of different things. And still, you will want to interface with your model at some point. So, you have multiple ways around that: You can either make OCaml call Python, and use that to decide on the model outputs, take the value back and process things accordingly. But you could also just try to replicate everything, so that things might be faster or you might have more type guarantees, but of course you have two different implementations. So, then you have a new problem, which is how do you ensure that the two implementations stay in line over time. It’s not something great to have, but we’ve done it for a couple things, and one way we have around that is, when you produce a model, the model outputs itself, like the configuration of weights that you’ve finally reached, as well as some sample values, and on the OCaml side, when you deploy the actual model, you will run with the configured weights and the sample values and check that you obtain the exact same results.
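The consistency check described here (ship the weights together with sample values, then replay them through the second implementation) can be sketched in Python. Both "implementations" below are Python stand-ins, with the second one playing the role of the OCaml side. The linear model, the JSON bundle layout, and all function names are assumptions made for illustration.

```python
import json


def predict(weights, features):
    """Reference model: a plain linear model."""
    return sum(w * x for w, x in zip(weights, features))


def export_model(weights, sample_inputs):
    """Bundle the weights with sample inputs and the outputs they should produce."""
    samples = [{"input": x, "output": predict(weights, x)} for x in sample_inputs]
    return json.dumps({"weights": weights, "samples": samples})


def predict_reimplemented(weights, features):
    """Stand-in for the second implementation (OCaml in the conversation)."""
    total = 0.0
    for index in range(len(weights)):
        total += weights[index] * features[index]
    return total


def verify_bundle(bundle_text, implementation):
    """Replay every recorded sample through `implementation` and compare exactly."""
    bundle = json.loads(bundle_text)
    return all(
        implementation(bundle["weights"], sample["input"]) == sample["output"]
        for sample in bundle["samples"]
    )


bundle = export_model([0.5, -1.25], [[1.0, 2.0], [3.0, 4.0]])
consistent = verify_bundle(bundle, predict_reimplemented)
```

The comparison is exact, matching the "exact same results" in the description. Two real implementations that order their floating-point operations differently might need a small tolerance instead, which is a design decision this sketch does not settle.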
Some of what you were talking about there had to do with the tooling that we have for calling back and forth between Python and OCaml, and that’s actually an area that you’ve done a bunch of work on. So, can you say a little bit more about what we’ve done to make interoperability between the two languages better?
Yeah. For that, we rely a lot on a library called pyml which is open source and which is very neat. This library, it’s mostly designed so that you can call Python from OCaml. So, again, a typical example is you want a plotting library in OCaml. So, you want to use Matplotlib because, it has its own issues, but it has tons of features, and it works fairly well. So, you just call on the Python runtime from your OCaml runtime, but it turns out that you can also use it the other way around, and it’s the kind of use case that we have the most of at Jane Street. When you write things in your Python notebook, you want to be able to access the various Jane Street services, to get some market data, to get, actually all the data that you can think of, in your Python notebook and even to trigger some actions or to publish some values to all different systems. So, for that, it’s more calling OCaml from Python and again with pyml, what you can do is compile your OCaml code in a shared library and this shared library you would load it from the Python runtime, you will call the starting point of the OCaml code, which would bring up the OCaml runtime and the OCaml runtime will register lots of functions for the Python runtime to be able to use. This makes it possible to actually write Python wrappers around OCaml functions in a way where you almost don’t have to write a single line of Python. So, for people that do not use Python that much, that’s very nice and also, when you want to build functions that are used a lot by lots of different people on inputs that you might not have thought of, you’re actually pretty happy to do that in the OCaml world rather than to do it in the Python world.
And the key reason it’s important to do this without writing any Python is: there are lots of OCaml developers, who themselves basically don’t use Python at all, developing libraries that would be useful to Python programmers. So you want to keep down the amount of work that’s required for non-Python developers to expose things and make them available to the Python world.
Yes. Because if people don’t run the Python command line to test things or write small wrappers on a daily basis, they might not remember how to do it, it’s a bit of a pain. Whereas, if you do everything on the OCaml side, people in the development area are very familiar with doing that.
Right, and it’s maybe worth noting, that this is a kind of thing that we’ve done in multiple different places, what you’re describing is the ordinary thing that you have in any kind of higher order language, whether that’s a functional programming language, or an object-oriented language, where you’re essentially shipping around pointers to bits of code, and it’s pretty common to have things which, when you stop and think about them, they’re just calling back and forth multiple times across these different levels, and then, what’s interesting about it in a Python context is we’re now calling back and forth between two different language runtimes, not just between two different functions in the same language. And we do exactly this for our OCaml Emacs integration. With Emacs, we wanted to write lots of extensions, but we really didn’t want to write them in the native language that Emacs uses for this called elisp. Because people found it harder to test, harder to teach and harder to engineer, and so we wrote this layer called Ecaml, and now pretty much all of our Emacs extensions are written in OCaml, but you still need to interop with the basic Emacs functionality and library, so, in that sense, it’s very similar to the Python story, and you have exactly the same story where you have programs that are constantly bouncing back and forth between OCaml and between Emacs Lisp in that case, and hilariously, we’re now doing the same thing with Vim. So, you have the same kind of technology getting built over and over.
It’s pretty amazing when you build that kind of system, and you notice, the first time, that it works. It sounds pretty weird that somehow, you’re able to interface that, having layers of Python calling OCaml calling Python calling OCaml, and somehow, even if you do that hundreds of times, it ends up just working. Though there are some nasty aspects to it. When doing the integration of some OCaml function, we actually ran into a pretty funny bug because of that, which was as follows: OCaml has a garbage collector, so the OCaml runtime from time to time, stops the world and collects what is not alive. Here, by “alive”, we mean the values that can be reached by the runtime, so it’s also values that are actually useful to the user still at that point and all the rest is the garbage that can be collected and removed. Whereas, in the Python world, you use reference counting, so on each object, you keep a counter of how many times this is used, it has the nice benefit that you can release memory as soon as possible, but it has a slightly sad aspect which is if you have a cycle, then memory is lost. So, you still need a garbage collector that you run from time to time to detect cycles and to actually remove them and the bug that we actually encountered was because you had a cycle between the Python and the OCaml runtime. So, you had an OCaml object that was itself pointing at a Python value and the Python value was pointing back to the OCaml thing, and that thing cannot be detected by either of the garbage collectors. Each of them will tell you, “oh, actually, that thing is being used, I don’t want to remove it”, but overall, you could remove the old value. It turns out that this was wasting tens of megabytes because of that and so we noticed it pretty quickly and fixed the thing, but there are some small drawbacks that you can have because of this.
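The reference-counting behaviour described here can be observed from Python itself. The sketch below assumes CPython (other Python implementations manage memory differently) and uses a hypothetical `Node` class. It shows only the single-runtime case: an object without cycles is freed immediately, while a cycle survives until the cycle detector runs. The cross-runtime cycle in the bug is worse, because neither collector can see the whole loop.

```python
import gc
import weakref


class Node:
    def __init__(self, name):
        self.name = name
        self.other = None


gc.disable()  # keep the cycle detector from running on its own during the demo

# No cycle: the object is freed as soon as its last reference goes away.
single = Node("single")
single_alive = weakref.ref(single)
del single
freed_immediately = single_alive() is None

# A cycle: each object keeps the other's reference count above zero.
a, b = Node("a"), Node("b")
a.other, b.other = b, a
cycle_alive = weakref.ref(a)
del a, b
leaked_before_gc = cycle_alive() is not None

gc.collect()  # the cycle detector finds and frees the pair
freed_after_gc = cycle_alive() is None
gc.enable()
```

Weak references are used only to observe whether the objects are still alive without keeping them alive.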
This is exactly the classic problem you get when you have two garbage collectors interacting, and we have exactly the same bug in the Ecaml story. So, I think it is a fundamental problem that’s hard to fix.
I want to go back to this question of how do we think about Python and OCaml, and to ask you, when you’re going off to engage in some programming task, how do you think about the choice between whether you want to use Python and whether you want to use OCaml?
Myself, I would think that when things are there to last, and it’s some code that you want to still be working in multiple weeks, or months, then OCaml is very neat. Also, there are specific domains where OCaml shines, when it’s about manipulating symbolic values. Of course, trying to do that in Python would probably not be that much of a good idea. One good example of manipulating symbolic values is writing a syntax extension. In that case, OCaml’s type system does a really good job of helping catch errors in code that’s going to inspect your program and generate new code depending on what it finds, and making sure that you don’t forget to handle all the different kinds of syntax that show up in the language.
Although I’d say that the kind of example you gave about writing programs that manipulate programs is both a very good example in the sense that it highlights something OCaml’s really good at, but also, I feel like, in some ways, a bad example, because I think it understates how often this kind of programming comes up. To try and frame it in a somewhat more general context: I feel like the kind of places where OCaml works really well is where you have combinatorial structure that involves just differentiating between lots of different cases. If there’s different ways that your data might be laid out, and you want something that helps you do the case analysis and make sure that you capture all the cases. OCaml’s incredibly good at that. OCaml’s pattern matching facility gives you a way of just exhaustively saying, “what are all the different ways this data might be shaped” and making sure that you cover them all. That for sure shows up and shows up a lot when you’re doing compiler style work, or generally things that kind of feel more like program manipulation, but actually shows up a lot in various kinds of systems programming tasks; I’ve spent a lot of time working on the insides of trading systems, for example. And there’s actually lots of places where you want to think in exactly this kind of combinatorial way and where things like OCaml, do a really good job of catching those bugs.
Yeah, yeah, it’s definitely the case. You mentioned trading systems, and indeed, it’s quite challenging. Building trading systems in Python is not the best idea.
You were talking before about how we build tools to make it possible to call into OCaml from Python, and the kind of examples you pointed to are cases where you want to consume data where the primary wrapper of that data is some bit of OCaml code or to publish data, again, through some path that’s in OCaml, but I think it’s also useful to be able to invoke computations, pure computations that exist on the OCaml side, I think for two reasons. One is, because it’s way more efficient, as you’ve been highlighting, but also because writing the code, inside of our ecosystem, writing the code in OCaml means that you can write the program once, write the computation once, and then share it on both sides. Being able to have all of the core computational stuff available on the OCaml side and then being able to expose it in Python is a nice way to allow you to stop repeating yourself and having to rewrite things on both sides.
Yeah. Efficiency’s definitely something that we care about, and that would be far better on the OCaml side. Also, if some code is indeed to be called by lots of different people, and again, these people might have ways to call it that you will not have thought about, in Python, it will be pretty challenging, because your test suite will have to cover pretty much every single corner case, and still people might at some point rely on some specificities of your code, and wrap around that, which ends up being quite a mess. You don’t want to write in Python some code that is going to be queried in lots of different usage scenarios. Whereas in OCaml, you’re kind of forced by the type system to think about all these different scenarios and you want to get that correct. It’s not worth the investment for a one-off, but if it’s for some code that is going to be shared across tons of people, it’s probably worth more of the investment.
When people think about the difference between dynamically typed languages and statically typed languages, they often think of, well, in statically typed language, you have the type system to help you and in dynamically typed languages, you have to write a lot of tests, which is, I think, not quite the right way to think about it. I think in a statically typed language, you still need tests, but they have almost a kind of snap-in-place property, which is: if you write your program, and can show that it works on a few key cases and catch a few of the corners, then the overall behavior tends to snap into place, and you get, almost like, wider coverage of your tests, by having the type system make the behavior of the program in some sense more rigid, and so, then you can just get by with a much lighter testing story. So, you still need to do tests, but you don’t have to be nearly as exhaustive in the testing, as you do in a language where tests are the only thing that are nailing the behavior of your program in place.
So earlier on when you were talking about the trade-offs between OCaml and Python, you talked about how, well, on the OCaml side, we have all this rich testing infrastructure, and on the Python side, you know, there’s much less of that and less practice around that. First of all, I imagine some professional Python developer listening to this conversation and saying, what is wrong with you people? Like, obviously, when you write Python code, you should have good testing infrastructure. I’m wondering if there’s stuff that we should be doing and maybe are in the process of doing to improve the tooling story that we have internally around Python, and to make more of a culture of testing the things that we do write?
Yeah. We are certainly learning a lot on the Python side. On the OCaml side, things have been polished over the years, and are very, very efficient. On the Python side we are discovering testing, having a bit of type annotation, generating user documentation automatically, and so on, and you have to pretty much redo all the things that have been done for OCaml at Jane Street. So, we are going through that and trying to focus first on the most important bits, but we’re certainly not there, and I can imagine that our best practices are pretty far away from those of people that are doing lots of Python. I also think that when it comes to actually using Python not alone to build your system, but more as some kind of glue around some OCaml components, the testing story probably has to be less involved on the Python side. You still want your OCaml components to be well tested. You still want your intermediary layer that converts Python calls into OCaml calls to be of course properly tested. But you have far less of a possibility of bugs on the Python side because what we ship on the Python side is fairly limited. After that, the user is going to write tens of lines, perhaps a bit more around it. But hopefully, not so much that there will be tons of possibilities for bugs. Though, we’re quite commonly surprised about how easy it is to sneak some bugs into a very, very short amount of code.
That is a thing about which programmers will never stop being surprised.
Yeah, it’s pretty impressive, and it’s also impressive when you start to rely on Python libraries. So, of course, we rely a lot on something called NumPy to represent multidimensional arrays, and also a data frame library called Pandas that lets you represent a matrix where the columns have heterogeneous types, so you can have a column of strings, a column of times, a column of floats. And this library is super powerful. It works very nicely. It lets you analyze your data very quickly, but it has very weird corner cases, and you might just notice that the code is actually not doing what you would expect and learn it the hard way. Because the only thing you noticed in the end is, oh, the error that I get while running my simulation is not what I would have expected… it’s pretty bad. So, now I have to dig down in my code, annotate things quite a lot, and try to understand why the input and output of this Pandas library is not what I would expect, to finally Google that and discover that it’s a well-known gotcha around the library. That’s only for the cases that we know about, where the error was large enough for us to discover, of course; you have cases where you won’t even know about it.
This reminds me a little bit of what data analysis is like in Excel. Excel is something like the equivalent of the Python notebook. The program that you write in there is a mix of the little expression language that goes into the cells of a spreadsheet, and VBA that you write, for manipulating and doing various things that don’t look like simple computations that go along with it. Again, it has all the benefits that we were talking about, which is, it gives you some ability to visualize the data, and this cell-structured way of looking at computations is actually, in some ways, incredibly powerful. You look at someone who’s good at a spreadsheet, they’re able to very quickly do all sorts of rich computations in a way where they can see all the intermediate pieces, and the transformation of the numbers becomes actually much easier to follow because all of the intermediate computations are just kind of laid out in a big grid. And also, it has a bunch of baked-in, totally terrifying, weird behaviors. I’m reminded of this recent news where some organization in the academic genomics community changed the names they used for various objects that show up in spreadsheets, because I think some of them were being interpreted as dates, which caused all sorts of crazy things to happen in the middle of spreadsheets.
Another example that we worried about a bunch, when it came out, is a stock called “True”, and there was a bunch of worries – because Excel does a bunch of stuff, again related to the fact that it’s dynamically typed, where it tries to infer what kind of data you have from what it is that you typed in, and this is incredibly helpful in lots of contexts and also absolutely terrifying in other contexts. And because the work that’s being done is numerical, when it fails, again, it fails in this kind of soft way a lot of the time, where the numbers aren’t quite right, and you may just not notice, and your result may be a little less optimal than it should be or give you not quite the real answer. That’s a thing that we worry about and try to be pretty careful about and defensive about when we write spreadsheets, and there’s stuff in the outside world where well-known results had to be retracted because of subtle bugs in spreadsheets.
Myself, I tend to make a lot of fun of Excel, because I live in the Linux land, and I don’t like logging on to a Windows computer. I almost don’t use Excel at all, but still, when you have a question for someone working on trading at Jane Street, and you see them manipulating Excel, it’s pretty impressive, because people tend to know all the shortcuts, and are able very, very quickly to actually get their data out of Bloomberg, plot them in Excel, and basically check that the values are in line with what they would expect and just send them to you. Whereas, if you were to have written OCaml code for that, you would probably have spent multiple hours on it. So, yeah, Excel definitely has its advantages. It also has a great reactive model, like having these cells where, when you modify a cell, everything gets recomputed, and that’s very nice. Python is definitely kind of in the middle: you’re able to do more computations there, it’s a bit more efficient, and it can handle larger data sets, but you lose a bit of the, oh, I can basically eyeball all the data at once. Of course, in Python at some point, what you’re manipulating is too big, so you only look at statistics and the extrema, that kind of thing. So, it’s far less good than what it will be when using Excel.
Also, Python notebooks lack a nice property that Excel spreadsheets have where in Excel spreadsheets, every cell is a computation, which makes references to other cells, and Excel keeps track of this graph of computations and how they depend on each other. That’s useful both for performance reasons – there’s this kind of incremental update, if I change something, it will refresh the computation, and just do the minimum that it needs to do to refresh things, so, it has a nice built in incremental computing model – but it’s also good for correctness reasons, it can actually refresh all the things that need to be refreshed whenever anything changes, so you know that your spreadsheet is in a consistent state. Not so with a Python notebook. In a Python notebook, you have something similar, you have like little chunks of code, which are almost like the equations, and then you have various visualizations that are interspersed between them, and these chunks of code depend on each other, but you have little buttons you can press to decide which ones to rerun when, and so you can look at a notebook, and you have no guarantee that it’s in a consistent state, and so that’s a thing that’s always struck me as kind of nerve-wracking about this model of programming.
Excel tends to be more functional than what Python would be. You have far less of a notion of state that can be mutated, it’s just this graph of computation, and that’s actually fairly neat. A big problem with notebooks is what you mentioned: you can run each cell, a cell may depend on the current state, and it’s very easy to create a notebook state and not remember how you actually reached that state. You might have executed a cell multiple times, and you might have executed the third cell before executing the second one, and you would have to redo everything again if you wanted to be able to go back to the exact same state. So, for efficiency reasons, you don’t want to keep this big graph and rerun everything, but it’s kind of a problem, because when you reach some conclusion from your notebook, if you’re not able to restart from scratch and reach the same conclusion again easily, it’s actually annoying.
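The hidden-state problem can be illustrated with a small Python sketch. It models notebook cells as strings run against one shared namespace, which is only a rough stand-in for how a real notebook kernel behaves; the `CELLS` table and the `run` and `replay` helpers are hypothetical names invented for this example.

```python
def run(cell_source, namespace):
    """Execute one notebook-style cell against the shared namespace."""
    exec(cell_source, namespace)


# Three toy cells. Cell 2 mutates state created by cell 1, and cell 3
# reads whatever value happens to be there when it runs.
CELLS = {
    1: "total = 0",
    2: "total = total + 10",
    3: "report = f'total is {total}'",
}


def replay(order):
    """Run cells in the given order from a fresh state and return the report."""
    namespace = {}
    for cell_id in order:
        run(CELLS[cell_id], namespace)
    return namespace["report"]
```

Here `replay([1, 2, 3])` gives `'total is 10'`, running cell 2 twice with `replay([1, 2, 2, 3])` gives `'total is 20'`, and running cell 3 before cell 2 with `replay([1, 3, 2])` gives `'total is 0'`. The cell text is identical in all three cases; only the execution history differs, and the notebook does not record that history for you.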
Is there any work in the direction of trying to solve this problem with notebooks to make it so you could have notebooks that had more natural incremental models so that you didn’t have to choose between efficiency on one side and correctness on the other, pick one? Is there any work to try and make that better?
Yeah. I think there is a bit of work but nothing that has the level of adoption of the main Jupyter/IPython thing. So, a thing that can be mentioned, there is an alternative to Jupyter, which is called Polynote, which is made by Netflix, I think. It has the same kind of drawbacks, except that it’s far more explicit about the notion of state. So, at least, you would see the variables that are defined, and when you go back to a cell, it tries to put you back in the state that you had when editing this cell, so only taking into account what was there previously. But of course, because you want to do that efficiently, you don’t really handle aliasing correctly. So, if you’re doing deep mutation inside the object, I don’t think that this is tracked properly, and just kind of the first layer of the object is as it was at that point in time.
I see. So, it tries to bring you back, but it’s not a sound method, it’s not guaranteed to always work.
Yeah, because it would explode probably.
Besides that, there is a nice thing, but it’s very experimental at this stage, in the Julia world, which is called Pluto. It will notice all the dependencies, and whenever you edit a cell, it will recompute all the cells that are depending on the results. So, this works, I think, only for Julia at this stage, but it’s pretty neat. It might be the case that you want to rerun a cell and not rerun things that would be too long to run. So, it’s a bit of a user interface problem at some point.
Yeah. Although, I guess you could just be explicit, you can say I reran the cell, and now if I could keep track and visually notify the user that the following cells are, in some sense out of date. At least if you understood and could expose to the user, so they knew which part of the computations were reliable and which parts were not reliable, that would already move you a big step forward.
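The idea of flagging out-of-date cells can be sketched as a walk over a dependency graph. The Python below is a rough illustration, not how Pluto or any other notebook actually implements it. It assumes the dependencies between cells are already known and are given as a mapping from each cell to the cells it reads from; the `stale_cells` function and the cell names are hypothetical.

```python
from collections import deque


def stale_cells(dependencies, edited):
    """Return every cell that transitively depends on the edited cell.

    `dependencies` maps a cell name to the set of cells it reads from.
    """
    # Invert the graph: for each cell, which cells read from it?
    dependents = {cell: set() for cell in dependencies}
    for cell, reads in dependencies.items():
        for source in reads:
            dependents.setdefault(source, set()).add(cell)

    # Breadth-first walk from the edited cell through its dependents.
    stale, queue = set(), deque([edited])
    while queue:
        current = queue.popleft()
        for child in dependents.get(current, ()):
            if child not in stale:
                stale.add(child)
                queue.append(child)
    return stale


# A hypothetical four-cell notebook: "plot" reads "clean", which reads "load".
EXAMPLE = {"load": set(), "clean": {"load"}, "plot": {"clean"}, "notes": set()}
```

With this example, editing `"load"` marks `"clean"` and `"plot"` as stale and leaves `"notes"` alone. A user interface could then either rerun the stale cells automatically or just highlight them, which is the cheaper option discussed above.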
You mentioned at the beginning, that you had spent some time when you were at DeepMind working on Swift as an alternative language for doing machine learning work, and the way I understand it, part of the story there, I think, is about auto-differentiation. Could you say a little bit more about what the ideas are behind using Swift as a language for doing this kind of machine learning work?
The idea there is indeed differentiable programming. So, what you tend to do a lot in modern machine learning is gradient descent. So, you have a function that you want to optimize, so it has some parameters and you want to find optimal values for these parameters. You have a notion of loss, and you want to find the parameters that minimize this loss. And the way to do that is by gradient descent. So, you would just use the Newton method of computing the derivative, except that in this case, the derivative is with respect to lots of different variables and you follow the slope down until you think you have reached the minimum. So, of course, what you discover is a local minimum.
Right, and for people who don’t spend a lot of time doing machine learning, the way I always think about this is, you have some function you want to evaluate, think of it as like a series of hills or whatever, and gradient descent is just like the thing that a rolling ball does: it just goes in the steepest direction, and instead of being at the top of a hill, you’ll find yourself at something that looks like a bowl in its shape, and that’s exactly a local minimum. Having the ability to compute the derivative, particularly in a very high dimensional context, where instead of being, you know, a two dimensional picture like the one I have in my head, you have 200 dimensions, or 2,000 dimensions or more, having that derivative there is a very powerful thing.
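As a small worked example of the rolling-ball picture, here is a plain-Python sketch of gradient descent on a two-parameter loss. The loss function, its hand-written gradient, the learning rate, and the step count are all assumptions chosen for illustration; real models have far more parameters and get their gradients from a framework rather than by hand.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=200):
    """Follow the negative gradient from `start` for a fixed number of steps."""
    point = list(start)
    for _ in range(steps):
        g = grad(point)
        # Move each coordinate a small step downhill.
        point = [p - learning_rate * gi for p, gi in zip(point, g)]
    return point


def loss(p):
    """A bowl-shaped loss whose minimum is at x = 3, y = -1."""
    x, y = p
    return (x - 3.0) ** 2 + (y + 1.0) ** 2


def loss_grad(p):
    """The gradient of `loss`, written out by hand."""
    x, y = p
    return [2.0 * (x - 3.0), 2.0 * (y + 1.0)]
```

Starting from `[0.0, 0.0]`, `gradient_descent(loss_grad, [0.0, 0.0])` ends up very close to `[3.0, -1.0]`. This loss has a single bowl, so the local minimum is also the global one; on a bumpier surface the same loop would stop in whichever bowl it rolled into first.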
If you were to compute the gradient numerically, that would work with a couple of dimensions, but if you have a model that has millions of parameters, you don’t want to compute the gradients numerically. So, having symbolic gradients is far better and that’s where differentiable programming is actually very helpful. So, you can either use a library like TensorFlow or PyTorch, which will build the graph of the computation for you, and do the symbolic differentiation of this computation graph. But you can imagine pushing that one step further, and just doing that at the compiler level. So, at this point, where you define a function that takes as input lots of float values and returns a float, you can imagine that automatically, you generate the gradient for this function, because as a compiler you know that this function is combining addition with some subtraction, and lots of other functions for which you might have already computed the gradient.
To maybe make this more concrete for people, the notion of symbolic differentiation is basically exactly the kind of differentiation you ran into in your calculus class. You know, there were a bunch of rules that you could apply, you write down some expression, and there are rules like the derivative of x squared is 2x and rules about multiplication and addition and composition. It turns out this is actually a very old insight about programming languages – so-called automatic differentiation is 30 years old – it showed up in lots of languages in the past, including ancient implementations in Fortran and Lisp, and it gives you a way of just saying, oh, let’s just take this idea and generalize it and apply it to programs, not just programs that represent simple expressions, but programs that do iteration and recursion and all sorts of crazy stuff.
Yes, and of course, there are lots of challenges along the way, because it’s easy to differentiate an addition and multiplication, and the composition of multiple functions, but you have functions that are far harder to differentiate. You mentioned, oh, I code my function recursively. Or I might have an if branch, things that are kind of discrete, rather than continuous, and then you have to decide about what is the proper differentiation for that. So, it turns out that modern machine learning based models combine functions that are all reasonably easy to differentiate. So, if you just write a model in a programming language that supports automatic differentiation, you will actually not need the library to compute the gradient for you, the compiler will return you the gradient because you only used operations that were easy to differentiate, and being able to compose that for pretty much all the functions in your program, that might let you optimize things that you didn’t think you would be able to optimize.
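To show how the calculus-class rules compose mechanically, here is a toy reverse-mode automatic differentiation sketch in Python. It records the computation graph at run time, which is closer in spirit to the library approach than to the compiler approach described above, and it is not how PyTorch, TensorFlow, or Swift implement it. The `Var` class and `backward` function are hypothetical names, and only addition and multiplication are supported.

```python
class Var:
    """A scalar that records how it was computed so gradients can flow back."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (parent Var, local derivative)
        self.grad = 0.0

    def __add__(self, other):
        # d(a + b)/da = 1 and d(a + b)/db = 1
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        # d(a * b)/da = b and d(a * b)/db = a
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))


def backward(output):
    """Accumulate d(output)/d(node) into `grad` for every node in the graph."""
    order, seen = [], set()

    def visit(node):
        # Depth-first walk that lists inputs before the nodes that use them.
        if id(node) not in seen:
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)

    visit(output)
    output.grad = 1.0
    # Walk from the output back to the inputs, applying the chain rule.
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += local * node.grad
```

For example, with `x = Var(3.0)` and `y = Var(4.0)`, building `f = x * x + x * y` and calling `backward(f)` leaves `x.grad` equal to `2x + y = 10` and `y.grad` equal to `x = 3`, which matches the derivatives worked out by hand.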
The work you’re talking about is trying to take this idea of automatic differentiation and apply it to Swift. Can you say a little bit more about what kind of language Swift is and why people looked at Swift as the language for doing this?
Swift is a programming language that originated from Apple as a replacement for Objective C, and the person that worked on it, Chris Lattner, was working at Google. He focused on what could we do with modern programming language theory in the machine learning world. And it turns out, he tried a few languages, and he tried what can I do with Swift? What can I do with Haskell? What can I do with Rust? What can I do with Julia? And it ended up being Swift that was selected to build a prototype, which was probably a good choice, because he was leading the project and was very familiar with it. So, Swift is a compiled language. It has some very nice, functional aspects. So, you have the equivalent of type classes, except that it’s called, I think, protocols, and you have sum types. It feels like a modern programming language.
The sense I get is a mash up between something like Objective C and something in the Haskell, OCaml, F# kind of world, right? I think the Objective C stuff was kind of just needed, fundamentally, to be compatible with the whole iOS world.
Yeah, I think that’s indeed the case. There is this, “I want to be able to be attractive to Objective C users and to interact with Objective C systems”, but besides that, it has lots of modern features and feels quite close, actually, to OCaml, when it comes to the type system. And yes, Swift is compiled, it compiles down via LLVM, and there is a project that people were looking at, and are still looking at, at Google, which is: let’s take some Swift code and rather than compiling it for CPU, let’s compile it for different hardware, let’s compile it for a GPU. So, we know that in this Swift code that I’ve written, I have some big matrix multiplication, addition and so on, and I don’t want to actually target a CPU for that, I want to target a GPU, or even these fancy TPUs that you can rent from Google online.
So, you want a compiler to actually extract the part of the code that will run on your CPU, extract the part of the code that can run on the GPU or on the TPU, and do the actual data transfer from some part of your system, like the main memory, to the GPU memory, or to the TPU memory. So, that’s actually a challenging bit, and the other aspect is having automatic differentiation. So, there you want, again, to be able to compute the gradients, and of course, you want to run that on the GPU or on the TPU, or whatever your hardware is. So, it’s up to the compiler to decide, and you just annotate a function and say, this function, let’s produce the backward pass for this function, roughly its gradient, and the compiler will happily create at compile time the gradient function, provided that all the functions that you use underneath, and all the constructions that you do, are compatible with that.
Is there a reason that Swift is a better language to try this in than Python?
Python is, again, very dynamic, and very stateful. Swift is closer, as we said before, to OCaml, and to something functional. So, you have less of a notion of state, you tend to have more pure functions, and computing derivatives, of course, would make more sense in a world of pure functions than it would if you start having some state. Also, the compiler has far more information about what the function does. It doesn’t mean that it’s not possible to do in Python. Actually, a framework like PyTorch lets you annotate Python functions with just-in-time information that tells the PyTorch framework to try to compile the function and compile its backward pass from the actual Python representation of the syntax tree.
One advantage that Swift also has, it’s not with respect to Python, but it’s with respect to OCaml, is that Swift actually uses reference counting rather than a GC, and that turns out to be actually pretty important. When you’re training a big machine learning model, one of the challenges is to free the memory as soon as possible. Your model is allocating huge matrices, or tensors in general – on the GPUs they can take multiple megabytes or even gigabytes, and your GPU only has, dunno, 24, perhaps 48 gigabytes of memory if you have a very, very fancy GPU – so, being able to release the memory as soon as possible is something very much worthwhile. And that also explains the success of Python when it comes to machine learning: the fact that it does reference counting helps with that. That also explains why Swift is actually a bit better than OCaml to do that kind of work at the moment. It also explains the same thing for Haskell. So, I heard that Haskell was getting linear types, and linear types is another way to collect the resources as soon as you can.
Yeah, that’s right. In fact, we’re looking at doing very similar things in OCaml. We’ve actually, shockingly enough, hired someone where a big part of their goal is trying to get the work on algebraic effects completed where, not to go into too much, but algebraic effects is another kind of type system level feature, which among other things will make it possible to build on top of it things that look more or less like management of resources, and in many ways lets you capture some of the same resource management that languages like Rust have. And I think this goes back to the basic fact that garbage collection is great for managing memory, and managing the one big resource of the shared memory that your whole program uses, but garbage collection is a terrible way of managing other kinds of resources. A classic example that you don’t need to think about machine learning to understand is file handles. Sometimes people will set up their programming language so that their files will get closed when the garbage collector gets rid of the files, but that’s terrible because it turns out, you have some shockingly low limit on the number of open file descriptors you can have, and your program will just crash because suddenly you can’t open files anymore, because your garbage collector didn’t feel like it had to work that hard to collect memory, and therefore wouldn’t collect this other completely different resource of file handles. So, for things like file handles, or GPU memory, or various other kinds of external resources, where they aren’t just another chunk of memory on the heap, you really want something else.
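The file-handle point can be illustrated in Python, where a `with` block releases the descriptor at a known point in the program instead of whenever a collector gets around to it. This is a minimal sketch under assumptions: the `write_many` and `file_demo` helpers and the file naming scheme are hypothetical, and it uses a temporary directory so that nothing is left behind.

```python
import os
import tempfile


def write_many(directory, count):
    """Write `count` small files, closing each one as soon as it is written."""
    paths = []
    for i in range(count):
        path = os.path.join(directory, f"part-{i}.txt")
        # The `with` block closes the descriptor on exit, so at most one
        # file is open at a time regardless of what the collector is doing.
        with open(path, "w") as handle:
            handle.write(str(i))
        paths.append(path)
    return paths


def file_demo(count=50):
    """Write some files, then check when the last handle is open or closed."""
    with tempfile.TemporaryDirectory() as directory:
        paths = write_many(directory, count)
        with open(paths[-1]) as handle:
            closed_inside = handle.closed
            last = handle.read()
        # After the block exits, the handle is closed deterministically.
        return len(paths), last, closed_inside, handle.closed
```

Calling `file_demo()` reports that the handle is open inside the block and closed right after it. The scoping construct does by convention what the type-system features discussed here would aim to check and enforce statically.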
Yeah. Yeah. I feel that it would be very amazing to have that kind of possibility in the OCaml world, at some point. Every scarce resource indeed, you don’t really want the GC to be accountable for it, and you want it to be grabbed as eagerly as possible.
So, I mentioned that I worked on some PyTorch bindings for both Rust and OCaml, and of course, the Rust one is very nice, because you get all this borrow-checker magic that will ensure that your resources, like your tensors, are released as soon as can be. Whereas, on the OCaml side, in most of the code that I’m writing that uses PyTorch, I force the GC to trigger on every training loop because I want the resources to be collected. So, being able to mention that for some variables you want them to be tracked in a more accurate way is something that would be fairly neat.
It’s obviously ongoing work, but I’m pretty excited about having OCaml being a system where, by default, you have a garbage collector, but that you can, in a focused way where you want to do precise resource management, have it checked and enforced at the level of the type system, and my hope is that that will end up being a system which gives you a lot of the things that are most attractive about Rust, but is all-in more ergonomic, because you don’t have to do the extra work of thinking about explicit tracking of the memory, except in the cases where you really need to do it. But obviously, this is like future stuff, things we’re hoping to get to, but it’s all vaporware at this point.
Having a world where the default is actually, you have reference counting, or you have a GC, but only for some resources you have actual proper tracking of the memory, that would seem like the ideal thing to me.
Yeah, and if I remember correctly, Rust actually started the other way around. The earliest versions of Rust, I think, did have a garbage collector in there, baked into the core system, and over time it got removed and garbage collection became a library that you could add on. One thing to say about all of this is, this is in some sense, cutting edge stuff, and there’s a big design space, and I think we as a community of people working on programming languages are just starting to explore it.
Yeah, and there are some ways around it in the language, but having it properly supported inside OCaml will be neat.
Well, thank you for joining me. That was a really fun and unexpectedly wide-ranging conversation.
Thank you for having me.
You can find links to some of the things that we talked about, including some of Laurent’s open source work, as well as a full transcript of the episode, along with a glossary, at signalsandthreads.com. Thanks for joining us, and see you next week.
Automatic differentiation: A set of techniques for numerically evaluating the derivative of a function specified in a program.
Binding: Or, language binding. An application programming interface that allows one programming language to use a library from another programming language.
Cycle: When two or more objects refer to each other in memory, they are said to form a cycle.
Duck typing: A form of type checking that uses the presence of certain methods on and properties of an object to determine its suitability for a particular purpose.
Dynamic typing: Or, dynamic type checking. The process of verifying a program's type safety at runtime.
Garbage collector: The principal actor in a form of automatic memory management.
GPU: Graphics Processing Unit, a special processor optimized for computer graphics.
Linear type: A type that ensures an object is used only once, meaning its memory can be safely freed after use.
LLVM: Low Level Virtual Machine, the compiler framework used by the Swift language (among other languages).
Matplotlib: A plotting library for Python.
NumPy: A mathematical library for Python supporting large matrices and high-level functions.
Pandas: A Python data manipulation library, focused on data structures for manipulating tables and time series.
Pure function: A function that will always return the same value for given arguments, and has no side effects.
PyTorch: An open-source Python library for machine learning based on the Torch library.
Reinforcement learning: A machine learning paradigm concerned with building models by looking at how an agent responds to rewards and state changes with every action.
Static typing: Or, static type checking. The process of verifying a program's type safety properties by analysis of the source code.
Strongly typed: A language can be said to be strongly typed if it has strict typing rules at compile time.
TensorFlow: An open-source Python library for dataflow and differentiable programming, primarily used for machine learning purposes.
TPU: Tensor Processing Unit, a special processor developed by Google for neural network machine learning, further optimized for use with TensorFlow.
Type safety: The extent to which a programming language prevents or discourages type errors.