19 Jul 2019 | Vijay Ivaturi, Chris Rackauckas, Viral B. Shah
This is a repost from an ISoP newsletter item
Pharmacometricians have been early adopters of mathematical programming tools. We use tools that allow us to solve real problems. However, given the diversity of problems that we solve, our field has seen a split in the languages that we use due to the problems they are tailored to solve. R has become a staple in the community due to its strong statistics, data manipulation and plotting libraries. MATLAB and its ODE solvers have been the basis for many QsP models. The NLME community is still very much rooted in more traditional tools like FORTRAN, while machine learning is leading some researchers to explore Python.
While having a Swiss Army knife helps, an alarming trend in this field is the language split between package developers and users. Pharmacometrics is a discipline where we need to know that the code we are running does what the mathematics says, and we need to validate everything. However, the libraries that we call are generally written in other languages and treated as undebuggable black boxes. If you're interested in adding a hook into R's deSolve ODE solvers for some new QsP model, or want to dig in and find out why Stan's Bayesian estimation is missing a peak, you will not find R code. Instead, you will find that the cores of these libraries are written in FORTRAN 77/90 and C++, and investigating what they do means working in opaque, non-interactive languages. This split exists because writing low-level code was traditionally the only route to high performance. Programmers who know how to work with SIMD vectorizers, stack allocation of volatile structs (and how they are sometimes miscompiled), and pointer tricks on mutable buffers compile a binary that the interactive language calls, so the details of the algorithm become intertwined with computer-science expertise. This is done because the performance difference is not small: one can easily see 10x-1000x speedups by using these "close to the metal" features to optimize the underlying loops.
While the efforts of these impressive programs and their programmers are laudable, this split between "languages used to write packages" and "languages used to do science" is not a sustainable way to build a fast, usable, and verifiable scientific computing ecosystem. This is the problem that Julia solves. Code written in Julia is as fast as C or Fortran while being dynamic and interactive like R and MATLAB. Julia achieves this by reengineering what it means to be a high-level language from the ground up, using a type-based multiple-dispatch system that cannot easily be retrofitted onto languages like R, Python, or MATLAB. This means that a scientific computing ecosystem has been built from scratch with Julia code all the way down, without losing performance. Three things are thus gained by using Julia for scientific computing: its user-developer connection, its diverse ecosystem which has grown quickly due to the ease of development, and the speed of its packages.
The user-developer connection is an invaluable asset when working in Julia. One could be just a researcher using packages developed by others, but can very easily, e.g., call @edit on a plotting function, have the relevant Julia file from the library open up, and add features directly. Within hours one goes from a user to a "package developer". Because the barrier between users and developers is so small, the developers are all present in the user communities. The support system on the Julia Slack and Discourse is therefore unparalleled: your questions are often answered by people who not only know how to use the package but also know its code. The number of people verifying obscure edge cases also increases when every user is one line away from inspecting code written in a language they know. This has allowed Julia to grow a very well-connected package ecosystem, because it's easy to make packages talk to each other when everyone is using the same underlying language: Julia.
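As a minimal sketch of that one-line jump from user to reader of library code (using `Statistics.mean` as a stand-in for any package function, so no plotting library is needed), the standard tools `@which`, `@edit`, and `functionloc` point straight at the source:

```julia
using Statistics
using InteractiveUtils  # provides @which and @edit outside the REPL

# Find which method handles this call...
m = @which mean([1, 2, 3])

# ...and exactly where its source lives on disk:
file, line = functionloc(m)
println(file, ":", line)   # a .jl file you can open and read

# In the REPL, `@edit mean([1, 2, 3])` opens that same file in your editor.
```

Because the source it lands on is ordinary Julia, editing it in place is no different from editing your own scripts.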
The connectedness and ease of development have allowed Julia to grow one of the most diverse and feature-rich package ecosystems. Julia packages are some of the most advanced in differential equations, statistics, and machine learning. One of the central packages for pharmacometrics in Julia is DifferentialEquations.jl. This software ecosystem is undergoing [continual advances](http://juliadiffeq.org/news/index.html) in the areas of parallelism (multithreaded, distributed, and GPU), improved solver strategies like IMEX and exponential integrators, and new mathematical modeling areas like delay differential equations (DDEs), stochastic differential equations (SDEs), and jump equations (Gillespie). All of the tricks, like adjoint sensitivity analysis for parameter estimation, mixing differential equations into neural networks, and automatic symbolic calculation of Jacobians, are supported by this one library, making it faster than the traditional libraries and easier to use for modern modeling practice.
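To give a flavor of the library (the model and parameter values below are illustrative, not taken from the article), a one-compartment PK model with first-order absorption takes only a few lines with DifferentialEquations.jl:

```julia
using DifferentialEquations

# One-compartment PK model with first-order absorption.
# States: u[1] = depot amount, u[2] = central amount.
function onecpt!(du, u, p, t)
    ka, CL, V = p
    du[1] = -ka * u[1]
    du[2] = ka * u[1] - (CL / V) * u[2]
end

u0 = [100.0, 0.0]        # 100 mg dose into the depot compartment
p  = (1.0, 5.0, 30.0)    # ka (1/h), CL (L/h), V (L) — hypothetical values
prob = ODEProblem(onecpt!, u0, (0.0, 24.0), p)
sol  = solve(prob, Tsit5())

conc_2h = sol(2.0)[2] / 30.0   # central concentration (mg/L) at t = 2 h
```

The solution object `sol` interpolates at arbitrary times, plots directly, and slots into estimation loops, and swapping `Tsit5()` for a stiff, SDE, or DDE solver changes one argument, not the model code.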
In the statistical libraries, packages like DataFrames.jl, Distributions.jl, and MixedModels.jl are strong counterparts to the R ecosystem. In fact, MixedModels.jl was developed by the author of lme4 (Douglas Bates), who now asks lme4 users to call MixedModels.jl from R via JuliaCall, since the Julia package has both better convergence properties and runs 30x faster. On the machine learning side, libraries like Flux.jl and MLJ.jl provide full-featured learning pipelines that can exploit GPUs and other acceleration opportunities.
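For a taste of the statistics stack (the distribution and its parameters are invented for illustration), simulating and summarizing per-subject clearances takes a few lines with Distributions.jl and DataFrames.jl:

```julia
using DataFrames, Distributions, Random, Statistics

Random.seed!(1)  # reproducible draws

# Draw 100 hypothetical subject-level clearances from a log-normal
cl_dist = LogNormal(log(5.0), 0.3)   # median 5 L/h, ~30% variability
df = DataFrame(id = 1:100, CL = rand(cl_dist, 100))

# Split-apply-combine summary, analogous to dplyr's summarise in R
summ = combine(df, :CL => mean => :mean_CL, :CL => std => :sd_CL)
```

Because `rand` accepts any `Distributions.jl` distribution, swapping the clearance model for another distribution changes one line, and the same data-frame verbs keep working.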
And the pivotal feature, in the end, is speed. The differential equation solvers have the most comprehensive set of benchmarks, found at DiffEqBenchmarks.jl, and their unique algorithms routinely beat classic C++ and Fortran methods. Users coming from R, Python, and MATLAB routinely report speedups averaging around 30x before optimizing the code. For example, this was recently demonstrated on a cardiac QsP model from Pfizer (26x), on a battery-efficiency differential-algebraic equation model at Carnegie Mellon (30x), and on the test suite of MATLABDiffEq.jl. In cases where there are no similar algorithms in other packages, like high-order adaptive stochastic differential equation integrators, the difference is even more dramatic, with a stochastic epithelial-mesenchymal transition (EMT) model besting the standard methods by ~3000x.
Speed itself is a feature that enables models. By being faster and easily parallelized, models can continue to grow in size and new domains can be explored. This is the central idea behind Pumas.jl, a soon-to-be-released pharmacometric modeling suite in Julia. Building on DifferentialEquations.jl, it handles the large ODE systems of PBPK and QsP models while also supporting nonlinear mixed effects (NLME) population models. The simulation methods can make use of the special GPU-accelerated high-order ODE/SDE/DAE/DDE solvers to let pharmacometricians explore entirely new models. This is coupled with parameter estimation through maximum likelihood and Bayesian methods (HMC and SAEM). A connected suite for bioequivalence and NCA integrates the whole simulation and estimation pipeline, and the connection to the existing Julia package ecosystem allows one to swap in global optimizers, automatic differentiation, and uncertainty quantification.
Julia is written for scientific computing, and its vibrant community includes mathematicians, statisticians, software engineers, and life-science majors. The pharmacometrics community has mostly revolved around clinicians, pharmacists, and life-science majors, and includes very few highly technical scientists from an engineering or mathematical background. Julia in pharmacometrics is an opportunity to make our community more inclusive and to bring innovation and efficiency that is otherwise left at the mercy of the developer-user split. We are hopeful.