## Part 1: Methods

There is no best method, but most methods end up in quite a similar area. Here's an old review which is still quite good, though some of the language and specific methods need to be updated. The idea is either to develop some loss function against your data or take a Bayesian route. Let's look at the two separately.

Loss functions are an optimization-based approach. You essentially choose properties about the data you want to match. For example, method of moments and generalized method of moments are simply the process:

- Solve
`N`

times
- Compare the difference of some average quantity of your Monte Carlo solution against the data. Make the difference your scalar loss function $L(y)$ where $y$ is your numerical solution.
- Use whatever optimization algorithm to find the $y$ that minimizes the function $L(y)$

Then you mix and match. Use whatever SDE discretization method you like with your chosen loss function and local/global optimization algorithm, and that's a potential parameter estimation method. The simplest is Euler-Maruyama with $L_2$ loss on the expected value:

$$ L(y) = \sum_i \Vert E[y(t_i)] - E[d_i] \Vert $$

for discrete data points $d_i$ at time $t_i$, and then throw that into an optimization package. Inside of that is a parameter $N$ for how many times you need to solve the SDE, and the higher $N$ is the more accurate your expected value is. Doing this on the means is generally referred to as the method of moments approach, where you could also do some $E[h(y(t_i))]$ which is then generalized method of moments.

Very much related to method of moments approaches is likelihood-based approaches. In this case, you assume that your data goes according to some distribution, solve your SDE $N$ times, compare the two distributions, and then use the distance of the distributions (KL-divergence, etc.) or the likelihood to calculate the fit. In this case you get another $L(y)$ which you put into an optimizer. To do this, you have to have some pre-specified likelihood as your input data, and usually this is done by assuming that the data has a distribution and performing a maximum-likelihood fit. For example, you can assume that your data is normally distributed at each $t_i$, then find the mean and variance (those are the sufficient statistics for MLE of the normal distribution), and so then you assume that at time $t_i$ you must have $N(\mu_i,\sigma_i)$, and calculate the loss against this. This approach can make it much easier to incorporate distributional assumptions about the errors, and allows correlations to be taken into account by using multivariate distributions. In some sense, it's a super-set of the previous approach. For ODEs, minimizing the $L_2$-loss is equivalent to maximizing the log-liklihood under the assumption of constant $\sigma$. While I don't know of a result for SDEs, I assume there's something similar here.

Let me make a quick remark that you don't have to sample $N$ solutions. Instead, you can sample from the solution's distribution directly using "exact" methods (see this link or by solving PDEs for the distribution directly, though these methods can have trouble scaling to high-dimensional problems. But yes, mix and match these techniques with choices for loss functions and optimizers and it covers a large amount of what people have done.

The other approach is Bayesian. In this case, you again need to assume some likelihood on your data, but now you assume some prior distribution on your parameters, and then use an MCMC technique where you

- Sample parameters from the prior
- Calculate the likelihood for those parameters (solving the equations $N$ times, etc)
- Use the MCMC rules to choose whether to reject the sample
- Choose new parameters to step to, and repeat.

Just like with ODEs, this process is much more expensive but instead of getting a point estimate out you get posterior distributions for your possible parameters.

While there are quite a few published methods, when you dig into the details it's usually some variant of how to make the choices for the steps in these overarching ideas.

## Part 2: Toolboxes

I do not know of a toolbox in MATLAB or Python for this. Using what I described above you can put together some SDE solver toolbox, such as @horchler's or JiTCSDE, with an optimization routine. However, there are much more developed tools in Julia (including a more substantial selection of SDE solvers, and the built-in parameter estimation tooling from DifferentialEquations.jl currently handles the optimization-based approaches (here's an example), with the Bayesian approaches slated as coming soon.