fix: curve fit nonscalar initial guess and bound #3452
src/scipp/curve_fit.py
        b[1].to(unit=unit, dtype=float).value,
    )
    return b
le, ri = (v if isinstance(v, Variable) else scalar(v) for v in b)
This is odd. Previously we required that either both bounds are variables or both are numbers. Now they can be mixed. Normally, we don't allow this. See, e.g., linspace and arange which can take either variables or numbers, but never a mixture.
I think curve_fit should not deviate from this.
What's more, scipp.scipy.optimize.curve_fit does not allow this. It will only lead to confusion when the two functions deviate from each other.
I agree it's better to have them both be scalars or both be variables.
There's also a bug here: if inf is passed as a bound, you will get a unit error unless the associated parameter is dimensionless.
I wouldn't call this a bug. This is consistent with how we treat raw numbers elsewhere. They are treated as dimensionless.
Yeah, you're right.
def _as_scalar(obj, unit):
    if unit == default_unit:
        return obj
    return scalar(value=obj, unit=unit)
Related to the discussion about mixed variable / number bounds: Deviating in the behaviour of params from scipp.scipy.optimize.curve_fit will be confusing.
Also, this is potentially in conflict with the signature of the fit model. Given a case like this
def model(x: sc.Variable, a: sc.Variable, b: float) -> sc.Variable: ...
curve_fit(['x'], model, da, p0={'a': sc.scalar(1.0), 'b': 2.0})
curve_fit will call this with b = sc.scalar(2.0). This violates the expectations of the author of model. And it effectively limits the signatures of fit models to only allow Scipp variables as arguments.
It is possible to make that limitation. But then it should be clearly documented that fit models are only allowed to take variables as arguments. And again, this should then also be done for scipp.scipy.optimize.curve_fit.
The main argument for always converting parameters and bounds to variables is that it reduces complexity: with only scipp variables as parameters and bounds, there are fewer combinations of argument types that we need to test for.
I don't really see the benefit of allowing non-scipp parameter values. Do you have a situation in mind where that would be useful?
If the motivation is that we want to avoid having to wrap the fit function, I think that's not very useful, because
1. wrapping a function is very easy to do in python, and
2. wrapping will have to be done in most situations anyway, because only rarely will your coordinate names and parameter names match the argument names of the function you are calling.
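Point 2 can be illustrated with a small example (the names `gaussian`, `tof`, `a`, `mu`, `sigma` are made up for illustration, not taken from the PR):

```python
import math


def gaussian(x, amplitude, center, width):
    """Existing function with its own argument names."""
    return amplitude * math.exp(-(((x - center) / width) ** 2))


# Thin wrapper mapping the fit's coordinate name ('tof') and parameter
# names ('a', 'mu', 'sigma') onto the existing function's arguments.
def model(tof, a, mu, sigma):
    return gaussian(tof, amplitude=a, center=mu, width=sigma)
```

It is `model`, not `gaussian`, that would be passed to the fitting call, which is the kind of trivial wrapping the comment refers to.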
Agreed about the documentation.
About scipp.scipy.optimize.curve_fit: maybe we should consider removing it to avoid having to keep the two in sync. We wouldn't even have to remove it from the API; it should be possible to implement scipp.scipy.optimize.curve_fit in terms of scipp.curve_fit.
I mainly brought this up to make sure that you considered this issue and that the interface is clearly defined. I don't know of a concrete use case. Fit parameters will always be floats, so wrapping them in a variable is always possible.
Concerning removing scipp.scipy.optimize.curve_fit, I'd be very careful because
- Its interface is simpler and sufficient in most applications.
- You need to make sure that curve_fit is not significantly slower. Bear in mind that applications like vanadium peak removal need to invoke the fit function many times. And scipp.scipy.optimize.curve_fit is already almost 2x slower than using scipy directly.
> And scipp.scipy.optimize.curve_fit is already almost 2x slower than using scipy directly.
Yes, performance is an issue for small fits.
The overhead for a very small 1d fit is (on my computer):
* 920us for scipp.curve_fit
* 510us for scipp.optimize.curve_fit
* 230us for scipy.optimize.curve_fit
import scipp as sc
from scipp.scipy import optimize
import numpy as np
import scipy as sp
t = sc.linspace('t', 0, 10, 10)
da = sc.DataArray(t**2 + t + 1, coords=dict(t=t))
%timeit sc.curve_fit(['t'], lambda t, a, b: t**2 + a*t + b, da)
%timeit optimize.curve_fit(lambda t, *, a, b: t**2 + a*t + b, da)
%timeit sp.optimize.curve_fit(lambda t, a, b: t**2 + a*t + b, t.values, da.data.values)
> Its interface is simpler and sufficient in most applications.
I think it is marginally simpler for 1d cases, since you don't have to pass the name of the coordinate that is used in the fit. But I don't think it is so much simpler that that alone is a good argument for keeping it.
I think having two functions with the same name doing similar things is a bigger source of confusion.
But I agree that the performance difference for small fits might on its own be a sufficient argument for keeping it.
Is the peak finding a significant portion of the runtime in the diffraction workflow?
> Is the peak finding a significant portion of the runtime in the diffraction workflow?
Probably not. The current plan is to run it offline and store the results. It was meant as an example to remind ourselves that the overhead of calling the fit can be relevant.
I'd argue that a rare 2x slowdown is not worth complicating things over.
(Rare in the sense "rarely the factor determining total runtime")
Co-authored-by: Jan-Lukas Wynen <jan-lukas.wynen@ess.eu>
Fixes #3437