Skip to content

Separate optimization from statistic#1914

Closed
nden wants to merge 5 commits into
astropy:masterfrom
nden:separate-optimization-from-statistic
Closed

Separate optimization from statistic#1914
nden wants to merge 5 commits into
astropy:masterfrom
nden:separate-optimization-from-statistic

Conversation

@nden

@nden nden commented Dec 18, 2013

Copy link
Copy Markdown
Contributor

This PR separates optimization from statistic and allows creating fitters by combining optimizers and statistic functions.

Because of the many changes It's probably best to review this by looking at the entire files (fitting.py, optimizers.py, statistic.py) and not the diffs in the PR.

I am proposing that the name of the fitter is constructed from the names of the optimizer and the statistic, so some existing fitters have been renamed:

SLSQPFitter --> SLSQPLSQFitter
NonLinearLSQFitter --> LevMarLSQFitter

Statistics are implemented as functions (but can be any callable) and are stored currently in ~astropy.modeling.statistic.py. Currently there are two leastsquare and (following @cdeil's example in a different PR) cash. The input to the statistic functions is func(measured_values, evaluated_values, weights).

Optimization methods are stored in ~astropy.modeling.optimizers.py. They are implemented as classes in order to store additional information about the optimization properties but can be any callable.

Dealing with model's constraints has been moved to optimizers. Some methods of the old Fitter class were made functions.

I still need to add tests for the new fitter and possibly write a custom_fitter. Other than that I have pretty much completed what I had in mind and would appreciate comments on overall design, statistic implementation, names and any other improvements.

@cdeil

cdeil commented Dec 18, 2013

Copy link
Copy Markdown
Member

Cool!

I can have a look in a few days, just one comment about this:

The input to the statistic functions is func(measured_values, evaluated_values, weights)

Is it possible to have these inputs to the fit statistic more flexible?

One of the fit statistics I'd like to implement in astropy that is common in X-ray and gamma-ray astronomy is wstat:
http://heasarc.nasa.gov/xanadu/xspec/manual/XSappendixStatistics.html

It doesn't take "weights" as input, like chi^2 statistics do, but instead a few other quantities.

Here's an example where @registerrier implemented wstat as a user-defined statistic in Sherpa:
https://gist.github.com/cdeil/8026184

It would be nice if the astropy.modeling API for fit statistics were more general so that such hacks are not needed to implement more complex fit statistics like wstat.

If this is already the case, apologies.
I'll take a closer look at the API that you propose here and will try implement wstat as an example in a few days.

Thanks for doing this!

@nden

nden commented Dec 18, 2013

Copy link
Copy Markdown
Contributor Author

The only hard requirement is that statistic is callable.
However, if it is in the above form, fitters can use the base class Fitter.errorfunc. Otherwise they have to implement their own errorfunc to match the interface of the statistic.

This can probably be improved but I am putting it out for discussion. It'd be great to have more examples, looking forward to the wstat example.

@nden

nden commented Dec 30, 2013

Copy link
Copy Markdown
Contributor Author

The current Travis failure is an example in whatsnew/0.3.rst, failing due to a name change. The proposed name change, if accepted, is going to be in 0.4 (NonLinearLSQFitter -- > LevMarLSQFitter) while the failing example is for 0.3.

Is there a way to skip a test based on the version of astropy?
Or perhaps this should be skipped entirely.

@cdeil

cdeil commented Feb 19, 2014

Copy link
Copy Markdown
Member

@nden I don't know how to get rid of the travis-ci test failure you mentioned in the what's new example. Maybe @astrofrog knows or you can ask on the astropy-dev list ... this will probably be a common problem as astropy ages...

@nden

nden commented Feb 19, 2014

Copy link
Copy Markdown
Contributor Author

@cdeil I just pushed an update which I hope will make it easier to add a wstat statistic. Can you confirm this? I need to work some momre on this but wanted to give you an idea.

@cdeil

cdeil commented Feb 19, 2014

Copy link
Copy Markdown
Member

I'll try this out now ... maybe @astrofrog and / or @embray, you can have a look, too?
This is a major improvement that has been lying around for two months and it should go into master soon so that it gets some real-world testing before the 0.4 release.

@cdeil

cdeil commented Feb 19, 2014

Copy link
Copy Markdown
Member

@nden Can you please add an entry to the CHANGES document?
(I think that's what we should do for modeling for 0.4 ... note major changes, but not start adding deprecation warnings)

Comment thread astropy/modeling/statistic.py Outdated

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add chi_line to __all__?

@nden

nden commented Feb 19, 2014

Copy link
Copy Markdown
Contributor Author

It's not quite ready for review. I just want to make sure it's flexible enough to add other statistic functions and optimizers.

I don't know if there's much sense in working on new features until #2049 is resolved since everything in modeling depends on how parameters are handled but @embray is working on it.

@cdeil

cdeil commented Feb 19, 2014

Copy link
Copy Markdown
Member

The fit statistics functions (leastsquare, cash and chi_line) need docstrings.

I have re-implemented all build-in fit statistics from Sherpa in gammapy.stats.fit_statistics, see e.g. gammapy.stats.fit_statistics.cash for an example.

Some points for discussion about the fit statistics functions:

  • Should fit statistics return the sum over bins or the per-bin fit statistic? (Sherpa returns both I think)
  • Do we need the chi_line fit statistic in addition to the leastsquare fit statistic? (see the various chi_square fit statistics in Sherpa / gammapy)
  • Can we avoid having special fit statistics for 1-dim, 2-dim, ... data? (If yes, there should be no fit statistic with name *_line)

@cdeil

cdeil commented Feb 19, 2014

Copy link
Copy Markdown
Member

@nden I wasn't aware that #2049 is the bottleneck at the moment ... in this case I'll continue looking at this later.

@cdeil

cdeil commented Feb 19, 2014

Copy link
Copy Markdown
Member

Just one more comment ... I suggest we rename Fitter.errorfunc to Fitter.fit_statistic or Fitter.objective_function, which are more general terms ... I think errorfunc is least-squares-specific terminology and I find it confusing for the general Fitter API.

@nden

nden commented Feb 19, 2014

Copy link
Copy Markdown
Contributor Author

All good questions.
I would prefer not to have special statistics but it's not clear to me that chi_line can be avoided and it's a useful case. I hope someone can prove me wrong.

@nden

nden commented Feb 19, 2014

Copy link
Copy Markdown
Contributor Author

I am in favor of objective_function since this is what's generally used. fit_statistic is easily confused with the statistic functions used within the objective function.

Comment thread astropy/modeling/statistic.py Outdated

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really clear on what chi_line does.
I presume it's one method to take errors on x into account when fitting a 1-dim model to (x, y) data points.

Some reference I'm aware of:
http://adsabs.harvard.edu/abs/2010arXiv1008.4686H
http://root.cern.ch/root/html/TGraph.html#TGraph:Chisquare

Please add a reference and a test case with a few points for this fit statistic.

Also I would suggest to rename it ... to me "line" sounds like straight line, whereas I think this can handle non-linear functions f(x)?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a specific fitter which fits a straight line with uncertainties in x and y. I'm not sure how to generalize this (except may be in the case of a plane, hence z_sigma in the code).
I'll check the references you mentioned.

@nden

nden commented Feb 25, 2014

Copy link
Copy Markdown
Contributor Author

@cdeil After mulling over this for a while, I decided it's best to limit this PR to only separating statistics from optimizers and add additional statistics in separate PRs.
Outside this PR I tested that it works with other statistic types so I think it's ready for others to try to add their favourite statistic. I am marking it as ready for final review. There are name changes in this PR so I will add entries in CHANGES.rst before merging after everyone had a chance to change the names again.

@nden

nden commented Mar 1, 2014

Copy link
Copy Markdown
Contributor Author

This got stuck for some reason. There are so many changes in upstream/master that even rebasing or manually merging upstream locally does not produce diffs that are easy to review. So I prepared a new branch based on the current trunk. I am wondering if I replace the current branch with the new branch with "git branch -M ..." would Travis pick up the new branch? Or is it better to make a separate PR?

Comment thread astropy/modeling/fitting.py Outdated

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not an astropy doc rule, but I would suggest to use ~ only for astropy-internal references
and to leave it off for astropy-external references like the links to the numpy and scipy docs here.

IMO this makes it simpler to know what's what when reading the docs and means less surprises when clicking links.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, the text does say Numpy's lstsq function, so maybe it's ok?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, it's OK! This is a very minor point and probably just my personal preference when reading the astropy html docs. If @nden prefers how it's written now she's very welcome to ignore this comment.

Comment thread docs/modeling/new.rst

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not use import * in examples.

@nden

nden commented Mar 3, 2014

Copy link
Copy Markdown
Contributor Author

Ah, I see. The goal was to make it possible to write a statistic without having to modify the objective function, defined in the fitter.

Adding fitter.statistic and fitter.optimizer properties will allow changing them after a fitter is initialized, for example:

fitter = fitting.SimplexLSQFitter()
fitter.statistic = cash

This is easy to add.

It may be good also to add a function (custom_fitter) which takes an optimizer and a statistic and returns a fitter. This will take some more work.

@nden

nden commented Mar 3, 2014

Copy link
Copy Markdown
Contributor Author

@taldcroft Another option is to pass the statistic to fitter.__call__ . I prefer the properties though. Do you see any reasons this would be better than having properties?

@astrofrog

Copy link
Copy Markdown
Member

@nden @taldcroft @cdeil - just to check, what is the current status of this? I think we had planned to re-apply the changes rather than trying to rebase? If so, would it make sense to do this this week so we can merge it (or have it ready for review) before the feature freeze?

@nden

nden commented Apr 30, 2014

Copy link
Copy Markdown
Contributor Author

My understanding is that this, as well as #2149 and #1985 should be closed as they are part of the refactoring of modeling that @embray is doing.

@embray

embray commented Apr 30, 2014

Copy link
Copy Markdown
Member

I would like to keep all of these open as reference--all of them address issues that should be tracked regardless of whether any attached implementations are used as-is.

@embray

embray commented May 1, 2014

Copy link
Copy Markdown
Member

Actually, were there still any major lingering issues with this? I know there were some minor niggles but they could be addressed later. In the interest of getting some of this functionality sooner rather than later I can rebase this to merge into the current master if people thing that would be better.

@cdeil

cdeil commented May 1, 2014

Copy link
Copy Markdown
Member

I've completely lost track with the many open astropy.modeling issues and pull requests.
Separating optimisation from fit statistic is in my opinion a big step in the right direction.
So 👍 to merge some of them into master now and then re-evaluate if everything works well or some things need to be adapted.

@nden

nden commented May 1, 2014

Copy link
Copy Markdown
Contributor Author

Something that needs to be added here is a simple way for a user to change the statistic function (See Tom's comments above). I did some work in this direction but it wasn't completed.

@embray

embray commented May 1, 2014

Copy link
Copy Markdown
Member

Okay, I've rebased this on master in a local branch of my own and ensured all the tests are passing on all supported Python versions. I think I'd rather go ahead and merge this as-is, and then build any additional enhancements on top of what's in there.

The only thing this is lacking is a changelog entry, and perhaps also warrants a mention in the 0.4 what's new page.

@embray

embray commented May 1, 2014

Copy link
Copy Markdown
Member

Unless you don't want to advertise it too broadly if the feature is considered "incomplete". But there are still API changes that deserve mention.

@astrofrog

Copy link
Copy Markdown
Member

I think we can work on the 'what's new' after feature freeze, so I'd just add the CHANGES.rst entry for now, and we can worry about the 'what's new' after.

@embray

embray commented May 2, 2014

Copy link
Copy Markdown
Member

Okay...I just worry about things slipping through the cracks with that. I wasn't really following if people were writing that or not as is.

@astrofrog astrofrog mentioned this pull request May 2, 2014
2 tasks
@astrofrog

Copy link
Copy Markdown
Member

@embray - I just created #2416. Note that if you want to write it now, you are very welcome to! But if there are enough other things to worry about today for the feature freeze, this can be delayed.

@embray

embray commented May 2, 2014

Copy link
Copy Markdown
Member

Okay thanks, sounds good. Will have to go through and find out what else needs to be in there. Need to ping other devs too.

@astrofrog

Copy link
Copy Markdown
Member

Will ping some devs now on the gitter channel (since a few are hanging out there)

@eteq

eteq commented May 26, 2014

Copy link
Copy Markdown
Member

@nden @astrofrog @embray - what's the story with this? Can it be rebased and merged for 0.4? Or do we need to push it off?

@astrofrog

Copy link
Copy Markdown
Member

Closed by #2441

@astrofrog astrofrog closed this Jun 10, 2014
@embray embray removed this from the v0.4.0 milestone Jun 10, 2014
@nden nden deleted the separate-optimization-from-statistic branch April 30, 2018 12:17
@nden nden restored the separate-optimization-from-statistic branch April 30, 2018 12:17
@nden nden deleted the separate-optimization-from-statistic branch January 22, 2026 10:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants