A metric such as r2_score requires comparing the distributions of the true and predicted values, so it is not well-defined for a single sample: the total sum of squares of the true values is zero. With a single sample it currently returns either 1 for equality or 0 for inequality (which is already a bit weird given that it's a metric over continuous variables).
Instead, we should return NaN, issue a warning, or raise an error. We should also identify other metrics where this behaviour would be appropriate.
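
To make the proposal concrete, here is a rough sketch of the "return NaN and warn" option, written with plain NumPy. `r2_or_nan` is a hypothetical helper (not an existing function in the library), and `UserWarning` is only a placeholder for whatever warning class we would settle on.

```python
import math
import warnings

import numpy as np


def r2_or_nan(y_true, y_pred):
    """Hypothetical helper: R^2 that returns NaN for fewer than two samples."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)

    if y_true.shape[0] < 2:
        # With one sample, y_true equals its own mean, so the total sum of
        # squares is zero and R^2 = 1 - ss_res / ss_tot is undefined.
        warnings.warn(
            "R^2 is not well-defined with fewer than two samples.",
            UserWarning,  # placeholder warning class (assumption)
        )
        return float("nan")

    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    # Note: a constant y_true with two or more samples also gives ss_tot == 0;
    # that is a separate degenerate case and is not handled in this sketch.
    return float(1.0 - ss_res / ss_tot)


# The degenerate denominator for a single sample:
y = np.array([3.0])
print(np.sum((y - y.mean()) ** 2))  # total sum of squares is 0.0
```

Raising an error instead would only mean swapping the `warnings.warn` call for a `raise ValueError(...)`; the check on the number of samples stays the same.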
Do others think this is the correct course of action?