The decision_function and predict_proba outputs of a multi-label classifier (e.g. OneVsRestClassifier) are 2d arrays where each column corresponds to a label and each row corresponds to a sample. (added in 0.14?)
The decision_function and predict_proba outputs of a multi-output multi-class classifier (e.g. RandomForestClassifier) are a list of length equal to the number of outputs, where each element is a multi-class decision_function or predict_proba output (a 2d array where each row corresponds to a sample and each column corresponds to a class).
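To make the two formats concrete, here is a minimal sketch that fits both kinds of estimator on the same toy binary label-indicator target and inspects what predict_proba returns. The toy data, the choice of LogisticRegression as the base estimator, and the variable names are assumptions made for illustration only.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Toy data (illustrative assumption): 40 samples, 5 features, 3 binary labels.
rng = np.random.RandomState(0)
X = rng.randn(40, 5)
# Deterministic label-indicator matrix so every label has both classes.
Y = np.array([[i % 2, (i // 2) % 2, (i // 4) % 2] for i in range(40)])

# Multi-label format: a single 2d array of shape (n_samples, n_labels).
ovr = OneVsRestClassifier(LogisticRegression()).fit(X, Y)
ovr_proba = ovr.predict_proba(X)
print(type(ovr_proba).__name__, ovr_proba.shape)

# Multi-output format: a list with one (n_samples, n_classes) array per output.
rf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, Y)
rf_proba = rf.predict_proba(X)
print(type(rf_proba).__name__, len(rf_proba), rf_proba[0].shape)
```

Both estimators were trained on the same Y, yet one returns a single array with one column per label and the other returns a list with one two-column array per output.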
So a multi-output problem with only binary outputs is a multi-label task, but its output isn't consistent with the multi-label format...
This is problematic if you want to write a roc_auc_score function that supports multi-label output.
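One possible workaround, sketched here rather than proposed as the fix, is to convert the list format into the 2d multi-label format before scoring. The helper name `to_multilabel_scores` is hypothetical, the probabilities are made-up toy values, and the sketch assumes that every output is binary with classes ordered as [0, 1] (so column 1 is the positive class) and that the installed roc_auc_score accepts label-indicator input.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def to_multilabel_scores(proba_list):
    """Hypothetical helper: stack the positive-class column of each output.

    Assumes each element is a (n_samples, 2) array whose second column
    holds the probability of the positive class.
    """
    return np.column_stack([p[:, 1] for p in proba_list])


# Toy ground truth in label-indicator format: 4 samples, 2 labels.
y_true = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])

# Toy multi-output predict_proba result: one (n_samples, 2) array per output.
proba_list = [
    np.array([[0.2, 0.8], [0.7, 0.3], [0.4, 0.6], [0.9, 0.1]]),
    np.array([[0.6, 0.4], [0.1, 0.9], [0.7, 0.3], [0.8, 0.2]]),
]

scores = to_multilabel_scores(proba_list)  # shape (n_samples, n_labels)
auc = roc_auc_score(y_true, scores, average="macro")
print(scores.shape, auc)
```

The conversion only makes sense when every output is binary; a genuinely multi-class output has no single positive-class column to extract, which is why the two formats cannot simply be merged.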