The labels parameter to precision_recall_fscore_support is described in the docstring as "array of labels". Things that are unclear to me:
- It is only tested with permutations of the range of labels. In that case it only matters for average=None (it reorders the per-label results), and that should be stated.
- May it be used to limit the set of labels used? If so, this affects the other averages as well, and it really must be tested for all of them and documented.
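
Both questions can be probed with a small script along these lines. This is only a sketch: the toy data is made up, and it assumes a scikit-learn version in which labels may be a subset of the labels present in the data. Other versions may behave differently, which is rather the point of the issue.

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

# Made-up three-class data, chosen so every label gets a different score.
y_true = np.array([0, 0, 1, 1, 2, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 2, 0])

# Question 1: labels as a permutation of the full label range.
p_default, _, _, _ = precision_recall_fscore_support(
    y_true, y_pred, labels=[0, 1, 2], average=None)
p_perm, _, _, _ = precision_recall_fscore_support(
    y_true, y_pred, labels=[2, 0, 1], average=None)
print("per-label precision, labels=[0, 1, 2]:", p_default)
print("per-label precision, labels=[2, 0, 1]:", p_perm)

# The same permutation combined with an averaged score.
p_macro_all, _, _, _ = precision_recall_fscore_support(
    y_true, y_pred, labels=[0, 1, 2], average="macro")
p_macro_perm, _, _, _ = precision_recall_fscore_support(
    y_true, y_pred, labels=[2, 0, 1], average="macro")
print("macro precision, all labels vs permuted:", p_macro_all, p_macro_perm)

# Question 2: labels as a strict subset (assumption: this is accepted).
p_macro_subset, _, _, _ = precision_recall_fscore_support(
    y_true, y_pred, labels=[1, 2], average="macro")
print("macro precision, labels=[1, 2]:", p_macro_subset)
```

Comparing the printed values shows whether a permutation only reorders the average=None output and whether a subset changes the averaged scores. Tests covering the remaining average options could follow the same pattern.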