Skip to content

Conversation

@iofall
Copy link
Contributor

@iofall iofall commented Feb 22, 2022

Fixes #936

This PR adds the sphinxcontrib-bibtex extension which allows us to use citations from a .bib file.

Usage

  1. Check if your required reference already exists in the docs/refs.bib file. If not add your bibtex snippet to the refs.bib file. To keep the .bib file organized, please ensure that you do not introduce duplicate entries.
  2. The convention for citation keys should be <FirstAuthorSurname><Year><PaperKeyword>.
  3. After you have added to the refs.bib file, you can reference your citation using the specified citation key using the syntax:
:footcite:`citation-key` 

The above citation will be rendered as [Number].
4. To add the bibliography, you need to use :footbibliography: directive at the bottom of your file. This will list all the citations for the current document automatically!

Assumptions

  • I used :footcite: instead of the normal :cite: because of the way references are resolved.
  • I have currently used the <FirstAuthorSurname><Year><PaperKeyword> convention for citation keys.

Caveats

  • If you see multiple URLs, change your bibtex to only show one URL. In case of arXiv references, the library somehow creates URLs for the arXiv paper and shows both the generated URL as well as the URL under the URL parameter in .bib file. To avoid this, either only specify a URL parameter or eprint along with eprinttype.

To Do

  • Change references throughout the documentation to the new format. This could be a good sprint issue especially for beginners.
  • Change the version check function. See the comments below.

Look out for

  • While testing using sphinx multiversion build, remember to whitelist the PR branch.
  • For now only mitigations.rst has the new citations.

Example Output

  • Before

  • After

docs/conf.py Outdated
Comment on lines 251 to 259
def check_if_v071dev0():
"""Check to see if current version being built is v0.7.1.dev0"""
result = False

if fairlearn.__version__ == "0.7.1.dev0":
print("Detected 0.7.1.dev0 in fairlearn.__version__")
result = True

return result
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before the PR gets merged, we will have to change this version check. I was thinking of using a greater than comparison here. I did look into this and realized we might have to use the packaging library as suggested in Stackoverflow. However I am not sure if we want to introduce another dependency?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since this is a doc build dependency, I don't mind adding it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I later realized that packaging is already used by setuptools so most people should already have it installed. But to be explicit, I have added it to the requirements-dev.txt.

@riedgar-ms
Copy link
Member

Thank you for starting to look at this! The versioning of the docs is something we've talked on and off about for a while; this might help animate those discussions.

@riedgar-ms riedgar-ms requested a review from a team February 23, 2022 12:35
Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to use packaging for the version check, and it LGTM otherwise.

@iofall iofall requested a review from adrinjalali March 2, 2022 17:03
Copy link
Member

@romanlutz romanlutz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for getting the bibtex conversion started! This is great!!!

Make sure to add a line to the changelog (part of docs/user_guide, there's a file somewhere for each version, so just add to the latest one).

docs/refs.bib Outdated
@@ -0,0 +1,51 @@
Mitigations.rst
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm in favor of this, but the comments in this file don't add anything from my point of view. In fact, these papers are cited all over the place rather than just in one file. Keeping such a mapping in sync sounds tricky and I don't really see the value it would add since you can just search for the identifier to find where it's cited. Again, just the comments (including paper comments below), everything else is great! I assume you just copied from Google Scholar or somewhere similar?

Copy link
Member

@MiroDudik MiroDudik Mar 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed that we often cite papers across several sections, so organizing the bibtex by the section of our docs might not work (perfectly). I'm not sure what a good alternative would be though. I'm okay going with this proposal but ensuring that we don't have duplicates.

The bibtex entries below look like a merge of DBLP versions, which I think is sensible. We shouldn't use Googe Scholar as a source for citations. The main decision is perhaps whether we are okay with the citation ids of the form

<FirstAuthorSurname><Year><PaperKeyword>

I'm fine with that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keeping such a mapping in sync sounds tricky and I don't really see the value it would add since you can just search for the identifier to find where it's cited.

Ok that makes sense, I think the step 1 should cover the duplicates, no need for comments. If someone is adding the a new entry, they should check if an entry already exists. I'll modify the PR and the description.

  1. Check if your required reference already exists in the docs/refs.bib file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume you just copied from Google Scholar or somewhere similar?

I used the bibtex entries available through arXiv and modified some of the entries to showcase how different formats look.

The bibtex entries below look like a merge of DBLP versions, which I think is sensible. We shouldn't use Googe Scholar as a source for citations. The main decision is perhaps whether we are okay with the citation ids of the form

Thank you for the reference, I did not know that about Google Scholar. I used the ones available through arXiv, they link to the DBLP hosted records. For example - arXiv Page Fairlearn and its DBLP Bibtex. However, they always link back to the arXiv page for the paper.

Takeaway: It might be helpful to establish where we source our bibtex snippets from.

~~~~~~~~~~~

.. topic:: References:
.. _references:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this smart enough to include only the cited ones on this page? It will be important moving forward when there are perhaps dozens of citations across the website, but only say, a handful, on this particular page.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mirrored what we already had in the documentation. Right now the structure is - Each page like mitigation.rst will only list citations that are used on this page.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds great. I'm pretty sure that's what @romanlutz had in mind too.


# Setup for sphinx-bibtex

# Only use sphinx-bibtex if version is above 0.7
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not used in other versions, so it should just be fine(?) or am I missing something?

Oh or is this about the refs file not existing?

Just trying to make sure I get it 🙂

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are right, it will be fine. However, sphinx-multiversion uses the latest conf.py and for other versions we don't have a refs.bib file which generates a warning. Although it will not stop the documentation build, this can worry some new users, so I made it explicit to use sphinx-bibtex only when the required version is available.

image
It is really minor but it looks something like this.

Copy link
Member

@MiroDudik MiroDudik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just checked the specific bib entries that are being used.

docs/refs.bib Outdated
Mitigations.rst
Agarwal, Beygelzimer, Dudik, Langford, Wallach “A Reductions Approach to Fair Classification”, ICML, 2018.
@article{agarwal2018reductions,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should be using something more similar to this: https://dblp.uni-trier.de/rec/conf/icml/AgarwalBD0W18.html?view=bibtex&param=0
(so @inproceedings rather than @article)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it I will change to @inproceedings, do you mean to use the condensed format? It does not have URLs so the references will not link back to the papers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reference, the citations will look like this:

Standard

Condensed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that having a URL is definitely nice to have, but the full standard citation seems very verbose. I would go for something in between, maybe (also fixing my name, which is getting a bit mangled):

@inproceedings{agarwal2018reductions,
  author    = {Alekh Agarwal and
               Alina Beygelzimer and
               Miroslav Dudík and
               John Langford and
               Hanna M. Wallach},
  title     = {A Reductions Approach to Fair Classification},
  booktitle = {Proceedings of the 35th International Conference on Machine Learning,
               {ICML} 2018},
  pages     = {60--69},
  year      = {2018},
  url       = {http://proceedings.mlr.press/v80/agarwal18a.html},
  biburl    = {https://dblp.org/rec/conf/icml/AgarwalBD0W18.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I'd rather add more info than less. It's at the bottom of the page anyway...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(also fixing my name, which is getting a bit mangled

Sorry about that, I'll change it.

I think let's go with the condensed version, we will add in the url field to it. The verbosity mainly comes from the booktitle.

It does not look too verbose, nor too less -
image

docs/refs.bib Outdated
}

Agarwal, Dudik, Wu “Fair Regression: Quantitative Definitions and Reduction-based Algorithms”, ICML, 2019.
@article{agarwal2019fair,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docs/refs.bib Outdated
}

Hardt, Price, Srebro “Equality of Opportunity in Supervised Learning”, NeurIPS, 2016.
@article{hardt2016equality,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe 2016 preceeds the renaming of the conference (?). Is it okay to call it by the new name regardless? I'm obviously very much in favor, just trying to understand the conventions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, retroactive renaming is okay... we're providing a link, so it shouldn't matter.

@adrinjalali
Copy link
Member

So it seems there's quite a bit of discussion on how to reference papers. What I've always done is to cope the bibtex entry from google scholar and let it be formatted by whatever library's rendering them.

But it seems @MiroDudik doesn't like that. I understand in a paper that may be important, but I'm not sure if this much discussion and work is worth it everytime we'd be adding new citations. This library is not an academic article, so I'm not sure why the sensitivity on the details of the citations are raised here. For me, as long as people can copy/paste our reference and find it wherever they find their articles, it's good enough.

Copy link
Member

@MiroDudik MiroDudik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

@MiroDudik MiroDudik merged commit 8fd3ad6 into fairlearn:main Mar 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

DOC Use BibTeX for citations

5 participants