Add optional spacy_model argument to compute_lda_model in order to avoid expensive spacy model reloads
Currently, compute_lda_model calls LoadFile.load_document N times, once per document in the dataset. load_document accepts an optional spacy_model argument, but compute_lda_model never passes one.
As a result, load_document calls RawTextReader.read without a spacy_model either, so one of the installed spacy models is loaded from scratch on every call, N times in total. This makes the whole LDA model computation far slower than necessary.
In my commit, I've added an optional spacy_model argument to compute_lda_model, which is passed down the load_document -> read chain. I've also added a couple of logging lines so that the code no longer gives the impression of being stuck.
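
To illustrate the pattern, here is a minimal, self-contained sketch. It does not use the real library code: load_model, the toy whitespace "model", the STATS counter, and the simplified signatures of read, load_document and compute_lda_model are all stand-ins assumed for this example. It only shows how threading an optional spacy_model argument through the call chain lets the caller load the model once instead of once per document.

```python
from typing import Callable, List, Optional

# Counts how often the (simulated) expensive model load runs.
STATS = {"loads": 0}

Model = Callable[[str], List[str]]


def load_model() -> Model:
    """Hypothetical stand-in for an expensive spacy model load."""
    STATS["loads"] += 1
    return str.split  # toy "model": a whitespace tokenizer


def read(text: str, spacy_model: Optional[Model] = None) -> List[str]:
    """Stand-in for RawTextReader.read: loads a model only if none is given."""
    if spacy_model is None:
        spacy_model = load_model()
    return spacy_model(text)


def load_document(text: str, spacy_model: Optional[Model] = None) -> List[str]:
    """Stand-in for LoadFile.load_document: forwards the optional model."""
    return read(text, spacy_model=spacy_model)


def compute_lda_model(documents: List[str],
                      spacy_model: Optional[Model] = None) -> List[List[str]]:
    """Simplified stand-in: only tokenizes each document, fits no LDA model.

    The point is the new optional argument, which is passed down the chain.
    """
    return [load_document(doc, spacy_model=spacy_model) for doc in documents]


if __name__ == "__main__":
    docs = ["first document", "second document", "third one"]

    compute_lda_model(docs)          # no model passed: one load per document
    print("loads without shared model:", STATS["loads"])

    STATS["loads"] = 0
    shared = load_model()            # load once up front
    compute_lda_model(docs, spacy_model=shared)
    print("loads with shared model:", STATS["loads"])
```

Because the argument defaults to None, existing callers keep the old behaviour; only callers that pass a preloaded model get the speedup.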
On my machine, with a dataset of 14k documents, the LDA model computation time goes from ~2.5 days to ~20 minutes.