Hello
I have 200k documents and I trained a model with 100 topics. Looking at the top terms, the topics seem good.
But when I want to look at example documents for each topic, I run `probs, _ = topic_model.transform(count_matrix, details=True)`. Then I create a new column for each topic, for example `dataframe['topic=0'] = pd.Series(probs[:, 0])`. When I sort the dataframe by that probability in descending order, about 1/3 of the top-ranked documents are relevant to the topic, but the rest are irrelevant. Moreover, not a single word is shared between those documents and the topic's words, so there is no indication of similarity between them.
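A minimal sketch of the ranking step described above, with random numbers standing in for the real `probs` matrix; the `top_docs` helper and the sample data are hypothetical. It also shows one thing worth ruling out: column assignment in pandas aligns on the index, so if `dataframe` does not have the default 0..n-1 index, wrapping the column in `pd.Series(...)` can attach probabilities to the wrong rows or produce NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the (n_docs, n_topics) matrix that
# probs, _ = topic_model.transform(count_matrix, details=True) returns.
rng = np.random.default_rng(0)
n_docs, n_topics = 6, 3
probs = rng.random((n_docs, n_topics))

# Hypothetical documents with a non-default index (e.g. after filtering rows).
dataframe = pd.DataFrame(
    {"text": [f"doc {i}" for i in range(n_docs)]},
    index=range(100, 100 + n_docs),
)

# pd.Series(...) gets a fresh 0..n-1 index, and column assignment aligns
# on the index, so rows whose labels do not match become NaN.
demo = dataframe.copy()
demo["topic=0"] = pd.Series(probs[:, 0])
print(demo["topic=0"].isna().all())  # True: no index labels overlap

# Assigning the NumPy column directly is positional, so every row keeps its value.
for k in range(n_topics):
    dataframe[f"topic={k}"] = probs[:, k]


def top_docs(df, topic, n=3):
    """Return the n rows with the highest probability for `topic`."""
    return df.sort_values(f"topic={topic}", ascending=False).head(n)


print(top_docs(dataframe, topic=0))
```

If the real dataframe already uses a default RangeIndex in the same order as `count_matrix`, both assignments give the same result and this is not the cause.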
I also noticed that the last ~10 topics have only a few words (3-8) in the `get_topics` output, the words look random, and their probability values are ~0.2-0.3, which is above average.
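To check the pattern above systematically, a small diagnostic can list topics with 8 or fewer words next to their mean document probability. This is a sketch under assumptions: the word lists and probabilities below are made-up placeholders, and the exact structure returned by `get_topics` is not assumed (only plain lists of word strings). The `short_topics` helper is hypothetical.

```python
import numpy as np

# Hypothetical per-topic word lists (word strings only), e.g. pulled out
# of get_topics() output.
topic_words = [
    ["price", "market", "stock", "trade", "share",
     "bank", "rate", "fund", "bond", "index"],
    ["game", "team", "season", "player", "coach",
     "league", "score", "match", "goal", "win"],
    ["blue", "monday", "window"],  # a suspiciously short topic
]

# Hypothetical document-topic probabilities, shape (n_docs, n_topics).
probs = np.array([
    [0.90, 0.10, 0.30],
    [0.20, 0.80, 0.25],
    [0.10, 0.20, 0.20],
    [0.05, 0.10, 0.30],
])


def short_topics(topic_words, probs, max_words=8):
    """Return (topic, n_words, mean_prob) for topics with at most max_words words."""
    mean_prob = probs.mean(axis=0)  # average probability per topic (column)
    return [
        (k, len(words), float(mean_prob[k]))
        for k, words in enumerate(topic_words)
        if len(words) <= max_words
    ]


overall = float(probs.mean())
for k, n_words, p in short_topics(topic_words, probs):
    flag = "above" if p > overall else "at/below"
    print(f"topic {k}: {n_words} words, mean prob {p:.2f} ({flag} overall {overall:.2f})")
```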
Could you advise how I can change the model, in particular how the document-topic probability estimates are calculated? Thanks.