
Topic assigned to documents with 0.99 probability, but not a single word overlaps between the documents and the topic #33

@vladradishevsky

Description


Hello

I have 200k documents and fit a model with 100 topics. Judging by their terms, the topics look good.
But when I try to look at example documents for each topic, the results are poor. I run `probs, _ = topic_model.transform(count_matrix, details=True)` and then create a new column for each topic in a dataframe, for example `dataframe['topic=0'] = pd.Series(probs[:, 0])`. When I sort the dataframe by that column in descending order, only about 1/3 of the top-ranked documents are relevant to the topic; the rest are irrelevant. Moreover, the irrelevant documents do not share a single word with the topic, so there is no sign of any similarity between them and the topic.
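A minimal sketch of the per-topic columns and sorting described above, assuming `probs` is a NumPy array of shape (n_documents, n_topics) whose row i corresponds to row i of the dataframe (the same row order used to build `count_matrix`). The helper `add_topic_columns` is hypothetical. It also illustrates one pandas pitfall that may be worth ruling out: `pd.Series(probs[:, 0])` carries a default 0..n-1 index, and assigning a Series to a DataFrame column aligns on index labels, not on position. If the dataframe has any other index (for example after rows were filtered or dropped), the probabilities can land on the wrong rows or become NaN.

```python
import numpy as np
import pandas as pd


def add_topic_columns(df: pd.DataFrame, probs: np.ndarray) -> pd.DataFrame:
    """Return a copy of df with one 'topic=k' column per topic.

    Hypothetical helper; assumes row i of `probs` belongs to row i of `df`.
    """
    if probs.shape[0] != len(df):
        raise ValueError("probs and df must have the same number of rows")
    topic_cols = pd.DataFrame(
        probs,
        columns=[f"topic={k}" for k in range(probs.shape[1])],
        index=df.index,  # reuse df's own index so rows line up by position
    )
    return pd.concat([df, topic_cols], axis=1)


# Toy data whose index is not 0..n-1, e.g. after some rows were filtered out.
df = pd.DataFrame({"text": ["doc a", "doc b", "doc c"]}, index=[5, 7, 9])
probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]])

# Label-aligned assignment: Series labels 0, 1, 2 match none of 5, 7, 9.
naive = df.copy()
naive["topic=0"] = pd.Series(probs[:, 0])
print(naive["topic=0"].isna().all())  # True: every value became NaN

# Positional assignment via the helper, then rank documents for topic 0.
safe = add_topic_columns(df, probs)
top = safe.sort_values("topic=0", ascending=False)
print(top.index.tolist())  # [5, 9, 7]
```

With the real data this would be `dataframe = add_topic_columns(dataframe, probs)`. Assigning the raw array, e.g. `dataframe['topic=0'] = probs[:, 0]` without the `pd.Series` wrapper, is also positional.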

I also noticed that the last ~10 topics have only a few words (3-8) in the output of `get_topics`; the words look random, and their probability values (~0.2-0.3) are above average.
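To put a number on the "no words in common" observation, one rough check is the share of a topic's top-ranked documents that contain none of the topic's words. A minimal sketch, assuming `count_matrix` is the document-term matrix passed to `transform` (documents as rows, vocabulary as columns), `probs` is the (n_documents, n_topics) array, and `topic_word_ids` is a hypothetical list of the vocabulary column indices of the topic's words. The function name `zero_overlap_share` is also hypothetical. Comparing this share between the well-behaved topics and the ~10 topics with only 3-8 words could show whether the irrelevant documents concentrate in the latter.

```python
import numpy as np
import scipy.sparse as sp


def zero_overlap_share(count_matrix, probs, topic_word_ids, topic, top_n=100):
    """Fraction of the top_n documents for `topic` containing none of its words.

    count_matrix: (n_docs, n_words) document-term matrix, dense or sparse
    probs: (n_docs, n_topics) document-topic probabilities
    topic_word_ids: column indices of the topic's words (hypothetical input)
    """
    # Documents with the highest probability for this topic, best first.
    top_docs = np.argsort(-probs[:, topic])[:top_n]
    # Restrict to those rows and to the topic's word columns.
    sub = sp.csr_matrix(count_matrix)[top_docs][:, topic_word_ids]
    # Number of distinct topic words present in each top document.
    hits = np.asarray((sub > 0).sum(axis=1)).ravel()
    return float(np.mean(hits == 0))


# Toy example: 4 documents, 3 vocabulary words, 1 topic made of word 0.
counts = sp.csr_matrix(np.array([[1, 0, 0],
                                 [0, 1, 0],
                                 [0, 0, 2],
                                 [1, 1, 0]]))
probs = np.array([[0.9], [0.8], [0.7], [0.1]])
share = zero_overlap_share(counts, probs, topic_word_ids=[0], topic=0, top_n=3)
print(round(share, 3))  # 0.667: 2 of the top 3 documents lack word 0
```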

Could you advise how I can change the model, in particular how to recalculate the document-topic probability estimates? Thank you.

Metadata

Assignees

No one assigned

Labels

No labels

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests