Revisiting Fisher Kernels for Document Similarities

Nyffenegger, Martin; Chappelier, Jean-Cédric; Gaussier, Éric

doi:10.1007/11871842_73

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4212)

Included in the following conference series:

European Conference on Machine Learning

Abstract

This paper presents a new metric to compute similarities between textual documents, based on the Fisher information kernel as proposed by T. Hofmann. By considering a new point-of-view on the embedding vector space and proposing a more appropriate way of handling the Fisher information matrix, we derive a new form of the kernel that yields significant improvements on an information retrieval task. We apply our approach to two different models: Naive Bayes and PLSI.
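For orientation, the sketch below shows a baseline Fisher kernel for a plain unigram (multinomial) model. It is not the revised kernel derived in the paper. It assumes the square-root parameterization rho_w = 2*sqrt(theta_w) and replaces the Fisher information matrix with the identity, which is a common simplification. The function names (`unigram_model`, `fisher_score`, `fisher_kernel`) and the add-one smoothing are illustrative assumptions, not taken from the chapter.

```python
import math
from collections import Counter


def unigram_model(docs, smoothing=1.0):
    """Estimate smoothed word probabilities theta_w from tokenised documents.

    Assumption: additive smoothing over the observed vocabulary only.
    """
    counts = Counter(w for doc in docs for w in doc)
    total = sum(counts.values()) + smoothing * len(counts)
    return {w: (c + smoothing) / total for w, c in counts.items()}


def fisher_score(doc, theta):
    """Gradient of the length-normalised log-likelihood w.r.t. rho_w = 2*sqrt(theta_w).

    For a unigram model this is (n(d, w) / |d|) / sqrt(theta_w) for each word w.
    Words outside the model vocabulary are ignored.
    """
    counts = Counter(w for w in doc if w in theta)
    n = sum(counts.values())
    return {w: (c / n) / math.sqrt(theta[w]) for w, c in counts.items()}


def fisher_kernel(doc_a, doc_b, theta):
    """Inner product of two score vectors (identity Fisher information assumed)."""
    u = fisher_score(doc_a, theta)
    v = fisher_score(doc_b, theta)
    return sum(u[w] * v[w] for w in u.keys() & v.keys())


if __name__ == "__main__":
    corpus = [
        ["fisher", "kernel", "text"],
        ["kernel", "text", "retrieval"],
        ["music", "genre"],
    ]
    theta = unigram_model(corpus)
    print(fisher_kernel(corpus[0], corpus[1], theta))
    print(fisher_kernel(corpus[0], corpus[2], theta))
```

Under these assumptions the kernel reduces to a sum over shared words of the product of relative frequencies divided by theta_w, so rare shared words contribute more than frequent ones. How the Fisher information matrix is actually handled is one of the points the paper revisits, so the identity used here should be read as a placeholder.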

References

Hofmann, T.: Learning the similarity of documents: An information-geometric approach to document retrieval and categorization. In: Advances in Neural Information Processing Systems (NIPS), vol. 12, pp. 914–920 (2000)
Jaakkola, T., Haussler, D.: Exploiting generative models in discriminative classifiers. In: Advances in Neural Information Processing Systems (NIPS), vol. 11, pp. 487–493 (1999)
Lewis, D.: Naive (Bayes) at forty: The independence assumption in information retrieval. In: Nédellec, C., Rouveirol, C. (eds.) ECML 1998. LNCS, vol. 1398, pp. 4–15. Springer, Heidelberg (1998)
Nigam, K., McCallum, A.K., Thrun, S., Mitchell, T.: Text classification from labeled and unlabeled documents using EM. Machine Learning 39, 103–134 (2000)
Hofmann, T.: Probabilistic latent semantic indexing. In: Proc. of 22nd International Conference on Research and Development in Information Retrieval, pp. 50–57 (1999)
Nyffenegger, M.: Similarités textuelles à base de noyaux de Fisher. Master’s thesis, Ecole Polytechnique Fédérale de Lausanne, Switzerland (2005)
Jin, X., Zhou, Y., Mobasher, B.: Web usage mining based on probabilistic latent semantic analysis. In: Proc. of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2004), pp. 197–205 (2004)
Ahrendt, P., Goutte, C., Larsen, J.: Co-occurrence models in music genre classification. In: IEEE Int. Workshop on Machine Learning for Signal Processing (2005)
Vinokourov, A., Girolami, M.: A probabilistic framework for the hierarchic organisation and classification of document collections. Journal of Intelligent Information Systems 18, 153–172 (2002)
Salton, G., McGill, M.: Introduction to Modern Information Retrieval. McGraw-Hill, New York (1983)
McLachlan, G., Peel, D.: Finite Mixture Models. Wiley, Chichester (2000)
Baeza-Yates, R.A., Ribeiro-Neto, B.A.: Modern Information Retrieval. Addison-Wesley, Reading (1999)

Author information

Authors and Affiliations

Ecole Polytechnique Fédérale de Lausanne, Switzerland
Martin Nyffenegger & Jean-Cédric Chappelier
Xerox Research Center Europe, Meylan, France
Éric Gaussier

Editor information

Editors and Affiliations

Knowledge Engineering Group, Technische Universität Darmstadt,
Johannes Fürnkranz
Max Planck Institute for Computer Science, Saarbrücken, Germany
Tobias Scheffer
Faculty of Computer Science, Otto-von-Guericke-University Magdeburg, Germany
Myra Spiliopoulou

About this paper

Cite this paper

Nyffenegger, M., Chappelier, JC., Gaussier, É. (2006). Revisiting Fisher Kernels for Document Similarities. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds) Machine Learning: ECML 2006. ECML 2006. Lecture Notes in Computer Science, vol 4212. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11871842_73

DOI: https://doi.org/10.1007/11871842_73
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45375-8
Online ISBN: 978-3-540-46056-5
eBook Packages: Computer Science; Computer Science (R0); Springer Nature Proceedings Computer Science
