
Revisiting Fisher Kernels for Document Similarities

  • Conference paper
  • pp. 727–734
Machine Learning: ECML 2006 (ECML 2006)
  • Martin Nyffenegger (1),
  • Jean-Cédric Chappelier (1) &
  • Éric Gaussier (2)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4212)

Included in the following conference series:

  • European Conference on Machine Learning

Abstract

This paper presents a new metric for computing similarities between textual documents, based on the Fisher information kernel proposed by T. Hofmann. By considering a new point of view on the embedding vector space and proposing a more appropriate way of handling the Fisher information matrix, we derive a new form of the kernel that yields significant improvements on an information retrieval task. We apply our approach to two different models: Naive Bayes and PLSI.
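To make the general Fisher-kernel construction concrete, here is a minimal Python sketch for the simplest generative model: a single multinomial unigram model P(w) estimated on the whole collection. It assumes the square-root parameterization ρ_w = 2√P(w), under which the Fisher score of a document d has components p̂(w|d)/√P(w) (p̂ being the relative word frequencies in d), and it treats the Fisher information matrix as the identity, which holds in this parameterization when the simplex constraint is ignored. The kernel then reduces to K(d, d') = Σ_w p̂(w|d) p̂(w|d') / P(w). This illustrates the classic construction only; it is not the new kernel derived in this paper, whose contribution lies precisely in a different handling of the Fisher information matrix. The helper names `empirical_distribution`, `unigram_model` and `fisher_kernel` are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def empirical_distribution(tokens: List[str]) -> Dict[str, float]:
    """Relative word frequencies p_hat(w|d) of a single document."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def unigram_model(corpus: List[List[str]]) -> Dict[str, float]:
    """Maximum-likelihood collection unigram model P(w) (illustrative helper)."""
    counts = Counter(w for doc in corpus for w in doc)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def fisher_kernel(d1: List[str], d2: List[str], p_w: Dict[str, float]) -> float:
    """K(d1, d2) = sum_w p_hat(w|d1) * p_hat(w|d2) / P(w).

    This is the inner product of the Fisher scores p_hat(w|d) / sqrt(P(w))
    obtained with the square-root parameterization, taking the Fisher
    information matrix to be the identity (simplex constraint ignored).
    Assumes every word of d1 and d2 occurs in the corpus behind p_w.
    """
    p1 = empirical_distribution(d1)
    p2 = empirical_distribution(d2)
    # Only words occurring in both documents contribute to the sum.
    return sum((p1[w] * p2[w] / p_w[w] for w in p1.keys() & p2.keys()), 0.0)
```

For example, with the toy corpus [["fisher", "kernel", "document"], ["fisher", "information", "matrix"], ["naive", "bayes", "document"]], the first two documents share only "fisher", with P(fisher) = 2/9, so K = (1/3)(1/3)/(2/9) = 0.5 up to floating-point rounding. The 1/P(w) factor up-weights rare shared words, acting somewhat like an idf weighting.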


Similar content being viewed by others

On the Replicability of Combining Word Embeddings and Retrieval Models

Chapter © 2020

Aggregating Neural Word Embeddings for Document Representation

Chapter © 2018

Bayesian network Fisher kernel for categorical feature spaces

Article Open access 08 January 2020

Explore related subjects

  • Coding and Information Theory
  • Data Structures and Information Theory
  • Functional clustering
  • Information theory
  • Multivariate Analysis
  • Bioinformatics
  • Natural Language Processing Techniques for Sentiment Analysis

References

  1. Hofmann, T.: Learning the similarity of documents: An information-geometric approach to document retrieval and categorization. In: Advances in Neural Information Processing Systems (NIPS), vol. 12, pp. 914–920 (2000)

  2. Jaakkola, T., Haussler, D.: Exploiting generative models in discriminative classifiers. In: Advances in Neural Information Processing Systems (NIPS), vol. 11, pp. 487–493 (1999)

  3. Lewis, D.: Naive (Bayes) at forty: The independence assumption in information retrieval. In: Nédellec, C., Rouveirol, C. (eds.) ECML 1998. LNCS, vol. 1398, pp. 4–15. Springer, Heidelberg (1998)

  4. Nigam, K., McCallum, A.K., Thrun, S., Mitchell, T.: Text classification from labeled and unlabeled documents using EM. Machine Learning 39, 103–134 (2000)

  5. Hofmann, T.: Probabilistic latent semantic indexing. In: Proc. of the 22nd International Conference on Research and Development in Information Retrieval, pp. 50–57 (1999)

  6. Nyffenegger, M.: Similarités textuelles à base de noyaux de Fisher [Textual similarities based on Fisher kernels]. Master's thesis, Ecole Polytechnique Fédérale de Lausanne, Switzerland (2005)

  7. Jin, X., Zhou, Y., Mobasher, B.: Web usage mining based on probabilistic latent semantic analysis. In: Proc. of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2004), pp. 197–205 (2004)

  8. Ahrendt, P., Goutte, C., Larsen, J.: Co-occurrence models in music genre classification. In: IEEE Int. Workshop on Machine Learning for Signal Processing (2005)

  9. Vinokourov, A., Girolami, M.: A probabilistic framework for the hierarchic organisation and classification of document collections. Journal of Intelligent Information Systems 18, 153–172 (2002)

  10. Salton, G., McGill, M.: Introduction to Modern Information Retrieval. McGraw-Hill, New York (1983)

  11. McLachlan, G., Peel, D.: Finite Mixture Models. Wiley, Chichester (2000)

  12. Baeza-Yates, R.A., Ribeiro-Neto, B.A.: Modern Information Retrieval. Addison-Wesley, Reading (1999)


Author information

Authors and Affiliations

  1. Ecole Polytechnique Fédérale de Lausanne, Switzerland

    Martin Nyffenegger & Jean-Cédric Chappelier

  2. Xerox Research Center Europe, Meylan, France

    Éric Gaussier


Editor information

Editors and Affiliations

  1. Knowledge Engineering Group, Technische Universität Darmstadt, Germany

    Johannes Fürnkranz

  2. Max Planck Institute for Computer Science, Saarbrücken, Germany

    Tobias Scheffer

  3. Faculty of Computer Science, Otto-von-Guericke-University Magdeburg, Germany

    Myra Spiliopoulou


Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Nyffenegger, M., Chappelier, JC., Gaussier, É. (2006). Revisiting Fisher Kernels for Document Similarities. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds) Machine Learning: ECML 2006. ECML 2006. Lecture Notes in Computer Science, vol 4212. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11871842_73

  • DOI: https://doi.org/10.1007/11871842_73

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-45375-8

  • Online ISBN: 978-3-540-46056-5

  • eBook Packages: Computer Science, Computer Science (R0), Springer Nature Proceedings Computer Science


Keywords

  • Fisher Information Matrix
  • Latent Class Model
  • Document Similarity
  • Document Length
  • Fisher Kernel

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
