Abstract
This chapter has demonstrated the feasibility of full-text indexing of large information bases. The use of modern compression techniques means that there is no space penalty: large document databases can be compressed and indexed in less than a third of the space required by the originals. Surprisingly, there is little or no time penalty either: querying can be faster because less information needs to be read from disk. Simple queries can be answered in a second; more complex ones with more query terms may take a few seconds. One important application is the creation of static databases on CD-ROM: a 1.5-gigabyte document database can be compressed onto a standard 660-megabyte CD-ROM.
Creating a compressed and indexed document database containing hundreds of thousands of documents and gigabytes of data takes a few hours. Whereas retrieval can be done on ordinary workstations, creation requires a machine with a fair amount of main memory.
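To make the idea of a compressed full-text index concrete, here is a minimal sketch in Python. The abstract does not say which coding scheme the authors use, so this example swaps in a common, simple technique as an assumption: each term's list of document numbers is stored as gaps between successive numbers, and the gaps are packed with variable-byte coding. All function names (`vbyte_encode`, `vbyte_decode`, `build_index`, `lookup`, `query_and`) and the sample documents are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def vbyte_encode(numbers):
    """Pack non-negative integers, 7 data bits per byte.

    The high bit is set on the final byte of each number.
    """
    out = bytearray()
    for n in numbers:
        chunk = []
        while True:
            chunk.append(n & 0x7F)
            n >>= 7
            if n == 0:
                break
        chunk[0] |= 0x80  # lowest-order group is written last and ends the number
        out.extend(reversed(chunk))
    return bytes(out)


def vbyte_decode(data):
    """Inverse of vbyte_encode."""
    numbers, n = [], 0
    for b in data:
        if b & 0x80:
            numbers.append((n << 7) | (b & 0x7F))
            n = 0
        else:
            n = (n << 7) | b
    return numbers


def build_index(docs):
    """Map each term to a variable-byte-coded list of document-number gaps."""
    postings = defaultdict(list)
    for doc_id, text in enumerate(docs, start=1):  # ids start at 1 so gaps are positive
        for term in set(text.lower().split()):
            postings[term].append(doc_id)
    index = {}
    for term, ids in postings.items():
        gaps = [ids[0]] + [b - a for a, b in zip(ids, ids[1:])]
        index[term] = vbyte_encode(gaps)
    return index


def lookup(index, term):
    """Decode one term's postings back into absolute document numbers."""
    ids, total = [], 0
    for gap in vbyte_decode(index.get(term.lower(), b"")):
        total += gap
        ids.append(total)
    return ids


def query_and(index, terms):
    """Return documents that contain every query term."""
    result = None
    for term in terms:
        ids = set(lookup(index, term))
        result = ids if result is None else result & ids
    return sorted(result or [])


docs = [
    "compression saves space",
    "full text indexing",
    "compression and indexing",
]
index = build_index(docs)
print(lookup(index, "compression"))                  # [1, 3]
print(query_and(index, ["compression", "indexing"]))  # [3]
```

Because frequent terms appear in many documents, their gaps are small and usually fit in a single byte, which is one reason an index of this kind can be much smaller than an uncompressed one. A query only decodes the lists for its own terms, so less data has to be read, in line with the abstract's remark about reading less from disk.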
© 1995 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Witten, I.H., Moffat, A., Bell, T.C. (1995). Compression and full-text indexing for Digital Libraries. In: Adam, N.R., Bhargava, B.K., Yesha, Y. (eds) Digital Libraries Current Issues. DL 1994. Lecture Notes in Computer Science, vol 916. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0026856
DOI: https://doi.org/10.1007/BFb0026856
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-59282-2
Online ISBN: 978-3-540-49230-6
eBook Packages: Springer Book Archive
