Skip to main content

Subsequence Combinatorics and Applications to Microarray Production, DNA Sequencing and Chaining Algorithms

  • Conference paper
Combinatorial Pattern Matching (CPM 2006)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 4009))

Included in the following conference series:

  • 1029 Accesses

  • 13 Citations

Abstract

We investigate combinatorial enumeration problems related to subsequences of strings; in contrast to substrings, subsequences need not be contiguous. For a finite alphabet Σ, the following three problems are solved. (1) Number of distinct subsequences: Given a sequence s ∈Σn and a nonnegative integer kn, how many distinct subsequences of length k does s contain? A previous result by Chase states that this number is maximized by choosing s as a repeated permutation of the alphabet. This has applications in DNA microarray production. (2) Number of ρ -restricted ρ -generated sequences: Given s ∈Σn and integers k ≥1 and ρ≥1, how many distinct sequences in Σk contain no single nucleotide repeat longer than ρ and can be written as \(s_1^{r_1}\dots s_n^{r_n}\) with 0≤r i ρ for all i? For ρ= ∞, the question becomes how many length-k sequences match the regular expression s 1 * s 2 * ...s n *. These considerations allow a detailed analysis of a new DNA sequencing technology (“454 sequencing”). (3) Exact length distribution of the longest increasing subsequence: Given Σ= {1,...,K} and an integer n ≥1, determine the number of sequences in Σn whose longest strictly increasing subsequence has length k, where 0 ≤kK. This has applications to significance computations for chaining algorithms.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

Similar content being viewed by others

References

  1. Margulies, M., et al.: Genome sequencing in microfabricated high-density picolitre reactors. Nature 437(7057), 376–380 (2005), Corrigendum in Nature 439(7075), 502 (2006)

    Google Scholar 

  2. Aldous, D., Diaconis, P.: Longest increasing subsequences: From patience sorting to the Baik-Deift-Johansson theorem. Bulletin of the American Mathematical Society 36(4), 413–432 (1999)

    Article  MathSciNet  MATH  Google Scholar 

  3. Niedermeier, R.: Invitation to Fixed Parameter Algorithms. Oxford University Press, Oxford (2006)

    Book  MATH  Google Scholar 

  4. Rahmann, S.: The shortest common supersequence problem in a microarray production setting. In: Proceedings of the 2nd European Conference in Computational Biology (ECCB 2003), pp. ii156–ii161, vol. 19(suppl. 2) of Bioinformatics (2003)

    Google Scholar 

  5. Chase, P.: Subsequence numbers and logarithmic concavity. Discrete Math. 16, 123–140 (1976)

    Article  MathSciNet  MATH  Google Scholar 

  6. Skiena, S.S.: The Algorithm Design Manual. Springer, Heidelberg (1997)

    MATH  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Rahmann, S. (2006). Subsequence Combinatorics and Applications to Microarray Production, DNA Sequencing and Chaining Algorithms. In: Lewenstein, M., Valiente, G. (eds) Combinatorial Pattern Matching. CPM 2006. Lecture Notes in Computer Science, vol 4009. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11780441_15

Download citation

Keywords

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Publish with us

Policies and ethics