Finite-size corrections to Poisson approximations of rare events in renewal processes

John L. Spouge

doi:10.1239/jap/996986762

Part of: Limit theorems Special processes

Published online by Cambridge University Press: 14 July 2016

John L. Spouge*
Affiliation: National Library of Medicine, USA
* Postal address: National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD 20894, USA. Email address: spouge@nih.gov

Abstract

Consider a renewal process. The renewal events partition the process into i.i.d. renewal cycles. Assume that on each cycle, a rare event called ‘success’ can occur. Such successes lend themselves naturally to approximation by Poisson point processes. If each success occurs after a random delay, however, Poisson convergence may be relatively slow, because each success corresponds to a time interval, not a point. In 1996, Altschul and Gish proposed a finite-size correction to a particular approximation by a Poisson point process. Their correction is now used routinely (about once a second) when computers compare biological sequences, although it lacks a mathematical foundation. This paper generalizes their correction. For a single renewal process or several renewal processes operating in parallel, this paper gives an asymptotic expansion that contains in successive terms a Poisson point approximation, a generalization of the Altschul–Gish correction, and a correction term beyond that.
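The effect of a delay on a Poisson point approximation can be illustrated with a toy calculation. The sketch below is not the expansion derived in the paper, nor the Altschul and Gish formula; it assumes a deliberately simplified model in which a success can start at each of n discrete time steps independently with probability p, takes a fixed delay d to complete, and counts only if it completes inside the window. The function names and the parameters n, p and d are hypothetical and chosen for illustration only.

```python
import math


def exact_no_success(n, p, d):
    """Exact P(no completed success) in the toy model.

    A success starting at step i completes at step i + d, so only the
    first n - d starting positions can produce a counted success.
    """
    effective = max(n - d, 0)
    return (1.0 - p) ** effective


def poisson_point(n, p):
    """Poisson point approximation that ignores the delay (mean p * n)."""
    return math.exp(-p * n)


def poisson_edge_corrected(n, p, d):
    """Poisson approximation with the window shortened by the delay d.

    This is an assumed, simplified stand-in for a finite-size correction.
    """
    return math.exp(-p * max(n - d, 0))


if __name__ == "__main__":
    n, p, d = 1000, 0.002, 100  # illustrative values only
    exact = exact_no_success(n, p, d)
    print("exact          :", exact)
    print("point approx   :", poisson_point(n, p))
    print("edge corrected :", poisson_edge_corrected(n, p, d))
```

In this toy setting the uncorrected approximation overstates the expected number of successes because it treats each success as a point rather than an interval of length d; shortening the window removes most of that error.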

Keywords

renewals; Chen–Stein method; generating functions

MSC classification

Primary:
60K15: Markov renewal processes, semi-Markov processes
60K20: Applications of Markov renewal processes (reliability, queueing networks, etc.)
60F99: None of the above, but in this section

Type: Research Papers
Information: Journal of Applied Probability, Volume 38, Issue 2, June 2001, pp. 554–569

DOI: https://doi.org/10.1239/jap/996986762
Copyright: © Applied Probability Trust 2001

References

Altschul, S. F. (1997). Sequence comparison and alignment. In DNA and Protein Sequence Analysis, eds. Bishop, M. J. and Rawlings, C. J. IRL Press, Oxford, pp. 137–167.

Altschul, S. (1999). Comments on ‘Gapped BLAST and PSI-BLAST: a new generation of protein database search programs’ by Altschul, S. F. et al. Scientist 13, 15.

Altschul, S. (1999). Private communication.

Altschul, S. F., and Gish, W. (1996). Local alignment statistics. In Computer Methods for Macromolecular Sequence Analysis (Methods in Enzymology 266), ed. Doolittle, R. F. Academic Press, London, pp. 460–480.

Altschul, S. F., and Koonin, E. V. (1998). Iterated profile searches with PSI-BLAST – a tool for discovery in protein databases. Trends Biochem. Sci. 23, 444–447.

Altschul, S. F., Gish, W., Miller, W., Myers, E. W., and Lipman, D. J. (1990). Basic Local Alignment Search Tool. J. Molec. Biol. 215, 403–410.

Altschul, S. F., Boguski, M. S., Gish, W., and Wootton, J. C. (1994). Issues in searching molecular sequence databases. Nature Genetics 6, 119–129.

Altschul, S. F., Madden, T. L., Schaffer, A. A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D. J. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402.

Arratia, R., and Waterman, M. S. (1989). The Erdős–Rényi strong law for pattern matching with a given proportion of mismatches. Ann. Prob. 17, 1152–1169.

Arratia, R., Goldstein, L., and Gordon, L. (1989). Two moments suffice for Poisson approximations, the Chen–Stein method. Ann. Prob. 17, 9–25.

Barrett, C., Hughey, R., and Karplus, K. (1997). Scoring hidden Markov models. Comput. Appl. Biosci. 13, 191–199.

Bundschuh, R., and Hwa, T. (2000). An analytic study of the phase transition line in local sequence alignment with gaps. Discrete Appl. Math. 104, 113–142.

Chen, L. H. Y. (1975). Poisson approximation for dependent trials. Ann. Prob. 3, 534–545.

Cinlar, E. (1975). Introduction to Stochastic Processes. Prentice-Hall, Englewood Cliffs, NJ.

Dayhoff, M. O., Schwartz, R. M., and Orcutt, B. C. (1978). A model of evolutionary change in proteins. In Atlas of Protein Sequence and Structure, Vol. 5, suppl. 3, ed. Dayhoff, M. O. National Biomedical Research Foundation, Silver Spring, MD, pp. 345–352.

De Bruijn, N. G. (1981). Asymptotic Methods in Analysis. Dover, New York.

Dembo, A., and Karlin, S. (1991). Strong limit-theorems of empirical distributions for large segmental exceedances of partial-sums of Markov variables. Ann. Prob. 19, 1756–1767.

Dembo, A., and Karlin, S. (1991). Strong limit-theorems of empirical functionals for large exceedances of partial-sums of i.i.d. variables. Ann. Prob. 19, 1737–1755.

Dembo, A., and Karlin, S. (1993). Central limit-theorems of partial-sums for large segmental values. Stoch. Proc. Appl. 45, 259–271.

Dembo, A., and Zeitouni, O. (1998). Large Deviations Techniques and Applications. Springer, New York.

Dembo, A., Karlin, S., and Zeitouni, O. (1994). Critical phenomena for sequence matching with scoring. Ann. Prob. 22, 1993–2021.

Dembo, A., Karlin, S., and Zeitouni, O. (1994). Limit distributions of maximal non-aligned two-sequence segmental score. Ann. Prob. 22, 2022–2039.

Doob, J. L. (1991). Measure Theory. Springer, New York.

Feller, W. (1971). An Introduction to Probability Theory and its Applications, Vol. 1. John Wiley, New York.

Feller, W. (1971). An Introduction to Probability Theory and its Applications, Vol. 2. John Wiley, New York.

Freedman, D. (1974). The Poisson approximation for dependent events. Ann. Prob. 2, 256–269.

Gribskov, M., McLachlan, A. D., and Eisenberg, D. (1987). Profile analysis: detection of distantly related proteins. Proc. Nat. Acad. Sci. USA 84, 4355–4358.

Grimmett, G. R., and Stirzaker, D. R. (1998). Probability and Random Processes. Oxford University Press.

Henikoff, S., and Henikoff, J. G. (1993). Performance evaluation of amino acid substitution matrices. Proteins 17, 49–61.

Hille, E. (1976). Analytic Function Theory, Vol. 1. Ginn, New York.

Iglehart, D. L. (1972). Extreme values in the GI/G/1 queue. Ann. Math. Statist. 43, 627–635.

Karlin, S., and Altschul, S. F. (1990). Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes. Proc. Nat. Acad. Sci. USA 87, 2264–2268.

Karlin, S., and Altschul, S. F. (1993). Applications and statistics for multiple high-scoring segments in molecular sequences. Proc. Nat. Acad. Sci. USA 90, 5873–5877.

Karlin, S., and Dembo, A. (1992). Limit distributions of maximal segmental score among Markov-dependent partial-sums. Adv. Appl. Prob. 24, 113–140.

Karlin, S., and Taylor, H. M. (1975). A First Course in Stochastic Processes. Academic Press, New York.

Karlin, S., Bucher, P., Brendel, V., and Altschul, S. F. (1991). Statistical-methods and insights for protein and DNA-sequences. Annual Rev. Biophys. Biophys. Chem. 20, 175–203.

Levinson, N., and Redheffer, R. M. (1970). Complex Variables. Holden-Day, San Francisco.

Mott, R. (2000). Accurate formula for p-values of gapped local sequence and profile alignments. J. Molec. Biol. 300, 649–659.

Needleman, S. B., and Wunsch, C. D. (1970). A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Molec. Biol. 48, 443–453.

Olsen, R., Bundschuh, R., and Hwa, T. (1999). Rapid assessment of extremal statistics for local alignment with gaps. In Proc. Int. Conf. Intelligent Systems for Molec. Biol., eds. Lengauer, T. et al. AAAI Press, Menlo Park, CA.

Pearson, W. R. (1995). Comparison of methods for searching protein sequence databases. Protein Sci. 4, 1145–1160.

Pearson, W. R. (1996). Effective protein sequence comparison. Meth. Enzymol. 266, 227–258.

Pólya, G., and Szegö, G. (1972). Problems and Theorems in Analysis, Vol. 1. Springer, New York.

Reinert, G., and Schbath, S. (1998). Compound Poisson and Poisson process approximations for occurrences of multiple words in Markov chains. J. Comput. Biol. 5, 223–253.

States, D. J., Gish, W., and Altschul, S. F. (1991). Improved sensitivity of nucleic acid database searches using application-specific scoring matrices. Methods 3, 66–70.

Stein, C. (1970). A bound for the error in the normal approximation to the distribution of a sum of dependent random variables. In Proc. 6th Berkeley Symp. Math. Statist. Prob., Vol. II, eds. Le Cam, L. M. et al. University of California Press, pp. 583–602.

Tanushev, M. S., and Arratia, R. (1997). Central limit theorem for renewal theory for several patterns. J. Comput. Biol. 4, 35–44.

Waterman, M. S., Gordon, L., and Arratia, R. (1987). Phase-transitions in sequence matches and nucleic-acid structure. Proc. Nat. Acad. Sci. USA 84, 1239–1243.

Williams, D. (1997). Probability with Martingales. Cambridge University Press.

Wolf, Y. (1999). Personal communication.

Wootton, J. C., and Federhen, S. (1993). Statistics of local complexity in amino-acid-sequences and sequence databases. Comput. Chem. 17, 149–163.
