IIRC, string matching algorithms for VERY LONG strings (e.g., DNA
sequences) are relatively specialized beasts.  They tend to use vector
operations and to be parallelized.

Here's a pointer into the Citeseer database that might get you
started:

http://citeseer.nj.nec.com/298209.html

As for the probabilities... hmmmm.  As a first approximation, you
might try just modelling it with a geometric distribution.  It wouldn't
be accurate, because if you got a mismatch at position n, after matching
at positions 1...n-1, rolling back to position 2 wouldn't be an
independent trial.  But it would give you an upper bound quickly.
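
To make the geometric idea concrete, here is a rough Python sketch.
It assumes things that aren't in your question: a 4-letter alphabet
(DNA), uniformly random characters, and every comparison treated as an
independent trial, which is exactly the simplification described
above.  The function names are made up for illustration.

```python
def first_mismatch_pmf(k, alphabet_size=4):
    """P(first mismatch is at position k) under the geometric model.

    Assumes each character matches independently with probability
    1/alphabet_size (an assumption, not a property of real sequences).
    """
    p_match = 1.0 / alphabet_size
    return (p_match ** (k - 1)) * (1.0 - p_match)


def prob_full_match(m, alphabet_size=4):
    """P(all m characters of the pattern match at one alignment)."""
    return (1.0 / alphabet_size) ** m


def expected_comparisons(m, alphabet_size=4):
    """Expected character comparisons at one alignment, pattern length m.

    A mismatch at position k costs k comparisons; a full match costs m.
    """
    total = sum(k * first_mismatch_pmf(k, alphabet_size)
                for k in range(1, m + 1))
    return total + m * prob_full_match(m, alphabet_size)


if __name__ == "__main__":
    for m in (1, 5, 20):
        print(m, prob_full_match(m), expected_comparisons(m))
```

Under these assumptions the expected cost per alignment approaches
1/(1 - 1/4) = 4/3 comparisons as the pattern gets longer.  That figure
ignores the dependence between overlapping alignments, so treat it as
a ballpark only.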

R

_______________________________________________
TCLUG Mailing List - Minneapolis/St. Paul, Minnesota
http://www.mn-linux.org tclug-list at mn-linux.org
https://mailman.real-time.com/mailman/listinfo/tclug-list