Aquifer Exploration Probability

Dealing with uncertainty in groundwater exploration

By Darrel Dunn Ph.D., PG, Hydrogeologist

This is a technical page on aquifer exploration. A separate non-technical page on aquifer exploration probability is also available.


If the purpose of an aquifer exploration task is to estimate how many test wells would be required in an aquifer to complete at least one successful test well, then exceedance probability might be used. Exceedance probability (Pe) is the probability that one value obtained at random from a large distribution of values will equal or exceed a certain value. The probability that the value will instead be less than that value is (1 - Pe). If the values in the distribution are independent (no value is affected by any other value), then if n values are obtained at random:

$$P(ALO) = 1 - [P(LTC)]^{n} \qquad \text{(Equation 1)}$$


where P(ALO) is the probability that at least one of the n values equals or exceeds a certain critical value, and

P(LTC) is the probability that a value obtained at random is less than the critical value (that is, P(LTC) = 1 - Pe, with Pe evaluated at the critical value).

The term $[P(LTC)]^{n}$ is the probability that all n values obtained are less than the critical value, and P(ALO) is its complement. That is, $[P(LTC)]^{n} + P(ALO) = 1$.
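Equation 1 and its complement can be checked numerically with a few lines of Python. This is a minimal sketch, assuming independent values as stated above; the function names `prob_at_least_one` and `prob_all_below` are illustrative, not from any library.

```python
def prob_at_least_one(p_ltc, n):
    """Equation 1: P(ALO) = 1 - [P(LTC)]^n for n independent random values."""
    if not 0.0 <= p_ltc <= 1.0:
        raise ValueError("P(LTC) must be between 0 and 1")
    if n < 0:
        raise ValueError("n must be a non-negative number of values")
    return 1.0 - p_ltc ** n


def prob_all_below(p_ltc, n):
    """The complementary term [P(LTC)]^n: all n values fall below the critical value."""
    return p_ltc ** n


p_ltc, n = 0.68, 6
p_alo = prob_at_least_one(p_ltc, n)
p_none = prob_all_below(p_ltc, n)
print(f"P(ALO) = {p_alo:.3f}, [P(LTC)]^n = {p_none:.3f}, sum = {p_alo + p_none:.3f}")
```

The two probabilities always sum to one, matching the complement relation above. With P(LTC) = 0.68 and n = 6 (the values used in the 100 ft/day example on this page), P(ALO) comes out at about 0.90.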

Solving Equation 1 for n yields:

$$n = \frac{\log[1 - P(ALO)]}{\log[P(LTC)]} \qquad \text{(Equation 2)}$$
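The intermediate steps are short: move the power term to one side, then take logarithms of both sides. Any logarithm base works, because the base cancels in the ratio. For 0 < P(LTC) < 1 and 0 < P(ALO) < 1 both logarithms are negative, so n is positive.

```latex
\begin{aligned}
1 - P(ALO) &= [P(LTC)]^{n} \\
\log[1 - P(ALO)] &= n \, \log[P(LTC)] \\
n &= \frac{\log[1 - P(ALO)]}{\log[P(LTC)]}
\end{aligned}
```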

This form of the equation may be useful in some aquifer exploration projects. For example, if sufficient data are available on hydraulic conductivity at aquifer test sites scattered randomly in a certain geologic terrane, then one can select a value for P(ALO) that provides an acceptable level of confidence of finding a certain critical value of hydraulic conductivity. A cumulative-distribution polygon for hydraulic conductivity can then be constructed from the existing data, and P(LTC) can be read from the cumulative distribution at the critical value. Once values for P(ALO) and P(LTC) are available, n may be calculated from Equation 2 and rounded up to the next whole number. The value of n provides an estimate of the number of test wells that would be required to find one or more sites with hydraulic conductivity at or above the critical value.
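The procedure above can be sketched in Python. Two assumptions should be stated plainly: the cumulative distribution is read as a step function (the fraction of observed values below the critical value) rather than interpolated from a polygon, and the conductivity values in the usage example are hypothetical, not data from this page. `prob_less_than` and `wells_required` are illustrative names.

```python
import math
from bisect import bisect_left


def prob_less_than(values, critical):
    """Empirical P(LTC): fraction of observed values strictly below `critical`.

    This is a step-function reading of the cumulative distribution (an
    assumption); a polygon would interpolate between observed values.
    """
    ordered = sorted(values)
    return bisect_left(ordered, critical) / len(ordered)


def wells_required(p_alo, p_ltc):
    """Equation 2: return the raw n and n rounded up to whole test wells."""
    if not 0.0 < p_alo < 1.0:
        raise ValueError("P(ALO) must lie strictly between 0 and 1")
    if not 0.0 < p_ltc < 1.0:
        raise ValueError("P(LTC) must lie strictly between 0 and 1")
    n = math.log(1.0 - p_alo) / math.log(p_ltc)
    # A small tolerance keeps floating-point noise (e.g. 6.0000000001)
    # from adding a spurious extra well when rounding up.
    return n, math.ceil(n - 1e-9)


# Hypothetical hydraulic conductivities (ft/day) from earlier test sites.
k_values = [12, 35, 80, 150, 40, 600, 95, 220, 60, 18]
p_ltc = prob_less_than(k_values, 100)  # 7 of the 10 values are below 100
n, wells = wells_required(0.90, p_ltc)
print(f"P(LTC) = {p_ltc:.2f}, n = {n:.2f}, test wells = {wells}")
```

With these ten hypothetical values, P(LTC) at 100 ft/day is 0.70 and Equation 2 gives about 6.46, rounded up to 7 test wells. Ten values are far too few for a dependable cumulative distribution, so the sample-size caveat discussed on this page applies with full force.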

As an example, consider the frequency histogram of hydraulic conductivity in a certain geologic terrane shown in Figure 1.   This histogram is based on 1539 values.  The corresponding cumulative-distribution polygon is shown in Figure 2.

Hydraulic Conductivity Histogram

Figure 1. Example of a frequency histogram.

Hydraulic Conductivity Cumulative Distribution

Figure 2. Example of a cumulative-distribution polygon.

If a hydraulic conductivity of 500 ft/day were needed for a successful well completion, then P(LTC), read from Figure 2, would be about 0.96. If one wanted to be 90% confident of finding this critical value with at least one test well, then P(ALO) would be set at 0.90. From Equation 2:

$$n = \frac{\log[1 - 0.90]}{\log[0.96]} = 56.4$$

Rounding 56.4 up to the next whole number, one would be 90% confident of constructing at least one test well yielding a hydraulic conductivity of 500 ft/day or more if 57 test wells were constructed. These wells would have to be located far enough apart that the hydraulic conductivity at one would not be affected by that at any other (that is, so that the values are independent).

However, if 100 ft/day would suffice, then P(LTC) would be about 0.68. If P(ALO) were kept at 0.90, then

$$n = \frac{\log[1 - 0.90]}{\log[0.68]} = 5.97$$

So one would be 90% confident of constructing at least one test well yielding a value of hydraulic conductivity of 100 ft/day or more if 6 test wells were constructed.
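The two worked examples can be checked with a small Monte Carlo simulation: generate many hypothetical drilling programs of 57 (or 6) independent test wells and count how often at least one well reaches the critical value. This is a sketch under the same independence assumption as Equation 1, using the P(LTC) readings of 0.96 and 0.68 from the text; the function name, trial count, and random seed are arbitrary choices.

```python
import random


def simulate_success_rate(p_ltc, wells, trials=20_000, seed=1):
    """Fraction of simulated drilling programs in which at least one of
    `wells` independent test wells reaches the critical value."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(trials):
        # A well succeeds when a uniform draw on [0, 1) falls at or above
        # P(LTC), which happens with probability 1 - P(LTC) = Pe.
        if any(rng.random() >= p_ltc for _ in range(wells)):
            successes += 1
    return successes / trials


for p_ltc, wells in ((0.96, 57), (0.68, 6)):
    simulated = simulate_success_rate(p_ltc, wells)
    exact = 1.0 - p_ltc ** wells  # Equation 1
    print(f"P(LTC) = {p_ltc}, {wells} wells: simulated {simulated:.3f}, "
          f"Equation 1 {exact:.3f}")
```

Both simulated fractions should land close to 0.90, the P(ALO) chosen in the examples. Because n is rounded up, the exact probabilities from Equation 1 are slightly above 0.90 (about 0.902 for 57 wells and 0.901 for 6 wells).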

Such exceedance probability calculations might be used at the planning stage of a water resource investigation to provide information on the possible cost of a test well program given a desired likelihood of success. In practice, one likely would not need to construct all n test wells before a successful one is completed. However, there is also a probability of 1 - P(ALO) that none of the n test wells will be successful.
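Under the same independence assumption, the number of wells drilled up to and including the first success follows a geometric distribution with success probability 1 - P(LTC). That standard result is not part of the original discussion, but it puts numbers on the point that all n wells are unlikely to be needed. The sketch below uses the 500 ft/day example; the function names are illustrative.

```python
import math


def expected_wells_to_first_success(p_ltc):
    """Mean of the geometric distribution with success probability Pe = 1 - P(LTC)."""
    return 1.0 / (1.0 - p_ltc)


def wells_for_confidence(p_ltc, confidence):
    """Smallest k with P(first success within k wells) = 1 - p_ltc**k >= confidence.

    With confidence = P(ALO), this is Equation 2 rounded up.
    """
    k = math.log(1.0 - confidence) / math.log(p_ltc)
    return math.ceil(k - 1e-9)


p_ltc = 0.96  # 500 ft/day example in the text
print(f"mean wells to first success: {expected_wells_to_first_success(p_ltc):.1f}")
print(f"median wells to first success: {wells_for_confidence(p_ltc, 0.5)}")
print(f"P(all 57 wells unsuccessful): {p_ltc ** 57:.3f}")
```

For P(LTC) = 0.96 the mean number of wells to the first success is 25, half of all programs would succeed within 17 wells, and roughly 1 program in 10 (probability about 0.098) would fail even after all 57 wells.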

The example above is based on a large number of data values, so the cumulative-distribution polygon is likely a good representation of the entire population of values. However, if additional data were available, the cumulative-distribution polygon would likely not be exactly the same, so some judgment is required regarding the effect of sample size.

The effect of sample size may be examined by using Kolmogorov's D-statistic, which is the maximum absolute difference between the sample cumulative distribution and the population cumulative distribution. Because the distribution of D is known, it gives a probabilistic estimate of the maximum difference between the cumulative distribution of a data set and the cumulative distribution of the entire population. For a sample size n greater than 35 (here n is the number of existing data values, not the number of test wells in Equation 2), the value of D is $1.22/\sqrt{n}$, $1.36/\sqrt{n}$, and $1.63/\sqrt{n}$ at significance levels of 0.10, 0.05, and 0.01, respectively. These significance levels correspond to confidence levels of 90, 95, and 99 percent. For the cumulative-distribution polygon shown in Figure 2, where n = 1539, D is about 0.04 at the 99 percent confidence level. So this distribution is a good representation of the actual distribution of hydraulic conductivity in that geologic terrane: with 99 percent confidence, the population cumulative distribution differs from the sample cumulative distribution by no more than about 0.04 (4 percentage points of cumulative probability) at any value of hydraulic conductivity.
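The large-sample formula above is easy to evaluate. This sketch hard-codes only the three coefficients quoted in the text (1.22, 1.36, 1.63) and rejects sample sizes of 35 or fewer, where published tables such as those in Dixon and Massey (1957) should be used instead; the function name `kolmogorov_d` is illustrative.

```python
import math

# Large-sample coefficients quoted in the text, keyed by confidence level.
D_COEFFICIENTS = {0.90: 1.22, 0.95: 1.36, 0.99: 1.63}


def kolmogorov_d(sample_size, confidence):
    """Approximate Kolmogorov D for sample_size > 35 at 90, 95, or 99 percent."""
    if sample_size <= 35:
        raise ValueError("the large-sample formula applies only above n = 35")
    if confidence not in D_COEFFICIENTS:
        raise ValueError("confidence must be 0.90, 0.95, or 0.99")
    return D_COEFFICIENTS[confidence] / math.sqrt(sample_size)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence, n = 1539: D = {kolmogorov_d(1539, level):.3f}")
```

For the 1539 values behind Figure 2, the 99 percent value is about 0.042, which the text rounds to 0.04.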

Dixon and Massey (1957) provide a table of percentiles of the distribution of Kolmogorov's D-statistic and a relatively simple description of its use. More complete tables (and abstruse mathematical disquisitions) are available elsewhere. Values of Kolmogorov's D-statistic for selected confidence levels and sample sizes are plotted in Figure 3. Figure 3 shows that D increases as the sample size (for example, the number of hydraulic conductivities from previous test wells) decreases and as the selected confidence level increases. In the extreme, if the sample size is one (1) and the confidence level is 99 percent, D is 0.995. Hence, a single random sample yields virtually no information on the population distribution. For small sample sizes, the D-statistic could be used to provide more conservative estimates of n by adding its value at a selected confidence level to P(LTC). However, the adjusted P(LTC) must remain less than one (1): because log(1) = 0, a value of one would make the denominator of Equation 2 zero.
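The conservative adjustment described above can be sketched as follows. It assumes the large-sample formula for D (so the existing data set must contain more than 35 values), and the data set of 50 values in the usage example is hypothetical. `conservative_wells` is an illustrative name.

```python
import math


def conservative_wells(p_alo, p_ltc, d):
    """Equation 2 with P(LTC) increased by Kolmogorov's D.

    Returns (raw n, n rounded up). Raises ValueError when P(LTC) + D >= 1,
    because log(1) = 0 would make the denominator of Equation 2 zero and
    values above one are not probabilities.
    """
    adjusted = p_ltc + d
    if adjusted >= 1.0:
        raise ValueError("P(LTC) + D must be less than 1")
    n = math.log(1.0 - p_alo) / math.log(adjusted)
    return n, math.ceil(n - 1e-9)


# Hypothetical: P(LTC) = 0.68 read from only 50 existing values,
# with D at 90 percent confidence from the large-sample formula.
d_90 = 1.22 / math.sqrt(50)
n, wells = conservative_wells(0.90, 0.68, d_90)
print(f"D = {d_90:.3f}, adjusted P(LTC) = {0.68 + d_90:.3f}, test wells = {wells}")
```

With these hypothetical inputs the adjusted estimate is about 15 test wells, compared with 6 when P(LTC) = 0.68 is taken at face value. When P(LTC) is already close to one, as in the 500 ft/day example, a modest D can push the sum to one or above, and the function then raises an error instead of returning a meaningless n.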

Kolmogorov D-Statistic

Figure 3.  Graph of selected values of Kolmogorov's D-Statistic.

One can increase the chance of success if test wells can be placed at geologically advantageous locations.  Identifying such locations may involve aerial photograph interpretation, interpretation of satellite images, geophysical investigations, stratigraphic facies mapping, outcrop studies of fracture density and orientation, geomorphology investigation, calibration of groundwater flow models, and other geologic studies, as appropriate for a particular terrane.

The concept of exceedance probability is also potentially useful in petroleum exploration and mineral exploration.  It is widely used in flood hazard reporting.

Reference for Aquifer Exploration Probability

Dixon, Wilfrid J. and Frank J. Massey, Jr. (1957): Introduction to Statistical Analysis, Second Edition; McGraw-Hill.

Revised May 26, 2023