@@ -26,10 +26,10 @@ numeric (:class:`Real`-valued) data.
 Unless explicitly noted otherwise, these functions support :class:`int`,
 :class:`float`, :class:`decimal.Decimal` and :class:`fractions.Fraction`.
 Behaviour with other types (whether in the numeric tower or not) is
-currently unsupported. Mixed types are also undefined and
-implementation-dependent. If your input data consists of mixed types,
-you may be able to use :func:`map` to ensure a consistent result, e.g.
-``map(float, input_data)``.
+currently unsupported. Collections with a mix of types are also undefined
+and implementation-dependent. If your input data consists of mixed types,
+you may be able to use :func:`map` to ensure a consistent result, for
+example: ``map(float, input_data)``.
 
 Averages and measures of central location
 -----------------------------------------
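As a small illustration of the mixed-type advice in this hunk, the sketch below coerces a mixed collection to floats before averaging. The name ``input_data`` comes from the text; its contents are invented for the example.

```python
from fractions import Fraction
from statistics import mean

# Hypothetical mixed-type input: an int, a float and a Fraction together.
input_data = [1, 2.5, Fraction(7, 2)]

# Coerce every value to float first so the result type is predictable.
result = mean(map(float, input_data))
print(type(result).__name__, result)
```

Coercing to one type up front sidesteps the behaviour the text describes as undefined and implementation-dependent.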
@@ -102,11 +102,9 @@ However, for reading convenience, most of the examples show sorted sequences.
    .. note::
 
       The mean is strongly affected by outliers and is not a robust estimator
-      for central location: the mean is not necessarily a typical example of the
-      data points. For more robust, although less efficient, measures of
-      central location, see :func:`median` and :func:`mode`. (In this case,
-      "efficient" refers to statistical efficiency rather than computational
-      efficiency.)
+      for central location: the mean is not necessarily a typical example of
+      the data points. For more robust measures of central location, see
+      :func:`median` and :func:`mode`.
 
       The sample mean gives an unbiased estimate of the true population mean,
       which means that, taken on average over all the possible samples,
@@ -120,9 +118,8 @@ However, for reading convenience, most of the examples show sorted sequences.
    Convert *data* to floats and compute the arithmetic mean.
 
    This runs faster than the :func:`mean` function and it always returns a
-   :class:`float`. The result is highly accurate but not as perfect as
-   :func:`mean`. If the input dataset is empty, raises a
-   :exc:`StatisticsError`.
+   :class:`float`. The *data* may be a sequence or iterator. If the input
+   dataset is empty, raises a :exc:`StatisticsError`.
 
    .. doctest::
 
@@ -136,15 +133,20 @@ However, for reading convenience, most of the examples show sorted sequences.
 
    Convert *data* to floats and compute the geometric mean.
 
+   The geometric mean indicates the central tendency or typical value of the
+   *data* using the product of the values (as opposed to the arithmetic mean
+   which uses their sum).
+
    Raises a :exc:`StatisticsError` if the input dataset is empty,
    if it contains a zero, or if it contains a negative value.
+   The *data* may be a sequence or iterator.
 
    No special efforts are made to achieve exact results.
    (However, this may change in the future.)
 
    .. doctest::
 
-      >>> round(geometric_mean([54, 24, 36]), 9)
+      >>> round(geometric_mean([54, 24, 36]), 1)
       36.0
 
    .. versionadded:: 3.8
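The product-versus-sum contrast added in this hunk can be sketched directly. This is a minimal illustration that reuses the doctest's values; the variable names are invented, and the n-th root of the product is the textbook definition, not necessarily how the library computes it.

```python
from math import isclose, prod
from statistics import fmean, geometric_mean

data = [54, 24, 36]

# Geometric mean: the n-th root of the product of the values.
by_product = prod(data) ** (1 / len(data))

# Arithmetic mean: the sum of the values divided by n.
by_sum = fmean(data)

# The library result should agree with the product-based definition.
agrees = isclose(geometric_mean(data), by_product)
print(round(by_product, 1), by_sum, agrees)
```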
@@ -174,7 +176,7 @@ However, for reading convenience, most of the examples show sorted sequences.
       3.6
 
    Using the arithmetic mean would give an average of about 5.167, which
-   is too high.
+   is well over the aggregate P/E ratio.
 
    :exc:`StatisticsError` is raised if *data* is empty, or any element
    is less than zero.
@@ -312,10 +314,10 @@ However, for reading convenience, most of the examples show sorted sequences.
    The mode (when it exists) is the most typical value and serves as a
    measure of central location.
 
-   If there are multiple modes, returns the first one encountered in the *data*.
-   If the smallest or largest of multiple modes is desired instead, use
-   ``min(multimode(data))`` or ``max(multimode(data))``. If the input *data* is
-   empty, :exc:`StatisticsError` is raised.
+   If there are multiple modes with the same frequency, returns the first one
+   encountered in the *data*. If the smallest or largest of those is
+   desired instead, use ``min(multimode(data))`` or ``max(multimode(data))``.
+   If the input *data* is empty, :exc:`StatisticsError` is raised.
 
    ``mode`` assumes discrete data, and returns a single value. This is the
    standard treatment of the mode as commonly taught in schools:
@@ -325,8 +327,8 @@ However, for reading convenience, most of the examples show sorted sequences.
       >>> mode([1, 1, 2, 3, 3, 3, 3, 4])
       3
 
-   The mode is unique in that it is the only statistic which also applies
-   to nominal (non-numeric) data:
+   The mode is unique in that it is the only statistic in this package that
+   also applies to nominal (non-numeric) data:
 
    .. doctest::
 
@@ -368,15 +370,16 @@ However, for reading convenience, most of the examples show sorted sequences.
 
 .. function:: pvariance(data, mu=None)
 
-   Return the population variance of *data*, a non-empty iterable of real-valued
-   numbers. Variance, or second moment about the mean, is a measure of the
-   variability (spread or dispersion) of data. A large variance indicates that
-   the data is spread out; a small variance indicates it is clustered closely
-   around the mean.
+   Return the population variance of *data*, a non-empty sequence or iterator
+   of real-valued numbers. Variance, or second moment about the mean, is a
+   measure of the variability (spread or dispersion) of data. A large
+   variance indicates that the data is spread out; a small variance indicates
+   it is clustered closely around the mean.
 
-   If the optional second argument *mu* is given, it should be the mean of
-   *data*. If it is missing or ``None`` (the default), the mean is
-   automatically calculated.
+   If the optional second argument *mu* is given, it is typically the mean of
+   the *data*. It can also be used to compute the second moment around a
+   point that is not the mean. If it is missing or ``None`` (the default),
+   the arithmetic mean is automatically calculated.
 
    Use this function to calculate the variance from the entire population. To
    estimate the variance from a sample, the :func:`variance` function is usually
@@ -401,10 +404,6 @@ However, for reading convenience, most of the examples show sorted sequences.
       >>> pvariance(data, mu)
       1.25
 
-   This function does not attempt to verify that you have passed the actual mean
-   as *mu*. Using arbitrary values for *mu* may lead to invalid or impossible
-   results.
-
    Decimals and Fractions are supported:
 
    .. doctest::
@@ -423,11 +422,11 @@ However, for reading convenience, most of the examples show sorted sequences.
    σ². When called on a sample instead, this is the biased sample variance
    s², also known as variance with N degrees of freedom.
 
-   If you somehow know the true population mean μ, you may use this function
-   to calculate the variance of a sample, giving the known population mean as
-   the second argument. Provided the data points are representative
-   (e.g. independent and identically distributed), the result will be an
-   unbiased estimate of the population variance.
+   If you somehow know the true population mean μ, you may use this
+   function to calculate the variance of a sample, giving the known
+   population mean as the second argument. Provided the data points are a
+   random sample of the population, the result will be an unbiased estimate
+   of the population variance.
 
 
 .. function:: stdev(data, xbar=None)
@@ -502,19 +501,19 @@ However, for reading convenience, most of the examples show sorted sequences.
    :func:`pvariance` function as the *mu* parameter to get the variance of a
    sample.
 
-.. function:: quantiles(dist, *, n=4, method='exclusive')
+.. function:: quantiles(data, *, n=4, method='exclusive')
 
-   Divide *dist* into *n* continuous intervals with equal probability.
+   Divide *data* into *n* continuous intervals with equal probability.
    Returns a list of ``n - 1`` cut points separating the intervals.
 
    Set *n* to 4 for quartiles (the default). Set *n* to 10 for deciles. Set
    *n* to 100 for percentiles which gives the 99 cuts points that separate
-   *dist* in to 100 equal sized groups. Raises :exc:`StatisticsError` if *n*
+   *data* in to 100 equal sized groups. Raises :exc:`StatisticsError` if *n*
    is not least 1.
 
-   The *dist* can be any iterable containing sample data or it can be an
+   The *data* can be any iterable containing sample data or it can be an
    instance of a class that defines an :meth:`~inv_cdf` method. For meaningful
-   results, the number of data points in *dist* should be larger than *n*.
+   results, the number of data points in *data* should be larger than *n*.
    Raises :exc:`StatisticsError` if there are not at least two data points.
 
    For sample data, the cut points are linearly interpolated from the
@@ -523,7 +522,7 @@ However, for reading convenience, most of the examples show sorted sequences.
    cut-point will evaluate to ``104``.
 
    The *method* for computing quantiles can be varied depending on
-   whether the data in *dist* includes or excludes the lowest and
+   whether the data in *data* includes or excludes the lowest and
    highest possible values from the population.
 
    The default *method* is "exclusive" and is used for data sampled from
@@ -535,14 +534,14 @@ However, for reading convenience, most of the examples show sorted sequences.
 
    Setting the *method* to "inclusive" is used for describing population
    data or for samples that are known to include the most extreme values
-   from the population. The minimum value in *dist* is treated as the 0th
+   from the population. The minimum value in *data* is treated as the 0th
    percentile and the maximum value is treated as the 100th percentile.
    The portion of the population falling below the *i-th* of *m* sorted
    data points is computed as ``(i - 1) / (m - 1)``. Given 11 sample
    values, the method sorts them and assigns the following percentiles:
    0%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100%.
 
-   If *dist* is an instance of a class that defines an
+   If *data* is an instance of a class that defines an
    :meth:`~inv_cdf` method, setting *method* has no effect.
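To make the difference between the two methods concrete, here is a minimal sketch on 11 made-up values, matching the "11 sample values" case in the text. The data and variable names are illustrative only.

```python
from statistics import quantiles

data = list(range(1, 12))  # 11 illustrative values: 1, 2, ..., 11

# Percentile positions assigned by the "inclusive" rule, (i - 1) / (m - 1).
m = len(data)
positions = [(i - 1) / (m - 1) for i in range(1, m + 1)]

# Inclusive: the minimum is the 0th percentile and the maximum the 100th.
inclusive = quantiles(data, n=4, method='inclusive')

# Exclusive (the default): allows for more extreme values in the population.
exclusive = quantiles(data, n=4, method='exclusive')

print(positions)
print(inclusive)  # [3.5, 6.0, 8.5]
print(exclusive)  # [3.0, 6.0, 9.0]
```

The exclusive quartiles sit further from the median because the sample's own minimum and maximum are not assumed to be the population extremes.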
 
    .. doctest::
@@ -580,7 +579,7 @@ A single exception is defined:
 :class:`NormalDist` is a tool for creating and manipulating normal
 distributions of a `random variable
 <http://www.stat.yale.edu/Courses/1997-98/101/ranvar.htm>`_. It is a
-composite class that treats the mean and standard deviation of data
+class that treats the mean and standard deviation of data
 measurements as a single entity.
 
 Normal distributions arise from the `Central Limit Theorem
@@ -616,13 +615,14 @@ of applications in statistics.
 
     .. classmethod:: NormalDist.from_samples(data)
 
-       Makes a normal distribution instance computed from sample data. The
-       *data* can be any :term:`iterable` and should consist of values that
-       can be converted to type :class:`float`.
+       Makes a normal distribution instance with *mu* and *sigma* parameters
+       estimated from the *data* using :func:`fmean` and :func:`stdev`.
 
-       If *data* does not contain at least two elements, raises
-       :exc:`StatisticsError` because it takes at least one point to estimate
-       a central value and at least two points to estimate dispersion.
+       The *data* can be any :term:`iterable` and should consist of values
+       that can be converted to type :class:`float`. If *data* does not
+       contain at least two elements, raises :exc:`StatisticsError` because it
+       takes at least one point to estimate a central value and at least two
+       points to estimate dispersion.
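The relationship stated in the added text can be checked with a short sketch. The sample values and variable names below are invented for illustration; the comparison uses a tolerance because the two code paths need not round identically.

```python
from math import isclose
from statistics import NormalDist, StatisticsError, fmean, stdev

data = [2.5, 3.1, 2.1, 2.4, 2.7, 3.5]  # illustrative sample values

dist = NormalDist.from_samples(data)

# The fitted parameters should match the direct estimates.
same_mu = isclose(dist.mean, fmean(data))
same_sigma = isclose(dist.stdev, stdev(data))

# One point can locate a centre but cannot estimate dispersion.
try:
    NormalDist.from_samples([1.0])
except StatisticsError:
    too_few_rejected = True
else:
    too_few_rejected = False

print(same_mu, same_sigma, too_few_rejected)
```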
 
     .. method:: NormalDist.samples(n, *, seed=None)
 
@@ -636,10 +636,10 @@ of applications in statistics.
     .. method:: NormalDist.pdf(x)
 
        Using a `probability density function (pdf)
-       <https://en.wikipedia.org/wiki/Probability_density_function>`_,
-       compute the relative likelihood that a random variable *X* will be near
-       the given value *x*. Mathematically, it is the ratio ``P(x <= X <
-       x+dx) / dx``.
+       <https://en.wikipedia.org/wiki/Probability_density_function>`_, compute
+       the relative likelihood that a random variable *X* will be near the
+       given value *x*. Mathematically, it is the limit of the ratio ``P(x <=
+       X < x+dx) / dx`` as *dx* approaches zero.
 
        The relative likelihood is computed as the probability of a sample
        occurring in a narrow range divided by the width of the range (hence
@@ -667,8 +667,10 @@ of applications in statistics.
 
     .. method:: NormalDist.overlap(other)
 
-       Returns a value between 0.0 and 1.0 giving the overlapping area for
-       the two probability density functions.
+       Measures the agreement between two normal probability distributions.
+       Returns a value between 0.0 and 1.0 giving `the overlapping area for
+       the two probability density functions
+       <https://www.rasch.org/rmt/rmt101r.htm>`_.
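A brief sketch of how the overlap value behaves, assuming two arbitrarily chosen distributions (the parameters and variable names are illustrative only):

```python
from math import isclose
from statistics import NormalDist

a = NormalDist(mu=2.4, sigma=1.6)
b = NormalDist(mu=3.2, sigma=2.0)

partial = a.overlap(b)     # partial agreement: strictly between 0 and 1
identical = a.overlap(a)   # a distribution agrees completely with itself
swapped = b.overlap(a)     # the measure does not depend on argument order

print(partial, identical, isclose(partial, swapped))
```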
 
     Instances of :class:`NormalDist` support addition, subtraction,
     multiplication and division by a constant. These operations
@@ -740,12 +742,11 @@ Carlo simulation <https://en.wikipedia.org/wiki/Monte_Carlo_method>`_:
     ...     return (3*x + 7*x*y - 5*y) / (11 * z)
     ...
     >>> n = 100_000
-    >>> seed = 86753099035768
-    >>> X = NormalDist(10, 2.5).samples(n, seed=seed)
-    >>> Y = NormalDist(15, 1.75).samples(n, seed=seed)
-    >>> Z = NormalDist(50, 1.25).samples(n, seed=seed)
-    >>> NormalDist.from_samples(map(model, X, Y, Z))       # doctest: +SKIP
-    NormalDist(mu=1.8661894803304777, sigma=0.65238717376862)
+    >>> X = NormalDist(10, 2.5).samples(n, seed=3652260728)
+    >>> Y = NormalDist(15, 1.75).samples(n, seed=4582495471)
+    >>> Z = NormalDist(50, 1.25).samples(n, seed=6582483453)
+    >>> quantiles(map(model, X, Y, Z))       # doctest: +SKIP
+    [1.4591308524824727, 1.8035946855390597, 2.175091447274739]
 
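The diff does not say why the shared seed was replaced with three distinct ones. One plausible reason, offered here as an assumption, is that samples drawn with the same seed move in lockstep and so are not independent inputs. The sketch below checks this by standardizing each sample; the seed values and variable names are arbitrary.

```python
from math import isclose
from statistics import NormalDist

n = 1_000
shared_seed = 12345  # arbitrary illustrative seed

X = NormalDist(10, 2.5).samples(n, seed=shared_seed)
Y_same = NormalDist(15, 1.75).samples(n, seed=shared_seed)
Y_other = NormalDist(15, 1.75).samples(n, seed=67890)

# Standardize each sample: subtract the mean, divide by the standard deviation.
zx = [(x - 10) / 2.5 for x in X]
zy_same = [(y - 15) / 1.75 for y in Y_same]
zy_other = [(y - 15) / 1.75 for y in Y_other]

# With a shared seed, the standardized draws coincide point for point.
lockstep = all(isclose(a, b, abs_tol=1e-9) for a, b in zip(zx, zy_same))

# With distinct seeds, they do not.
independent_looking = not all(
    isclose(a, b, abs_tol=1e-9) for a, b in zip(zx, zy_other)
)

print(lockstep, independent_looking)
```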
 Normal distributions commonly arise in machine learning problems.
 