Variogram Class¶

class skgstat.Variogram(coordinates=None, values=None, estimator='matheron', model='spherical', dist_func='euclidean', bin_func='even', normalize=False, fit_method='trf', fit_sigma=None, use_nugget=False, maxlag=None, samples=None, n_lags=10, verbose=False, **kwargs)[source]¶

Variogram Class

Calculates a variogram of the separating distances in the given coordinates and relates them to one of the semi-variance measures of the given dependent values.

__init__(coordinates=None, values=None, estimator='matheron', model='spherical', dist_func='euclidean', bin_func='even', normalize=False, fit_method='trf', fit_sigma=None, use_nugget=False, maxlag=None, samples=None, n_lags=10, verbose=False, **kwargs)[source]¶

Variogram Class

Parameters

coordinates (numpy.ndarray, MetricSpace) –

Changed in version 0.5.0: now accepts MetricSpace

Array of shape (m, n). Will be used as m observation points of n-dimensions. This variogram can be calculated on 1 - n dimensional coordinates. In case a 1-dimensional array is passed, a second array of same length containing only zeros will be stacked to the passed one. For very large datasets, you can set maxlag to only calculate distances within the maximum lag in a sparse matrix. Alternatively you can supply a MetricSpace (optionally with a max_dist set for the same effect). This is useful if you’re creating many different variograms for different measured parameters that are all measured at the same set of coordinates, as distances will only be calculated once, instead of once per variogram.
values (numpy.ndarray) – Array of values observed at the given coordinates. The length of the values array has to match the m dimension of the coordinates array. Will be used to calculate the dependent variable of the variogram.
estimator (str, callable) –
String identifying the semi-variance estimator to be used. Defaults to the Matheron estimator. Possible values are:
- matheron [Matheron, default]
- cressie [Cressie-Hawkins]
- dowd [Dowd-Estimator]
- genton [Genton]
- minmax [MinMax Scaler]
- entropy [Shannon Entropy]
If a callable is passed, it has to accept an array of absolute differences, aligned to the 1D distance matrix (flattened upper triangle), and return a scalar that converges towards small values for similarity (high covariance).
model (str) –
String identifying the theoretical variogram function to be used to describe the experimental variogram. Can be one of:
- spherical [Spherical, default]
- exponential [Exponential]
- gaussian [Gaussian]
- cubic [Cubic]
- stable [Stable model]
- matern [Matérn model]
- nugget [nugget effect variogram]
dist_func (str) – String identifying the distance function. Defaults to ‘euclidean’. Can be any metric accepted by scipy.spatial.distance.pdist. Additional parameters are not (yet) passed through to pdist. These are accepted by pdist for some of the metrics. In these cases the default values are used.
bin_func (str | Callable | Iterable) –

Changed in version 0.3.8: added ‘fd’, ‘sturges’, ‘scott’, ‘sqrt’, ‘doane’

Changed in version 0.3.9: added ‘kmeans’, ‘ward’

String identifying the binning function used to find lag class edges. All methods calculate bin edges on the interval [0, maxlag[. Possible values are:
- ’even’ (default) finds n_lags same width bins
- ’uniform’ forms n_lags bins of same data count
- ’fd’ applies Freedman-Diaconis estimator to find n_lags
- ’sturges’ applies Sturges’ rule to find n_lags.
- ’scott’ applies Scott’s rule to find n_lags
- ’doane’ applies Doane’s extension to Sturges’ rule to find n_lags
- ’sqrt’ uses the square root of the number of distances as n_lags.
- ’kmeans’ uses KMeans clustering to find well-supported bins
- ’ward’ uses hierarchical clustering to find minimum-variance clusters.
More details are given in the documentation for set_bin_func.
normalize (bool) – Defaults to False. If True, the independent and dependent variable will be normalized to the range [0,1].
fit_method (str | None) –

Changed in version 0.3.10: Added ‘ml’ and ‘custom’

String identifying the method to be used for fitting the theoretical variogram function to the experimental. If None is passed, the fit does not run. More info is given in the Variogram.fit docs. Can be one of:
- ’lm’: Levenberg-Marquardt algorithm for unconstrained problems. This is the faster algorithm, but fitting a variogram is not an unconstrained problem.
- ’trf’: Trust Region Reflective function for non-linear constrained problems. The class will set the boundaries itself. This is the default function.
- ’ml’: Maximum-Likelihood estimation. With the current implementation only the Nelder-Mead solver for unconstrained problems is implemented. This will estimate the variogram parameters from a Gaussian parameter space by minimizing the negative log-likelihood.
- ’manual’: Manual fitting. You can set the range, sill and nugget either directly to the fit function, or as fit_ prefixed keyword arguments on Variogram instantiation.
fit_sigma (numpy.ndarray, str) –
Defaults to None. The sigma is used as a measure of uncertainty during the variogram fit. If fit_sigma is an array, it has to hold n_lags elements, giving the uncertainty for each lag class. If fit_sigma is None (default), no weighting is applied, so all lag classes contribute equally. Higher values indicate higher uncertainty and will lower the influence of the corresponding lag class on the fit. If fit_sigma is a string, a pre-defined function of separating distance will be used to fill the array. Can be one of:
- ’linear’: Linear loss with distance. Small bins will have higher impact.
- ’exp’: The weights decrease by an exponential function of distance
- ’sqrt’: The weights decrease by the square root of distance
- ’sq’: The weights decrease by the squared distance.
More info is given in the Variogram.fit_sigma documentation.
use_nugget (bool) – Defaults to False. If True, a nugget effect will be added to all Variogram.models as a third (or fourth) fitting parameter. A nugget is essentially the y-axis intercept of the theoretical variogram function.
maxlag (float, str) – Can specify the maximum lag distance directly by giving a value larger than 1. The binning function will not find any lag class with an edge larger than maxlag. If 0 < maxlag < 1, then maxlag is relative and maxlag * max(Variogram.distance) will be used. In case maxlag is a string, it has to be one of ‘median’, ‘mean’. Then the median or mean of all Variogram.distance will be used. Note that maxlag=0.5 will use half the maximum separating distance; this is not the same as ‘median’, which is the median of all separating distances.
samples (float, int) – If set to a non-None value, point pairs are sampled randomly. Two random subsets of all points are chosen, and the distance matrix is calculated only between these two subsets. The size of each subset is set by samples: if < 1 it specifies a fraction of all points, if >= 1 it specifies the number of points in each subset.
n_lags (int) – Specify the number of lag classes to be defined by the binning function.
verbose (bool) – Set the Verbosity of the class. Not Implemented yet.
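
The three ways of specifying maxlag described above can be sketched in plain Python. This is only an illustration of the documented rules, not the library's code: resolve_maxlag is a hypothetical helper, and both the handling of None and the treatment of every number outside 0 < maxlag < 1 as an absolute limit are assumptions.

```python
from statistics import mean, median


def resolve_maxlag(maxlag, distances):
    """Translate a maxlag setting into an absolute distance.

    Hypothetical helper for illustration; not part of scikit-gstat.
    """
    if maxlag is None:
        # Assumption: without a limit, the largest distance is used.
        return max(distances)
    if isinstance(maxlag, str):
        if maxlag == "median":
            return median(distances)
        if maxlag == "mean":
            return mean(distances)
        raise ValueError("maxlag string has to be 'median' or 'mean'")
    if 0 < maxlag < 1:
        # Relative setting: a share of the maximum separating distance.
        return maxlag * max(distances)
    # Assumption: any other number is taken as an absolute lag distance.
    return maxlag


distances = [1.0, 2.0, 2.5, 3.0, 10.0]
print(resolve_maxlag(0.5, distances))       # half of the maximum distance
print(resolve_maxlag("median", distances))  # median of all distances
print(resolve_maxlag(4.0, distances))       # absolute limit
```

With a skewed distance distribution like the one above, the relative setting 0.5 and the string 'median' resolve to clearly different limits.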

Keyword Arguments

entropy_bins (int, str) –

New in version 0.3.7.

If the estimator is set to ‘entropy’, this argument sets the number of bins that should be used for histogram calculation.
percentile (int) –

New in version 0.3.7.

If the estimator is set to ‘entropy’, this argument sets the percentile to be used.
binning_random_state (int, None) –

New in version 0.3.9.

If bin_func is ‘kmeans’ this can overwrite the seed for the initial guess of the cluster centroids. Note, that K-Means is not deterministic and is therefore seeded to 42 here. You can pass None to disable this behavior, but use it with care, as you will get different results.
binning_agg_func (str) –

New in version 0.3.10.

If bin_func is ‘ward’ this keyword argument can switch from default mean aggregation to median aggregation for calculating the cluster centroids.
obs_sigma (int, float) –

New in version 0.6.0.

If set, the Variogram will use this sigma as the standard deviation of the observations passed as values. Using a Monte Carlo simulation, the uncertainties are propagated into the experimental variogram. If present, the plot will indicate the confidence interval as error bars around the experimental variogram.
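
The estimator parameter described above also accepts a callable that receives the absolute differences and returns a scalar. A minimal sketch of such callables follows; matheron_like restates the classical Matheron formula (sum of squared differences divided by twice their count), and mean_abs_difference is a hypothetical custom estimator. Both are written for the differences of a single lag class, which is an assumption about how the callable gets applied.

```python
def matheron_like(abs_diffs):
    """Classical semi-variance: sum of squared differences / (2 * N)."""
    n = len(abs_diffs)
    if n == 0:
        # No point pairs in this lag class.
        return float("nan")
    return sum(d * d for d in abs_diffs) / (2.0 * n)


def mean_abs_difference(abs_diffs):
    """Hypothetical custom estimator: the mean absolute difference."""
    n = len(abs_diffs)
    if n == 0:
        return float("nan")
    return sum(abs_diffs) / n


lag_class = [1.0, 3.0]
print(matheron_like(lag_class))
print(mean_abs_difference(lag_class))
```

Both functions return small values when the paired observations are similar, which is the property the estimator argument asks for.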

NS¶

Nash Sutcliffe efficiency of the fitted Variogram

Returns

aic¶

bic¶

bin_func¶

Binning function

Returns an instance of the function used for binning the separating distances into the given amount of bins. Both functions use the same signature of func(distances, n, maxlag).

The setter of this property utilizes the Variogram.set_bin_func to set a new function.

Returns: binning_function
Return type: function

See also

Variogram.set_bin_func

bins¶

Distance lag bins

Independent variable of the experimental variogram sample. The bins are the upper edges of all calculated distance lag classes. If you need bin centers, use get_empirical.

Returns: bins – 1D array of the distance lag classes.
Return type: numpy.ndarray

See also

Variogram.get_empirical

clone()[source]¶

Deep copy of self

Return a deep copy of self.

Returns
Return type: Variogram

coordinates¶

Coordinates property

Array of observation locations the variogram is built for. This property has no setter. If you want to change the coordinates, use a new Variogram instance.

Returns: coordinates
Return type: numpy.array

cross_validate(method: str = 'jacknife', n: int = None, metric: str = 'rmse', seed=None) → float[source]¶

Cross-validation of the variogram model by means of Kriging. Right now, this function can only perform a jackknife (leave-one-out) cross-validation and will only use the built-in OrdinaryKriging method (not yet the to_gs_krige interface).

Parameters

method (str) – Right now, ‘jacknife’ is the only possible input.
n (int) – The number of points to be included into the cross-validation. If None (default), all points will be used.
metric (str) – Metric used for cross-validation. Can be root mean square error (rmse), mean squared error (mse) or mean absolute error (mae).
seed (int) – If n is not None, the random selection of input data for the cross-validation can be seeded.

Returns

metric – The cross-validation result as specified above.

Return type

float
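
The leave-one-out procedure described above can be sketched as follows. Note the deliberate substitution: the predictor here is plain inverse distance weighting, used only as a self-contained stand-in for the OrdinaryKriging step. The names idw_predict and jackknife are hypothetical; the n, metric and seed arguments mirror the parameters documented above.

```python
import math
import random


def idw_predict(coords, values, target, power=2.0):
    """Stand-in predictor (inverse distance weighting), NOT ordinary kriging."""
    num = den = 0.0
    for c, v in zip(coords, values):
        d = math.dist(c, target)
        if d == 0.0:
            return v
        w = 1.0 / d ** power
        num += w * v
        den += w
    return num / den


def jackknife(coords, values, n=None, metric="rmse", seed=None):
    """Leave-one-out validation over all points or a random subset of n."""
    idx = list(range(len(values)))
    if n is not None:
        # Seeded random selection of the points to validate against.
        idx = random.Random(seed).sample(idx, n)
    errors = []
    for i in idx:
        # Predict point i from all remaining observations.
        rest_c = coords[:i] + coords[i + 1:]
        rest_v = values[:i] + values[i + 1:]
        errors.append(idw_predict(rest_c, rest_v, coords[i]) - values[i])
    if metric == "mae":
        return sum(abs(e) for e in errors) / len(errors)
    mse = sum(e * e for e in errors) / len(errors)
    if metric == "mse":
        return mse
    if metric == "rmse":
        return math.sqrt(mse)
    raise ValueError("metric has to be 'rmse', 'mse' or 'mae'")


coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
values = [1.0, 2.0, 2.0, 3.0, 2.0]
print(jackknife(coords, values))
print(jackknife(coords, values, n=3, metric="mae", seed=42))
```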

data(n=100, force=False)[source]¶

Theoretical variogram function

Calculate the experimental variogram and apply the binning. On success, the variogram model will be fitted and applied to n lag values. Returns the lags and the calculated semi-variance values. If force is True, a clean preprocessing and fitting run will be executed.

Parameters

n (integer) – length of the lags array to be used for fitting. Defaults to 100, which will be fine for most plots
force (boolean) – If True, the preprocessing and fitting will be executed as a clean run. This will force all intermediate results to be recalculated. Defaults to False

Returns

variogram – The first element is the created lags array; the second element holds the calculated semi-variance values.

Return type

tuple

describe(short=False, flat=False)[source]¶

Variogram parameters

Return a dictionary of the variogram parameters.

Changed in version 0.3.7: describe now returns all init parameters under the describe()[‘params’] key and all keyword arguments under describe()[‘kwargs’]. This output can be suppressed by setting short=True.

Parameters

short (bool) – If True, the ‘params’ and ‘kwargs’ keys will be omitted. Defaults to False.
flat (bool) – If True, the nested ‘params’ and ‘kwargs’ dicts will be merged into the main dict to return a flat dict. Defaults to False.

Returns

parameters – Returns fitting parameters of the theoretical variogram model along with the init parameters of the Variogram instance.

Return type

dict
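
The effect of the short and flat options can be illustrated on a plain dictionary. The sketch below is an assumption about the general shape of the output: only the ‘params’ and ‘kwargs’ keys are taken from the description above, all other keys and values are made up, and shorten and flatten are hypothetical helpers.

```python
def shorten(description):
    """Mimic short=True: drop the nested 'params' and 'kwargs' entries."""
    return {k: v for k, v in description.items() if k not in ("params", "kwargs")}


def flatten(description):
    """Mimic flat=True: merge 'params' and 'kwargs' into the top level."""
    flat = shorten(description)
    flat.update(description.get("params", {}))
    flat.update(description.get("kwargs", {}))
    return flat


# Hypothetical describe() output; keys other than 'params' and 'kwargs'
# are placeholders chosen for this example.
nested = {
    "effective_range": 12.5,
    "sill": 3.1,
    "nugget": 0.0,
    "params": {"estimator": "matheron", "n_lags": 10},
    "kwargs": {"entropy_bins": 15},
}

print(shorten(nested))
print(flatten(nested))
```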

dim¶: Input coordinates dimensionality.

distance_difference_plot(ax=None, plot_bins=True, show=True)[source]¶

Raw distance plot

Plots the absolute value differences of all point pair combinations over their separating distance, without sorting them into lag classes.

Changed in version 0.4.0: This plot can be plotted with the plotly plotting backend

Parameters

ax (None, AxesSubplot) – If None, a new matplotlib.Figure will be created. In case a Figure was already created, pass the Subplot to use as ax argument.
plot_bins (bool) – If True (default) the bin edges will be included into the plot.
show (bool) – If True (default), the show method of the Figure will be called before returning the Figure. Can be set to False, to avoid doubled figure rendering in Jupyter notebooks.

Returns

Return type

matplotlib.pyplot.Figure

experimental¶

Experimental Variogram

Array of experimental (empirical) semivariance values. The array length will be aligned to Variogram.bins. The current Variogram.estimator has been used to calculate the values. Depending on the setting of Variogram.harmonize (True | False), either Variogram._experimental or Variogram.isotonic will be returned.

Returns: vario – Array of the experimental semi-variance values aligned to Variogram.bins.
Return type: numpy.ndarray

See also

Variogram._experimental, Variogram.isotonic

fit(force=False, method=None, sigma=None, **kwargs)[source]¶

Fit the variogram

The fit function will fit the theoretical variogram function to the experimental. The preprocessed distance matrix, pairwise differences and binning will not be recalculated, if already done. This could be forced by setting the force parameter to true.

If you call the fit function directly with method or sigma, the parameters set on Variogram object instantiation will be overwritten. All other keyword arguments will be passed to the scipy.optimize.curve_fit function.

Changed in version 0.3.10: added ‘ml’ and ‘custom’ method.

Parameters

force (bool) – If set to True, a clean preprocessing of the distance matrix, pairwise differences and the binning will be forced. Default is False.
method (string) –
A string identifying one of the implemented fitting procedures. Can be one of:
- ’lm’: Levenberg-Marquardt algorithm implemented in the scipy.optimize.leastsq function.
- ’trf’: Trust Region Reflective algorithm implemented in scipy.optimize.least_squares(method=’trf’)
- ’ml’: Maximum-Likelihood estimation. With the current implementation only the Nelder-Mead solver for unconstrained problems is implemented. This will estimate the variogram parameters from a Gaussian parameter space by minimizing the negative log-likelihood.
- ’manual’: Manual fitting. You can set the range, sill and nugget either directly to the fit function, or as fit_ prefixed keyword arguments on Variogram instantiation.
sigma (string, array) – Uncertainty array for the bins. Has to have the same dimension as self.bins. Refer to Variogram.fit_sigma for more information.

Returns
Return type: void

See also

scipy.optimize.minimize(), scipy.optimize.curve_fit(), scipy.optimize.leastsq(), scipy.optimize.least_squares()

fit_method¶

New in version 0.6.2.

Set the fit method to be used for this Variogram instance. Possible values are:

'trf' - Trust-Region Reflective (default)
'lm' - Levenberg-Marquardt
'ml' - Maximum Likelihood estimation
'manual' - Manual fitting by setting the parameters

Changed in version 0.6.6: Passing None will prevent the fitting procedure from running.

Notes

The default method (TRF) is a bounded least squares method, that sets constraints to the value space of all parameters. All methods use an initial guess for all used parameters. This is max(bins) for the range, max(experimental) for the sill, 20 for the Matérn smoothness, 2 for the stable model shape and 1 for the nugget if used.
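
The initial guess described in the note above can be written out as a small helper. This is a sketch of the stated rule only; initial_guess is a hypothetical name, and the parameter order (range, sill, optional shape, optional nugget) is an assumption taken from the parameters property of this class.

```python
def initial_guess(bins, experimental, model="spherical", use_nugget=False):
    """Build the starting values for the fit as described in the note."""
    guess = [max(bins), max(experimental)]  # range, sill
    if model == "matern":
        guess.append(20)  # Matérn smoothness
    elif model == "stable":
        guess.append(2)   # stable model shape
    if use_nugget:
        guess.append(1)   # nugget
    return guess


bins = [10.0, 20.0, 30.0]
experimental = [0.5, 0.9, 1.2]
print(initial_guess(bins, experimental))
print(initial_guess(bins, experimental, model="matern", use_nugget=True))
```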

fit_sigma¶

Fitting Uncertainty

Set or calculate an array of observation uncertainties aligned to the Variogram.bins. These will be used to weight the observations in the cost function, which divides the residuals by their uncertainty.

When setting fit_sigma, the array of uncertainties itself can be given, or one of the strings: [‘linear’, ‘exp’, ‘sqrt’, ‘sq’, ‘entropy’]. The parameters described below refer to the setter of this property.

Changed in version 0.3.11: added the ‘entropy’ option.

Parameters

sigma (string, array) –

Sigma can either be an array of discrete uncertainty values, which have to align to the Variogram.bins, or of type string. Then, the weights for fitting are calculated as a function of (lag) distance.

sigma=’linear’: The residuals get weighted by the lag distance normalized to the maximum lag distance, denoted as \(w_n\)

sigma=’exp’: The residuals get weighted by the function: \(w = e^{1 / w_n}\)

sigma=’sqrt’: The residuals get weighted by the function: \(w = \sqrt{w_n}\)

sigma=’sq’: The residuals get weighted by the function: \(w = w_n^2\)

sigma=’entropy’: Calculates the Shannon Entropy as intrinsic uncertainty of each lag class.

Returns

Return type

void

Notes

The cost function is defined as:

\[\chi^2 = \sum \left(\frac{r}{\sigma}\right)^2\]

where r are the residuals between the experimental variogram and the modeled values for the same lag. Following this function, small sigma values will increase the influence of the corresponding residual, while a very large sigma will cause the observation to be ignored.

See also

scipy.optimize.curve_fit
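
To make the effect of sigma on the cost function concrete, the sketch below derives sigma from the normalized lag distance for three of the string options and evaluates the sum of squared, sigma-scaled residuals. The helper names are hypothetical, the numbers are made up, and using the distance function directly as sigma is an assumption about how the strings are applied.

```python
import math


def sigma_from_distance(bins, kind="linear"):
    """Uncertainty per lag class as a function of normalized lag distance."""
    w_n = [b / max(bins) for b in bins]  # lag distance / maximum lag distance
    if kind == "linear":
        return w_n
    if kind == "sqrt":
        return [math.sqrt(w) for w in w_n]
    if kind == "sq":
        return [w ** 2 for w in w_n]
    raise ValueError("kind has to be 'linear', 'sqrt' or 'sq'")


def chisq(experimental, modeled, sigma):
    """Cost function: sum of squared residuals, each divided by its sigma."""
    return sum(((e - m) / s) ** 2 for e, m, s in zip(experimental, modeled, sigma))


bins = [10.0, 20.0, 30.0, 40.0]
experimental = [1.0, 2.0, 2.6, 2.8]
modeled = [1.2, 1.8, 2.4, 3.0]

unweighted = chisq(experimental, modeled, [1.0] * len(bins))
weighted = chisq(experimental, modeled, sigma_from_distance(bins, "linear"))
print(unweighted, weighted)
```

All four residuals have the same magnitude here, so the larger weighted cost comes entirely from the short lags, whose small sigma increases their influence.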

fitted_model¶

Fitted Model

Returns a callable that takes a distance value and returns a semivariance. This model is fitted to the current Variogram parameters. The function will be interpreted at return time with the parameters hard-coded into the function code.

Returns: model – The current semivariance model fitted to the current Variogram model parameters.
Return type: callable

get_empirical(bin_center=False)[source]¶

Empirical variogram

Returns a tuple of the independent and dependent sample values this Variogram is estimated for, that is, the current bins and the experimental semi-variance values. By default, the upper bin edges are used. This can be switched to bin centers with the bin_center argument.

Parameters

bin_center (bool) – If set to True, the center for each distance lag bin is used over the upper limit (default).

Returns

bins (numpy.ndarray) – 1D array of n_lags distance lag bins.
experimental (numpy.ndarray) – 1D array of n_lags experimental semi-variance values.
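
The relation between upper bin edges and bin centers can be sketched in a few lines. The helper bin_centers is hypothetical, and the sketch assumes that the first lag class starts at a distance of 0.

```python
def bin_centers(upper_edges):
    """Midpoint of each lag class, given only the upper edges."""
    # Assumption: the first lag class starts at distance 0.
    lower_edges = [0.0] + list(upper_edges[:-1])
    return [(lo + up) / 2.0 for lo, up in zip(lower_edges, upper_edges)]


upper_edges = [10.0, 20.0, 40.0]
print(bin_centers(upper_edges))
```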

lag_classes()[source]¶

Iterate over the lag classes

Generates an iterator over all lag classes. Can be zipped with Variogram.bins to identify the lag.

Changed in version 0.3.6: yields an empty array for empty lag groups now

Returns
Return type: iterable

lag_groups()[source]¶

Lag class groups

Returns a mask array with as many elements as self._diff has, identifying the lag class group for each pairwise difference. Can be used to extract all pairwise values within the same lag bin.

Returns
Return type: numpy.ndarray

See also

Variogram.lag_classes()
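
How a group mask relates to the lag classes can be shown with plain lists. The two functions below are simplified, hypothetical re-implementations; marking distances beyond the last edge with -1 and treating the upper edge as inclusive are assumptions of this sketch.

```python
def lag_groups(distances, upper_edges):
    """Index of the lag class for each distance, or -1 beyond the last edge."""
    groups = []
    for d in distances:
        group = -1
        for i, edge in enumerate(upper_edges):
            if d <= edge:  # assumption: the upper edge is inclusive
                group = i
                break
        groups.append(group)
    return groups


def lag_classes(diffs, groups, n_lags):
    """Yield the pairwise differences of each lag class, empty ones included."""
    for i in range(n_lags):
        yield [diff for diff, g in zip(diffs, groups) if g == i]


distances = [1.0, 4.0, 6.0, 12.0]
diffs = [0.1, 0.4, 0.3, 0.9]
upper_edges = [2.0, 3.0, 5.0, 10.0]

groups = lag_groups(distances, upper_edges)
for edge, members in zip(upper_edges, lag_classes(diffs, groups, len(upper_edges))):
    print(edge, members)
```

The second lag class holds no point pairs in this example and is yielded as an empty list, so the iterator stays aligned with the bins.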

location_trend(axes=None, show=True, **kwargs)[source]¶

Location Trend plot

Plots the values over each dimension of the coordinates in a scatter plot. This will visually show correlations between the values and any of the coordinate dimensions. If there is a value dependence on the location, this would violate the intrinsic hypothesis, which is a weaker form of second-order stationarity.

Changed in version 0.4.0: This plot can be plotted with the plotly plotting backend

Parameters

axes (list) – Can be None (default) or a list of matplotlib.AxesSubplots. If a list is passed, the location trend plots will be plotted on the given instances. Note that the length of the list has to match the dimensionality of the coordinates array. In case 3D coordinates are used, three subplots have to be given.
show (boolean) – If True (default), the show method of the Figure will be called. Can be set to False to prevent duplicated plots in some environments.

Keyword Arguments

add_trend_line (bool) –

New in version 0.3.5.

If set to True, the class will fit a linear model to each coordinate dimension and output the model along with a calculated R². With high R² values, you should consider rejecting the input data, or transforming it.

Note

Right now, this is only supported for 'plotly' backend

Returns

fig – The figure produced by the function. Depends on the current backend.

Return type

matplotlib.Figure, plotly.graph_objects.Figure

mae¶

MAE

Calculate the Mean absolute error between the experimental variogram and the theoretical model values at corresponding lags. Can be used as a fitting quality measure.

Returns
Return type: float

See also

Variogram.residuals

Notes

The MAE is implemented like:

\[MAE = \frac{\sum_{i=1}^{N(x)} |x_i - y_i|}{N(x)}\]

maxlag¶

Maximum lag distance to be considered in this Variogram instance. You can limit the distance up to which point pairs are calculated. There are three ways to do that. First, in absolute lag units, which is a number larger than one. Second, a number 0 < maxlag < 1 can be set, which will use this share of the maximum distance as maxlag. Lastly, a string can be set: 'mean' or 'median' for the mean or median value of the distance matrix.

Notes

This setting is largely flexible, but all options except the absolute limit in lag units need the full distance matrix to be calculated. Hence, it does not speed up the calculation of large distance matrices, just the estimation of the variogram. Thus, if you pre-calculate the distance matrix using MetricSpace, only absolute limits can be used.

mean_residual¶

Mean Model residuals

Calculates the mean absolute deviation between the experimental variogram and theoretical model values.

Returns
Return type: float

metric_space¶

New in version 0.5.6.

MetricSpace representation of the input coordinates. A MetricSpace can be used to pass pre-calculated coordinates to other Variogram instances.

Returns: metric_space
Return type: skgstat.MetricSpace

See also

Variogram.coordinates: coordinate representation

model_deviations()[source]¶

Model Deviations

Calculate the deviations between the experimental variogram and the recalculated values for the same bins using the fitted theoretical variogram function. Can be utilized to calculate a quality measure for the variogram fit.

Returns: deviations – The first element is the experimental variogram; the second element holds the corresponding values of the theoretical model.
Return type: tuple

mse¶

MSE

Calculate the Mean squared error between the experimental variogram and the theoretical model values at corresponding lags. Can be used as a fitting quality measure.

Returns
Return type: float

See also

Variogram.residuals

Notes

The MSE is implemented like:

\[MSE = \frac{\sum_{i=1}^{N(x)} (x_i - y_i)^2}{N(x)}\]
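
The MAE and MSE formulas given in the notes of this class translate directly into code. The sketch below uses plain lists and made-up numbers; x and y stand for the two series being compared, the experimental variogram and the model values at the same lags.

```python
def mae(x, y):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


def mse(x, y):
    """Mean squared error between two equally long sequences."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


x = [1.0, 2.0, 3.0]
y = [1.5, 2.0, 2.0]
print(mae(x, y), mse(x, y))
```

Because the errors are squared, the MSE is dominated by the single large deviation at the last lag, while the MAE treats all deviations proportionally.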

n_lags¶

Number of lag bins

Pass the number of lag bins to be used on this Variogram instance. This will reset the grouping index and fitting parameters.

nrmse¶

NRMSE

Calculate the normalized root mean squared error between the experimental variogram and the theoretical model values at corresponding lags. Can be used as a fitting quality measure.

Returns
Return type: float

See also

Variogram.residuals, Variogram.rmse

Notes

The NRMSE is implemented as:

\[NRMSE = \frac{RMSE}{mean(y)}\]

where RMSE is Variogram.rmse and y is Variogram.experimental

nrmse_r¶

NRMSE

Alternative normalized root mean squared error between the experimental variogram and the theoretical model values at corresponding lags. Can be used as a fitting quality measure.

Returns
Return type: float

See also

Variogram.rmse, Variogram.nrmse

Notes

Unlike Variogram.nrmse, nrmse_r is not normalized to the mean of y, but to the difference between the maximum of y and its mean:

\[NRMSE_r = \frac{RMSE}{max(y) - mean(y)}\]
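
The two normalizations differ only in the denominator, which the following sketch makes explicit. It uses plain lists and made-up numbers, with y standing for the experimental variogram as in the notes above; the function names are chosen for this example.

```python
import math


def rmse(model, y):
    """Root mean squared error between model values and experimental values."""
    return math.sqrt(sum((m - e) ** 2 for m, e in zip(model, y)) / len(y))


def nrmse(model, y):
    """RMSE normalized to the mean of the experimental values."""
    return rmse(model, y) / (sum(y) / len(y))


def nrmse_r(model, y):
    """RMSE normalized to the difference between max(y) and mean(y)."""
    return rmse(model, y) / (max(y) - sum(y) / len(y))


y = [1.0, 2.0, 3.0]      # experimental semi-variances
model = [1.0, 2.0, 4.0]  # model values at the same lags
print(rmse(model, y), nrmse(model, y), nrmse_r(model, y))
```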

parameters¶

Extract just the variogram parameters range, sill and nugget from the describe output.

Returns: params – [range, sill, nugget] for most models and [range, sill, shape, nugget] for matern and stable model.
Return type: list

plot(axes=None, grid=True, show=True, hist=True)[source]¶

Variogram Plot

Plot the experimental variogram, the fitted theoretical function and a histogram for the lag classes. The axes attribute can be used to pass a list of AxesSubplots or a single instance to the plot function. Then these Subplots will be used. If only a single instance is passed, the hist attribute will be ignored as only the variogram will be plotted anyway.

Changed in version 0.4.0: This plot can be plotted with the plotly plotting backend

Parameters

axes (list, tuple, array, AxesSubplot or None) – If None, the plot function will create a new matplotlib figure. Otherwise a single instance or a list of AxesSubplots can be passed to be used. If a single instance is passed, the hist attribute will be ignored.
grid (bool) – Defaults to True. If True, a custom grid will be drawn through the lag class centers.
show (bool) – Defaults to True. If True, the show method of the passed or created matplotlib Figure will be called before returning the Figure. This should be set to False, when used in a Notebook, as a returned Figure object will be plotted anyway.
hist (bool) – Defaults to True. If False, the creation of a histogram for the lag classes will be suppressed.

Returns

Return type

matplotlib.Figure

preprocessing(force=False)[source]¶

Preprocessing function

Prepares all input data for the fit and transform functions. Namely, the distances are calculated and the value differences. Then the binning is set up and bin edges are calculated. If any of the listed subsets are already prepared, their processing is skipped. This behaviour can be changed by the force parameter. This will cause a clean preprocessing.

Parameters: force (bool) – If set to True, all preprocessing data sets will be deleted. Use it in case you need a clean preprocessing.
Returns
Return type: void

r¶

Pearson correlation of the fitted Variogram

Returns

residuals¶

Model residuals

Calculate the model residuals defined as the differences between the experimental variogram and the theoretical model values at corresponding lag values

Returns
Return type: numpy.ndarray

rmse¶

RMSE

Calculate the Root Mean squared error between the experimental variogram and the theoretical model values at corresponding lags. Can be used as a fitting quality measure.

Returns
Return type: float

See also

Variogram.residuals

Notes

The RMSE is implemented like:

\[RMSE = \sqrt{\frac{\sum_{i=1}^{N(x)} (x_i - y_i)^2}{N(x)}}\]

scattergram(ax=None, show=True, **kwargs)[source]¶

Scattergram plot

Groups the values by lags and plots the head and tail values of all point pairs within the groups against each other. This can be used to investigate the distribution of the value residuals.

Changed in version 0.4.0: This plot can be plotted with the plotly plotting backend

Parameters

ax (matplotlib.Axes, plotly.graph_objects.Figure) – If None, a new plotting Figure will be created. If given, it has to be an instance of the used plotting backend, which will be used to plot on.
show (boolean) – If True (default), the show method of the Figure will be called. Can be set to False to prevent duplicated plots in some environments.

Returns

fig – Resulting figure, depending on the plotting backend

Return type

matplotlib.Figure, plotly.graph_objects.Figure

set_bin_func(bin_func: Union[str, Iterable[T_co], Callable[[numpy.ndarray, float, float], Tuple[numpy.ndarray, float]]])[source]¶

Set binning function

Sets a new binning function to be used. The new binning method is set by either a string identifying the new function to be used, or an iterable containing the bin edges, or any function that can compute bins from the distances, number of lags and maximum lag. The string can be one of: [‘even’, ‘uniform’, ‘fd’, ‘sturges’, ‘scott’, ‘sqrt’, ‘doane’].

If the number of lag classes should be estimated automatically, it is recommended to use ‘sturges’ for small, normally distributed locations and ‘fd’ or ‘scott’ for large datasets, where ‘fd’ is more robust to outliers. ‘sqrt’ is by far the fastest estimator. ‘doane’ is an extension of Sturges’ rule for non-normally distributed data.

Changed in version 0.3.8: added ‘fd’, ‘sturges’, ‘scott’, ‘sqrt’, ‘doane’

Changed in version 0.3.9: added ‘kmeans’, ‘ward’

Changed in version 0.4.0: added ‘stable_entropy’

Changed in version 0.4.1: refactored local wrapper function definition. The wrapper to pass kwargs to the binning functions is now implemented as an instance method, to make it pickleable.

Changed in version 0.6.5: added iterable and function as arguments to allow for custom bins.

Parameters

bin_func (str | Iterable | Callable) –

Can be one of:

’even’

’uniform’

’fd’

’sturges’

’scott’

’sqrt’

’doane’

’kmeans’

’ward’

’stable_entropy’

Returns

Return type

void

Notes

`’even’`: Use skgstat.binning.even_width_lags for using n_lags lags of equal width up to maxlag.

`’uniform’`: Use skgstat.binning.uniform_count_lags for using n_lags lags up to maxlag in which the pairwise differences follow a uniform distribution.

`’sturges’`: estimates the number of evenly distributed lag classes (n) from the sample size (s), i.e. the number of separating distances, by Sturges’ rule [101]:

\[n = \log_2(s) + 1\]

`’scott’`: estimates the lag class widths (h) by Scott’s rule [102]:

\[h = \sigma \left(\frac{24 \sqrt{\pi}}{s}\right)^{\frac{1}{3}}\]

`’sqrt’`: estimates the number of lags (n) by the square root of the sample size:

\[n = \sqrt{s}\]

`’fd’`: estimates the lag class widths (h) using the Freedman-Diaconis estimator [103]:

\[h = 2\frac{IQR}{s^{1/3}}\]

`’doane’`: estimates the number of evenly distributed lag classes using Doane’s extension to Sturges’ rule [104]:

\[n = 1 + \log_{2}(s) + \log_2\left(1 + \frac{|g|}{k}\right)\]

\[g = E\left[\left(\frac{x - \mu_g}{\sigma}\right)^3\right]\]

\[k = \sqrt{\frac{6(s - 2)}{(s + 1)(s + 3)}}\]

`’kmeans’`: This method will search for n clusters in the distance matrix. The cluster centroids are used to calculate the upper edges of the lag classes, by setting it to half of the distance between two neighboring clusters. Note: This does not necessarily result in even width bins.

`’ward’` uses a hierarchical clustering algorithm to iteratively merge pairs of clusters until there are only n remaining clusters. The merging is done by minimizing the variance for the merged cluster.

`’stable_entropy’` will adjust n bin edges by minimizing the absolute differences between each lag’s Shannon Entropy. This will lead to uneven bin widths. Each lag class value distribution will be of comparable intrinsic uncertainty from an information theoretic point of view, which makes the semi-variances quite comparable. However, it is not guaranteed, that the binning makes any sense from a geostatistical point of view, as the first lags might be way too wide.
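
Four of the rules from these notes can be evaluated with the standard library alone. The sketch below is illustrative and not the library's implementation: the function names are hypothetical, the sample size is written s in the code, rounding the lag count up is an assumption, and so are the population standard deviation and the linearly interpolated quartiles. Doane's rule is left out for brevity.

```python
import math
from statistics import pstdev, quantiles


def sturges_n(s):
    """Number of lag classes by Sturges' rule, rounded up."""
    return math.ceil(math.log2(s) + 1)


def sqrt_n(s):
    """Number of lag classes by the square-root rule, rounded up."""
    return math.ceil(math.sqrt(s))


def scott_width(distances):
    """Lag class width by Scott's rule."""
    s = len(distances)
    return pstdev(distances) * (24 * math.sqrt(math.pi) / s) ** (1 / 3)


def fd_width(distances):
    """Lag class width by the Freedman-Diaconis estimator."""
    q1, _, q3 = quantiles(distances, n=4, method="inclusive")
    return 2 * (q3 - q1) / len(distances) ** (1 / 3)


distances = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
print(sturges_n(len(distances)), sqrt_n(len(distances)))
print(scott_width(distances), fd_width(distances))
```

The first two rules yield a number of lag classes, the last two a class width; turning a width into a lag count requires dividing the covered distance range by it.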

See also

Variogram.bin_func(), skgstat.binning.uniform_count_lags(), skgstat.binning.even_width_lags(), skgstat.binning.auto_derived_lags(), skgstat.binning.kmeans(), skgstat.binning.ward(), sklearn.cluster.KMeans(), sklearn.cluster.AgglomerativeClustering()

References

101: Scott, D.W. (2009), Sturges’ rule. WIREs Comp Stat, 1: 303-306. https://doi.org/10.1002/wics.35
102: Scott, D.W. (2010), Scott’s rule. WIREs Comp Stat, 2: 497-502. https://doi.org/10.1002/wics.103
103: Freedman, David, and Persi Diaconis (1981), “On the histogram as a density estimator: L2 theory.” Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 57.4: 453-476.
104: Doane, D. P. (1976). Aesthetic frequency classifications. The American Statistician, 30(4), 181-183.

set_dist_function(func)[source]¶

Set distance function

Set the function used for distance calculation. func can either be a callable or a string. The ranked distance function is not implemented yet. Strings will be forwarded to the scipy.spatial.distance.pdist function as the metric argument. If func is a callable, it has to return the upper triangle of the distance matrix as a flat array (like the pdist function).

Parameters: func (string, callable) –
Returns
Return type: numpy.array

set_model(model_name)[source]¶: Set model as the new theoretical variogram function.

set_values(values, calc_diff=True)[source]¶

Set new values

Will set the passed array as new value array. This array has to be of the same length as the first axis of the coordinates array. The Variogram class only accepts one-dimensional arrays. On success, all fitting parameters are deleted and the pairwise differences are recalculated. Raises a ValueError on shape mismatches and a Warning if all input values are the same.

Parameters

values (numpy.ndarray) –

Returns

Return type

void

Raises

ValueError : raised if the values array shape does not match the coordinates array, or more than one dimension is given
Warning : raised if all input values are the same
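The documented contract can be mirrored in a standalone sketch. This is not the library's code: `check_values` is a hypothetical helper, and raising `Warning` as an exception follows the wording of the Raises section above.

```python
import numpy as np


def check_values(coordinates, values):
    """Validate values against coordinates as described for set_values (sketch)."""
    values = np.asarray(values)
    if values.ndim != 1:
        raise ValueError("values has to be a one-dimensional array")
    if len(values) != len(coordinates):
        raise ValueError("values has to match the first axis of coordinates")
    if values.size > 0 and np.all(values == values[0]):
        # Warning is an Exception subclass, so it can be raised directly
        raise Warning("all input values are the same")
    return values


coordinates = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
checked = check_values(coordinates, [1.0, 2.5, 4.0])
print(checked)
```

Callers that want to tolerate constant input can catch `Warning` separately from `ValueError`.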

See also

Variogram.values()

to_DataFrame(n=100, force=False)[source]¶

Variogram DataFrame

Returns the fitted theoretical variogram as a pandas.DataFrame instance. The n and force parameters control the calculation; refer to the data function for more info.

Parameters

n (integer) – length of the lags array to be used for fitting. Defaults to 100, which will be fine for most plots
force (boolean) – If True, the preprocessing and fitting will be executed as a clean run. This will force all intermediate results to be recalculated. Defaults to False

Returns

Return type

pandas.DataFrame

See also

Variogram.data()

to_gs_krige(**kwargs)[source]¶

Instantiate a GSTools Krige class.

This can only export isotropic models. Note: the fit_variogram is always set to False

Parameters

variogram (skgstat.Variogram) – Scikit-GStat Variogram instance
**kwargs – Keyword arguments forwarded to GSTools Krige. Refer to Krige to learn about all possible options. Note that the fit_variogram parameter will always be False.

Raises

ImportError – When GSTools is not installed.
ValueError – When GSTools version is not v1.3 or greater.
ValueError – When given Variogram model is not supported (‘harmonize’).

Returns

Instantiated GSTools Krige class.

Return type

Krige

See also

gstools.Krige()

to_gstools(**kwargs)[source]¶

Instantiate a corresponding GSTools CovModel.

By default, this will be an isotropic model.

Parameters

**kwargs – Keyword arguments forwarded to the instantiated GSTools CovModel. The default parameters ‘dim’, ‘var’, ‘len_scale’, ‘nugget’, ‘rescale’ and optional shape parameters will be extracted from the given Variogram but they can be overwritten here.

Raises

ImportError – When GSTools is not installed.
ValueError – When GSTools version is not v1.3 or greater.
ValueError – When given Variogram model is not supported (‘harmonize’).

Returns

Corresponding GSTools covmodel.

Return type

CovModel

Note

In case you intend to use the coordinates in a GSTools workflow, you need to transpose the coordinate array like:

>>> cond_pos = V.coordinates.T  # V is a skgstat.Variogram instance

transform(x)[source]¶

Transform

Transform a given set of lag values to the theoretical variogram function using the actual fitting and preprocessing parameters in this Variogram instance.

Parameters: x (numpy.array) – Array of lag values to be used as model input for the fitted theoretical variogram model
Returns
Return type: numpy.array
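To make concrete what evaluating a theoretical model at lag values means, the sketch below applies the textbook spherical model to a few lags. It is a standalone illustration, not the code path used by transform; the function name and the range `r`, sill `c0`, and nugget `b` values are made up rather than fitted.

```python
import numpy as np


def spherical(h, r, c0, b=0.0):
    """Textbook spherical variogram model.

    h  : lag distances
    r  : range, beyond which the model stays at the sill
    c0 : sill (without nugget)
    b  : nugget
    """
    h = np.asarray(h, dtype=float)
    rising = b + c0 * (1.5 * (h / r) - 0.5 * (h / r) ** 3)
    return np.where(h <= r, rising, b + c0)


lags = np.array([0.0, 5.0, 10.0, 20.0])
gamma = spherical(lags, r=10.0, c0=2.0, b=0.5)
print(gamma)
```

The model rises from the nugget at lag 0 to nugget plus sill at the range and stays constant for larger lags.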

triangular_distance_matrix¶: Like distance_matrix but with zeros below the diagonal. Only defined if distance_matrix is a sparse matrix.

update_kwargs(**kwargs)[source]¶: New in version 0.3.7.

Update the keyword arguments of this Variogram instance. The keyword arguments will be validated first and then update the existing kwargs. That means you can pass only the kwargs that need to be updated.

Note

Updating the kwargs does not force a preprocessing cycle. Any affected intermediate result that might be cached internally will not make use of the updated kwargs. Make a call to preprocessing(force=True) to force a clean re-calculation of the Variogram instance.

value_matrix¶

Value matrix

Returns a matrix of pairwise differences in absolute values. The matrix will have the shape (m, m) with m = len(Variogram.values). Note that Variogram.values holds the values themselves, while the value_matrix consists of their pairwise differences.

Returns: values – Matrix of pairwise absolute differences of the values.
Return type: numpy.matrix
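The relation between the values and their pairwise absolute differences can be reproduced with plain NumPy. This is a sketch of the described matrix, not the property's implementation, and the sample values are made up.

```python
import numpy as np

values = np.array([1.0, 4.0, 6.0])

# broadcasting a column against a row yields all pairwise differences
value_matrix = np.abs(values[:, None] - values[None, :])
print(value_matrix)
```

The result is symmetric with zeros on the diagonal and has shape (m, m) for m values.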

See also

Variogram._diff

values¶

Values property

Array of observations the variogram is built for. The setter of this property utilizes the Variogram.set_values function for setting new arrays.

Returns: values
Return type: numpy.ndarray

See also

Variogram.set_values
