Variogram Class
-
class skgstat.Variogram(coordinates=None, values=None, estimator='matheron', model='spherical', dist_func='euclidean', bin_func='even', normalize=False, fit_method='trf', fit_sigma=None, use_nugget=False, maxlag=None, samples=None, n_lags=10, verbose=False, **kwargs)
Variogram Class
Calculates a variogram of the separating distances in the given coordinates and relates them to one of the semi-variance measures of the given dependent values.
-
__init__(coordinates=None, values=None, estimator='matheron', model='spherical', dist_func='euclidean', bin_func='even', normalize=False, fit_method='trf', fit_sigma=None, use_nugget=False, maxlag=None, samples=None, n_lags=10, verbose=False, **kwargs)
Variogram Class
- Parameters
coordinates (numpy.ndarray, MetricSpace) –
Changed in version 0.5.0: now accepts MetricSpace
Array of shape (m, n). Will be used as m observation points of n dimensions. The variogram can be calculated on 1- to n-dimensional coordinates. If a 1-dimensional array is passed, a second array of the same length containing only zeros will be stacked onto it. For very large datasets, you can set maxlag to only calculate distances within the maximum lag in a sparse matrix. Alternatively, you can supply a MetricSpace (optionally with max_dist set for the same effect). This is useful if you create many variograms for different parameters that were all measured at the same set of coordinates, as the distances will be calculated only once instead of once per variogram.
values (numpy.ndarray) – Array of values observed at the given coordinates. The length of the values array has to match the m dimension of the coordinates array. Will be used to calculate the dependent variable of the variogram.
estimator (str, callable) –
String identifying the semi-variance estimator to be used. Defaults to the Matheron estimator. Possible values are:
matheron [Matheron, default]
cressie [Cressie-Hawkins]
dowd [Dowd-Estimator]
genton [Genton]
minmax [MinMax Scaler]
entropy [Shannon Entropy]
If a callable is passed, it has to accept an array of absolute differences, aligned to the 1D distance matrix (flattened upper triangle), and return a scalar that converges towards small values for similarity (high covariance).
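As an illustration of the callable contract above, a Matheron-style estimator can be written as a plain function. This is a minimal sketch, assuming the callable is applied once per lag class to that class's absolute differences; the name `matheron_estimator` is hypothetical and not part of skgstat.

```python
import math
from typing import Sequence


def matheron_estimator(abs_diffs: Sequence[float]) -> float:
    """Matheron semi-variance of one lag class: sum(d^2) / (2N).

    abs_diffs holds the absolute value differences of all point pairs
    in the lag class. An empty class yields NaN instead of 0.
    """
    n = len(abs_diffs)
    if n == 0:
        return math.nan
    return sum(d * d for d in abs_diffs) / (2.0 * n)


print(matheron_estimator([0.0, 0.0, 0.0]))  # 0.0: identical values
print(matheron_estimator([1.0, 2.0, 3.0]))  # larger differences, larger value
```

Similar observations produce small differences and therefore a small result, which matches the requirement that the scalar converges towards small values for similarity.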
model (str) –
String identifying the theoretical variogram function to be used to describe the experimental variogram. Can be one of:
spherical [Spherical, default]
exponential [Exponential]
gaussian [Gaussian]
cubic [Cubic]
stable [Stable model]
matern [Matérn model]
nugget [nugget effect variogram]
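For orientation, the textbook spherical model behind the default 'spherical' option can be sketched as below. This sketch assumes the common parametrization with range a, partial sill c0 and nugget b; skgstat's internal parametrization (for example, how the effective range is defined) may differ, and the function is not taken from skgstat.

```python
def spherical(h: float, a: float, c0: float, b: float = 0.0) -> float:
    """Textbook spherical variogram model.

    h  -- lag distance
    a  -- range: beyond this distance the model is flat
    c0 -- partial sill (sill contribution above the nugget)
    b  -- nugget, the y-axis intercept
    """
    if h <= a:
        ratio = h / a
        return b + c0 * (1.5 * ratio - 0.5 * ratio ** 3)
    return b + c0


for lag in (0.0, 5.0, 10.0, 15.0):
    print(lag, spherical(lag, a=10.0, c0=2.0, b=0.5))
```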
dist_func (str) – String identifying the distance function. Defaults to ‘euclidean’. Can be any metric accepted by scipy.spatial.distance.pdist. Additional parameters, which pdist accepts for some of the metrics, are not (yet) passed through; in these cases the pdist default values are used.
bin_func (str | Callable | Iterable) –
Changed in version 0.3.8: added ‘fd’, ‘sturges’, ‘scott’, ‘sqrt’, ‘doane’
Changed in version 0.3.9: added ‘kmeans’, ‘ward’
String identifying the binning function used to find lag class edges. All methods calculate bin edges on the interval [0, maxlag[. Possible values are:
‘even’ (default) finds n_lags bins of equal width
‘uniform’ forms n_lags bins of equal data count
‘fd’ applies the Freedman-Diaconis estimator to find n_lags
‘sturges’ applies Sturges’ rule to find n_lags
‘scott’ applies Scott’s rule to find n_lags
‘doane’ applies Doane’s extension to Sturges’ rule to find n_lags
‘sqrt’ uses the square-root of the size of distance as n_lags
‘kmeans’ uses KMeans clustering to find well-supported bins
‘ward’ uses hierarchical clustering to find minimum-variance clusters
More details are given in the documentation for set_bin_func.
normalize (bool) – Defaults to False. If True, the independent and dependent variable will be normalized to the range [0,1].
fit_method (str | None) –
Changed in version 0.3.10: Added ‘ml’ and ‘custom’
String identifying the method to be used for fitting the theoretical variogram function to the experimental. If None is passed, the fit does not run. More info is given in the Variogram.fit docs. Can be one of:
‘lm’: Levenberg-Marquardt algorithm for unconstrained problems. This is the faster algorithm, but fitting a variogram is not an unconstrained problem.
‘trf’: Trust Region Reflective algorithm for non-linear constrained problems. The class will set the boundaries itself. This is the default method.
‘ml’: Maximum-Likelihood estimation. With the current implementation, only the Nelder-Mead solver for unconstrained problems is implemented. This will estimate the variogram parameters from a Gaussian parameter space by minimizing the negative log-likelihood.
‘manual’: Manual fitting. You can set the range, sill and nugget either directly in the fit function, or as fit_-prefixed keyword arguments on Variogram instantiation.
fit_sigma (numpy.ndarray, str) –
Defaults to None. The sigma is used as a measure of uncertainty during the variogram fit. If fit_sigma is an array, it has to hold n_lags elements, giving the uncertainty for all lag classes. If fit_sigma is None (default), no lag is weighted. Higher values indicate higher uncertainty and lower the influence of the corresponding lag class on the fit. If fit_sigma is a string, a pre-defined function of separating distance will be used to fill the array. Can be one of:
‘linear’: Linear loss with distance. Bins at small lags will have a higher impact.
‘exp’: The weights decrease following an exponential function of distance.
‘sqrt’: The weights decrease with the square-root of distance.
‘sq’: The weights decrease with the squared distance.
More info is given in the Variogram.fit_sigma documentation.
use_nugget (bool) – Defaults to False. If True, a nugget effect will be added to all Variogram.models as a third (or fourth) fitting parameter. A nugget is essentially the y-axis intercept of the theoretical variogram function.
maxlag (float, str) – Can specify the maximum lag distance directly by giving a value larger than 1. The binning function will not find any lag class with an edge larger than maxlag. If 0 < maxlag < 1, then maxlag is relative and maxlag * max(Variogram.distance) will be used. If maxlag is a string, it has to be one of ‘median’, ‘mean’. Then the median or mean of all Variogram.distance will be used. Note that maxlag=0.5 will use half the maximum separating distance; this is not the same as ‘median’, which is the median of all separating distances.
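The difference between a relative maxlag and ‘median’ is easy to see in a small sketch. The helper `resolve_maxlag` is hypothetical (not part of skgstat), works on a plain list of separating distances, and assumes that any value of at least 1 is an absolute distance; how skgstat treats exactly 1 or other edge cases is not specified above.

```python
import statistics
from typing import Sequence, Union


def resolve_maxlag(distances: Sequence[float], maxlag: Union[float, str]) -> float:
    """Turn a maxlag setting into an absolute lag distance.

    - 'median' / 'mean': statistic of all separating distances
    - 0 < maxlag < 1: fraction of the maximum separating distance
    - otherwise: absolute distance in lag units (assumption for values >= 1)
    """
    if isinstance(maxlag, str):
        if maxlag == "median":
            return statistics.median(distances)
        if maxlag == "mean":
            return statistics.mean(distances)
        raise ValueError("maxlag string must be 'median' or 'mean'")
    if 0 < maxlag < 1:
        return maxlag * max(distances)
    return float(maxlag)


dists = [1.0, 2.0, 3.0, 10.0]
print(resolve_maxlag(dists, 0.5))       # 5.0  (half the maximum distance)
print(resolve_maxlag(dists, "median"))  # 2.5  (median of all distances)
print(resolve_maxlag(dists, 7))         # 7.0
```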
samples (float, int) – If set to a non-None value, point pairs are sampled randomly. Two random subsets of all points are chosen, and the distance matrix is calculated only between these two subsets. The size of each subset is set by samples: if < 1, it specifies a fraction of all points; if >= 1, it specifies the number of points in each subset.
n_lags (int) – Specify the number of lag classes to be defined by the binning function.
verbose (bool) – Set the verbosity of the class. Not implemented yet.
- Keyword Arguments
New in version 0.3.7.
If the estimator is set to ‘entropy’, this argument sets the number of bins used for the histogram calculation.
percentile (int) –
New in version 0.3.7.
If the estimator is set to ‘entropy’, this argument sets the percentile to be used.
binning_random_state (int, None) –
New in version 0.3.9.
If bin_func is ‘kmeans’, this can override the seed for the initial guess of the cluster centroids. Note that K-Means is not deterministic and is therefore seeded to 42 here. You can pass None to disable this behavior, but use it with care, as you will get different results.
binning_agg_func (str) –
New in version 0.3.10.
If bin_func is ‘ward’, this keyword argument can switch from the default mean aggregation to median aggregation for calculating the cluster centroids.
New in version 0.6.0.
If set, the Variogram will use this sigma as the standard deviation of the observations passed as values. Using a Monte Carlo simulation, the uncertainties are propagated into the experimental variogram. If present, the plot will indicate the confidence interval as error bars around the experimental variogram.
-
NS
Nash-Sutcliffe efficiency of the fitted Variogram
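Nash-Sutcliffe efficiency and the Pearson correlation (the r property below) are standard goodness-of-fit measures. The sketch below computes both in plain Python from an experimental and a modeled semi-variance sequence. It is a textbook formulation, assuming the experimental values play the role of observations, and is not taken from the skgstat source; the helper names are hypothetical.

```python
import math
from typing import Sequence


def nash_sutcliffe(observed: Sequence[float], modeled: Sequence[float]) -> float:
    """NSE = 1 - sum((obs - mod)^2) / sum((obs - mean(obs))^2)."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - m) ** 2 for o, m in zip(observed, modeled))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


experimental = [1.0, 2.0, 3.0, 4.0]
model = [1.1, 1.9, 3.2, 3.8]
print(nash_sutcliffe(experimental, model))  # close to 1 for a good fit
print(pearson_r(experimental, model))
```

An NSE of 1 means a perfect fit, while 0 means the model is no better than the mean of the experimental values.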
- Returns
-
aic
-
bic
-
bin_func
Binning function
Returns an instance of the function used for binning the separating distances into the given amount of bins. All binning functions use the same signature func(distances, n, maxlag).
The setter of this property utilizes the Variogram.set_bin_func to set a new function.
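Because the binning functions share the signature func(distances, n, maxlag), simplified versions of the ‘even’ and ‘uniform’ strategies can be written as plain functions. This is a sketch only: plain lists stand in for numpy arrays, the (edges, n) return value is assumed from the type hint of set_bin_func, and the function names are hypothetical rather than the skgstat.binning implementations.

```python
from typing import List, Sequence, Tuple


def even_width_bins(distances: Sequence[float], n: int, maxlag: float) -> Tuple[List[float], int]:
    """Upper edges of n lag classes of equal width up to maxlag."""
    width = maxlag / n
    return [width * (i + 1) for i in range(n)], n


def uniform_count_bins(distances: Sequence[float], n: int, maxlag: float) -> Tuple[List[float], int]:
    """Upper edges of n lag classes holding roughly the same number of distances."""
    d = sorted(x for x in distances if x <= maxlag)
    if not d:
        raise ValueError("no distances within maxlag")
    edges = []
    for k in range(1, n + 1):
        # The k-th upper edge is the distance at the k/n position of the sorted sample.
        idx = min(len(d) - 1, max(0, (k * len(d)) // n - 1))
        edges.append(d[idx])
    return edges, n


print(even_width_bins([], 4, 10.0))  # ([2.5, 5.0, 7.5, 10.0], 4)
print(uniform_count_bins([8, 1, 7, 2, 6, 3, 5, 4, 20], 4, 10.0))  # ([2, 4, 6, 8], 4)
```

The ‘even’ edges ignore the data entirely, whereas the ‘uniform’ edges adapt to how the distances are distributed.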
- Returns
binning_function
- Return type
function
See also
-
bins
Distance lag bins
Independent variable of the experimental variogram sample. The bins are the upper edges of all calculated distance lag classes. If you need bin centers, use get_empirical.
- Returns
bins – 1D array of the distance lag classes.
- Return type
See also
-
coordinates
Coordinates property
Array of observation locations the variogram is built for. This property has no setter. If you want to change the coordinates, use a new Variogram instance.
- Returns
coordinates
- Return type
numpy.array
-
cross_validate(method: str = 'jacknife', n: int = None, metric: str = 'rmse', seed=None) → float
Cross validation of the variogram model by means of Kriging. Right now, this function can only utilize a jackknife (leave-one-out) cross validation and will only use the built-in OrdinaryKriging method (not yet the to_gs_krige interface).
- Parameters
method (str) – Right now, ‘jacknife’ is the only possible input.
n (int) – The number of points to be included into the cross-validation. If None (default), all points will be used.
metric (str) – Metric used for cross-validation. Can be root mean square error (rmse), mean squared error (mse) or mean absolute error (mae).
seed (int) – If n is not None, the random selection of input data for the cross-validation can be seeded.
- Returns
metric – The cross-validation result as specified above.
- Return type
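The jackknife procedure itself does not depend on the interpolator. The sketch below shows leave-one-out cross-validation with the three supported metrics. Note that it swaps in inverse-distance weighting as a stand-in predictor, whereas cross_validate uses skgstat's OrdinaryKriging; `idw_predict` and `jackknife` are hypothetical names.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def idw_predict(points: Sequence[Point], values: Sequence[float], target: Point, power: float = 2.0) -> float:
    """Inverse-distance weighting, used here only as a simple stand-in for kriging.

    Duplicate locations (distance 0) would divide by zero; they are not handled here.
    """
    num = den = 0.0
    for p, v in zip(points, values):
        w = 1.0 / math.dist(p, target) ** power
        num += w * v
        den += w
    return num / den


def jackknife(points: Sequence[Point], values: Sequence[float],
              predict: Callable[..., float] = idw_predict, metric: str = "rmse") -> float:
    """Leave-one-out: predict each point from all others, then score the errors."""
    errors: List[float] = []
    for i in range(len(points)):
        others = [p for j, p in enumerate(points) if j != i]
        other_values = [v for j, v in enumerate(values) if j != i]
        errors.append(predict(others, other_values, points[i]) - values[i])
    if metric == "mse":
        return sum(e * e for e in errors) / len(errors)
    if metric == "rmse":
        return math.sqrt(sum(e * e for e in errors) / len(errors))
    if metric == "mae":
        return sum(abs(e) for e in errors) / len(errors)
    raise ValueError("metric must be 'rmse', 'mse' or 'mae'")


pts = [(0.0,), (1.0,), (2.0,)]
print(jackknife(pts, [0.0, 1.0, 2.0], metric="mae"))
```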
-
data(n=100, force=False)
Theoretical variogram function
Calculate the experimental variogram and apply the binning. On success, the variogram model will be fitted and applied to n lag values. Returns the lags and the calculated semi-variance values. If force is True, a clean preprocessing and fitting run will be executed.
- Parameters
n (integer) – Length of the lags array to be used for fitting. Defaults to 100, which will be fine for most plots.
force (boolean) – If True, the preprocessing and fitting will be executed as a clean run. This will force all intermediate results to be recalculated. Defaults to False.
- Returns
variogram – The first element is the created lags array, the second element holds the calculated semi-variance values.
- Return type
-
describe(short=False, flat=False)
Variogram parameters
Return a dictionary of the variogram parameters.
Changed in version 0.3.7: describe now returns all init parameters as the describe()[‘params’] key and all keyword arguments as describe()[‘kwargs’]. This output can be suppressed by setting short=True.
- Parameters
- Returns
parameters – Returns fitting parameters of the theoretical variogram model along with the init parameters of the Variogram instance.
- Return type
-
dim
Input coordinates dimensionality.
-
distance_difference_plot(ax=None, plot_bins=True, show=True)
Raw distance plot
Plots the absolute value differences of all point pair combinations over their separating distance, without sorting them into a lag.
Changed in version 0.4.0: This plot can be plotted with the plotly plotting backend
- Parameters
ax (None, AxesSubplot) – If None, a new matplotlib.Figure will be created. In case a Figure was already created, pass the Subplot to use as ax argument.
plot_bins (bool) – If True (default) the bin edges will be included into the plot.
show (bool) – If True (default), the show method of the Figure will be called before returning the Figure. Can be set to False, to avoid doubled figure rendering in Jupyter notebooks.
- Returns
- Return type
matplotlib.pyplot.Figure
-
experimental
Experimental Variogram
Array of experimental (empirical) semivariance values. The array length will be aligned to Variogram.bins. The current Variogram.estimator has been used to calculate the values. Depending on the setting of Variogram.harmonize (True | False), either Variogram._experimental or Variogram.isotonic will be returned.
- Returns
vario – Array of the experimental semi-variance values aligned to Variogram.bins.
- Return type
See also
Variogram._experimental, Variogram.isotonic
-
fit(force=False, method=None, sigma=None, **kwargs)
Fit the variogram
The fit function will fit the theoretical variogram function to the experimental one. The preprocessed distance matrix, pairwise differences and binning will not be recalculated if already done. A recalculation can be forced by setting the force parameter to True.
If you call the fit function directly with method or sigma, the parameters set on Variogram instantiation will be overwritten. All other keyword arguments will be passed to the scipy.optimize.curve_fit function.
Changed in version 0.3.10: added ‘ml’ and ‘custom’ method.
- Parameters
force (bool) – If set to True, a clean preprocessing of the distance matrix, pairwise differences and the binning will be forced. Default is False.
method (string) –
A string identifying one of the implemented fitting procedures. Can be one of:
lm: Levenberg-Marquardt algorithm implemented in the scipy.optimize.leastsq function.
trf: Trust Region Reflective algorithm implemented in scipy.optimize.least_squares(method='trf').
‘ml’: Maximum-Likelihood estimation. With the current implementation, only the Nelder-Mead solver for unconstrained problems is implemented. This will estimate the variogram parameters from a Gaussian parameter space by minimizing the negative log-likelihood.
‘manual’: Manual fitting. You can set the range, sill and nugget either directly in the fit function, or as fit_-prefixed keyword arguments on Variogram instantiation.
sigma (string, array) –
Uncertainty array for the bins. Has to have the same dimension as self.bins. Refer to Variogram.fit_sigma for more information.
- Returns
- Return type
void
See also
scipy.optimize.minimize(), scipy.optimize.curve_fit(), scipy.optimize.leastsq(), scipy.optimize.least_squares()
-
fit_method
New in version 0.6.2.
Set the fit method to be used for this Variogram instance. Possible values are:
'trf' - Trust-Region Reflective (default)
'lm' - Levenberg-Marquardt
'ml' - Maximum Likelihood estimation
'manual' - Manual fitting by setting the parameters
Changed in version 0.6.6: Passing None will prevent the fitting procedure from running.
Notes
The default method (TRF) is a bounded least squares method that sets constraints on the value space of all parameters. All methods use an initial guess for all used parameters. This is max(bins) for the range, max(experimental) for the sill, 20 for the Matérn smoothness, 2 for the stable model shape and 1 for the nugget if used.
-
fit_sigma
Fitting Uncertainty
Set or calculate an array of observation uncertainties aligned to the Variogram.bins. These will be used to weight the observations in the cost function, which divides the residuals by their uncertainty.
When setting fit_sigma, the array of uncertainties itself can be given, or one of the strings: [‘linear’, ‘exp’, ‘sqrt’, ‘sq’, ‘entropy’]. The parameters described below refer to the setter of this property.
Changed in version 0.3.11: added the ‘entropy’ option.
- Parameters
sigma (string, array) –
Sigma can either be an array of discrete uncertainty values, which have to align to the Variogram.bins, or of type string. Then, the weights for fitting are calculated as a function of (lag) distance.
sigma=‘linear’: The residuals get weighted by the lag distance normalized to the maximum lag distance, denoted as \(w_n\)
sigma=‘exp’: The residuals get weighted by the function: \(w = e^{1 / w_n}\)
sigma=‘sqrt’: The residuals get weighted by the function: \(w = \sqrt{w_n}\)
sigma=‘sq’: The residuals get weighted by the function: \(w = w_n^2\)
sigma=‘entropy’: Calculates the Shannon Entropy as intrinsic uncertainty of each lag class.
- Returns
- Return type
void
Notes
The cost function is defined as:
\[\chi^2 = \sum \left(\frac{r}{\sigma}\right)^2\]
where r are the residuals between the experimental variogram and the modeled values for the same lag. Following this function, small sigma values will increase the influence of the corresponding residual, while a very large sigma will cause the observation to be effectively ignored.
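To make the weighting concrete, the sketch below builds the ‘linear’, ‘sqrt’ and ‘sq’ sigma arrays from normalized lag distances and evaluates the cost function above. It is a plain-Python illustration with hypothetical helper names; it does not reproduce skgstat's internals, and the ‘exp’ and ‘entropy’ options are not covered.

```python
from typing import List, Sequence


def distance_sigma(bins: Sequence[float], kind: str) -> List[float]:
    """Sigma per lag class from the lag distance normalized by the maximum lag (w_n)."""
    max_bin = max(bins)
    w_n = [b / max_bin for b in bins]
    if kind == "linear":
        return w_n
    if kind == "sqrt":
        return [w ** 0.5 for w in w_n]
    if kind == "sq":
        return [w ** 2 for w in w_n]
    raise ValueError("only 'linear', 'sqrt' and 'sq' are sketched here")


def chisq(experimental: Sequence[float], modeled: Sequence[float], sigma: Sequence[float]) -> float:
    """Weighted cost: sum((r / sigma)^2) with r = experimental - modeled."""
    return sum(((e - m) / s) ** 2 for e, m, s in zip(experimental, modeled, sigma))


bins = [1.0, 2.0, 3.0, 4.0]
sigma = distance_sigma(bins, "linear")  # [0.25, 0.5, 0.75, 1.0]
# The same residual of 0.5 costs most at the smallest lag, where sigma is smallest.
print(chisq([1.0, 1.0, 1.0, 1.0], [0.5, 0.5, 0.5, 0.5], sigma))
```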
See also
-
fitted_model
Fitted Model
Returns a callable that takes a distance value and returns a semivariance. This model is fitted to the current Variogram parameters. The parameters are hard-coded into the function code at the time the function is returned.
- Returns
model – The current semivariance model fitted to the current Variogram model parameters.
- Return type
callable
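The idea of returning a callable with the parameters hard-coded can be illustrated with functools.partial. This sketch assumes the textbook spherical parametrization and a hypothetical `freeze_model` helper; skgstat builds its function differently.

```python
from functools import partial
from typing import Callable


def spherical(h: float, a: float, c0: float, b: float = 0.0) -> float:
    """Textbook spherical model: range a, partial sill c0, nugget b."""
    if h <= a:
        ratio = h / a
        return b + c0 * (1.5 * ratio - 0.5 * ratio ** 3)
    return b + c0


def freeze_model(a: float, c0: float, b: float = 0.0) -> Callable[[float], float]:
    """Return a one-argument callable with the fitted parameters bound at creation time."""
    return partial(spherical, a=a, c0=c0, b=b)


model = freeze_model(a=10.0, c0=2.0, b=0.5)
print(model(20.0))  # 2.5, partial sill plus nugget beyond the range
```

Because the parameters are bound when the callable is created, refitting afterwards does not change a callable that was already returned.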
-
get_empirical(bin_center=False)
Empirical variogram
Returns a tuple of the independent and dependent sample values this Variogram is estimated for, namely the current bins and experimental semi-variance values. By default, the upper bin edges are used. This can be changed to the bin centers with the bin_center argument.
- Parameters
bin_center (bool) – If set to True, the center for each distance lag bin is used over the upper limit (default).
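Since bins holds upper edges and the lag classes start at 0 (binning works on [0, maxlag[), bin centers follow from neighboring edges. A minimal sketch, assuming that convention; `bin_centers` is a hypothetical helper, not the skgstat code path behind bin_center=True.

```python
from typing import List, Sequence


def bin_centers(upper_edges: Sequence[float]) -> List[float]:
    """Centers of lag classes given their upper edges; the first class starts at 0."""
    lower_edges = [0.0] + list(upper_edges[:-1])
    return [(lo + up) / 2.0 for lo, up in zip(lower_edges, upper_edges)]


print(bin_centers([2.5, 5.0, 7.5, 10.0]))  # [1.25, 3.75, 6.25, 8.75]
```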
- Returns
See also
-
lag_classes()
Iterate over the lag classes
Generates an iterator over all lag classes. Can be zipped with Variogram.bins to identify the lag.
Changed in version 0.3.6: yields an empty array for empty lag groups now
- Returns
- Return type
iterable
-
lag_groups()
Lag class groups
Returns a mask array with as many elements as self._diff has, identifying the lag class group for each pairwise difference. Can be used to extract all pairwise values within the same lag bin.
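Grouping pairwise values by lag and applying an estimator is what yields the experimental variogram. The sketch below assigns each point pair a lag class index from the upper bin edges and then applies the Matheron estimator per class. It is a simplified, hypothetical illustration: it assumes upper edges are inclusive and marks pairs beyond the last edge with -1, which may not match skgstat's exact convention.

```python
import bisect
from typing import List, Sequence


def lag_groups(distances: Sequence[float], upper_edges: Sequence[float]) -> List[int]:
    """Lag class index per point pair; -1 marks pairs beyond the last edge."""
    groups = []
    for d in distances:
        idx = bisect.bisect_left(upper_edges, d)  # index of the first edge >= d
        groups.append(idx if idx < len(upper_edges) else -1)
    return groups


def experimental_variogram(distances: Sequence[float], abs_diffs: Sequence[float],
                           upper_edges: Sequence[float]) -> List[float]:
    """Matheron semi-variance per lag class: sum(diff^2) / (2N); NaN for empty classes."""
    groups = lag_groups(distances, upper_edges)
    result = []
    for k in range(len(upper_edges)):
        diffs = [x for x, g in zip(abs_diffs, groups) if g == k]
        result.append(sum(x * x for x in diffs) / (2 * len(diffs)) if diffs else float("nan"))
    return result


print(lag_groups([0.5, 1.5, 2.5, 3.5], [1.0, 2.0, 3.0]))  # [0, 1, 2, -1]
print(experimental_variogram([0.5, 1.5, 2.5, 3.5], [1.0, 2.0, 2.0, 5.0], [2.0, 4.0]))  # [1.25, 7.25]
```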
- Returns
- Return type
See also
-
location_trend(axes=None, show=True, **kwargs)
Location Trend plot
Plots the values over each dimension of the coordinates in a scatter plot. This will visually show correlations between the values and any of the coordinate dimensions. A dependence of the values on the location would violate the intrinsic hypothesis, which is a weaker form of second-order stationarity.
Changed in version 0.4.0: This plot can be plotted with the plotly plotting backend
- Parameters
axes (list) – Can be None (default) or a list of matplotlib.AxesSubplots. If a list is passed, the location trend plots will be plotted on the given instances. Note that the length of the list has to match the dimensionality of the coordinates array. In case 3D coordinates are used, three subplots have to be given.
show (boolean) – If True (default), the show method of the Figure will be called. Can be set to False to prevent duplicated plots in some environments.
- Keyword Arguments
add_trend_line (bool) –
New in version 0.3.5.
If set to True, the class will fit a linear model to each coordinate dimension and output the model along with a calculated R². With high R² values, you should consider rejecting the input data, or transforming it.
Note
Right now, this is only supported for the 'plotly' backend.
- Returns
fig – The figure produced by the function. Depends on the current backend.
- Return type
matplotlib.Figure, plotly.graph_objects.Figure
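The add_trend_line option fits a linear model per coordinate dimension and reports R². That computation can be sketched with ordinary least squares in plain Python. This is a hedged illustration with a hypothetical `linear_trend` helper, not the code skgstat or plotly uses.

```python
from typing import Sequence, Tuple


def linear_trend(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least squares fit y = slope * x + intercept; returns (slope, intercept, R²)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


coords = [(0.0, 5.0), (1.0, 3.0), (2.0, 4.0), (3.0, 1.0)]
values = [1.0, 3.0, 5.0, 7.0]
for dim in range(2):
    axis = [c[dim] for c in coords]
    print(dim, linear_trend(axis, values))  # dimension 0 shows a perfect trend (R² = 1)
```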
-
mae
MAE
Calculate the Mean absolute error between the experimental variogram and the theoretical model values at corresponding lags. Can be used as a fitting quality measure.
- Returns
- Return type
See also
Notes
The MAE is implemented like:
\[MAE = \frac{\sum_{i=1}^{N(x)} |x_i - y_i|}{N(x)}\]
-
maxlag
Maximum lag distance to be considered in this Variogram instance. You can limit the distance up to which point pairs are calculated. There are three possible ways to do that. Firstly, in absolute lag units, which is a number larger than one. Secondly, a number 0 < maxlag < 1 can be set, which will use this share of the maximum distance as maxlag. Lastly, a string can be set: 'mean' and 'median' for the mean or median value of the distance matrix.
Notes
This setting is largely flexible, but all options except the absolute limit in lag units need the full distance matrix to be calculated. Hence, it does not speed up the calculation of large distance matrices, just the estimation of the variogram. Thus, if you pre-calculate the distance matrix using MetricSpace, only absolute limits can be used.
-
mean_residual
Mean Model residuals
Calculates the mean absolute deviation between the experimental variogram and the theoretical model values.
- Returns
- Return type
-
metric_space
New in version 0.5.6.
MetricSpace representation of the input coordinates. A MetricSpace can be used to pass pre-calculated coordinates to other Variogram instances.
- Returns
metric_space
- Return type
See also
Variogram.coordinates: coordinate representation
-
model_deviations()
Model Deviations
Calculate the deviations between the experimental variogram and the recalculated values for the same bins using the fitted theoretical variogram function. Can be utilized to calculate a quality measure for the variogram fit.
- Returns
deviations – The first element is the experimental variogram, the second element holds the corresponding values of the theoretical model.
- Return type
-
mse
MSE
Calculate the Mean squared error between the experimental variogram and the theoretical model values at corresponding lags. Can be used as a fitting quality measure.
- Returns
- Return type
See also
Notes
The MSE is implemented like:
\[MSE = \frac{\sum_{i=1}^{N(x)} (x_i - y_i)^2}{N(x)}\]
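The MAE and MSE follow directly from the formulas above. A minimal plain-Python sketch, assuming x holds the model values and y the experimental values at the same lags (both metrics are symmetric, so the assignment does not change the result); the helper names are hypothetical.

```python
from typing import Sequence


def mae(x: Sequence[float], y: Sequence[float]) -> float:
    """Mean absolute error between model values x and experimental values y."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


def mse(x: Sequence[float], y: Sequence[float]) -> float:
    """Mean squared error between model values x and experimental values y."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


model = [1.0, 2.0, 3.0]
experimental = [1.5, 2.0, 2.0]
print(mae(model, experimental))  # 0.5
print(mse(model, experimental))
```

Squaring makes the MSE more sensitive to single large deviations than the MAE.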
-
n_lags
Number of lag bins
Set the number of lag bins to be used on this Variogram instance. This will reset the grouping index and fitting parameters.
-
nrmse
NRMSE
Calculate the normalized root mean squared error between the experimental variogram and the theoretical model values at corresponding lags. Can be used as a fitting quality measure.
- Returns
- Return type
See also
Notes
The NRMSE is implemented as:
\[NRMSE = \frac{RMSE}{mean(y)}\]
where RMSE is Variogram.rmse and y is Variogram.experimental.
-
nrmse_r
NRMSE
Alternative normalized root mean squared error between the experimental variogram and the theoretical model values at corresponding lags. Can be used as a fitting quality measure.
- Returns
- Return type
See also
Notes
Unlike Variogram.nrmse, nrmse_r is not normalized to the mean of y, but to the difference between the maximum of y and its mean:
\[NRMSE_r = \frac{RMSE}{max(y) - mean(y)}\]
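The RMSE-based measures differ only in their denominator. The following plain-Python sketch (hypothetical helper names; x are model values, y experimental values, as in the formulas) shows RMSE together with both normalizations.

```python
import math
from typing import Sequence


def rmse(x: Sequence[float], y: Sequence[float]) -> float:
    """Root mean squared error between model values x and experimental values y."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


def nrmse(x: Sequence[float], y: Sequence[float]) -> float:
    """RMSE normalized by the mean of the experimental values y."""
    return rmse(x, y) / (sum(y) / len(y))


def nrmse_r(x: Sequence[float], y: Sequence[float]) -> float:
    """RMSE normalized by the spread max(y) - mean(y)."""
    return rmse(x, y) / (max(y) - sum(y) / len(y))


model = [1.0, 2.0, 3.0, 4.0]
experimental = [2.0, 2.0, 4.0, 4.0]
print(rmse(model, experimental), nrmse(model, experimental), nrmse_r(model, experimental))
```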
-
parameters
Extract just the variogram parameters range, sill and nugget from the describe output.
- Returns
params – [range, sill, nugget] for most models and [range, sill, shape, nugget] for the matern and stable models.
- Return type
-
plot(axes=None, grid=True, show=True, hist=True)
Variogram Plot
Plot the experimental variogram, the fitted theoretical function and a histogram for the lag classes. The axes attribute can be used to pass a list of AxesSubplots or a single instance to the plot function. Then these Subplots will be used. If only a single instance is passed, the hist attribute will be ignored as only the variogram will be plotted anyway.
Changed in version 0.4.0: This plot can be plotted with the plotly plotting backend
- Parameters
axes (list, tuple, array, AxesSubplot or None) – If None, the plot function will create a new matplotlib figure. Otherwise a single instance or a list of AxesSubplots can be passed to be used. If a single instance is passed, the hist attribute will be ignored.
grid (bool) – Defaults to True. If True, a custom grid will be drawn through the lag class centers.
show (bool) – Defaults to True. If True, the show method of the passed or created matplotlib Figure will be called before returning the Figure. This should be set to False when used in a Notebook, as a returned Figure object will be plotted anyway.
hist (bool) – Defaults to True. If False, the creation of a histogram for the lag classes will be suppressed.
- Returns
- Return type
matplotlib.Figure
-
preprocessing(force=False)
Preprocessing function
Prepares all input data for the fit and transform functions. Namely, the distances and the value differences are calculated. Then the binning is set up and bin edges are calculated. If any of the listed subsets are already prepared, their processing is skipped. This behaviour can be changed by the force parameter, which will cause a clean preprocessing.
- Parameters
force (bool) – If set to True, all preprocessing data sets will be deleted. Use it in case you need a clean preprocessing.
- Returns
- Return type
void
-
r
Pearson correlation of the fitted Variogram
- Returns
-
residuals
Model residuals
Calculate the model residuals, defined as the differences between the experimental variogram and the theoretical model values at corresponding lag values.
- Returns
- Return type
-
rmse
RMSE
Calculate the Root Mean squared error between the experimental variogram and the theoretical model values at corresponding lags. Can be used as a fitting quality measure.
- Returns
- Return type
See also
Notes
The RMSE is implemented like:
\[RMSE = \sqrt{\frac{\sum_{i=1}^{N(x)} (x_i - y_i)^2}{N(x)}}\]
-
scattergram(ax=None, show=True, **kwargs)
Scattergram plot
Groups the values by lags and plots the head and tail values of all point pairs within the groups against each other. This can be used to investigate the distribution of the value residuals.
Changed in version 0.4.0: This plot can be plotted with the plotly plotting backend
- Parameters
ax (matplotlib.Axes, plotly.graph_objects.Figure) – If None, a new plotting Figure will be created. If given, it has to be an instance of the used plotting backend, which will be used to plot on.
show (boolean) – If True (default), the show method of the Figure will be called. Can be set to False to prevent duplicated plots in some environments.
- Returns
fig – Resulting figure, depending on the plotting backend
- Return type
matplotlib.Figure, plotly.graph_objects.Figure
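A scattergram plots, per lag class, the value at one end of each point pair (head) against the value at the other end (tail). The hypothetical helper below collects those pairs for a single lag interval in plain Python. It assumes Euclidean distance and a half-open (lower, upper] interval, and it is not the skgstat plotting code.

```python
import itertools
import math
from typing import List, Sequence, Tuple


def head_tail_pairs(coords: Sequence[Tuple[float, ...]], values: Sequence[float],
                    lower: float, upper: float) -> List[Tuple[float, float]]:
    """(head, tail) values of all point pairs whose distance lies in (lower, upper]."""
    pairs = []
    for i, j in itertools.combinations(range(len(coords)), 2):
        if lower < math.dist(coords[i], coords[j]) <= upper:
            pairs.append((values[i], values[j]))
    return pairs


coords = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
values = [1.0, 2.0, 4.0]
print(head_tail_pairs(coords, values, 0.0, 1.5))  # [(1.0, 2.0)]
```

Points clustering around the 1:1 line within a lag indicate similar values at that separation.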
-
set_bin_func(bin_func: Union[str, Iterable[T_co], Callable[[numpy.ndarray, float, float], Tuple[numpy.ndarray, float]]])
Set binning function
Sets a new binning function to be used. The new binning method is set by either a string identifying the new function to be used, an iterable containing the bin edges, or any function that can compute bins from the distances, number of lags and maximum lag. The string can be one of: [‘even’, ‘uniform’, ‘fd’, ‘sturges’, ‘scott’, ‘sqrt’, ‘doane’].
If the number of lag classes should be estimated automatically, it is recommended to use ‘sturges’ for small, normally distributed locations and ‘fd’ or ‘scott’ for large datasets, where ‘fd’ is more robust to outliers. ‘sqrt’ is by far the fastest estimator. ‘doane’ is an extension of Sturges’ rule for non-normally distributed data.
Changed in version 0.3.8: added ‘fd’, ‘sturges’, ‘scott’, ‘sqrt’, ‘doane’
Changed in version 0.3.9: added ‘kmeans’, ‘ward’
Changed in version 0.4.0: added ‘stable_entropy’
Changed in version 0.4.1: refactored local wrapper function definition. The wrapper to pass kwargs to the binning functions is now implemented as an instance method, to make it pickleable.
Changed in version 0.6.5: added iterable and function as arguments to allow for custom bins.
- Parameters
bin_func (str | Iterable | Callable) –
Can be one of:
‘even’
‘uniform’
‘fd’
‘sturges’
‘scott’
‘sqrt’
‘doane’
‘kmeans’
‘ward’
‘stable_entropy’
- Returns
- Return type
void
Notes
‘even’: Use skgstat.binning.even_width_lags for using n_lags lags of equal width up to maxlag.
‘uniform’: Use skgstat.binning.uniform_count_lags for using n_lags lags up to maxlag in which the pairwise differences follow a uniform distribution.
‘sturges’: estimates the number of evenly distributed lag classes (n) by Sturges’ rule [101], with s denoting the sample size:
\[n = \log_2 s + 1\]
‘scott’: estimates the lag class widths (h) by Scott’s rule [102]:
\[h = \sigma \left(\frac{24 \sqrt{\pi}}{s}\right)^{1/3}\]
‘sqrt’: estimates the number of lags (n) by the square-root:
\[n = \sqrt{s}\]
‘fd’: estimates the lag class widths (h) using the Freedman-Diaconis estimator [103]:
\[h = 2\frac{IQR}{s^{1/3}}\]
‘doane’: estimates the number of evenly distributed lag classes using Doane’s extension to Sturges’ rule [104]:
\[n = 1 + \log_{2}(s) + \log_2\left(1 + \frac{|g|}{k}\right), \quad g = E\left[\left(\frac{x - \mu}{\sigma}\right)^3\right], \quad k = \sqrt{\frac{6(s - 2)}{(s + 1)(s + 3)}}\]
‘kmeans’: This method will search for n clusters in the distance matrix. The cluster centroids are used to calculate the upper edges of the lag classes, by setting each edge halfway between two neighboring cluster centroids. Note: This does not necessarily result in even width bins.
‘ward’ uses a hierarchical clustering algorithm to iteratively merge pairs of clusters until there are only n remaining clusters. The merging is done by minimizing the variance for the merged cluster.
‘stable_entropy’ will adjust n bin edges by minimizing the absolute differences between each lag’s Shannon Entropy. This will lead to uneven bin widths. Each lag class value distribution will be of comparable intrinsic uncertainty from an information theoretic point of view, which makes the semi-variances quite comparable. However, it is not guaranteed that the binning makes any sense from a geostatistical point of view, as the first lags might be way too wide.
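The rule-based options reduce to short formulas. The sketch below evaluates the Sturges, square-root, Freedman-Diaconis and Scott rules from the notes above, where the sample size (called s here) is assumed to be the number of separating distances considered. Function names are hypothetical, rounding to whole classes via ceil is an assumption, and the actual skgstat binning functions may compute the edges differently.

```python
import math


def sturges_n(s: int) -> int:
    """Sturges' rule: log2(s) + 1, rounded up to a whole number of classes."""
    return math.ceil(math.log2(s) + 1)


def sqrt_n(s: int) -> int:
    """Square-root rule: sqrt(s), rounded up."""
    return math.ceil(math.sqrt(s))


def fd_width(iqr: float, s: int) -> float:
    """Freedman-Diaconis bin width: 2 * IQR / s^(1/3)."""
    return 2.0 * iqr / s ** (1.0 / 3.0)


def scott_width(sigma: float, s: int) -> float:
    """Scott's rule bin width: sigma * (24 * sqrt(pi) / s)^(1/3)."""
    return sigma * (24.0 * math.sqrt(math.pi) / s) ** (1.0 / 3.0)


s = 1000  # number of separating distances
print(sturges_n(s), sqrt_n(s))  # 11 32
print(fd_width(3.0, s), scott_width(1.0, s))
```

The count-based rules (Sturges, square-root) fix n directly, while Freedman-Diaconis and Scott give a width h from which n follows as the covered distance range divided by h.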
See also
Variogram.bin_func(), skgstat.binning.uniform_count_lags(), skgstat.binning.even_width_lags(), skgstat.binning.auto_derived_lags(), skgstat.binning.kmeans(), skgstat.binning.ward(), sklearn.cluster.KMeans(), sklearn.cluster.AgglomerativeClustering()
References
[101] Scott, D.W. (2009). Sturges’ rule. WIREs Comp Stat, 1: 303-306. https://doi.org/10.1002/wics.35
[102] Scott, D.W. (2010). Scott’s rule. WIREs Comp Stat, 2: 497-502. https://doi.org/10.1002/wics.103
[103] Freedman, D., and Diaconis, P. (1981). On the histogram as a density estimator: L2 theory. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 57(4): 453-476.
[104] Doane, D.P. (1976). Aesthetic frequency classifications. The American Statistician, 30(4): 181-183.
-
set_dist_function(func)
Set distance function
Set the function used for distance calculation. func can either be a callable or a string. The ranked distance function is not implemented yet. Strings will be forwarded to the scipy.spatial.distance.pdist function as the metric argument. If func is a callable, it has to return the upper triangle of the distance matrix as a flat array (like the pdist function).
- Parameters
func (string, callable) –
- Returns
- Return type
numpy.array
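A callable passed to set_dist_function has to produce the same layout as pdist: the upper triangle of the distance matrix, flattened row by row. Below is a hypothetical Euclidean example in plain Python; the real function would return a numpy array, and the argument it receives is assumed here to be the coordinates.

```python
import math
from typing import List, Sequence, Tuple


def condensed_euclidean(coords: Sequence[Tuple[float, ...]]) -> List[float]:
    """Flattened upper triangle of the distance matrix, in pdist order (i < j, row by row)."""
    m = len(coords)
    return [math.dist(coords[i], coords[j]) for i in range(m) for j in range(i + 1, m)]


print(condensed_euclidean([(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]))  # [5.0, 10.0, 5.0]
```

For m points the result always has m(m-1)/2 entries.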
-
set_values(values, calc_diff=True)
Set new values
Will set the passed array as the new value array. This array has to be of the same length as the first axis of the coordinates array. The Variogram class only accepts one-dimensional arrays. On success, all fitting parameters are deleted and the pairwise differences are recalculated. Raises a ValueError on shape mismatches and a Warning if all input values are the same.
- Parameters
values (numpy.ndarray) –
- Returns
- Return type
void
- Raises
ValueError – raised if the values array shape does not match the coordinates array, or if more than one dimension is given
Warning – raised if all input values are the same
See also
-
to_DataFrame(n=100, force=False)
Variogram DataFrame
Returns the fitted theoretical variogram as a pandas.DataFrame instance. The n and force parameters control the calculation; refer to the data function for more info.
- Parameters
n (integer) – Length of the lags array to be used for fitting. Defaults to 100, which will be fine for most plots.
force (boolean) – If True, the preprocessing and fitting will be executed as a clean run. This will force all intermediate results to be recalculated. Defaults to False.
- Returns
- Return type
See also
-
to_gs_krige(**kwargs)
Instantiate a GSTools Krige class.
This can only export isotropic models. Note: fit_variogram is always set to False.
- Parameters
variogram (skgstat.Variogram) – Scikit-GStat Variogram instance
**kwargs – Keyword arguments forwarded to GSTools Krige. Refer to Krige to learn about all possible options. Note that the fit_variogram parameter will always be False.
- Raises
ImportError – When GSTools is not installed.
ValueError – When GSTools version is not v1.3 or greater.
ValueError – When given Variogram model is not supported (‘harmonize’).
- Returns
Instantiated GSTools Krige class.
- Return type
Krige
See also
gstools.Krige()
-
to_gstools(**kwargs)
Instantiate a corresponding GSTools CovModel.
By default, this will be an isotropic model.
- Parameters
**kwargs – Keyword arguments forwarded to the instantiated GSTools CovModel. The default parameters ‘dim’, ‘var’, ‘len_scale’, ‘nugget’, ‘rescale’ and optional shape parameters will be extracted from the given Variogram but they can be overwritten here.
- Raises
ImportError – When GSTools is not installed.
ValueError – When GSTools version is not v1.3 or greater.
ValueError – When given Variogram model is not supported (‘harmonize’).
- Returns
Corresponding GSTools covmodel.
- Return type
CovModel
Note
In case you intend to use the coordinates in a GSTools workflow, you need to transpose the coordinate array like:
>>> cond_pos = Variogram.coordinates.T
-
transform(x)
Transform
Transform a given set of lag values to the theoretical variogram function using the actual fitting and preprocessing parameters in this Variogram instance.
- Parameters
x (numpy.array) – Array of lag values to be used as model input for the fitted theoretical variogram model
- Returns
- Return type
numpy.array
-
triangular_distance_matrix
Like distance_matrix, but with zeros below the diagonal. Only defined if distance_matrix is a sparse matrix.
-
update_kwargs(**kwargs)
New in version 0.3.7.
Update the keyword arguments of this Variogram instance. The keyword arguments will be validated first and then update the existing kwargs. That means you only need to pass the kwargs that should be updated.
Note
Updating the kwargs does not force a preprocessing cycle. Any affected intermediate result that might be cached internally will not make use of the updated kwargs. Call preprocessing(force=True) to force a clean re-calculation of the Variogram instance.
-
value_matrix
Value matrix
Returns a matrix of pairwise differences in absolute values. The matrix will have the shape (m, m) with m = len(Variogram.values). Note that Variogram.values holds the values themselves, while the value_matrix consists of their pairwise differences.
- Returns
values – Matrix of pairwise absolute differences of the values.
- Return type
See also
Variogram._diff
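The relationship between values and value_matrix can be shown with a small plain-Python sketch (hypothetical helper, nested lists in place of a numpy array). The result is symmetric with zeros on the diagonal.

```python
from typing import List, Sequence


def value_matrix(values: Sequence[float]) -> List[List[float]]:
    """(m, m) matrix of pairwise absolute value differences."""
    return [[abs(a - b) for b in values] for a in values]


print(value_matrix([1.0, 4.0, 6.0]))  # [[0.0, 3.0, 5.0], [3.0, 0.0, 2.0], [5.0, 2.0, 0.0]]
```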
-
values
Values property
Array of observations the variogram is built for. The setter of this property utilizes the Variogram.set_values function for setting new arrays.
- Returns
values
- Return type
See also
-