Utility Functions

Shannon Entropy

skgstat.util.shannon.shannon_entropy(x, bins)[source]

Shannon Entropy

Calculates the Shannon Entropy, a fundamental measure in information theory that quantifies the information content of a discrete distribution. It can be used to estimate the intrinsic uncertainty of a sample, independent of its value range or variance, which makes samples more comparable.

Parameters
  • x (numpy.ndarray) – flat 1D array of the observations

  • bins (list, int) – upper edges of the bins used to calculate the histogram of x.

Returns

h – Shannon Entropy of x, given bins.

Return type

float
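
As a rough illustration of the idea, the sketch below bins a sample and sums -p * log2(p) over the occupied bins. It is plain Python and not the library's implementation: the name shannon_entropy_sketch, the base-2 logarithm and the convention that edges includes the leftmost edge are all assumptions made for this example.

```python
import math
from bisect import bisect_right


def shannon_entropy_sketch(x, edges):
    """Entropy in bits of the histogram of x over the given bin edges."""
    counts = [0] * (len(edges) - 1)
    for value in x:
        if value < edges[0] or value > edges[-1]:
            continue  # observations outside the edges are not counted
        # Bins are half-open [left, right); the last bin also includes its right edge.
        idx = min(bisect_right(edges, value) - 1, len(counts) - 1)
        counts[idx] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no observation falls inside the given bin edges")
    # Empty bins contribute nothing, since p * log2(p) tends to 0 as p tends to 0.
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)


sample = [0.1, 0.2, 0.6, 0.9]
print(shannon_entropy_sketch(sample, [0.0, 0.5, 1.0]))  # two equally filled bins -> 1.0
# Rescaling the sample and the edges together leaves the entropy unchanged.
print(shannon_entropy_sketch([10 * v for v in sample], [0.0, 5.0, 10.0]))  # -> 1.0
```

The second call shows the independence from the value range: only the bin occupancy matters, not the magnitude of the observations.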

Cross Validation

skgstat.util.cross_validation.jacknife(variogram, n: int = None, metric: str = 'rmse', seed=None) → float[source]

Leave-one-out cross validation of the given variogram model using the OrdinaryKriging instance. This method can be called using Variogram.cross_validate.

Parameters
  • variogram (skgstat.Variogram) – The variogram instance to be validated.

  • n (int) – Number of points that should be used for cross validation. If None is given, all points are used (default).

  • metric (str) – Metric used for cross validation. Can be one of [‘rmse’, ‘mse’, ‘mae’]

Returns

metric – Cross-validation result. The value is given in the selected metric.

Return type

float
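
The leave-one-out procedure itself can be sketched without the library: each point is removed in turn, predicted from the remaining points, and the residuals are condensed into one of the three metrics. The sketch below is only an illustration. It swaps in a simple inverse-distance-weighted interpolator for ordinary kriging, and the names idw_predict and leave_one_out are hypothetical.

```python
import math


def idw_predict(coords, values, target, power=2.0):
    """Inverse-distance-weighted estimate at target (a stand-in for kriging)."""
    num = 0.0
    den = 0.0
    for (x, y), v in zip(coords, values):
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return v  # target coincides with an observation
        w = 1.0 / d ** power
        num += w * v
        den += w
    return num / den


def leave_one_out(coords, values, metric="rmse"):
    """Predict every point from all other points and score the residuals."""
    residuals = []
    for i in range(len(values)):
        rest_coords = coords[:i] + coords[i + 1:]
        rest_values = values[:i] + values[i + 1:]
        prediction = idw_predict(rest_coords, rest_values, coords[i])
        residuals.append(prediction - values[i])
    if metric == "mae":
        return sum(abs(r) for r in residuals) / len(residuals)
    mse = sum(r * r for r in residuals) / len(residuals)
    if metric == "mse":
        return mse
    if metric == "rmse":
        return math.sqrt(mse)
    raise ValueError("metric must be one of 'rmse', 'mse', 'mae'")


coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
values = [1.0, 2.0, 3.0, 4.0, 2.5]
print(leave_one_out(coords, values, metric="rmse"))
```

With the library, the same kind of score is obtained from the fitted variogram, which according to the description above is also reachable through Variogram.cross_validate.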

Uncertainty Propagation

skgstat.util.uncertainty.propagate(variogram: skgstat.Variogram.Variogram = None, source: Union[str, List[str]] = 'values', sigma: Union[float, List[float]] = 5, evalf: Union[str, List[str]] = 'experimental', verbose: bool = False, use_bounds: bool = False, **kwargs)[source]

Uncertainty propagation for the variogram. For a given Variogram instance a source of error and scale of error distribution can be specified. The function will propagate the uncertainty into different parts of the Variogram and return the confidence intervals or error bounds.

Parameters
  • variogram (skgstat.Variogram) – The base variogram. The variogram parameters will be used as fixed arguments for the Monte Carlo simulation.

  • source (list) – Source of uncertainty. This has to be an attribute of Variogram. Right now only 'values' is really supported; anything else is untested.

  • sigma (list) – Standard deviation of the error distribution.

  • evalf (list) – Evaluation function. This specifies which part of the Variogram should be evaluated. Possible values are 'experimental' for the experimental variogram, 'model' for the fitted model and 'parameter' for the variogram parameters.

  • verbose (bool) – If True, the uncertainty_framework package used under the hood will print a progress bar to the console. Defaults to False.

  • use_bounds (bool) – Shortcut to set the confidence interval bounds to the minimum and maximum value and thus return the error margins over a confidence interval.

Keyword Arguments
  • distribution (str) – Any valid numpy.random distribution function, that takes the scale as argument. Defaults to 'normal'.

  • q (int) – Width (percentile) of the confidence interval. Has to be a number between 0 and 100. 0 will result in the minimum and maximum value as bounds. 100 turns both bounds into the median value. Defaults to 10.

  • num_iter (int) – Number of iterations used in the Monte Carlo simulation. Defaults to 500.

  • eval_at (int) – If evalf is set to 'model', the theoretical model gets evaluated at this many evenly spaced lags up to the maximum lag. Defaults to 100.

  • n_jobs (int) –

    The evaluation can be performed in parallel. This will specify how many processes may be spawned in parallel. None will spawn only one (default).

    Note

    This is an untested experimental feature.

Returns

conf_interval – Confidence interval of the uncertainty propagation as [lower, median, upper]. If more than one evalf is given, a list of ndarrays will be returned. See notes for more details.

Return type

numpy.ndarray

Notes

For each member of the evaluated property, the lower and upper bound along with the median value is returned as [low, median, up]. Thus the returned array has the shape (N, 3). N is the length of the evaluated property, which is n_lags (skgstat.Variogram.n_lags) for 'experimental'; 3 for 'parameter', or 4 if Variogram.model is 'stable' or 'matern'; and 100 for 'model', as the model gets evaluated at 100 evenly spaced lags up to the maximum lag class. This amount can be changed using the eval_at parameter.

If more than one evalf parameter is given, the Variogram will be evaluated at multiple steps and each one will be returned as a confidence interval. Thus if len(evalf) == 2, a list containing two confidence interval matrices will be returned. The order is [experimental, parameter, model].
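
The Monte Carlo idea behind the (N, 3) result can be sketched in plain Python: perturb the observations with normally distributed noise of scale sigma, re-evaluate the property each time, and take percentiles per member. This is a sketch under assumptions, not the library's implementation. The names propagate_sketch, percentile and transect_semivariance are hypothetical, and placing the bounds at the q/2 and 100 - q/2 percentiles is only one reading that matches the description of q (0 gives minimum and maximum, 100 gives the median for both bounds).

```python
import random


def percentile(sorted_vals, p):
    """Linearly interpolated percentile p (0..100) of a pre-sorted list."""
    pos = (len(sorted_vals) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def transect_semivariance(vals, lags=(1, 2, 3)):
    """Experimental semivariance of a regularly spaced 1D transect (toy evaluator)."""
    out = []
    for h in lags:
        sq_diffs = [(vals[i + h] - vals[i]) ** 2 for i in range(len(vals) - h)]
        out.append(0.5 * sum(sq_diffs) / len(sq_diffs))
    return out


def propagate_sketch(values, evaluate, sigma=5.0, q=10, num_iter=500, seed=None):
    """Return [lower, median, upper] for each member of evaluate(values)."""
    rng = random.Random(seed)
    runs = []
    for _ in range(num_iter):
        noisy = [v + rng.gauss(0.0, sigma) for v in values]
        runs.append(evaluate(noisy))
    result = []
    for j in range(len(runs[0])):
        column = sorted(run[j] for run in runs)
        result.append([
            percentile(column, q / 2),        # assumed lower bound
            percentile(column, 50),           # median
            percentile(column, 100 - q / 2),  # assumed upper bound
        ])
    return result


observations = [1.0, 1.4, 2.1, 2.0, 3.2, 3.9, 4.1, 5.0]
for low, median, up in propagate_sketch(observations, transect_semivariance,
                                        sigma=0.5, seed=42):
    print(round(low, 3), round(median, 3), round(up, 3))
```

Each printed row corresponds to one member of the evaluated property, here one lag of the toy semivariance, so the result has N rows of three values, mirroring the shape described above.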

Maximum Likelihood Estimation