Binning functions

SciKit-GStat implements a large number of binning functions, which spatially aggregate the distance matrix into lag classes, or bins. Several functions are available, and most of them accept more than one method identifier:

skgstat.binning.even_width_lags(distances, n, maxlag)[source]

Even lag edges

Calculate the lag edges for a given number of bins using the same lag step width for all bins.

Changed in version 0.3.8: Function returns None as second value to indicate that the number of lag classes was not changed.

Parameters
  • distances (numpy.array) – Flat numpy array representing the upper triangle of the distance matrix.

  • n (integer) – Number of lag classes to find

  • maxlag (integer, float) – Limit the last lag class to this separating distance.

Returns

bin_edges – The upper bin edges of the lag classes

Return type

numpy.ndarray
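The even-width rule can be illustrated with plain NumPy. This is a minimal sketch of the idea, not the library's implementation; the helper name even_width_edges and the handling of a missing or oversized maxlag are assumptions made for illustration.

```python
import numpy as np

def even_width_edges(distances, n, maxlag=None):
    """Hypothetical helper: n upper bin edges of equal width up to maxlag."""
    d = np.asarray(distances, dtype=float)
    # assumption: a missing or too large maxlag falls back to the largest distance
    if maxlag is None or maxlag > np.nanmax(d):
        maxlag = np.nanmax(d)
    # n + 1 evenly spaced edges from 0 to maxlag; drop the leading 0
    return np.linspace(0, maxlag, n + 1)[1:]

distances = np.array([1.0, 2.0, 3.5, 4.0, 6.0, 7.5, 9.0, 10.0])
edges = even_width_edges(distances, n=5, maxlag=10.0)
# edges are 2, 4, 6, 8, 10: every lag class is 2 units wide
```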

skgstat.binning.uniform_count_lags(distances, n, maxlag)[source]

Uniform lag counts

Calculate the lag edges for a given number of bins with the same number of observations in each lag class. The lag step width will be variable.

Changed in version 0.3.8: Function returns None as second value to indicate that the number of lag classes was not changed.

Parameters
  • distances (numpy.array) – Flat numpy array representing the upper triangle of the distance matrix.

  • n (integer) – Number of lag classes to find

  • maxlag (integer, float) – Limit the last lag class to this separating distance.

Returns

bin_edges – The upper bin edges of the lag classes

Return type

numpy.ndarray
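One way to obtain lag classes with equal counts is to place the edges at evenly spaced percentiles of the distances. The sketch below assumes this percentile approach and uses a hypothetical helper name, uniform_count_edges; it is meant to illustrate the idea rather than reproduce the library code.

```python
import numpy as np

def uniform_count_edges(distances, n, maxlag=None):
    """Hypothetical helper: n upper bin edges holding equal numbers of distances."""
    d = np.asarray(distances, dtype=float)
    # assumption: a missing or too large maxlag falls back to the largest distance
    if maxlag is None or maxlag > np.nanmax(d):
        maxlag = np.nanmax(d)
    d = d[d <= maxlag]
    # evenly spaced percentiles split the sample into n equally filled classes
    return np.percentile(d, np.linspace(0, 100, n + 1))[1:]

distances = np.arange(1, 101, dtype=float)   # 100 toy distances: 1, 2, ..., 100
edges = uniform_count_edges(distances, n=4)
# count the distances falling into each lag class
counts, _ = np.histogram(distances, bins=np.concatenate(([0.0], edges)))
```

In this toy example each of the four classes holds 25 distances, while the class widths are free to differ on less regular data.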

skgstat.binning.auto_derived_lags(distances, method_name, maxlag)[source]

Derive bins automatically

New in version 0.3.8.

Uses numpy.histogram_bin_edges to derive the lag classes automatically. Supports any method supported by numpy.histogram_bin_edges. It is recommended to use ‘sturges’, ‘doane’ or ‘fd’.

Parameters
  • distances (numpy.array) – Flat numpy array representing the upper triangle of the distance matrix.

  • maxlag (integer, float) – Limit the last lag class to this separating distance.

  • method_name (str) – Any method supported by numpy.histogram_bin_edges

Returns

bin_edges – The upper bin edges of the lag classes

Return type

numpy.ndarray
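The automatic derivation can be sketched directly on top of numpy.histogram_bin_edges. The helper name auto_edges and the maxlag filtering step are assumptions for illustration; only the NumPy call itself is taken from the description above.

```python
import numpy as np

def auto_edges(distances, method_name="sturges", maxlag=None):
    """Hypothetical helper: upper bin edges chosen by a NumPy histogram rule."""
    d = np.asarray(distances, dtype=float)
    # assumption: a missing or too large maxlag falls back to the largest distance
    if maxlag is None or maxlag > np.nanmax(d):
        maxlag = np.nanmax(d)
    d = d[d <= maxlag]
    # NumPy picks the number of bins itself; drop the lower-most edge
    return np.histogram_bin_edges(d, bins=method_name)[1:]

rng = np.random.default_rng(42)
distances = rng.gamma(shape=2.0, scale=10.0, size=300)  # toy distances
results = {m: auto_edges(distances, m, maxlag=60.0) for m in ("sturges", "doane", "fd")}
```

Unlike the fixed-n functions, the number of lag classes is a result of the chosen rule here, so the three methods can return arrays of different length.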

skgstat.binning.kmeans(distances, n, maxlag, binning_random_state=42, **kwargs)[source]

New in version 0.3.9.

Clustering of pairwise separating distances between locations up to maxlag. The lag class edges are formed equidistant from each cluster center. Note: this does not necessarily result in equidistant lag classes.

Parameters
  • distances (numpy.array) – Flat numpy array representing the upper triangle of the distance matrix.

  • n (integer) – Number of lag classes to find

  • maxlag (integer, float) – Limit the last lag class to this separating distance.

Returns

bin_edges – The upper bin edges of the lag classes

Return type

numpy.ndarray

See also

sklearn.cluster.KMeans()

Note

The KMeans that is used under the hood is not a deterministic algorithm, as the starting cluster centroids are seeded randomly. This can yield slightly different results on each run. Thus, for this application, the random_state on KMeans is fixed to a specific value. You can change the seed by passing another seed to Variogram as binning_random_state.
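The effect of a fixed seed can be shown with a small, self-contained 1-D k-means written in NumPy. This is only a sketch of the general technique: it does not call sklearn.cluster.KMeans, the helper name kmeans_lag_edges is hypothetical, and closing the last class at maxlag is an assumption that may differ from how the library builds its edges.

```python
import numpy as np

def kmeans_lag_edges(distances, n, maxlag, seed=42, n_iter=100):
    """Hypothetical helper: 1-D Lloyd k-means on all distances <= maxlag."""
    d = np.asarray(distances, dtype=float)
    d = d[d <= maxlag]
    rng = np.random.default_rng(seed)
    # randomly seed the centroids with n distinct observed distances
    centers = np.sort(rng.choice(np.unique(d), size=n, replace=False))
    for _ in range(n_iter):
        # assign every distance to its nearest centroid
        labels = np.argmin(np.abs(d[:, None] - centers[None, :]), axis=1)
        # move each centroid to the mean of its members (keep it if empty)
        new_centers = np.sort(np.array([
            d[labels == k].mean() if np.any(labels == k) else centers[k]
            for k in range(n)
        ]))
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # assumption: edges are midpoints between neighbouring centers,
    # and the last lag class is closed by maxlag
    return np.append((centers[:-1] + centers[1:]) / 2, maxlag)

rng = np.random.default_rng(0)
distances = rng.uniform(0, 100, 500)           # toy distances
first = kmeans_lag_edges(distances, 5, 80.0, seed=1)
second = kmeans_lag_edges(distances, 5, 80.0, seed=1)
# with the same seed, both runs start from the same centroids and agree exactly
```

A different seed changes the starting centroids and may therefore shift the resulting edges slightly.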

skgstat.binning.ward(distances, n, maxlag, **kwargs)[source]

New in version 0.3.9.

Clustering of pairwise separating distances between locations up to maxlag. The lag class edges are formed equidistant from each cluster center. Note: this does not necessarily result in equidistant lag classes.

The clustering is done by merging pairs of clusters that minimize the variance for the merged clusters, until n clusters are found.

Parameters
  • distances (numpy.array) – Flat numpy array representing the upper triangle of the distance matrix.

  • n (integer) – Number of lag classes to find

  • maxlag (integer, float) – Limit the last lag class to this separating distance.

Returns

bin_edges – The upper bin edges of the lag classes

Return type

numpy.ndarray

See also

sklearn.cluster.AgglomerativeClustering()

skgstat.binning.stable_entropy_lags(distances, n, maxlag, **kwargs)[source]

Optimizes the lag class edges for n lag classes. The algorithm minimizes the differences in Shannon entropy between the lag classes. Consequently, the final lag classes should be of comparable uncertainty.

Parameters
  • distances (numpy.array) – Flat numpy array representing the upper triangle of the distance matrix.

  • n (integer) – Number of lag classes to find

  • maxlag (integer, float) – Limit the last lag class to this separating distance.

Keyword Arguments
  • binning_maxiter (int) – Maximum iterations before the optimization is stopped, if the lag edges do not converge.

  • binning_entropy_bins (int, str) – Binning method for calculating the Shannon entropy on each iteration.

Returns

bin_edges – The upper bin edges of the lag classes

Return type

numpy.ndarray