artlib.cvi.iCVIs.CalinkskiHarabasz

Some things to consider in the future.

Removing entire labels from the dataset should be possible.

Right now I think it's not possible for Fuzzy ART to delete a cluster, so there is no need to do so yet.

Creating functions that explicitly add, remove, or switch a sample, instead of requiring the update call, would be nice. Or at least change the function names for add, remove, and switch, since they imply that the change has already been applied, not that it only takes effect after update is called.

Classes

iCVI_CH

Implementation of the Calinski Harabasz Validity Index in incremental form.

Functions

`delta_add_sample_to_average`(→ float)	Calculate the new average if a sample is added.
`delta_remove_sample_from_average`(→ float)	Calculate the new average if a sample is removed.

Module Contents

artlib.cvi.iCVIs.CalinkskiHarabasz.delta_add_sample_to_average(average: float, sample: float, total_samples: int) → float

Calculate the new average if a sample is added.

Parameters:

average (float) – Current average.
sample (float) – New sample to be added.
total_samples (int) – Total number of samples including the new one.

Returns:

Updated average after adding the sample.

Return type:

float
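
The running-mean update that this helper describes can be sketched as below. This is a minimal illustration, not the library's code: `mean_after_add` is a hypothetical name, and it assumes `total_samples` already counts the new sample, as the parameter description states. It is also an assumption that the value wanted is the updated mean itself; the `delta_` prefix suggests the library function may return only the increment `(sample - average) / total_samples`.

```python
def mean_after_add(average: float, sample: float, total_samples: int) -> float:
    """Return the mean after adding `sample` (hypothetical helper).

    `total_samples` is the count *including* the new sample.
    """
    if total_samples < 1:
        raise ValueError("total_samples must be at least 1")
    # Incremental form of (old_sum + sample) / total_samples.
    return average + (sample - average) / total_samples


# The mean of [2.0, 4.0] is 3.0; adding 9.0 gives 15.0 / 3.
new_mean = mean_after_add(3.0, 9.0, 3)
```

The incremental form avoids storing or re-summing earlier samples, which is what makes per-sample updates cheap.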

artlib.cvi.iCVIs.CalinkskiHarabasz.delta_remove_sample_from_average(average: float, sample: float, total_samples: int) → float

Calculate the new average if a sample is removed.

Parameters:

average (float) – Current average.
sample (float) – Sample to be removed.
total_samples (int) – Total number of samples before removal.

Returns:

Updated average after removing the sample.

Return type:

float
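
The reverse step can be sketched the same way. `mean_after_remove` is a hypothetical name used only for illustration, and it assumes `total_samples` is the count before removal, as documented above. As with the add helper, the library function may return only the change in the mean rather than the mean itself.

```python
def mean_after_remove(average: float, sample: float, total_samples: int) -> float:
    """Return the mean after removing `sample` (hypothetical helper).

    `total_samples` is the count *before* the removal.
    """
    if total_samples < 2:
        raise ValueError("removing the last sample leaves the mean undefined")
    # Incremental form of (old_sum - sample) / (total_samples - 1).
    return average + (average - sample) / (total_samples - 1)


# The mean of [2.0, 4.0, 9.0] is 5.0; removing 9.0 leaves [2.0, 4.0].
new_mean = mean_after_remove(5.0, 9.0, 3)
```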

class artlib.cvi.iCVIs.CalinkskiHarabasz.iCVI_CH(x: numpy.ndarray)

Implementation of the Calinski Harabasz Validity Index in incremental form.

Expanded implementation of the incremental version of the Calinski Harabasz Cluster Validity Index.

The original MATLAB code can be found at https://github.com/ACIL-Group/iCVI-toolbox/blob/master/classes/CVI_CH.m

The formulation is available at

scholarsmine.mst.edu/cgi/viewcontent.cgi?article=3833&context=doctoral_dissertations (pages 314-316 and 319-320).

The add, remove, and switch functions of this implementation return a dictionary of updated parameters, which can then be passed to the update function to accept the changes. This allows changes or additions to the categories to be tested without making a deep copy of the object.

In addition, the calculations for removing a sample from the dataset and for switching the label of a sample are included. This allows very efficient calculations for clustering algorithms that prune samples or adjust their labels.

For the Calinski Harabasz Validity Index, larger values represent better clusters.
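
For reference, the standard batch definition of the index is the between-group dispersion divided by the within-group dispersion, each scaled by its degrees of freedom. The sketch below is a dependency-free illustration of that definition, not the incremental algorithm used by this class; the function name `calinski_harabasz` and the list-based inputs are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def calinski_harabasz(samples: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Batch CH index: (BGSS / (k - 1)) / (WGSS / (n - k))."""
    n, dim = len(samples), len(samples[0])
    groups: Dict[int, List[Sequence[float]]] = defaultdict(list)
    for x, label in zip(samples, labels):
        groups[label].append(x)
    k = len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least 2 clusters and more samples than clusters")

    mu = [sum(x[d] for x in samples) / n for d in range(dim)]  # global mean
    bgss = 0.0  # between-group sum of squares
    wgss = 0.0  # within-group sum of squares
    for members in groups.values():
        centroid = [sum(x[d] for x in members) / len(members) for d in range(dim)]
        bgss += len(members) * sum((c - m) ** 2 for c, m in zip(centroid, mu))
        wgss += sum(sum((xi - c) ** 2 for xi, c in zip(x, centroid)) for x in members)
    if wgss == 0.0:
        raise ValueError("index is undefined when every cluster has zero spread")
    return (bgss / (k - 1)) / (wgss / (n - k))


points = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
good = calinski_harabasz(points, [0, 0, 1, 1])  # groups the nearby points
bad = calinski_harabasz(points, [0, 1, 0, 1])   # mixes the two groups
```

The labelling that groups nearby points together scores far higher than the one that mixes them, which matches the "larger is better" reading above.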

dim

n_samples: int = 0

mu

CD: Dict

WGSS = 0

criterion_value = 0

add_sample(x: numpy.ndarray, label: int) → Dict

Calculate the result of adding a new sample with a given label.

Parameters:

x (np.ndarray) – The sample to add to the current validity index calculation.
label (int) – The sample category/cluster.

Returns:

A dictionary containing the updated values after the sample is added.

Return type:

dict
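
One way a per-cluster update on addition can work is a Welford-style step that refreshes the cluster's count, centroid, and sum of squared deviations in constant time. The sketch below is an assumption about the general technique, not the library's code; `add_to_cluster` and its argument names are hypothetical.

```python
from typing import List, Tuple


def add_to_cluster(
    n: int, centroid: List[float], sum_sq: float, x: List[float]
) -> Tuple[int, List[float], float]:
    """Return (count, centroid, sum of squared deviations) after adding x."""
    n_next = n + 1
    centroid_next = [v + (xi - v) / n_next for v, xi in zip(centroid, x)]
    # Welford step: add (x - old centroid) . (x - new centroid).
    sum_sq_next = sum_sq + sum(
        (xi - v) * (xi - vn) for xi, v, vn in zip(x, centroid, centroid_next)
    )
    return n_next, centroid_next, sum_sq_next


# Cluster {[0, 0], [2, 0]} has centroid [1, 0] and squared deviations 1 + 1 = 2.
n, centroid, sum_sq = add_to_cluster(2, [1.0, 0.0], 2.0, [4.0, 0.0])
```

Summing the per-cluster values over all clusters gives the within-group sum of squares used by the index.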

update(params: dict) → None

Update the parameters of the object. Takes the dictionary returned by adding a sample, removing a sample, or switching a sample's label, and applies it to the object. Switching a label changes two clusters, so those dictionaries carry an extra set of values to update, signalled by the presence of the ‘label2’ key.

Parameters:

params (dict) – Dictionary containing the updated parameters to be applied.
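
The propose-then-commit pattern described here can be illustrated with a toy stand-in. `RunningMean` below is hypothetical and tracks only a sample count and a mean; it is meant to show why returning a dictionary avoids a deep copy, not to reproduce the internals of `iCVI_CH`.

```python
from typing import Dict, List


class RunningMean:
    """Toy stand-in (hypothetical) that mimics the propose/commit pattern."""

    def __init__(self, dim: int) -> None:
        self.n_samples = 0
        self.mu = [0.0] * dim

    def add_sample(self, x: List[float]) -> Dict[str, object]:
        # Compute the would-be state without modifying the object.
        n_new = self.n_samples + 1
        mu_new = [m + (xi - m) / n_new for m, xi in zip(self.mu, x)]
        return {"n_samples": n_new, "mu": mu_new}

    def update(self, params: Dict[str, object]) -> None:
        # Commit a previously proposed state.
        for name, value in params.items():
            setattr(self, name, value)


tracker = RunningMean(dim=2)
tracker.update(tracker.add_sample([1.0, 1.0]))

proposal = tracker.add_sample([3.0, 5.0])    # the tracker is unchanged so far
unchanged = (tracker.n_samples, tracker.mu)  # still the one-sample state
tracker.update(proposal)                     # accept the proposed change
```

A caller that decides against the change simply discards `proposal`; no rollback is needed.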

switch_label(x: numpy.ndarray, label_old: int, label_new: int) → dict

Calculate the parameters when a sample has its label changed.

This essentially removes a sample with the old label from the clusters, then adds it back with the new label. There are a few optimizations, such as keeping mu the same, since removing and re-adding the same sample leaves the global mean unchanged.

Otherwise it should work the same as removing a sample and updating, then adding the sample back and updating, without the need to create a deep copy of the object if just testing the operation.

Parameters:

x (np.ndarray) – The sample whose label is being changed.
label_old (int) – The old label of the sample.
label_new (int) – The new label of the sample.

Returns:

A dictionary containing the updated values after switching the label.

Return type:

dict
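
The remove-then-add view of a label switch can be sketched on cluster centroids alone. Everything in the block is an assumption for illustration: `propose_switch`, the `(count, centroid)` tuple layout, and the choice to reject a switch that would empty a cluster are not taken from the library.

```python
from typing import Dict, List, Tuple

Cluster = Tuple[int, List[float]]  # (count, centroid)


def propose_switch(
    clusters: Dict[int, Cluster], x: List[float], label_old: int, label_new: int
) -> Dict[int, Cluster]:
    """Return updated entries for the two affected clusters, leaving input intact."""
    if label_old == label_new:
        raise ValueError("labels must differ")
    n_old, v_old = clusters[label_old]
    if n_old < 2:
        raise ValueError("this sketch does not handle emptying a cluster")
    n_new, v_new = clusters.get(label_new, (0, [0.0] * len(x)))
    # Downdate the centroid the sample leaves ...
    v_old_next = [v + (v - xi) / (n_old - 1) for v, xi in zip(v_old, x)]
    # ... and update the centroid it joins. The global mean is untouched.
    v_new_next = [v + (xi - v) / (n_new + 1) for v, xi in zip(v_new, x)]
    return {label_old: (n_old - 1, v_old_next), label_new: (n_new + 1, v_new_next)}


clusters = {0: (3, [1.0, 1.0]), 1: (1, [4.0, 4.0])}
changed = propose_switch(clusters, [3.0, 3.0], label_old=0, label_new=1)
```

The count-weighted average of the centroids is 1.75 per coordinate both before and after the switch, which is why the global mean does not need to be recomputed.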

remove_sample(x: numpy.ndarray, label: int) → dict

Calculate the result of removing a sample from the clusters.

Parameters:

x (np.ndarray) – The sample to remove from the current validity index calculation.
label (int) – The sample category/cluster.

Returns:

A dictionary containing the updated values after the sample is removed.

Return type:

dict
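
Removal can be done in constant time by running the Welford update backwards on the affected cluster. The sketch below assumes that general technique; `remove_from_cluster` and its argument names are hypothetical and do not describe the library's internal layout.

```python
from typing import List, Tuple


def remove_from_cluster(
    n: int, centroid: List[float], sum_sq: float, x: List[float]
) -> Tuple[int, List[float], float]:
    """Return (count, centroid, sum of squared deviations) after removing x."""
    if n < 2:
        raise ValueError("this sketch does not handle emptying a cluster")
    centroid_next = [v + (v - xi) / (n - 1) for v, xi in zip(centroid, x)]
    # Reverse Welford step: subtract (x - old centroid) . (x - new centroid).
    sum_sq_next = sum_sq - sum(
        (xi - v) * (xi - vn) for xi, v, vn in zip(x, centroid, centroid_next)
    )
    return n - 1, centroid_next, sum_sq_next


# Cluster {[0, 0], [2, 0], [4, 0]}: centroid [2, 0], squared deviations 4 + 0 + 4 = 8.
n, centroid, sum_sq = remove_from_cluster(3, [2.0, 0.0], 8.0, [4.0, 0.0])
```

Repeated downdates can accumulate floating-point error, so a long-running tracker may want to recompute the statistics from scratch occasionally.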