artlib.cvi.iCVIs.CalinkskiHarabasz
Some things to consider in the future.
Removing entire labels from the dataset should be possible.
Right now I think it's not possible for Fuzzy ART to delete a cluster, so there is no need to do so yet.
Creating functions that explicitly add, remove, or switch a sample, instead of requiring a separate update call, would be nice. At the very least, the add, remove, and switch functions should be renamed, since their current names imply that the change has already been applied, when it only takes effect after update is called.
Classes
- iCVI_CH: Implementation of the Calinski Harabasz Validity Index in incremental form.
Functions
- delta_add_sample_to_average: Calculate the new average if a sample is added.
- delta_remove_sample_from_average: Calculate the new average if a sample is removed.
Module Contents
- artlib.cvi.iCVIs.CalinkskiHarabasz.delta_add_sample_to_average(average: float, sample: float, total_samples: int) → float
Calculate the new average if a sample is added.
- artlib.cvi.iCVIs.CalinkskiHarabasz.delta_remove_sample_from_average(average: float, sample: float, total_samples: int) → float
Calculate the new average if a sample is removed.
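The incremental-mean identities behind helpers of this kind can be sketched as follows. This is a minimal illustration rather than the library's code: the names `mean_after_add` and `mean_after_remove` are hypothetical, and the convention that `n` is the sample count before the change is an assumption. The library's `total_samples` argument may follow a different convention.

```python
def mean_after_add(mean: float, sample: float, n: int) -> float:
    """Mean of n + 1 values, given the mean of the first n and the new sample."""
    return mean + (sample - mean) / (n + 1)


def mean_after_remove(mean: float, sample: float, n: int) -> float:
    """Mean of the remaining n - 1 values once `sample` is removed (needs n > 1)."""
    if n <= 1:
        raise ValueError("need at least two values to remove one from a mean")
    return mean + (mean - sample) / (n - 1)


# Hypothetical usage: the mean of [1, 2, 3] is 2; add 6, then remove it again.
grown = mean_after_add(2.0, 6.0, 3)
shrunk = mean_after_remove(grown, 6.0, 4)
```

Both updates run in constant time and never revisit the stored samples, which is what makes an incremental validity index cheap to maintain.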
- class artlib.cvi.iCVIs.CalinkskiHarabasz.iCVI_CH(x: numpy.ndarray)
Implementation of the Calinski Harabasz Validity Index in incremental form.
Expanded implementation of the incremental version of the Calinski Harabasz Cluster Validity Index.
The original MATLAB code can be found at https://github.com/ACIL-Group/iCVI-toolbox/blob/master/classes/CVI_CH.m
The formulation is available at scholarsmine.mst.edu/cgi/viewcontent.cgi?article=3833&context=doctoral_dissertations (pages 314-316 and 319-320).
This implementation returns a dictionary of updated parameters from each calculation, which can then be passed to the update function to accept the changes. This allows changes or additions to the categories to be tested without making a deep copy of the object.
In addition, the calculations for removing a sample from the dataset and for switching the label of a sample are included. This allows very efficient calculations for clustering algorithms that prune or adjust the labels of samples in the dataset.
For the Calinski Harabasz Validity Index, larger values represent better clusters.
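As a point of reference for what the index measures, the standard batch form of the Calinski Harabasz index can be sketched with NumPy. This is an illustration, not the incremental algorithm this class implements: the function name `calinski_harabasz_batch` is hypothetical, and it recomputes everything from scratch on each call.

```python
import numpy as np


def calinski_harabasz_batch(X, labels) -> float:
    """Batch CH index: (BGSS / (k - 1)) / (WGSS / (n - k))."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n = X.shape[0]
    clusters = np.unique(labels)
    k = clusters.size
    if k < 2 or k >= n:
        raise ValueError("the index needs 2 <= k < n clusters")

    mu = X.mean(axis=0)  # mean of the whole dataset
    wgss = 0.0           # within-group sum of squares
    bgss = 0.0           # between-group sum of squares
    for c in clusters:
        members = X[labels == c]
        centroid = members.mean(axis=0)
        wgss += ((members - centroid) ** 2).sum()
        bgss += members.shape[0] * ((centroid - mu) ** 2).sum()
    return (bgss / (k - 1)) / (wgss / (n - k))


# Hypothetical data: two tight groups that are far apart.
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
good = calinski_harabasz_batch(X, [0, 0, 1, 1])  # labels follow the groups
bad = calinski_harabasz_batch(X, [0, 1, 0, 1])   # each label spans both groups
```

The labelling that follows the two groups scores far higher than the one that mixes them, which matches the "larger is better" reading of the index. The sketch does not guard against a within-group sum of squares of zero.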
- dim
- mu
- CD: Dict
- WGSS = 0
- criterion_value = 0
- add_sample(x: numpy.ndarray, label: int) → Dict
Calculate the result of adding a new sample with a given label.
- update(params: dict) → None
Update the parameters of the object. Takes the dictionary returned by adding a sample, removing a sample, or switching its label, and applies it to the object. Switching a label requires more updates, so those dictionaries carry an extra set of values, indicated by the presence of the ‘label2’ key.
- Parameters:
params (dict) – Dictionary containing the updated parameters to be applied.
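The compute-then-commit workflow can be illustrated with a toy class. `RunningMean` is hypothetical and not part of artlib. It only mirrors the shape of the pattern, where a method returns a dictionary of proposed parameters and `update` applies it, and it makes no claim about the keys the real dictionaries contain.

```python
from typing import Dict


class RunningMean:
    """Hypothetical toy tracker used to illustrate the pattern; not part of artlib."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0

    def add_sample(self, x: float) -> Dict[str, float]:
        """Return the state that adding x would produce, without changing self."""
        n_new = self.n + 1
        mean_new = self.mean + (x - self.mean) / n_new
        return {"n": n_new, "mean": mean_new}

    def update(self, params: Dict[str, float]) -> None:
        """Commit a previously computed proposal."""
        self.n = int(params["n"])
        self.mean = params["mean"]


tracker = RunningMean()
tracker.update(tracker.add_sample(2.0))  # compute and commit the first sample

proposal = tracker.add_sample(10.0)      # evaluate a candidate sample...
state_after_discard = (tracker.n, tracker.mean)  # ...the tracker is unchanged

tracker.update(tracker.add_sample(4.0))  # commit a different sample instead
```

Because a proposal is plain data, a caller can evaluate several candidate assignments, compare the resulting values, and commit only one of them, with no need to copy the object.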
- switch_label(x: numpy.ndarray, label_old: int, label_new: int) → dict
Calculate the parameters when a sample has its label changed.
This essentially removes a sample with the old label from the clusters, then adds it back with the new label. There are a few optimizations, such as leaving mu unchanged, since removing the sample and adding it back does not affect any of the calculations that depend on it.
Otherwise it should work the same as removing a sample and updating, then adding the sample back and updating, without the need to create a deep copy of the object if just testing the operation.
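The centroid part of a label switch can be sketched as below. This is a simplified illustration under stated assumptions, not the library's implementation: the function `switch_cluster_means` is hypothetical, it tracks only the two cluster means and not WGSS or the criterion value, and it assumes mu denotes the mean of the whole dataset.

```python
import numpy as np


def switch_cluster_means(mean_old, n_old, mean_new, n_new, x):
    """Move sample x from a cluster of size n_old (> 1) to one of size n_new.

    Returns the updated (old cluster mean, new cluster mean).
    """
    if n_old <= 1:
        raise ValueError("the old cluster would become empty")
    mean_old = np.asarray(mean_old, dtype=float)
    mean_new = np.asarray(mean_new, dtype=float)
    x = np.asarray(x, dtype=float)
    # Removal step for the old cluster.
    mean_old_updated = mean_old + (mean_old - x) / (n_old - 1)
    # Addition step for the new cluster.
    mean_new_updated = mean_new + (x - mean_new) / (n_new + 1)
    return mean_old_updated, mean_new_updated


# Hypothetical data: move the last sample of cluster A into cluster B.
A = np.array([[0.0, 0.0], [2.0, 0.0], [4.0, 6.0]])
B = np.array([[10.0, 10.0], [12.0, 10.0]])
x = A[-1]

mu_before = np.vstack([A, B]).mean(axis=0)
mean_a, mean_b = switch_cluster_means(A.mean(axis=0), len(A), B.mean(axis=0), len(B), x)
mu_after = np.vstack([A[:-1], B, x[None, :]]).mean(axis=0)
```

The set of samples is the same before and after the switch, so `mu_before` and `mu_after` agree, and only the two affected cluster means need to be recomputed.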