artlib.biclustering.BARTMAP

BARTMAP [], [].

Classes

BARTMAP

BARTMAP for Biclustering.

Module Contents

class artlib.biclustering.BARTMAP.BARTMAP(module_a: artlib.common.BaseART.BaseART, module_b: artlib.common.BaseART.BaseART, eta: float)

Bases: sklearn.base.BaseEstimator, sklearn.base.BiclusterMixin

BARTMAP for Biclustering.

This class implements BARTMAP as first published in: [].

BARTMAP accepts two instantiated BaseART modules, module_a and module_b, which cluster the rows (samples) and columns (features), respectively. The features are clustered independently. The samples are not: when a candidate sample is considered for a row cluster, it is evaluated together with the samples already in that cluster, and a minimum correlation is enforced over the subset of features belonging to at least one of the feature clusters.

rows_: numpy.ndarray

columns_: numpy.ndarray
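
For orientation, scikit-learn's BiclusterMixin reads rows_ and columns_ as boolean indicator arrays with one row per bicluster. The sketch below shows one way such arrays could be derived from per-row and per-column labels in a checkerboard layout. The helper name indicators_from_labels is hypothetical, and this is an assumption about the layout, not necessarily how BARTMAP builds these attributes.

```python
import numpy as np

def indicators_from_labels(row_labels, column_labels):
    """Build checkerboard bicluster indicators from label vectors (illustrative)."""
    row_labels = np.asarray(row_labels)
    column_labels = np.asarray(column_labels)
    row_ids = np.unique(row_labels)
    col_ids = np.unique(column_labels)
    # One bicluster per (row cluster, column cluster) pair.
    rows = np.vstack([row_labels == r for r in row_ids for _ in col_ids])
    columns = np.vstack([column_labels == c for _ in row_ids for c in col_ids])
    return rows, columns

rows_, columns_ = indicators_from_labels([0, 0, 1], [0, 1, 1, 0])
print(rows_.shape, columns_.shape)  # (4, 3) (4, 4)
```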

params

module_a

module_b

__getattr__(key)

__setattr__(key, value)

get_params(deep: bool = True) → dict

Get parameters for this estimator.

Parameters: deep (bool, default=True) – If True, return the parameters for this estimator and contained subobjects that are estimators.
Returns: Dictionary of parameter names mapped to their values.
Return type: dict

set_params(**params)

Set the parameters of this estimator.

Specific redefinition of sklearn.base.BaseEstimator.set_params for ART classes.

Parameters: **params (dict) – Estimator parameters as keyword arguments.
Returns: self – The estimator instance.
Return type: object

static validate_params(params: dict)

Validate clustering parameters.

Parameters: params (dict) – Dictionary containing parameters for the algorithm.
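
A minimal sketch of what such a check might look like, assuming the dictionary carries the constructor's eta and that eta acts as a Pearson correlation threshold (so the range [-1, 1] is an assumption, not something this page states). The function name validate_eta is hypothetical.

```python
def validate_eta(params: dict) -> None:
    """Check a hypothetical 'eta' entry: a float threshold in [-1, 1]."""
    if "eta" not in params:
        raise KeyError("params must contain 'eta'")
    eta = params["eta"]
    if not isinstance(eta, float):
        raise TypeError("'eta' must be a float")
    # Assumption: eta is compared against Pearson correlations.
    if not -1.0 <= eta <= 1.0:
        raise ValueError("'eta' must lie in [-1, 1]")

validate_eta({"eta": 0.5})  # passes silently
```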

property column_labels_: numpy.ndarray

Cluster labels for the columns.

Returns: column_labels_ – Array of cluster labels assigned to each column.
Return type: ndarray of shape (n_columns,)

property row_labels_: numpy.ndarray

Cluster labels for the rows.

Returns: row_labels_ – Array of cluster labels assigned to each row.
Return type: ndarray of shape (n_rows,)

property n_row_clusters: int

Number of row clusters.

Returns: n_row_clusters – The number of clusters for the rows.
Return type: int

property n_column_clusters: int

Number of column clusters.

Returns: n_column_clusters – The number of clusters for the columns.
Return type: int

_get_x_cb(x: numpy.ndarray, c_b: int)

Get the components of a vector belonging to a b-side cluster.

Parameters:

x (np.ndarray) – A sample vector.
c_b (int) – The b-side cluster label.

Returns:

The sample vector x filtered to include only features belonging to the b-side cluster c_b.

Return type:

np.ndarray
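
The filtering described here can be sketched with a boolean mask over the column labels. The standalone function get_x_cb and its explicit column_labels argument are illustrative; the real method is bound to the estimator and presumably reads the labels from it.

```python
import numpy as np

def get_x_cb(x, column_labels, c_b):
    """Return the entries of x whose column label equals c_b."""
    x = np.asarray(x)
    mask = np.asarray(column_labels) == c_b
    return x[mask]

x = np.array([0.1, 0.9, 0.4, 0.7])
column_labels = np.array([0, 1, 0, 1])
print(get_x_cb(x, column_labels, 1))  # [0.9 0.7]
```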

static _pearsonr(a: numpy.ndarray, b: numpy.ndarray) → float

Get the Pearson correlation between two vectors.

Parameters:

a (np.ndarray) – A vector.
b (np.ndarray) – Another vector.

Returns:

The Pearson correlation between the two vectors a and b.

Return type:

float
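
As a reference point, the Pearson correlation of two equal-length vectors can be computed from their mean-centered values. The sketch below is a plain NumPy version, not the library's code; returning 0.0 for a constant vector is an assumption made here to avoid dividing by zero.

```python
import numpy as np

def pearsonr(a, b):
    """Pearson correlation of two 1-D vectors of equal length."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a_c = a - a.mean()
    b_c = b - b.mean()
    denom = np.sqrt(np.sum(a_c**2) * np.sum(b_c**2))
    if denom == 0.0:
        return 0.0  # assumption: a constant vector is treated as uncorrelated
    return float(np.sum(a_c * b_c) / denom)

r = pearsonr([1.0, 2.0, 3.0], [2.0, 4.0, 6.5])
```

For non-constant inputs the result should agree with np.corrcoef(a, b)[0, 1] up to floating-point error.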

_average_pearson_corr(X: numpy.ndarray, k: int, c_b: int) → float

Get the average Pearson correlation for a sample across all features in cluster b.

Parameters:

X (np.ndarray) – The dataset A.
k (int) – The sample index.
c_b (int) – The b-side cluster to check.

Returns:

The average Pearson correlation for the sample at index k across all features in cluster c_b.

Return type:

float
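
The description does not spell out which rows the sample is compared against. The sketch below assumes, following the class description, that row k is correlated with each row already in the candidate row cluster, using only the columns labelled c_b, and that the results are averaged. The member_rows and column_labels arguments are hypothetical stand-ins for state the estimator would hold.

```python
import numpy as np

def average_pearson_corr(X, k, member_rows, column_labels, c_b):
    """Mean correlation between row k and each row in member_rows,
    restricted to the columns labelled c_b."""
    mask = np.asarray(column_labels) == c_b
    x_k = X[k, mask]
    corrs = [np.corrcoef(x_k, X[i, mask])[0, 1] for i in member_rows]
    return float(np.mean(corrs))

X = np.array([
    [1.0, 2.0, 3.0, 9.0],
    [2.0, 4.0, 6.0, 1.0],
    [3.0, 2.0, 1.0, 5.0],
])
column_labels = np.array([0, 0, 0, 1])
# On columns 0..2, row 1 rises with row 0 and row 2 falls against it.
avg = average_pearson_corr(X, 0, [1, 2], column_labels, 0)
```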

validate_data(X_a: numpy.ndarray, X_b: numpy.ndarray)

Validate the data prior to clustering.

Parameters:

X_a (np.ndarray) – Dataset A, containing the samples.
X_b (np.ndarray) – Dataset B, containing the features.

match_criterion_bin(X: numpy.ndarray, k: int, c_b: int, params: dict) → bool

Get the binary match criterion of the cluster.

Parameters:

X (np.ndarray) – The dataset.
k (int) – The sample index.
c_b (int) – The b-side cluster to check.
params (dict) – Dictionary containing parameters for the algorithm.

Returns:

Binary value indicating whether the cluster match criterion is met.

Return type:

bool
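
One plausible reading of the binary criterion is a threshold test of the averaged correlation against eta. The sketch below takes the averaged correlation as a precomputed input; the "eta" key and the non-strict comparison are assumptions.

```python
def meets_correlation_threshold(avg_corr: float, params: dict) -> bool:
    """Binary match: True when the averaged correlation reaches params['eta']."""
    # Assumption: non-strict comparison; the library may use a strict one.
    return bool(avg_corr >= params["eta"])

params = {"eta": 0.6}  # hypothetical threshold
print(meets_correlation_threshold(0.75, params))  # True
print(meets_correlation_threshold(0.20, params))  # False
```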

match_reset_func(i: numpy.ndarray, w: numpy.ndarray, cluster_a, params: dict, extra: dict, cache: dict | None = None) → bool

Permit external factors to influence cluster creation.

Parameters:

i (np.ndarray) – Data sample.
w (np.ndarray) – Cluster weight or information.
cluster_a (int) – A-side cluster label.
params (dict) – Dictionary containing parameters for the algorithm.
extra (dict) – Additional parameters for the algorithm.
cache (dict, optional) – Dictionary containing values cached from previous calculations.

Returns:

True if the match is permitted, otherwise False.

Return type:

bool
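
The class description says the correlation constraint must hold for at least one feature cluster. A sketch of that rule, using a hypothetical mapping from column-cluster label to a precomputed average correlation, could look like the following. It does not reproduce the real method's signature or its use of i, w, cluster_a, extra and cache.

```python
def match_permitted(avg_corr_by_column_cluster: dict, eta: float) -> bool:
    """Permit the match if at least one column cluster meets the threshold."""
    return any(r >= eta for r in avg_corr_by_column_cluster.values())

print(match_permitted({0: 0.2, 1: 0.8}, eta=0.6))  # True
print(match_permitted({0: 0.2, 1: 0.3}, eta=0.6))  # False
```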

step_fit(X: numpy.ndarray, k: int) → int

Fit the model to a single sample.

Parameters:

X (np.ndarray) – The dataset.
k (int) – The sample index.

Returns:

The cluster label of the input sample.

Return type:

int

fit(X: numpy.ndarray, max_iter=1)

Fit the model to the data.

Parameters:

X (np.ndarray) – The dataset to fit the model on.
max_iter (int, default=1) – The number of passes to make over the same dataset.
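
The relationship between fit, step_fit and max_iter can be pictured as a loop that presents every row index once per pass. The sketch below covers only that row loop, with a toy stand-in for the single-sample update; how the columns are clustered beforehand is not shown, and the names fit_rows and toy_step_fit are hypothetical.

```python
import numpy as np

def fit_rows(X, step_fit, max_iter=1):
    """Present every row index to step_fit, repeating for max_iter passes."""
    labels = np.zeros(X.shape[0], dtype=int)
    for _ in range(max_iter):
        for k in range(X.shape[0]):
            labels[k] = step_fit(X, k)
    return labels

# Toy stand-in for a single-sample update: label by the sign of the row mean.
def toy_step_fit(X, k):
    return int(X[k].mean() > 0)

X = np.array([[1.0, 2.0], [-1.0, -3.0], [0.5, 0.5]])
print(fit_rows(X, toy_step_fit, max_iter=2))  # [1 0 1]
```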

visualize(cmap: matplotlib.colors.Colormap | None = None)

Visualize the clustering of the data.

Parameters: cmap (matplotlib.colors.Colormap or str, optional) – The colormap to use for visualization.