artlib.biclustering.BARTMAP

BARTMAP [], [].

Classes

BARTMAP

BARTMAP for Biclustering.

Module Contents

class artlib.biclustering.BARTMAP.BARTMAP(module_a: artlib.common.BaseART.BaseART, module_b: artlib.common.BaseART.BaseART, eta: float)

Bases: sklearn.base.BaseEstimator, sklearn.base.BiclusterMixin

BARTMAP for Biclustering.

This class implements BARTMAP as first published in: [].

BARTMAP accepts two instantiated BaseART modules, module_a and module_b, which cluster the rows (samples) and columns (features) respectively. The features are clustered independently. The samples are clustered by comparing each candidate sample with the samples already in a row cluster and enforcing a minimum correlation over the subset of features belonging to at least one of the feature clusters.
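
To make the two views concrete, the sketch below shows how a single data matrix can feed both sides: rows as vectors for the a-side and columns as vectors for the b-side. It uses plain NumPy only. The names X_a and X_b are borrowed from validate_data, and deriving the b-side input by transposing is an assumption made for illustration, not a statement about the library's internals.

```python
import numpy as np

# Toy data matrix: 3 samples (rows) by 4 features (columns).
X = np.array([
    [0.1, 0.2, 0.9, 0.8],
    [0.2, 0.3, 0.8, 0.7],
    [0.9, 0.8, 0.1, 0.2],
])

# a-side view: each row is one sample vector to be clustered.
X_a = X

# b-side view (assumed): each column becomes one vector to be clustered.
X_b = X.T

print(X_a.shape)  # (3, 4)
print(X_b.shape)  # (4, 3)
```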

rows_: numpy.ndarray
columns_: numpy.ndarray
params
module_a
module_b
__getattr__(key)
__setattr__(key, value)
get_params(deep: bool = True) → dict

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, return the parameters for this estimator and contained subobjects that are estimators.

Returns:

Dictionary of parameter names mapped to their values.

Return type:

dict

set_params(**params)

Set the parameters of this estimator.

Specific redefinition of sklearn.BaseEstimator.set_params for ART classes.

Parameters:

**params (dict) – Estimator parameters as keyword arguments.

Returns:

self – The estimator instance.

Return type:

object

static validate_params(params: dict)

Validate clustering parameters.

Parameters:

params (dict) – Dictionary containing parameters for the algorithm.
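
A minimal sketch of what such a check might look like, assuming the only BARTMAP-specific parameter is eta and that it acts as a correlation threshold in [-1, 1]. The function name check_bartmap_params and the range check are hypothetical and are not taken from the library.

```python
def check_bartmap_params(params: dict) -> None:
    """Hypothetical validator: require a float `eta` within [-1, 1]."""
    if "eta" not in params:
        raise ValueError("missing required parameter 'eta'")
    eta = params["eta"]
    if not isinstance(eta, float):
        raise TypeError("'eta' must be a float")
    # Assumption: eta is a Pearson correlation threshold, so it is bounded.
    if not -1.0 <= eta <= 1.0:
        raise ValueError("'eta' must lie in [-1, 1]")


check_bartmap_params({"eta": 0.5})  # passes silently
```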

property column_labels_: numpy.ndarray

Cluster labels for the columns.

Returns:

column_labels_ – Array of cluster labels assigned to each column.

Return type:

ndarray of shape (n_columns,)

property row_labels_: numpy.ndarray

Cluster labels for the rows.

Returns:

row_labels_ – Array of cluster labels assigned to each row.

Return type:

ndarray of shape (n_rows,)
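
The row and column label arrays can be turned into boolean bicluster indicators of the kind stored in rows_ and columns_. The sketch below pairs every row cluster with every column cluster. That pairing is an assumption made for illustration, and the variable names are hypothetical.

```python
import numpy as np

row_labels = np.array([0, 0, 1])        # cluster label per row
column_labels = np.array([0, 1, 1, 0])  # cluster label per column

row_ids = np.unique(row_labels)
col_ids = np.unique(column_labels)

# One bicluster per (row cluster, column cluster) pair (assumed layout).
rows = np.vstack([row_labels == a for a in row_ids for _ in col_ids])
columns = np.vstack([column_labels == b for _ in row_ids for b in col_ids])

print(rows.shape, columns.shape)  # (4, 3) (4, 4)
```

With this layout, bicluster i covers the rows where rows[i] is True and the columns where columns[i] is True.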

property n_row_clusters: int

Number of row clusters.

Returns:

n_row_clusters – The number of clusters for the rows.

Return type:

int

property n_column_clusters: int

Number of column clusters.

Returns:

n_column_clusters – The number of clusters for the columns.

Return type:

int

_get_x_cb(x: numpy.ndarray, c_b: int)

Get the components of a vector belonging to a b-side cluster.

Parameters:
  • x (np.ndarray) – A sample vector.

  • c_b (int) – The b-side cluster label.

Returns:

The sample vector x filtered to include only features belonging to the b-side cluster c_b.

Return type:

np.ndarray
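
The filtering described here amounts to a boolean mask over the feature axis. A minimal NumPy sketch follows; the helper name select_cluster_features is hypothetical, and the column labels are passed in explicitly rather than read from a fitted module.

```python
import numpy as np


def select_cluster_features(x, column_labels, c_b):
    """Return the entries of x whose column label equals c_b."""
    mask = column_labels == c_b
    return x[mask]


x = np.array([0.1, 0.2, 0.9, 0.8])
column_labels = np.array([0, 1, 1, 0])
print(select_cluster_features(x, column_labels, 1))  # [0.2 0.9]
```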

static _pearsonr(a: numpy.ndarray, b: numpy.ndarray) → float

Get the Pearson correlation between two vectors.

Parameters:
  • a (np.ndarray) – A vector.

  • b (np.ndarray) – Another vector.

Returns:

The Pearson correlation between the two vectors a and b.

Return type:

float
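
For reference, the Pearson correlation is the dot product of the mean-centered vectors divided by the product of their norms. The sketch below computes it directly with NumPy. It is an independent illustration rather than the library's implementation, and the result is undefined (NaN) when either vector is constant.

```python
import numpy as np


def pearson(a, b):
    """Pearson correlation of two equal-length 1-D vectors."""
    a_c = a - a.mean()  # center each vector on its mean
    b_c = b - b.mean()
    denom = np.sqrt(np.dot(a_c, a_c) * np.dot(b_c, b_c))
    return float(np.dot(a_c, b_c) / denom)


a = np.array([1.0, 2.0, 3.0])
print(pearson(a, 2.0 * a + 1.0))  # close to 1.0
print(pearson(a, -a))             # close to -1.0
```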

_average_pearson_corr(X: numpy.ndarray, k: int, c_b: int) → float

Get the average Pearson correlation for a sample across all features in the b-side cluster c_b.

Parameters:
  • X (np.ndarray) – The dataset A.

  • k (int) – The sample index.

  • c_b (int) – The b-side cluster to check.

Returns:

The average Pearson correlation for the sample at index k across all features in cluster c_b.

Return type:

float
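
One way to read this quantity is as the mean of the pairwise correlations between sample k and a set of reference samples, with every vector restricted to the features of cluster c_b. The sketch below follows that reading. The member_rows argument, which names the reference samples, is hypothetical, as is the explicit column_labels argument.

```python
import numpy as np


def average_corr(X, k, member_rows, column_labels, c_b):
    """Mean Pearson correlation between row k and each reference row,
    computed only over the columns labelled c_b."""
    cols = column_labels == c_b
    x_k = X[k, cols]
    corrs = [np.corrcoef(x_k, X[j, cols])[0, 1] for j in member_rows]
    return float(np.mean(corrs))


X = np.array([
    [1.0, 2.0, 3.0, 9.0],
    [2.0, 4.0, 6.0, 1.0],
    [3.0, 2.0, 1.0, 5.0],
])
column_labels = np.array([0, 0, 0, 1])

# Row 1 rises with row 0 on cluster 0; row 2 falls, so the two cancel out.
print(average_corr(X, 0, [1], column_labels, 0))     # close to 1.0
print(average_corr(X, 0, [1, 2], column_labels, 0))  # close to 0.0
```

A column cluster with fewer than two features leaves the correlation undefined, so a real implementation would need to decide how to treat that case.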

validate_data(X_a: numpy.ndarray, X_b: numpy.ndarray)

Validate the data prior to clustering.

Parameters:
  • X_a (np.ndarray) – Dataset A, containing the samples.

  • X_b (np.ndarray) – Dataset B, containing the features.

match_criterion_bin(X: numpy.ndarray, k: int, c_b: int, params: dict) → bool

Get the binary match criterion of the cluster.

Parameters:
  • X (np.ndarray) – The dataset.

  • k (int) – The sample index.

  • c_b (int) – The b-side cluster to check.

  • params (dict) – Dictionary containing parameters for the algorithm.

Returns:

Binary value indicating whether the cluster match criterion is met.

Return type:

bool

match_reset_func(i: numpy.ndarray, w: numpy.ndarray, cluster_a, params: dict, extra: dict, cache: dict | None = None) → bool

Permit external factors to influence cluster creation.

Parameters:
  • i (np.ndarray) – Data sample.

  • w (np.ndarray) – Cluster weight or information.

  • cluster_a (int) – A-side cluster label.

  • params (dict) – Dictionary containing parameters for the algorithm.

  • extra (dict) – Additional parameters for the algorithm.

  • cache (dict, optional) – Dictionary containing values cached from previous calculations.

Returns:

True if the match is permitted, otherwise False.

Return type:

bool
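
A rough sketch of how a binary criterion and a reset function could fit together, assuming the criterion compares an average correlation against eta and the reset function permits a match when at least one b-side cluster passes. The strict inequality, the "any cluster" rule, and both function names are assumptions made for illustration.

```python
def criterion_met(avg_corr: float, eta: float) -> bool:
    """Assumed binary criterion: average correlation must exceed eta."""
    return avg_corr > eta


def match_permitted(avg_corrs, eta: float) -> bool:
    """Assumed reset rule: permit the match if any b-side cluster passes."""
    return any(criterion_met(r, eta) for r in avg_corrs)


# Average correlations of one candidate sample, one value per b-side cluster.
print(match_permitted([0.1, 0.7], eta=0.5))  # True
print(match_permitted([0.1, 0.3], eta=0.5))  # False
```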

step_fit(X: numpy.ndarray, k: int) → int

Fit the model to a single sample.

Parameters:
  • X (np.ndarray) – The dataset.

  • k (int) – The sample index.

Returns:

The cluster label of the input sample.

Return type:

int

fit(X: numpy.ndarray, max_iter=1)

Fit the model to the data.

Parameters:
  • X (np.ndarray) – The dataset to fit the model on.

  • max_iter (int, default=1) – The number of passes to make over the same dataset.
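
The relationship between fit, step_fit, and max_iter can be pictured as a nested loop: each pass visits every sample index once and records the label returned for it. The sketch below shows only that control flow. The step callable and the toy rule are stand-ins, not the library's clustering logic.

```python
import numpy as np


def fit_labels(X, step, max_iter=1):
    """Call step(X, k) for every row index k, repeated max_iter times."""
    n = X.shape[0]
    labels = np.zeros(n, dtype=int)
    for _ in range(max_iter):
        for k in range(n):
            labels[k] = step(X, k)
    return labels


def toy_step(X, k):
    """Stand-in rule, purely illustrative: threshold the row mean at 0.5."""
    return int(X[k].mean() > 0.5)


X = np.array([[0.1, 0.2], [0.9, 0.8], [0.4, 0.7]])
print(fit_labels(X, toy_step, max_iter=2))  # [0 1 1]
```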

visualize(cmap: matplotlib.colors.Colormap | None = None)

Visualize the clustering of the data.

Parameters:

cmap (matplotlib.colors.Colormap or str, optional) – The colormap to use for visualization.