circle_bundles.fiberwise_clustering

circle_bundles.fiberwise_clustering(data, U, eps_values, min_sample_values, *, build_pca_embeddings=True, pca_dim=2, verbose=True)

Cluster each fiber with DBSCAN and then merge clusters globally via an overlap graph.

This is a useful diagnostic tool when you have a cover (fibers) over the dataset and want to see whether local cluster structure aligns across overlaps.

Workflow

1. For each fiber r (row of U), run DBSCAN on the points data[U[r]] using eps_values[r] and min_sample_values[r].
2. Create a graph whose nodes are fiber-clusters (r, label), excluding the noise label -1. Each node stores the list of sample indices belonging to that DBSCAN cluster.
3. Add an edge between two nodes if the corresponding fiber-clusters share at least one sample index that is non-noise in both fibers. Each edge stores those shared indices as indices_shared.
4. Take the connected components of the graph and assign each component's label to every sample that appears in any node of that component. This gives one global component label per sample.
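
The four workflow steps can be sketched in a few lines. This is an illustrative re-implementation, not the library's code: the function name fiberwise_clustering_sketch and the toy data are made up for the example, and it assumes scikit-learn and networkx are installed.

```python
import networkx as nx
import numpy as np
from sklearn.cluster import DBSCAN


def fiberwise_clustering_sketch(data, U, eps_values, min_sample_values):
    """Hypothetical sketch of the workflow; not the circle_bundles implementation."""
    n_fibers, n_samples = U.shape
    cl = np.full((n_fibers, n_samples), -1, dtype=int)
    G = nx.Graph()

    # DBSCAN per fiber, then one graph node per non-noise cluster.
    for r in range(n_fibers):
        idx = np.flatnonzero(U[r])
        if idx.size == 0:
            continue
        model = DBSCAN(eps=float(eps_values[r]), min_samples=int(min_sample_values[r]))
        labels = model.fit_predict(data[idx])
        cl[r, idx] = labels
        for lab in np.unique(labels[labels >= 0]):
            G.add_node((r, int(lab)), indices=idx[labels == lab].tolist())

    # Edge whenever two fiber-clusters share at least one sample index.
    nodes = list(G.nodes)
    for i, a in enumerate(nodes):
        members_a = set(G.nodes[a]["indices"])
        for b in nodes[i + 1:]:
            shared = sorted(members_a & set(G.nodes[b]["indices"]))
            if shared:
                G.add_edge(a, b, indices_shared=shared)

    # Connected components give one global label per covered sample.
    components = np.full(n_samples, -1, dtype=int)
    for c, comp in enumerate(nx.connected_components(G)):
        for node in comp:
            components[G.nodes[node]["indices"]] = c
    return components, G, cl


# Toy data: two well-separated line segments, covered by two overlapping fibers.
t = np.linspace(0.0, 1.0, 20)
data = np.vstack([
    np.column_stack([t, np.zeros(20)]),
    np.column_stack([t, np.full(20, 10.0)]),
])
U = np.zeros((2, 40), dtype=bool)
U[0, :15] = True
U[0, 20:35] = True
U[1, 5:20] = True
U[1, 25:40] = True

components, G, cl = fiberwise_clustering_sketch(
    data, U, eps_values=np.full(2, 0.2), min_sample_values=np.full(2, 3)
)
```

In this toy setup each fiber sees both segments, so the graph has four nodes; the overlaps link the matching clusters across fibers, leaving two global components.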

param data:

Array of shape (n_samples, d) containing the ambient data vectors.

param U:

Boolean membership matrix of shape (n_fibers, n_samples). Row r indicates which samples lie in fiber/cover set U_r.

param eps_values:

Array of shape (n_fibers,) with the DBSCAN eps parameter used for each fiber.

param min_sample_values:

Array of shape (n_fibers,) with the DBSCAN min_samples parameter used for each fiber.

param build_pca_embeddings:

If True, compute PCA embeddings (within each fiber) for quick plotting with plot_fiberwise_pca_grid().

param pca_dim:

Number of PCA components to compute per fiber (typically 2).
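
As a rough illustration of the expected input shapes, the membership matrix U can be built from any rule that assigns samples to overlapping cover sets. The sketch below assumes a hypothetical one-dimensional base coordinate per sample and three hand-picked overlapping intervals; neither is part of the documented API.

```python
import numpy as np

# Hypothetical base coordinate for each of 12 samples (an assumption for this example).
base = np.linspace(0.0, 1.0, 12, endpoint=False)

# Three overlapping half-open intervals that together cover [0, 1).
intervals = [(0.0, 0.45), (0.3, 0.7), (0.6, 1.0)]

# Row r of U is True exactly for the samples whose base coordinate lies in interval r.
U = np.array([(lo <= base) & (base < hi) for lo, hi in intervals])

# One DBSCAN setting per fiber, matching the (n_fibers,) shape of the parameters.
eps_values = np.full(U.shape[0], 0.25)
min_sample_values = np.full(U.shape[0], 10)
```

Samples that fall in two intervals appear in two rows of U; those shared samples are what allow clusters from neighbouring fibers to be linked.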

returns:

components – Integer array of shape (n_samples,) giving a global component label per sample. Samples not belonging to any non-noise fiber-cluster remain -1.
G – A networkx.Graph. Nodes are tuples (fiber_idx, cluster_label) with node attribute indices (list of sample indices). Edges indicate overlap, and store indices_shared (the supporting sample indices).
graph_dict – A simple serialization-friendly representation of the graph (nodes/links).
cl – Integer array of shape (n_fibers, n_samples) giving the DBSCAN label of each sample within each fiber. Label -1 denotes noise (or not present in that fiber).
summary –
Dict of helpful arrays for downstream plotting, including:
- fiber_component_counts: number of DBSCAN clusters per fiber (excluding noise)
- global_component_counts: number of graph nodes per global component
- point_counts: number of samples assigned to each global component
- pca_store: per-fiber PCA embeddings if requested
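
To make the relationship between the returned arrays concrete, the sketch below recomputes two of the summary quantities from a small hand-written cl and components. It follows the descriptions above and is not the library's own code; the exact layout of the summary dict is not assumed.

```python
import numpy as np

# Toy label matrix: 2 fibers x 6 samples; -1 means noise or not present in the fiber.
cl = np.array([
    [0, 0, 1, 1, -1, -1],
    [-1, 0, 0, -1, 0, -1],
])

# The single cluster of fiber 1 overlaps both clusters of fiber 0,
# so samples 0..4 share one global component and sample 5 stays -1.
components = np.array([0, 0, 0, 0, 0, -1])

# Number of DBSCAN clusters per fiber, excluding noise.
fiber_component_counts = np.array([np.unique(row[row >= 0]).size for row in cl])

# Number of samples assigned to each global component.
point_counts = np.bincount(components[components >= 0])
```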

Notes

This routine requires optional dependencies: networkx and scikit-learn.
Edge weights are not computed here. If you want a weighted filtration on the cluster graph, call get_weights() (internal helper) or attach your own edge["weight"] values.
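
For attaching your own weights, one possible choice is the Jaccard distance between the two fiber-clusters joined by an edge. The sketch below builds a tiny graph with the documented node and edge attributes by hand; the weighting scheme is an illustrative assumption and says nothing about what get_weights() computes.

```python
import networkx as nx

# Hand-built graph using the documented attribute names (indices, indices_shared).
G = nx.Graph()
G.add_node((0, 0), indices=[0, 1, 2, 3])
G.add_node((1, 0), indices=[2, 3, 4])
G.add_edge((0, 0), (1, 0), indices_shared=[2, 3])

# Assumed weighting: Jaccard distance, so heavily overlapping clusters get small weights.
for u, v, attrs in G.edges(data=True):
    n_shared = len(attrs["indices_shared"])
    n_union = len(set(G.nodes[u]["indices"]) | set(G.nodes[v]["indices"]))
    attrs["weight"] = 1.0 - n_shared / n_union
```

With weights in place, a filtration can be obtained by thresholding edges on weight; whether small or large weights should mean "strong overlap" is a design choice left to the caller.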

Examples

>>> import numpy as np
>>> from circle_bundles import fiberwise_clustering
>>> components, G, graph_dict, cl, summary = fiberwise_clustering(
...     data, U,
...     eps_values=np.full(U.shape[0], 0.25),
...     min_sample_values=np.full(U.shape[0], 10),
... )
>>> fig, _ = plot_fiberwise_summary_bars(summary)

Parameters:

data (ndarray)
U (ndarray)
eps_values (ndarray)
min_sample_values (ndarray)
build_pca_embeddings (bool)
pca_dim (int)
verbose (bool)