circle_bundles.fiberwise_clustering
- circle_bundles.fiberwise_clustering(data, U, eps_values, min_sample_values, *, build_pca_embeddings=True, pca_dim=2, verbose=True)
Cluster each fiber with DBSCAN and then merge clusters globally via an overlap graph.
This is a useful diagnostic tool when you have a cover (fibers) over the dataset and want to see whether local cluster structure aligns across overlaps.
Workflow
1. For each fiber r (row of U), run DBSCAN on the points data[U[r]] using eps_values[r] and min_sample_values[r].
2. Create a graph whose nodes are fiber-clusters (r, label), excluding label -1. Each node stores the list of sample indices belonging to that DBSCAN cluster.
3. Add an edge between two nodes if the corresponding fiber-clusters share at least one sample index that is non-noise in both fibers. Edges store indices_shared.
4. Define a global component label per sample by taking the connected components of the graph and assigning each component's label to every sample that appears in any node of that component.
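The graph-building and merging steps described above can be sketched in plain Python. This is an illustrative sketch, not the library's implementation: it assumes the per-fiber DBSCAN labels are already available as a nested list labels[r][i] (with -1 for noise or absence from the fiber), and it swaps in a small union-find for the networkx connected-components computation so that it runs without dependencies. The function name merge_fiber_clusters is hypothetical.

```python
from itertools import combinations


def merge_fiber_clusters(labels):
    """Merge per-fiber clusters into global components (illustrative sketch).

    labels[r][i] is the cluster label of sample i in fiber r, or -1 if the
    sample is noise in that fiber or does not belong to it.
    """
    n_samples = len(labels[0]) if labels else 0

    # Nodes: one per (fiber, label) pair, storing its sample indices.
    nodes = {}
    for r, row in enumerate(labels):
        for i, lab in enumerate(row):
            if lab != -1:
                nodes.setdefault((r, lab), []).append(i)

    # Edges: two nodes are linked when they share at least one sample.
    edges = {}
    for a, b in combinations(sorted(nodes), 2):
        shared = sorted(set(nodes[a]) & set(nodes[b]))
        if shared:
            edges[(a, b)] = shared

    # Union-find over nodes to obtain connected components.
    parent = {node: node for node in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)

    # Number the components and label every sample that appears in one.
    root_to_id = {}
    components = [-1] * n_samples
    for node in sorted(nodes):
        comp = root_to_id.setdefault(find(node), len(root_to_id))
        for i in nodes[node]:
            components[i] = comp
    return components, nodes, edges


# Two fibers over five samples; sample 2 is non-noise in both fibers.
labels = [
    [0, 0, 0, -1, -1],   # fiber 0: one cluster {0, 1, 2}
    [-1, -1, 0, 0, 1],   # fiber 1: clusters {2, 3} and {4}
]
components, nodes, edges = merge_fiber_clusters(labels)
print(components)  # [0, 0, 0, 0, 1]
```

Here the clusters (0, 0) and (1, 0) are merged because they share sample 2, while (1, 1) stays a separate component.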
- param data:
  Array of shape (n_samples, d) containing the ambient data vectors.
- param U:
  Boolean membership matrix of shape (n_fibers, n_samples). Row r indicates which samples lie in fiber/cover set U_r.
- param eps_values:
  Array of shape (n_fibers,) with the DBSCAN eps parameter used for each fiber.
- param min_sample_values:
  Array of shape (n_fibers,) with the DBSCAN min_samples parameter used for each fiber.
- param build_pca_embeddings:
  If True, compute PCA embeddings (within each fiber) for quick plotting with plot_fiberwise_pca_grid().
- param pca_dim:
  Number of PCA components to compute per fiber (typically 2).
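To make the expected shapes of U, eps_values and min_sample_values concrete, the following sketch lays out inputs for a toy cover. Everything in it is an assumption for illustration: the angular base coordinate theta, the arc-shaped cover sets, and the parameter values are hypothetical and not part of the library. Plain lists are used so the snippet has no dependencies; in practice one would presumably convert them with numpy.asarray before passing them to the function.

```python
import math

n_samples, n_fibers = 12, 4

# Hypothetical base coordinate: one angle per sample, evenly spaced and
# offset by half a step so no sample sits exactly on an arc boundary.
theta = [2 * math.pi * (i + 0.5) / n_samples for i in range(n_samples)]

# Hypothetical cover: arcs centred at evenly spaced angles, wide enough
# that neighbouring arcs overlap.
centers = [2 * math.pi * r / n_fibers for r in range(n_fibers)]
half_width = 2 * math.pi / n_fibers


def circular_distance(a, b):
    """Shortest angular distance between two angles on the circle."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


# U[r][i] is True when sample i lies in cover set r:
# n_fibers rows, n_samples columns.
U = [[circular_distance(t, c) < half_width for t in theta] for c in centers]

# One DBSCAN setting per fiber, i.e. one entry per row of U.
# Placeholder values; they need tuning for real data.
eps_values = [0.25] * n_fibers
min_sample_values = [5] * n_fibers
```

Because neighbouring arcs overlap, every sample in this toy cover belongs to two fibers, which is what lets clusters from different fibers share sample indices.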
- returns:
  components – Integer array of shape (n_samples,) giving a global component label per sample. Samples not belonging to any non-noise fiber-cluster remain -1.
  G – A networkx.Graph. Nodes are tuples (fiber_idx, cluster_label) with node attribute indices (list of sample indices). Edges indicate overlap, and store indices_shared (the supporting sample indices).
  graph_dict – A simple serialization-friendly representation of the graph (nodes/links).
  cl – Integer array of shape (n_fibers, n_samples) giving the DBSCAN label of each sample within each fiber. Label -1 denotes noise (or not present in that fiber).
  summary – Dict of helpful arrays for downstream plotting, including:
    - fiber_component_counts: number of DBSCAN clusters per fiber (excluding noise)
    - global_component_counts: number of graph nodes per global component
    - point_counts: number of samples assigned to each global component
    - pca_store: per-fiber PCA embeddings if requested
Notes
- This routine requires optional dependencies: networkx and scikit-learn.
- Edge weights are not computed here. If you want a weighted filtration on the cluster graph, call get_weights() (internal helper) or attach your own edge["weight"] values.
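If you attach your own weights, one possible choice is the Jaccard overlap between the two clusters that an edge connects. The sketch below is only an illustration under stated assumptions and does not reproduce what get_weights() computes: the helper name jaccard_weight is hypothetical, and plain dicts stand in for the node attribute indices and the edge attribute indices_shared described above.

```python
def jaccard_weight(indices_a, indices_b, indices_shared):
    """Fraction of the two clusters' combined samples that they share."""
    union_size = len(set(indices_a) | set(indices_b))
    return len(set(indices_shared)) / union_size if union_size else 0.0


# Stand-ins for the "indices" node attribute and the "indices_shared"
# edge attribute of the returned graph.
node_indices = {
    (0, 0): [0, 1, 2],
    (1, 0): [2, 3],
}
edge_shared = {((0, 0), (1, 0)): [2]}

# One weight per edge, keyed by the pair of nodes it connects.
weights = {
    (u, v): jaccard_weight(node_indices[u], node_indices[v], shared)
    for (u, v), shared in edge_shared.items()
}
print(weights)  # {((0, 0), (1, 0)): 0.25}
```

With the returned networkx.Graph, the same function could be applied in a loop over G.edges(data=True), writing each result to that edge's "weight" attribute.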
Examples
>>> import numpy as np
>>> from circle_bundles import fiberwise_clustering
>>> components, G, graph_dict, cl, summary = fiberwise_clustering(
...     data, U,
...     eps_values=np.full(U.shape[0], 0.25),
...     min_sample_values=np.full(U.shape[0], 10),
... )
>>> fig, _ = plot_fiberwise_summary_bars(summary)