PLSCANClusterer
PLSCANClusterer is a Toponymy Clusterer wrapper around
fast_hdbscan.PLSCAN. It is useful when you want PLSCAN-based layered
clustering inside Toponymy while still using the standard Toponymy clusterer
interface. It does not replace ToponymyClusterer; it is an alternative
clusterer for users who specifically want PLSCAN’s persistence-based cluster
layers.
When to use it
Use PLSCANClusterer when you want to:
- cluster on clusterable_vectors, often a low-dimensional map or another clusterable representation;
- compute Toponymy centroids from embedding_vectors;
- inspect PLSCAN layers through the same cluster_layers_ and cluster_tree_ interface used by other Toponymy clusterers.
Basic usage
from toponymy import ClusterLayerText, PLSCANClusterer

clusterer = PLSCANClusterer(
    min_clusters=6,
    min_samples=5,
    base_min_cluster_size=10,
    max_layers=4,
)

cluster_layers, cluster_tree = clusterer.fit_predict(
    clusterable_vectors=clusterable_vectors,
    embedding_vectors=embedding_vectors,
    layer_class=ClusterLayerText,
)

# Toponymy layer objects and the parent/child cluster tree
clusterer.cluster_layers_
clusterer.cluster_tree_

# PLSCAN metadata retained for the returned layers
clusterer.cluster_probabilities_
clusterer.cluster_persistence_scores_
clusterer.plscan_min_cluster_sizes_
cluster_layers is the list of Toponymy cluster layer objects. cluster_tree
maps clusters between neighboring layers.
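To make the idea of a parent/child mapping between neighboring layers concrete, the sketch below links each cluster in a finer layer to the coarser-layer cluster that holds most of its points. It works on plain label lists and is only an illustration: the helper name link_layers and the majority-overlap rule are assumptions made for this example, not a description of how Toponymy or PLSCAN builds cluster_tree_, and the actual structure of cluster_tree may differ.

```python
from collections import Counter, defaultdict


def link_layers(fine_labels, coarse_labels):
    """Map each coarse cluster id to the fine cluster ids it mostly contains.

    Hypothetical helper for illustration only. Points labelled -1 (noise)
    in either layer are ignored.
    """
    if len(fine_labels) != len(coarse_labels):
        raise ValueError("both layers must label the same points")

    # Count, for every fine cluster, how many of its points fall in each coarse cluster.
    overlap = defaultdict(Counter)
    for fine, coarse in zip(fine_labels, coarse_labels):
        if fine != -1 and coarse != -1:
            overlap[fine][coarse] += 1

    # Assign each fine cluster to the coarse cluster with the largest overlap.
    children = defaultdict(list)
    for fine in sorted(overlap):
        parent, _ = overlap[fine].most_common(1)[0]
        children[parent].append(fine)
    return dict(children)


fine = [0, 0, 1, 1, 2, 2, -1, 3]
coarse = [0, 0, 0, 0, 1, 1, -1, 1]
tree = link_layers(fine, coarse)
print(tree)  # {0: [0, 1], 1: [2, 3]}
```

Here coarse cluster 0 becomes the parent of fine clusters 0 and 1, and coarse cluster 1 the parent of fine clusters 2 and 3; the noise point is left out of the mapping.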
Notes
- clusterable_vectors are passed to fast_hdbscan.PLSCAN.fit(...).
- embedding_vectors are used for Toponymy centroid construction.
- -1 labels are preserved as noise or unlabelled points.
- cluster_probabilities_ and cluster_persistence_scores_ are stored on the clusterer object for the returned layers.
- plscan_min_cluster_sizes_ stores the min_cluster_sizes_ trace exposed by fast_hdbscan.PLSCAN.
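The division of labour between the two inputs, and the treatment of -1 labels, can be illustrated with a small NumPy sketch. It assumes labels have already been produced by clustering clusterable_vectors and computes one centroid per cluster from embedding_vectors, skipping noise points. The function name centroids_without_noise and the use of a plain mean are assumptions for illustration; Toponymy's own centroid construction may differ.

```python
import numpy as np


def centroids_without_noise(labels, embedding_vectors):
    """Return {cluster_id: mean embedding}, ignoring points labelled -1.

    Hypothetical helper for illustration; not part of Toponymy's API.
    """
    labels = np.asarray(labels)
    embedding_vectors = np.asarray(embedding_vectors, dtype=float)
    if labels.shape[0] != embedding_vectors.shape[0]:
        raise ValueError("labels and embedding_vectors must have the same length")

    # Negative labels mark noise, so only non-negative ids get a centroid.
    cluster_ids = np.unique(labels[labels >= 0])
    return {
        int(cluster_id): embedding_vectors[labels == cluster_id].mean(axis=0)
        for cluster_id in cluster_ids
    }


# Labels as they might come from clustering a low-dimensional map.
labels = [0, 0, 1, -1, 1]
# Embeddings for the same five points, in the same order.
embedding_vectors = [
    [0.0, 0.0],
    [2.0, 2.0],
    [4.0, 0.0],
    [100.0, 100.0],  # noise point, excluded from every centroid
    [6.0, 2.0],
]
centroids = centroids_without_noise(labels, embedding_vectors)
```

The noise point keeps its -1 label and does not pull any centroid toward it, so cluster 0 averages to (1, 1) and cluster 1 to (5, 1).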