cophenet
- scipy.cluster.hierarchy.cophenet(Z, Y=None)
Calculate the cophenetic distances between each observation in the hierarchical clustering defined by the linkage Z.
Suppose p and q are original observations in disjoint clusters s and t, respectively, and s and t are joined by a direct parent cluster u. The cophenetic distance between observations p and q is simply the distance between clusters s and t.
- Parameters:
- Z ndarray
The hierarchical clustering encoded as an array (see the linkage function).
- Y ndarray (optional)
The condensed distance matrix from which Z was generated. If Y is passed, the function also calculates the cophenetic correlation coefficient c of the hierarchical clustering defined by the linkage matrix Z of a set of \(n\) observations in \(m\) dimensions.
- Returns:
- c ndarray
The cophenetic correlation coefficient (returned only if Y is passed).
- d ndarray
The cophenetic distance matrix in condensed form. The \(ij\) th entry is the cophenetic distance between original observations \(i\) and \(j\).
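As a hedged illustration of the two return values, the sketch below passes the condensed distance matrix as Y and checks that the returned c agrees with the Pearson correlation between Y and the cophenetic distances d. The small dataset and the cross-check with numpy.corrcoef are assumptions made for this example, not part of the original documentation.

```python
import numpy as np
from scipy.cluster.hierarchy import single, cophenet
from scipy.spatial.distance import pdist

# Illustrative data (an assumption for this sketch): two loose groups in 2-D.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [5.0, 5.0], [5.0, 6.0], [6.5, 5.0]])

Y = pdist(X)   # condensed distance matrix
Z = single(Y)  # linkage matrix built from Y

# With Y supplied, cophenet returns the coefficient c and the distances d.
c, d = cophenet(Z, Y)

# Cross-check: c should agree with the Pearson correlation of Y and d.
pearson = np.corrcoef(Y, d)[0, 1]
print(d.shape == Y.shape, np.isclose(c, pearson))
```

When Y is omitted, only the condensed distance array is returned, so the tuple unpacking shown here applies only to the two-argument call.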
See also
linkage: for a description of what a linkage matrix is.
scipy.spatial.distance.squareform: transforming condensed matrices into square ones.
Notes
cophenet has experimental support for Python Array API Standard compatible backends in addition to NumPy. Please consider testing these features by setting an environment variable SCIPY_ARRAY_API=1 and providing CuPy, PyTorch, JAX, or Dask arrays as array arguments. The following combinations of backend and device (or other capability) are supported.

Library    CPU                 GPU
NumPy      ✅                  n/a
CuPy       n/a                 ⛔
PyTorch    ✅                  ⛔
JAX        ✅                  ⛔
Dask       ⚠️ merges chunks    n/a

See Support for the array API standard for more information.
Examples
>>> from scipy.cluster.hierarchy import single, cophenet
>>> from scipy.spatial.distance import pdist, squareform
Given a dataset X and a linkage matrix Z, the cophenetic distance between two points of X is the distance between the largest two distinct clusters that each of the points belongs to:
>>> X = [[0, 0], [0, 1], [1, 0],
...      [0, 4], [0, 3], [1, 4],
...      [4, 0], [3, 0], [4, 1],
...      [4, 4], [3, 4], [4, 3]]
X corresponds to this dataset:
x x   x x
x       x

x       x
x x   x x
>>> Z = single(pdist(X))
>>> Z
array([[ 0.,  1.,  1.,  2.],
       [ 2., 12.,  1.,  3.],
       [ 3.,  4.,  1.,  2.],
       [ 5., 14.,  1.,  3.],
       [ 6.,  7.,  1.,  2.],
       [ 8., 16.,  1.,  3.],
       [ 9., 10.,  1.,  2.],
       [11., 18.,  1.,  3.],
       [13., 15.,  2.,  6.],
       [17., 20.,  2.,  9.],
       [19., 21.,  2., 12.]])
>>> cophenet(Z)
array([1., 1., 2., 2., 2., 2., 2., 2., 2., 2., 2., 1., 2., 2., 2., 2., 2.,
       2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 1., 1., 2., 2.,
       2., 2., 2., 2., 1., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.,
       1., 1., 2., 2., 2., 1., 2., 2., 2., 2., 2., 2., 1., 1., 1.])
The output of the scipy.cluster.hierarchy.cophenet function is represented in condensed form. We can use scipy.spatial.distance.squareform to see the output as a regular matrix (where each element ij denotes the cophenetic distance between each i, j pair of points in X):
>>> squareform(cophenet(Z))
array([[0., 1., 1., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
       [1., 0., 1., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
       [1., 1., 0., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
       [2., 2., 2., 0., 1., 1., 2., 2., 2., 2., 2., 2.],
       [2., 2., 2., 1., 0., 1., 2., 2., 2., 2., 2., 2.],
       [2., 2., 2., 1., 1., 0., 2., 2., 2., 2., 2., 2.],
       [2., 2., 2., 2., 2., 2., 0., 1., 1., 2., 2., 2.],
       [2., 2., 2., 2., 2., 2., 1., 0., 1., 2., 2., 2.],
       [2., 2., 2., 2., 2., 2., 1., 1., 0., 2., 2., 2.],
       [2., 2., 2., 2., 2., 2., 2., 2., 2., 0., 1., 1.],
       [2., 2., 2., 2., 2., 2., 2., 2., 2., 1., 0., 1.],
       [2., 2., 2., 2., 2., 2., 2., 2., 2., 1., 1., 0.]])
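When only a few pairs are needed, the condensed output can be indexed directly instead of expanding it with squareform. The sketch below is one possible way to do that; the helper condensed_index is hypothetical (it is not a SciPy function) and relies on the pair ordering that pdist documents for condensed vectors.

```python
from scipy.cluster.hierarchy import single, cophenet
from scipy.spatial.distance import pdist, squareform

def condensed_index(i, j, n):
    """Position of the pair (i, j) in a condensed vector over n observations.

    Hypothetical helper for this sketch; assumes the pdist pair ordering.
    """
    if i == j:
        raise ValueError("the diagonal is not stored in condensed form")
    if i > j:
        i, j = j, i
    return n * i + j - ((i + 2) * (i + 1)) // 2

X = [[0, 0], [0, 1], [1, 0],
     [0, 4], [0, 3], [1, 4],
     [4, 0], [3, 0], [4, 1],
     [4, 4], [3, 4], [4, 3]]
d = cophenet(single(pdist(X)))
n = len(X)

# Points 0 and 2 share a corner; points 0 and 9 sit in opposite corners.
same_corner = d[condensed_index(0, 2, n)]
far_apart = d[condensed_index(0, 9, n)]
print(same_corner, far_apart)
```

For large n this avoids building the full n-by-n matrix, at the cost of computing the index for each lookup.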
In this example, the cophenetic distance between points of X that are very close (i.e., in the same corner) is 1. For every other pair of points it is 2, because those points lie in clusters at different corners, and the distance between these clusters is larger.
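The definition given at the top of this page can also be checked by hand. The following is a minimal sketch, not SciPy's implementation: the hypothetical helper cophenetic_square walks the rows of Z in merge order and assigns each merge height to every pair of observations that the merge joins for the first time.

```python
import numpy as np
from scipy.cluster.hierarchy import single, cophenet
from scipy.spatial.distance import pdist, squareform

def cophenetic_square(Z):
    """Square cophenetic matrix built directly from a linkage matrix.

    Hypothetical helper for illustration; quadratic in the number of points.
    """
    Z = np.asarray(Z)
    n = Z.shape[0] + 1
    # Each original observation starts as its own singleton cluster.
    members = {i: [i] for i in range(n)}
    D = np.zeros((n, n))
    for k, (a, b, height, _) in enumerate(Z):
        left = members.pop(int(a))
        right = members.pop(int(b))
        # Pairs split across the two merged clusters meet at this height.
        for p in left:
            for q in right:
                D[p, q] = D[q, p] = height
        # The new cluster gets index n + k, as in the linkage convention.
        members[n + k] = left + right
    return D

X = [[0, 0], [0, 1], [1, 0],
     [0, 4], [0, 3], [1, 4],
     [4, 0], [3, 0], [4, 1],
     [4, 4], [3, 4], [4, 3]]
Z = single(pdist(X))

manual = cophenetic_square(Z)
reference = squareform(cophenet(Z))
print(np.allclose(manual, reference))
```

The nested loops keep the sketch readable; for real workloads cophenet itself is the appropriate tool.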