Skip to content

graph laplacian difference #811

@satra

Description

@satra

i'm trying to refactor similarity matrices and graph and related manifold learning components and noticed that what scikits currently calls normed graph_laplacian (i'm assuming normed means normalized) is very different in output from the equations from von luxburg: http://www.kyb.mpg.de/fileadmin/user_upload/files/publications/attachments/Luxburg07_tutorial_4488%5b0%5d.pdf

any thoughts on what 'normed' means in this context. the reason this is important is because the normed version is used in spectral clustering and that's different from any of the approaches in the paper.

In [1]: from sklearn.utils.graph import _graph_laplacian_dense

In [2]: import numpy as np

In [6]: adj = np.random.rand(3,3)

In [8]: adj += adj.T

In [9]: adj
Out[9]: 
array([[ 1.40570635,  0.94538262,  0.69864294],
       [ 0.94538262,  0.09899904,  0.87797918],
       [ 0.69864294,  0.87797918,  0.06119393]])

Current output

In [26]: _graph_laplacian_dense(W)
Out[26]: 
array([[ 1.64402557, -0.94538262, -0.69864294],
       [-0.94538262,  1.82336181, -0.87797918],
       [-0.69864294, -0.87797918,  1.57662212]])

In [27]: _graph_laplacian_dense(W, normed=True)
Out[27]: 
array([[ 1.        , -0.5460305 , -0.43394749],
       [-0.5460305 ,  1.        , -0.51782616],
       [-0.43394749, -0.51782616,  1.        ]])

In [29]: _graph_laplacian_dense(W, normed=True, return_diag=True)
Out[29]: 
(array([[ 1.        , -0.5460305 , -0.43394749],
       [-0.5460305 ,  1.        , -0.51782616],
       [-0.43394749, -0.51782616,  1.        ]]),
 array([ 1.2821956 ,  1.35031915,  1.25563614]))

proposed changes (based on von Luxburg)

see commit in: satra/scikit-learn@2d289ec

In [12]: D = np.diag(np.sum(adj, axis=0))

In [13]: W = adj

In [15]: D
Out[15]: 
array([[ 3.04973191,  0.        ,  0.        ],
       [ 0.        ,  1.92236084,  0.        ],
       [ 0.        ,  0.        ,  1.63781605]])

In [17]: Dinv = np.diag(1/np.sum(adj, axis=0))

In [18]: Dinv
Out[18]: 
array([[ 0.32789767,  0.        ,  0.        ],
       [ 0.        ,  0.5201937 ,  0.        ],
       [ 0.        ,  0.        ,  0.61056918]])

In [19]: L = D-W

In [20]: Lsym = np.dot(np.sqrt(Dinv), np.dot(L, np.sqrt(Dinv)))

In [22]: Lrw = np.dot(Dinv, L)

In [23]: L
Out[23]: 
array([[ 1.64402557, -0.94538262, -0.69864294],
       [-0.94538262,  1.82336181, -0.87797918],
       [-0.69864294, -0.87797918,  1.57662212]])

In [24]: Lsym
Out[24]: 
array([[ 0.53907216, -0.39044452, -0.31260209],
       [-0.39044452,  0.94850132, -0.49480514],
       [-0.31260209, -0.49480514,  0.96263687]])

In [25]: Lrw
Out[25]: 
array([[ 0.53907216, -0.30998876, -0.2290834 ],
       [-0.49178209,  0.94850132, -0.45671924],
       [-0.42656985, -0.53606703,  0.96263687]])

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions