Application of the Latent Position Cluster Model to Large(r) Networks
Networks are used to represent data on interactions between actors or nodes. They are employed to model a diverse range of statistical problems, from disease epidemics to human social networks. Recent work on Bayesian analysis of the links between these actors has focussed on embedding the actors in a latent "social space": the closer two actors are in this space, the more likely they are to be linked. The Latent Position Cluster Model (LPCM) explicitly models the clustering exhibited by many network datasets (Handcock et al., 2007). A recent extension of this model includes sender and/or receiver effects for the nodes. This explicit modelling of sociality accommodates heterogeneity in node degree (Krivitsky et al., 2009).
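To make the role of the latent positions concrete, here is a minimal sketch of the directed-network log-likelihood. It assumes the logistic distance formulation logit P(y_ij = 1) = β0 + δ_i + γ_j − |z_i − z_j| with Euclidean distance, where δ_i and γ_j are the sender and receiver effects of Krivitsky et al. (2009). The function names are hypothetical. The Gaussian mixture prior on the positions, which is what produces clustering in the LPCM, is deliberately omitted.

```python
import math
from typing import List, Optional, Sequence


def log1pexp(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def edge_log_odds(beta0: float, z_i: Sequence[float], z_j: Sequence[float],
                  sender_i: float = 0.0, receiver_j: float = 0.0) -> float:
    """Log-odds of a tie i -> j: intercept plus sociality effects minus latent distance."""
    return beta0 + sender_i + receiver_j - math.dist(z_i, z_j)


def directed_log_likelihood(y: List[List[int]], z: List[Sequence[float]], beta0: float,
                            sender: Optional[List[float]] = None,
                            receiver: Optional[List[float]] = None) -> float:
    """Bernoulli log-likelihood of a directed adjacency matrix y (no self-loops).

    y[i][j] = 1 records a tie from i to j; z[i] is the latent position of node i.
    Omitting sender/receiver effects recovers the plain distance model.
    """
    n = len(y)
    if sender is None:
        sender = [0.0] * n
    if receiver is None:
        receiver = [0.0] * n
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # self-ties are not modelled
            eta = edge_log_odds(beta0, z[i], z[j], sender[i], receiver[j])
            # logit-link Bernoulli term: y * eta - log(1 + exp(eta))
            total += y[i][j] * eta - log1pexp(eta)
    return total
```

Moving two linked actors closer together raises η and therefore the likelihood of their tie. A large sender effect δ_i raises the log-odds of every tie sent by i, which is how degree heterogeneity enters without distorting the geometry.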
Inference via MCMC is computationally cumbersome, and scaling these methods to large networks with many interacting nodes is a challenge. Variational Bayesian methods offer one solution to this problem. An approximate posterior of closed form is specified in terms of unknown variational parameters. These parameters are tuned by minimising the Kullback-Leibler divergence between the approximate variational posterior and the true posterior, which is known only up to a constant of proportionality.
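The KL divergence to a posterior that is known only up to proportionality can still be minimised because of a standard identity, which is general to variational Bayes and not specific to the LPCM. Here θ is assumed to collect all unknowns (latent positions, cluster memberships, sociality effects, and so on) and q(θ) is the variational approximation:

```latex
\log p(y) \;=\; \underbrace{\mathbb{E}_{q}\!\left[\log p(y,\theta) - \log q(\theta)\right]}_{\mathcal{L}(q)}
\;+\; \mathrm{KL}\!\left(q(\theta)\,\|\,p(\theta \mid y)\right)
```

Since log p(y) does not depend on q, minimising the KL term is equivalent to maximising the lower bound L(q). That bound involves only the joint density p(y, θ) = p(y | θ) p(θ), that is, the unnormalised posterior.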
The computational overhead is far lower than that of sampling-based methods, which allows richer models to be developed. We demonstrate that sociality effects may be treated as a special case of node covariates. We address the convergence of the variational algorithm to local optima and present a force-based algorithm for initialising the latent positions that is based on the log-likelihood of the data. We also discuss some interesting open challenges, including overlapping network data (missing links), the applicability of the LPCM to disconnected components and to very large networks, and hierarchical clustering for large networks.
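The force-based initialisation is not specified in detail here. The sketch below is one plausible reading and should be treated as an assumption, not as the authors' algorithm. It performs gradient ascent on the log-likelihood of the positions. For an undirected network with η_ij = β0 − d_ij, that gradient decomposes into pairwise forces: ∂ℓ/∂z_i = Σ_j (y_ij − p_ij)(z_j − z_i)/d_ij. Each observed tie pulls its endpoints together, each absent tie pushes them apart, and the residual y_ij − p_ij sets the magnitude. The names `forces` and `force_initialise`, the 2-D positions, and the default step settings are all hypothetical.

```python
import math
import random
from typing import List


def log1pexp(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def log_likelihood(y: List[List[int]], z: List[List[float]], beta0: float) -> float:
    """Undirected distance-model log-likelihood, summed over unordered pairs i < j."""
    n = len(y)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            eta = beta0 - math.dist(z[i], z[j])
            total += y[i][j] * eta - log1pexp(eta)
    return total


def forces(y: List[List[int]], z: List[List[float]], beta0: float) -> List[List[float]]:
    """Gradient of log_likelihood with respect to each 2-D latent position."""
    n = len(y)
    grad = [[0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = z[j][0] - z[i][0], z[j][1] - z[i][1]
            d = math.hypot(dx, dy)
            if d < 1e-12:
                continue  # direction is undefined for coincident points
            # residual divided by d turns (dx, dy) into a unit direction
            w = (y[i][j] - sigmoid(beta0 - d)) / d
            # w > 0 pulls i and j together; w < 0 pushes them apart
            grad[i][0] += w * dx
            grad[i][1] += w * dy
            grad[j][0] -= w * dx
            grad[j][1] -= w * dy
    return grad


def force_initialise(y: List[List[int]], beta0: float = 1.0, steps: int = 200,
                     step_size: float = 0.05, seed: int = 0) -> List[List[float]]:
    """Start from random positions and follow the likelihood forces for a fixed number of steps."""
    rng = random.Random(seed)
    z = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(len(y))]
    for _ in range(steps):
        g = forces(y, z, beta0)
        for zi, gi in zip(z, g):
            zi[0] += step_size * gi[0]
            zi[1] += step_size * gi[1]
    return z
```

In practice the positions returned by such a procedure would serve only as starting values for the variational algorithm. The aim is to begin near a sensible configuration rather than at an arbitrary point that may lead to a poor local optimum.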