In this article, we introduce two divide-and-conquer algorithms for clustering large graphs. Both algorithms apply a basic algorithm to several small subgraphs and then use these individual local clusterings to obtain a global clustering. We show that our methods help to scale computationally intensive basic clustering algorithms to large networks and to improve the algorithmic stability of some well-known algorithms.
In this article, we develop divide-and-conquer strategies to solve the problem of community discovery in networks. We propose two algorithms that cluster on multiple small subgraphs and finally patch the results into a single clustering. The main advantage of these algorithms is that they significantly reduce the computational cost of traditional algorithms, including spectral clustering, semidefinite programs, modularity-based methods, probability-based methods, etc., without losing accuracy, and sometimes even improving accuracy. By nature, these algorithms can also be parallelized. Since most traditional algorithms are accurate and the corresponding optimization problems are much simpler for small problems, our divide-and-conquer methods offer an omnibus recipe for scaling traditional algorithms to large networks. We prove the consistency of these algorithms under different subgraph selection procedures and carry out extensive simulations and real data analyzes in order to understand the advantages of the divide-and-conquer approach in different situations.
- Accepted 08/30/2021.
Contributions by authors: Research designed by SSM, PS and PJB; SSM, PS, and PJB conducted research; SSM and PS contributed new reagents / analysis tools; SSM and PS analyzed data; and SSM and PS wrote the paper.
Reviewer: CEP, Johns Hopkins University; and KR, University of Wisconsin-Madison.
The authors do not declare any competing interests.
This article provides supporting information online at https://www.pnas.org/lookup/suppl/doi:10.1073/pnas.2100482118/-/DCSupplemental.