Atomic Cycles and Finding Holes in Manifolds
It's been a while! Turns out being in a PhD program keeps you pretty busy. I'm gonna try to get back into research blogging, but we'll see how long that lasts. This post is going back to manifold learning, my research focus during my undergrad at the University of Michigan.
Machine learning theoreticians and practitioners frequently make use of the manifold hypothesis, which asserts that high-dimensional data generally lie along lower-dimensional latent manifolds. Manifold learning makes heavy use of that assumption: it tries to find a coordinate chart of that manifold, towards producing an explicit low-dimensional representation of the data. One potential shortcoming of this strategy is that not all manifolds can be represented using a single coordinate chart, due to holes in the manifold. One can get around this by making cuts to the manifold or embedding pieces of the manifold as an atlas of charts, but to do so, one must identify the holes. Since the manifold is unknown, one has to do this with only the input data.
Given the standard manifold learning approach of connecting data points together to form a nearest neighbors graph, it makes sense to look at that graph to try and identify holes. Certainly, if there's a hole in the manifold, there should also be some sort of "hole" in the graph (assuming sufficient data). But such a notion must be made explicit, and it needs to be efficiently computable. (For example, topological data analysis provides formal methods for these purposes, but they generally don't scale to the problems we care about.)
Formally, a hole in a graph is a chordless cycle within the graph of length four or more. One can certainly compute all of the holes in a graph, but this notion doesn't necessarily correspond to the notion of a hole in the manifold. For example, in the following image (reproduced from Robust Manifold Learning with CycleCut, Gashler and Martinez), the large outer cycle is indeed chordless, but the structure in the middle suggests it's not a hole in the manifold.
To avoid this problem, Gashler and Martinez introduced the notion of an atomic cycle. An atomic cycle is a cycle with no n-chords, where n is some constant integer. (An n-chord is a sequence of n edges that connect two vertices in the cycle, where n is less than the length of the path connecting the vertices along the cycle.) For example, the outer cycle in the above graph is not an atomic cycle, since it has a 2-chord.
Gashler and Martinez presented an algorithm for finding large atomic cycles as part of their CycleCut paper. It involves two steps: an outer breadth-first search (BFS) to find candidate cycles, and an inner BFS to search for n-chords. I implemented their algorithm as part of my atlas learning paper, and put together a simple visualization of the algorithm, which you can run for yourself here. Given data with an obvious topological hole, you can see the neighborhood graph, the outer BFS (which has found a candidate cycle), the inner BFS (which searches for an n-chord), and the identification that no n-chord exists (yielding the atomic cycle).
Large atomic cycles are an excellent indicator of 1-dimensional holes, but as far as I know, identifying higher-dimensional topological holes in a graph is an open problem. If you happen to know of efficient algorithms for this, please send them my way!
Comments
Post a Comment