This is a zebra mussel.
Though small, it is an invasive species that severely disrupts the ecology of any locale it spreads to. It has also changed the way we need to think about networks. Recently, a team of scientists at the University of Notre Dame found, from empirical evidence of the zebra mussel’s invasion patterns, that the way we currently model networks, as first-order networks or fixed second-order networks, does not hold up well in reality: these are theoretical frameworks that fail to explain several real-world phenomena. The scientists realized that, under the current network framework, the places where zebra mussels actually showed up were highly inconsistent with where they were predicted to be. The prediction follows directly from the network framework because the only way for zebra mussels to move is to attach to a container ship. The researchers therefore concluded that first-order network modeling, in which a ship’s next destination depends only on the port it is currently in, was quite inaccurate.
The issue the scientists point out is this: a first-order network assumes that the Markov property holds, that is, the only thing that affects which node you go to next is the node you currently stand on. What the researchers show in their paper is that where you have been matters: not just the most recent node, but up to five nodes prior to the current one. This phenomenon is captured in a network the scientists call a “higher-order network,” or HON for short. It was developed because the trajectories of ships through shipping lanes, and with them the invasion of the zebra mussels, whose only vector is the large container vessels moving through these lanes, were poorly explained by a network view which dictates that, given you’re at node C with the choice of node D or node E, the probability of choosing each is the same no matter how you arrived at C. The issue with this approach is that movement through real-world networks is not random; it follows preferences that show up in historical data. This is especially the case for container ships, whose choice of port after an intermediate fueling stop almost certainly depends on the port where they picked up their goods. As such, instead of viewing network edges as simply connections between unrelated pairs of nodes, we need to view the choice of next node as conditional: given you’re at node C with choices of node D and node E, the probability of choosing D or E depends on whether you arrived at C from node A or node B. Edge weights are then no longer fixed probabilities between pairs of nodes; they depend on the nodes a traveler visited before reaching the current one. The infographic below further illustrates this point.
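To make the dependence on history concrete, here is a minimal Python sketch that estimates next-port probabilities from a set of voyages, first conditioning only on the current port and then on the current port plus the one before it. The voyage data, the port labels, and the function name `transition_probs` are all made up for illustration, and the sketch uses a fixed history length rather than the researchers' actual procedure for building a HON.

```python
from collections import Counter, defaultdict


def transition_probs(trajectories, order=1):
    """Estimate P(next node | last `order` nodes) by counting observed moves."""
    counts = defaultdict(Counter)
    for path in trajectories:
        for i in range(order, len(path)):
            history = tuple(path[i - order:i])  # the `order` nodes just before path[i]
            counts[history][path[i]] += 1
    return {
        history: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for history, c in counts.items()
    }


# Hypothetical voyages: ships from A mostly continue to D, ships from B mostly to E.
voyages = (
    [["A", "C", "D"]] * 9
    + [["A", "C", "E"]] * 1
    + [["B", "C", "D"]] * 1
    + [["B", "C", "E"]] * 9
)

first = transition_probs(voyages, order=1)
second = transition_probs(voyages, order=2)

print(first[("C",)])       # {'D': 0.5, 'E': 0.5}
print(second[("A", "C")])  # {'D': 0.9, 'E': 0.1}
print(second[("B", "C")])  # {'D': 0.1, 'E': 0.9}
```

Looking only at the current port C, the two destinations appear equally likely. Once the previous port is taken into account, the strong preference in each group of ships becomes visible.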
The importance of moving from the simplistic first-order network to the HON is that, as the researchers show, the HON is considerably more accurate when modeling network processes such as “random walking, clustering and ranking,” and thus has large implications for both internet-based and social models of human activity and preferences. One such implication touches on something we have discussed in class: Google’s PageRank, which uses the random walk as an integral component of its algorithm. Where PageRank states that we need to equally divide the “pool of marbles” at some node C among all the nodes it has directed edges to (say, D and E), the ND researchers have shown that the proportion of marbles pushed through to D and E actually depends on whether node C received its own “marbles” from node A or node B.
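The difference can be sketched for a single redistribution step at node C. This is only an illustration under made-up numbers: the marble counts, the conditional probabilities, and both function names are hypothetical, and it is not the full PageRank iteration or the researchers' own implementation.

```python
def push_first_order(marbles_at_c, out_neighbors):
    """Basic PageRank step: split C's marbles equally among its out-neighbors."""
    share = marbles_at_c / len(out_neighbors)
    return {node: share for node in out_neighbors}


def push_higher_order(marbles_by_source, probs_by_source):
    """History-aware step: marbles at C are split according to where they came from."""
    flow = {}
    for source, marbles in marbles_by_source.items():
        for nxt, p in probs_by_source[source].items():
            flow[nxt] = flow.get(nxt, 0.0) + marbles * p
    return flow


# Hypothetical state: C holds 100 marbles, 80 received from A and 20 from B.
marbles_by_source = {"A": 80, "B": 20}

# Hypothetical preferences: what arrives from A mostly goes on to D, from B mostly to E.
probs_by_source = {
    "A": {"D": 0.9, "E": 0.1},
    "B": {"D": 0.1, "E": 0.9},
}

equal_split = push_first_order(sum(marbles_by_source.values()), ["D", "E"])
history_split = push_higher_order(marbles_by_source, probs_by_source)

print(equal_split)    # {'D': 50.0, 'E': 50.0}
print(history_split)  # roughly 74 marbles to D and 26 to E
```

With the equal split, D and E each receive half of C's marbles. With the history-aware split, D receives about three quarters of them, because most of C's marbles arrived from A.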