GraphHopper

Warning

This module is under active development and may change significantly in future versions.

class s2.graph.hopper.GraphHopper

The primary role of this class is to implement the hop() method, which decides whether to traverse an edge and hop to a new node.

This simple binary decision makes it possible to build diverse strategies for constructing subsets of a root paper’s citation graph with specific properties. In particular, subclasses that implement hop() can be passed to S2GraphBuilder to create S2Graph instances with arbitrary structures.

The default implementation always hops, so it will eventually cover the entire citation graph of the root paper (i.e. the connected component of the full citation graph that contains the root paper). However, the interface of hop() allows more complex decisions based on the current state of the graph and on the path traversed from the root paper to the current candidate paper.
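The sketch below shows how a single yes/no hop() decision can drive the construction of a subgraph. It does not use the s2 package: GraphHopper here is a hypothetical stand-in, and the hop(graph, gpath) signature, the dict-of-sets graph, and the build_subgraph helper are assumptions made for illustration, not the actual S2GraphBuilder logic.

```python
from collections import deque
from typing import Dict, List, Set


class GraphHopper:
    """Hypothetical stand-in for s2.graph.hopper.GraphHopper.

    Assumes hop() receives the graph built so far (a dict of adjacency
    sets) and gpath, the list of paper IDs from the root paper to the
    candidate paper, inclusive.
    """

    def hop(self, graph: Dict[str, Set[str]], gpath: List[str]) -> bool:
        # Default behaviour: always hop.
        return True


def build_subgraph(
    citations: Dict[str, List[str]], root: str, hopper: GraphHopper
) -> Dict[str, Set[str]]:
    """Breadth-first expansion that asks `hopper` before every new hop.

    `citations` maps a paper ID to its neighbouring papers and stands in
    for the full citation graph (hypothetical helper, for illustration).
    """
    graph: Dict[str, Set[str]] = {root: set()}
    queue = deque([[root]])  # each entry is a path starting at the root
    while queue:
        gpath = queue.popleft()
        current = gpath[-1]
        for neighbour in citations.get(current, []):
            if neighbour in graph:
                # Paper already in the graph: record the edge only.
                graph[current].add(neighbour)
                continue
            candidate_path = gpath + [neighbour]
            if hopper.hop(graph, candidate_path):
                graph[current].add(neighbour)
                graph[neighbour] = set()
                queue.append(candidate_path)
    return graph


toy = {"A": ["B", "C"], "B": ["D"], "C": [], "D": ["E"], "E": []}
print(sorted(build_subgraph(toy, "A", GraphHopper())))  # ['A', 'B', 'C', 'D', 'E']
```

Because the default hopper always returns True, the traversal reaches every paper reachable from the root in the toy graph; a subclass that overrides hop() changes the shape of the result without touching the traversal itself.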

class s2.graph.hopper.MaxHopHopper(max_hops=1)

Hops until a max distance from the root paper is exceeded.

Parameters

max_hops (int, optional) – Max number of hops from the root paper (len(gpath) - 1) beyond which GraphHopper.hop() returns False. Defaults to 1.
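A minimal sketch of the MaxHopHopper rule, assuming hop() receives gpath as the list of papers from the root paper to the candidate paper inclusive, so the candidate's distance from the root is len(gpath) - 1. The class is a standalone illustration rather than the s2 implementation, and it ignores the graph argument.

```python
from typing import Any, Sequence


class MaxHopHopper:
    """Illustrative version of the MaxHopHopper rule (not the s2 code)."""

    def __init__(self, max_hops: int = 1) -> None:
        self.max_hops = max_hops

    def hop(self, graph: Any, gpath: Sequence[str]) -> bool:
        # Hop only while the candidate is at most max_hops from the root.
        return len(gpath) - 1 <= self.max_hops


hopper = MaxHopHopper(max_hops=2)
print(hopper.hop(None, ["root", "p1"]))              # True  (1 hop)
print(hopper.hop(None, ["root", "p1", "p2"]))        # True  (2 hops)
print(hopper.hop(None, ["root", "p1", "p2", "p3"]))  # False (3 hops)
```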

class s2.graph.hopper.MaxPaperHopper(max_papers=10)

Hops until a maximum number of papers has been added to the graph.

Parameters

max_papers (int, optional) – Max number of papers in the graph beyond which GraphHopper.hop() returns False. Defaults to 10.

Note that the number of papers is len(graph.edges), as graph.papers is an instance of S2DataStore, which may contain papers not in the graph.
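A minimal sketch of the MaxPaperHopper rule. ToyGraph is a hypothetical stand-in whose edges dict holds one entry per paper in the graph, matching the note that the paper count is len(graph.edges). The exact comparison used by the real class is not stated here; this sketch assumes the check runs before the candidate is added, so hopping stops once max_papers papers are present.

```python
from typing import Dict, Sequence, Set


class ToyGraph:
    """Hypothetical minimal graph: `edges` maps each paper to its neighbours."""

    def __init__(self) -> None:
        self.edges: Dict[str, Set[str]] = {}


class MaxPaperHopper:
    """Illustrative version of the MaxPaperHopper rule (not the s2 code)."""

    def __init__(self, max_papers: int = 10) -> None:
        self.max_papers = max_papers

    def hop(self, graph: ToyGraph, gpath: Sequence[str]) -> bool:
        # Count papers via graph.edges, not a paper store that may hold
        # papers outside the graph. Assumed: stop once the limit is reached.
        return len(graph.edges) < self.max_papers


graph = ToyGraph()
graph.edges["root"] = set()  # the root paper is already in the graph
hopper = MaxPaperHopper(max_papers=3)
for paper in ["p1", "p2", "p3", "p4"]:
    if hopper.hop(graph, ["root", paper]):
        graph.edges[paper] = set()
print(sorted(graph.edges))  # ['p1', 'p2', 'root']
```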

class s2.graph.hopper.BowtieHopper(max_reference=1, max_citation=1, verify_gpath=False)

Creates a bowtie- or funnel-shaped subset of a citation graph, i.e. hops only if the traversed path consists exclusively of citations of citations or of references of references, up to the specified lengths.

Parameters
  • max_reference (int, optional) – Max distance allowed from the root paper along a path of references. Defaults to 1.

  • max_citation (int, optional) – Max distance allowed from the root paper along a path of citations. Defaults to 1.

  • verify_gpath (bool, optional) – If False, assume that the path leading to the current node already consists exclusively of citations of citations or of references of references. If True, check every paper in gpath to ensure this condition is met. Defaults to False.
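A sketch of the BowtieHopper rule under an assumed path representation: each gpath step is a hypothetical (paper_id, direction) tuple, where direction is None for the root paper, "reference" for a paper cited by the previous one, and "citation" for a paper that cites the previous one. The real class may derive directions from the graph instead; this is an illustration, not the s2 implementation.

```python
from typing import List, Optional, Tuple

# Hypothetical step format: (paper_id, direction), direction None for root.
Step = Tuple[str, Optional[str]]


class BowtieHopper:
    """Illustrative version of the BowtieHopper rule (not the s2 code)."""

    def __init__(self, max_reference: int = 1, max_citation: int = 1,
                 verify_gpath: bool = False) -> None:
        self.max_reference = max_reference
        self.max_citation = max_citation
        self.verify_gpath = verify_gpath

    def hop(self, graph: object, gpath: List[Step]) -> bool:
        if len(gpath) < 2:
            return True  # only the root paper, nothing to check
        direction = gpath[1][1]  # direction of the first hop from the root
        if self.verify_gpath:
            # Check every step after the root.
            if any(d != direction for _, d in gpath[1:]):
                return False
        elif gpath[-1][1] != direction:
            # Trust earlier steps; only the newest hop can break the rule.
            return False
        if direction == "reference":
            limit = self.max_reference
        else:
            limit = self.max_citation
        return len(gpath) - 1 <= limit


hopper = BowtieHopper(max_reference=2, max_citation=1)
refs = [("root", None), ("r1", "reference"), ("r2", "reference")]
mixed = [("root", None), ("r1", "reference"), ("c1", "citation")]
cites = [("root", None), ("c1", "citation"), ("c2", "citation")]
print(hopper.hop(None, refs))   # True: two references deep, limit 2
print(hopper.hop(None, mixed))  # False: a citation of a reference
print(hopper.hop(None, cites))  # False: two citations deep, limit 1
```

With verify_gpath=False only the newest step is compared with the first hop, which is cheaper but relies on earlier hops having been checked already; a path such as reference, citation, reference would pass the shortcut but fail full verification.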