s2.graph

PyS2 lets you construct arbitrary subgraphs of a paper’s citation network. A GraphHopper defines the traversal strategy, and S2GraphBuilder uses it to create S2Graph objects. These classes leverage S2DataStore to reduce the memory footprint and the number of API calls when working with large numbers of papers.

Custom Types

Warning

This module is under active development and may change significantly in future versions.

class s2.graph.PaperId

A valid S2 Paper Identifier string.

class s2.graph.AuthorId

A valid S2 Author Identifier string.

class s2.graph.S2PaperMap

MutableMapping of PaperId to S2Paper.

It is recommended to use objects of type s2.store.S2DataStore, but for rapid prototyping this can be a simple dictionary.

class s2.graph.S2AuthorMap

MutableMapping of AuthorId to S2Author.

It is recommended to use objects of type s2.store.S2DataStore, but for rapid prototyping this can be a simple dictionary.

class s2.graph.EdgeType

One of ‘references’, ‘citations’, or ‘author’.

class s2.graph.EdgeMeta

Dict for storing any additional edge meta-information.

class s2.graph.Neighbours

MutableMapping of EdgeType to list of (PaperId, EdgeMeta) tuples for neighbouring papers.

class s2.graph.EdgeMap

MutableMapping of PaperId to Neighbours.

class s2.graph.HopFrom

Tuple of PaperId (the paper being hopped from) and EdgeType (the type of the edge being hopped across).

class s2.graph.HopTo

Tuple of PaperId (the paper being hopped to) and EdgeType (the type of the edge being hopped across).

Note that unlike HopFrom, the edge type can be None if the paper “being hopped to” is actually the first paper being visited (i.e. without any edge traversal).

class s2.graph.GraphPath

List of HopTo for the path traversed in an S2Graph.
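The relationships between these types can be sketched with plain typing aliases. The aliases below are hypothetical stand-ins written for illustration; they are not imported from s2.graph, the real definitions may differ, and the paper identifiers are placeholders.

```python
from typing import Any, Dict, List, MutableMapping, Optional, Tuple

# Hypothetical stand-ins for the documented types (not the library's own code).
PaperId = str
EdgeType = str  # one of "references", "citations", "author"
EdgeMeta = Dict[str, Any]
Neighbours = MutableMapping[EdgeType, List[Tuple[PaperId, EdgeMeta]]]
EdgeMap = MutableMapping[PaperId, Neighbours]
HopFrom = Tuple[PaperId, EdgeType]
HopTo = Tuple[PaperId, Optional[EdgeType]]
GraphPath = List[HopTo]

# A path that starts at a root paper and follows one reference edge.
# The first entry has edge type None because no edge was traversed to reach it.
gpath: GraphPath = [("root-id", None), ("ref-id", "references")]

# The same edge stored in an EdgeMap, keyed by the paper it starts from.
neighbours: Neighbours = {"references": [("ref-id", {})], "citations": []}
edges: EdgeMap = {"root-id": neighbours}

# Number of hops from the root is the path length minus one.
n_hops = len(gpath) - 1
```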

S2Graph

Warning

This module is under active development and may change significantly in future versions.

class s2.graph.S2Graph(edges=None, papers=None, authors=None)

Class for storing a citation network subgraph.

Parameters
  • edges (EdgeMap, optional) –

    Stores subgraph edge information. While S2Paper and S2Author already contain the information required to reconstruct citation graphs, they do not allow arbitrary subgraphs and are not as lightweight as a simple mapping of identifiers.

    Defaults to a defaultdict with a factory for Neighbours, for in-memory storage.

  • papers (S2PaperMap, optional) –

    Stores S2Paper objects retrievable by PaperId. Recommended to use s2.store.S2DataStore to avoid keeping large amounts of data in memory.

    Defaults to dict.

  • authors (S2AuthorMap, optional) –

    Stores S2Author objects retrievable by AuthorId. Recommended to use s2.store.S2DataStore to avoid keeping large amounts of data in memory.

    Defaults to dict.
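The default containers described above can be approximated with the standard library. This is a minimal sketch, assuming the Neighbours factory simply yields an empty list per edge type; the actual factory used by S2Graph is not shown in this page, and the add_edge helper is hypothetical.

```python
from collections import defaultdict

# Assumed in-memory defaults: edges is a defaultdict whose values behave like
# Neighbours (edge type -> list of (paper id, edge metadata) tuples).
edges = defaultdict(lambda: defaultdict(list))
papers = {}   # PaperId -> S2Paper in the real library; left empty here
authors = {}  # AuthorId -> S2Author in the real library; left empty here


def add_edge(edge_map, source, target, edge_type, meta=None):
    """Hypothetical helper: record an edge from source to target."""
    edge_map[source][edge_type].append((target, meta or {}))


add_edge(edges, "paper-a", "paper-b", "references")
add_edge(edges, "paper-a", "paper-c", "citations", {"note": "placeholder"})
```

Only identifiers are stored in the edge mapping, which is what keeps it lightweight compared with holding full paper objects.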

S2GraphBuilder

Warning

This module is under active development and may change significantly in future versions.

class s2.graph.S2GraphBuilder(graph=<s2.graph.graph.S2Graph object>, hopper=<s2.graph.hopper.MaxHopHopper object>, queue=deque([]), discovered_from=None, not_found=None, colliding_paperIds=None, log_every=10, save_path=None, **api_kwargs)

Builds an S2Graph object.

Parameters
  • graph (S2Graph) – The S2Graph object to build or continue building.

  • hopper (GraphHopper) – The GraphHopper object that defines the strategy for building the citation network subgraph.

  • queue (Deque) – A queue of papers that remain to be added. Every time a paper is added, all its neighbours are added to the queue, and the hopper decides whether these papers should also be added.

  • discovered_from (Optional[Dict[PaperId, HopFrom]]) – A dictionary for reconstructing graph paths.

  • not_found (Optional[Set]) – A set of paper identifiers that were not found, to allow subsequent follow-up.

  • colliding_paperIds (Optional[Dict[PaperId, Set[PaperId]]]) – A dictionary of papers with inconsistent identifiers.

  • log_every (int) – Log a progress update after every log_every papers added.

  • save_path (Union[Path, str, None]) – Where to save progress in event of interruption.

  • **api_kwargs – Additional keyword arguments passed on to the API module.
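The interplay of queue, discovered_from, and not_found can be illustrated with a toy breadth-first loop. This is a sketch under stated assumptions, not the S2GraphBuilder implementation: TOY_REFERENCES, depth_of, and build_subgraph are hypothetical names, the data is made up, only 'references' edges are followed, and the hop decision is reduced to a callable that receives the hop count.

```python
from collections import deque

# Toy data: paper id -> ids of the papers it references. "x" is missing on purpose.
TOY_REFERENCES = {
    "a": ["b", "c", "x"],
    "b": ["d"],
    "c": [],
    "d": [],
}


def depth_of(paper_id, discovered_from):
    """Count hops back to the root by following discovered_from."""
    depth = 0
    while paper_id in discovered_from:
        paper_id = discovered_from[paper_id][0]
        depth += 1
    return depth


def build_subgraph(root, should_hop):
    added = []            # papers accepted into the subgraph, in visiting order
    discovered_from = {}  # paper id -> (source paper id, edge type)
    not_found = set()     # ids that could not be looked up
    queue = deque([root])
    while queue:
        paper_id = queue.popleft()
        # The hop decision is made before any lookup.
        if not should_hop(depth_of(paper_id, discovered_from)):
            continue
        if paper_id not in TOY_REFERENCES:
            not_found.add(paper_id)
            continue
        added.append(paper_id)
        # Queue every neighbour that has not been discovered yet.
        for ref in TOY_REFERENCES[paper_id]:
            if ref != root and ref not in discovered_from:
                discovered_from[ref] = (paper_id, "references")
                queue.append(ref)
    return added, discovered_from, not_found


added, discovered_from, not_found = build_subgraph("a", lambda depth: depth <= 1)
```

With the depth limit of one hop, paper "d" is discovered and queued but never added, while "x" is recorded in not_found for later follow-up.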

_add_to_queue(ref, source, edge_type)

Add ref to queue, record ref visit, and add edge from source to ref.

Return type

None

_get_gpath(paperId)

Get the path that was traversed to reach paperId in the S2Graph.

Return type

GraphPath
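One way such a path could be rebuilt from a discovered_from dictionary is to walk backwards from the target paper to the root. The function below is a hypothetical sketch, not the library's _get_gpath; it assumes discovered_from maps each paper to the (source paper, edge type) pair it was first reached from and contains no cycles.

```python
def reconstruct_gpath(paper_id, discovered_from):
    """Return a list of (paper id, edge type) hops from the root to paper_id."""
    hops = []
    current = paper_id
    while current in discovered_from:
        source, edge_type = discovered_from[current]
        hops.append((current, edge_type))
        current = source
    # The root was not reached via any edge, so its edge type is None.
    hops.append((current, None))
    hops.reverse()
    return hops


example_discovered_from = {
    "b": ("a", "references"),
    "d": ("b", "references"),
}
path_to_d = reconstruct_gpath("d", example_discovered_from)
path_to_root = reconstruct_gpath("a", example_discovered_from)
```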

_get_paper(paperId)

Get an S2Paper via its paperId, checking local db first.

Return type

S2Paper

from_paper_id(paperId)

Construct S2Graph from PaperId based on GraphHopper strategy.

Parameters

paperId (str) – S2 paper identifier (see get_paper() for more info)

GraphHopper

Warning

This module is under active development and may change significantly in future versions.

class s2.graph.hopper.GraphHopper

The primary role of this class is to implement the hop() method, which decides whether to traverse an edge and hop to a new node.

This simple binary decision-making process allows us to create diverse strategies for constructing a subset of a root paper’s citation graph, with specific properties. In particular, subclasses implementing hop() can be passed to S2GraphBuilder to create S2Graph instances with arbitrary structures.

For example, this default implementation always hops, and will eventually cover the entire citation graph of the root paper (i.e. the connected component of the full citation graph that contains the root paper). However, the interface of hop() allows complex decision-making based on the current state of the graph and the path traversed to reach the current candidate paper from the root paper.
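The subclassing pattern described here can be sketched without the library. The classes below are hypothetical stand-ins: the real signature of hop() is not given in this page, so the (edges, gpath) arguments are an assumption made purely for illustration.

```python
class SketchHopper:
    """Hypothetical stand-in for a base hopper: always hops."""

    def hop(self, edges, gpath):
        return True


class ReferencesOnlyHopper(SketchHopper):
    """Example strategy: hop only along paths made purely of reference edges."""

    def hop(self, edges, gpath):
        # The root entry carries edge type None, so it is allowed as well.
        return all(edge_type in (None, "references") for _, edge_type in gpath)


via_citation = [("root-id", None), ("other-id", "citations")]
via_reference = [("root-id", None), ("other-id", "references")]

always = SketchHopper().hop({}, via_citation)
refs_only_rejects = ReferencesOnlyHopper().hop({}, via_citation)
refs_only_accepts = ReferencesOnlyHopper().hop({}, via_reference)
```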

class s2.graph.hopper.MaxHopHopper(max_hops=1)

Hops until a max distance from the root paper is exceeded.

Parameters

max_hops (int, optional) – Max number of hops from the root paper (len(gpath) - 1) beyond which GraphHopper.hop() returns False. Defaults to 1.
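The hop-count rule can be written as a one-line check. This is a sketch of one reading of the description, with a hypothetical function name; it assumes gpath includes the root paper as its first entry, so that the number of hops is len(gpath) - 1.

```python
def max_hop_decision(gpath, max_hops=1):
    """Hop only while the candidate is at most max_hops away from the root."""
    return len(gpath) - 1 <= max_hops


root_only = [("root-id", None)]
one_hop = root_only + [("p1", "references")]
two_hops = one_hop + [("p2", "references")]

decisions = [max_hop_decision(p) for p in (root_only, one_hop, two_hops)]
```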

class s2.graph.hopper.MaxPaperHopper(max_papers=10)

Hops until a max number of papers are added to the graph.

Parameters

max_papers (int, optional) –

Max number of papers in the graph beyond which GraphHopper.hop() returns False. Defaults to 10.

Note that the number of papers is len(graph.edges), as graph.papers is an instance of S2DataStore, which may contain papers not in the graph.
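The distinction between the edge mapping and the paper store can be shown with plain dictionaries. This is a hypothetical sketch: the function name is made up, and treating the limit as strict (stop hopping once the graph already holds max_papers papers) is an assumption about the boundary case.

```python
def max_paper_decision(edges, max_papers=10):
    """Hop only while the graph holds fewer than max_papers papers."""
    return len(edges) < max_papers


# The paper store may hold cached papers that are not part of this graph,
# so its size is not a reliable count of papers in the graph.
paper_store = {"a": "...", "b": "...", "c": "...", "unrelated": "..."}
graph_edges = {"a": {"references": [("b", {})]}, "b": {}, "c": {}}

can_hop_default = max_paper_decision(graph_edges)
can_hop_small_limit = max_paper_decision(graph_edges, max_papers=3)
```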

class s2.graph.hopper.BowtieHopper(max_reference=1, max_citation=1, verify_gpath=False)

Creates a bowtie or funnel shaped subset of a citation graph.

That is, it hops only if the traversed path consists solely of citations of citations or solely of references of references, up to the specified lengths.

Parameters
  • max_reference (int, optional) – Max distance allowed from the root paper in path of references. Defaults to 1.

  • max_citation (int, optional) – Max distance allowed from the root paper in path of citations. Defaults to 1.

  • verify_gpath (bool, optional) – If False, then assume that the path leading to the current node already consists exclusively of citations of citations or of references of references. Otherwise, checks every paper in gpath to ensure this condition is met. Defaults to False.
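The bowtie rule and the effect of verify_gpath can be sketched as a standalone function. This is an illustration under assumptions, not the BowtieHopper source: the function name is hypothetical, gpath is assumed to start with the root entry, and the root itself is always accepted.

```python
def bowtie_decision(gpath, max_reference=1, max_citation=1, verify_gpath=False):
    """Decide whether to hop to the last paper in gpath under a bowtie rule."""
    hops = gpath[1:]  # drop the root entry, whose edge type is None
    if not hops:
        return True
    last_type = hops[-1][1]
    if last_type not in ("references", "citations"):
        return False
    # Optionally confirm that every hop used the same edge type as the last one.
    if verify_gpath and any(edge_type != last_type for _, edge_type in hops):
        return False
    limit = max_reference if last_type == "references" else max_citation
    return len(hops) <= limit


root = [("root-id", None)]
refs_of_refs = root + [("p1", "references"), ("p2", "references")]
mixed = root + [("p1", "citations"), ("p2", "references")]

within_default = bowtie_decision(refs_of_refs)
within_two = bowtie_decision(refs_of_refs, max_reference=2)
mixed_unverified = bowtie_decision(mixed, max_reference=2, max_citation=2)
mixed_verified = bowtie_decision(mixed, max_reference=2, max_citation=2, verify_gpath=True)
```

With verify_gpath left at False, the mixed path is accepted because only the final edge is inspected; checking the whole path costs more per decision but rejects it.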