s2.graph¶
PyS2 lets you construct arbitrary subgraphs of a paper’s citation network.
A GraphHopper defines the traversal strategy, and S2GraphBuilder uses it to
create S2Graph objects. These can leverage S2DataStore to
reduce the memory footprint and the number of API calls when working with large numbers of papers.
Custom Types¶
Warning
This module is under active development and may change significantly in future versions.
-
class
s2.graph.PaperId¶ A valid S2 Paper Identifier string.
-
class
s2.graph.AuthorId¶ A valid S2 Author Identifier string.
-
class
s2.graph.S2PaperMap¶ MutableMapping of PaperId to S2Paper. It is recommended to use objects of type
s2.store.S2DataStore, but for rapid prototyping this can be a simple dictionary.
-
class
s2.graph.S2AuthorMap¶ MutableMapping of AuthorId to S2Author. It is recommended to use objects of type
s2.store.S2DataStore, but for rapid prototyping this can be a simple dictionary.
-
class
s2.graph.EdgeType¶ One of ‘references’, ‘citations’, or ‘author’.
-
class
s2.graph.EdgeMeta¶ Dict for storing any additional edge meta-information.
-
class
s2.graph.Neighbours¶ MutableMapping of EdgeType to a list of (PaperId, EdgeMeta) tuples for neighbouring papers.
-
class
s2.graph.EdgeMap¶ MutableMapping of PaperId to Neighbours.
-
class
s2.graph.HopFrom¶ Tuple of PaperId (the paper being hopped from) and EdgeType (the type of the edge being hopped across).
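To make the nesting of these types concrete, here is a minimal sketch using plain standard-library containers. The aliases below are hypothetical stand-ins written for illustration; the real definitions live in s2.graph and may differ. The paper identifiers and the meta key are invented for the example.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

# Hypothetical stand-ins for the aliases described above.
PaperId = str
EdgeType = str  # 'references', 'citations', or 'author'
EdgeMeta = Dict[str, Any]
Neighbours = Dict[EdgeType, List[Tuple[PaperId, EdgeMeta]]]
EdgeMap = Dict[PaperId, Neighbours]
HopFrom = Tuple[PaperId, EdgeType]


def new_neighbours() -> Neighbours:
    # Each edge type maps to a list of (PaperId, EdgeMeta) tuples.
    return defaultdict(list)


edges: EdgeMap = defaultdict(new_neighbours)
# "paper-a" references "paper-b"; the meta key is illustrative only.
edges["paper-a"]["references"].append(("paper-b", {"note": "example"}))
edges["paper-b"]["citations"].append(("paper-a", {}))

# Record that a traversal left "paper-a" across a 'references' edge.
hop_from: HopFrom = ("paper-a", "references")
```

An EdgeMap built this way holds only identifiers and small dictionaries, which is what keeps it lightweight compared with full paper objects.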
S2Graph¶
Warning
This module is under active development and may change significantly in future versions.
-
class
s2.graph.S2Graph(edges=None, papers=None, authors=None)¶ Class for storing a citation network subgraph.
- Parameters
edges (EdgeMap, optional) – Stores subgraph edge information. While S2Paper and S2Author already contain the information required to reconstruct citation graphs, they do not allow arbitrary subgraphs and are not as lightweight as a simple mapping of identifiers. Defaults to defaultdict with factory for Neighbours for in-memory storage.
papers (S2PaperMap, optional) – Stores S2Paper objects retrievable by PaperId. Recommended to use s2.store.S2DataStore to avoid keeping large amounts of data in memory. Defaults to dict.
authors (S2AuthorMap, optional) – Stores S2Author objects retrievable by AuthorId. Recommended to use s2.store.S2DataStore to avoid keeping large amounts of data in memory. Defaults to dict.
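The three parameters and their documented defaults can be pictured with a small stand-in class. This is a sketch, not the library's implementation: SketchGraph is a hypothetical name, and plain dictionaries take the place of S2Paper objects.

```python
from collections import defaultdict


class SketchGraph:
    """Hypothetical stand-in mirroring the three S2Graph parameters."""

    def __init__(self, edges=None, papers=None, authors=None):
        # Mirror the documented defaults: a nested defaultdict for edges,
        # plain dicts for papers and authors.
        if edges is None:
            edges = defaultdict(lambda: defaultdict(list))
        self.edges = edges
        self.papers = papers if papers is not None else {}
        self.authors = authors if authors is not None else {}


graph = SketchGraph()
graph.papers["p1"] = {"title": "Root paper"}   # stand-in for an S2Paper
graph.papers["p2"] = {"title": "Cited paper"}
graph.edges["p1"]["references"].append(("p2", {}))

# Edges carry only identifiers, so neighbours can be listed cheaply
# and the full records looked up in graph.papers when needed.
referenced = [pid for pid, _meta in graph.edges["p1"]["references"]]
titles = [graph.papers[pid]["title"] for pid in referenced]
```

Any mutable mapping could be passed for papers or authors in this sketch, which is the same flexibility the parameter descriptions above point to.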
S2GraphBuilder¶
Warning
This module is under active development and may change significantly in future versions.
-
class
s2.graph.S2GraphBuilder(graph=<s2.graph.graph.S2Graph object>, hopper=<s2.graph.hopper.MaxHopHopper object>, queue=deque([]), discovered_from=None, not_found=None, colliding_paperIds=None, log_every=10, save_path=None, **api_kwargs)¶ Builds an S2Graph object.
- Parameters
graph (S2Graph) – The S2Graph object to build or continue building.
hopper (GraphHopper) – The GraphHopper object that defines the strategy for building the citation network subgraph.
queue (Deque) – A queue of papers that remain to be added. Every time a paper is added, all its neighbours are added to the queue and the hopper decides whether these papers should also be added.
discovered_from (Optional[Dict[PaperId, HopFrom]]) – A dictionary for reconstructing graph paths.
not_found (Optional[Set]) – A set of paper identifiers that were not found, to allow subsequent follow-up.
colliding_paperIds (Optional[Dict[PaperId, Set[PaperId]]]) – A dictionary of papers with inconsistent identifiers.
log_every (int) – Log an update every log_every papers added.
save_path (Union[Path, str, None]) – Where to save progress in the event of an interruption.
**api_kwargs – Additional keyword arguments passed on to the API calls.
-
_add_to_queue(ref, source, edge_type)¶ Add ref to queue, record ref visit, and add edge from source to ref.
- Return type
-
_get_gpath(paperId)¶ Get the path that was traversed to reach paperId in the S2Graph.
- Return type
GraphPath
-
from_paper_id(paperId)¶ Construct S2Graph from PaperId based on GraphHopper strategy.
- Parameters
paperId (str) – S2 paper identifier (see get_paper() for more info)
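The queue-driven process described for S2GraphBuilder can be sketched without any network access. Everything below is an assumption made for illustration: TOY_PAPERS replaces API responses, a simple hop-count check replaces the GraphHopper, and build and path_to are hypothetical helpers rather than library functions.

```python
from collections import defaultdict, deque

# Toy citation data standing in for API responses (invented identifiers).
# "ghost" is referenced by the root but has no record of its own.
TOY_PAPERS = {
    "root": {"references": ["a", "b", "ghost"], "citations": ["c"]},
    "a": {"references": ["d"], "citations": ["root"]},
    "b": {"references": [], "citations": ["root"]},
    "c": {"references": ["root"], "citations": []},
    "d": {"references": [], "citations": ["a"]},
}


def path_to(paper_id, discovered_from):
    """Follow discovered_from back to the root; return ids in root-first order."""
    path = [paper_id]
    while path[-1] in discovered_from:
        source, _edge_type = discovered_from[path[-1]]
        path.append(source)
    return path[::-1]


def build(root, max_hops=1):
    edges = defaultdict(lambda: defaultdict(list))
    discovered_from = {}  # PaperId -> (source PaperId, EdgeType)
    not_found = set()
    added = []
    queue = deque([root])
    while queue:
        paper_id = queue.popleft()
        # Stand-in for the hopper's decision: limit distance from the root.
        if len(path_to(paper_id, discovered_from)) - 1 > max_hops:
            continue
        record = TOY_PAPERS.get(paper_id)
        if record is None:
            not_found.add(paper_id)  # kept for later follow-up
            continue
        added.append(paper_id)
        for edge_type in ("references", "citations"):
            for ref in record[edge_type]:
                edges[paper_id][edge_type].append((ref, {}))
                if ref != root and ref not in discovered_from:
                    discovered_from[ref] = (paper_id, edge_type)
                    queue.append(ref)
    return added, edges, discovered_from, not_found


added, edges, discovered_from, not_found = build("root", max_hops=1)
```

In this sketch the paper "d" is discovered through "a" and gets an edge, but it is never added because it sits two hops from the root. The discovered_from mapping is what makes that distance recoverable.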
GraphHopper¶
Warning
This module is under active development and may change significantly in future versions.
-
class
s2.graph.hopper.GraphHopper¶ The primary role of this class is to implement the hop() method, which decides whether to traverse an edge and hop to a new node.
This simple binary decision-making process allows us to create diverse strategies for constructing a subset of a root paper’s citation graph with specific properties. In particular, subclasses implementing hop() can be passed to S2GraphBuilder to create S2Graph instances with arbitrary structures.
For example, this default implementation always hops, and will eventually cover the entire citation graph of the root paper (i.e. the connected component of the full citation graph that contains the root paper). However, the interface of hop() allows complex decision-making based on the current state of the graph and the path traversed to reach the current candidate paper from the root paper.
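As a rough illustration of the subclassing pattern, the sketch below defines an always-hop base strategy and one custom strategy. The signature hop(graph, gpath) and the representation of gpath as a list of (paper id, edge type used to reach it) pairs are assumptions, since this page does not spell them out; the class names are hypothetical.

```python
class SketchHopper:
    """Hypothetical base strategy: always hop."""

    def hop(self, graph, gpath):
        return True


class ReferencesOnlyHopper(SketchHopper):
    """Hop only when the candidate was reached across a 'references' edge."""

    def hop(self, graph, gpath):
        if len(gpath) < 2:
            return True  # the root paper itself is always accepted
        _paper_id, edge_type = gpath[-1]
        return edge_type == "references"


# Assumed path shape: the root has no incoming edge, so its edge type is None.
via_reference = [("root", None), ("a", "references")]
via_citation = [("root", None), ("c", "citations")]

base = SketchHopper()
refs_only = ReferencesOnlyHopper()
decisions = (
    base.hop(None, via_citation),
    refs_only.hop(None, via_reference),
    refs_only.hop(None, via_citation),
)
```

Because the decision is a single boolean per candidate, a strategy like this can be swapped in without changing the traversal loop that calls it.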
-
class
s2.graph.hopper.MaxHopHopper(max_hops=1)¶ Hops until a max distance from the root paper is exceeded.
- Parameters
max_hops (int, optional) – Max number of hops from the root paper (len(gpath) - 1) beyond which GraphHopper.hop() returns False. Defaults to 1.
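The distance rule can be written in a few lines. This is a sketch under assumptions: the class name is hypothetical, the hop(graph, gpath) signature is not confirmed by this page, and gpath is taken to be the sequence of papers from the root to the candidate, so that len(gpath) - 1 is the number of hops.

```python
class SketchMaxHopHopper:
    """Hypothetical stand-in for the max-distance strategy."""

    def __init__(self, max_hops=1):
        self.max_hops = max_hops

    def hop(self, graph, gpath):
        # The root alone gives len(gpath) == 1, i.e. zero hops.
        return len(gpath) - 1 <= self.max_hops


hopper = SketchMaxHopHopper(max_hops=1)
one_hop = hopper.hop(None, ["root", "a"])
two_hops = hopper.hop(None, ["root", "a", "d"])
```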
-
class
s2.graph.hopper.MaxPaperHopper(max_papers=10)¶ Hops until a max number of papers are added to the graph.
- Parameters
max_papers (int, optional) – Max number of papers in graph beyond which GraphHopper.hop() returns False. Defaults to 10.
Note that the number of papers is len(graph.edges), as graph.papers is an instance of S2DataStore which may contain papers not in the graph.
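The note about counting papers through graph.edges can be illustrated as follows. The class name, the hop(graph, gpath) signature, and the exact boundary (strictly fewer than max_papers) are assumptions; a SimpleNamespace with plain dictionaries stands in for the graph and its data store.

```python
from types import SimpleNamespace


class SketchMaxPaperHopper:
    """Hypothetical stand-in for the max-paper-count strategy."""

    def __init__(self, max_papers=10):
        self.max_papers = max_papers

    def hop(self, graph, gpath):
        # Count nodes via graph.edges: graph.papers may be a shared store
        # holding papers that are not part of this graph.
        return len(graph.edges) < self.max_papers


# A store with 50 cached papers, of which only 3 belong to the graph.
shared_store = {f"p{i}": {} for i in range(50)}
graph = SimpleNamespace(edges={"p0": {}, "p1": {}, "p2": {}}, papers=shared_store)

at_limit = SketchMaxPaperHopper(max_papers=3).hop(graph, ["p0", "p3"])
below_limit = SketchMaxPaperHopper(max_papers=4).hop(graph, ["p0", "p3"])
```

Counting len(graph.papers) here would give 50 and stop the traversal far too early, which is the situation the note warns about.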
-
class
s2.graph.hopper.BowtieHopper(max_reference=1, max_citation=1, verify_gpath=False)¶ Creates a bowtie or funnel shaped subset of a citation graph.
That is, it hops only if the traversed path consists of citations of citations or of references of references, up to the specified lengths.
- Parameters
max_reference (int, optional) – Max distance allowed from the root paper in path of references. Defaults to 1.
max_citation (int, optional) – Max distance allowed from the root paper in path of citations. Defaults to 1.
verify_gpath (bool, optional) – If False, then assume that the path leading to the current node already consists exclusively of citations of citations or of references of references. Otherwise, checks every paper in gpath to ensure this condition is met. Defaults to False.
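One way to read the bowtie rule is as a check on the edge types along the path from the root. The function below is a sketch under that reading, not the library's code: bowtie_hop is a hypothetical helper, and representing the path as a list of edge-type strings is an assumption.

```python
def bowtie_hop(edge_types, max_reference=1, max_citation=1, verify_gpath=False):
    """Decide whether to hop, given the edge types along the path from the root.

    edge_types is an assumed representation, e.g. ['references', 'references'].
    """
    if not edge_types:
        return True  # the root paper has no incoming edge
    last = edge_types[-1]
    # With verify_gpath=False the earlier edges are trusted to match the last one.
    if verify_gpath and any(edge != last for edge in edge_types):
        return False
    limits = {"references": max_reference, "citations": max_citation}
    limit = limits.get(last)
    return limit is not None and len(edge_types) <= limit


refs_of_refs = bowtie_hop(["references", "references"], max_reference=2)
too_deep = bowtie_hop(["references", "references"])
mixed_checked = bowtie_hop(["citations", "references"], max_reference=2, verify_gpath=True)
mixed_trusted = bowtie_hop(["citations", "references"], max_reference=2)
```

The last two calls show the trade-off behind verify_gpath in this sketch: skipping verification is cheaper but accepts a mixed path if one ever reaches the check.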