| Title: | Snowball searches for OpenAlex using the openalexPro pipeline |
|---|---|
| Description: | Perform snowball searches on the OpenAlex citation graph using openalexPro's on-disk processing pipeline and store results in Parquet. |
| Authors: | Rainer M Krug |
| Maintainer: | Rainer M Krug <[email protected]> |
| License: | GPL (>= 2) |
| Version: | 0.1.4 |
| Built: | 2026-06-02 18:43:44 UTC |
| Source: | https://github.com/openalexPro/openalexSnowball |
A function to perform a snowball search and convert the result to a tibble/data frame.
pro_snowball( identifier = NULL, doi = NULL, output = tempfile(fileext = ".snowball"), verbose = FALSE )
identifier |
Character vector of OpenAlex identifiers. |
doi |
Character vector of DOIs. |
output |
Path of the output Parquet dataset; defaults to a temporary directory. |
verbose |
Logical indicating whether to show verbose information.
Defaults to FALSE. |
The path to the results folder, which contains multiple subfolders.
A function to extract the edges from a Parquet database containing the nodes.
pro_snowball_extract_edges( nodes = NULL, output = tempfile(fileext = ".snowball"), verbose = FALSE )
nodes |
Path to the nodes Parquet dataset. |
output |
output folder, in which the parquet database containing the
edges called |
verbose |
Logical indicating whether to show verbose information.
Defaults to FALSE. |
A list containing two elements:
nodes: data frame with publication records.
The last column, oa_input, indicates whether the work was one of the input
identifier(s).
edges: data frame of publication links with two columns, from and to,
such that a row A, B means A -> B, i.e. A cites B. In bibliometrics, the
"citation action" goes from A to B.
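The direction convention of the edges table is easy to get backwards, so here is a minimal base R sketch of it. The edge list and the work IDs (`W1` to `W4`) are made up for illustration and are not output of the package.

```r
# Toy edge list with hypothetical IDs: each row reads "from cites to".
edges <- data.frame(
  from = c("W1", "W1", "W2", "W3"),
  to   = c("W2", "W3", "W3", "W4"),
  stringsAsFactors = FALSE
)

# Works that W1 cites (its reference list): filter on `from`.
cited_by_w1 <- edges$to[edges$from == "W1"]

# Works that cite W3: filter on `to`.
citing_w3 <- edges$from[edges$to == "W3"]

# Citations received per work within this edge list (in-degree).
citations_received <- table(edges$to)
```

Filtering on `from` follows references outwards from a work, while filtering on `to` collects the works that cite it.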
## Not run:
snowball_docs <- pro_snowball(
  identifier = c("W2741809807", "W2755950973"),
  citing_params = list(from_publication_date = "2022-01-01"),
  cited_by_params = list(),
  verbose = TRUE
)
# Identical to above, but searches using paper DOIs
snowball_docs_doi <- oa_snowball(
  doi = c("10.1016/j.joi.2017.08.007", "10.7717/peerj.4375"),
  citing_params = list(from_publication_date = "2022-01-01"),
  cited_by_params = list(),
  verbose = TRUE
)
## End(Not run)
A function to get the nodes for a snowball search.
pro_snowball_get_nodes( identifier = NULL, doi = NULL, limit = NULL, output = tempfile(fileext = ".snowball"), verbose = FALSE )
identifier |
Character vector of OpenAlex identifiers. |
doi |
Character vector of DOIs. |
limit |
If |
output |
Path of the output Parquet dataset; defaults to a temporary directory. |
verbose |
Logical indicating whether to show verbose information.
Defaults to FALSE. |
Path to the nodes Parquet dataset.
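Since `pro_snowball_get_nodes()` returns the path to the nodes dataset and `pro_snowball_extract_edges()` takes such a path as `nodes`, the two steps can presumably be chained. The wrapper below is a hypothetical sketch, not part of the package. It assumes the package is installed under the name `openalexSnowball` (taken from the Source URL), that both steps may share one output folder, and that network access to OpenAlex is available when it is called.

```r
# Hypothetical helper: run the node and edge steps one after the other.
# Only the argument names shown in the usage sections above are used.
snowball_two_step <- function(ids, out_dir = tempfile(fileext = ".snowball")) {
  # Step 1: fetch the nodes; the documented return value is the path
  # to the nodes Parquet dataset.
  nodes_path <- openalexSnowball::pro_snowball_get_nodes(
    identifier = ids,
    output = out_dir,
    verbose = FALSE
  )

  # Step 2: derive the edges from the stored nodes.
  # Reusing `out_dir` here is an assumption, not documented behaviour.
  edges_result <- openalexSnowball::pro_snowball_extract_edges(
    nodes = nodes_path,
    output = out_dir,
    verbose = FALSE
  )

  list(nodes = nodes_path, edges = edges_result)
}
```

Defining the helper does not contact OpenAlex; a call such as `snowball_two_step("W2741809807")` would.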
This function reads a snowball from Apache Parquet format and returns a list
containing nodes and edges, which can be either Arrow Datasets or tibbles.
read_snowball( snowball = NULL, edge_type = c("core", "extended", "outside"), return_data = FALSE, shorten_ids = FALSE )
snowball |
The directory of the Parquet files as populated by
|
edge_type |
Type of the returned edges. Possible values are:
|
return_data |
Logical indicating whether to return an |
shorten_ids |
If |
A list containing two elements, nodes and edges, which are either
Arrow objects representing the corpus or tibbles containing the data.
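Because the returned elements may be either Arrow objects or tibbles, downstream code may want to normalise them to in-memory tables first. The following is a rough sketch under stated assumptions: `as_local_table()` and `read_snowball_local()` are hypothetical helpers, the package name `openalexSnowball` is inferred from the Source URL, and `dplyr::collect()` is assumed to work on the Arrow objects returned (this requires the arrow and dplyr packages).

```r
# Hypothetical helper: leave data frames/tibbles untouched and pull
# anything else (assumed to be an Arrow object) into memory.
as_local_table <- function(x) {
  if (is.data.frame(x)) x else dplyr::collect(x)
}

# Hypothetical wrapper around read_snowball() that always returns
# in-memory tables, regardless of how `return_data` is interpreted.
read_snowball_local <- function(snowball_dir, edge_type = "core") {
  sb <- openalexSnowball::read_snowball(
    snowball = snowball_dir,
    edge_type = edge_type
  )
  list(
    nodes = as_local_table(sb$nodes),
    edges = as_local_table(sb$edges)
  )
}
```

For large snowballs it may be preferable to keep the Arrow objects and filter them before collecting, since collecting loads the full tables into memory.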