Typical tasks and example network datasets.
Reading 1 (Chapter 9), Reading 2, Recording, Rmarkdown
Networks and trees can be used to represent information in a variety of contexts. Abstractly, networks and trees are types of graphs, which are defined by (a) a set \(V\) of vertices and (b) a set \(E\) of edges between pairs of vertices.
It is helpful to have a few specific examples in mind, such as a directory tree or a disease transmission network.
Either vertices or edges might have attributes. For example, in the directory tree, we might know the sizes of the files (vertex attribute), and in the disease transmission network we might know the duration of contact between individuals (edge attribute).
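As a rough illustration of vertex and edge attributes, the sketch below builds a small, made-up contact network with tidygraph. The individuals, the age column, and the duration values are all hypothetical; they only show where each kind of attribute lives.

```r
library(tidygraph)

# Hypothetical vertex table: one row per individual, with an `age` attribute
nodes <- data.frame(name = c("A", "B", "C"), age = c(34, 29, 51))

# Hypothetical edge table: `from` and `to` index rows of `nodes`;
# `duration` (minutes of contact) is an edge attribute
edges <- data.frame(from = c(1, 2), to = c(2, 3), duration = c(15, 40))

contacts <- tbl_graph(nodes = nodes, edges = edges)

# Pull each table back out to inspect its attributes
node_table <- contacts %>% activate(nodes) %>% as_tibble()
edge_table <- contacts %>% activate(edges) %>% as_tibble()
```

Here node_table carries the vertex attribute (age) and edge_table carries the edge attribute (duration) alongside the from and to columns.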
An edge may be either undirected or directed. In a directed edge, one vertex leads to the other, while in an undirected edge, there is no sense of ordering.
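To make the distinction concrete, here is a minimal sketch that builds the same three-edge cycle once as a directed graph and once as an undirected one. The edge list is invented for illustration, and the igraph helpers are called directly on the assumption that a tbl_graph can be passed to igraph functions.

```r
library(tidygraph)

# Hypothetical edge list forming a cycle 1 -> 2 -> 3 -> 1
cycle_edges <- data.frame(from = c(1, 2, 3), to = c(2, 3, 1))

G_directed   <- tbl_graph(edges = cycle_edges, directed = TRUE)
G_undirected <- tbl_graph(edges = cycle_edges, directed = FALSE)

# Directed: each node has exactly one incoming edge
in_degree <- igraph::degree(G_directed, mode = "in")

# Undirected: both incident edges count, since there is no ordering
total_degree <- igraph::degree(G_undirected)
```

In the directed version each node has in-degree 1, while in the undirected version each node has degree 2.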
In R, the tidygraph package can be used to manipulate graph data. Its tbl_graph class stores node and edge attributes in a single data structure, and ggraph extends the usual ggplot2 syntax to graphs.
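library(tidygraph)
library(ggraph)

# Edge list: each row is one directed edge from `source` to `target`
E <- data.frame(
  source = c(1, 2, 3, 4, 5),
  target = c(3, 3, 4, 5, 6)
)
G <- tbl_graph(edges = E)  # nodes are inferred from the edge endpoints
G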
# A tbl_graph: 6 nodes and 5 edges
#
# A rooted tree
#
# Node Data: 6 x 0 (active)
#
# Edge Data: 5 x 2
from to
<int> <int>
1 1 3
2 2 3
3 3 4
# … with 2 more rows
This tbl_graph can be plotted using the code below. There are different geoms available for nodes and edges; for example, what happens if you replace geom_edge_link() with geom_edge_arc()?
ggraph(G, layout = 'kk') +
geom_edge_link() +
geom_node_point()
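Node and edge attributes can be modified with mutate(). Nodes are active by default, so modifying edge attributes requires a call to activate(edges) first.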
library(dplyr)  # for row_number()

G <- G %>%
  mutate(
    id = row_number(),
    group = id < 4
  ) %>%
  activate(edges) %>%
  mutate(width = runif(n()))
G
# A tbl_graph: 6 nodes and 5 edges
#
# A rooted tree
#
# Edge Data: 5 x 3 (active)
from to width
<int> <int> <dbl>
1 1 3 0.609
2 2 3 0.218
3 3 4 0.667
4 4 5 0.449
5 5 6 0.379
#
# Node Data: 6 x 2
id group
<int> <lgl>
1 1 TRUE
2 2 TRUE
3 3 TRUE
# … with 3 more rows
Now we can visualize these derived attributes using aesthetic mappings within the geom_edge_link and geom_node_label geoms.
ggraph(G, layout = "kk") +
geom_edge_link(aes(width = width)) +
geom_node_label(aes(label = id))
What types of data are amenable to representation by networks or trees? What visual comparisons do networks and trees facilitate?
Our initial examples suggest that trees and networks can be used to represent either physical interactions or conceptual relationships. Typical tasks include searching for groupings, following paths, and isolating key nodes.
By “searching for groupings,” we mean finding clusters of nodes that are highly interconnected, but which have few links outside the cluster. This kind of modular structure might lend itself to deeper investigation within each of the clusters.
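One way to search for groupings computationally is community detection. The sketch below assumes a made-up undirected graph of two triangles joined by a single edge and uses tidygraph's group_louvain(); Louvain is only one of several possible algorithms, chosen here for illustration.

```r
library(tidygraph)

# Hypothetical graph: triangles {1, 2, 3} and {4, 5, 6}, bridged by edge 3-4
tri_edges <- data.frame(
  from = c(1, 1, 2, 4, 4, 5, 3),
  to   = c(2, 3, 3, 5, 6, 6, 4)
)

clustered <- tbl_graph(edges = tri_edges, directed = FALSE) %>%
  mutate(cluster = group_louvain())  # integer cluster label per node

# Extract the label assigned to each of the six nodes
membership <- as_tibble(clustered)$cluster
```

Each triangle is densely connected internally and touches the other only through the bridge, so the two triangles should come out as separate clusters. Which cluster receives which label is not something to rely on.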
By “following paths,” we mean tracing the paths out from a particular node, to see which other nodes it is close to.
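Path following can also be approximated numerically. As a sketch, the code below rebuilds the six-node example graph from earlier in these notes and calls igraph's distances() and shortest_paths() directly, on the assumption that a tbl_graph can be passed to igraph functions.

```r
library(tidygraph)

# Same edge list as the running example
E <- data.frame(
  source = c(1, 2, 3, 4, 5),
  target = c(3, 3, 4, 5, 6)
)
G <- tbl_graph(edges = E)

# Number of directed steps from node 1 to every node (Inf = unreachable)
steps_from_1 <- igraph::distances(G, v = 1, mode = "out")

# One shortest path from node 1 to node 6, as a vector of node indices
path_1_to_6 <- as.integer(
  igraph::shortest_paths(G, from = 1, to = 6)$vpath[[1]]
)
```

Following edge directions, node 3 is one step from node 1 and node 6 is four steps away. Node 2 cannot be reached at all, because its only edge points into node 3.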
“Isolating key nodes” is a fuzzier concept, usually referring to the task of finding nodes that are exceptional in some way. For example, it’s often interesting to find nodes with many more connections than others, or nodes that link otherwise isolated clusters.
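Centrality scores are one common way to make "exceptional" concrete. The following sketch uses a made-up graph of two triangles joined by a single edge and computes degree and betweenness with tidygraph's centrality_degree() and centrality_betweenness(). These two measures are an illustrative choice, not the only reasonable ones.

```r
library(tidygraph)

# Hypothetical graph: triangles {1, 2, 3} and {4, 5, 6}, bridged by edge 3-4
tri_edges <- data.frame(
  from = c(1, 1, 2, 4, 4, 5, 3),
  to   = c(2, 3, 3, 5, 6, 6, 4)
)

scored <- tbl_graph(edges = tri_edges, directed = FALSE) %>%
  mutate(
    degree = centrality_degree(),           # number of incident edges
    betweenness = centrality_betweenness()  # shortest paths passing through
  ) %>%
  as_tibble()
```

Nodes 3 and 4 stand out on both measures. They have one more connection than the other nodes, and every shortest path between the two triangles passes through them.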