Research

Project Siteseeing

(For a full list of publications and code, see below or go to Google Scholar.)

Imbalance Graph Classification via Graph Neural Network on Graph of Graphs

Graph Neural Networks (GNNs) have achieved unprecedented success in learning graph representations to identify categorical labels of graphs. However, most existing work on graph classification with GNNs follows a balanced data-splitting protocol, which is misaligned with many real-world scenarios in which some classes have far fewer labels than others. Directly training GNNs in this imbalanced setting may lead to sub-optimal representations of graphs in minority classes and compromise the overall performance of downstream classification, which underscores the importance of developing effective GNNs for imbalanced graph classification. Existing methods are either tailored for non-graph-structured data or designed specifically for imbalanced node classification, while few focus on imbalanced graph classification. To this end, we introduce a novel framework, Graph-of-Graph Neural Networks, which alleviates the graph imbalance issue by deriving extra supervision globally from neighboring graphs and locally from graphs themselves.

Yu Wang, Yuying Zhao, Neil Shah, Tyler Derr

arXiv

Generating Synthetic Systems of Interdependent Critical Infrastructure Networks

The lack of data on critical infrastructure systems has hindered research progress in modeling and optimizing system performance. This work develops a method for generating Synthetic Interdependent Critical Infrastructure Networks (SICIN) using simulation and non-linear optimization techniques.

Yu Wang, Jinzhu Yu, Hiba Baroud

arXiv

Distance-wise Prototypical Graph Neural Network for Imbalanced Node Classification

Recent years have witnessed the significant success of applying graph neural networks (GNNs) in learning effective node representations for classification. However, current GNNs are mostly built under a balanced data-splitting assumption, which is inconsistent with many real-world networks where the number of training nodes can be extremely imbalanced among the classes. Thus, directly applying current GNNs to imbalanced data would generate coarse representations of nodes in minority classes and ultimately compromise the classification performance. This underscores the importance of developing effective GNNs for handling imbalanced graph data. In this work, we propose a novel Distance-wise Prototypical Graph Neural Network (DPGNN), which uses class prototype-driven training to balance the training loss between majority and minority classes and then leverages distance metric learning to differentiate the contributions of different dimensions of representations and fully encode the relative position of each node to each class prototype. Moreover, we design a new imbalanced label propagation mechanism to derive extra supervision from unlabeled nodes and employ self-supervised learning to smooth representations of adjacent nodes while separating inter-class prototypes.

Yu Wang, Charu Aggarwal, Tyler Derr

arXiv
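
As a rough illustration of the class-prototype idea mentioned in the DPGNN abstract above, the sketch below averages labeled embeddings into one prototype per class and assigns a query to the nearest prototype. It is not the DPGNN implementation: the function names, the toy embeddings, and the use of plain Euclidean distance (rather than a learned distance metric) are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def class_prototypes(embeddings, labels):
    """Average the embeddings of the labeled nodes in each class."""
    sums = {}
    counts = defaultdict(int)
    for vec, label in zip(embeddings, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {c: [s / counts[c] for s in total] for c, total in sums.items()}


def nearest_prototype(vec, prototypes):
    """Return the class whose prototype is closest in Euclidean distance."""
    return min(prototypes, key=lambda c: math.dist(vec, prototypes[c]))


# Toy imbalanced data (hypothetical): four nodes in class 0, one in class 1.
embeddings = [[0.0, 0.1], [0.1, 0.0], [0.2, 0.1], [0.1, 0.2], [1.0, 1.0]]
labels = [0, 0, 0, 0, 1]

prototypes = class_prototypes(embeddings, labels)
prediction = nearest_prototype([0.9, 0.8], prototypes)
```

Because every class contributes exactly one prototype regardless of how many labeled nodes it has, the minority class is not outvoted by the majority class at prediction time, which is the intuition behind prototype-driven balancing.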

Tree Decomposed Graph Neural Network

Iterative propagation forces the information of higher-layer neighborhoods to be transported through, and first fused with, that of lower-layer neighborhoods, which unavoidably results in feature smoothing between neighborhoods in different layers and can thus compromise performance. Furthermore, most deep GNNs recognize only the importance of incorporating higher-layer neighborhoods and have yet to fully explore the importance of multi-hop dependency within the context of different layer neighborhoods in learning better representations. In this work, we first theoretically analyze the feature smoothing between neighborhoods in different layers and empirically demonstrate the variance of the homophily level across neighborhoods at different layers. Then, motivated by these analyses, we propose a tree decomposition method to disentangle neighborhoods in different layers to help alleviate feature smoothing among these layers. Moreover, we capture and maintain the importance of multi-hop dependency via graph diffusion within our tree decomposition formulation to construct the Tree Decomposed Graph Neural Network (TDGNN), which can flexibly incorporate information from large receptive fields and utilize the multi-hop dependency.

Yu Wang, Tyler Derr

ACM CIKM 2021

Graph Neural Networks: Self-supervised Learning

Although deep learning has achieved state-of-the-art performance across numerous domains, these models generally require large annotated datasets to reach their full potential and avoid overfitting. However, such datasets can be costly or even impossible to obtain. Self-supervised learning (SSL) seeks to create and utilize specific pretext tasks on unlabeled data to help alleviate this fundamental limitation of deep learning models. Although SSL was initially applied in the image and text domains, recent interest has been in leveraging it in the graph domain to improve the performance of graph neural networks (GNNs). For node-level tasks, GNNs can inherently incorporate unlabeled node data through neighborhood aggregation, unlike in the image or text domains; but they can still benefit from novel pretext tasks that encode richer information, and numerous such methods have recently been developed. For GNNs solving graph-level tasks, applying SSL methods is more aligned with other traditional domains, but still presents unique challenges and has been the focus of a few works. In this chapter, we summarize recent developments in applying SSL to GNNs, categorizing them by the training strategies and types of data used to construct their pretext tasks, and finally discuss open challenges and future directions.

Yu Wang, Wei Jin, Tyler Derr

Springer 2021

A Data-Integration Analysis on Road Emissions and Traffic Patterns

Understanding human activities and urban mobility patterns is key to solving many urban issues such as congestion and emissions. With abundant data sets available at different levels of fidelity, one of the main challenges is the sparsity and heterogeneity of data sources. The integration of such data sources is essential to better inform system design and community-level strategies. In this paper, we incorporate a variety of data sources, including land use, vehicle emissions and building footprints, to comprehensively visualize and analyze traffic patterns in the Chicago Loop area. We first implement and compare three different nearest-neighbor-search algorithms to determine building occupancy assignment, and then perform a spatial-temporal correlation analysis of vehicle emissions focusing on factors such as land use, public transit and demographics. Lastly, we discuss traffic characteristics observed in the data analysis, such as congestion formation and rush hours.

Ao Qu, Yu Wang, Yue Hu, Yanbing Wang, Hiba Baroud

Smoky Mountains Computational Sciences and Engineering Conference 2020, Springer, Cham
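
The abstract above mentions nearest-neighbor search for building occupancy assignment without naming the three algorithms compared. As a baseline illustration only, the sketch below shows a brute-force nearest-neighbor assignment of points to building centroids; the building names, coordinates, and function names are hypothetical and are not taken from the paper.

```python
import math


def nearest_building(point, buildings):
    """Brute-force nearest neighbor: scan every building centroid."""
    return min(buildings, key=lambda name: math.dist(point, buildings[name]))


def assign_points(points, buildings):
    """Count how many points fall nearest to each building."""
    counts = {name: 0 for name in buildings}
    for point in points:
        counts[nearest_building(point, buildings)] += 1
    return counts


# Hypothetical building centroids and observed points in planar coordinates.
buildings = {"A": (0.0, 0.0), "B": (10.0, 0.0), "C": (0.0, 10.0)}
points = [(1.0, 1.0), (9.0, 1.0), (8.0, -1.0), (1.0, 7.0)]

counts = assign_points(points, buildings)
```

A brute-force scan costs one distance computation per point-building pair, which is why spatial indexes are usually preferred once the number of buildings grows large.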

 

Full list of publications

Imbalance Graph Classification via Graph Neural Network on Graph of Graphs
Yu Wang, Yuying Zhao, Neil Shah, Tyler Derr
arXiv

Generating Synthetic Systems of Interdependent Critical Infrastructure Networks
Yu Wang, Jinzhu Yu, Hiba Baroud
arXiv

Distance-wise Prototypical Graph Neural Network for Imbalanced Node Classification
Yu Wang, Charu Aggarwal, Tyler Derr
arXiv

Tree Decomposed Graph Neural Network
Yu Wang, Tyler Derr
ACM CIKM 2021

Graph Neural Networks: Self-supervised Learning
Yu Wang, Wei Jin, Tyler Derr
Springer 2021

A Data-Integration Analysis on Road Emissions and Traffic Patterns
Ao Qu, Yu Wang, Yue Hu, Yanbing Wang, Hiba Baroud
Smoky Mountains Computational Sciences and Engineering Conference 2020, Springer, Cham