NSF Award Search: Award # 1149372 - CAREER: G*: A Parallel System for Efficiently Processing Large Graphs

Award Abstract # 1149372

CAREER: G*: A Parallel System for Efficiently Processing Large Graphs

NSF Org:	IIS Div Of Information & Intelligent Systems
Recipient:	RESEARCH FOUNDATION FOR THE STATE UNIVERSITY OF NEW YORK, THE
Initial Amendment Date:	January 4, 2012
Latest Amendment Date:	February 8, 2016
Award Number:	1149372
Award Instrument:	Continuing Grant
Program Manager:	Sylvia Spengler sspengle@nsf.gov �(703)292-7347 IIS �Div Of Information & Intelligent Systems CSE �Direct For Computer & Info Scie & Enginr
Start Date:	February 1, 2012
End Date:	January 31, 2019�(Estimated)
Total Intended Award Amount:	$496,648.00
Total Awarded Amount to Date:	$496,648.00
Funds Obligated to Date:	FY 2012 = $81,507.00 FY 2013 = $101,423.00 FY 2014 = $104,662.00 FY 2015 = $107,825.00 FY 2016 = $101,231.00
History of Investigator:	Jeong-Hyon Hwang (Principal Investigator) jhh@cs.albany.edu
Recipient Sponsored Research Office:	SUNY at Albany 1400 WASHINGTON AVE ALBANY NY �US �12222-0100 (518)437-4974
Sponsor Congressional District:	20
Primary Place of Performance:	SUNY at Albany NY �US �12222-0100
Primary Place of Performance Congressional District:	20
Unique Entity Identifier (UEI):	NHH3T1Z96H29
Parent UEI:	GMZUKXFDJMA9
NSF Program(s):	Info Integration & Informatics
Primary Program Source:	01001213DB�NSF RESEARCH & RELATED ACTIVIT 01001314DB�NSF RESEARCH & RELATED ACTIVIT 01001415DB�NSF RESEARCH & RELATED ACTIVIT 01001516DB�NSF RESEARCH & RELATED ACTIVIT 01001617DB�NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s):	1045
Program Element Code(s):	736400
Award Agency Code:	4900
Fund Agency Code:	4900
Assistance Listing Number(s):	47.070

ABSTRACT

Complex networks such as human social groups, transportation networks and the World Wide Web are frequently represented as graphs. The goal of this project is to construct a new system that both conveniently and efficiently executes queries on collections of large graphs. To achieve this goal, the project develops data storage and processing techniques that can effectively use a server cluster.

The project's outcomes include (1) efficient data storage techniques that take advantage of commonalities between graphs, (2) methods that allow simple implementations of parallel graph processing solutions, (3) a framework that accelerates queries on multiple graphs by sharing computations across graphs, (4) techniques that store data on disks in a manner optimized for query execution, (5) methods that optimize query execution by appropriately distributing data over servers, and (6) techniques that mask server failures while balancing recovery speed and required cost.

This project has significant impacts on many application areas where it is critical to understand networks of various types. Examples include national security, social and political studies, transportation and marketing. Programming assignments and term projects based on this research are developed for undergraduate/graduate courses on data management systems, distributed systems and social network analysis. This project also offers research opportunities to both high school and minority students through summer programs at the University at Albany, State University of New York. The software, experimental data and research papers that result from this project will be disseminated through the project website (http://www.cs.albany.edu/~jhh/research/G_star/).

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Jonathan Muckell, Paul Olsen Jr., Jeong-Hyon Hwang, Catherine Lawson, S. S. Ravi "Compression of Trajectory Data: A Comprehensive Evaluation and New Approach" GeoInformatica , 2013

Chan-Myung Kim, Yong-Hwan Kim, Youn-Hee Han, and Jeong-Hyon Hwang "Betweenness Centrality Estimation for Social-Aware Routing in Delay-Tolerant Networks" Springer Mobile Networks and Applications (MONET) , v.21 , 2016 , p.469

Aparna Joshi, Yu Zhang, Petko Bogdanov, and Jeong-Hyon Hwang, "An Efficient System for Subgraph Discovery" 2018 IEEE International Conference on Big Data (BigData) , 2018

Alan Labouseur, Jeremy Birnbaum, Paul Olsen Jr., Sean Spillane, Jayadevan Vijayan, Wook-Shin Han, Jeong-Hyon Hwang "The G* Graph Database: Efficiently Managing Large Distributed Dynamic Graphs" Distributed and Parallel Databases (DAPD) , v.33 , 2015 , p.479

Seungkeol Kim, Dongeun Kim, Jeong-Hyon Hwang, Wook-Shin Han, Wei Chen, Hwanjo Yu "Scalable and Parallelizable Influence Maximization with Random Walk Ranking and Rank Merge Pruning" Elsevier Information Sciences (IS) , v.415 , 2017

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

The goal of the project is to construct a system for efficiently storing and querying series of graphs that represent large, evolving networks at different points in time. This system can support applications where it is critical to understand a certain type of network, such as a social network, transportation network, or the World Wide Web. These applications include national security, social and political studies, transportation, and marketing.

Our early work in this project focused on developing techniques that efficiently store graphs by taking advantage of the commonalities among them and accelerate queries on multiple graphs by sharing computations across graphs. Our experiments showed that these graph storage and query processing techniques have substantial benefits over approaches that store and process each graph individually.

We also developed techniques for distributing a series of graphs over multiple servers, sharply contrasting with existing solutions that can distribute only one graph at a time. Our techniques strive to maximize the speed of graph queries by distributing each graph over an appropriate subset of servers. These techniques can also dynamically re-distribute graph data over servers to avoid performance bottlenecks and replicate data to mask server failures and improve performance.

In addition to the above techniques, we developed solutions for various types of graph queries/problems such as finding the most central node(s) in a graph, identifying a set of nodes with the highest influence in a graph, efficiently finding regions in a graph that match user-specified criteria, and estimating the centrality of each node in wireless sensor networks to enable efficient and reliable communication.

The source code of our system has been made available to the public (http://www.cs.albany.edu/~gstar). Research papers summarizing our research outcomes have been published by academic journals and conferences, and also posted on the above website. Parts of our system code have been used in projects developed for courses on database systems, distributed systems, and object-oriented programming. Three former PhD students who contributed to the project joined academia as tenure-track faculty members.

Last Modified: 05/29/2019
Modified by: Jeong-Hyon Hwang

Please report errors in award information by writing to: awardsearch@nsf.gov.