Developing a Better Way to Integrate Data from Different Sources

Nina Welding • DATE: November 16, 2017

Society has always used materials as the “building blocks” for civilization. Roads, homes, electronics and medicines — everything was fashioned from available substances. Today researchers are creating totally new materials for a variety of applications. More precisely, they are designing materials with specific properties to address distinct tasks. Engineers and scientists are not only creating these novel materials, but they are also documenting their processes and the results, along with related patents and publications.

In fact, the field of materials science, along with its accompanying literature, has grown exponentially in the last few decades. The problem with this incredible progress is that the information shared by the materials science community is stored across multiple sources in various formats. There are no standard protocols to manage the data or evaluate the metrics, which makes it difficult to establish the connections that drive further progress, such as new uses for a material or unique relationships between different types of materials.

A solution, according to Pingjie Tang, a Ph.D. candidate in the Department of Computer Science and Engineering; Nitesh V. Chawla, the Frank M. Freimann Professor of Computer Science and Engineering and director of the Interdisciplinary Center for Network Science and Applications (iCeNSA); and their collaborators Jed Pitera and Dmitry Zubarev at IBM Research-Almaden, is a heterogeneous materials information network (HMIN). They discussed their data model in “Materials Science Literature-Patent Relevance Search: A Heterogeneous Network Analysis Approach,” which won the Best Application Paper award at IEEE DSAA 2017, held Oct. 19-21 in Tokyo.

“The biggest impact of our research,” says Tang, “is that we provide a novel and effective paradigm to organize and analyze data for the material science community, and this provides a natural structure to save interconnected material data and integrate data from different sources.” Even though the HMIN they proposed is constructed to address a specific task, the team believes it can be applied to other domains and databases to create more effective information correlation and retrieval.

A research assistant in iCeNSA, Tang has also served as an intern at IBM Research-Almaden, working on projects ranging from the HMIN to using deep learning techniques to predict and detail categories for businesses on Yelp and to find proper classification codes for patents.

Pitera notes that “at IBM Research we really value and encourage collaborative projects with universities and other research institutions. We benefit not just from the scientific collaboration but also from the energy and new ideas that students like Pingjie bring to the lab.”

Chawla is Tang’s adviser. Since joining the Notre Dame faculty in 2007, he has focused his efforts on making fundamental advances in machine learning and in network and data sciences, as well as on transformative interdisciplinary applications. In addition to his other responsibilities, he serves as the director of the Data Inference Analysis and Learning Lab in the College of Engineering.