Predicting New Collaborations in Academic Citation Networks of IEEE and ACM Conferences

I. Bukhari, M. U. Ilyas, M. Raja, Saad Saleh, M. M. Khan, A. M. Qamar, M. Z. Shafiq, A. X. Liu, H. Radha

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; Professional


Abstract

In this paper we study the time evolution of academic collaboration networks by predicting the appearance of new links between authors. Accurate prediction of new collaborations between members of a collaboration network can help accelerate the realization of new synergies, foster innovation, and raise productivity. For this study, the authors collected a large data set of publications from 630 IEEE and ACM conferences, covering more than 257,000 authors and 61,000 papers and capturing more than 818,000 collaborations over a period of 10 years. The data set is rich in semantic data, which allows exploration of many features that were not considered in previous approaches. We considered a comprehensive set of 98 features and, after processing, identified eight as significant. Most notably, we identified two new features as the most significant predictors of future collaborations: 1) the number of common title words, and 2) the number of common references in two authors' papers. The link prediction problem is formulated as a binary classification problem, and three supervised learning algorithms are evaluated: Naïve Bayes, the C4.5 decision tree, and Support Vector Machines. Extensive efforts are made to ensure complete spatial isolation of the information used in training and test instances, which to the authors' best knowledge is unprecedented. Results were validated using a modified form of classic 10-fold cross-validation (the change was necessitated by the way training and test instances were separated). The Support Vector Machine classifier performed best among the tested approaches: on average it correctly classified more than 80% of test instances and had an area under the receiver operating characteristic (ROC) curve greater than 0.80.
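The two new features named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the record layout (`authors`, `title`, `refs`), the stop-word list, and the choice to count distinct shared words rather than occurrences are all assumptions made for illustration.

```python
# Hypothetical toy records; the field names are assumptions, not the paper's schema.
papers = [
    {"authors": {"A", "B"}, "title": "Link prediction in social networks",
     "refs": {"r1", "r2"}},
    {"authors": {"C"}, "title": "Supervised link prediction for citation networks",
     "refs": {"r2", "r3"}},
    {"authors": {"D"}, "title": "Energy efficient routing",
     "refs": {"r9"}},
]

# Assumed stop-word list; the abstract does not say how title words were filtered.
STOPWORDS = {"in", "for", "of", "the", "a", "and"}


def author_profile(author, papers):
    """Collect the distinct title words and references over an author's papers."""
    words, refs = set(), set()
    for paper in papers:
        if author in paper["authors"]:
            words |= {w for w in paper["title"].lower().split()
                      if w not in STOPWORDS}
            refs |= paper["refs"]
    return words, refs


def pair_features(a, b, papers):
    """Return the two pairwise features for a candidate author pair (a, b)."""
    words_a, refs_a = author_profile(a, papers)
    words_b, refs_b = author_profile(b, papers)
    return {
        "common_title_words": len(words_a & words_b),
        "common_references": len(refs_a & refs_b),
    }


print(pair_features("A", "C", papers))
print(pair_features("A", "D", papers))
```

In a binary classification setup of the kind the abstract describes, each candidate author pair would yield one such feature vector, labeled by whether the pair later co-authored a paper.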
Original language: English
Title of host publication: Proceedings of The Sixth IEEE/ASE International Conference on Social Computing
Publisher: IEEE
Pages: 1-9
ISBN (Electronic): 978-1-62561-000-3
Publication status: Published - 2014
Externally published: Yes
Event: SocialCom 2014: The 6th International Conference on Social Computing - Stanford University, USA, Stanford, CA, United States
Duration: 27-May-2014 to 31-May-2014

Conference

Conference: SocialCom 2014
Country/Territory: United States
City: Stanford, CA
Period: 27/05/2014 to 31/05/2014
