Semantic Interpretation of Social Content

Much of the work on mining social content in the information processing and retrieval literature focuses on probabilistic, frequency-based techniques that are sensitive to the lexico-syntactic structures users employ. Such methods are agnostic to the semantics of the content and instead look for recurrent discriminative patterns. Our work strongly advocates for methods that are cognizant of the semantics of the content being processed. To this end, we have developed techniques that automatically provide knowledge and semantic grounding for user-generated textual content. This includes domain-independent semantic entity linking techniques that ground textual content in well-established knowledge graphs such as DBpedia (with Wikipedia’s 5+ million entities) and the Unified Medical Language System (UMLS, with 3+ million entities).
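At its core, this style of entity linking retrieves knowledge-graph candidates matching a surface mention and disambiguates them against the surrounding context. The sketch below illustrates the idea only; the toy knowledge graph, entity identifiers, and overlap-based scoring are illustrative stand-ins, not the actual DBpedia/UMLS linking pipeline described above.

```python
# Illustrative sketch: dictionary-based candidate retrieval plus
# context-overlap disambiguation. A real system would query DBpedia
# or UMLS; this toy graph and its entries are hypothetical.
TOY_KG = {
    "dbpedia:Apple_Inc.": {
        "labels": {"apple"},
        "context": {"iphone", "mac", "technology", "company"},
    },
    "dbpedia:Apple": {
        "labels": {"apple"},
        "context": {"fruit", "tree", "orchard", "pie"},
    },
}

def link_entity(mention, context_words, kg=TOY_KG):
    """Return the KG entity whose label matches `mention` and whose
    context terms overlap most with the surrounding words."""
    mention = mention.lower()
    candidates = [e for e, info in kg.items() if mention in info["labels"]]
    if not candidates:
        return None
    # Disambiguate by context overlap -- a crude stand-in for the
    # semantic grounding scores a real annotator would compute.
    return max(candidates,
               key=lambda e: len(kg[e]["context"] & set(context_words)))

print(link_entity("Apple", {"new", "iphone", "company"}))
# → dbpedia:Apple_Inc.
```

The same mention resolves differently under a different context (e.g., {"fruit", "pie"} yields dbpedia:Apple), which is the essence of semantic grounding over purely frequency-based matching.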

The development of such semantic linking techniques has enabled (a) a collaboration with Women’s College Hospital, where we have been investigating how the semantics of biomedical social data can be analyzed in contrast to the peer-reviewed literature, and (b) joint work with St. Michael’s Hospital, where we are improving knowledge synthesis processes based on the semantic interpretation of medical literature.

We have also systematically extended a strong recurrent model for mapping UMLS and DBpedia entities onto each other; this is the first work to map these knowledge graphs at such a large scale. Furthermore, with the purpose of integrating semantics within social content, we have explored knowledge base-agnostic entity linking methods: by mining senses from text rather than searching an existing knowledge graph, this type of entity linking reduces the disambiguation search space. Additionally, we have worked on implicit entity linking techniques within an ad hoc retrieval framework to identify the central concept of a short, informal, user-generated text that lacks an explicit clue; for example, an implicit entity linking model would interpret a tweet saying “I wish my phone wasn’t bent” as referencing an iPhone 6. Recently, we have studied how different features within a learning-to-rank framework can be used to perform implicit entity linking effectively. Implicit entity linking makes information about implied subjects accessible even when an explicit reference is missing; for example, 40% of tweets about books contain implicit references but do not explicitly mention the book itself.

Finally, we have built techniques that perform open information extraction for relation identification in textual content based on both grammatical clause patterns and feature-enhanced matrix factorization. Such work enables the extraction of semantically meaningful relations from textual content, such as that found in social user-generated text.
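The ranking view of implicit entity linking can be sketched as scoring every candidate entity against the text with a feature vector and a weight vector, then sorting. The entity profiles, features, and hand-set weights below are hypothetical illustrations, not the features or learned weights of the actual model.

```python
# Hedged sketch: implicit entity linking framed as ranking. Each
# candidate entity gets a feature vector against the tweet; a weight
# vector (here hand-set, in practice learned via learning-to-rank)
# turns features into a score. Profiles and weights are illustrative.
ENTITY_PROFILES = {
    "iPhone 6": {"bent", "phone", "apple", "bendgate", "ios"},
    "Galaxy S5": {"phone", "samsung", "android", "waterproof"},
    "Kindle": {"book", "reading", "amazon", "ebook"},
}

def features(tweet_terms, profile):
    """Two toy features: raw and length-normalized term overlap."""
    overlap = len(tweet_terms & profile)
    return [overlap, overlap / len(profile)]

def rank_entities(tweet, weights=(1.0, 2.0), profiles=ENTITY_PROFILES):
    """Rank candidate entities for a tweet by weighted feature score."""
    terms = set(tweet.lower().replace("'", " ").split())
    scored = []
    for entity, profile in profiles.items():
        score = sum(w * x for w, x in zip(weights, features(terms, profile)))
        scored.append((score, entity))
    return [entity for score, entity in sorted(scored, reverse=True)]

print(rank_entities("I wish my phone wasn't bent")[0])
# → iPhone 6
```

Even though the tweet never names the device, terms like “bent” and “phone” push the implied entity to the top of the ranking, which is the behavior implicit entity linking formalizes.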

Sample Publications

Cuzzola, J., J. Jovanovic, and E. Bagheri (2017). “RysannMD: A biomedical semantic annotator balancing speed and accuracy”. Journal of Biomedical Informatics, 71, 91–109, IF: 2.950.

Vigod, S., E. Bagheri, F. Zarrinkalam, H. Brown, M. Mamdani, and J. G. Ray (2018). “Online social network response to studies on antidepressant use in pregnancy”. Journal of Psychosomatic Research, 106(3), 70–72, IF: 2.722.

Pham, B., E. Bagheri, P. Rios, A. Pourmasoumi, R. C. Robson, J. Hwee, W. Isaranuwatchai, M. P. Nazia Darvesh, and A. Tricco (2018). “Improving the Conduct of Systematic Reviews: A Process Mining Perspective”. Journal of Clinical Epidemiology, 103, 101–111, IF: 4.65.

Cuzzola, J., E. Bagheri, and J. Jovanovic (2018). “UMLS to DBPedia Link Discovery Through Circular Resolution”. Journal of the American Medical Informatics Association, 25(7), 819–826, IF: 4.292.

Feng, Y.*, F. Zarrinkalam, E. Bagheri, H. Fani, and F. Al-Obeidat (2018). “Entity Linking of Tweets based on Dominant Entity Candidates”. Social Network Analysis and Mining, 8(1): 46, 1–16, IF: 1.61.

Hosseini, H., T. T. Nguyen, J. Wu*, and E. Bagheri (2019). “Implicit Entity Linking in Tweets: An Ad-hoc Retrieval Approach”. Applied Ontology, 14(4), 451–477, IF: 0.75.

Hosseini, H., and E. Bagheri (2020). “From Explicit to Implicit Entity Linking: A Learn to Rank Framework”. In: Advances in Artificial Intelligence – 33rd Canadian Conference on Artificial Intelligence (Canadian AI).

Vo, D.-T., and E. Bagheri (2018). “Self-Training on Refined Clause Patterns for Relation Extraction”. Information Processing and Management, 54(4), 686–706, IF: 3.892.

Bagheri, E., J. Cuzzola, Z. Jeremic, and R. Bashash (2014). “Method and System of Intelligent Generation of Structured Data and Object Discovery from the Web using Text, Images and Video and other Data”. US Patent App. 14/892,976, filed 21 May 2014.