Talks and presentations

LLMs for Clinical Trial Matching

December 02, 2023

Stanford Research Project, Bay Area Population Genomics Conference, Stanford University

Matching patients to clinical trials is a major hurdle in medical research. Current methods, relying on manual screening of medical records, are slow and expensive. This talk explores the potential of Large Language Models (LLMs) to automate this process.

Pensive Project

March 01, 2022

Internship Project, Vector Institute of AI, Toronto, Canada

In this project we developed and enhanced a recommendation system trained on a Reddit dataset, later adding a GitHub dataset to train our models as well. Our model uses neural language models to improve the quality of the learned vector representations.
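The general idea of recommending via learned vector representations can be sketched as follows. This is a minimal illustration with tiny hypothetical embeddings, not the project's actual model or data: items and users are represented as vectors, and items are ranked by cosine similarity to the user's vector.

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity between two vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def recommend(user_vec, item_vecs, k=2):
    # Rank items by cosine similarity to the user's embedding
    # and return the indices of the top-k items.
    scores = [cosine_sim(user_vec, v) for v in item_vecs]
    return sorted(range(len(item_vecs)), key=lambda i: -scores[i])[:k]

# Toy example with hypothetical 3-dimensional embeddings.
user = np.array([1.0, 0.0, 1.0])
items = [np.array([0.9, 0.1, 1.1]),   # close to the user vector
         np.array([-1.0, 0.5, 0.0]),  # dissimilar
         np.array([1.0, 0.0, 0.9])]   # close to the user vector
print(recommend(user, items))  # → [2, 0]
```

Better representations (for example, from neural language models) shift these vectors so that semantically related items end up closer together, which is what improves the recommendations.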

Context-Aware Semantic Text Mining and Representation Learning for Text Disambiguation and Online Harassment Classification

March 01, 2022

Research, Computer Science Department, Dalhousie University, Canada

Our contribution can be divided into two main parts: the first focuses on the text ambiguity problem, and the second on the text classification problem, two related tasks in NLP. While analyzing and designing algorithms for text understanding and representation learning, we introduce algorithms that better capture the exact meaning of a text when its words admit several possible meanings, a problem known in NLP as Word Sense Disambiguation (WSD). In the first part of this thesis, we analyze the effect of different currently available text embedding methods on the WSD task and, based on these observations and experiments, introduce a new method for text representation learning. In addition to general English text, we evaluate our method on biomedical text as an application; this analysis shows how effective the embeddings are at capturing context when finding the correct meaning behind words in biomedical texts.

We also investigate the problem of text classification, another NLP problem relevant to WSD. We consider a collection of tweets and classify them into two classes: tweets that include harassment and tweets that do not. We apply classical machine learning approaches and show the effects of and differences between them. Beyond this binary classification, we focus on the harassment class and classify which type of harassment a tweet contains using classical machine learning approaches, including logistic regression, Gaussian naive Bayes, decision trees, support vector machines, random forests, multi-layer perceptrons, and AdaBoost, in chapter five. The last chapter uses a deep learning approach, graph convolution, to solve this problem; our experiments show how effective this deep learning method is compared to the previous classical machine learning approaches.
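The kind of classifier comparison described above can be sketched with scikit-learn. This is a toy run on synthetic binary data, not the thesis's tweet dataset or features; the point is only the pattern of fitting several classical models and comparing held-out accuracy.

```python
# Toy comparison of classical classifiers on synthetic binary data,
# in the spirit of the harassment / no-harassment experiments.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier

# Synthetic stand-in for extracted tweet features.
X, y = make_classification(n_samples=400, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "Gaussian naive Bayes": GaussianNB(),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "SVM": SVC(),
    "random forest": RandomForestClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
}

for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(f"{name}: test accuracy {model.score(X_te, y_te):.2f}")
```

In practice the tweets would first be converted to feature vectors (for example, with the text representations studied in the first part of the thesis) before being fed to these classifiers.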

Ruler Wrapping

February 01, 2022

Research, EuroCG, Canada

In 1985 Hopcroft, Joseph and Whitesides showed it is NP-complete to decide whether a carpenter's ruler with segments of given positive lengths can be folded into an interval of at most a given length, such that the folded hinges alternate between 180 degrees clockwise and 180 degrees counter-clockwise. At the open-problem session of the 33rd Canadian Conference on Computational Geometry (CCCG '21), O'Rourke proposed a natural variation of this problem called ruler wrapping, in which all folded hinges must be folded the same way. In this paper we show that O'Rourke's variation has a linear-time solution.

Biomedical Word Sense Disambiguation with Contextualized Representation Learning

March 01, 2021

Research, Computer Science Department, Dalhousie University, Canada

Representation learning is an important component in solving most Natural Language Processing (NLP) problems, including Word Sense Disambiguation (WSD). The WSD task tries to find the best meaning in a knowledge base for a word with multiple meanings (an ambiguous word). WSD methods choose this best meaning based on the context, i.e., the words around the ambiguous word in the input text document. Thus, word representations may improve the effectiveness of disambiguation models if they carry useful information from the context and the knowledge base. However, most current representation learning approaches are trained on general English text and are not domain-specific. In this paper, we present a novel contextual, knowledge-base-aware sense representation method in the biomedical domain. The novelty of our representation is the integration of the knowledge base and the context. This representation lies in a space comparable to that of contextualized word vectors, thus allowing a word occurrence to be easily linked to its meaning by applying a simple nearest neighbor approach. Comparing our approach with state-of-the-art methods shows the effectiveness of our method in terms of text coherence.
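The nearest neighbor linking step can be sketched as follows. The sense embeddings and the contextualized vector here are tiny hypothetical examples, not the output of the actual model: a word occurrence is linked to whichever sense embedding is closest in cosine similarity.

```python
import numpy as np

def nearest_sense(word_vec, sense_vecs):
    # Link a contextualized word vector to the sense whose embedding
    # is closest in cosine similarity (1-nearest neighbor).
    best, best_sim = None, -1.0
    for sense, vec in sense_vecs.items():
        sim = np.dot(word_vec, vec) / (
            np.linalg.norm(word_vec) * np.linalg.norm(vec))
        if sim > best_sim:
            best, best_sim = sense, sim
    return best

# Hypothetical sense embeddings for the ambiguous term "cold".
senses = {
    "common_cold": np.array([0.9, 0.1]),
    "low_temperature": np.array([0.1, 0.9]),
}
# Hypothetical contextualized vector of "cold" in
# "the patient presented with a cold".
ctx = np.array([0.8, 0.2])
print(nearest_sense(ctx, senses))  # → common_cold
```

Because the sense representations live in the same space as the contextualized word vectors, disambiguation reduces to this single similarity lookup.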