Online Event - CIRCL: Ekaterina Levitskaya (Computational Linguistics)

MAR 24, 2020 | 6:30 PM TO 8:30 PM



The Graduate Center
365 Fifth Avenue








Ekaterina Levitskaya will be presenting on a topic in Computational Linguistics.

Applied Computational Linguistics Approaches Using Grant and Patent Text Data


The UMETRICS database contains rich information on grants from federal and non-federal sponsored research for 32 U.S. universities over a 15-year period; however, it does not contain information on research fields. Using a supervised machine learning approach (a text classification task), we train an ensemble model on labeled publications in the Web of Science database to predict research fields for the UMETRICS grant text data.
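
As a rough illustration of what ensemble text classification can look like, the sketch below combines three simple classifiers by majority vote. It is not the model used in the project: the abstract does not name the component models, so the choice of two Naive Bayes variants and a token-overlap classifier is an assumption, and the training titles, field labels, and all function and class names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def word_features(text):
    """Lowercased whitespace tokens."""
    return text.lower().split()


def char_trigram_features(text):
    """Overlapping character trigrams of the lowercased text."""
    s = text.lower()
    return [s[i:i + 3] for i in range(len(s) - 2)]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing."""

    def __init__(self, featurize):
        self.featurize = featurize

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            feats = self.featurize(text)
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, text):
        feats = self.featurize(text)
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.label_counts):
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.label_counts[label] / total)
            for f in feats:
                score += math.log((counts[f] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


class OverlapClassifier:
    """Predicts the label whose training vocabulary shares the most tokens."""

    def fit(self, texts, labels):
        self.label_tokens = defaultdict(set)
        for text, label in zip(texts, labels):
            self.label_tokens[label].update(word_features(text))
        return self

    def predict(self, text):
        tokens = set(word_features(text))
        return max(sorted(self.label_tokens),
                   key=lambda label: len(tokens & self.label_tokens[label]))


def ensemble_predict(models, text):
    """Majority vote over the individual model predictions."""
    votes = Counter(model.predict(text) for model in models)
    return votes.most_common(1)[0][0]


# Hypothetical labeled publication titles (stand-ins for labeled training data).
train_texts = [
    "gene expression in cancer cells",
    "protein folding and cell signaling",
    "quantum entanglement in photon pairs",
    "particle collisions at high energy",
]
train_labels = ["biology", "biology", "physics", "physics"]

models = [
    NaiveBayes(word_features).fit(train_texts, train_labels),
    NaiveBayes(char_trigram_features).fit(train_texts, train_labels),
    OverlapClassifier().fit(train_texts, train_labels),
]

# Hypothetical grant title to classify.
print(ensemble_predict(models, "gene regulation in stem cells"))
```

Majority voting is only one way to combine models; averaging predicted probabilities or stacking would be equally plausible choices here.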

The second project is an exploratory analysis of patent text using publicly available USPTO data. We examine different measures of linguistic complexity to understand whether significant differences exist between the texts of breakthrough and non-breakthrough patents, depending on the patent field or the gender composition of the inventor team.
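
For readers unfamiliar with linguistic complexity measures, the sketch below computes three common surface-level ones: mean sentence length, mean word length, and type-token ratio. The abstract does not say which measures the project uses, so these are assumptions chosen for illustration, and the function name and sample text are hypothetical.

```python
import re


def complexity_measures(text):
    """Return three simple surface-level complexity measures for a text."""
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Treat runs of letters (and apostrophes) as words.
    words = re.findall(r"[A-Za-z']+", text.lower())
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    return {
        "mean_sentence_length": len(words) / len(sentences),
        "mean_word_length": sum(len(w) for w in words) / len(words),
        "type_token_ratio": len(set(words)) / len(words),
    }


# Hypothetical patent-style snippet.
sample = "The device heats water. The device cools water."
print(complexity_measures(sample))
```

Type-token ratio is sensitive to text length, so comparing patents of very different lengths would likely call for a length-normalized variant.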

