Structure, Geometry, Topology in Machine Learning

Course Rationale


In many applications, we encounter highly complex data such as text,

images, and videos. Classic parametric or nonparametric models are

limited in these cases. Powerful tools targeting the underlying structures

have been developed. This course introduces the theoretical foundations

and applications of these tools. A central theme is a unique structural

perspective on different models in modern machine learning.


Course Description


This graduate-level course introduces tools to model the underlying

structures of data. A probabilistic graphical model captures the pairwise

or even higher-order relations between variables. It has been widely used

in applications such as computer vision and natural language processing.

It is also closely related to deep learning. Classic manifold learning

explores even more global information, namely the underlying geometry

of the data. In topological data analysis, topological information about the

data is extracted and used. We also briefly introduce how the geometry and

topology of high-dimensional data can be visualized for data exploration.

Students are expected to have a solid background in the analysis of

algorithms, discrete mathematics, and elementary probability. Basic

knowledge of machine learning is preferred.


Topic List


Topics include but are not limited to:

• An introduction to different structures: structures of the data (feature

space), structures of the label space, and structures of the parameter

space in training.

• Probabilistic graphical models

      o Markov random field (undirected)

      o Bayesian network (directed)

• Structured learning and prediction

      o Conditional random field

      o Inference and training of CRF

      o Structured SVM

      o Diverse prediction

• Hidden variables/layers

     o Restricted Boltzmann machine

     o Deep belief network

     o Geometry of the loss function

• Geometry of the data

    o Manifold learning

    o Laplacian eigenmaps and diffusion maps

• Topological data analysis

    o A brief introduction to simplicial homology

    o Computation of topological structures

    o Persistent homology

    o Applications

• Visualizing the data: PCA, t-SNE, topological mapper, etc.


Learning Objectives


Students who successfully complete this course will be able to:

• Demonstrate a principled understanding of probabilistic graphical models,

CRFs, and structured SVMs.

• Apply these models to practical problems that arise in applications such

as computer vision and NLP.

• Master a geometric perspective in prediction and training.

• Understand the fundamental challenge in training complex models

with non-convex loss, e.g., deep learning models.

• Understand the basic ideas of geometric methods such as manifold

learning and Laplacian eigenmaps.

• Understand the basic ideas of topological data analysis.



• A midterm programming assignment (20%)

• A midterm exam (20%)

• A presentation (20%), given individually or in a group depending on the

class size: read and present a few papers on a selected topic.

• A final project (40%): a research project related to tools learned in

class. Requires a proposal, a final project report, and maybe a short