# Module 6: Big Data and Machine Learning

### Big Data in Finance

- What is data science?
- Supervised and unsupervised learning
- Structured and unstructured data
- Introduction to Classification
- Bayesian models and inference using Markov chain Monte Carlo
- Introduction to graphical models: Bayesian networks, Markov networks, inference in graphical models
- Optimisation techniques
- Examples: Predictive analytics/trading & Pricing

### Classification, Clustering and Filtering

- Classification: K-nearest neighbours, optimal Bayes classifier, naïve Bayes, LDA and QDA, reduced rank LDA, Logistic regression, Support Vector Machines
- Cluster analysis: BIRCH, Hierarchical, K-means, Expectation-maximization, DBSCAN, OPTICS and Mean-shift
- Kalman filtering
- Examples (2 worked practical examples)

### Machine Learning & Predictive Analytics

- Regression: linear regression, bias-variance decomposition, subset selection, shrinkage methods, regression in high dimensions
- Support Vector Machines: classification and regression using SVMs and kernel methods
- Dimension reduction: Principal component analysis (PCA), kernel PCA, non-negative matrix decomposition, PageRank
- Examples (2 worked examples)

### Machine Learning and Data Lab

- Sandbox: conda, environments, Python and R packages, MLlib. Data sources
- Logistic regression as a classifier: loss function, transition probabilities, softmax and appropriate penalty (Ridge regression)
- Cross-validation: sample selection and reshuffling. Precision and recall. Is the classifier random?
- Support Vector Machines: hyperplane intuition, soft vs hard margin. Choice of kernel to tackle non-linear problems
- Random Forest Classifiers: regression versions of Decision Trees and AdaBoost
- Vignettes on neural nets to predict market returns, probabilistic programming, and Markov-switching GARCH

### Co-Integration using R

- Multivariate time series analysis
- Financial time series: stationarity and unit roots
- Vector Autoregression, a theory-free model
- Equilibrium and Error Correction Model
- Engle-Granger Procedure
- Cointegrating relationships and their rank
- Estimation of reduced rank regression: Johansen Procedure
- Stochastic modelling of equilibrium: Ornstein-Uhlenbeck process
- Statistical arbitrage using mean reversion

### From Zero to AI

- Machine learning methodologies and techniques
- Supervised classification and prediction
- Unsupervised feature identification
- Sequence prediction and computer vision

### Statistical Methods for Data Analysis

- Learning and linear models
- Linear and multiple linear regression
- Inference
- Key assumptions: Linearity; IID random error; Independence of the predictors
- Diagnostic tests: how to troubleshoot your model
- Pitfalls in predictions: Confidence interval vs prediction interval; Selection bias; Linear locally, nonlinear globally
- Beyond linear models
- Regularization: Ridge regression and the Lasso; Cross-validation

*Lecture order and content may occasionally change due to circumstances beyond our control; however, this will never affect the quality of the program.*