# Module 4: Data Science and Machine Learning

### An Introduction to Machine Learning

• What is mathematical modeling?
• Classic tools
• How is machine learning different?
• The pros and cons of delegating to a machine
• Simple methods (supervised, unsupervised and reinforcement learning)

### Data Science in Finance

• Supervised vs unsupervised learning
• Primer on loss functions and essentials you need
• Learning and linear models. Multiple linear regression diagnostics
• Generalised least squares. Mahalanobis distance method
• Dangers of overfitting. Decomposing estimation error

### Classification and Clustering

• K-Nearest Neighbours
• Logistic Classifier and explicit maximum likelihood method
• Principles of Bayesian classification, lowest possible error
• Discriminant Analysis: Linear (LDA) and Quadratic (QDA)
• K-means clustering and hierarchical clustering

### Practical Filtering Methods

• How to specify dynamic systems (states, Markov properties)
• Weighted least squares method
• Time-varying regression estimation (Kalman Filtering)
• Applications of filtering: CAPM betas, continuous-time filtering
• Introduction to Markov Chains. Hidden Markov Models
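
As an illustration of the time-varying regression topic above, here is a minimal sketch of a scalar Kalman filter that tracks a drifting regression coefficient (for example a CAPM beta). It assumes the simplest possible state model, a random-walk beta with no intercept. The function name `kalman_time_varying_beta`, the noise variances `q` and `r`, and the synthetic data are illustrative assumptions, not course material.

```python
def kalman_time_varying_beta(x, y, q=1e-3, r=1e-2, beta0=0.0, p0=1.0):
    """Filter the model y_t = beta_t * x_t + eps_t, beta_t = beta_{t-1} + eta_t.

    q is the assumed variance of eta_t (state noise), r the assumed variance
    of eps_t (observation noise). Returns the filtered beta after each step.
    """
    beta, p = beta0, p0
    betas = []
    for x_t, y_t in zip(x, y):
        # Predict: a random-walk state keeps its mean; its variance grows by q.
        p_pred = p + q
        # Update: compare the observation with the one-step-ahead prediction.
        innovation = y_t - beta * x_t
        s = x_t * x_t * p_pred + r      # innovation variance
        k = p_pred * x_t / s            # Kalman gain
        beta = beta + k * innovation
        p = (1.0 - k * x_t) * p_pred
        betas.append(beta)
    return betas


# Synthetic, noise-free example: the true beta jumps from 1.0 to 2.0 halfway.
x = [1.0 + (t % 5) for t in range(200)]
true_beta = [1.0 if t < 100 else 2.0 for t in range(200)]
y = [b * x_t for b, x_t in zip(true_beta, x)]
betas = kalman_time_varying_beta(x, y)
```

A larger `q` relative to `r` makes the filter adapt faster to changes in beta at the cost of noisier estimates.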

### Machine Learning & Predictive Analytics

• Regression: linear regression, bias-variance decomposition, subset selection, shrinkage methods, regression in high dimensions
• Support Vector Machines: Classification and regression using SVMs and kernel methods
• Dimension reduction: Principal component analysis (PCA), kernel PCA, non-negative matrix factorisation (NMF), PageRank
• Examples (2 worked examples)

### Reinforcement Learning

• What is Reinforcement Learning?
• Reinforcement Learning in terms of classical techniques for pricing derivatives
• Pricing exotic options using Reinforcement Learning

### AI Based Algo Trading Strategies Using Python

• Basic financial data analysis with Python and pandas
• Creating features and label data from financial time series for market prediction
• Application of classification algorithms from machine learning to predict market movements
• Vectorized backtesting of algorithmic trading strategies based on the predictions
• Risk analysis for the algorithmic trading strategies
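
The workflow listed above (features and labels from a price series, direction predictions, vectorized backtest) can be sketched roughly as follows. This is a minimal illustration using NumPy only. The helper names `make_features_and_labels` and `vectorized_backtest` are hypothetical, and a naive "repeat the last direction" rule stands in for a trained classifier. Transaction costs are ignored.

```python
import numpy as np


def make_features_and_labels(prices, n_lags=2):
    """Build lagged return signs as features and next-period direction as label."""
    rets = np.diff(np.log(prices))  # log returns
    n = len(rets)
    # Column k holds the sign of the return lagged by k + 1 periods.
    X = np.column_stack(
        [np.sign(rets[n_lags - k - 1 : n - k - 1]) for k in range(n_lags)]
    )
    target_rets = rets[n_lags:]     # returns the predictions are applied to
    y = np.sign(target_rets)        # label: direction of the market move
    return X, y, target_rets


def vectorized_backtest(predictions, market_returns):
    """Hold a +1/-1 position per prediction; no loops, no transaction costs."""
    strategy_returns = predictions * market_returns
    equity_curve = np.exp(np.cumsum(strategy_returns))  # growth of 1 unit
    return strategy_returns, equity_curve


# Toy price path growing 1% per period (illustrative data only).
prices = 100.0 * 1.01 ** np.arange(10)
X, y, target_rets = make_features_and_labels(prices, n_lags=2)

# Stand-in for a fitted classifier: predict the sign of the most recent return.
predictions = X[:, 0]
strategy_returns, equity_curve = vectorized_backtest(predictions, target_rets)
hit_rate = np.mean(predictions == y)
```

In practice the stand-in rule would be replaced by a classifier fitted on a training window and evaluated on held-out data, so that the backtest does not use information from the future.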

### Digital Signal Processing for Finance

• Importance of Signal Processing (SP); Characterisation and classification of signals
• Discrete Time Signals
• Fourier Transforms and z-Transforms
• Continuous-time signals and sampling
• Discrete Fourier Transforms

### Co-Integration using R

• Multivariate time series analysis
• Financial time series: stationarity and unit roots
• Vector Autoregression, a theory-free model
• Equilibrium and Error Correction Model
• Engle-Granger Procedure
• Cointegrating relationships and their rank
• Estimation of reduced rank regression: Johansen Procedure
• Stochastic modelling of equilibrium: Ornstein-Uhlenbeck process
• Statistical arbitrage using mean reversion
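
To make the link between the Ornstein-Uhlenbeck process and mean reversion concrete, here is a minimal sketch that fits an OU process to a spread through its exact AR(1) discretisation, x[t+1] = a + b x[t] + noise with b = exp(-theta dt) and mu = a / (1 - b). The module itself uses R; this sketch is written in plain Python, and the function name `fit_ou_from_ar1` and the toy spread are illustrative assumptions.

```python
import math


def fit_ou_from_ar1(series, dt=1.0):
    """Estimate OU parameters by ordinary least squares of x[t+1] on x[t]."""
    x_prev, x_next = series[:-1], series[1:]
    n = len(x_prev)
    mean_prev = sum(x_prev) / n
    mean_next = sum(x_next) / n
    cov = sum((u - mean_prev) * (v - mean_next) for u, v in zip(x_prev, x_next))
    var = sum((u - mean_prev) ** 2 for u in x_prev)
    b = cov / var                   # AR(1) slope
    a = mean_next - b * mean_prev   # AR(1) intercept
    if not 0.0 < b < 1.0:
        raise ValueError("series does not look mean-reverting (need 0 < b < 1)")
    theta = -math.log(b) / dt       # speed of mean reversion
    mu = a / (1.0 - b)              # long-run equilibrium level
    half_life = math.log(2.0) / theta
    return {"theta": theta, "mu": mu, "half_life": half_life}


# Noise-free toy spread that decays towards 1.0 by a factor 0.9 per step.
spread = [1.0 + 0.9 ** t for t in range(50)]
params = fit_ou_from_ar1(spread)
```

The half-life gives a rough sense of how long a deviation from equilibrium takes to halve, which is one common input when sizing a mean-reversion trade.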

### Machine Learning Lab

• Sandbox: conda, environments, Python and R packages, MLlib. Data sources
• Logistic regression as a classifier: loss function, transition probabilities, softmax and appropriate penalty (Ridge regression)
• Cross-validation: sample selection and reshuffling. Precision and recall. Is the classifier random?
• Support Vector Machines: hyperplane intuition, soft vs hard margin. Choice of kernel to tackle non-linear problems
• Random Forest Classifiers: regression versions of Decision Trees and AdaBoost
• Vignettes on neural nets to predict market returns, probabilistic programming, and Markov-switching GARCH