The University of Sydney

School of Mathematics and Statistics

Computer Project

MATH2070/2970: Optimisation and Financial Mathematics Semester 2, 2020

Web Page: http://www.maths.usyd.edu.au/u/IM/MATH2070/

Lecturers: Chunxi Jiao, Desmond Ng and Nicholas James (Computer Lab Designer)

Due at 11:59pm on Friday 20th November via Turnitin.

Late assignments are not accepted without prior arrangement well before the deadline!

You must attach a scanned copy of the signed cover-sheet to the front of your assignment (see

over)!

This is mostly a computational project so you must submit all computer programs with your

project formulations, descriptions and outputs. Assessment will be based on: accuracy,

programming and presentation.

MATH2070: Do Questions 1 to 6 only (omit Questions 7 and 8).

MATH2970: Do all questions.

This assignment involves analyzing real country index market data downloaded from Bloomberg.

The file ‘Country Indices.xlsx’, which you can download from Ed Workspaces, contains the daily closing

prices of equity indices of 20 countries in the worksheet named ‘Bloomberg Values’. Only use the data

contained in this worksheet. We draw your attention to the following points relating to the data:

(a) Prices are recorded on a (business)-daily basis between 3/01/2000 – 8/10/2020 where dates are

formatted using the DD/MM/YYYY convention.

(b) The price of each index is denominated in that country’s currency.

(c) This price data includes five (5) periods of interest:

(i) The Global Financial Crisis (GFC). We roughly identify the GFC in this data as the period

from 03/01/2007 – 31/05/2010, with the peak of the GFC (GFC Peak) as the period from

02/09/2008 – 01/06/2009.

(ii) The ongoing COVID-19 global pandemic (COVID-19) which began on 11/03/2020 according

to the World Health Organization. Since the pandemic is ongoing, we shall use 31/08/2020

as the data cutoff point. Thus, for this assignment we shall identify COVID-19 in this data

as the period from 11/03/2020 – 31/08/2020, with the peak of COVID-19 (COVID-19

Peak) as the period from 11/03/2020 – 29/05/2020.

(iii) An interim market period (INTERIM) between post-GFC and pre-COVID-19. We shall

identify this period in the data as the period from 01/06/2010 – 10/03/2020.

There are two further items to note with this data:

(1) There is no data for the China equity index ‘SHSZ300’ prior to 04/01/2002.

(2) There is no data for the Mexico equity index ‘FTBIVA’ prior to 19/09/2003.

Neither (1) nor (2) affects this assignment, since both fall outside our scope of interest, i.e., (c)(i) –

(c)(iii).

Covariance and Correlation

1. Import the data into ipython as shown in computer labs. This question will investigate the

correlation between the return rates of the country indices over each of the five periods of

Copyright © 2020 The University of Sydney

interest i.e., GFC, GFC Peak, INTERIM, COVID-19 and COVID-19 Peak. There are several

choices when analyzing return rate data. A commonly used variable is the logarithmic change

of price, or the so-called log return rate: let $Y_t^i$ be the price at time t of the i-th country’s index

for i = 1, 2, . . . , 20, then consider the log return rate (w.r.t. the natural base) given by

$\xi_t^i = \log Y_t^i - \log Y_{t-1}^i$, for t ≥ 1.
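
As a minimal sketch of this definition, assuming the prices have already been loaded into a NumPy array with one row per date and one column per index, the log return rates could be computed as below. The array `prices` is made-up illustrative data, not the Bloomberg data.

```python
import numpy as np

# Hypothetical prices: 4 dates (rows) x 2 indices (columns).
prices = np.array([
    [100.0, 50.0],
    [110.0, 50.0],
    [121.0, 45.0],
    [121.0, 49.5],
])

# Row t-1 of log_returns holds log Y_t - log Y_{t-1} for every index,
# so there is one fewer row than in prices.
log_returns = np.diff(np.log(prices), axis=0)

# Summing the log returns over the window gives the log of the total
# price ratio between the last and first dates.
total = log_returns.sum(axis=0)
```

The final line illustrates one property that may be worth checking numerically: the sum of the daily log returns equals the log return over the whole window.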

(i) Justify the use of the log return rate. What are the advantages of using it?

(ii) Because each country’s index is priced in its own currency, comparing across indices requires starting each index from a common point. Thus, for the

entire period of interest, i.e., 03/01/2007 – 31/08/2020, rebase each country’s index to start

at 100. Then plot on a single graph each country’s rebased index values for the entire period

of interest. Use the following colours when plotting each period of interest: GFC (Blue),

GFC Peak (Yellow), INTERIM (Green), COVID-19 (Orange), COVID-19 Peak (Red).

Where periods overlap, plot so as to show the greatest number of colours. For example, since GFC

Peak sits within GFC, the plot colours would be Blue – Yellow – Blue.

(iii) Construct and visualize the correlation matrices for the five periods GFC, GFC Peak,

INTERIM, COVID-19 and COVID-19 Peak. Comment on how the correlations between

country indices change over these five periods.

(iv) Plot the histogram of the correlation coefficients ρij for 1 ≤ i, j ≤ 20 for the five periods

GFC, GFC Peak, INTERIM, COVID-19 and COVID-19 Peak. Comment on your results.
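
The rebasing and correlation steps of Question 1 can be sketched as follows. This is only an illustration on simulated prices (the array `prices`, its size and the random seed are assumptions); it is not the required solution, and the plotting itself is left out.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 250 dates x 4 indices of simulated prices.
steps = rng.normal(0.0, 0.01, size=(250, 4))
prices = 50.0 * np.exp(np.cumsum(steps, axis=0))

# Rebase every index so that its first value in the window equals 100.
rebased = 100.0 * prices / prices[0]

# Log returns, then the correlation matrix (columns are the variables).
log_returns = np.diff(np.log(prices), axis=0)
corr = np.corrcoef(log_returns, rowvar=False)

# Distinct pairwise coefficients rho_ij with i < j, e.g. for a histogram.
i_upper, j_upper = np.triu_indices_from(corr, k=1)
pairwise = corr[i_upper, j_upper]
```

Using only the entries with i < j drops the diagonal (always 1) and the duplicated lower triangle; whether to include those in the histogram is a presentation choice to state in your answer.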

Portfolio Theory

In this section, using your results from Question 1, you will construct the

optimal portfolio P∗, the Minimum Variance Frontier (MVF), the Efficient Frontier (EF) and consider

types of investors during the different market periods.

2. For the INTERIM period only i.e., from 01/06/2010 – 10/03/2020, carry out the following

computational tasks to compute the optimal portfolio P∗, consisting of the 20 country indices

for an agent who wishes to invest $1,000,000 with a risk-aversion coefficient of t = 0.20.

(i) Compute the dollar amount invested in each country’s index and obtain the corresponding

expected return, µ∗ and risk, σ∗ of the optimal portfolio, P∗.

(ii) Compute the µσ-plane graphical representation and include the following, all on the same

graphical plot:

(a) All 20 country indices; and

(b) The Minimum Variance Frontier (MVF) and Efficient Frontier (EF). Use a t-range of

|t| ≤ 0.40 when displaying your plot; and

(c) Generate a plot of n = 1000 random feasible portfolios with individual country index weights satisfying |x_{n,i}| ≤ 20 (for each of the i = 1, 2, ..., 20 country indices) and

σ_n ≤ 0.10 for n = 1, 2, . . . , 1000.

You might notice that the random points occupy some region well-separated from the

Minimum Variance Frontier (MVF) - comment on this observation and explain why

this occurs. An explanation for this observation is a major part of the question; and

(d) Plot the indifference curve of an investor with t = 0.20 and their optimal portfolio P∗.
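
A possible starting point for Question 2(i) is sketched below. It assumes the investor maximises t·µ − σ²/2 subject to the weights summing to one (an assumption about the objective, which this sheet does not state), and it uses a made-up three-asset example; the names `r`, `C` and `optimal_portfolio` are illustrative only.

```python
import numpy as np

def optimal_portfolio(r, C, t):
    """Weights maximising t*mu - sigma^2/2 subject to sum(x) = 1 (assumed objective)."""
    e = np.ones(len(r))
    Cinv_e = np.linalg.solve(C, e)
    Cinv_r = np.linalg.solve(C, r)
    a = e @ Cinv_e
    b = r @ Cinv_e
    alpha = Cinv_e / a                   # minimum-variance weights (t = 0)
    beta = Cinv_r - (b / a) * Cinv_e     # self-financing direction, sums to 0
    x = alpha + t * beta
    mu = x @ r
    sigma = np.sqrt(x @ C @ x)
    return x, mu, sigma, alpha, beta

# Hypothetical expected returns and covariance matrix for 3 assets.
r = np.array([0.05, 0.08, 0.12])
C = np.array([
    [0.04, 0.01, 0.00],
    [0.01, 0.09, 0.02],
    [0.00, 0.02, 0.16],
])

x, mu, sigma, alpha, beta = optimal_portfolio(r, C, t=0.20)
dollars = 1_000_000 * x   # dollar amount in each asset
```

Under the same assumption, sweeping t over a range of values and recording (σ, µ) traces out the frontier, with t = 0 giving the minimum-variance portfolio. The expected returns and covariances should be expressed on the same time scale as the risk-aversion coefficient.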

3. (i) For the INTERIM period from 01/06/2010 – 10/03/2020, determine which investors short

sell in a global investment market consisting only of the 20 country indices and which indices

they short sell. Are there any country indices which no investors short sell or which all

investors will short sell?

(ii) Repeat question 3(i) for the COVID-19 period from 11/03/2020 – 31/08/2020. Offer an

explanation for any differences observed.
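
If the optimal weights are linear in the risk-aversion coefficient, say x_i(t) = α_i + t·β_i (an assumption that holds for the mean-variance objective t·µ − σ²/2 but is not stated in this sheet), then the investors who short sell index i can be read off from the sign of that line. A small sketch, with a hypothetical helper name:

```python
import math

def short_sale_interval(alpha_i, beta_i):
    """Interval of t >= 0 on which alpha_i + t * beta_i < 0, or None if empty.

    The finite endpoint, when there is one, is the value of t at which
    the weight is exactly zero.
    """
    if beta_i > 0:
        # Weight increases with t: shorted only for small t, if at all.
        return (0.0, -alpha_i / beta_i) if alpha_i < 0 else None
    if beta_i < 0:
        # Weight decreases with t: shorted once t exceeds the root.
        return (max(0.0, -alpha_i / beta_i), math.inf)
    # Constant weight: shorted by every investor or by none.
    return (0.0, math.inf) if alpha_i < 0 else None
```

An index returning `None` is never shorted by any investor with t ≥ 0, while one returning `(0.0, inf)` is shorted by all of them.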

For questions 4. – 6. consider only the INTERIM period from 01/06/2010 – 10/03/2020.

4. Adding a Riskless Cash Fund and Constructing the Market Portfolio: Consider the

position of an investor from the United States (US) investing in each country’s index. In addition

to the country indices available for investment, a US riskless cash fund P0 is also available to

this investor. The risk free interest rate on P0 was r0 = 0.05 before the GFC and was lowered

to r0 = 0.0025 in December 2008, for both lending and borrowing. Assume that r0 = 0.0025

remains constant over the INTERIM period.

(i) Obtain the investor’s new allocation of their investment to the 21 available assets i.e., the

20 country indices plus a riskless cash fund, P0. State clearly the investor’s position in the

riskless cash fund.

(ii) Describe in detail the Capital Market Line and the tangency portfolio. What can you say

about the tangency portfolio? Explain your result.

(iii) Using the Gross Domestic Product (GDP) of each country devise a method of computing

the market portfolio. Clearly describe and explain your methodology including the data

sources used.

Note: It is strongly suggested that data sources from reputable public institutions and

organisations be used to ensure the accuracy and correctness of results. GDP data is a key

economic figure for nation states and is publicly available and widely reported. Suggested

data sources include the IMF, World Bank, The Economist, OECD etc.
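
For Question 4(i) and (ii), one way to set up the calculation is sketched below. It again assumes the objective t·µ − σ²/2, now with a cash weight x0 satisfying x0 + Σx_i = 1, and uses made-up inputs; `allocate_with_cash` is a hypothetical helper, and the tangency formula assumes the weights of C⁻¹(r − r0·e) sum to a positive number.

```python
import numpy as np

def allocate_with_cash(r, C, r0, t):
    """Cash weight, risky weights and tangency portfolio (assumed objective)."""
    e = np.ones(len(r))
    excess = np.linalg.solve(C, r - r0 * e)   # C^{-1} (r - r0 e)
    x_risky = t * excess                      # weights in the risky assets
    x_cash = 1.0 - x_risky.sum()              # remainder held in (or borrowed from) P0
    tangency = excess / excess.sum()          # risky weights rescaled to sum to 1
    return x_cash, x_risky, tangency

# Hypothetical expected returns and covariance matrix for 3 assets.
r = np.array([0.05, 0.08, 0.12])
C = np.array([
    [0.04, 0.01, 0.00],
    [0.01, 0.09, 0.02],
    [0.00, 0.02, 0.16],
])

x_cash, x_risky, tangency = allocate_with_cash(r, C, r0=0.0025, t=0.20)
```

A negative `x_cash` would mean the investor borrows at the riskless rate. In this setup every investor holds the risky assets in the same proportions as `tangency`, only scaled by t.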

Capital Market Theory

5. The Capital Market Line: Generate a new µσ-plane graphical plot showing the riskless cash

fund P0, tangency portfolio, market portfolio and the Capital Market Line relative to the risky

EF. Calculate the investor’s new optimal portfolio. If the original indices have a net worth of

$100 million, estimate (to the nearest $0.1 million) the total value of each index.
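
The Capital Market Line is the straight line in the µσ-plane through the riskless fund (0, r0) and the tangency portfolio (σ_T, µ_T). A minimal sketch with hypothetical values for µ_T, σ_T and the market weights follows; reading the "total value of each index" as market weight times $100 million is one possible interpretation, labelled here as an assumption.

```python
import numpy as np

def capital_market_line(r0, mu_T, sigma_T, sigmas):
    """Expected return on the CML at each risk level in `sigmas`."""
    slope = (mu_T - r0) / sigma_T
    return r0 + slope * np.asarray(sigmas, dtype=float)

r0 = 0.0025
mu_T, sigma_T = 0.08, 0.20            # hypothetical tangency portfolio
sigmas = np.linspace(0.0, 0.4, 5)
cml = capital_market_line(r0, mu_T, sigma_T, sigmas)

# Assumed reading: index value = market weight x total net worth ($ million).
market_weights = np.array([0.5, 0.3, 0.2])   # hypothetical weights
index_values = np.round(100.0 * market_weights, 1)
```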

6. The Security Market Line: Compute the β’s of all 20 indices and any other relevant assets

in this project and clearly display them on the Security Market Line. Comment on the result

identifying the country indices with a β > 1 and those with a β < 1 and describe what action

Portfolio Theory would recommend an investor to take.
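
The β of an asset relative to a market portfolio M is Cov(R_i, R_M) / Var(R_M). A short sketch of that calculation, using a made-up covariance matrix and made-up market weights (`betas` is a hypothetical helper name):

```python
import numpy as np

def betas(C, w_market):
    """beta_i = Cov(R_i, R_M) / Var(R_M) for market weights w_market."""
    cov_with_market = C @ w_market
    var_market = w_market @ C @ w_market
    return cov_with_market / var_market

# Hypothetical inputs for 3 assets.
r = np.array([0.05, 0.08, 0.12])
C = np.array([
    [0.04, 0.01, 0.00],
    [0.01, 0.09, 0.02],
    [0.00, 0.02, 0.16],
])
w = np.array([0.5, 0.3, 0.2])
r0 = 0.0025

beta = betas(C, w)
mu_market = w @ r

# Security Market Line: return predicted from beta alone.
sml_returns = r0 + beta * (mu_market - r0)
```

The market portfolio itself always has β = 1 and the riskless fund has β = 0, which gives two reference points on the Security Market Line.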

Principal Components Analysis (PCA)

Often when dealing with very large data sets i.e., data sets with high dimensionality and many observations, we want to know what are the underlying factors which produced that data in order to build

predictive models to predict future observations. Furthermore, we want to know which subset of the

underlying factors are responsible for the majority of the variation that we see in the data. This is

so that, instead of building a model with many variables or factors accounting for every observation,

we seek to reduce the dimensionality of the data to a subset of underlying factors and build models

using those factors which explain the essence or most of the observations that we see. This is what

we aim to do in this section with Principal Components Analysis or PCA, where you will implement

PCA from scratch i.e., without using built-in function calls.

In practical applications the singular value decomposition is the main tool for performing PCA

given by the following theorem:

The Singular Value Decomposition (SVD)

Let A be an m × n matrix with rank r. Then, there exists an m × n matrix Σ given by

$$\Sigma = \begin{bmatrix} D & 0 \\ 0 & 0 \end{bmatrix},$$

where D is an r × r diagonal matrix with r ≤ min{m, n} (if r = m or n or both, some or all of the zero

matrices do not appear) for which the diagonal entries in D are the first r singular values of A,

σ1 ≥ σ2 ≥ ... ≥ σr > 0, and there exist an m × m orthogonal matrix U and an n × n orthogonal matrix

V such that

A = UΣV ⊤.

The singular values of A are the square roots of the eigenvalues of A⊤A, denoted by σ1, σ2, ..., σn

and arranged in decreasing order. That is, σi = √λi for 1 ≤ i ≤ n, with σ1 ≥ σ2 ≥ ... ≥ σn ≥ 0.
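
The relationship between the singular values of A and the eigenvalues of A⊤A can be checked numerically. The sketch below uses a random matrix (an assumption for illustration) and calls NumPy's library SVD only as a cross-check, not as part of the construction.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(5, 3))   # hypothetical 5 x 3 matrix

# Eigenvalues of the symmetric matrix A^T A, reordered to be decreasing.
eigvals = np.linalg.eigvalsh(A.T @ A)[::-1]

# Clip tiny negative round-off before taking square roots.
sigma_from_eig = np.sqrt(np.clip(eigvals, 0.0, None))

# Library singular values, returned in decreasing order, for comparison.
sigma_reference = np.linalg.svd(A, compute_uv=False)
```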

7. (i) Prove the following theorem:

Suppose {v1, v2, ..., vn} is an orthonormal basis of Rn consisting of eigenvectors of A⊤A,

arranged so that the corresponding eigenvalues of A⊤A satisfy λ1 ≥ λ2 ≥ ... ≥ λn, and

suppose A has r nonzero singular values. Then {Av1, Av2, ..., Avr} is an orthogonal basis

for the column space, Col A and rank A = r.

(ii) Prove the singular value decomposition using the previous result.

Suppose now that you are given an N × t matrix of observational data

[X1 X2 . . . Xt]

where each Xk is a N × 1 observation vector for k = 1, 2, . . . , t. To prepare the data for PCA we first

put it in mean-deviation form i.e., our data should have a sample mean of zero. Let the sample mean

be given by

$$\bar{X} = \frac{1}{t}\,(X_1 + X_2 + \cdots + X_t)$$

and for k = 1, 2, . . . , t let

$$\hat{X}_k = X_k - \bar{X}.$$

Then the following N × t matrix G given by

$$G = [\hat{X}_1 \ \ \hat{X}_2 \ \ \cdots \ \ \hat{X}_t],$$

has columns of observations in mean-deviation form. Lastly, define the N × N sample covariance

matrix by

$$S = \frac{1}{t-1}\, G G^{\top}.$$
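
The mean-deviation form and the sample covariance matrix defined above translate directly into array operations. A minimal sketch on simulated data (the sizes N and t and the random observations are assumptions), with `np.cov` used only to cross-check the result:

```python
import numpy as np

rng = np.random.default_rng(2)
N, t = 3, 50
X = rng.normal(size=(N, t))   # columns are the observation vectors X_1, ..., X_t

# Sample mean as an N x 1 column, then subtract it from every column.
X_bar = X.mean(axis=1, keepdims=True)
G = X - X_bar

# N x N sample covariance matrix S = G G^T / (t - 1).
S = G @ G.T / (t - 1)
```

With rows as variables and the default normalisation by t − 1, `np.cov(X)` should reproduce `S`.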

Let X be a vector varying over the set of observation vectors and denote the coordinates of X by

x1, x2, . . . , xN. Then, the goal of PCA is to find an orthogonal N × N matrix P = [u1 u2 . . . uN]

that determines a change of variable, X = PY, or

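$$\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{bmatrix} = [u_1 \ \ u_2 \ \ \cdots \ \ u_N] \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}$$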

with the property that the new variables y1, y2, . . . , yN are uncorrelated and are arranged in order of

decreasing variance. The unit eigenvectors u1, u2, . . . , uN of the covariance matrix S are called the

principal components of the data matrix of observations. The new uncorrelated variables y1, y2, . . . , yN

can be determined using

$$X = PY \ \Rightarrow \ Y = P^{-1}X = P^{\top}X.$$
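
The change of variable can be illustrated numerically: taking P as the unit eigenvectors of S, ordered by decreasing eigenvalue, the transformed observations should have a diagonal sample covariance matrix. The sketch below uses simulated mean-deviation data and NumPy's symmetric eigen-solver `np.linalg.eigh`; whether such a routine counts as permitted in Question 8 is not settled here.

```python
import numpy as np

rng = np.random.default_rng(3)
N, t = 4, 200

# Hypothetical correlated observations, put in mean-deviation form.
mixing = rng.normal(size=(N, N))
G = mixing @ rng.normal(size=(N, t))
G = G - G.mean(axis=1, keepdims=True)
S = G @ G.T / (t - 1)

# eigh returns eigenvalues in increasing order, so reverse the ordering.
eigvals, eigvecs = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]
lam = eigvals[order]
P = eigvecs[:, order]          # columns are the principal components

# New variables Y = P^T X and their sample covariance.
Y = P.T @ G
S_Y = Y @ Y.T / (t - 1)
```

Here `S_Y` should equal the diagonal matrix of the eigenvalues `lam` up to round-off, which is the statement that the new variables are uncorrelated with decreasing variances.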

You will now apply PCA to the five periods: GFC, GFC Peak, INTERIM, COVID-19 and COVID-19

Peak.

8. (i) Apply PCA to the country indices data for each of the five periods by setting

$$A = \frac{1}{\sqrt{t-1}}\, G^{\top}$$

and performing SVD without using a direct SVD function call. Show the full steps, code,

any workings and outputs and present the following for each of the five periods:

• The sample covariance matrix S.

• The N = 20 eigenvalues of S.

• The principal components of the data.

(ii) Plot the ordered spectrum λi for i = 1, 2, . . . , 20 for the five periods. What observations

can you make about how much of the total variance is explained by a subset of the principal

components for each period?

(iii) Determine the minimum number of principal components needed to explain at least 95%

of the variation in the data observed for the five periods. Offer an explanation for the

differences (if any), in the minimum number of principal components required over each

period. What conclusions can you make about periods of crisis i.e., the GFC and COVID-19

periods compared to periods of non-crisis i.e., INTERIM?

(iv) Compare the percentage of variation that the largest principal component accounts for,

during the five periods. Do countries behave more similarly during market crises or stable

market conditions? Do you think this would be more or less pronounced if we were to look

at portfolios of equities? Refer to systematic and unsystematic risks in your response.

(v) During which market crisis would you have seen more benefits in portfolio diversification?

Justify your answer by referring to the principal components seen in the different crisis

periods.
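
One possible shape for the Question 8 computation is sketched below on simulated data. It builds the SVD factors of A from the eigen-decomposition of A⊤A rather than from a direct SVD call; it does use `np.linalg.eigh` for the eigen-decomposition, and treating that as permitted is an assumption to confirm. The function name `pca_via_eig` and the test data are hypothetical.

```python
import numpy as np

def pca_via_eig(G):
    """PCA of mean-deviation data G (N x t) via eigenvalues of A^T A."""
    N, t = G.shape
    A = G.T / np.sqrt(t - 1)                  # t x N, so that A^T A = S
    S = A.T @ A                               # sample covariance matrix
    eigvals, V = np.linalg.eigh(S)            # increasing order
    order = np.argsort(eigvals)[::-1]
    lam = np.clip(eigvals[order], 0.0, None)  # decreasing, round-off removed
    V = V[:, order]                           # principal components as columns
    sing = np.sqrt(lam)                       # singular values of A
    r = int(np.sum(sing > 1e-12 * sing[0]))   # numerical rank
    U_r = A @ V[:, :r] / sing[:r]             # left singular vectors u_i = A v_i / sigma_i
    explained = np.cumsum(lam) / lam.sum()    # cumulative share of total variance
    k95 = int(np.searchsorted(explained, 0.95) + 1)
    return S, lam, V, sing, U_r, explained, k95

rng = np.random.default_rng(4)
G = rng.normal(size=(5, 60))                  # hypothetical: N = 5, t = 60
G = G - G.mean(axis=1, keepdims=True)
S, lam, V, sing, U_r, explained, k95 = pca_via_eig(G)
```

`k95` is the smallest number of leading components whose eigenvalues sum to at least 95% of the trace of S, and `lam[0] / lam.sum()` is the share explained by the largest component.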

The University of Sydney

School of Mathematics and Statistics

Assignment Cover Sheet

MATH2070/2970: Optimisation and Financial Mathematics Semester 2, 2020

Web Page: http://www.maths.usyd.edu.au/u/IM/MATH2070/

Lecturers: Chunxi Jiao, Desmond Ng and Nicholas James (Computer Lab Designer)

Family Name . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Given Names . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . SID . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Legitimate cooperation between students on assignments is encouraged, since it can be a real aid to

understanding. It is legitimate for students to discuss assignment questions at a general level, provided

everybody involved makes some contribution. However, students must produce their own individual

written solutions. Copying someone else’s work is plagiarism, and is unacceptable.

I certify that:

• I have read and understood the University of Sydney Student Plagiarism: Coursework Policy

and Procedure at

http://sydney.edu.au/policies/showdoc.aspx?recnum=PDOC2012/254&RendNum=0.

• this assignment is all my own work, and that no part of this assignment has been copied from

another person.

• I have not allowed my work to be copied by another person.

Signature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Date . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

The University may impose severe penalties for plagiarism

This part to be completed by the marker:

Grand total out of 40 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
