DM561 - Linear Algebra with Applications

Sheet 5

Task 1

Write a function that accepts an $m \times n$ matrix $A$ of rank $n$ and a vector $\mathbf{b}$ of length $n$. Use the QR decomposition to solve the normal equations corresponding to $A\mathbf{x} = \mathbf{b}$.

You may use either SciPy’s QR routine or write your own QR routine. In addition, you may use la.solve_triangular(), SciPy’s optimized routine for solving triangular systems. Compare the results of your function with the hand-computed results on the small example generated by the data set $\{(0,1),(2,1),(2,2)\}$:

Task 2

The file housing.npy (Numpy provides methods for saving and loading arrays. This file has been saved with np.save() and the array can be loaded with np.load("housing.npy")) contains the purchase-only housing price index, a measure of how housing prices are changing, for the United States from 2000 to 2010. Each row in the array is a separate measurement; the columns are the year and the price index, in that order. To avoid large numerical computations, the year measurements start at 0 instead of 2000.

Find the least squares line that relates the year to the housing price index (i.e., let year be the $x$-axis and index the $y$-axis).

Task 3

The data in housing.npy is nonlinear, and might be better fit by a polynomial than a line.

Write a function that uses the procedure presented in the slides to calculate the polynomials of degree $3$, $6$, $9$, and $12$ that best fit the data. Plot the original data points and each least squares polynomial together in individual subplots. (Hint: define a separate, refined domain with np.linspace() and use this domain to smoothly plot the polynomials.)

Instead of using the procedure developed in Task 1 to solve the normal equations, you may use SciPy’s least squares routine, scipy.linalg.lstsq().

>>> from scipy import linalg as la

# Define A and b appropriately. Get help with the `np.vander` function
# for the construction of A

# Solve the normal equations using SciPy's least squares routine.
# The least squares solution is the first of four return values.
>>> x = la.lstsq(A, b)[0]

Compare your results to np.polyfit(). This function receives an array of $x$ values, an array of $y$ values, and an integer for the polynomial degree, and returns the coefficients of the best fit polynomial of that degree.

Task 4

Consider a case of multiple regression, that is, a prediction task with more than one predictor. Construct a set of training data randomly generated with Numpy functions to create matrices. Use two input variables $x_1$ and $x_2$ and one output variable $y$.

For example, you can take $x_1\in [0,1]$ $x_2\in [0,1]$ and compute the corresponding $y$ values with an unknown target function:

where $\epsilon$ is an exponentially distributed noise with mean 1.

Fit the model

using the routines developed in the previous tasks, preferably the ones from Task 1, alternatively scipy.linalg.lstsq().

Use the model found to calculate the training error, that is the sum of squared errors (or the root of it) on the same set of data used for the estimation of the model coefficients.

Plot the points and the surface of the fitted model in a 3D plot.

Repeat the operations above with the model

Which of the two models is the best?

Task 5 (Optional)

For this task you need to study the slides on “Fitting a circle” that were not presented in class.

The general equation for an ellipse is [ax^2 + bx + cxy + dy + ey^2 = 1.] Write a function that calculates the parameters for the ellipse that best fits the data in the file ellipse.npy. Plot the original data points and the ellipse together, using the following function to plot the ellipse.

def plot_ellipse(a, b, c, d, e):
    """Plot an ellipse of the form ax^2 + bx + cxy + dy + ey^2 = 1."""
    theta = np.linspace(0, 2*np.pi, 200)
    cos_t, sin_t = np.cos(theta), np.sin(theta)
    A = a*(cos_t**2) + c*cos_t*sin_t + e*(sin_t**2)
    B = b*cos_t + d*sin_t
    r = (-B + np.sqrt(B**2 + 4*A)) / (2*A)
    plt.plot(r*cos_t, r*sin_t, lw=2)
    plt.gca().set_aspect("equal", "datalim")