## Sheet 5

### Task 1

Write a function that accepts an $m \times n$ matrix $A$ of rank $n$ and a vector $\mathbf{b}$ of length $n$. Use the QR decomposition to solve the normal equations corresponding to $A\mathbf{x} = \mathbf{b}$.

You may use either SciPy’s QR routine or write your own QR routine.
In addition, you may use `la.solve_triangular()`

, SciPy’s optimized
routine for solving triangular systems. Compare the results of your
function with the hand-computed results on the small example generated
by the data set $\{(0,1),(2,1),(2,2)\}$:

### Task 2

The file `housing.npy`

(Numpy provides methods for saving and loading arrays. This file has
been saved with `np.save()`

and the array can be loaded with
`np.load("housing.npy")`

) contains the purchase-only housing price
index, a measure of how housing prices are changing, for the
United States
from 2000 to 2010. Each row in the array is a separate measurement; the
columns are the year and the price index, in that order. To avoid large
numerical computations, the year measurements start at 0 instead of
2000.

Find the least squares line that relates the year to the housing price index (i.e., let year be the $x$-axis and index the $y$-axis).

- Construct the matrix $A$ and the vector $\mathbf{b}$ described in the
slides. (Hint:
`np.vstack()`

,`np.column_stack()`

, and/or`np.ones()`

may be helpful.) - Use your function from the previous task to find the least squares solution.
- Plot the data points as a scatter plot.
- Plot the least squares line with the scatter plot.

### Task 3

The data in `housing.npy`

is nonlinear, and might be better fit by a
polynomial than a line.

Write a function that uses the procedure presented in the slides to
calculate the polynomials of degree $3$, $6$, $9$, and $12$ that best
fit the data. Plot the original data points and each least squares
polynomial together in individual subplots. (Hint: define a separate,
refined domain with `np.linspace()`

and use this domain to smoothly
plot the polynomials.)

Instead of using the procedure developed in Task 1 to solve the normal
equations, you may use SciPy’s least squares routine,
`scipy.linalg.lstsq()`

.

```
>>> from scipy import linalg as la
# Define A and b appropriately. Get help with the `np.vander` function
# for the construction of A
# Solve the normal equations using SciPy's least squares routine.
# The least squares solution is the first of four return values.
>>> x = la.lstsq(A, b)[0]
```

Compare your results to `np.polyfit()`

. This function receives an
array of $x$ values, an array of $y$ values, and an integer for the
polynomial degree, and returns the coefficients of the best fit
polynomial of that degree.

### Task 4

Consider a case of multiple regression, that is, a prediction task with more than one predictor. Construct a set of training data randomly generated with Numpy functions to create matrices. Use two input variables $x_1$ and $x_2$ and one output variable $y$.

For example, you can take $x_1\in [0,1]$ $x_2\in [0,1]$ and compute the corresponding $y$ values with an unknown target function:

where $\epsilon$ is an exponentially distributed noise with mean 1.

Fit the model

using the routines developed in the previous tasks, preferably the ones
from Task 1, alternatively `scipy.linalg.lstsq()`

.

Use the model found to calculate the training error, that is the sum of squared errors (or the root of it) on the same set of data used for the estimation of the model coefficients.

Plot the points and the surface of the fitted model in a 3D plot.

Repeat the operations above with the model

Which of the two models is the best?

### Task 5 (Optional)

For this task you need to study the slides on “Fitting a circle” that were not presented in class.

The general equation for an ellipse is
[ax^2 + bx + cxy + dy + ey^2 = 1.] Write a function that calculates
the parameters for the ellipse that best fits the data in the file
`ellipse.npy`

. Plot the
original data points and the ellipse together, using the following
function to plot the ellipse.

```
def plot_ellipse(a, b, c, d, e):
"""Plot an ellipse of the form ax^2 + bx + cxy + dy + ey^2 = 1."""
theta = np.linspace(0, 2*np.pi, 200)
cos_t, sin_t = np.cos(theta), np.sin(theta)
A = a*(cos_t**2) + c*cos_t*sin_t + e*(sin_t**2)
B = b*cos_t + d*sin_t
r = (-B + np.sqrt(B**2 + 4*A)) / (2*A)
plt.plot(r*cos_t, r*sin_t, lw=2)
plt.gca().set_aspect("equal", "datalim")
```