source("../R/utilities.R")
  1. The point of this unit (lesson or lessons) is to illustrate the basic concepts of linear algebra visually, using plane figures. The basic concepts include
    • Translations, rotations and scaling.
    • Matrix multiplication; composition of operators.
    • Matrix inversion
    • Linearity.
  2. Translation
    • We can specify any point in the plane by its x and y coordinates, say x=1, y=2.
    • If the x and y coordinates are concatenated to form a vector, say, c(1, 2), this vector represents a point’s position.
    • Addition of vectors represents movement, a.k.a. “translation.”
    • If we start at position c(1, 2) and add c(-3, 1.5) we move the x coordinate 3 units to the left, and the y coordinate 1.5 units up.
    • Vectors are added by adding their coordinates. Does the order of vector addition matter?
makegrid(-5:5, main="Vectors represent position.\nVector addition represents movement.", xlab="", ylab="")
points(1, 2, cex=2, pch=19, col="red")
points(1-3, 2+1.5, cex=2, pch=19, col="blue")
arrows(1, 2, 1-3, 2+1.5, lty=1, lwd=2)
text(1, 2, expression(c(1,2)), pos=1, col="red")
text(1-3, 2+1.5, expression(c(1,2) + c(-3, 1.5)), pos=3, col="blue")

  1. Scaling
    • Vectors represent positions, and vector addition represents movement or translation.
    • Matrices represent rotation and scaling. Let’s begin with scaling.
    • To scale a vector you multiply its x coordinate by one number and its y coordinate by another.
    • Multiplication by a diagonal matrix represents this process.
    • Make a 2x2 diagonal matrix like this D <- diag(c(0.75, 0.25), 2, 2) and print it.
    • The matrix is called diagonal because the only non-zero elements are on its diagonal.
    • In R %*% represents matrix multiplication. It is a different operation than *.
    • Multiply the vector c(1,1) by D. You get a row vector if you use c(1,1) %*% D and a column vector if you use D %*% c(1,1). Note the correspondence between the resulting coordinates and the matrix diagonals.
    D <- diag(c(0.75, 0.25), 2, 2)
    D
    ##      [,1] [,2]
    ## [1,] 0.75 0.00
    ## [2,] 0.00 0.25
    D %*% c(1,1)
    ##      [,1]
    ## [1,] 0.75
    ## [2,] 0.25
    • Scaling stretches or squashes things. Here’s what happens when we take a circle of radius 1 and multiply every point on its circumference by your matrix D.
    makegrid(seq(-2, 2, by=1), main="Scaling a circle by D", major_axes=FALSE)
    circle <- ellipse(1)
    polygon(circle, lwd=2)
    polygon(circle %*% D, lwd=2, border="red")
    temp <- seq(0, 2*pi, by=pi/4)
    arrows(cos(temp), sin(temp), D[1,1]*cos(temp), D[2,2]*sin(temp), lwd=2, col="blue", len=sqrt(sum(par("fin")^2))/100)
    text(-1.5, 0, "D[1,1]*x", col="blue")
    text(0, 1, "D[2,2]*y", pos=3, col="blue")

  2. Rotation
    • We’ve seen that our diagonal matrix, D, makes a circle into an ellipse. But D only has two non-zero entries. What are its other entries for?
    • To answer that question consider matrices of a different special form. \[U = \left[ \begin{array}{ccc} cos(\theta) & -sin(\theta) \\ sin(\theta) & cos(\theta) \end{array} \right]\]
    # NOTE: To display this as a plot use something like
    plot(0:1, 0:1, type='n', xlab="", ylab="", xaxt="n", yaxt="n", bty="n")
    text(.5,.5, expression(U == bgroup("[", atop(list(cos(theta), -sin(theta)), list(sin(theta), cos(theta))),"]")), cex=3)
    • We’ve created such a matrix, U, with \(\theta = \pi/3 \: radians\), which is the same as \(60 \: degrees.\)
    • Beginning with p <- c(1,0), we’ll repeatedly set p <- U %*% p and plot the result of each iteration.
    U <- matrix(c(cos(pi/3), sin(pi/3), -sin(pi/3), cos(pi/3)), 2, 2)
    makegrid(-2:2, main="Rotation matrix")
    p <- c(1,0)
    text(1,1.5, expression(U == bgroup("[", atop(list(cos(theta), -sin(theta)), list(sin(theta), cos(theta))),"]")), cex=1)
    for(i in 1:6){
      points(p[1], p[2], cex=2, pch=19, col="red")
      text(p[1], p[2], as.character(i-1), pos=2)
      ptemp <- U %*% p
      p <- ptemp
      Sys.sleep(.25) # Make a movie
    }
    • The matrix, U, rotated the first point, labeled 0, 60 degrees to point 1. Then it rotated point 1 another 60 degrees to point 2, point 2 to point 3, and so on.
    • Suppose we multiply U by itself using U2 <- U %*% U, and suppose we repeat the process with U2.
    U2 <- U %*% U
    makegrid(-2:2, main="Rotation matrix squared")
    p <- c(1,0)
    for(i in 1:6){
      points(p[1], p[2], cex=2, pch=19, col="red")
      p <- U %*% p
    }
    p <- c(1,0)
    for(i in 1:3){
      points(p[1], p[2], cex=3, pch=21, col="blue", lwd=3)
      text(p[1]-.1, p[2], as.character(i-1), pos=2)
      p <- U2 %*% p
      Sys.sleep(.5)
    }
    • The squared matrix, U2, rotated the first point 120 degrees to point 1. Then it rotated point 1 another 120 degrees to point 2. Applying the square of U once is like applying U twice.
    • This is a general rule: applying the product of two matrices is like applying first the one and then the other. As we’ll see, though, we must pay careful attention to order.
    • We’ll show this by creating a circle, then applying our two matrices, D, and U, to it in two different orders.
    circle <- ellipse(1)
    makegrid(-2:2, major_axes=FALSE, main="Matrices: order matters")
    polygon(circle, lwd=2)
    # Function polygon expects a matrix whose rows contain the x and y coordinates
    # of vertices. Hence the use of transpose.
    # TODO: clean up the use of left and right multiplication by matrices.
    polygon(t(D %*% U %*% t(circle)), lwd=2, border="red")
    text(0, 1.5, "D %*% U %*% circle", cex=1.5, col="red")
    polygon(t(U %*% D %*% t(circle)), lwd=2, border="blue")
    text(0, -1.5, "U %*% D %*% circle", cex=1.5, col="blue")
    • U is a rotation, and rotating a circle about its center doesn’t change it; U %*% circle = circle. So, D %*% U %*% circle gives the same result as D %*% circle.
    • However, if we apply D first, we change the circle into an ellipse by scaling. If we then apply U to the ellipse, we rotate it by 60 degrees.
    • Show that D %*% U != U %*% D. (They are almost transposes but not quite.)
  3. The general case
    • We’ve seen that some 2x2 matrices represent scaling and some represent rotation. We’ve also seen that multiplying matrices corresponds to applying them in sequence, and that the order in which they are applied matters.
    • Scaling and rotation are special cases. Is it possible that any 2x2 matrix can be represented as a product of scaling and rotation matrices?
    • As it turns out, the answer is yes provided complex numbers are allowed. Any matrix can be represented as a “rotation”, followed by “scaling”, followed by another “rotation” as long as the quoted words are properly generalized.
    • The generalized form of a “scaling” matrix is a diagonal matrix which may contain complex numbers.
    • The generalized form of a “rotation” matrix is a matrix which does not change the length, that is the distance from the origin, of any vector to which it is applied.
    • A representation in terms of “rotation”, then “scaling”, then “rotation”, is called a Singular Value Decomposition. (The term “singular value” refers to an element of the diagonal matrix.) We will not cover singular value decompositions in these introductory lessons, except to point out R’s svd function.
    • svd will perform a singular value decomposition. For illustration, form a random 2x2 matrix, N, as follows N <- matrix(rnorm(4), 2, 2), and let s <- svd(N). Inspect s, compare s$u %*% s$d %*% s$v with N, and so on.
  4. Inverses
    • A rotation can clearly be reversed: you just apply a rotation of the same amount in the opposite direction. To reverse a rotation of 60 degrees, you apply a rotation of minus 60 degrees. Two operations which cancel each other out like this are called inverse operations. Their associated matrices are called inverses.
    • If you simply exchange, or transpose, the rows and columns of a rotation matrix you get its inverse. R’s function, t, will form the transpose of a matrix. See what happens when you multiply R %*% t(R) and t(R) %*% R.
    • You get a diagonal matrix which scales both coordinates by 1. Since multiplication by 1 doesn’t change anything, this matrix is called the identity.
    • If rotations always have inverses, and if every matrix can be written as a rotation times a diagonal matrix times another rotation, does every diagonal matrix have an inverse?
    • The inverse of a diagonal matrix can be formed by taking the reciprocals of its diagonal entries.
    • Recall that D is diagonal. Form the matrix iD <- matrix(c(1/D[1,1], 0, 0, 1/D[2,2])) and look at the matrices D %*% iD and iD %*% D.
    • Does every diagonal matrix have an inverse? When not?
    • Does every matrix have an inverse?
    • Introduce solve(M) which returns the inverse of M.
  5. Linearity
    • Linear algebra called linear because matrices are linear operators. A formal definition of linearity is given in the figure.
    # NOTE: To display this as a plot use something like
    plot(0:1, 0:1, type='n', xlab="", ylab="", xaxt="n", yaxt="n", bty="n", main="Linearity", cex.main=2)
    text(0, .9, expression(paste("For any matrix, ", M, ",")), cex=1.5, pos=4)
    text(0, .8, expression(paste("and any vectors, ", hat(u)," and ", hat(v), ",")), cex=1.5, pos=4)
    text(0, .7, expression(paste("and any numbers, ", alpha," and ", beta, ",")), cex=1.5, pos=4)
    text(.5, .4, expression(M %*% bgroup("(", alpha*hat(u)+beta*hat(v),")") == alpha*M%*%hat(u)+beta*M%*%hat(v)), cex=2)
    text(0, .1, "where x means matrix multiplication.", cex=1.5, pos=4)
    • A closer look at matrix multiplication suggests another reason. Each component of the result on the right is a linear combination, or weighted sum, of x and y. In other words, no squares, products, powers, or more complicated functions of x and y appear in the result.
    # NOTE: To display this as a plot use something like
    plot(0:1, 0:1, type='n', xlab="", ylab="", xaxt="n", yaxt="n", bty="n", main="Linearity", cex.main=2)
    text(.5, .5, expression(bgroup("[", atop(list(M[1][1], M[1][2]), list(M[2][1], M[2][2])),"]") %*%  bgroup("[", atop(x, y), "]") == bgroup("[", atop(M[1][1]*x+M[1][2]*y, M[2][1]*x + M[2][2]*y), "]")), cex=2)
    • Linearity is a very handy property. Earlier we saw that when you apply two matrices in sequence, the result is the same as applying the product of those matrices once. The same applies to three, or four, if you think about it: applying any number of matrices in sequence is like applying their product once, as long as you are careful to multiply them in the right order. If you start with matrices, you end up with matrices. You never get anything more complicated.
  6. I had originally planned to include quadratic forms and distance in this unit, but I now think it makes more sense to cut off the visual introductions to calculus and linear algebra at this point for fear of overdoing it. More advanced topics can always be illustrated with visual examples later on.