
Symbols and Units

In the first chapter, we went over numbers of various sorts. We stopped short of fully deriving the rules of arithmetic, and in this course we will assume - for better or worse - that the reader can do basic arithmetic without using a calculator. That is, if you cannot yet add, subtract, multiply and divide ordinary (e.g. integer, or rational) numbers, it is time for you to back up and start acquiring these basic life skills before tackling a physics course. A calculator doing ill-understood magic is no substitute for an educated human brain.

Similarly, if you don't know what square roots (or cube roots or $n$th roots) or general powers of numbers are, and can't find or recognize (for example) the square root of 25 or the cube root of 27 when asked, if you don't know (or cannot easily compute) that $ 2^5 = 32$ , that $ 10^3 = 1000$ , that $ 5^{3/2} = 5\sqrt{5}$ , it's going to be very difficult to follow algebra that requires you to take symbols that stand for these quantities and abstract them as $ \sqrt{a}$ , or $ b^{3/2}$ .

The good news, though, is that the whole point of algebra is to divorce doing arithmetic to the greatest extent possible from the process of mathematical reasoning needed to solve any given problem. Most physicists, myself included, hate doing boring arithmetic as much as you do. That's why we don't spend much time doing it - that's what abaci and slide rules (just kidding), calculators, and computers are for. But no arithmetic-doing machine will give you the right answer to a complex problem unless you have solved the problem so that you know exactly what to ask it to do.

Enter algebra.

Suppose I want to add two numbers, but I don't know what they are yet. I know, this sounds a bit bizarre - how can I know I'll need to add them without knowing what they are? If you think about it, though, this sort of thing happens all of the time! A relay race is run, and you are a coach timing the splits at practice. You click a stop watch when each baton is passed and when the final runner crosses the finish line. You are going to want to know the total time for the race, but you used up all of your stop watches timing the splits. So you are going to need to add the four times even though you don't yet know what they are.

Except that you don't want to do the addition. That's what junior assistant coaches are for. So you give the junior assistant coach a set of instructions. Verbally, they are "Take the time for the first runner, the second runner, the third runner, and the fourth runner and add them to find the total time for the relay team in the practice race".

But that's awfully wordy. So you invent symbols for the four times that need to be summed. Something pithy, yet informative. It can't be too complex, as your junior assistant coach isn't too bright. So you use (say) the symbol `$ t$ ' for time, and use an index to indicate the runner/stop watch. The splits become: $ t_1, t_2, t_3, t_4$ . Your rule becomes:

$\displaystyle t_{\rm tot} = t_1 + t_2 + t_3 + t_4$ (2)

But that is still a bit long, and what happens if you want to form the average of all of the splits run during a day? There might be 40 or 50 of them! That's way too tedious to write out. So you invent another symbol, one that stands for summation. Summation begins with an `S', but if we use English alphabet symbols for operations, we risk confusing people if we also use them for quantities. So we pick a nice, safe Greek letter S, the capital sigma: $ \Sigma$ :

$\displaystyle t_{\rm tot} = \sum_{i=1}^4 t_i = t_1 + t_2 + t_3 + t_4$ (3)

Note that we've invented/introduced another symbol, $ i$ . $ i$ stands for the index of the time being added. We decorate the $ \Sigma$ in the summation symbol with the limits of the sum. We want to start with the first ($ i = 1$ ) time and end the repetitive summing with the last ($ i = 4$ ) time. Now we can easily change our instructions to the junior assistant coach:

$\displaystyle t_{\rm tot} = \sum_{i=1}^{52} t_i = t_1 + t_2 + \ldots + t_{52}$ (4)

for the four splits in each of the thirteen relay races run during practice.
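
The summation rule maps directly onto code. Here is a minimal sketch with made-up split times (the numbers are purely illustrative, not from the text):

```python
# Hypothetical split times in seconds for four runners (made-up values).
splits = [12.3, 11.8, 12.1, 11.9]

# t_tot is the sum over i of t_i; Python's sum() plays the role of the
# capital sigma in the text.
t_tot = sum(splits)
```

The same one-liner handles 4 splits or 52, which is exactly the economy the $ \Sigma$ notation buys on paper.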

OK, so at this point we'll abandon the coaching metaphor, but the point should be clear. Accountants need ways to represent adding up your bank accounts, even though they don't know how many accounts you have coming into the office. The army needs ways of representing how much food to get for $ N$ soldiers who will spend $ D$ days in the field, each of them eating $ M$ meals per day - and at the end needs to be able to compute the total cost of providing it, given a cost per meal of $ Q$ . At some point, algebra becomes more than a vehicle for defining arithmetic that needs to be done - it becomes a way of reasoning about whole systems of things - like that last example. One can reduce the army's food budget for a campaign to an algebraic form for the cost $ C$ :

$\displaystyle C = N \cdot D \cdot M \cdot Q$ (5)

That's a lot shorter, and easier to check, than the paragraphs of text that would be needed to describe it.
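
The algebraic form is also immediately computable. A quick sketch with invented numbers ($ N$ , $ D$ , $ M$ , $ Q$ values are assumptions, not figures from the text):

```python
# Hypothetical campaign: 1000 soldiers, 30 days, 3 meals per day,
# $4.50 per meal (all made-up numbers).
N, D, M, Q = 1000, 30, 3, 4.50

# Total cost C = N * D * M * Q, straight from the algebraic form.
C = N * D * M * Q
```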

Coordinate Systems, Points, Vectors

Review of Vectors


Most motion is not along a straight line. In fact, almost no motion is along a line. We therefore need to be able to describe motion along multiple dimensions (usually 2 or 3). That is, we need to be able to consider and evaluate vector trajectories, velocities, and accelerations. To do this, we must first learn about what vectors are, how to add, subtract or decompose a given vector into its cartesian components (or equivalently how to convert between the cartesian, polar/cylindrical, and spherical coordinate systems), and what scalars are. We will also learn a couple of products that can be constructed from vectors.

A vector in a coordinate system is a directed line between two points. It has magnitude and direction. Once we define a coordinate origin, each particle in a system has a position vector (e.g. - $ \vec{A}$ ) associated with its location in space drawn from the origin to the physical coordinates of the particle (e.g. - ( $ A_x,A_y,A_z$ )):

$\displaystyle \vec{A} = A_x \hat{x} + A_y \hat{y} + A_z \hat{z}$

The position vectors clearly depend on the choice of coordinate origin. However, the difference vector or displacement vector between two position vectors does not depend on the coordinate origin. To see this, let us consider the addition of two vectors:

$\displaystyle \vec{A} + \vec{B} = \vec{C}$


Note that vector addition proceeds by putting the tail of one at the head of the other, and constructing the vector that completes the triangle. To numerically evaluate the sum of two vectors, we determine their components and add them componentwise, and then reconstruct the total vector:

$\displaystyle C_x = A_x + B_x$
$\displaystyle C_y = A_y + B_y$
$\displaystyle C_z = A_z + B_z$
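
Componentwise addition is a one-line loop in code. A minimal sketch, with vectors represented as plain tuples:

```python
def vector_add(A, B):
    """Add two vectors componentwise: C_i = A_i + B_i."""
    return tuple(a + b for a, b in zip(A, B))

C = vector_add((1.0, 2.0, 3.0), (4.0, -2.0, 0.5))
```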

If we are given a vector in terms of its length (magnitude) and orientation (direction angle(s)) then we must evaluate its cartesian components before we can add them (for example, in 2D):

$\displaystyle A_x = A\cos(\theta_A) \qquad B_x = B\cos(\theta_B)$
$\displaystyle A_y = A\sin(\theta_A) \qquad B_y = B\sin(\theta_B)$

This process is called decomposing the vector into its cartesian components.
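
Decomposition is equally mechanical in code. A sketch, assuming the direction angle is measured in radians from the $ x$ axis:

```python
import math

def decompose(magnitude, theta):
    """Cartesian components (A_x, A_y) of a 2D vector with the given
    magnitude and direction angle theta (in radians)."""
    return magnitude * math.cos(theta), magnitude * math.sin(theta)

# A vector of length 2 at 30 degrees above the x axis.
Ax, Ay = decompose(2.0, math.pi / 6)
```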

The difference between two vectors is defined by the addition law. Subtraction is just adding the negative of the vector in question, that is, the vector with the same magnitude but the opposite direction. This is consistent with the notion of adding or subtracting its components. Note well: Although the vectors themselves may depend upon coordinate system, the difference between two vectors (also called the displacement if the two vectors are, for example, the position vectors of some particle evaluated at two different times) does not.

When we reconstruct a vector from its components, we are just using the law of vector addition itself, by scaling some special vectors called unit vectors and then adding them. Unit vectors are (typically perpendicular) vectors that define the essential directions and orientations of a coordinate system and have unit length. Scaling them involves multiplying these unit vectors by a number that represents the magnitude of the vector component. This scaling number has no direction and is called a scalar. Note that the product of a vector and a scalar is always a vector:

$\displaystyle \vec{B} = C\vec{A}$

where $ C$ is a scalar (number) and $ \vec{A}$ is a vector. In this case, $ \vec{A}\ \vert\vert\ \vec{B}$ ($ \vec{A}$ is parallel to $ \vec{B}$ ).

In addition to multiplying a scalar and a vector together, we can define products that multiply two vectors together. By ``multiply'' we mean that if we double the magnitude of either vector, we double the resulting product - the product is proportional to the magnitude of either vector. There are two such products for the ordinary vectors we use in this course, and both play extremely important roles in physics.

The first product creates a scalar (ordinary number with magnitude but no direction) out of two vectors and is therefore called a scalar product or (because of the multiplication symbol chosen) a dot product. A scalar is often thought of as being a ``length'' (magnitude) on a single line. Multiplying two scalars on that line creates a number that has the units of length squared but is geometrically not an area. By selecting as a direction for that line the direction of the vector itself, we can use the scalar product to define the length of a vector as the square root of the scalar product of the vector with itself:

$\displaystyle A = \vert\vec{A}\vert = +\sqrt{\vec{A} \cdot \vec{A}}$


From this usage it is clear that a scalar product of two vectors can never be thought of as an area. If we generalize this idea (preserving the need for our product to be symmetrically proportional to both vectors), we obtain the following definition for the general scalar product:

$\displaystyle \vec{A} \cdot \vec{B} = A_x B_x + A_y B_y + \ldots = \vert\vec{A}\vert\vert\vec{B}\vert\cos(\theta_{AB})$

This definition can be put into words - a scalar product is the length of one vector (either one, say $ \vert\vec{A}\vert$ ) times the component of the other vector ( $ \vert\vec{B}\vert\cos(\theta_{AB})$ ) that points in the same direction as the vector $ \vec{A}$ . Alternatively it is the length $ \vert\vec{B}\vert$ times the component of $ \vec{A}$ parallel to $ \vec{B}$ , $ \vert\vec{A}\vert\cos(\theta_{AB})$ . This product is symmetric and commutative ($ \vec{A}$ and $ \vec{B}$ can appear in either order or role).
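
Both forms of the scalar product are easy to check numerically. A sketch:

```python
import math

def dot(A, B):
    """Scalar product in component form: A_x B_x + A_y B_y + ..."""
    return sum(a * b for a, b in zip(A, B))

A = (3.0, 0.0)
B = (2.0, 2.0)

# The geometric form |A||B|cos(theta_AB) must agree with dot(A, B).
magA = math.sqrt(dot(A, A))     # length of A from A . A
magB = math.sqrt(dot(B, B))
cos_theta = dot(A, B) / (magA * magB)
```

Here $ \vec{B}$ makes a 45 degree angle with $ \vec{A}$ , so cos_theta comes out $ 1/\sqrt{2}$ .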

The other product multiplies two vectors in a way that creates a third vector. It is called a vector product or (because of the multiplication symbol chosen) a cross product. Because a vector has magnitude and direction, we have to specify the product in such a way that both are defined, which makes the cross product more complicated than the dot product.

As far as magnitude is concerned, we already used the non-areal combination of vectors in the scalar product, so what is left is the product of two vectors that makes an area and not just a ``scalar length squared''. The area of the parallelogram defined by two vectors is just:

$\displaystyle \vert\vec{A} \times \vec{B}\vert = \vert\vec{A}\vert\vert\vec{B}\vert\sin(\theta_{AB})$

which we can interpret as ``the magnitude of $ \vec{A}$ times the component of $ \vec{B}$ perpendicular to $ \vec{A}$ '' or vice versa. Let us accept this as the magnitude of the cross product (since it clearly has the proportional property required) and look at the direction.

The area is nonzero only if the two vectors do not point along the same line. Since two non-colinear vectors always lie in (or define) a plane (in which the area of the parallelogram itself lies), and since we want the resulting product to be independent of the coordinate system used, one sensible direction available for the product is along the line perpendicular to this plane. This still leaves us with two possible directions, though, as the plane has two sides. We have to pick one of the two possibilities by convention so that we can communicate with people far away, who might otherwise use a counterclockwise convention to build screws when we used a clockwise convention to order them, whereupon they send us left handed screws for our right handed holes and everybody gets all irritated and everything.

We therefore define the direction of the cross product using the right hand rule:

Let the fingers of your right hand lie along the direction of the first vector in a cross product (say $ \vec{A}$ below). Let them curl naturally through the small angle (observe that there are two, one of which is larger than $ \pi$ and one of which is less than $ \pi$ ) into the direction of $ \vec{B}$ . The erect thumb of your right hand then points in the general direction of the cross product vector - it at least indicates which of the two perpendicular lines should be used as a direction, unless your thumb and fingers are all double jointed or your bones are missing or you used your left-handed right hand or something.

Putting this all together mathematically, one can show that the following are two equivalent ways to write the cross product of two three dimensional vectors. In components:

$\displaystyle \vec{A} \times \vec{B} = (A_y B_z - A_z B_y)\hat{x} + (A_z B_x - A_x B_z)\hat{y} + (A_x B_y - A_y B_x)\hat{z}$

where you should note that $ x,y,z$ appear in cyclic order (xyz, yzx, zxy) in the positive terms and have a minus sign when the order is anticyclic (zyx, yxz, xzy). The product is antisymmetric and non-commutative. In particular

$\displaystyle \vec{A} \times \vec{B} = - \vec{B} \times \vec{A}$

or the product changes sign when the order of the vectors is reversed.

Alternatively in many problems it is easier to just use the form:

$\displaystyle \vert\vec{A} \times \vec{B}\vert = \vert\vec{A}\vert\vert\vec{B}\vert\sin(\theta_{AB})$

to compute the magnitude and assign the direction literally by (right) ``hand'', along the right-handed normal to the $ AB$ plane according to the right-hand rule above.
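
The component form of the cross product translates directly into code. A minimal sketch for 3D tuples:

```python
def cross(A, B):
    """Vector product of two 3D vectors, components in cyclic
    (x, y, z) order."""
    return (A[1] * B[2] - A[2] * B[1],
            A[2] * B[0] - A[0] * B[2],
            A[0] * B[1] - A[1] * B[0])

# Right hand rule check: x-hat cross y-hat should be z-hat.
z_hat = cross((1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
```

Swapping the arguments flips the sign of every component, reflecting the antisymmetry noted above.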

Note that this axial property of cross products is realized in nature by things that twist or rotate around an axis. A screw advances into wood when twisted clockwise, and comes out of wood when twisted counterclockwise. If you let the fingers of your right hand curl around the screw in the direction of the twist your thumb points in the direction the screw moves, whether it is in or out of the wood. Screws are therefore by convention right handed.

One final remark before leaving vector products. We noted above that scalar products and vector products are closely connected to the notions of length and area, but mathematics per se need not specify the units of the quantities multiplied in a product (that is the province of physics, as we shall see). We have numerous examples where two different kinds of vectors (with different units but referred to a common coordinate system for direction) are multiplied together with one or the other of these products. In actual fact, there often is a buried squared length or area (which we now agree are different kinds of numbers) in those products, but it won't always be obvious in the dimensions of the result.

Two of the most important uses of the scalar and vector product are to define the work done as the force through a distance (using a scalar product as work is a scalar quantity) and the torque exerted by a force applied at some distance from a center of rotation (using a vector product as torque is an axial vector). These two quantities (work and torque) have the same units and yet are very different kinds of things. This is just one example of the ways geometry, algebra, and units all get mixed together in physics.

At first this will be very confusing, but remember, back when you were in third grade multiplying integer numbers was very confusing, and yet rational numbers, irrational numbers, general real numbers, and even complex numbers were all waiting in the wings. This is more of the same, but all of the additions will mean something and have a compelling beauty that comes out as you study them. Eventually it all makes very, very good sense.


Functions

One of the most important concepts in algebra is that of the function. The formal mathematical definition of the term function is beyond the scope of this short review, but the summary below should be more than enough to work with.

A function is a mapping between a set of coordinates (which is why we put this section after the section on coordinates) and a single value. Note well that the ``coordinates'' in question do not have to be space and/or time, they can be any set of parameters that are relevant to a problem. In physics, the coordinates can be any or all of the parameters that describe the system: spatial coordinates, time, mass, charge, and so on.

Note well that many of these things can equally well be functions themselves - a potential energy function, for example, will usually return the value of the potential energy as a function of some mix of spatial coordinates, mass, charge, and time. Note that the coordinates can be continuous (as most of the ones above are classically) or discrete - charge, for example, comes only in multiples of $ e$ and color can only take on three values.

One formally denotes functions in the notation e.g. $ F(\vec{x})$ where $ F$ is the function name represented symbolically and $ \vec{x}$ is the entire vector of coordinates of all sorts. In physics we often learn or derive functional forms for important quantities, and may or may not express them as functions in this form. For example, the kinetic energy of a particle can be written either of the two following ways:

$\displaystyle K(m,\vec{v}) = \frac{1}{2} m v^2$
$\displaystyle K = \frac{1}{2} m v^2$

These two forms are equivalent in physics, where it is usually ``obvious'' (at least when a student has studied adequately and accumulated some practical experience solving problems) just what the variable parameters are when we write an expression. Note well that we not infrequently use non-variable parameters - in particular constants of nature - in our algebraic expressions in physics as well, so that:

$\displaystyle U = -\frac{G m_1 m_2}{r}$

is a function of $ m_1, m_2$ , and $ r$ but includes the gravitational constant $ G = 6.67\times 10^{-11}$ N-m$ ^2$ /kg$ ^2$ in symbolic form. Not all symbols in physics expressions are variable parameters, in other words.

One important property of the mapping required for something to be a true ``function'' is that there must be only a single value of the function for any given set of the coordinates. Two other important definitions are:

The domain of a function is the set of all of the coordinates of the function that give rise to unique non-infinite values for the function. That is, for function $ f(x)$ it is all of the $ x$ 's for which $ f$ is well defined.
The range of a function is the set of all values of the function $ f$ that arise when its coordinates vary across the entire domain.
For example, for the function $ f(x) = \sin(x)$ , the domain is the entire real line $ x \in (-\infty,\infty)$ and the range is $ f \in [-1,1]$ .

Two last ideas that are of great use in solving physics problems algebraically are the notion of composition of functions and the inverse of a function.

Suppose you are given two functions: one for the potential energy of a mass on a spring:

$\displaystyle U(x) = \frac{1}{2} k x^2$

where $ x$ is the distance of the mass from its equilibrium position, and one for the position as a function of time:

$\displaystyle x(t) = x_0 \cos(\omega t)$

We can form the composition of these two functions by substituting the second into the first to obtain:

$\displaystyle U(t) = \frac{1}{2} k x_0^2 \cos^2(\omega t)$

This sort of ``substitution operation'' (which we will rarely refer to by name) is an extremely important part of solving problems in physics, so keep it in mind at all times!
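
The substitution operation is literally function composition in code. A sketch with made-up values of $ k$ , $ x_0$ , and $ \omega$ :

```python
import math

# Hypothetical spring parameters (made-up values for illustration).
k, x0, omega = 2.0, 0.1, 3.0

def x(t):
    """Position of the mass as a function of time."""
    return x0 * math.cos(omega * t)

def U(pos):
    """Potential energy as a function of position."""
    return 0.5 * k * pos ** 2

def U_of_t(t):
    """The composition U(x(t)) = (1/2) k x0^2 cos^2(omega t)."""
    return U(x(t))
```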

With the composition operation in mind, we can define the inverse. Not all functions have a unique inverse function, as we shall see, but most of them have an inverse function that we can use with some restrictions to solve problems.

Given a function $ f(x)$ , if every value in the range of $ f$ corresponds to one and only one value in its domain $ x$ , then $ f^{-1} = x(f)$ is also a function, called the inverse of $ f$ . When this condition is satisfied, the range of $ f(x)$ is the domain of $ x(f)$ and vice versa. In terms of composition:

$\displaystyle x_0 = x(f(x_0))$

and

$\displaystyle f_0 = f(x(f_0))$

for any $ x_0$ in the domain of $ f(x)$ and $ f_0$ in the range of $ f(x)$ are both true; the composition of $ f$ and the inverse function for some value $ f_0$ yields $ f_0$ again and is hence an ``identity'' operation on the range of $ f(x)$ .

Many functions do not have a unique inverse, however. For example, the function:

$\displaystyle f(x) = \sin(x)$

does not. If we look for values $ x_m$ in the domain of this function such that $ f(x_m) = 1$ , we find an infinite number:

$\displaystyle x_m = \frac{\pi}{2} + 2\pi m$

for $ m = 0, \pm 1, \pm 2, \pm 3\ldots$ The mapping is then one value in the range to many in the domain and the inverse of $ f(x)$ is not a function (although we can still write down an expression for all of the values that each point in the range maps into when inverted).

We can get around this problem by restricting the domain to a region where the inverse mapping is unique. In this particular case, we can define a function $ g(x) = \sin^{-1}(x)$ where the domain of $ g$ is only $ x \in [-1,1]$ and the range of $ g$ is restricted to be $ g \in [-\pi/2,\pi/2)$ . If this is done, then $ x = f(g(x))$ for all $ x \in [-1,1]$ and $ x = g(f(x))$ for all $ x \in [-\pi/2,\pi/2)$ . The inverse functions for many of the functions of interest in physics have these sorts of restrictions on the range and domain in order to make the problem well-defined, and in many cases we have some degree of choice in the best definition for any given problem. For example, when computing the inverse of $ \sin(x)$ we could use any domain of width $ \pi$ that begins or ends on an odd half-integral multiple of $ \pi$ , say $ x \in (\pi/2,3\pi/2]$ or $ x \in [9\pi/2,11\pi/2)$ , if it suited the needs of our problem to do so (with similar but different ranges for $ \cos(x)$ or $ \tan(x)$ ).
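
Python's math.asin is exactly such a restricted inverse: its domain is $ [-1,1]$ and its range is $ [-\pi/2,\pi/2]$ . A quick sketch:

```python
import math

# asin returns the unique angle in [-pi/2, pi/2] whose sine is the input.
g = math.asin(1.0)                      # pi/2

# Composing with sin recovers the input, within this restricted domain.
roundtrip = math.sin(math.asin(0.5))
```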

In a related vein, if we examine:

$\displaystyle f(x) = x^2$

and try to construct an inverse function, we discover two interesting things. First, there are two values in the domain that correspond to each value in the range because:

$\displaystyle f(x) = f(-x)$

for all $ x$ . This causes us to define the inverse function:

$\displaystyle g(x) = \pm x^{1/2} = \pm\sqrt{x}$

where the sign in this expression selects one of the two possibilities.

The second is that once we have defined the inverse functions for either trig functions or the quadratic function in this way so that they have restricted domains, it is natural to ask: do these functions have any meaning for the unrestricted domain? In other words, if we have defined:

$\displaystyle g(x) = +\sqrt{x}$

for $ x \ge 0$ , does $ g(x)$ exist for all $ x$ ? And if so, what kind of number is $ g$ ?

This leads us naturally enough into our next section (so keep it in mind) but first we have to deal with several important ideas.

Polynomial Functions

A polynomial function is a sum of monomials:

$\displaystyle f(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \ldots + a_n x^n + \ldots$

The numbers $ a_0,a_1,\ldots,a_n,\ldots$ are called the coefficients of the polynomial.

This sum can be finite and terminate at some $ n$ (called the degree of the polynomial) or can (for certain series of coefficients with ``nice'' properties) be infinite and converge to a well defined function value. Everybody should be familiar with at least the following forms:

$\displaystyle f(x) = a_0$     (0th degree, constant)
$\displaystyle f(x) = a_0 + a_1 x$     (1st degree, linear)
$\displaystyle f(x) = a_0 + a_1 x + a_2 x^2$     (2nd degree, quadratic)
$\displaystyle f(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3$     (3rd degree, cubic)

where the first form is clearly independent of $ x$ altogether.

Polynomial functions are a simple key to a huge amount of mathematics, for example differential calculus. It is easy to derive:

$\displaystyle \frac{d}{dx} x^n = n x^{n-1}$

It is similarly simple to derive:

$\displaystyle \int x^n\, dx = \frac{1}{n+1} x^{n+1} + {\rm constant}$

and we will derive both below to illustrate methodology and help students remember these two fundamental rules.
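
Both rules are easy to sanity-check numerically. A sketch using a centered finite difference for the derivative (a numerical check, not a derivation):

```python
def deriv(f, x, h=1e-6):
    """Centered finite-difference estimate of df/dx at x."""
    return (f(x + h) - f(x - h)) / (2 * h)

# Check d/dx x^3 = 3 x^2 at x = 2: expect 12.
n = 3
numeric = deriv(lambda x: x ** n, 2.0)
exact = n * 2.0 ** (n - 1)
```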

Next we note that many continuous functions can be defined in terms of their power series expansion. In fact any sufficiently smooth function can be expanded in the vicinity of a point as a power series, and many of our favorite functions have well known power series that serve as an alternative definition of the function. Although we will not derive it here, one extremely general and powerful way to compute this expansion is via the Taylor series. Let us define the Taylor series and its close friend and companion, the binomial expansion.

The Taylor Series and Binomial Expansion

Suppose $ f(x)$ is a continuous and infinitely differentiable function. Let $ x = x_0 + \Delta x$ for some $ \Delta x$ that is ``small''. Then the following is true:

$\displaystyle f(x_0 + \Delta x) = \left.f(x)\right|_{x = x_0} + \left.\frac{df}{dx}\right|_{x = x_0} \Delta x + \frac{1}{2!}\left.\frac{d^2 f}{dx^2}\right|_{x = x_0} \Delta x^2 + \frac{1}{3!}\left.\frac{d^3 f}{dx^3}\right|_{x = x_0} \Delta x^3 + \ldots$

This sum will always converge to the function value (for smooth functions and small enough $ \Delta x$ ) if carried out to a high enough degree. Note well that the Taylor series can be rearranged to become the definition of the derivative of a function:

$\displaystyle \left.\frac{df}{dx}\right|_{x = x_0} = \lim_{\Delta x \to 0} \frac{f(x_0 + \Delta x) - f(x_0)}{\Delta x} + O(\Delta x)$

where the latter symbol stands for ``terms of order $ \Delta x$ or smaller'' and vanishes in the limit. It can similarly be rearranged to form formal definitions for the second or higher order derivatives of a function, which turns out to be very useful in computational mathematics and physics.

We will find many uses for the Taylor series as we learn physics, because we will frequently be interested in the value of a function ``near'' some known value, or in the limit of very large or very small arguments. Note well that the Taylor series expansion for any polynomial is that polynomial, possibly re-expressed around the new ``origin'' represented by $ x_0$ .
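
As the last remark suggests, the Taylor expansion of a polynomial about a new origin reproduces it exactly. A sketch for $ f(x) = x^3$ about $ x_0$ , using its exact derivatives $ 3x^2$ , $ 6x$ , $ 6$ (all higher derivatives vanish):

```python
from math import factorial

def taylor_x_cubed(x0, dx):
    """Taylor sum for f(x) = x^3 about x0; the series terminates
    because all derivatives beyond the third vanish."""
    derivs = [x0 ** 3, 3 * x0 ** 2, 6 * x0, 6]
    return sum(d * dx ** k / factorial(k) for k, d in enumerate(derivs))

approx = taylor_x_cubed(1.0, 0.1)   # should reproduce 1.1 ** 3
```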

To this end we will find it very convenient to define the following binomial expansion. Suppose we have a function that can be written in the form:

$\displaystyle f(x) = (c + x)^n$

where $ n$ can be any real or complex number. We'd like to expand this using the Taylor series in terms of a ``small'' parameter. We therefore factor out the larger of $ x$ and $ c$ from this expression. Suppose it is $ c$ . Then:

$\displaystyle f(x) = (c + x)^n = c^n\left(1 + \frac{x}{c}\right)^n$

where $ x/c < 1$ . $ x/c$ is now a suitable ``small parameter'' and we can expand this expression around $ x = 0$ :

$\displaystyle f(x) = c^n\left( 1 + n \frac{x}{c} + \frac{1}{2!}n(n-1)\left(\frac{x}{c}\right)^2 + \frac{1}{3!}n(n-1)(n-2)\left(\frac{x}{c}\right)^3 + \ldots\right)$

Evaluate the derivatives of a Taylor series around $ x = 0$ to verify this expansion. Similarly, if $ x$ were the larger we could factor out the $ x$ and expand in powers of $ c/x$ as our small parameter around $ c = 0$ . In that case we'd get:

$\displaystyle f(x) = x^n\left( 1 + n \frac{c}{x} + \frac{1}{2!}n(n-1)\left(\frac{c}{x}\right)^2 + \frac{1}{3!}n(n-1)(n-2)\left(\frac{c}{x}\right)^3 + \ldots\right)$

Remember, $ n$ is arbitrary in this expression but you should also verify that if $ n$ is any positive integer, the series terminates and you recover $ (c + x)^n$ exactly. In this case the ``small'' requirement is no longer necessary.

We summarize both of these forms of the expansion by the part in the brackets. Let $ y < 1$ and let $ n$ be an arbitrary real or complex number (although in this class we will use only real $ n$ ). Then:

$\displaystyle (1 + y)^n = 1 + ny + \frac{1}{2!}n(n-1)y^2 + \frac{1}{3!}n(n-1)(n-2)y^3 + \ldots$

This is the binomial expansion, and is very useful in physics.
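
The expansion is easy to test numerically, even for non-integer $ n$ . A sketch keeping terms through $ y^3$ :

```python
def binom4(y, n):
    """First four terms of the binomial expansion of (1 + y)^n."""
    return (1 + n * y
            + n * (n - 1) / 2 * y ** 2
            + n * (n - 1) * (n - 2) / 6 * y ** 3)

# n = 1/2 approximates sqrt(1 + y) for small y.
approx = binom4(0.01, 0.5)
exact = 1.01 ** 0.5
```

The truncation error is of order $ y^4$ , so for $ y = 0.01$ the four-term sum is already accurate to roughly ten decimal places.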

Quadratics and Polynomial Roots

As noted above, the purpose of using algebra in physics is so that we can take known expressions that e.g. describe laws of nature and a particular problem and transform these ``truths'' into a ``true'' statement of the answer by isolating the symbol for that answer on one side of an equation.

For linear problems that is usually either straightforward or impossible. For ``simple'' linear problems (a single linear equation) it is always possible and usually easy. For sets of simultaneous linear equations in a small number of variables (like the ones represented in the course) one can ``always'' use a mix of composition (substitution) and elimination to find the answer desired.

What about solving polynomials of higher degree to find values of their variables that represent answers to physics (or other) questions? In general one tries to arrange the polynomial into a standard form like the one above, and then finds the roots of the polynomial. How easy or difficult this may be depends on many things. In the case of a quadratic (second degree polynomial involving at most the square) one can - and we will, below - derive an algebraic expression for the roots of an arbitrary quadratic.

For third and higher degrees, our ability to solve for the roots is not trivially general. Sometimes we will be able to ``see'' how to go about it. Other times we won't. There exist computational methodologies that work for most relatively low degree polynomials but for very high degree general polynomials the problem of factorization (finding the roots) is hard. We will therefore work through quadratic forms in detail below and then make a couple of observations that will help us factor a few e.g. cubic or quartic polynomials should we encounter ones with one of the ``easy'' forms.

In physics, quadratic forms are quite common. Motion in one dimension with constant acceleration (for example) quite often requires the solution of a quadratic in time. For the purposes of deriving the quadratic formula, we begin with the ``standard form'' of a quadratic equation:

$\displaystyle a x^2 + b x + c = 0$

(where you should note well that $ c = a_0$ , $ b = a_1$ , $ a = a_2$ in the general polynomial formula given above).

We wish to find the (two) values of $ x$ such that this equation is true, given $ a, b, c$ . To do so we must rearrange this equation and complete the square:

$\displaystyle a x^2 + b x + c = 0$
$\displaystyle a x^2 + b x = -c$
$\displaystyle x^2 + \frac{b}{a} x = -\frac{c}{a}$
$\displaystyle x^2 + \frac{b}{a} x + \frac{b^2}{4a^2} = \frac{b^2}{4a^2} - \frac{c}{a}$
$\displaystyle \left(x + \frac{b}{2a}\right)^2 = \frac{b^2}{4a^2} - \frac{c}{a}$
$\displaystyle x + \frac{b}{2a} = \pm\sqrt{\frac{b^2}{4a^2} - \frac{c}{a}}$
$\displaystyle x = -\frac{b}{2a} \pm\sqrt{\frac{b^2}{4a^2} - \frac{c}{a}}$
$\displaystyle x_\pm = \frac{-b \pm\sqrt{b^2 - 4ac}}{2a}$

This last result is the well-known quadratic formula and its general solutions are complex numbers (because the argument of the square root can easily be negative if $ 4ac > b^2$ ). In some cases the complex solution is desired as it leads one to e.g. a complex exponential solution and hence a trigonometric oscillatory function as we shall see in the next section. In other cases we insist on the solution being real, because if it isn't there is no real solution to the problem posed! Experience solving problems of both types is needed so that a student can learn to recognize both situations and use complex numbers to their advantage.
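
The quadratic formula translates directly into a solver. Using cmath keeps the complex-root case valid, in the spirit of the discussion above (a sketch):

```python
import cmath

def quadratic_roots(a, b, c):
    """x_pm = (-b +- sqrt(b^2 - 4ac)) / (2a); roots are complex
    whenever 4ac > b^2."""
    disc = cmath.sqrt(b ** 2 - 4 * a * c)
    return (-b + disc) / (2 * a), (-b - disc) / (2 * a)

r_plus, r_minus = quadratic_roots(1, -3, 2)    # (x - 1)(x - 2) = 0
```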

Before we move on, let us note two cases where we can ``easily'' solve cubic or quartic polynomials (or higher order polynomials) for their roots algebraically. One is when we take the quadratic formula and multiply it by any power of $ x$ , so that it can be factored, e.g.:

$\displaystyle a x^3 + b x^2 + c x = 0$
$\displaystyle (a x^2 + b x + c)x = 0$

This equation clearly has the two quadratic roots given above plus one (or more, if the power of $ x$ is higher) root $ x = 0$ . In some cases one can factor a solvable term of the form $ (x + d)$ by inspection, but this is generally not easy if it is possible at all without solving for the roots some other way first.

The other ``tricky'' case follows from the observation that:

$\displaystyle x^2 - a^2 = (x+a)(x-a)$

so that the two roots $ x = \pm a$ are solutions. We can generalize this and solve e.g.:

$\displaystyle x^4 - a^4 = (x^2 - a^2)(x^2 + a^2) = (x-a)(x+a)(x-ia)(x+ia)$

and find the four roots $ x = \pm a, \pm ia$ . One can imagine doing this for still higher powers on occasion.
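
These factorizations are easy to verify by direct substitution using Python's built-in complex numbers. A sketch:

```python
# The four roots of x^4 - a^4 = 0 are x = +-a and x = +-ia.
a = 2.0
roots = [a, -a, 1j * a, -1j * a]

# Substituting each root back into x^4 - a^4 should give (numerically)
# zero in every case.
residuals = [x ** 4 - a ** 4 for x in roots]
```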

In this course we will almost never have a problem that cannot be solved using ``just'' the quadratic formula, perhaps augmented by one or the other of these two tricks, although naturally a diligent and motivated student contemplating a math or physics major will prepare for the more difficult future by reviewing the various factorization tricks for ``fortunate'' integer coefficient polynomials, such as synthetic division. However, such a student should also be aware that the general problem of finding all the roots of a polynomial of arbitrary degree is difficult. So difficult, in fact, that it is known that no simple solution involving only arithmetical operations and square roots exists for degree 5 or greater. However it is generally fairly easy to factor arbitrary polynomials to a high degree of accuracy numerically using well-known algorithms and a computer.

Now that we understand inverse functions, Taylor series expansions, quadratics, and roots, let us return to the question asked earlier. What happens if we extend the domain of an inverse function outside of the range of the original function? In general we find that the inverse function has no real solutions there. Or, as noted above when factoring polynomials, we find that like as not there are no real solutions. But that does not mean that solutions do not exist!

Complex Numbers and Harmonic Trigonometric Functions

Figure 1: A complex number maps perfectly into the two-dimensional $ xy$ coordinate system in both Cartesian and Plane Polar coordinates. The latter are especially useful, as they lead to the Euler representation of complex numbers and complex exponentials.

We already reviewed very briefly the definition of the unit imaginary number $ i = +\sqrt{-1}$ . This definition, plus the usual rules for algebra, is enough for us to define both the imaginary numbers and a new kind of number called a complex number $ z$ that is the sum of real and imaginary parts, $ z = x + iy$ .

If we plot the real part of $ z$ ($ x$ ) on one axis and the imaginary part ($ y$ ) on another, we note that the complex numbers map into a plane that looks just like the $ x$ -$ y$ plane in ordinary plane geometry. Every complex number can be represented as an ordered pair of real numbers: the real part and the (real) coefficient of the imaginary part. A picture of this is drawn above.

From this picture and our knowledge of the definitions of the trigonometric functions we can quickly and easily deduce some extremely useful and important True Facts about:

Complex Numbers

This is a very terse review of their most important properties. From the figure above, we can see that an arbitrary complex number $ z$ can always be written as:

$\displaystyle z$ $\displaystyle =$ $\displaystyle x + i y$ (7)
  $\displaystyle =$ $\displaystyle \vert z\vert\left( \cos(\theta) + i \sin(\theta)\right)$ (8)
  $\displaystyle =$ $\displaystyle \vert z\vert e^{i\theta}$ (9)

where $ x = \vert z\vert\cos(\theta)$ , $ y = \vert z\vert\sin(\theta)$ , and $ \vert z\vert = \sqrt{x^2
+ y^2}$ . All complex numbers can be written as a real amplitude $ \vert z\vert$ times a complex exponential form involving a phase angle. Again, it is difficult to convey how incredibly useful this result is without devoting an entire book to this alone but for the moment, at least, I commend it to your attention.
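You can see all three forms agree with a few lines at a computer. In Python (an editorial aside, not part of the text), the `cmath` module handles the modulus and phase directly:

```python
import cmath
import math

z = 3 + 4j                    # x + i y with x = 3, y = 4
r = abs(z)                    # |z| = sqrt(x^2 + y^2) = 5
theta = cmath.phase(z)        # the phase angle, atan2(y, x)

# Rebuild z from the amplitude-times-complex-exponential form |z| e^{i theta}
z_exp = r * cmath.exp(1j * theta)

# ...and from the trigonometric form |z| (cos(theta) + i sin(theta))
z_trig = r * (math.cos(theta) + 1j * math.sin(theta))
```
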

There are a variety of ways of deriving or justifying the exponential form. Let's examine just one. If we differentiate $ z$ with respect to $ \theta$ in the second form (8) above we get:

$\displaystyle \frac{dz}{d\theta} = \vert z\vert\left(-\sin(\theta) + i\cos(\theta)\right) = i\vert z\vert\left(\cos(\theta) + i\sin(\theta)\right) = iz$

This gives us a differential equation that is an identity of complex numbers. If we multiply both sides by $ d\theta$ , divide both sides by $ z$ , and integrate, we get:

$\displaystyle \ln z = i\theta + \mathrm{constant}$

If we exponentiate both sides (using the exponential, the inverse function of the natural log):

$\displaystyle e^{\ln z} = e^{i\theta + \mathrm{constant}} = e^{\mathrm{constant}}\, e^{i\theta}$

$\displaystyle z = \vert z\vert e^{i\theta}$

where $ \vert z\vert$ is basically a constant of integration that is set to be the magnitude of the complex number (or its modulus), while the complex exponential piece determines its complex phase.

There are a number of really interesting properties that follow from the exponential form. For example, consider multiplying two complex numbers $ a$ and $ b$ :

$\displaystyle a = \vert a\vert e^{i\theta_a} = \vert a\vert\cos(\theta_a) + i\vert a\vert\sin(\theta_a)$

$\displaystyle b = \vert b\vert e^{i\theta_b} = \vert b\vert\cos(\theta_b) + i\vert b\vert\sin(\theta_b)$

$\displaystyle ab = \vert a\vert\vert b\vert e^{i(\theta_a + \theta_b)}$

and we see that multiplying two complex numbers multiplies their amplitudes and adds their phase angles. Complex multiplication thus rotates and rescales numbers in the complex plane.
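This multiply-the-amplitudes, add-the-phases rule is easy to watch in action. A quick Python check (an editorial illustration; the particular numbers are arbitrary):

```python
import cmath

a = 2.0 * cmath.exp(1j * 0.3)   # |a| = 2, phase 0.3 rad
b = 5.0 * cmath.exp(1j * 1.1)   # |b| = 5, phase 1.1 rad
ab = a * b

mag = abs(ab)                   # amplitudes multiply: 2 * 5 = 10
phase = cmath.phase(ab)         # phases add: 0.3 + 1.1 = 1.4 rad
```
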

Trigonometric and Exponential Relations

$\displaystyle e^{\pm i \theta}$ $\displaystyle =$ $\displaystyle \cos(\theta) \pm i \sin(\theta)$ (10)
$\displaystyle \cos(\theta)$ $\displaystyle =$ $\displaystyle \frac{1}{2} \left(e^{+i \theta} + e^{-i
\theta}\right)$ (11)
$\displaystyle \sin(\theta)$ $\displaystyle =$ $\displaystyle \frac{1}{2i} \left(e^{+i \theta} - e^{-i
\theta}\right)$ (12)

From these relations and the properties of exponential multiplication you can painlessly prove all sorts of trigonometric identities that were immensely painful to prove back in high school.

There are a few other trig relations (out of the long list of trigonometric identities that can be derived) that are very useful in physics. For example:

$\displaystyle \sin(A) \pm \sin(B)$ $\displaystyle =$ $\displaystyle 2 \sin\left(\frac{A \pm B}{2}\right)
\cos\left(\frac{A \mp B}{2}\right)$ (13)
$\displaystyle \cos(A) + \cos(B)$ $\displaystyle =$ $\displaystyle 2 \cos\left(\frac{A + B}{2}\right)
\cos\left(\frac{A - B}{2}\right)$ (14)
$\displaystyle \cos(A) - \cos(B)$ $\displaystyle =$ $\displaystyle - 2 \sin\left(\frac{A + B}{2}\right)
\sin\left(\frac{A - B}{2}\right)$ (15)

These expressions are used (along with the superposition principle) to add waves with identical amplitudes in the study of beats, interference, and diffraction. However, they are somewhat limited - we don't have a good trigonometric identity for adding three or four or $ N$ sine waves, even when the amplitudes are the same and the angles differ by simple multiples of a fixed phase. For problems like this, we will use phasor diagrams to graphically find new identities that solve these problems, often with an elegant connection back to complex exponentials.
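As a preview of where that leads: once you switch to complex exponentials, the $ N$ equal-amplitude phasors form a geometric series, and the sum collapses to a single sinusoid with amplitude $ \sin(N\delta/2)/\sin(\delta/2)$ . The Python sketch below (an editorial illustration; the function names are my own, not from the text) checks that standard result against brute-force addition:

```python
import math

def sum_of_sines(omega_t, delta, N):
    """Brute force: sin(wt) + sin(wt + delta) + ... + sin(wt + (N-1) delta)."""
    return sum(math.sin(omega_t + n * delta) for n in range(N))

def phasor_sum(omega_t, delta, N):
    """Same sum via complex exponentials: the N phasors form a geometric
    series, leaving one sinusoid of amplitude sin(N delta/2)/sin(delta/2)
    shifted in phase by (N - 1) delta / 2."""
    amplitude = math.sin(N * delta / 2) / math.sin(delta / 2)
    return amplitude * math.sin(omega_t + (N - 1) * delta / 2)
```
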

Power Series Expansions

These can easily be evaluated using the Taylor series discussed in the last section, expanded around the origin $ z = 0$ , and are an alternative way of seeing that $ e^{i\theta} = \cos(\theta) + i\sin(\theta)$ . In the case of exponential and trig functions, the expansions converge for all $ z$ , not just small ones (although they of course converge faster for small ones).

$\displaystyle e^{x}$ $\displaystyle =$ $\displaystyle 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \ldots$ (16)
$\displaystyle \cos(x)$ $\displaystyle =$ $\displaystyle 1 - \frac{x^2}{2!} + \frac{x^4}{4!} + \ldots$ (17)
$\displaystyle \sin(x)$ $\displaystyle =$ $\displaystyle x - \frac{x^3}{3!} + \frac{x^5}{5!} + \ldots$ (18)

Depending on where you start, these can be used to prove the relations above. They are most useful for getting expansions for small values of their parameters. For small $ x$ (to leading order):
$\displaystyle e^{x}$ $\displaystyle \approx$ $\displaystyle 1 + x$ (19)
$\displaystyle \cos(x)$ $\displaystyle \approx$ $\displaystyle 1 - \frac{x^2}{2!}$ (20)
$\displaystyle \sin(x)$ $\displaystyle \approx$ $\displaystyle x$ (21)
$\displaystyle \tan(x)$ $\displaystyle \approx$ $\displaystyle x$ (22)

We will use these fairly often in this course, so learn them.
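If you want to convince yourself of both claims - rapid convergence of the full series, and the quality of the leading-order approximations for small $ x$ - a few lines of Python will do it (an editorial illustration, not part of the text):

```python
import math

def sin_series(x, terms=8):
    """Partial sum of the Taylor series sin(x) = x - x^3/3! + x^5/5! - ..."""
    return sum((-1) ** k * x ** (2 * k + 1) / math.factorial(2 * k + 1)
               for k in range(terms))

# Eight terms already reproduce sin(1.0) essentially exactly, while for
# small x the single leading term is the small-angle approximation.
approx = sin_series(1.0)
```
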

An Important Relation

A relation I will state without proof that is very important to this course is that the real part of the $ x(t)$ derived above:

$\displaystyle \Re(x(t))$ $\displaystyle =$ $\displaystyle \Re(x_{0+} e^{+i\omega t} + x_{0-} e^{-i\omega t})$ (23)
  $\displaystyle =$ $\displaystyle X_0 \cos(\omega t + \phi)$ (24)

where $ \phi$ is an arbitrary phase. You can prove this in a few minutes of relaxing, enjoyable algebra from the relations outlined above - remember that $ x_{0+}$ and $ x_{0-}$ are arbitrary complex numbers and so can be written in complex exponential form!


In this section we present a lightning-fast review of calculus. It covers most of what you need to do well in this course.

Differential Calculus

The slope of a line is defined to be the rise divided by the run. For a curved line, however, the slope has to be defined at a point. Lines (curved or straight, but not infinitely steep) can always be thought of as functions of a single variable. We call the slope of a line evaluated at any given point its derivative, and call the process of finding that slope taking the derivative of the function.

Later we'll say a few words about multivariate (vector) differential calculus, but that is mostly beyond the scope of this course.

The definition of the derivative of a function is:

$\displaystyle \frac{df}{dx} = \lim_{\Delta x \to 0} \frac{f(x+\Delta x) - f(x)}{\Delta x}$

This is the slope of the function at the point $ x$ .
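A computer can only take $ \Delta x$ small, not all the way to zero, but that is already enough to estimate a slope numerically. A Python sketch of the difference quotient (an editorial illustration, not part of the text):

```python
def derivative(f, x, dx=1.0e-6):
    """The defining difference quotient (f(x + dx) - f(x)) / dx with a
    small but finite dx, approximating the limit dx -> 0."""
    return (f(x + dx) - f(x)) / dx

slope = derivative(lambda x: x ** 2, 3.0)   # d(x^2)/dx = 2x, so ~6 at x = 3
```
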

First, note that:

$\displaystyle \frac{d(a f)}{dx} = a \frac{df}{dx}$

for any constant $ a$ . The constant simply factors out of the definition above.

Second, differentiation is linear. That is:

$\displaystyle \frac{d}{dx}\left(f(x)+g(x)\right) = \frac{df(x)}{dx} + \frac{dg(x)}{dx}$

Third, suppose that $ f = g h$ (the product of two functions). Then

$\displaystyle \frac{df}{dx} = \frac{d(gh)}{dx} = \lim_{\Delta x \to 0} \frac{g(x+\Delta x)h(x + \Delta x) - g(x)h(x)}{\Delta x}$

$\displaystyle = \lim_{\Delta x \to 0} \frac{\left(g(x) + \frac{dg}{dx}\Delta x\right)\left(h(x) + \frac{dh}{dx}\Delta x\right) - g(x)h(x)}{\Delta x}$

$\displaystyle = \lim_{\Delta x \to 0} \frac{g(x)\frac{dh}{dx}\Delta x + \frac{dg}{dx}h(x)\Delta x + \frac{dg}{dx}\frac{dh}{dx}(\Delta x)^2}{\Delta x}$

$\displaystyle = g(x)\frac{dh}{dx} + \frac{dg}{dx}h(x)$

where we used the definition above twice and multiplied everything out. If we multiply this rule by $ dx$ we obtain the following rule for the differential of a product:

$\displaystyle d(gh) = g\, dh + h\, dg$

This is a very important result and leads us shortly to integration by parts and later in physics to things like Green's theorem in vector calculus.

We can easily and directly compute the derivative of a monomial:

$\displaystyle \frac{dx^n}{dx} = \lim_{\Delta x \to 0} \frac{\left(x^n + n x^{n-1}\Delta x + \frac{n(n-1)}{2}x^{n-2}(\Delta x)^2 + \ldots + (\Delta x)^n\right) - x^n}{\Delta x}$

$\displaystyle = \lim_{\Delta x \to 0} \left(n x^{n-1} + \frac{n(n-1)}{2}x^{n-2}\Delta x + \ldots + (\Delta x)^{n-1}\right)$

$\displaystyle = n x^{n-1}$

or we can derive this result by noting that $ \frac{dx}{dx} = 1$ , using the product rule above, and using induction. If one assumes $ \frac{dx^n}{dx} = nx^{n-1}$ , then

$\displaystyle \frac{dx^{n+1}}{dx} = \frac{d(x^n \cdot x)}{dx} = nx^{n-1}\cdot x + x^n \cdot 1 = nx^n + x^n = (n+1)x^n$

and we're done.

Again it is beyond the scope of this short review to completely rederive all of the results of a calculus class, but from what has been presented already one can see how one can systematically proceed. We conclude, therefore, with a simple table of useful derivatives and results in summary (including those above):

$\displaystyle \frac{da}{dx} = 0 \qquad a\ \mathrm{constant}$

$\displaystyle \frac{d(a f(x))}{dx} = a \frac{df(x)}{dx} \qquad a\ \mathrm{constant}$

$\displaystyle \frac{dx^n}{dx} = n x^{n-1}$

$\displaystyle \frac{d}{dx}\left(f(x)+g(x)\right) = \frac{df(x)}{dx} + \frac{dg(x)}{dx}$

$\displaystyle \frac{df}{dx} = \frac{df}{du}\frac{du}{dx} \qquad \mathrm{chain\ rule}$

$\displaystyle \frac{d(gh)}{dx} = g\frac{dh}{dx} + \frac{dg}{dx}h \qquad \mathrm{product\ rule}$

$\displaystyle \frac{d(g/h)}{dx} = \frac{\frac{dg}{dx}h - g\frac{dh}{dx}}{h^2}$

$\displaystyle \frac{de^x}{dx} = e^x$

$\displaystyle \frac{de^{ax}}{dx} = a e^{ax} \qquad \mathrm{from\ chain\ rule,}\ u = ax$

$\displaystyle \frac{d\sin(ax)}{dx} = a\cos(ax)$

$\displaystyle \frac{d\cos(ax)}{dx} = -a\sin(ax)$

$\displaystyle \frac{d\tan(ax)}{dx} = a\sec^2(ax) = \frac{a}{\cos^2(ax)}$

$\displaystyle \frac{d\cot(ax)}{dx} = -a\csc^2(ax) = -\frac{a}{\sin^2(ax)}$

$\displaystyle \frac{d\ln(x)}{dx} = \frac{1}{x}$

There are a few more differentiation rules that can be useful in this course, but nearly all of them can be derived in place using these rules, especially the chain rule and product rule.

Integral Calculus

With differentiation under our belt, we need only a few definitions and we'll get integral calculus for free. That's because integration is antidifferentiation, the inverse process to differentiation. As we'll see, the derivative of a function is unique but its integral has one free choice that must be made. We'll also see that the (definite) integral of a function in one dimension is the area underneath the curve.

There are lots of ways to facilitate derivations of integral calculus. Most calculus books begin (appropriately) by drawing pictures of curves and showing that the area beneath them can be evaluated by summing small discrete sections, and that by means of a limiting process that area is equivalent to the integral of the functional curve. That is, if $ f(x)$ is some curve and we wish to find the area beneath a segment of it (from $ x = x_1$ to $ x = x_2$ for example), one small piece of that area can be written:

$\displaystyle \Delta A = f(x)\Delta x$

The total area can then be approximately evaluated by piecewise summing $ N$ rectangular strips of width $ \Delta x = (x_2 - x_1)/N$ :

$\displaystyle A \approx \sum_{n=1}^{N} f(x_1 + n\cdot\Delta x)\, \Delta x$

(Note that one can get slightly different results if one centers the rectangles or begins them on the low side, but we don't care.)

In the limit that $ N \to \infty$ and $ \Delta x \to 0$ , two things happen. First we note that:

$\displaystyle f(x) = \frac{dA}{dx}$

by the definition of derivative from the previous section. The function $ f(x)$ is the formal derivative of the function representing the area beneath it (independent of the limits as long as $ x$ is in the domain of the function). The second is that we'll get tired adding teensy-weensy rectangles in infinite numbers. We therefore make up a special symbol for this infinite limit sum. $ \Sigma$ clearly stands for sum, so we change to another stylized ``ess'', $ \int$ , to also stand for sum, but now a continuous and infinite sum of all the infinitesimal pieces of area within the range. We now write:

$\displaystyle A = \int_{x_1}^{x_2} f(x)\, dx$

as an exact result in this limit.
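The finite-$ N$ sum is exactly what a computer evaluates, and for large $ N$ it lands very close to the exact area. A Python sketch of the rectangular-strip sum (an editorial illustration, not part of the text):

```python
def riemann_sum(f, x1, x2, N=100000):
    """Sum N rectangular strips of width dx = (x2 - x1)/N, exactly as in
    the finite sum above; the integral is the N -> infinity limit."""
    dx = (x2 - x1) / N
    return sum(f(x1 + n * dx) * dx for n in range(1, N + 1))

# Area under f(x) = x^2 from 0 to 1; the exact integral is 1/3
area = riemann_sum(lambda x: x * x, 0.0, 1.0)
```
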

The beauty of this simple approach is that we now can do the following algebra, over and over again, to formulate integrals (sums) of some quantity:

$\displaystyle \frac{dA}{dx} = f(x)$

$\displaystyle dA = f(x)\,dx$

$\displaystyle \int dA = \int f(x)\,dx$

$\displaystyle A = \int_{x_1}^{x_2} f(x)\,dx$

This areal integral is called a definite integral because it has definite upper and lower bounds. However, we can also do the integral with a variable upper bound:

$\displaystyle A(x) = \int_{x_0}^{x} f(x')\,dx'$

where we indicate how $ A$ varies as we change $ x$ , its upper bound.

We now make a clever observation. $ f(x)$ is clearly the function that we get by differentiating this integrated area with a fixed lower bound (which is still arbitrary) with respect to the variable in its upper bound. That is:

$\displaystyle f(x) = \frac{dA(x)}{dx}$

This slope must be the same for all possible values of $ x_0$ or this relation would not be correct and unique! We therefore conclude that all the various functions $ A(x)$ that can stand for the area differ only by a constant (called the constant of integration):

$\displaystyle A'(x) = A(x) + C$

so that

$\displaystyle f(x) = \frac{dA'(x)}{dx} = \frac{dA(x)}{dx} + \frac{dC}{dx} = \frac{dA(x)}{dx}$

From this we can conclude that the indefinite integral of $ f(x)$ can be written:

$\displaystyle A(x) = \int^{x} f(x)\,dx + A_0$

where $ A_0$ is the constant of integration. In physics problems the constant of integration must usually be evaluated algebraically from information given in the problem, such as initial conditions.

From this simple definition, we can transform our existing table of derivatives into a table of (indefinite) integrals. Let us compute the integral of $ x^n$ as an example. We wish to find:

$\displaystyle g(x) = \int x^n\, dx$

where we will ignore the constant of integration as being irrelevant to this process (we can and should always add it to one side or the other of any formal indefinite integral unless we can see that it is zero). If we differentiate both sides, the differential and integral are inverse operations and we know:

$\displaystyle \frac{dg(x)}{dx} = x^n$

Looking on our table of derivatives, we see that:

$\displaystyle \frac{dx^{n+1}}{dx} = (n+1)x^n$

or

$\displaystyle \frac{dg(x)}{dx} = x^n = \frac{1}{n+1}\frac{dx^{n+1}}{dx}$

and hence:

$\displaystyle g(x) = \int^x x^n\, dx = \frac{1}{n+1}x^{n+1}$

by inspection.
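Any entry derived this way can be spot-checked numerically: sum up strips of $ x^n$ over some interval and compare against $ x^{n+1}/(n+1)$ evaluated at the endpoints. A Python check (an editorial illustration; the midpoint rule is my choice here, not the text's):

```python
def midpoint_integral(f, a, b, N=100000):
    """Midpoint-rule quadrature: accurate enough here to check a
    closed form against its numerical value."""
    dx = (b - a) / N
    return sum(f(a + (k + 0.5) * dx) * dx for k in range(N))

n = 3
numeric = midpoint_integral(lambda x: x ** n, 0.0, 2.0)
exact = 2.0 ** (n + 1) / (n + 1)   # x^{n+1}/(n+1) evaluated from 0 to 2
```
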

Similarly we can match up the other rules with integral equivalents:

$\displaystyle \frac{d(a f(x))}{dx} = a\frac{df(x)}{dx}$

leads to:

$\displaystyle \int a f(x)\, dx = a \int f(x)\, dx$

A very important rule follows from the rule for differentiating a product. If we integrate both sides this becomes:

$\displaystyle \int d(gh) = gh = \int g\, dh + \int h\, dg$

which we often rearrange as:

$\displaystyle \int g\, dh = \int d(gh) - \int h\, dg = gh - \int h\, dg$

the rule for integration by parts, which permits us to throw a derivative from one term to another in an integral we are trying to do. This turns out to be very, very useful in evaluating many otherwise extremely difficult integrals.

If we assemble the complete list of (indefinite) integrals that correspond to our list of derivatives, we get something like:

$\displaystyle \int 0\, dx = 0 + c = c \qquad \mathrm{with}\ c\ \mathrm{constant}$

$\displaystyle \int a f(x)\, dx = a \int f(x)\, dx$

$\displaystyle \int x^n\, dx = \frac{1}{n+1}x^{n+1} + c$

$\displaystyle \int (f + g)\, dx = \int f\, dx + \int g\, dx$

$\displaystyle \int f(x)\, dx = \int f(u) \frac{dx}{du}\, du \qquad \mathrm{change\ variables}$

$\displaystyle \int d(gh) = gh = \int g\, dh + \int h\, dg \qquad \mathrm{or}$

$\displaystyle \int g\, dh = gh - \int h\, dg \qquad \mathrm{integration\ by\ parts}$

$\displaystyle \int e^x\, dx = e^x + c \qquad \mathrm{or\ change\ variables\ to}$

$\displaystyle \int e^{ax}\, dx = \frac{1}{a}\int e^{ax}\, d(ax) = \frac{1}{a} e^{ax} + c$

$\displaystyle \int \cos(ax)\, dx = \frac{1}{a}\int \cos(ax)\, d(ax) = \frac{1}{a}\sin(ax) + c$

$\displaystyle \int \sin(ax)\, dx = \frac{1}{a}\int \sin(ax)\, d(ax) = -\frac{1}{a}\cos(ax) + c$

$\displaystyle \int \frac{dx}{x} = \ln(x) + c$

It's worth doing a couple of examples to show how to do integrals using these rules. One integral that appears in many physics problems in E&M is:

$\displaystyle \int_0^R \frac{r\, dr}{(z^2 + r^2)^{3/2}}$

This integral is done using $ u$ substitution - the chain rule used backwards. We look at it for a second or two and note that if we let:

$\displaystyle u = (z^2 + r^2)$

then

$\displaystyle du = 2 r\, dr$

and we can rewrite this integral as:

$\displaystyle \int_0^R \frac{r\, dr}{(z^2 + r^2)^{3/2}} = \frac{1}{2} \int_0^R \frac{2 r\, dr}{(z^2 + r^2)^{3/2}}$

$\displaystyle = \frac{1}{2} \int_{z^2}^{(z^2+R^2)} u^{-3/2}\, du$

$\displaystyle = \left. - u^{-1/2} \right\vert_{z^2}^{(z^2+R^2)}$

$\displaystyle = \frac{1}{z} - \frac{1}{(z^2 + R^2)^{1/2}}$

The lesson is that we can often do complicated looking integrals by making a suitable $ u$ -substitution that reduces them to a simple integral we know off of our table.
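When in doubt about a result like this, check it numerically: do the sum over strips by brute force and compare against the closed form. In Python (an editorial illustration; the midpoint rule and sample values of $ z$ and $ R$ are my own choices):

```python
import math

def midpoint_integral(f, a, b, N=100000):
    """Midpoint-rule quadrature over [a, b] with N strips."""
    dx = (b - a) / N
    return sum(f(a + (k + 0.5) * dx) * dx for k in range(N))

z, R = 1.0, 2.0
numeric = midpoint_integral(lambda r: r / (z * z + r * r) ** 1.5, 0.0, R)
closed_form = 1.0 / z - 1.0 / math.sqrt(z * z + R * R)
```
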

The next one illustrates both integration by parts and doing integrals with infinite upper bounds. Let us evaluate:

$\displaystyle \int_0^{\infty} x^2 e^{-ax}\, dx$

Here we identify two pieces. Let:

$\displaystyle h(x) = x^2$

and

$\displaystyle d(g(x)) = e^{-ax}\, dx = -\frac{1}{a} e^{-ax}\, d(-ax) = -\frac{1}{a}\, d\left(e^{-ax}\right)$

or $ g(x) = -(1/a)e^{-ax}$ . Then our rule for integration by parts becomes:

$\displaystyle \int_0^{\infty} x^2 e^{-ax}\, dx = \int_0^{\infty} h(x)\, dg$

$\displaystyle = \left. h(x)g(x)\right\vert_0^{\infty} - \int_0^{\infty} g(x)\, dh$

$\displaystyle = \left. -\frac{1}{a} x^2 e^{-ax} \right\vert_0^{\infty} + \frac{1}{a}\int_0^{\infty} e^{-ax}\, 2x\, dx$

$\displaystyle = \frac{2}{a}\int_0^{\infty} x e^{-ax}\, dx$

We repeat this process with $ h(x) = x$ and with $ g(x)$ unchanged:

$\displaystyle \int_0^{\infty} x^2 e^{-ax}\, dx = \frac{2}{a}\int_0^{\infty} x e^{-ax}\, dx$

$\displaystyle = \left. -\frac{2}{a^2} x e^{-ax} \right\vert_0^{\infty} + \frac{2}{a^2}\int_0^{\infty} e^{-ax}\, dx$

$\displaystyle = \frac{2}{a^2}\int_0^{\infty} e^{-ax}\, dx$

$\displaystyle = -\frac{2}{a^3}\int_0^{\infty} e^{-ax}\, d(-ax)$

$\displaystyle = \left. -\frac{2}{a^3} e^{-ax} \right\vert_0^{\infty} = \frac{2}{a^3}$

If we work a little more generally, we can show that:

$\displaystyle \int_0^{\infty} x^n e^{-ax}\, dx = \frac{n!}{a^{n+1}}$

This is just one illustration of the power of integration by parts to help us do integrals that on the surface appear to be quite difficult.
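Infinite upper bounds are also easy to handle numerically once the integrand dies off fast enough: just truncate where the exponential has killed it. A Python spot-check of the $ n = 2$ case against $ n!/a^{n+1}$ (an editorial illustration; the cutoff and sample $ a$ are my own choices):

```python
import math

def integral_to_infinity(f, cutoff=30.0, N=300000):
    """Replace the infinite upper bound by a cutoff where the integrand
    is utterly negligible, then apply a midpoint sum."""
    dx = cutoff / N
    return sum(f((k + 0.5) * dx) * dx for k in range(N))

a, n = 1.5, 2
numeric = integral_to_infinity(lambda x: x ** n * math.exp(-a * x))
exact = math.factorial(n) / a ** (n + 1)    # n!/a^{n+1}, i.e. 2/a^3 here
```
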

Vector Calculus

This book will not use a great deal of vector or multivariate calculus, but a little general familiarity with it will greatly help the student with e.g. multiple integrals or the idea of the force being the negative gradient of the potential energy. We will content ourselves with a few definitions and examples.

The first definition is that of the partial derivative. Given a function of many variables $ f(x,y,z,\ldots)$ , the partial derivative of the function with respect to (say) $ x$ is written:

$\displaystyle \frac{\partial f}{\partial x}$

and is just the regular derivative of the variable form of $ f$ as a function of all its coordinates with respect to the $ x$ coordinate only, holding all the other variables constant even if they are not independent and vary in some known way with respect to $ x$ .

In many problems, the variables are independent and the partial derivative is equal to the regular derivative:

$\displaystyle \frac{\partial f}{\partial x} = \frac{df}{dx}$

In other problems, the variable $ y$ might depend on the variable $ x$ . So might $ z$ . In that case we can form the total derivative of $ f$ with respect to $ x$ by including the variation of $ f$ caused by the variation of the other variables as well (basically using the chain rule and composition):

$\displaystyle \frac{df}{dx} = \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y}\frac{dy}{dx} + \frac{\partial f}{\partial z}\frac{dz}{dx} + \ldots$

Note the different full derivative symbol on the left. This is called the ``total derivative'' with respect to $ x$ . Note also that the independent form follows from this second form because $ dy/dx = 0$ and so on are the algebraic way of saying that the coordinates are independent.

There are several ways to form vector derivatives of functions, especially vector functions. We begin by defining the gradient operator, the basic vector differential form:

$\displaystyle \vec{\nabla} = \frac{\partial}{\partial x} \hat{x} + \frac{\partial}{\partial y} \hat{y} + \frac{\partial}{\partial z} \hat{z}$

This operator can be applied to a scalar multivariate function $ f$ to form its gradient:

$\displaystyle \vec{\nabla} f = \frac{\partial f}{\partial x} \hat{x} + \frac{\partial f}{\partial y} \hat{y} + \frac{\partial f}{\partial z} \hat{z}$

The gradient of a function has a magnitude equal to its maximum slope at the point over all possible directions, and it points in the direction in which that slope is maximal. It is the ``uphill slope'' of a curved surface, basically - the word ``gradient'' means slope. In physics this directed slope is very useful.
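Since each component of the gradient is just an ordinary slope taken with the other coordinates held fixed, it can be estimated numerically the same way as any derivative. A Python sketch (an editorial illustration; the central-difference formula is my choice, not something from the text):

```python
def gradient(f, x, y, z, h=1.0e-6):
    """Central-difference estimate of (df/dx, df/dy, df/dz); each slope
    is taken holding the other two coordinates fixed."""
    return (
        (f(x + h, y, z) - f(x - h, y, z)) / (2 * h),
        (f(x, y + h, z) - f(x, y - h, z)) / (2 * h),
        (f(x, y, z + h) - f(x, y, z - h)) / (2 * h),
    )

# f = x^2 + y^2 + z^2 has gradient (2x, 2y, 2z), pointing "uphill"
g = gradient(lambda x, y, z: x * x + y * y + z * z, 1.0, 2.0, 3.0)
```
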

If we wish to take the vector derivative of a vector function there are two common ways to go about it. Suppose $ \vec{E}$ is a vector function of the spatial coordinates. We can form its divergence:

$\displaystyle \vec{\nabla} \cdot \vec{E} = \frac{\partial E_x}{\partial x} + \frac{\partial E_y}{\partial y} + \frac{\partial E_z}{\partial z}$

or its curl:

$\displaystyle \vec{\nabla} \times \vec{E} = \left(\frac{\partial E_z}{\partial y} - \frac{\partial E_y}{\partial z}\right)\hat{x} + \left(\frac{\partial E_x}{\partial z} - \frac{\partial E_z}{\partial x}\right)\hat{y} + \left(\frac{\partial E_y}{\partial x} - \frac{\partial E_x}{\partial y}\right)\hat{z}$

These operations are extremely important in physics courses, especially the more advanced study of electromagnetics, where they are part of the differential formulation of Maxwell's equations, but we will not use them in a required way in this course. We'll introduce and discuss them and work a rare problem or two, just enough to get the flavor of what they mean, to front-load a more detailed study later (for majors and possibly engineers or other advanced students only).

Multiple Integrals

The last bit of multivariate calculus we need to address is integration over multiple dimensions. We will have many occasions in this text to integrate over lines, over surfaces, and over volumes of space in order to obtain quantities. The integrals themselves are not difficult - in this course they can always be done as a series of one, two or three ordinary, independent integrals over each coordinate one at a time with the others held ``fixed''. This is not always possible and multiple integration can get much more difficult, but we deliberately choose problems that illustrate the general idea of integrating over a volume while still remaining accessible to a student with fairly modest calculus skills, no more than is required and reviewed in the sections above.
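The one-coordinate-at-a-time idea translates directly into nested sums on a computer: integrate over $ y$ with $ x$ held fixed, then integrate the result over $ x$ . A Python sketch (an editorial illustration; the integrand is my own simple example):

```python
def double_integral(f, ax, bx, ay, by, N=400):
    """Iterated midpoint sums: integrate over y with x held fixed,
    then integrate that result over x."""
    dx = (bx - ax) / N
    dy = (by - ay) / N
    total = 0.0
    for i in range(N):
        x = ax + (i + 0.5) * dx       # outer integral over x
        for j in range(N):
            y = ay + (j + 0.5) * dy   # inner integral over y, x "fixed"
            total += f(x, y) * dx * dy
    return total

# Volume under f(x, y) = x y over the unit square; the exact answer is 1/4
vol = double_integral(lambda x, y: x * y, 0.0, 1.0, 0.0, 1.0)
```
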

[Note: This section is not yet finished, but there are examples of all of these in context in the relevant sections below. Check back for later revisions of the book PDF (possibly after contacting the author) if you would like this section to be filled in urgently.]

Robert G. Brown 2014-08-13