As a rule, please set that on the table, folded in half the long way, at the beginning of class.
Okay: we know that we're going to make errors. Even the Bible got it wrong (1 Kings 7:23)!
But it won't always be as simple as a botched calculation of the characteristics of a circle (1 Kings 7:23). There are other sorts of errors we're going to make through the necessity of approximations:
So, some important questions:
Our authors want to make sure that we are all on the same page with respect to errors, so we start with some definitions:
Our authors suggest that it is sometimes useful to have a name for the negative of the error:
Or, to turn things around,
However, "What we want is the error to be relatively small." (p. 15):
And our authors then note that, combining these relationships, that we can write
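For reference, here is one common set of conventions for these definitions (the book's sign convention may differ slightly from this):
$$\text{error} = \text{approximate value} - \text{true value}, \qquad \text{correction} = -\,\text{error},$$
so that, turning things around, $\text{true value} = \text{approximate value} + \text{correction}$, and
$$\text{relative error} = \frac{\text{error}}{\text{true value}}, \qquad\text{so}\qquad \text{approximate value} = \text{true value}\,(1 + \text{relative error}).$$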
The "unit in the last place" is what we think of as the error when we look at an approximation, such as this approximation of e (= 2.718281828459045....):
obtained by truncation (losing .000000008459045....)
I might have rounded, instead:
In either case, I believe in my heart that the error I'm making is less than .00000001 (one "unit in the last place") in size. Notice that, by specifying "in size", I'm ignoring the sign of the number.
Question: put those two approximations and what I "believe in my heart" together, and what do you get?
As the authors say, "An error no greater than 0.5 ulp is always possible, but, as stated in Section 1.2, it is not always practical." In particular, we can get it down to a tenth of an ulp by moving to the next place value -- but that costs something.
Let's consider an example to discuss these two: Suppose that the true value is 1.2345, and that the approximate value found is 1.2346.
Using properties of logs, notice that
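The log remark presumably ties the size of the relative error to the number of correct significant digits; here is the arithmetic for this particular example (my working, not copied from the book):
$$\text{absolute error} = |1.2346 - 1.2345| = 0.0001,\qquad \text{relative error} = \frac{0.0001}{1.2345} \approx 8.1\times 10^{-5},$$
$$-\log_{10}\!\left(8.1\times 10^{-5}\right) \approx 4.09,$$
suggesting that the approximation is good to about 4 significant digits.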
One of the things that the authors do in this section is discuss errors that we won't concern ourselves with: modeling errors, or programming errors, or hardware errors.
The errors they do care about, they break into two types:
We don't provide as many decimals as we've obtained, say, in a table. I guess we won't worry about those, either (but, in reality, you need to). It's related to that question "how many digits do I give my answer to?" that students ask.... We don't have a good simple answer.
The authors make an amusing argument that just because we get error-riddled input doesn't mean that we can be sloppy. "It is best to treat the input values as exact and not make judgments about their accuracy or the morality of computing accurate answers to inaccurate problems." (my emphasis)
Data errors are propagated through further calculations (error propagation).
Let $\hat{x}$ be an approximation to the datum $x$, and suppose we want to compute some function $f$ of it: then the propagated data error is
$$f(\hat{x}) - f(x),$$
the effect of the inaccurate datum pushed through the exact computation.
Then the computational error is
$$\hat{f}(\hat{x}) - f(\hat{x}),$$
caused by inaccurately computing f. Figure 1.8 is a good one to illustrate these various sources of error, and the discussion:
By the way, notice that if we add the errors we get
$$\underbrace{\hat{f}(\hat{x}) - f(x)}_{\text{total error}} = \underbrace{\left[\hat{f}(\hat{x}) - f(\hat{x})\right]}_{\text{computational error}} + \underbrace{\left[f(\hat{x}) - f(x)\right]}_{\text{propagated data error}}.$$
The total error is made up of the sum of the computational and data propagation errors.
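Here is a tiny numerical illustration of that decomposition. The function and the numbers are my own choices, just to make the bookkeeping concrete: $f(x) = \sin x$, approximated by a three-term Taylor polynomial, evaluated at a slightly wrong datum.

```python
import math

f = math.sin

def fhat(x):
    """An inexact computation of f: a 3-term Taylor polynomial for sin."""
    return x - x**3 / 6 + x**5 / 120

x, xhat = 1.0, 1.001                      # true datum and its approximation

propagated = f(xhat) - f(x)               # data error pushed through the exact f
computational = fhat(xhat) - f(xhat)      # error from computing f inexactly
total = fhat(xhat) - f(x)

print(f"propagated data error: {propagated: .3e}")
print(f"computational error:   {computational: .3e}")
print(f"sum of the two:        {propagated + computational: .3e}")
print(f"total error:           {total: .3e}")   # agrees with the sum (up to roundoff)
```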
The authors consider a case where a quantity x does not come to us as a value, but rather as a set of values:
How do we proceed?
Obviously we could do a computation for each value. However then our audience is going to ask us for an answer, and we're going to have a bunch (a set)! They may not be okay with that....
We could use some statistics to give an answer. We could use the mean of the set of potential values. But we want to avoid losing track of the fact that there was some variance in that input (and failing to warn the user at the output end).
We could treat the value as an interval (from lowest to highest data values), although we'd still have the problem of what to report.
Let's assume that the true value lies within the data range: we assume that x lies in the interval [1.399,1.491]. So now the question is, what value might we assign to x?
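One natural answer (not necessarily the one the book has in mind): take the midpoint of the interval, and report the half-width as the uncertainty:
$$x \approx \frac{1.399 + 1.491}{2} = 1.445, \qquad |x - 1.445| \le \frac{1.491 - 1.399}{2} = 0.046.$$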
This section provides a reality check on the term "significant digits" -- "there is no consensus on what this means" -- and they provide an example which illustrates the danger of going with the notion that the least significant digit should be in error by at most 5 units in that place.
See Figure 1.9:
The question the authors ask (and answer) is this:
The authors give three examples of propagation of errors:
Ill-conditioned describes a problem that is sensitive to small changes in initial conditions; well-conditioned describes a problem that is insensitive to small changes in initial conditions.
We emphasize that, in this chapter, we are going to assume that the procedure itself is carried out exactly (e.g. square roots); the propagated error is simply a consequence of the procedure acting on error (not of how the procedure is implemented).
As examples, the authors consider the ordinary arithmetic operations, which are so-called "binary operations" (taking two inputs).
The first equation demonstrates the value of one of the equations we looked at last time: adding $a$ and $b$, each with a little relative error (say $\hat{a} = a(1+\delta_a)$ and $\hat{b} = b(1+\delta_b)$), we get a sum which is in relative error by
$$\frac{(\hat{a}+\hat{b}) - (a+b)}{a+b} = \frac{a\,\delta_a + b\,\delta_b}{a+b}$$
or
$$\frac{a}{a+b}\,\delta_a + \frac{b}{a+b}\,\delta_b,$$
so that each data error gets magnified by a weight of the form $\frac{a}{a+b}$.
Imagine that there's no relative error in $b$ ($\delta_b = 0$). Then the relative error of the sum is
$$\frac{a}{a+b}\,\delta_a,$$
and the magnification factor is
$$\left|\frac{a}{a+b}\right|.$$
Question: Under what conditions will this be much much greater (>>) than 1?
Let's take a look at an example where the errors cause trouble:
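Here is a small numerical illustration of that amplification factor $a/(a+b)$; the numbers are my own, not necessarily the book's example:

```python
# Nearly opposite data: a tiny relative error in a becomes a large
# relative error in the sum a + b.
a, b = 0.123456, -0.123400          # a + b = 0.000056
delta_a = 1.0e-6                    # small relative error in the datum a
a_hat = a * (1 + delta_a)           # the perturbed datum

exact = a + b
approx = a_hat + b
rel_err = abs(approx - exact) / abs(exact)

print(f"relative error in a:     {delta_a:.1e}")          # 1.0e-06
print(f"relative error in a + b: {rel_err:.1e}")          # about 2.2e-03
print(f"magnification a/(a+b):   {abs(a / (a + b)):.1e}")  # about 2.2e+03
```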
Interestingly enough, multiplication and division are robust ("well-conditioned") with respect to data errors: performing a similar analysis (#3, p. 28), we have
$$\hat{a}\,\hat{b} = a(1+\delta_a)\,b(1+\delta_b) = ab\,(1 + \delta_a + \delta_b + \delta_a\delta_b)$$
or, since $\delta_a\delta_b$ is negligibly small,
$$\hat{a}\,\hat{b} \approx ab\,(1 + \delta_a + \delta_b),$$
so that the relative error of the product is approximately
$$\delta_a + \delta_b.$$
Again, imagine that there's no relative error in $b$ ($\delta_b = 0$). Then the relative error of the product is just $\delta_a$: the data error is passed along without magnification.
Another way to simplify the analysis is to assume that the data errors are roughly the same size ($|\delta_a| \approx |\delta_b| \approx \delta$): then the relative error of the product is at most about $2\delta$.
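A quick numerical check that the product's relative error really is about $\delta_a + \delta_b$ (again, illustrative numbers of my own choosing):

```python
a, b = 3.7, 42.0
delta_a, delta_b = 2.0e-6, -3.0e-6

approx = (a * (1 + delta_a)) * (b * (1 + delta_b))
rel_err = (approx - a * b) / (a * b)

print(f"delta_a + delta_b = {delta_a + delta_b:.3e}")   # -1.000e-06
print(f"actual rel. error = {rel_err:.3e}")             # about -1.000e-06
```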
So now let's talk about some unary operations (e.g. cosine, rather than the binary operations we just considered). Again, we're going to assume that computations are carried out precisely -- it's just data error propagation we're concerned with at the moment.
The example our authors consider is one of a function $f$, given data $a$: then the propagated error is $f(\hat{a}) - f(a)$, where $\hat{a}$ is our approximation to $a$.
Let's suppose that f is differentiable. If so, then we might make use of the Taylor Series Expansion. I told you earlier to expect it -- here it is, and it will occur over and over again.
If
$$f(\hat{a}) = f(a) + f'(a)\,(\hat{a}-a) + \frac{f''(a)}{2}\,(\hat{a}-a)^2 + \cdots$$
and $\hat{a} - a$ is small, then
$$f(\hat{a}) - f(a) \approx f'(a)\,(\hat{a}-a),$$
so that
$$\frac{f(\hat{a}) - f(a)}{f(a)} \approx \frac{f'(a)\,(\hat{a}-a)}{f(a)},$$
or
$$\frac{f(\hat{a}) - f(a)}{f(a)} \approx \frac{a\,f'(a)}{f(a)} \cdot \frac{\hat{a}-a}{a},$$
and the factor $\dfrac{a\,f'(a)}{f(a)}$ is what our authors call the relative derivative of $f$ at $a$; the condition number of $f$ at $a$ is the absolute value of the relative derivative of $f$ at $a$.
In particular, this approximation works better and better as $\hat{a} \to a$.
We can say, in particular, that the condition number is directly proportional to the derivative $f'(a)$ and to the size of $a$; it's inversely proportional to the size of $f(a)$.
Examples: compute the relative derivative of
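I don't know exactly which functions the authors pick here, but two standard ones show how the computation goes. For $f(a) = a^p$ (any fixed power $p$):
$$\frac{a\,f'(a)}{f(a)} = \frac{a \cdot p\,a^{p-1}}{a^p} = p,$$
so the condition number is $|p|$ everywhere; e.g. $\sqrt{a}$ has condition number $\tfrac{1}{2}$, which is why square roots are so benign. For $f(a) = e^a$:
$$\frac{a\,f'(a)}{f(a)} = \frac{a\,e^a}{e^a} = a,$$
so exponentiation becomes ill-conditioned as $|a|$ grows.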
I've got to say that I'm not a huge fan of interval analysis, in general. For example, in a problem exhibiting chaos, intervals can go haywire.
Interval analysis is well-behaved when the functions involved are monotonic (increasing or decreasing). But, as the authors illustrate, two different expressions for the same quantity can yield different intervals.
Think about the interval for a non-monotonic function -- $\sin x$ over $[0,\pi]$, say. If you just check the endpoints, you get the single point 0, even though the function climbs to 1 in between.
And if a function is discontinuous, all hell can break loose!
"The main idea is that the computer works with a finite subset of the reals known as machine numbers." (p. 33)
I might have started this chapter with Figure 2.3, on page 41:
It gives a picture of machine numbers on a "toy binary computer". Section 2.1 makes a point about the need to consider other bases (other than 10). Among these other bases, the base 2 is probably the most important.
Question: why does base 2 figure so prominently in computer science?
There's a beautiful example here of how base 3 is used for tagging hogs:
Question: What other bases do you use? Why?
I've asked you to write a base converter for homework. Do you know how to convert from one base to another?
Let's try a few conversions (there's a sketch of one approach below).
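A minimal sketch of one way the homework base converter might work; the function names and the digit alphabet here are my own choices, not a prescribed interface:

```python
DIGITS = "0123456789ABCDEF"

def to_base10(digits: str, base: int) -> int:
    """Interpret the string `digits` as an integer written in `base`."""
    value = 0
    for d in digits:
        value = value * base + DIGITS.index(d.upper())
    return value

def from_base10(n: int, base: int) -> str:
    """Write the nonnegative integer n in `base` by repeated division."""
    if n == 0:
        return "0"
    out = []
    while n > 0:
        n, r = divmod(n, base)
        out.append(DIGITS[r])
    return "".join(reversed(out))

print(to_base10("110101", 2))   # 53
print(from_base10(53, 2))       # 110101
print(from_base10(200, 3))      # 21102  (cf. the hog-tagging example's base 3)
```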
In this section the authors describe the manner in which machine numbers are stored in the computer. They focus on "floating-point numbers", which are represented by three parts: a sign, a mantissa (significand), and an exponent.
Definition 2.1: A real number $x$ is said to be an n-digit number if it can be expressed as
$$x = \pm\, d_1.d_2 d_3 \cdots d_n \times \beta^{e},$$
where the $d_i$ are digits in the base $\beta$.
Question: They then ask "What's an n-bit number?" (p. 39) What do you tell them?
Let's imagine that our machine has base-10 architecture, with $n = 4$ and exponent range $-9 \le e \le 9$. Then we know exactly which numbers may be represented:
Largest magnitude numbers | -9.999 x 10^9 | +9.999 x 10^9 |
Smallest magnitude numbers | -1.000 x 10^-9 | +1.000 x 10^-9 |
If we allowed leading digits of zero (i.e., unnormalized numbers), then there would be redundant representations for many numbers (e.g. +1.000 x 10^-9 = +0.100 x 10^-8).
Question: what do all the machine numbers look like if we restrict a machine to a tiny architecture (a small base, only a few digits, and a narrow exponent range, like the toy binary computer of Figure 2.3)?
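One way to answer such a question is to just enumerate the machine numbers. The parameters below (base 2, n = 3 digits, exponents from -1 to 1) are my own stand-ins for whatever the exercise actually specifies:

```python
from itertools import product

BASE, N, EMIN, EMAX = 2, 3, -1, 1

numbers = set()
for e in range(EMIN, EMAX + 1):
    for digits in product(range(BASE), repeat=N):
        if digits[0] == 0:          # require a non-zero leading digit (normalized)
            continue
        # value of  d1.d2...dn x BASE^e
        mantissa = sum(d * BASE**(-i) for i, d in enumerate(digits))
        numbers.add(mantissa * BASE**e)
        numbers.add(-mantissa * BASE**e)
numbers.add(0.0)

print(sorted(n for n in numbers if n >= 0))
# [0.0, 0.5, 0.625, 0.75, 0.875, 1.0, 1.25, 1.5, 1.75, 2.0, 2.5, 3.0, 3.5]
```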
Now, in reality, computations are usually done in base 2, and the IEEE standard parameters for single precision and double precision are:
 | Single | Double |
Base | 2 | 2 |
n | 24 | 53 |
e | [-126:127] | [-1022:1023] |
Question: in each case, how many exponents are there in the exponent range? Of what significance is that number?
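Presumably the point: single precision has 127 - (-126) + 1 = 254 exponent values, and double has 1023 - (-1022) + 1 = 2046. Those are 2^8 - 2 and 2^11 - 2, so the exponent fits in an 8-bit (respectively 11-bit) field, with two bit patterns left over for special values (zeros/subnormals and infinities/NaNs).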
Our authors describe the difference between precision and accuracy at this point: I think that it's best done graphically:
"The purpose of rounding in computation is to turn any real number into a machine number, preferably the nearest one." (p. 43)
But there are different ways to do it. You're familiar with "ordinary rounding" (but how do you handle ties -- that is, how do we round 19.5 to an integer?). The authors suggest several strategies (p. 43):
"Round-to-even" because if nth digit is even, do nothing; add 1 if odd, making it even. All nth digits become even.
Same as Rule 1, except when exactly equal to 500000....
then round UP (away from zero).
Inferior to Rule 1, as ever-so-slightly biased away from zero.
Whatever comes after nth, just drop it.
Inferior to Rule 1, as slightly biased toward zero.
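Python's decimal module implements these tie-breaking modes, so we can check the rules on the 19.5 example from above (the module and rounding-mode names are real; the numbers are just illustrative):

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP, ROUND_DOWN

x = Decimal("19.5")
print(x.quantize(Decimal("1"), rounding=ROUND_HALF_EVEN))              # 20 (Rule 1: tie goes to even)
print(Decimal("18.5").quantize(Decimal("1"), rounding=ROUND_HALF_EVEN))  # 18 (18 is already even)
print(x.quantize(Decimal("1"), rounding=ROUND_HALF_UP))                # 20 (Rule 2: ties away from zero)
print(x.quantize(Decimal("1"), rounding=ROUND_DOWN))                   # 19 (Rule 3: chop)
```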
The biases are illustrated nicely in Figure 2.5, p. 45:
There is some vocabulary here with which we should be familiar: sometimes rounding results in overflow (the real number is larger in magnitude than the largest machine number) or underflow (nonzero, but smaller in magnitude than the smallest nonzero machine number).
If not, and we must use 1.4, then another user would assume that we only know the 4 in the tenths place to within five units, and you can see what happens to the actual uncertainty -- it expands grossly, to the interval (0.9,1.9).
An example: the speed of light in a vacuum, 2.99792458 x 10^8 m/s.
When $\left|\dfrac{a}{a+b}\right| \gg 1$, addition magnifies the data errors badly,
and, in particular, when $a + b \approx 0$, i.e. when $a \approx -b$.
Symmetrically, subtraction will suffer the problem when a and b are approximately equal.