%\tracingstats=2 % see just how much memory is left \input pictex \hsize=5.75in % Set the horizontal size to accomodate CDVI, the public domain % % viewer. Use CDVI-2 for VGA monitors with VGA cards. % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % % Set up an "expressmode" for plots to speed up typing and TeXing % until a final draft is ready. This is done as per the PiCTeX manual. % \newif\ifexpressmode % default = \expessmodefalse % % See explanations on pages 215ff of the TeXBook. % \def\yes{y } \def\Yes{Y } \message{Type Y for EXPRESSMODE (no plotting) now: } \read-1 to\answer \ifx\answer\yes\expressmodetrue \else\ifx\answer\Yes\expressmodetrue\fi\fi % The answer is Yes (Y or y). \ifexpressmode \message {EXPRESSMODE --- No plotting} \fi %\expressmodetrue % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % % \newdimen\xposition \newdimen\yposition \newdimen\dyposition \newdimen\crossbarlength \def\puterrorbar at #1 #2 with fuzz #3 {% \xposition=\Xdistance{#1} \yposition=\Ydistance{#2} \dyposition=\Ydistance{#3} \setdimensionmode \put {$\bullet$} at {\xposition} {\yposition} \dimen0 = \yposition % ** Determine the y-location of \advance \dimen0 by -\dyposition % ** the lower cross bar. \dimen2 = \yposition % ** Determine the y-location of \advance \dimen2 by \dyposition % ** the upper cross bar. \putrule from {\xposition} {\dimen0} % ** Place vertical rule. to {\xposition} {\dimen2} \dimen4 = \xposition \advance \dimen4 by -.5\crossbarlength \dimen6 = \xposition \advance \dimen6 by .5\crossbarlength \putrule from {\dimen4} {\dimen0} to {\dimen6} {\dimen0} \putrule from {\dimen4} {\dimen2} to {\dimen6} {\dimen2} \setcoordinatemode} % \def\dev{\mathop{\rm dev}\nolimits} % Define special math. function. % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % \nopagenumbers \hsize=5.75in \centerline{\quad} \vskip4in \centerline{\bf Mathematical Approximation and Documentation} \bigskip \centerline{Harry A. Watson, Jr.} \smallskip \centerline{Quality Assessment Directorate} \centerline{Naval Warfare Assessment Center} \centerline{Corona, California 91718-5000} \centerline{(909) 273-4787} \vfill\eject \centerline{\quad} \vskip 7.5in {\narrower\noindent This book and the associated software fit the description in the U. S. Copyright Act of a ``United States Government Work.'' It was written as a part of the author's official duties as a government employee. This means it cannot be copyrighted. This document and the software are freely available to the public for use without a copyright notice, and there are no restrictions on its use, now or subsequently.} \vfill\eject \footline={\hss\tenrm\folio\hss} % \pageno=-1 \centerline{\bf Preface} \centerline{to} \centerline{\bf Mathematical Approximations and Documentation} \bigskip There are several elementary mathematical approximations that occur on every scientific and engineering project at irregular intervals. Frequently the same approximations appear again and again as {\it ad hoc\/} report requests (and often with short notice). While the programming and documentation is straightforward, the more elaborate and expensive commercial software is often unable to deliver suitable output. This text includes the basic theory and elementary computer programs in BASIC (Beginners All-purpose Symbolic Instruction Code\footnote{$^1$}{The coding is actually done in MS-DOS QBasic (MicroSoft-Disk Operating System). 
There should be little problem in conversion to GW-BASIC.}) and ``{C}.''\footnote{$^2$}{The ``C'' language is written for the {\it Microsoft QuickC Compiler}.} Documentation, including graphs, is provided in \TeX\ % and \PiCTeX. A later chapter is devoted to more complex graphs done in the device-independent software {\bf METAFONT}, which can be included in \TeX\ files. No deep knowledge of programming is required, nor is the user expected to be a ``\TeX-nician;'' indeed, the user is expected to lift the coding ``as is'' and make the obvious substitutions. This book and the associated software fit the description in the U. S. Copyright Act of a ``United States Government Work.'' It was written as a part of the author's official duties as a government employee. This means it cannot be copyrighted. This document and the software are freely available to the public for use without a copyright notice, and there are no restrictions on its use, now or subsequently. Anyone using this document or these programs and files does so entirely at her/his own risk. Anyone who demands payment for the distribution of this material must make clear that the charge is for distribution only and is in no sense a license fee or purchase fee. The three topics covered in this textbook are (1) Simpson's rule for numeric integration, (2) Newton's method for approximation of roots, and (3) least-squares curve fit with a test to verify the choice of the underlying statistical distribution. Many examples are provided with hand-calculations and tables. It is expected that the user will find portions of the \TeX\ file useful. With this in mind, the macros ({\tt \char92 def}) are kept to a minimum and the displayed portions are self-contained. As often as possible, references are made to textbooks universally available, such as outline series books found in any college bookstore or encyclopedia articles. The author has attempted to locate examples of particular interest rather than pathological cases. This is to ensure a greater likelihood that an example may be copied directly into a scientific or engineering paper. \medskip % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % % % Here is a first graph that you can cut and paste.% % % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % $$\beginpicture % Draw the graph of a \setcoordinatesystem units <.78539in,.5in> % Sine curve using the \setplotarea x from -2.25 to 4.5, y from -1 to 1 % macros of PiCTeX. \plotheading {The graph of $y = \sin x$} \axis bottom shiftedto y=0 label {$\scriptstyle y = \sin x$} ticks in withvalues {$\scriptstyle -\pi$} {$-{\pi\over2}$} {\quad\ O} {$\pi\over2$} {$\scriptstyle\pi$} {${3\pi\over2}$} {$\scriptstyle2\pi$} / at -2 -1 0 1 2 3 4 / / \axis left shiftedto x=0 / \ifexpressmode \put {\bf EXPRESSMODE} at 2.00 0.5 \else \setquadratic \plot 0 0 0.2 0.30902 0.4 0.58779 0.6 0.80902 0.8 0.95106 1.0 1.00000 1.2 0.95106 1.4 0.80902 1.6 0.58779 1.8 0.30902 2.0 0.00000 2.2 -.30902 2.4 -.58779 2.6 -.80902 2.8 -.95106 3.0 -1.0000 3.2 -.95106 3.4 -.80902 3.6 -.58779 3.8 -.30902 4.0 0.00000 / % \plot 0 0 -0.2 -0.30902 -0.4 -0.58779 -0.6 -0.80902 -0.8 -0.95106 -1.0 -1.00000 -1.2 -0.95106 -1.4 -0.80902 -1.6 -0.58779 -1.8 -0.30902 -2.0 -0.00000 / % \fi \endpicture$$ \centerline{\rm F{\sevenrm IGURE} 1} \medskip % % The sine (not the "sin" or moral failing) is used over and over % in scientific and engineering texts. It is always a good "filler" % in any scientific document. % A word about fonts is in order. 
The only fonts employed are from the standard distribution (16 fonts) for \TeX. All computer program statements are typeset in the so-called {\tt typewriter font\/} to provide contrast and to allow alignment of the characters for keyboard input. The copyright for the program \TeX\ belongs to D.~E.~Knuth; the trademark \TeX\ belongs to the American Mathematical Society; and {\tt MS-DOS} is a trademark of the Microsoft Corporation. \vfill\eject \pageno=-2 \centerline{\bf TABLE OF CONTENTS} \bigskip {\narrower \settabs 6 \columns \+ Preface &&&&&\ i\cr \medskip \+ Table of Contents &&&&&\/ ii\cr \medskip \+ && Chapter 1---Simpson's Rule &&&\cr \+ 1.0 & Introduction &&&&\ 1\cr \+ 1.1 & Simpson's Rule &&&&\ 2\cr \+ 1.2 & Integration Computer Programs &&&&\ 4\cr \+ 1.3 & Vivasection of the Programs &&&&\ 7\cr \+ 1.4 & Numerical versus Exact Integration &&&&\ 9\cr \+ 1.5 & Truncation Error &&&&11\cr \+ 1.6 & An Example &&&&12\cr \+ 1.7 & Verifying Derivatives &&&&14\cr \+ 1.8 & Generalizations &&&&19\cr \medskip \+ && Chapter 2---Newton's Method &&&\cr \+ 2.0 & Introduction &&&& 21\cr \+ 2.1 & Theory &&&&22\cr \+ 2.2 & Applications &&&&23\cr \+ 2.3 & The Algorithm &&&&26\cr \+ 2.4 & A Second Opinion &&&&27\cr \+ 2.5 & The Quasi-Newton Method &&&&28\cr \+ 2.6 & Pathological Examples &&&&30\cr \medskip \+ && Chapter 3---Linear Least-squares Line &&&\cr \+ 3.0 & Introduction &&&& 31\cr \+ 3.1 & Dissection of a Graph &&&& 32\cr \+ 3.2 & Formula Derivation &&&& 37\cr \+ 3.3 & Goodness of Fit &&&& 39\cr \+ 3.4 & More Theory &&&& 42\cr \+ 3.5 & Problems &&&& 44\cr \medskip \+ && Chapter 4---An Adaptive Quadrature Routine &&&\cr \+ 4.0 & Introduction &&&& 47\cr \+ 4.1 & A Simple Approach &&&& 48\cr \+ 4.2 & An Adaptive Quadrature Routine &&&& 49\cr \medskip \+ && Chapter 5---{\bf METAFONT} &&&\cr \+ 5.0 & Introduction &&&& 51\cr \+ 5.1 & A Metafont Program &&&& 52\cr \medskip \+ && Chapter 6---An Important Improper Integral &&&\cr \+ 6.0 & Introduction &&&& 54\cr \+ 6.1 & Theoretical Evaluation &&&& 55\cr \+ 6.2 & Numerical Evaluation &&&& 57\cr \bigskip \+ Appendix A---Mathematical Preliminaries &&&&& 58\cr \medskip \+ Appendix B---References &&&&& 61\cr \medskip \+ Appendix C---A Geometric Persuasion &&&&& 62\cr \medskip \+ Appendix D---A Contour Integral &&&&& 64\cr \medskip \+ Appendix E---The Malgrange-Ehrenpreis Theorem &&&&& 66\cr } \vfill\eject %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \pageno=1 % \headline={\tenrm\hfill Simpson's Rule} \centerline{\bf Chapter 1} \bigskip \noindent{\bf\llap{1.0\quad}Introduction.} When one finally finishes algebra and geometry and embarks into the realm of calculus, all problems seem solvable. The universe appears to be reduced to mathematical constructs and it is only a matter of time until mathematics can define everything. There is a conceit among those who first master the art and discipline of {\bf The Calculus}. The tyro\footnote{$^3$}{A beginner in learning, a novice. Don't rush to your dictionary---footnotes will be provided for many words.} feels a superiority to those who are ignornant of this powerful tool, replete with its repertoire of skills, special symbols, concise language, and geometric persuasions. The euphoria which is closely associated with the mastery of {\bf The Calculus} leaves its pupil craving a repeat experience. 
The classical development of calculus has evolved over several centuries; the problems and examples are thus arranged so as to direct the student along a path devoid of many of the harsh pitfalls in applied science and engineering. Embedded in this development are the two classic encounters with the dirty reality of numerical approximation: Newton's method for approximating roots and Simpson's rule for approximating integrals. These two techniques are ubiquitous\footnote{$^4$}{Being found everywhere at the same time.} in numerical analysis, calculating devices, and engineering in general. Their study and development is the subject of the first two chapters. In this, the first chapter, the topic of Simpson's rule is considered. The coding is done in several different computer languages and the text edited in different formats to ensure complete generality. Since Simpson's rule depends on properties of the parabola, a general graph of a parabola is displayed below. \bigskip % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % % The graph of a parabola. % \centerline{PARABOLA} \medskip $$\beginpicture \setcoordinatesystem units <1cm,1cm> \setplotarea x from -2.5 to 5.0, y from -1.5 to 5.0 % \axis bottom shiftedto y=0.0 ticks length <0pt> withvalues {$O$} {$x$} / at -.25 5.0 / / % \axis left shiftedto x=0.0 ticks length <0pt> withvalues {$y$} / at 5.0 / / % \put {$V$} [t] <2pt,-4pt> at 1 1 % \put {$\scriptstyle\bullet$} at 1 1 % \ifexpressmode \put {\bf EXPRESSMODE} at 3 3 % \else \setquadratic \plot -2 5 1 1 4 5 / % \fi \endpicture$$ $$y = f(x) = a x^2 + bx + c,\ a>0,$$ $$\hbox{Vertex } = V = \left(-{b\over2a},\ % {b^2\over2a}\left({1\over2a}-1\right) + c\right).$$ \medskip $$\hbox{F{\sevenrm IGURE} 2}$$ % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % \vfill\eject \bigskip \noindent{\bf\llap{1.1\quad}Simpson's rule.} Most scientists and engineers have completed elementary calculus and are aware of the technique for numerical approximation by fitting arcs of parabolas; however, even someone who has only completed high school mathematics (algebra and plane geometry) will be able to apply this technique with success (and produce a professional output that will amaze even the most erudite\footnote{$^5$}{Possessing or displaying excessive knowledge acquired mainly from books.} pedant on the staff). There are pitfalls; the expert as well as the tyro can be trapped by some speciously simple-looking functions.\footnote{$^6$}{One pathological example is the so-called {\it Quadratrix of Hippias}, $y=x\tan\big(\pi y/2\big)$.} Having said all of this, let us begin the basic theoretical development of Simpson's rule.\footnote{$^7$}{George B. Thomas, Jr., {\it Calculus and Analytic Geometry}, (Reading, Massachusetts: Add\-i\-son-Wes\-ley Publishing Company, Inc., 1966), pages 385-388.} \medskip \noindent First of all, let $[\alpha,\beta]$ be a number interval, that is $$[\alpha,\beta] = \{\,x\,\mid\,\alpha \le x \le \beta\,\}.$$ We'll use the Greek letters $\alpha$ and $\beta$ because later we'll want to use $a$, $b$, and $c$ as coefficients of the quadratic equation $y=ax^2+bx+c$. % Moreover, as we develop this method, the % importance of the endpoints will diminish. We will let $f$ denote a function, $f\colon [\alpha,\beta] \to {\bf R}$, where ${\bf R}=\{\,x\,\mid\,x$ is a real number $\}$. For convenience, we will use let $y_\alpha = y_0 = f(\alpha)$, $y_\beta = y_2 = f(\beta)$, and $y_1 = f\big((\beta+\alpha)/2\big)$. 
This is all bookingkeeping. It is one of those mathematical concepts that causes ``math anxiety;'' however, it doesn't need to. Just consider $y$ to be the left-hand side of some expression, for instance $$ y = x^2 - 4x +2.$$ When $x=2$ we might have $y = 2\cdot2 - 4\cdot2 + 2 = -2$. This could be written as $y(2)$ or $y(x=2)$ or even $y\big|_{x=2}$, which is also written $y\big|_{2}$. We will just simplify things and write it as $y_2$. This is a fundamental idea, if the reader is confused or lost on this matter, it would be a good idea to review an algebra book (or consult appendix A). \medskip \noindent Simpson's rule follows from the so-called {\bf prismoidal formula} (for areas) $$A_p\ =\ {h\over3}\big[\,y_0+4y_1+y_2\big]\ =\ {\beta-\alpha\over6} \left[f(\alpha) + 4\cdot f\left({\beta+\alpha\over2}\right) +f(\beta)\right].$$ We will denote the left-hand side of this equation as $A_p$, which stands for the area under the arc of the parabola. This simple formula gives the exact area if the function, $f(x)$, is a polynomial of degree not higher than 3.\footnote{$^8$}{Glenn James and James, Robert C., {\it Mathematics Dictionary}, (Princeton, NJ: D. Van Nostrand Company, Inc., 1964), pages 358-359.} In particular, $A_p$, gives the area under the arch of the parabola $$y\ =\ f(x)\ =\ ax^2 + bx + c$$ between $x=-h$ and $x=+h$, as in Fig. 3.\footnote{$^9$}{Without loss of generality, we choose the interval $[a,b]$ to be $[-h,h]$. This simplifies the algebra. Everything is still completely general because we can always ``shift'' a graph back and forth along the $x$-axis without changing the area under the curve.} % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % % The area under an arch of a parabola. % $$\beginpicture \setcoordinatesystem units <1cm,1cm> % Use PiCTeX to \setplotarea x from -2.5 to 2.5, y from -.5 to 2.5 % construct this \axis bottom shiftedto y=0 label {F{\sevenrm IGURE} 3} % ticks length <0pt> withvalues {$\scriptstyle-h$} {$O$\quad} {$\scriptstyle h$} / at -2 0 2 / / \axis left shiftedto x=0 / \putrule from -2 0 to -2 1.5 \putrule from 2 0 to 2 0.75 \put {$\scriptstyle x$} at 2.75 0 \put {$\scriptstyle y$} at 0 2.75 \put {$\scriptstyle y_0$} at -1.75 0.75 \put {$\scriptstyle y_1$} at 0.2 0.75 \put {$\scriptstyle y_2$} at 1.75 0.375 \ifexpressmode \put {\bf EXPRESSMODE} at 1 1 \else \setquadratic \plot -2 1.5 0 1.6 2 0.75 / \fi \endpicture$$ % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % \bigskip \noindent This result is readily established as follows: in the first place, we have $$A_p = \int_{-h}^h \left(ax^2+bx+c\right)\,dx = \ {{ax^3\over3}+{bx^2\over2}+cx\,\bigg|}_{x=-h}^{x=h} \ =\ {2ah^3\over3} + 2ch.$$ Since the curve passes through the three points $(-h,y_0)$, $(0,y_1)$, and $(h,y_2)$, we also have $$y_0 = ah^2-bh+c,\quad y_1=c,\quad y_2=ah^2+bh+c,$$ from which it follows that $$\eqalign{ c\ &=\ y_1,\cr ah^2 -bh \ &=\ y_0-y_1,\cr ah^2+bh \ &=\ y_2-y_1,\cr 2ah^2 \ &=\ y_0+y_2-2y_1.\cr}$$ Hence, expressing the area $A_p$ in terms of the ordinates $y_0$, $y_1$, and $y_2$, we have $$A_p = {h\over3}\left[2ah^2 +6c\right] ={h\over3}\left[\big(y_0+y_2-2y_1\big) +6y_1\right]$$ or $$A_p = {h\over3}\left[y_0 + 4y_1 + y_2\right]\ = \ {h\over3}\left[f(-h)+4\cdot f(0) + f(h)\right].\eqno(1)$$ % \medskip \noindent Simpson's rule is derived by applying the prismoidal formula to successive subintervals of $[\alpha, \beta]$. Each separate subinterval has length $2h$. 
So, we may write $[\alpha,\alpha+h]$ as $[x_0, x_1]$, $[\alpha+h, \alpha+2h]$ as $[x_1, x_2]$, and so on. Now the curve $y=f(x)$ between $x=\alpha$ and $x=\beta$ is really $n/2$ curves. Each separate piece of the curve covers an $x$-subinterval of width $2h$; it is approximated by an arch of a parabola through its ends and its mid-point. The area under each parabolic arc is computed as in Eq. (1) and the resulting areas are summed together to give the rule $$\frame <5pt> {$\displaystyle A_S = {h\over3}\big[\, y_0 + 4y_1 + 2y_2 +4y_3 + \cdots + 2y_{n-2} + 4y_{n-1} + y_n \big].$}\eqno(2)$$ This result, $A_S$, is an approximate value of $\int_a^b f(x)\,dx$. Recall our bookkeeping. We let $y_0$, $y_1$, $y_2$, $\ldots$, $y_n$ stand for $f(x_0)$, $f(x_1)$, $\ldots$, $f(x_n)$. In other words, $y_0$, $y_1$, $\ldots$, $y_n$ are the ordinates of the curve $y=f(x)$ corresponding to the abscissas $x_0=a$, $x_1=a+h$, $x_2=a+2h$, $\dots$, $x_n=a+nh=b$. Each abscissa point then corresponds to an endpoint of a subdivision of the interval $\alpha \le x \le \beta$ into $n$ equal subintervals each of width $h=(\beta-\alpha)/n$. (See Fig. 4.) The number $n$ of subdivisions must be an {\it even\/} integer in order for this method to work. Not all authors use this same scheme. Some rely on subintervals of length $h$ instead of $2h$. When consulting a table book, be careful to examine the setup for this rule. % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % % Simpson's rule over a curve (after Thomas's Calculus). % $$\beginpicture \setcoordinatesystem units <1cm,1cm> \setplotarea x from -0.5 to 10, y from -0.5 to 3.0 \axis bottom shiftedto y=0 label {F{\sevenrm IGURE} 4} % ticks length <0pt> withvalues {$O$} {$\alpha$} {$\leftarrow\! h\!\rightarrow$} {$\beta$} / at -0.25 1 3.5 7 / / % \axis left shiftedto x=0 / % \putrule from 1 0 to 1 2.8 \putrule from 2 0 to 2 2.9 \putrule from 3 0 to 3 2.65 \putrule from 4 0 to 4 2.3 \putrule from 5 0 to 5 1.85 \putrule from 6 0 to 6 1.5 \putrule from 7 0 to 7 1.4 \putrule from 3 -0.02 to 3 -0.4 \putrule from 4 -0.02 to 4 -0.4 \put {$ y=f(x)$} at 5 3 \put {$ x$} at 10.25 0 \put {$ y$} at 0 3.25 \put {$\scriptstyle 1$} <0pt,2pt> at 1.0 2.9 \put {$\scriptstyle 4$} <0pt,2pt> at 2.0 3.0 \put {$\scriptstyle 2$} <0pt,2pt> at 3.0 2.75 \put {$\scriptstyle 4$} <0pt,2pt> at 4.0 2.4 \put {$\scriptstyle 2$} <0pt,2pt> at 5.0 1.95 \put {$\scriptstyle 4$} <0pt,2pt> at 6.0 1.6 \put {$\scriptstyle 1$} <0pt,2pt> at 7.0 1.5 \put {$\scriptstyle y_0$} at 0.8 1 \put {$\scriptstyle y_1$} at 1.8 1 \put {$\scriptstyle y_2$} at 2.8 1 \put {$\scriptstyle y_{n-1}$} at 5.7 1 \put {$\scriptstyle y_{n}$} at 6.8 1 \ifexpressmode \put {\bf EXPRESSMODE} at 3 3 \else \setquadratic \plot 0.5 2.6 1.0 2.8 2.0 2.9 3.0 2.65 4.0 2.3 5.0 1.85 6.0 1.5 7.0 1.4 7.5 1.5 / % \fi \endpicture$$ % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % It looks like the more subintervals we take, that is the smaller we let $h$ become, the more accurate the approximation will be to the area under the curve (the value of the integral). This is part of error estimation and will be done in a later paragraph. In the next paragraph we will do numerical examples and see how the method behaves numerically. \vfill\eject \noindent{\bf\llap{1.2\quad}Integration Computer Programs.} We'll list two computer programs to do the same job---one in BASIC and one in {``C''}. One integral that crops up again and again is the so-called elliptic integral. 
We will look at a complete elliptic integral of the second kind, usually denoted $$E\left(k,{\pi\over2}\right)\ =\ \int\nolimits_0^{\pi/2} \sqrt{1-k^2\cdot\sin^2 \theta}\,d\theta.\eqno(3)$$ Finally, we need to choose a value for $k$. If we let $e$ denote the base of the natural logarithms, that is $e \approx 2.7182828459045\dots$, then assign $k$ to be $k = \sin\big(e/2\big)$, we will have a good example to experiment with.\footnote{$^{10}$}{Harry A. Watson, Jr., ``An approximation to Planck's constant,'' {\it Abstracts\/} of the American Mathematical Society (AMS) \# 81T--181, June 1981.} \bigskip {\parindent=0pt\tt\ttraggedright\obeylines 100 DEFDBL A-H, O-Z : REM DEF FNA defines the integrand (function). 105 DEF FNA (x AS DOUBLE) = SQR(1!-(SIN(EXP(1)/2)*SIN(x)){\char94}2) 110 A = 0!: REM This is the lower endpoint of [a,b]. 120 B = 3.141592653589793\#/2!: REM This is the upper endpoint of [a,b]. 130 SUM = 0!: REM This is an accumulator. 135 REM The user is asked for a number of intervals, n, even and > 0. 140 INPUT "Enter number of intervals (must be even) "; N 145 IF N <= 0 OR (2 * FIX(N / 2) - N < 0) THEN PRINT "Input error": GOTO 140 150 H = (B - A) / N: REM The sub-interval size h = (b-a)/n. 160 FOR I = 1 TO N STEP 2: REM The "FOR" loop is done n/2 times. 170 SUM = SUM + FNA(A + (I - 1) * H) 175 PRINT A + (I - 1) * H, FNA(A + (I - 1) * H) 180 SUM = SUM + 4! * FNA(A + I * H) 185 PRINT A + I * H, FNA(A + I * H) 190 SUM = SUM + FNA(A + (I + 1) * H) 195 PRINT A + (I + 1) * H, FNA(A + (I + 1) * H) 200 NEXT I: REM After loop, the sum is y\_0+4*y\_1+2*y\_2+...+4*y\_{\char123}n-1{\char125}+y\_n. 210 PRINT "Sum = "; SUM: PRINT " Value of integral = "; SUM * H / 3 220 REM This short program will integrate a function FNA(x) from x = a to b. } \medskip \noindent The ``C'' program is more complicated because it has the ``{\tt\#include}'' statements. However, the ``C'' program executes faster and is portable to other platforms. In addition, the programs below both use a math co-processor, which gives floating-point arithmetic. Minor differences in the last digit may be due to the print (format) algorithm. The BASIC program would not accept the longer value of $\pi$. If one examines the \TeX\ file for these listings, it will be noted that the control characters are input as {\tt\char92char92} for the backspace ({\tt\char92}), {\tt\char92char123} for the left brace ({\tt\char123}), {\tt\char92char125} for the right brace ({\tt\char125}), and {\tt\char92char94} for the hat or circumflex ({\tt\char94}). \medskip {\parindent=0pt\tt\ttraggedright\obeylines \#include /* Header for input/output subroutines. */ \#include /* Header for math subroutines. */ \#include /* Header for floating point subroutines. */ \medskip \#define pi 3.141592653589793238462643383279 /* Accurate value for pi. */ \medskip /* Simpson's rule for approximating integrals. \qquad a: left endpoint \qquad b: right endpoint \qquad fc: pointer to function to integrate \qquad n: number of subintervals */ double fc (double x); \medskip main() {\char123} double a,b,h,sum,x,y; /* In 'C' all variables must be assigned */ double p1, p2, p3; int i, n; printf("{\char92}007"); /* Sound bell in computer. */ a = (double) 0.0; printf("{\char92}nLeft end point = \%.16lf",a); b = (double) pi/2.0; printf("{\char92}nRight end point = \%.16lf",b); i = -1; while (i < 0){\char123} printf("{\char92}nEnter number of subintervals (must be even) "); scanf("{\char92}\%d",\&n); i = n/2; i = 2*i - n; /* Don't allow odd values of n. 
*/ if (n<=0) i = -1; /* Don't allow zero or negative. */
{\char125}
printf("{\char92}nNumber of subintervals \%d",n);
h = (double) (b-a)/n;
for (i=1, sum=0.0; i<=n; i = i+ 2){\char123}
\quad p1 = fc((double) a+(i-1)*h);
\quad p2 = fc((double) a+i*h);
\quad p3 = fc((double) a+(i+1)*h);
\quad sum += p1 + 4.0 * p2 + p3;
/* printf("{\char92}n x, f(x) \%.16lf \%.16lf",a+(i-1)*h,p1); */
/* printf("{\char92}n x, f(x) \%.16lf \%.16lf",a+i*h,p2); */
/* printf("{\char92}n x, f(x) \%.16lf \%.16lf",a+(i+1)*h,p3); */
{\char125}
\medskip
printf("{\char92}nValue of sum = \%.16lf", (double) sum);
y = (double) h*sum/3.0;
printf("{\char92}nValue of integral = \%.16lf", (double) y);
printf("{\char92}n");
return(0);
{\char125}
\medskip
double fc (double x)
{\char123}
\qquad double y;
\qquad y = sqrt(1.0-sin(exp(1.)/2.)*sin(exp(1.)/2.)*sin(x)*sin(x));
\qquad return (y);
{\char125}
\medskip
/* End of file */
}
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% Parabolic curve fit for unequally spaced abscissas.
%
$$\beginpicture
\setcoordinatesystem units <.5cm,.3cm>
\setplotarea x from -0.25 to 20, y from -0.25 to 10.0
\axis bottom label {Parabolic curve fits unequally spaced abscissas} shiftedto y=0 / %
\axis left shiftedto x=0 / %
\putrule from 2 0 to 2 3.8
\putrule from 4 0 to 4 3.85
\putrule from 5.3 0 to 5.3 3.6
\putrule from 9 0 to 9 4.78571
\putrule from 10. 0 to 10. 4.95
\putrule from 14 0 to 14 8.35
\putrule from 16 0 to 16 8.6
\put {$O$} at 0.5 -0.75
\put {$x$} at 19 -0.5
\put {$f(x)$} [r] at -.25 9
\ifexpressmode
\put {\bf EXPRESSMODE} at 10 5
\else
\setquadratic
\plot 1 3 3 4 7 3 / %
\plot 5 3.5 8 4.5 12 5.5 / %
\plot 10 5.0 13 8.0 17 8.5 / %
\fi
\endpicture$$
\centerline{F{\sevenrm IGURE} 5}
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
\vfill\eject
\bigskip
\noindent We'll make a table of values for various integer values of $n$, that is, various numbers of subdivisions. Much can be learned from comparing the outputs of the two programs and examining their behavior as $n$ becomes large.
\smallskip $$\vbox{\offinterlineskip \hrule \halign{&\vrule#&\strut\quad\hfil#\quad&\vrule# &\quad\hfil#\quad&\vrule#&\quad\hfil#\quad&\vrule# &\quad\hfil#\quad&\vrule#\cr height2pt&\omit&&\omit&&\omit&&\omit&\cr &$n =$\hfil&&BASIC Program&&``C'' Program\quad&&Difference\quad&\cr height2pt&\omit&&\omit&&\omit&&\omit&\cr \noalign{\hrule} height2pt&\omit&&\omit&&\omit&&\omit&\cr \noalign{\hrule} height2pt&\omit&&\omit&&\omit&&\omit&\cr & 2 &&1.073441760234934 &&1.0734417602349340&&0.00000000000000&\cr & 4 &&1.057800052054470 &&1.0578000520544700&&0.00000000000000&\cr & 8 &&1.054898082319603 &&1.0548980823196030&&0.00000000000000&\cr & 10 &&1.054750510268630 &&1.0547505102686300&&0.00000000000000&\cr & 20 &&1.054686162116484 &&1.0546861621164840&&0.00000000000000&\cr & 40 &&1.054685849666779 &&1.0546858496667790&&0.00000000000001&\cr & 80 &&1.054685849645436 &&1.0546858496454370&&0.00000000000001&\cr &100 &&1.054685849645437 &&1.0546858496454360&&0.00000000000001&\cr &200 &&1.054685849645437 &&1.0546858496454370&&0.00000000000000&\cr &500 &&1.054685849645437 &&1.0546858496454370&&0.00000000000000&\cr &1,000 &&1.054685849645437 &&1.0546858496454370&&0.00000000000000&\cr &2,000 &&1.054685849645436 &&1.0546858496454370&&0.00000000000001&\cr &10,000 &&1.054685849645435 &&1.0546858496454330&&0.00000000000002&\cr height2pt&\omit&&\omit&&\omit&&\omit&\cr} \hrule}$$ \smallskip\noindent Notice, if you will, two things are happening: (1) after $n=40$ there is no change in the first five decimal places; and, (2) after $n=80$ there is almost no change at all! So, after $n=80$, successive terms become closer and closer together and the calculated value converges to a given number. This phenomenon is called, obviously enough, convergence. It would be nice if we could tell {\sl beforehand\/} just what value of $n$ to choose to ensure the accuracy we need. Clearly, if all we want is five decimal place accuracy, $n=40$ will suffice. If we are using a double-precision calculation, then $n=80$ will do. Of course, you might say why not just take $n=10000$ to start with? Just run the program and see how long it takes to get an answer! It is possible that time is also a factor---even the fastest computers take time to do the involved mathematical operations required in numerical integration. \medskip\noindent The next item to be introduced is error estimation. It will be shown that for $n$ subintervals, there is a number $\xi\,\in\,(\alpha, \beta)$ such that the truncation error is $$-{(\beta-\alpha)^5\over 180n^4}\,f^{(4)}(\xi).$$ If the fourth derivative of $f(x)$ is bounded on $[\alpha, \beta]$, that is, if there exists a number $M$ such that $$\big| f^{(4)}(x) \big|\,\le\,M\quad\hbox{for all } x\in [\alpha,\beta],$$ then the error is bounded by $${(\beta-\alpha)^5\,M \over 180 n^4}.$$ At this point in our development, the importance of the error bound is that we will be assured that the larger the value of $n$, the closer the answer will be to the exact solution. This says that for any given degree of accuracy, there is a positive integer $N$ such that for all $n\ge N$, each approximation computed by Simpson's rule with $n$ terms will lie within the given degree of accuracy to the exact solution. \medskip\noindent This error estimation sounds like pure theory, and so it could be skipped without losing anything. That's really not the case. 
What we have is exactly equivalent to the mechanical ``tolerance.'' If you want something within a particular tolerance, say $\epsilon$, then you must calibrate your tools to within some accuracy, say $\delta$. It is exactly this so-called $\epsilon$-$\delta$ philosophy that is the basis for theoretical analysis. One of the great failings of pure analysis has been its inability to relate to the industrial quality efforts and statistical error estimations. \vfill\eject \noindent{\bf\llap{1.3\quad}Vivasection\footnote{$^{11}$}{\rm Disection of a living (as opposed to inoperable) computer program.} of the Programs.} The first item to note in the computer programs is the definition of the function $f(x)$\footnote{$^{12}$}{Mathematicians prefer to call a function by its single letter abbreviation, e.g. $f$, $g$, etc.; however, since the independent variable, say $x$, is required in the computer definitions, it seems better to use the convention of complex analysis and keep the notation $f(x)$, $g(x)$, etc., so that there is no ambiguity in the computer coding.} In the BASIC program the function is defined by the statement \smallskip {\tt\parindent=0pt\ttraggedright\obeylines 105 DEF FNA (x AS DOUBLE) = SQR(1!-(SIN(EXP(1)/2)*SIN(x)){\char94}2) } \smallskip\noindent while in the ``C'' program the function is defined as a separate subroutine. ``C'' is fashioned after subroutines, so this should be no surprise. The function definition subroutine in ``C'' is thus \smallskip {\tt\parindent=0pt\ttraggedright\obeylines double fc (double x) {\char123} \qquad double y; \qquad y = sqrt(1.0-sin(exp(1.)/2.)*sin(exp(1.)/2.)*sin(x)*sin(x)); \qquad return (y); {\char125} } \smallskip\noindent Notice, further, that BASIC supports exponentiation, that is {\tt A\char94B} means $A^B$ whereas ``C'' does not (it requires a separate subroutine call to {\tt pow()}). Now we will focus on the interval endpoints, $\alpha$ and $\beta$, of the interval $[\alpha, \beta]$. Since the letters {\tt A, B} and {\tt a, b} are not used elsewhere inside the programs, they can be used for endpoint values. BASIC is not case sensitive; ``C'' is. We define the endpoints (or limits of integration) as follows: \smallskip {\tt\parindent=0pt\ttraggedright\obeylines 110 A = 0!: REM This is the lower endpoint of [a,b]. 120 B = 3.141592653589793\#/2!: REM This is the upper endpoint of [a,b]. } \smallskip {\tt\parindent=0pt\ttraggedright\obeylines a = (double) 0.0; printf("{\char92}nLeft end point = \%.16lf",a); b = (double) pi/2.0; printf("{\char92}nRight end point = \%.16lf",b); } \smallskip\noindent Notice that in the ``C'' program the value of the endpoints is displayed via a {\tt printf()} function. It is always a good practice to display such variables. The next item of interest is the input of the number of intervals, $n$. This also determines the width (length) of the subintervals, $h = (\beta-\alpha)/n$. Each program does a check to ensure that the integer $n$ is positive and not odd. There are many ways to make such a determination, so it is not necessary to dwell on the matter. The main portion of the Simpson's rule is the iteration. Here a variable named {\tt sum} or {\tt SUM} is initialed to zero and functions as an accumulator. The values $y_{i-1} + 4\cdot y_{i}+y_{i+1}$ are added to the accumulator for $i$ between 1 and $n-1$ in steps of 2. The final value of the accumulator is multiplied by $h/3$ to get the approximation to the integral $\int_\alpha^\beta f(x)\,dx$. 
\smallskip {\tt\parindent=0pt\ttraggedright\obeylines 130 SUM = 0!: REM This is an accumulator. 150 H = (B - A) / N: REM The sub-interval size h = (b-a)/n. 160 FOR I = 1 TO N STEP 2: REM The "FOR" loop is done n/2 times. 170 SUM = SUM + FNA(A + (I - 1) * H) 180 SUM = SUM + 4! * FNA(A + I * H) 190 SUM = SUM + FNA(A + (I + 1) * H) 200 NEXT I: REM After loop, the sum is y\_0+4*y\_1+2*y\_2+...+4*y\_{\char123}n-1{\char125}+y\_n. } \smallskip\noindent Notice in the ``C'' coding below the statement {\tt sum += }$\dots$. This is identical to the coding {\tt sum = sum + }$\dots$. \smallskip {\tt\parindent=0pt\ttraggedright\obeylines h = (double) (b-a)/n; for (i=1, sum=0.0; i<=n; i = i+ 2){\char123} \quad p1 = fc((double) a+(i-1)*h); \quad p2 = fc((double) a+i*h); \quad p3 = fc((double) a+(i+1)*h); \quad sum += p1 + 4.0 * p2 + p3; {\char125} } \smallskip\noindent The final calculation and the display is given below. First for the BASIC program and then for the ``C'' program. The BASIC program does its calculation in the {\tt PRINT} statement. \smallskip {\tt\parindent=0pt\ttraggedright\obeylines 210 PRINT "Sum = "; SUM: PRINT " Value of integral = "; SUM * H / 3 } \smallskip\noindent Notice that the ``C'' program uses a subroutine ({\tt printf()}) to display its output on the video monitor. Notice further that in the ``C'' program the last statement is a print with the argument {\tt "{\char92}n"}. This is a carriage return/line feed that is needed after the last print statement. Otherwise, the program will type ``{\tt Hit any key to continue}'' on the same line as the output. \smallskip {\tt\parindent=0pt\ttraggedright\obeylines printf("{\char92}nValue of sum = \%.16lf", (double) sum); y = (double) h*sum/3.0; printf("{\char92}nValue of integral = \%.16lf", (double) y); printf("{\char92}n"); } \medskip\noindent We will examine the error estimation for this integral. Let's write down the function and its first four derivatives: $$\eqalign{f(x) &= \sqrt{1-k^2\sin^2 x}; \quad\hbox{where } k = \sin\big(e/2\big);\cr f'(x) &= -{k^2\over2}{\sin(2x)\over\sqrt{1-k^2\sin^2x} };\cr f''(x) &= -k^2\cos2x/\sqrt{X} -k^4\sin^2(2x)/\big(4\cdot X^{3/2}\big); \quad\hbox{where } X = \left(1-k^2\sin^2(x)\right)\cr f^{(3)}(x) &= 2k^2\sin(2x)/\sqrt{X} -3k^4\sin(4x)\big(4\cdot X^{3/2}\big)\cr &\qquad - 3k^6\sin^3(2x)/\big(8\cdot X^{5/2}\big);\cr f^{(4)}(x) &= 4k^2\cos(2x)/\sqrt{X} + k^4\sin^2(2x)/X^{3/2} -3k^4\cos(4x)/X^{3/2}\cr &\qquad -9k^6\sin(4x)\sin(2x)/\big(8\cdot X^{5/2}\big) -18k^6\sin^2(2x)\cos(2x)/\big(8\cdot X^{5/2}\big)\cr &\qquad - 15k^8\sin^4(2x)/\big(16\cdot X^{7/2}\big).\cr}$$ Observing that if $0.0 \le x \le 1.572$, then $-313.8 \le f^{(4)}(x) \le 85.0$, we get a bound for the fourth derivative on $[0,\pi/2]$. If $M=320$, then $$\left| f^{(4)}(x)\right| \le M,\quad\forall\,x\in\,\left[0,{\pi\over2}\right].$$ The error is thus bounded by $${\pi^5\cdot M\over 180\cdot32\cdot n^4}\ \approx\ {17\over n^4},$$ where $n$ is the number of subintervals of $[\alpha,\beta] = [0, \pi/2]$. Thus, to ensure that our answer is within say .5\% of the true answer, we would have to pick $n$ to be greater than 7 (and even!). But this is a conservative estimate. We have already seen that a much smaller value of $n$ will suffice. In the examples chosen for textbooks, the fourth derivative is generally a constant (a constant-values function, that is, a function $f^{(4)}(x) = $ constant for all $x$ under consideration). This builds the student's confidence, but does little to reflect the real-world situation. 
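\medskip\noindent
To see what the bound $17/n^4$ buys us in practice, here is a small stand-alone ``C'' sketch (an illustration only, not one of the programs above) that searches for the smallest even $n$ whose error bound falls below a requested tolerance; the constants $M=320$ and $\beta-\alpha=\pi/2$ are the ones derived above.
\smallskip
{\tt\parindent=0pt\ttraggedright\obeylines
\#include <stdio.h>
\#include <math.h>
\medskip
\#define pi 3.141592653589793238462643383279
\medskip
/* Smallest even n for which (pi/2){\char94}5 * M / (180 * n{\char94}4) <= tol. */
main()
{\char123}
\quad double M = 320.0, tol = 0.005, bound;
\quad int n;
\quad for (n = 2; ; n = n + 2) {\char123}
\qquad bound = pow(pi/2.0, 5.0) * M / (180.0 * pow((double) n, 4.0));
\qquad if (bound <= tol) break;
\quad {\char125}
\quad printf("{\char92}nSmallest even n = \%d (bound = \%.8lf){\char92}n", n, bound);
\quad return(0);
{\char125}
}
\smallskip\noindent
With {\tt tol = 0.005} it stops at $n=8$, in agreement with the ``greater than 7 (and even!)'' estimate above.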
\medskip\noindent
The integral we have chosen is an integral that crops up again and again in engineering problems. Is there a satisfactory resolution to this (common) problem? Indeed, there is. The idea is to examine the approximating values for increasing values of the integer $n$ which determines the step size. This technique is based on the theoretical consideration that a Cauchy sequence converges. In computer programming, it is called an {\it adaptive quadrature routine\/} (AQR). We will cover the adaptive quadrature routine in a later paragraph.
\vfill\eject
\noindent{\bf\llap{1.4\quad}Numerical versus Exact Integration.}
The sophomore who has just completed an introduction to calculus might be surprised to discover that an integral as simple as
$$\int_0^\pi \sqrt{1+\cos^2 x}\,dx\eqno(4)$$
cannot be solved by elementary techniques. (The value of this integral is the length of one ``arch'' of the sine curve, $y=\sin x$. See Fig. 1 in the preface for a graph of the sine curve.) It can be expressed in terms of a complete elliptic integral of the second kind. If we write
$$E\left(k,{\pi\over2}\right) \ =\ \int_0^{\pi/2} \sqrt{1-k^2\sin^2\theta}\,d\theta,\eqno(5)$$
then we may make the simple trigonometric substitution
$$\cos^2\theta = 1 - \sin^2\theta$$
to obtain
$$\eqalign{\int_0^\pi \sqrt{1+\cos^2\theta}\,d\theta \ &=\ 2\cdot\sqrt{2}\cdot E(\pi/4,\pi/2)\cr
\ &=\ \sqrt{2}\,\int_0^{\pi/2} \sqrt{1-\sin^2(\pi/4)\cdot\sin^2(\theta)}\,d\theta\cr
&\qquad +\ \sqrt{2}\,\int_{\pi/2}^\pi \sqrt{1-\sin^2(\pi/4)\cdot\sin^2(\theta)}\,d\theta,\cr}$$
where $\sin(\pi/4) = \sqrt{2}/2$. (Recall that $\pi\over4$ radians\footnote{$^{13}$}{A radian is a unit of angular measure. Basically, $360^\circ = 2\pi$ radians. This means that 1 radian is approximately 57.29578 degrees.} is $45^\circ$. Lots of tables books give values in degrees and not in radians. You do remember radians, don't you? If not, check out a trigonometry book or look in Appendix A for a review.) We look up the value of $E(45^\circ, \pi/2)$ in a tables book\footnote{$^{14}$}{Dr. M.
Fogiel, {\it Handbook of Mathematical, Scientific, and Engineering Formulas, Tables, Functions, Graphs, Transforms}, (Piscataway, NJ: Research and Education Association, 1984), page 623.} and find that $$E\left(45^\circ, {\pi\over2}\right) \approx 1.35006.$$ This says that the value of the integral in equation (6) should be approximately $$\int_0^\pi \sqrt{1+\cos^2(\theta)}\,d\theta \approx 3.82007.$$ Now we apply our integration method (Simpson's rule) and see what the results are: \smallskip $$\vbox{\offinterlineskip \hrule \halign{&\vrule#&\strut\quad\hfil#\quad&\vrule# &\quad\hfil#\quad&\vrule#&\quad\hfil#\quad&\vrule# &\quad\hfil#\quad&\vrule#\cr height2pt&\omit&&\omit&&\omit&&\omit&\cr &$n =$\hfil&&BASIC Program&&``C'' Program\quad&&Difference\quad&\cr height2pt&\omit&&\omit&&\omit&&\omit&\cr \noalign{\hrule} height2pt&\omit&&\omit&&\omit&&\omit&\cr \noalign{\hrule} height2pt&\omit&&\omit&&\omit&&\omit&\cr & 2 &&3.575356081779317 &&3.5753560817793170&&0.000000000000000&\cr & 4 &&3.829178925615088 &&3.8291789256150880&&0.000000000000000&\cr & 8 &&3.820282406120432 &&3.8202824061204320&&0.000000000000000&\cr & 16 &&3.820197813575163 &&3.8201978135751630&&0.000000000000000&\cr & 32 &&3.820197789027719 &&3.8201977890277180&&0.000000000000001&\cr & 64 &&3.820197789027712 &&3.8201977890277120&&0.000000000000000&\cr & 128 &&3.820197789027713 &&3.8201977890277120&&0.000000000000001&\cr & 256 &&3.820197789027711 &&3.8201977890277110&&0.000000000000000&\cr & 512 &&3.820197789027712 &&3.8201977890277110&&0.000000000000001&\cr height2pt&\omit&&\omit&&\omit&&\omit&\cr} \hrule}$$ There is a lesson to be learned here. If we use a more accurate value for $E(45^\circ, \pi/2)$ than 1.3506, namely 1.3506438, then we will see that our computed answer agrees more closely with the theoretical answer. Now, lets look at the coding that was changed in the BASIC and in the ``C'' programs: \smallskip {\tt\parindent=0pt\ttraggedright\obeylines 105 DEF FNA (x AS DOUBLE) = SQR(1!+COS(x)*COS(x)) } \smallskip {\tt\parindent=0pt\ttraggedright\obeylines double fc (double x) {\char123} \qquad double y; \qquad y = sqrt(1.0+cos(x)*cos(x)); \qquad return (y); {\char125} } \smallskip\noindent We added something else new. In the BASIC program and in the ``C'' program we added lines giving the (expected) theoretical value. \smallskip {\tt\parindent=0pt\ttraggedright\obeylines 230 PRINT "Theoretical approx = "; 2*SQR(2)*1.35064388 } \smallskip {\tt\parindent=0pt\ttraggedright\obeylines printf("Theoretical approx = %.16lf",2.0*sqrt(2.0)*1.35064388); printf("\char92n"); } \medskip\noindent We want to delve\footnote{$^{15}$}{To delve---to search carefully and painstakingly for information.} more deeply into the whole matter of numerical versus exact integration. To do this, we will evaluate an integral exactly and numerically and compare the results. 
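\medskip\noindent
The numerical column in the comparison below can be produced with the ``C'' program of section 1.2; presumably only the function subroutine needs to change, since the endpoints $a=0$ and $b=\pi/2$ are already in place. A minimal sketch of the substituted subroutine:
\smallskip
{\tt\parindent=0pt\ttraggedright\obeylines
double fc (double x)
{\char123}
\qquad double y;
\qquad y = cos(x); /* integrand for the comparison below */
\qquad return (y);
{\char125}
}
\smallskip\noindent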
Let's integrate $\cos(x)$ over the interval $[0,\pi/2]$:
$$\int_0^{\pi/2} \cos(x)\,dx = \sin(x)\Big|_0^{\pi/2} = \sin(\pi/2) -\sin(0) = 1.\eqno(6)$$
Numerically, writing $A_p(n)$ for the Simpson's rule approximation with $n$ subintervals, we obtain:
$$\vbox{\offinterlineskip \hrule
\halign{&\vrule#&\strut\quad\hfil#\quad&\vrule#
&\quad#\hfil\quad&\vrule#&\quad#\hfil\quad&\vrule#
&\quad#\hfil\quad&\vrule#\cr
height2pt&\omit&&\omit&&\omit&&\omit&\cr
&$n =$\hfil&&Simpson's rule&&$\big|1-A_p(n)\big|$\quad& &Error bound\quad&\cr
height2pt&\omit&&\omit&&\omit&&\omit&\cr
\noalign{\hrule}
height2pt&\omit&&\omit&&\omit&&\omit&\cr
\noalign{\hrule}
height2pt&\omit&&\omit&&\omit&&\omit&\cr
& 2 &&1.00228 &&0.00228 &&0.00332 &\cr
& 4 &&1.00013 &&0.00013 &&0.00021 &\cr
& 6 &&1.00003 &&0.00003 &&0.00004 &\cr
& 8 &&1.00001 &&0.00001 &&0.00001 &\cr
& 10 &&1.0000033922209010 &&0.0000033922209010&&0.000005312 &\cr
& 20 &&1.0000002115465910 &&0.0000002115465910&&0.000000332 &\cr
& 50 &&1.0000000054122520 &&0.0000000054122520&&0.000000008 &\cr
& 100 &&1.0000000003382360 &&0.0000000003382360&&0.0000000005 &\cr
height2pt&\omit&&\omit&&\omit&&\omit&\cr}
\hrule}$$
Now we see how the error estimate really works. If $f(x)=\cos(x)$, then its derivatives are
$$\eqalign{f'(x)\ &=\ -\sin(x)\cr
f''(x)\ &=\ -\cos(x)\cr
f^{(3)}(x)\ &=\ \sin(x)\cr
f^{(4)}(x)\ &=\ \cos(x)\cr}\eqno(7)$$
In particular $f^{(4)}(x) = \cos(x)$ and $-1\le \cos(x)\le 1$ for all real $x$, so $|f^{(4)}(x)|\le 1$. We have the error bound
$$\left|{(\beta-\alpha)^5\over180\cdot n^4}f^{(4)}(\xi)\right| \le {\big(\pi/2\big)^5\over180\cdot n^4}\cdot 1 \ =\ {\pi^5\over32\cdot180\cdot n^4}.$$
\vfill\eject
%
\noindent{\bf\llap{1.5\quad}Truncation Error.}
In this section we will derive the truncation error for Simpson's rule. This section has some theory and may be omitted by someone only interested in applications. One might note that in the previous section the actual error was much less than the error bound. The reason for the ``looseness'' in the error bound has to do with bounding the fourth derivative over the entire interval, $[\alpha,\beta]$.
\medskip\noindent
We will assume that the function $f(x)$ has a Taylor series expansion. This is a reasonable assumption for theoretical purposes; however, in the real world it may present a problem. Derivations under weaker conditions may be found in standard texts. Using equation $(1)$ for $A_p$, we have
$$\eqalign{A_p\ &=\ {1\over3}h\big(y_0+4y_1+y_2\big)\cr
&=\ {1\over3}h\Bigg[ y_0 +4\left(y_0 +hy'_0 + {1\over2}h^2y^{(2)}_0 +{1\over6}h^3y_0^{(3)} + {1\over24}h^4y_0^{(4)} + \dots\right)\cr
&\qquad +\left(y_0 + 2hy'_0 + 2h^2y_0^{(2)} +{4\over3}h^3y_0^{(3)} + {2\over3}h^4y_0^{(4)}+\dots\right)\,\Bigg] \cr
&=\ {1\over3}h\left(6y_0 + 6hy_0' + 4h^2y_0^{(2)} +2h^3y_0^{(3)} + {5\over6}h^4y_0^{(4)} + \dots\right)\cr}\eqno(8)$$
Now do the same thing to the integral itself.
$$\eqalign{\int_{x_0}^{x_2} y(x)\,dx\ &=\ F(x_2) - F(x_0)\cr
&=\ 2hF'(x_0) + {1\over2}(2h)^2F^{(2)}(x_0) + {1\over6}(2h)^3F^{(3)}(x_0)\cr
&\qquad+{1\over24}(2h)^4F^{(4)}(x_0) + {1\over120}(2h)^5F^{(5)}(x_0) + \dots\cr
&=\ 2hy_0 + 2h^2y_0' + {4\over3}h^3y_0^{(2)} + {2\over3}h^4y_0^{(3)} + {4\over15}h^5y_0^{(4)} + \dots.\cr}\eqno(9)$$
Subtract equation (8) from equation (9) to get
$$\int_{x_0}^{x_2} y(x)\,dx - {1\over3}h\big(y_0+4y_1+y_2\big) = \left({4\over15}-{5\over18}\right)h^5y_0^{(4)} \ =\ -{h^5 y_0^{(4)}\over90}.\eqno(10)$$
Equation (10) follows from the arithmetic calculation
$${4\over15}-{5\over18}\ =\ {24-25\over90}\ =\ -{1\over90}.$$
If we have an interval $[\alpha,\beta]$, which can be written as the union of intervals $[x_0,x_2]$, $[x_2,x_4]$, $\ldots$, $[x_{n-2},x_n]$, then we may write the total truncation error as
$$-{h^5\over90}\left(y_0^{(4)} + y_2^{(4)} + \ldots + y_{n-2}^{(4)}\right).$$
Now we make the assumption that the fourth derivative is continuous on the interval $[\alpha,\beta]$. Since $\beta-\alpha = nh$, we have
$$-{h^5\over90}\left(y_0^{(4)} + y_2^{(4)} + \ldots + y_{n-2}^{(4)}\right) \ =\ -{(\beta-\alpha)^5\over180\cdot n^4}\,y^{(4)}(\xi),$$
where $\xi\,\in\,(\alpha,\beta)$ and
$$y_0^{(4)} + y_2^{(4)} + \ldots + y_{n-2}^{(4)} \ =\ {n\over2}\,\cdot\,y^{(4)}(\xi).$$
\vfill\eject
%
\noindent{\bf\llap{1.6\quad}An Example.}
In this section we will consider an application to the particular function
$$\frame <5pt> {$f(x)\ = \ (x+4\pi)\cdot(x+4\pi-1/\pi)\cdot(x+4\pi-2/\pi)$}$$
This remarkable function has the following properties\footnote{$^{16}$}{The superscript before the chemical symbol gives the value of $A$, the mass number; the subscript before the symbol gives the value of $Z$, the atomic number; and, the superscript after the symbol gives the ionization. Values taken from {\it Quantum Physics, 2nd Ed.\/} by Eisberg and Resnick, (NY: John Wiley \& Sons, 1985), page 520.}
\medskip
{\settabs5\columns\thinmuskip=2mu
\+Ionized &\underbar{\raise2pt\hbox{\qquad mass\qquad}} & & &relative \cr
\+Isotope & electron mass &\qquad $x$ &\qquad $f(x)$ &deviation\cr
\baselineskip=14pt
\+\quad${}^1_1\!{\rm H}^{+}$ & {\tt\ \ 1836.152701} &\qquad $0$ & {\tt\ \ 1836.15174} & 0.00003\%\cr
\+\quad${}^2_1\!{\rm H}^{+}$ & {\tt\ \ 3670.4830} &\qquad $\pi$ & {\tt\ \ 3643.34824} & 0.37\%\cr
\+\quad${}^4_2{\rm He}^{++}$& {\tt\ \ 7294.2995} &\qquad $\pi+4$ & {\tt\ \ 7287.74349} & 0.01\%\cr
\+\quad${}^9_4{\rm Be}^{+4}$& {\tt\ 16424.2099} &\qquad $\pi+10$ & {\tt\ 16364.47397} &0.18\%\cr
\+\quad${}^{12}_{\ 6}{\rm C}^{+6}$&{\tt\ 21868.6918} &\qquad $5\pi$ &{\tt\ 21845.9892} &0.05\%\cr
\+\quad${}^{63}_{29}{\rm Cu}^{+29}$&{\tt 114684.6335} &\qquad $9\pi+8$ &{\tt114237.3148} &0.19\%\cr
\+\quad${}^{120}_{\ 50}{\rm Sn}^{+6}$&{\tt 218518.1598} &\qquad $14\pi+4$ &{\tt218491.3263} &0.05\%\cr}
\medskip\noindent
However, for our example, we will only be interested in computing the area between the curve and the $x$-axis between the roots $\alpha_1 = -4\pi$ and $\alpha_2 = -4\pi + 1/\pi$. This will involve an application of Simpson's rule. This example was chosen because it involves a polynomial (of degree three) which does not have integer coefficients. Indeed, the coefficients are not even algebraic numbers---they are transcendental numbers. We will observe how nicely our integration formula functions in this case.
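\medskip\noindent
To compute this area with the ``C'' program of section 1.2, only the function subroutine and the two endpoints would need to change; the sketch below illustrates that substitution (the {\tt pi} constant is the one already defined in that program), and is offered as an illustration rather than a listing from the book's files.
\smallskip
{\tt\parindent=0pt\ttraggedright\obeylines
a = (double) -4.0*pi; printf("{\char92}nLeft end point = \%.16lf",a);
b = (double) -4.0*pi+1.0/pi; printf("{\char92}nRight end point = \%.16lf",b);
\medskip
double fc (double x)
{\char123}
\qquad double y;
\qquad y = (x+4.0*pi)*(x+4.0*pi-1.0/pi)*(x+4.0*pi-2.0/pi);
\qquad return (y);
{\char125}
}
\smallskip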
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Cubic graph % $$\beginpicture \setcoordinatesystem units <10cm,10000pt> \setplotarea x from -13.1 to -12.0, y from -0.015 to 0.015 \plotheading {Graph of $f(x) = (x+4\pi)\cdot(x+4\pi-1/\pi)\cdot(x+4\pi-2/\pi)$} \axis bottom shiftedto y=0 % label {F{\sevenrm IGURE} 6} ticks numbered from -13 to -12 by 1 / \axis left ticks in numbered from -0.015 to 0.015 by 0.005 / % \savelinesandcurves on "chap1b.t01" \put {F{\sevenrm IGURE} 6} at -12.5 -0.015 \ifexpressmode \put {\bf EXPRESSMODE} at -12.5 0.01 \else \setquadratic \inboundscheckon \plot -12.65310 -0.02541 -12.60973 -0.01066 -12.56637 0.00000 / \inboundscheckoff \plot -12.56637 0.00000 -12.53454 0.00552 -12.50271 0.00929 -12.47088 0.01151 -12.43905 0.01238 -12.40722 0.01209 -12.37538 0.01084 -12.34355 0.00880 -12.31172 0.00619 -12.27989 0.00319 -12.24806 0.00000 -12.22325 -0.00250 -12.19845 -0.00490 -12.17364 -0.00713 -12.14884 -0.00908 -12.12403 -0.01066 -12.09922 -0.01178 -12.07442 -0.01236 -12.04961 -0.01229 -12.02481 -0.01149 -12.00000 -0.00987 / \fi \putrule from -12.53454 0.0 to -12.53454 0.00552 \putrule from -12.50271 0.0 to -12.50271 0.00929 \putrule from -12.47088 0.0 to -12.47088 0.01151 \putrule from -12.43905 0.0 to -12.43905 0.01238 \putrule from -12.40722 0.0 to -12.40722 0.01209 \putrule from -12.37538 0.0 to -12.37538 0.01084 \putrule from -12.34355 0.0 to -12.34355 0.00880 \putrule from -12.31172 0.0 to -12.31172 0.00619 \putrule from -12.27989 0.0 to -12.27989 0.00319 \putrule from -12.24806 0.0 to -12.24806 0.00000 %\ifexpressmode % \put {\bf EXPRESSMODE} at -12.5 0.01 %\else % \replot "chap1b.t01" \endpicture$$ %\centerline{F{\sevenrm IGURE} 6} % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % \medskip\noindent The value of the integral is calculated to be approximately {\tt 0.0025664955636711}. This example points out one of the strong points of using Simpson's rule. When the answer is computed, it is in numeric form. If one obtained a so-called ``closed-form'' solution, then it would be necessary to plug in the values for $\pi$ to get a numeric result. \medskip \noindent Following this same argument, let's consider integrating the same function, $f(x) = (x+4\pi)\cdot(x+4\pi-1/\pi)\cdot(x+4\pi-2/\pi)$, over the (closed) interval $\big[-4\pi+2/\pi,\,0\big]$. Because of the variation of the range of $f$, we will use a logarithmic scale on the $y$-axis. The value ``under the curve'' is approximately {\tt 5618.52714450635}. We do actually have this much accuracy because of the very precise value in our ``C'' program for the constant $\pi$. Recall the expression \medskip {\tt\obeylines\parindent=0pt\ttraggedright \#define pi 3.141592653589793238462643383279 /* Accurate value for pi. */ } \medskip\noindent The only changes necessary to the ``C'' program were \medskip {\tt\obeylines\parindent=0pt\ttraggedright a = (double) -4.0*pi+2.0/pi; printf("{\char92}nLeft end point = \%.16lf",a); b = (double) 0.0; printf("{\char92}nRight end point = \%.16lf",b); } \medskip\noindent And, now the graph. % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Figure 7. 
% $$\beginpicture \setcoordinatesystem units <0.5cm,10cm> \setplotarea x from -13 to 2, y from 0 to .60206 % \plotheading {\lines{ The integral of\cr $f(x)=(x+4\pi)\cdot(x+4\pi-1/\pi)\cdot(x+4\pi-2/\pi)$\cr over the interval $\big[-4\pi+2/\pi,\,0\big]$\cr} } % \axis bottom shiftedto y=0 ticks withvalues {$\scriptstyle -4\pi+8/\pi$} {$-8$} {$-6$} {$-4$} {$-2$} {$0$} {$2$} / % at -10.01989 -8 -6 -4 -2 0 2 / / % \axis left label {\stack {P,o,w,e,r,s, ,of, ,$10$} } % ticks andacross logged numbered at 1 2 3 4 / % unlabeled length <0pt> at 1.25 1.5 1.75 2.25 2.5 2.75 3.5 / % from 1.5 to 3.5 by .5 from 1.25 to 2.75 by .50 / % \plot -10.01989 0.01489 -9.51890 0.11466 -9.01790 0.18282 -8.51691 0.23359 -8.01591 0.27351 -7.51492 0.30610 -7.01392 0.33343 -6.51293 0.35685 -6.01193 0.37724 -5.51094 0.39523 -5.00995 0.41129 -4.50895 0.42575 -4.00796 0.43888 -3.50696 0.45088 -3.00597 0.46192 -2.50497 0.47212 -2.00398 0.48159 -1.50298 0.49041 -1.00199 0.49868 -0.50099 0.50643 0.00000 0.51374 / \linethickness=.8pt \putrule from -10.01989 0.0 to -10.01989 0.01489 \putrule from -9.51890 0.0 to -9.51890 0.11466 \putrule from -9.01790 0.0 to -9.01790 0.18282 \putrule from -8.51691 0.0 to -8.51691 0.23359 \putrule from -8.01591 0.0 to -8.01591 0.27351 \putrule from -7.51492 0.0 to -7.51492 0.30610 \putrule from -7.01392 0.0 to -7.01392 0.33343 \putrule from -6.51293 0.0 to -6.51293 0.35685 \putrule from -6.01193 0.0 to -6.01193 0.37724 \putrule from -5.51094 0.0 to -5.51094 0.39523 \putrule from -5.00995 0.0 to -5.00995 0.41129 \putrule from -4.50895 0.0 to -4.50895 0.42575 \putrule from -4.00796 0.0 to -4.00796 0.43888 \putrule from -3.50696 0.0 to -3.50696 0.45088 \putrule from -3.00597 0.0 to -3.00597 0.46192 \putrule from -2.50497 0.0 to -2.50497 0.47212 \putrule from -2.00398 0.0 to -2.00398 0.48159 \putrule from -1.50298 0.0 to -1.50298 0.49041 \putrule from -1.00199 0.0 to -1.00199 0.49868 \putrule from -0.50099 0.0 to -0.50099 0.50643 \putrule from 0.00000 0.0 to 0.00000 0.51374 \put {$\scriptstyle\bullet$} at 0 0.51374 \put {$\longleftarrow (0,\,1836.15)$} [l] <2pt,0pt> at 0 0.51374 \put {$\mid$} at -11.92975 0.0 % \put {$\uparrow\atop -4\pi+2/\pi$} [t] <0pt,-10pt> at -11.92975 0.0 % \endpicture$$ \centerline{F{\sevenrm IGURE} 7} % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % \medskip\noindent We might also consider some other interesting values for the function $f(x)$, as defined above, for ratios of subatomic particles' masses to the mass of the electron. 
%$$f(x)=(x+4\pi)(x+4\pi-1/\pi)(x+4\pi-2/\pi)\eqno(\dagger)$$ {\settabs5\columns\thinmuskip=2mu \+Subatomic &\underbar{\raise2pt\hbox{\qquad mass\qquad}} & & &relative \cr \+Particle & electron mass &\qquad $x$ &\qquad $f(x)$ &deviation\cr \baselineskip=14pt \+\quad$n$ &{\tt\ 1838.683662} &\qquad $1/64\pi$ &{\tt\ 1838.39048447} &0.008\%\cr \+\quad$\tau^{-}$ &{\tt\ 3491.4} &\qquad $3-1/4\pi$ &{\tt\ 3488.5} & 0.039\%\cr \+\quad$\eta^{0}$ &{\tt\ 1073.97} &\qquad $-2-1/64\pi$ &{\tt\ 1073.7} & 0.014\%\cr \+\quad$\eta'$ &{\tt\ 1873.78} &\qquad $1/4\pi$ &{\tt\ 1872.2} & 0.069\%\cr \+\quad$\Sigma^{+}$ &{\tt\ 2327.5} &\qquad $1+1/64\pi$ &{\tt\ 2326.8} &0.0064\%\cr \+\quad$\Sigma^{0}$ &{\tt\ 2333.76} &\qquad $1+1/16\pi$ &{\tt\ 2334.3} &0.034\%\cr \+\quad$\Sigma^{-}$ &{\tt\ 2343.31} &\qquad $1+1/8\pi$ &{\tt\ 2344.8} &0.049\%\cr \+\quad$\Lambda$ &{\tt\ 2183.23} &\qquad $1-1/\pi$ &{\tt\ 2160.26} &0.53\%\cr } %***Other truly remarkable values not included in the tables because**** %***they either had more than two terms or did not meet the criteria.*** %***PARTICLES*** % % f(-1-1/\pi-1/16\pi)=496.79 Mev K^0 497.67 Mev % f(-2-1/\pi-1/8\pi)=493.803 Mev K^{\pm} 493.646 Mev % f(-6+1/\pi-1/4\pi)=139.1436 Mev \pi^{\pm} 139.5675 Mev % f(-6+1/\pi-1/64\pi)=134.0077 Mev \pi^0 134.9739 Mev % f(\pi/4-1/4\pi-1/64\pi)= 1115.21 Mev \Lambda 1115.63 Mev % f(\pi+1/64\pi)=1863.5536 Mev D^{0} 1864.5 \pm 0.5 Mev % f(\pi+1/16\pi)=1868.9790 Mev D^{\pm} 1869.3 \pm 0.4 Mev % f(3-1/\pi-1/4\pi)=1672.6886 Mev \Omega 1672.43 \pm 0.32 Mev % % f(8-2\pi) = 1390.93 Mev \omega 1391 \pm 18 Mev % f(-9+\pi)=132.9758 Mev \pi^{0} 134.9739 % \vfill\eject % \noindent{\bf\llap{1.7\quad}Verifying Derivatives.} Calculus is traditionally partitioned into differential and integral calculus. This partition supposes that the two concepts are independent and mutually exclusive. Then, almost as a gift from the gods, we encounter the so-called {\bf Fundamental Theorem of Calculus},\footnote{$^{17}$}{{\bf Theorem.} {\sl Let $f$ be continuous on $[\alpha,\beta\,]$ and $x\in(\alpha,\beta\,)$. If $F(x)=\int_a^x f(t)\,dt$, then $F'(x) = f(x)$.}} which unifies the two concepts (more or less---mostly less). In this era of digital computers and the future directed towards artificial intelligence, it might be a good idea to discard the artificial divisions of calculus in favor of a more pragmatic\footnote{$^{18}$}{Pragmatic---practical, especially in contrast to idealistic.} development. In this section we will examine a good technique for determining if a function, which is supposed to be the derivative of a given function, really is. The idea is simple. We have a given function and a candidate for its derivative. Looking at some interval, we choose four points and use Simpson's rule to evaluate the integral of the derivative on subintervals, ending at the four points, and compare the results with the original function. This is awkward to verbalize but easy to write ``in mathematics.'' Let $f(x)$ be a given function and let $g(x)$ be a candidate for the derivative $f'(x)$ of $f(x)$. In the interval $[\alpha,\,\beta\,]$, choose four points, $x_1$, $x_2$, $x_3$, and $x_4$. (Let $\alpha < x_1 < x_2 < x_3 < x_4 < \beta$.) 
We can eliminate $g(x)$ if any of the four approximations are incorrect: $$f\big(x_1\big) - f\big(\alpha\big) \ \approx \int_\alpha^{x_1} g(t)\,dt$$ $$f\big(x_2\big) - f\big(\alpha\big) \ \approx \int_\alpha^{x_2} g(t)\,dt$$ $$f\big(x_3\big) - f\big(\alpha\big) \ \approx \int_\alpha^{x_3} g(t)\,dt$$ $$f\big(x_4\big) - f\big(\alpha\big) \ \approx \int_\alpha^{x_4} g(t)\,dt$$ We want to examine this idea, using a non-trivial example. One such example comes from the four de\-riv\-a\-tives of the func\-tion $f(x)$ $=$ $\sqrt{1-k^2\sin^2(x)}$. Recall it was the fourth derivative that was used in the error estimate. This function is easy to write; however, its derivatives become more and more difficult to expand algebraically. How can we be sure that we did not make some arithmetic error? It has been shown that, on the average, 1 out of every 300 human calculations is in error. This will be done via a computer program. The program evaluates all the integrals simultaneously and outputs the results in a rectangular array of real numbers---a {\sl matrix}. By visually comparing the rows of the matrix, we can tell at once if our derivatives are correct. Without any more delay, let's write down the program and its results and then do an analysis of the coding. \medskip {\tt\parindent=0pt\ttraggedright\obeylines 100 DEFDBL A-H, O-Z 102 C2=SIN(EXP(1.)/2.)*SIN(EXP(1.)/2.) 105 DEF FNA(x AS DOUBLE)= SQR(1.-C2*SIN(x){\char94}2) 106 DEF FNB(x AS DOUBLE)= -.5*C2*SIN(2.*x)/FNA(x) 107 DEF FNC(x AS DOUBLE)= -C2*COS(2.*x)/FNA(x) \qquad\qquad -.25* C2*C2*SIN(2.*x){\char94}2/FNA(x){\char94}3 108 DEF FND(x AS DOUBLE)= 2.*C2*SIN(2.*x)/FNA(x) \qquad\qquad -.75*C2*C2*SIN(4.*x)/FNA(x){\char94}3 \qquad\qquad -.375*C2{\char94}3*SIN(2.*x){\char94}3/FNA(x){\char94}5 109 DEF FNE(x AS DOUBLE)= 4.*C2*COS(2.*x)/FNA(x) \qquad\qquad + C2{\char94}2*SIN(2.*x){\char94}2/FNA(x){\char94}3 \qquad\qquad -3.*C2{\char94}2*COS(4.*x)/FNA(x){\char94}3 \qquad\qquad -1.125*C2{\char94}3*SIN(4.*x)*SIN(2.*x)/FNA(x){\char94}5 \qquad\qquad -(18./8.)*C2{\char94}3*SIN(2.*x){\char94}2*COS(2.*x)/FNA(x){\char94}5 \qquad\qquad -(15./16.)*C2{\char94}4*SIN(2.*x){\char94}4/FNA(x){\char94}7 110 A = 0. 112 INPUT "Left endpoint = "; B 114 IF (A>B) OR (B>3.141592653589793/2.) \qquad\qquad THEN PRINT "Input out of range": GOTO 112 116 OPEN "92\_12\_14.txt" FOR APPEND AS \#1 120 REM Left endpoint <= 3.141592653589793\# / 2! 130 SUM1 = 0.: SUM2 = 0.: SUM3 = 0.: SUM4 = 0. 150 N = 512: H = (B - A) / N 160 FOR I = 1 TO N STEP 2: REM The "FOR" loop is done n/2 times. 170 SUM1 = SUM1 + FNB(A + (I - 1) * H) 172 SUM2 = SUM2 + FNC(A + (I - 1) * H) 174 SUM3 = SUM3 + FND(A + (I - 1) * H) 176 SUM4 = SUM4 + FNE(A + (I - 1) * H) 180 SUM1 = SUM1 + 4.*FNB(A + I * H) 182 SUM2 = SUM2 + 4.*FNC(A + I * H) 184 SUM3 = SUM3 + 4.*FND(A + I * H) 186 SUM4 = SUM4 + 4.*FNE(A + I * H) 190 SUM1 = SUM1 + FNB(A + (I + 1) * H) 192 SUM2 = SUM2 + FNC(A + (I + 1) * H) 194 SUM3 = SUM3 + FND(A + (I + 1) * H) 196 SUM4 = SUM4 + FNE(A + (I + 1) * H) 200 NEXT I 210 L\$ = "\#\#.\#\#\#\#\# \#\#.\#\#\#\#\# \#\#.\#\#\#\#\# \#\#.\#\#\#\#\# \#\#.\#\#\#\#\#" 300 PRINT USING L\$; FNA(B)-FNA(A); FNB(B)-FNB(A); \qquad\qquad FNC(B)-FNC(A); FND(B)-FND(A) 310 PRINT USING L\$; SUM1*H/3.; SUM2*H/3.; \qquad\qquad SUM3*H/3.; SUM4*H/3. 320 PRINT \#1, USING L\$; FNA(B)-FNA(A); FNB(B)-FNB(A); \qquad\qquad FNC(B)-FNC(A); FND(B)-FND(A) 330 PRINT \#1, USING L\$; SUM1*H/3.; SUM2*H/3.; \qquad\qquad SUM3*H/3.; SUM4*H/3. 340 CLOSE \#1 350 STOP 360 END } \medskip\noindent Of course, we want to look at the output. 
For input values, we used $.25$, $.50$, $1.00$, and $1.50$.
Rows 1 and 2 of this 8 by 4 matrix correspond to $x=.25$, rows 3 and 4
correspond to $x=.5$, and so on.
\medskip
{\tt\settabs 4\columns
\+\quad -0.02969 &\quad-0.23615 &\quad0.03387 &\quad\ 0.27142 \cr
\+\quad -0.02969 &\quad-0.23615 &\quad0.03387 &\quad\ 0.27142 \cr
\+\quad -0.11666 &\quad-0.45528 &\quad0.13655 &\quad\ 0.55428 \cr
\+\quad -0.11666 &\quad-0.45528 &\quad0.13655 &\quad\ 0.55428 \cr
\+\quad -0.43151 &\quad-0.76446 &\quad0.62760 &\quad\ 1.73354 \cr
\+\quad -0.43151 &\quad-0.76446 &\quad0.62760 &\quad\ 1.73354 \cr
\+\quad -0.77883 &\quad-0.30495 &\quad4.81402 &\quad17.17882 \cr
\+\quad -0.77883 &\quad-0.30495 &\quad4.81402 &\quad17.17882 \cr}
\medskip\noindent
All this goes to show that the equations we coded into the computer
program are {\sl probably\/} correct. This is not rigorous; however, in
our days and times anything that fits this well would be a good
approximation---that is, it would serve as the derivative {\sl for all
practical purposes}. A word of caution: there may be more than one way
to express a given equation. For instance, $y=x^2+2x+1$ and
$y = (x+1)^2$ are {\sl algebraically\/} the same. Even though two
expressions are algebraically the same, they may not yield the same
value from a computer. This is due to several factors: (1) computers
may evaluate the equations differently; (2) there may be more
truncation (round-off) error with one equation than with the other
(after all, computers don't have infinite accuracy); (3) a coding error
may affect one equation (especially doing an integer division where the
fractional part is lost); and, (4) the grouping of successive
multiplications and divisions may affect the answer, especially when
two nearly equal numbers are being subtracted.
\medskip\noindent
There are a few items worth mentioning about the computer program
itself. We ``captured'' or ``saved'' the output by {\sl appending\/} to
a file (in this case named {\tt 92\_12\_14.txt}). Later, we edited the
file and merged it into this document. This gave two distinct
advantages: (1) it was quicker and (2) there was no possibility of a
human blunder in re-typing the numbers. This is always a good
technique. Capturing output data and inputting the file directly into
the text ensures a correct copy. It may not always be possible to do
this. A second alternative is to compare the final text against the
output file via some compare routine. In Appendix B, a method of
comparing text with output will be devised, developed, and discussed.
In BASIC\footnote{$^{19}$}{BASIC---Beginners' All-purpose Symbolic
Instruction Code. This elementary computer code is held in disdain by
many programmers who prefer ``C.''} the coding line is 256 characters
long, more or less (usually less, maybe 254 characters). In a text
document, it is a good idea to have lines of 80 characters or less. To
accommodate the document, it is necessary to ``break'' long lines into
shorter segments. Whenever this is done, the continuation is indented.
The sequence numbers are retained to aid in determining when a line has
been broken. Some versions of BASIC write floating point integers using
an exclamation point, {\it e.g.}, $3.00$ becomes $3!$. All floating
point integers have been written in the standard mathematical format
using a decimal point. QBasic\footnote{$^{20}$}{QBasic---the version of
BASIC copyrighted by the Microsoft Corporation.} puts in spaces between
the mathematical operators.
While the spaces add to the readability of the code and aid the programmer, they do little for the documentation, which has to fit on an $8{1\over2} \times 11$ inch page. The extra spaces were removed. For the ``\TeX-nicians,'' they may note that some symbols cause the typesetting program to suffer. Before merging into a \TeX\ file, the special symbols backslash ({\tt\char92}) and hat (or circumflex) ({\tt\char94}) were replaced by their character equivalences. The pound sign, dollar sign, and underscore (\#, \$, and \_) were dealt with in the usual manner, by prefixing them with backslashes ({\tt\char92}). \medskip\noindent As usual, we will complement the BASIC program with a ``C'' program. The ``C'' program also saves its output to a file. {\tt\obeylines\parindent=0pt\ttraggedright \#include /* Header for input/output subroutines. */ \#include /* Header for math subroutines. */ \#include /* Header for floating point subroutines. */ \#define pi 3.141592653589793238462643383279 /* Accurate value for pi. */ \#define k2 0.9558669573934826 /* Simpson's rule for approximating integrals. \qquad a: left endpoint \qquad b: right endpoint \qquad fa: original function \qquad fb,fc,fd,fe: pointers to functions to integrate \qquad n: number of subintervals */ double fa (double x); double fb (double x); double fc (double x); double fd (double x); double fe (double x); main() \char123 double a,b,h,sum,x,y; /* In 'C' all variables must be assigned */ double p1, p2, p3; int i, n; FILE *fp; fp = fopen("92\_12\_13.txt", "a"); printf("{\char92}007"); /* Sound bell in computer. */ a = (double) 0.0; printf("{\char92}nLeft end point a = \%.16lf",a); printf("{\char92}nEnter right end point "); scanf("\%lf",\&b); printf("{\char92}nRight end point b = \%.16lf",b); n = 512; printf("{\char92}nNumber of subintervals \%d",n); h = (double) (b-a)/n; for (i=1, sum=0.0; i<=n; i = i+ 2)\char123 \quad p1 = fb((double) a+(i-1)*h); \quad p2 = fb((double) a+i*h); \quad p3 = fb((double) a+(i+1)*h); \quad sum += p1 + 4.0 * p2 + p3; \char125 y = (double) h*sum/3.0; printf("{\char92}nValue of integral = \%.16lf", (double) y); printf("{\char92}nValue of f(b)-f(a) = \%.16lf", (double) fa(b)-fa(a) ); fprintf( fp, " \%.5lf \%.5lf", y, fa(b)-fa(a) ); for (i=1, sum=0.0; i<=n; i = i+ 2)\char123 \quad p1 = fc((double) a+(i-1)*h); \quad p2 = fc((double) a+i*h); \quad p3 = fc((double) a+(i+1)*h); \quad sum += p1 + 4.0 * p2 + p3; \char125 y = (double) h*sum/3.0; printf("{\char92}nValue of integral = \%.16lf", (double) y); printf("{\char92}n f'(b) - f'(a) = \%.16lf", (double) fb(b)-fb(a) ); fprintf( fp, " \%.5lf \%.5lf", y, fb(b)-fb(a) ); for (i=1, sum=0.0; i<=n; i = i+ 2)\char123 \quad p1 = fd((double) a+(i-1)*h); \quad p2 = fd((double) a+i*h); \quad p3 = fd((double) a+(i+1)*h); \quad sum += p1 + 4.0 * p2 + p3; \char125 y = (double) h*sum/3.0; printf("{\char92}nValue of integral = \%.16lf", (double) y); printf("{\char92}n f''(b) - f''(a) = \%.16lf", (double) fc(b)-fc(a) ); fprintf( fp, " \%.5lf \%.5lf", y, fc(b)-fc(a) ); for (i=1, sum=0.0; i<=n; i = i+ 2)\char123 \quad p1 = fe((double) a+(i-1)*h); \quad p2 = fe((double) a+i*h); \quad p3 = fe((double) a+(i+1)*h); \quad sum += p1 + 4.0 * p2 + p3; \char125 y = (double) h*sum/3.0; printf("{\char92}nValue of integral = \%.16lf", (double) y); printf("{\char92}n f'''(b)-f'''(a) = \%.16lf", (double) fd(b)-fd(a) ); fprintf( fp, " \%.5lf \%.5lf {\char92}n", y, fd(b)-fd(a) ); fclose(fp); return(0); \char125 double fa (double x) \char123 \qquad double y; \qquad y = (double) sqrt(1.0 
-k2*sin(x)*sin(x)); \qquad return (y); \char125 double fb (double x) \char123 \qquad double y; \qquad y = (double) -0.5*k2*sin(2.0*x)/fa(x); \qquad return (y); \char125 double fc (double x) \char123 \qquad double y; \qquad y = (double) -k2*cos(2.0*x)/fa(x) \qquad\quad -.25*k2*k2*sin(2.0*x)*sin(2.0*x)/pow(fa(x),3.0); \qquad return (y); \char125 double fd (double x) \char123 \qquad double y; \qquad y = (double) 2.0*k2*sin(2.0*x)/fa(x) \qquad\quad -0.75*k2*k2*sin(4.0*x)/pow(fa(x),3.0) \qquad\quad -.375*k2*k2*k2*pow(sin(2.0*x),3.0)/pow(fa(x),5.0); \qquad return (y); \char125 double fe (double x) \char123 \qquad double y; \qquad y = (double) 4.0*k2*cos(2.0*x)/fa(x) \qquad\quad +k2*k2*sin(2.0*x)*sin(2.0*x)/pow(fa(x),3.0) \qquad\quad -3.0*k2*k2*cos(4.0*x)/pow(fa(x),3.0) \qquad\quad -1.125*k2*k2*k2*sin(4.0*x)*sin(2.0*x)/pow(fa(x),5.0) \qquad\quad -2.25*k2*k2*k2*sin(2.0*x)*sin(2.0*x)*cos(2.0*x)/pow(fa(x),5.0) \qquad\quad -0.9375*k2*k2*k2*k2*pow(sin(2.0*x),4.0)/pow(fa(x),7.0); \qquad return (y); \char125 /* End of file */ } \medskip\noindent Of course, there is an output file to be examined also. Here we included values of $1.51$, $1.52$, and $1.55$. \medskip {\tt\settabs 8\columns \+ -0.02969&-0.02969& -0.23615&-0.23615& 0.03387&0.03387&\ 0.27142&\ 0.27142\cr \+ -0.11666&-0.11666& -0.45528&-0.45528& 0.13655&0.13655&\ 0.55428&\ 0.55428\cr \+ -0.43151&-0.43151& -0.76446&-0.76446& 0.62760&0.62760&\ 1.73354&\ 1.73354\cr \+ -0.77883&-0.77883& -0.30495&-0.30495& 4.81402&4.81402& 17.17882& 17.17882\cr \+ -0.78168&-0.78168& -0.26553&-0.26553& 4.97895&4.97895& 15.74172& 15.74172\cr \+ -0.78414&-0.78414& -0.22454&-0.22454& 5.12756&5.12756& 13.91650& 13.91650\cr \+ -0.78894&-0.78894& -0.09416&-0.09416& 5.43883&5.43883&\ 6.37637&\ 6.37637\cr } \vfill\eject \noindent{\bf\llap{1.8\quad}Generalizations.} Mathematicians love to take a popular, useful concept and generalize it. Sometimes much can be learned by generalizing and by abstracting; more often than not, a generalization results in a more complicated, theoretical, and generally useless body of knowledge that exists solely as a requirement for a degree. This is especially true with Simpson's rule. Simpson's rule is popular and efficient. Its error calculation is straightforward and its convergence is assured. There are some examples where Simpson's rule does poorly---but these are mostly ``pathological'' in nature. (They are artificially constructed simply to demonstrate the fallability of the rule.) There are also so-called improper integrals. The improper integrals are either defined over an interval such as $[\alpha,\,+\infty]$, $[-\infty,\,\beta\,]$, or $[-\infty,\,+\infty]$, or the function is unbounded in the interval of integration (or both!) The improper integrals have to be approached with common sense. Take for example the integral $$\int_0^1 {dx\over \sqrt{x}}\ =\ 2\sqrt{x}\big|_{x=1} \ -\ \lim_{x\to0} 2\sqrt{x}\ =\ 2.$$ How could Simpson's rule be applied to such an example? The answer is simple---make the computer do the work! Choose a small number $\epsilon$, $0 < \epsilon \ll 1$. By a proper determination of the number of subintervals and a sufficiently small value of $\epsilon$, the desired answer can be obtained with the required accuracy. 
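\medskip\noindent
To make the idea concrete, the short ``C'' program below is an
illustrative sketch only (it is not one of the program listings of this
text, and the names {\tt f}, {\tt eps}, and {\tt n} are chosen merely
for the example). It applies the composite Simpson's rule to
$\int_\epsilon^1 dx/\sqrt{x}$ for $\epsilon = 0.0001$ and prints the
result next to the exact value $2-2\sqrt{\epsilon\,}$. Because the
integrand is steep near $x=\epsilon$, the number of subintervals has to
be fairly large before the two columns agree; this is the ``proper
determination'' referred to above.
\medskip
{\tt\obeylines\parindent=0pt\ttraggedright
\#include <stdio.h>
\#include <math.h>
double f(double x) \char123\ return 1.0/sqrt(x); \char125
main() \char123\ /* Simpson's rule on [eps,1] for 1/sqrt(x). */
\quad double eps, h, sum;
\quad int i, n;
\quad eps = 0.0001;
\quad for (n = 64; n <= 65536; n = 4*n) \char123\ /* n is kept even. */
\qquad h = (1.0 - eps)/n;
\qquad sum = f(eps) + f(1.0);
\qquad for (i = 1; i < n; i++) /* weights 4,2,4,2,...,4 */
\qquad\quad sum += (i \% 2 ? 4.0 : 2.0)*f(eps + i*h);
\qquad printf("{\char92}nn = \%6d Simpson = \%.6lf exact = \%.6lf",
\qquad\qquad n, h*sum/3.0, 2.0 - 2.0*sqrt(eps));
\quad \char125
\quad return(0);
\char125
}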
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
$$\beginpicture
\setcoordinatesystem units <1in,.125in>
\setplotarea x from -2 to 2, y from -0.25 to 18
\axis bottom shiftedto y=0 label {F{\sevenrm IGURE} 8} / %
\axis left shiftedto x=0 / %
\putrule from 0.0625 0.0 to 0.0625 16.0
\ifexpressmode
\put {\bf EXPRESSMODE} at 0 10.0
\else
\setquadratic
\plot 0.0625 16. 0.125 8. 0.25 4. .375 2.66667 .5 2. .75 1.3333
1.0 1.0 1.25 0.8 1.5 0.66667 1.75 0.57142 2.0 0.5 / %
\fi
\endpicture$$
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
In the case of integrals of the form
$${1\over\sqrt{2\pi}}\,\int_0^{\infty} e^{-x^2/2}\,dx$$
(the classic normal curve), one needs only observe that the integral
may be replaced by an integral
$${1\over\sqrt{2\pi}}\,\int_0^{N} e^{-x^2/2}\,dx$$
for a suitably chosen integer $N$. We note here that $N=6$ generally
suffices for most engineering work. This is the so-called six-sigma
($6$-$\sigma$) criterion. We can estimate the neglected tail by
considering
$$\int_N^\infty e^{-x^2/2}\,dx \le \int_N^\infty e^{-x/2}\,dx.$$
If $x\ge1$, then $x^2 \ge x$ and $e^{-x^2/2} \le e^{-x/2}$, while
$$\int_N^\infty e^{-x/2}\,dx = 2e^{-N/2}.$$
\centerline{%
\beginpicture %
\setcoordinatesystem units <.5in,2.5in> %
\setplotarea x from -3 to 3, y from 0 to .4 %
\plotheading {\lines {%
The density $\varphi(\zeta) = e^{-\zeta^2\!/2}/\sqrt{2\pi}$ of the\cr %
standard normal distribution.\cr}} %
\axis bottom ticks numbered from -3 to 3 by 1 %
length <0pt> withvalues $\zeta$ / at 1.5 / / %
\linethickness=.25pt %
\putrule from 1.5 0 to 1.5 .12952 % (.12952 = density at 1.5)
\setbox0 = \hbox{$\swarrow$}%
\put {$\swarrow$ \raise6pt\hbox{$\varphi(\zeta)$}} %
[bl] at 1.5 .12952 %
\ifexpressmode \put {\bf EXPRESSMODE} at 0 0.2 \else \setquadratic
\plot 0.0 .39894 0.16667 .39344 0.33333 .37738 0.5 .35207
0.66667 .31945 0.83333 .28191 1. .24197 1.25 .18265 1.5 .12952
1.75 .08628 2. .05399 2.25 .03174 2.5 .01753 2.75 .00909 3.0 .00443 /
\setquadratic
\plot 0.0 .39894 -0.16667 .39344 -0.33333 .37738 -0.5 .35207
-0.66667 .31945 -0.83333 .28191 -1. .24197 -1.25 .18265 -1.5 .12952
-1.75 .08628 -2. .05399 -2.25 .03174 -2.5 .01753 -2.75 .00909
-3.0 .00443 / %
\setshadegrid span <.025in> %
\vshade 0 0 .39894 0.5 0 .35207 1 0 .24197 %
1.5 0 .12952 2 0 .05399 / %
\fi \endpicture }
%
%\smallskip
\centerline{F{\sevenrm IGURE} 9}
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\medskip\noindent
Having mentioned how to generalize this rule to improper integrals, it
is time to examine the position of this integration scheme in
mathematicians' grand scheme of things. Simpson's rule is the most
popular of the so-called {\bf Newton-Cotes integration formulas}, the
first three of which are given below:
$$\eqalign{\int_{x_0}^{x_0+h} f(x)\,dx \ &=\ {h\over2}\big(y_0+y_1\big)
- {h^3\over12}f''(\xi)\cr
\int_{x_0}^{x_0+2h} f(x)\,dx \ &=\ {h\over3}\big(y_0+4y_1+y_2\big)
- {h^5\over90}f^{(4)}(\xi)\cr
\int_{x_0}^{x_0+3h} f(x)\,dx \ &=\ {3h\over8}\big(y_0+3y_1+3y_2+y_3\big)
-{3h^5\over80}f^{(4)}(\xi),\cr}$$
where $\xi$ is an intermediate value of $x$. The Newton-Cotes formulas,
such as those above, are called {\sl closed} because the interval
end-points are used. If the end-points are not used, but only the
interior points in the interval, the formula is called {\sl open}.
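\medskip\noindent
The three closed formulas are easy to try on a computer. The short
``C'' program below is a sketch for illustration only (it is not one of
the program listings of this text); it applies each rule once to
$f(x)=e^{-x}$ on the interval $[0,\,0.6]$, choosing $h$ in each case so
that the single application spans the whole interval, and compares the
three answers with the exact value $1-e^{-0.6}$.
\medskip
{\tt\obeylines\parindent=0pt\ttraggedright
\#include <stdio.h>
\#include <math.h>
double f(double x) \char123\ return exp(-x); \char125
main() \char123
\quad double x0, len, h;
\quad x0 = 0.0; len = 0.6;
\quad h = len; /* trapezoidal rule, h/2*(y0+y1) */
\quad printf("{\char92}ntrapezoid = \%.7lf", h/2.0*(f(x0)+f(x0+h)));
\quad h = len/2.0; /* Simpson's rule, h/3*(y0+4y1+y2) */
\quad printf("{\char92}nSimpson = \%.7lf",
\qquad h/3.0*(f(x0)+4.0*f(x0+h)+f(x0+2.0*h)));
\quad h = len/3.0; /* three-eighths rule, 3h/8*(y0+3y1+3y2+y3) */
\quad printf("{\char92}n3/8 rule = \%.7lf",
\qquad 3.0*h/8.0*(f(x0)+3.0*f(x0+h)+3.0*f(x0+2.0*h)+f(x0+3.0*h)));
\quad printf("{\char92}nexact = \%.7lf", 1.0-exp(-len));
\quad return(0);
\char125
}
\medskip\noindent
With this spacing the trapezoidal value is off in the second decimal
place, while Simpson's rule and the three-eighths rule reproduce the
exact value to about four decimal places, which is just what the error
terms above lead us to expect.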
The Newton-Cotes formulas belong to a class of formulas known as polynomial approximations; the class of polynomial approximations is contained in a class of formulas of approximation of integrals by families of analytic functions, and so on. \medskip % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Figure 10 --- Exponential curve % $$\beginpicture \setcoordinatesystem units <1in,1in> \setplotarea x from 0 to 3, y from 0 to 1 \normalgraphs \axis bottom ticks numbered from 0 to 3 by 1 length <0pt> withvalues $x$ / at .5 / / \linethickness=.25pt \putrule from .5 0 to .5 .60653 \putrule from 1 0 to 1 .36788 \putrule from 2 0 to 2 .13534 \put {$\scriptstyle\bullet$} at .5 .60653 \put {$e^{-x}$} [rt] <-4pt,-4pt> at .5 .60653 \put {$e^{-x^2}$} [lb] <4pt,4pt> at .5 .77880 \ifexpressmode \put {\bf EXPRESSMODE} at 1.5 0.5 \else \setquadratic \plot 0 1 .25 .77880 .50 .60653 .75 .47237 1.00 .36788 1.25 .28650 1.50 .22313 1.75 .17377 2.00 .13534 2.25 .10540 2.50 .08208 2.75 .06393 3.00 .04979 / % \setshadegrid span <.025in> \vshade 1 0 .36788 1.5 0 .10540 2 0 .01832 / % \linethickness=0.75pt \setquadratic \plot 0.00 1.00000 0.25 0.93941 0.50 0.77880 0.75 0.56978 1.00 0.36788 1.25 0.20961 1.50 0.10540 1.75 0.04677 2.00 0.01832 2.25 0.00633 2.50 0.00193 2.75 0.00052 3.00 0.00012 / % \fi \endpicture $$ \centerline {F{\sevenrm IGURE} 10} % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \vfill\eject %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % \headline={\tenrm\hfill Newton's Method} \centerline{\bf Chapter 2} \bigskip \noindent{\bf\llap{2.0\quad}Introduction.} One of the most exciting applications of differential calculus is the use of Newton's method for the determination of roots of functions. Most students of algebra think that the only roots of interest are those of polynomials. An algebra student who has done all the assigned homework might think that it's just a matter of factoring; after all, it was so clever just completing the square in the quadratic equation. (Wasn't it fun being so smart and showing that off?) That's simply not the case. In fact, there is no purely algebraic method for finding roots of polynomials of degree five or higher. The situation with the trigonometric functions is even more involved. What technique would one use to find the root or roots of $x-\cos x = 0$? 
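\medskip\noindent
To anticipate the answer, the few lines of ``C'' below are an
illustrative sketch only (not one of the program listings of this
text); they apply the iteration developed in the next few paragraphs to
exactly this equation, $f(x) = x - \cos x$, starting from the guess
$x_0 = 1$ and using the derivative $f'(x) = 1 + \sin x$.
\medskip
{\tt\obeylines\parindent=0pt\ttraggedright
\#include <stdio.h>
\#include <math.h>
main() \char123\ /* Newton's method for x - cos(x) = 0 */
\quad double x;
\quad int i;
\quad x = 1.0;
\quad for (i = 1; i <= 6; i++) \char123
\qquad x = x - (x - cos(x))/(1.0 + sin(x)); /* x - f(x)/f'(x) */
\qquad printf("{\char92}nstep \%d x = \%.15lf", i, x);
\quad \char125
\quad return(0);
\char125
}
\medskip\noindent
Only a handful of steps are needed before the printed digits stop
changing (the root is near $0.739085$); the figure and the paragraphs
that follow explain why.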
$$\beginpicture \setcoordinatesystem units <1in,1in> \setplotarea x from -.5 to 2.25, y from -.5 to 3.5 \plotheading {\lines {Tangent line at $(x_0,y_0)$\cr slope $=$ $f'(x_0)$\cr}} \axis bottom shiftedto y=0 ticks unlabeled short quantity 12 / % \axis left shiftedto x=0 ticks unlabeled quantity 9 / % \ifexpressmode \put {\bf EXPRESSMODE} at 0 1 \else \setquadratic \plot -0.25 -0.2344 -0.15 -0.2844 -0.05 -0.3094 0.05 -0.3094 0.15 -0.2844 0.25 -0.2344 0.35 -0.1594 0.45 -0.0594 0.55 0.0656 0.65 0.2156 0.75 0.3906 0.85 0.5906 0.95 0.8156 1.05 1.0656 1.15 1.3406 1.25 1.6406 1.35 1.9656 1.45 2.3156 1.55 2.6906 1.65 3.0906 1.75 3.5156 / % \setlinear \plot 2.0 3.4375 0.5 -.3125 / % \fi \put {$\scriptstyle\bullet$} at 1.0 0.9375 \put {$\scriptstyle\bullet$} at .625 0.0 \put {$(x_0,y_0)$} at 1.25 .9375 \put {$\nwarrow$\lower6pt\hbox{$x$-intercept}} [lt] <0.5pt, -0.5pt> at .62 0 % \put {$f(x)$\lower6pt\hbox{$\searrow$}} [rb] at 1.45 2.3156 \put {$\nwarrow$\lower6pt\hbox{tangent line}} [lt] at 1.46 2.1 \endpicture$$ \centerline{F{\sevenrm IGURE} 11} % f(x) = y = (15/12)*(x-1/2)*(x+1/2) % f'(x) = y' = (15/6)*x % when x = 1, y(1) = f(1) = 15/16 = .9375 % when x = 1, y'(1) = f'(1) = 15/6 % when x = 2, the point on the tangent line is 3.4375 % when y = 0, the abscissa on the tangent line is 5/8 % \smallskip\noindent All of the important details of Newton's method are found in the above figure. The equation of the tangent line is determined by the so-called point-slope method from elementary algebra: $${y-y_0 \over x-x_0}\ =\ \hbox{slope } =\ {\Delta y\over\Delta x}\ =\ {\hbox{rise}\over\hbox{run}} \ =\ f'(x_0).$$ Set $y=0$ and determine the $x$-intercept of the tangent line, $\big(x_0-y_0/f'\big(x_0),0\big)$. From the above figure, we see that the point at which the tangent line crosses the $x$-axis is moving quickly towards the root---the point at which the curve $f(x)$ crosses the $x$-axis. If we only had a better first approximation $x_0$, then the $x_0-y_0/f'(x_0)$ would be even closer to the root. Why not just try again and define a new approximation? $$x_1 = x_0 - {f(x_0)\over f'(x_0)}\eqno(11)$$ \vfill\eject \noindent{\bf\llap{2.1\quad}Theory.} This paragraph is concerned with the theory behind Newton's method. The user who is interested only in results may skip this entire section. This paragraph employs the so-called Taylor series (also known as the Taylor's series) $$f(x_0) + f'(x_0)(x-x_0) + {f''(x_0)\over2}(x-x_0)^2 + \ldots + {f^{(n)}\over n!}(x-x_0)^n + \ldots.\eqno(12)$$ An alternate derivation using the Mean-Value Theorem will be presented in paragraph {\bf 2.2}. If one looks in a dictionary,\footnote{$^{21}$}{{\it Webster's Ninth New Collegiate Dictionary}, (Springfield, Massachusetts: Merriam-Webster, Inc., 1984), page 1209.} equation (12) might be written in the form $$f(x) = f(a) + {f^{[1]}\over 1!}(x-a) +{f^{[2]}\over 2!}(x-a)^2 + \ldots + {f^{n]}\over n!}(x-a)^n + \ldots, \eqno(12')$$ where $f^{[n]}(a)$ is the derivative of the $n${\raise2pt\hbox{\sevenrm th}} order of $f(x)$ evaluated at $a$. Before delving into the mathematics, a digression\footnote{$^{22}$}{Digression---A swerving away from the main subject or the {\it Leitmotif\/}; a turning aside from the main argument.} might be in order. Whenever a textbook says ``degree,'' it means the highest exponent or power. For example: $x^2$ is of second degree; the polynomial $x^3-x^2+1$ is of third degree; and, $x^2+y^2 = 1$ is a second degree relation. 
Whenever a textbook says ``order,'' it generally refers to a more
abstract concept. In general, $x^n$ means the
$n${\raise2pt\hbox{\sevenrm th}} power (degree $n$) of $x$, that is
$$x^n \ =\ \underbrace{x\cdot x\cdots x}_{n\ %
\hbox{\sevenrm times }}.$$
(Like every rule, this one has an exception. In tensor analysis, the
$i$, $j$, and $k$ in expressions like $x^i$, $y^{jk}$, {\it etc.},
refer to contravariant indices, not powers.) On the other hand,
whenever one sees such notations as $x^{(n)}$, $y^{[n]}$, and so on, a
red flag ought to go up. Indeed, if a function does have a derivative,
it is usually denoted by $f'(x)$; the second derivative (the derivative
of the derivative) is denoted by $f''(x)$. But, the third derivative is
not generally denoted by $f'''(x)$. The third derivative of $f$ at $x$
is usually written $f^{(3)}(x)$ and sometimes written as $f^{[3]}(x)$.
What does $f^{(0)}(x)$ mean? Recall that (for all $x\not= 0$)
$x^0\equiv1$; analogously, $f^{(0)}(x) \equiv f(x)$. The symbol made up
of three horizontal lines ($\equiv$) means more than just equals ($=$);
this symbol, ($\equiv$), means equivalence and that means equal for all
the values under consideration.
\medskip
\noindent
One of mathematics' favorite devices, or tricks, is to
truncate\footnote{$^{23}$}{Truncate---To cut off; to make short by
trimming or cutting off.} a Taylor series. In the Newton method, the
series is truncated at the quadratic term
${1\over2}f''\big(x_0\big)\cdot\big(x-x_0\big)^2$; that term and all of
the terms after it are dropped. So, the truncated series becomes
$$\eqalign{f(x)\ &=\ f(x_0) + {f^{[1]}(x_0)\over 1!}(x-x_0)^1\cr
&=\ f\big(x_0\big) + f'\big(x_0\big)\cdot\big(x-x_0\big).\cr}\eqno(13)$$
This is the derivation. So, after all that was said and done, there's
not much to do to get the result. Just pretend that $f(x_1)=0$ and get
$$f\big(x_1\big)\ =\ 0\ =\ f\big(x_0\big)
\ +\ f'\big(x_0\big)\big(x_1-x_0\big),$$
or
$$x_1\ =\ x_0\ -\ {f(x_0)\over f'(x_0)},\eqno(14)$$
provided, of course, that $f'(x_0) \not= 0$. If $f'(x_0) = 0$, then we
just choose another $x_0$, say $\tilde x_0$ where
$f'\big(\tilde x_0\big) \not= 0$ and proceed.
\vfill\eject
\noindent{\bf\llap{2.2\quad}Applications.}
One of the most important applications of Newton's method is in the
calculation of square roots. Many small hand calculators have a
``hard-wired'' square root generator. The circuitry follows a simple
algorithm. Since long division of decimal fractions may not be
supported, we will derive an algorithm which does not involve division
and finds the square root for any number $a > 0$.
\medskip
\noindent
It is clear that if we let $h(x) = x^2 -a$ and solve the equation
$h(x)=0$ for its positive root, this will be the same as finding the
square root of $a$. Newton's formula becomes
$x_{n+1} = {1\over2}\big(x_{n} + a/x_{n}\big)$, which still requires a
division at every step.\footnote{$^{24}$}{If the following mathematics
looks too deep, try Schaum's Outline {\it Numerical Analysis, 2nd
Edition}, page 334. In any case, the two BASIC computer programs ought
to be useful.}
\medskip\noindent
The next thing that we will do is to plug the function $F(x)= a - 1/x$
into Newton's method. Then we will look at the fixed point formula. To
determine the starting value, we'll require $\xi$ to satisfy the
expression $|f'(\xi)| < 1$. After we've done those things, we will
generate a short little BASIC program to check out some sample values.
Then, to satisfy the academics, we will quote the necessary theorems
and satisfy the hypotheses.
\medskip \noindent $$F(x) = a - {1\over x}\eqno(15)$$ $$F'(x) = {1 \over x^2}\eqno(16)$$ Applying $f(x) = x + g\big(F(x)\big)$, where $g(y) = -y/y'$, $$\eqalign{f(x)\ &=\ x - F(x)/F'(x)\cr &=\ x - {a - 1/x \over 1/x^2}\cr &=\ x - ax^2 +x\cr &=\ 2x - ax^2\cr}\eqno(17)$$ And we observe, substituting $s = 1/a$ into $f'(s)$ that $$f'(x)\ =\ 2 - 2ax,\eqno(18)$$ $$f'(1/a)\ =\ 2 -2a(1/a)\ =\ 2-2\ =\ 0.\eqno(19)$$ Recall that $a$ $\not=$ $0$. $$f''(x)\ =\ -2a\eqno(20)$$ tell us that $f''$ does not vanish at $1/a$ so we have quadratic convergence in some (yet to be determined) neighborhood of $1/a$. So, the required function is equation $(17)$ and the iteration procedure is $$x_{n+1}\ =\ 2x_{n} - ax_{n}^2.\eqno(21)$$ All that remains is to find the necessary $x_0 = x_0(a)$ to get things started. \medskip \noindent We would like to determine an interval $[\alpha,\,\beta\,]$ on which $f$ is contractive. It isn't enough just to know that one exists, we really need to compute it. But we have an explicit function, $f$, and $f \in C^1$. By the mean value theorem ({\sl MVT for derivatives\/}), we seek values for $\xi$ such that $|f'(\xi)| < 1$. This will ensure $$\big| f(x_1) - f(x_2)\big|\ =\ |f'(\xi)|\,|x_1 - x_2|\eqno(22)$$ $$|f'(x)|\ <\ 1$$ $$|2 -2ax|\ <\ 1$$ $$-1\ <\ 2 -2ax\ <\ 1$$ $$-3\ <\ -2ax\ <\ -1$$ If we multiply an inequality by $-{1\over2}$, it changes the 'sense' of the inequality: $${3\over2}\ >\ ax\ >\ {1\over2}\eqno(23)$$ Either $a > 0$ or $a < 0$, so: $$\hbox{ interval } = \cases{{3\over 2a}\,>\,x\,>\,{1\over2a}, & if $a\,>\,0$\cr \null\cr {1\over 2a}\,>\,x\,>\,{3\over2a}, & if $a\,<\,0$\cr}\eqno(24)$$ \medskip\noindent Basic program: \medskip {\tt\obeylines\ttraggedright 10 REM ************************************************ 20 REM * Harry A. Watson, Jr. 30 REM * Math 219b 40 REM * 27 April 1991 50 REM ************************************************ 60 INPUT "a, x\_0 = "; A, X0 70 FOR I = 1 TO 100 80 X1 = 2*X0 - A*X0*X0 90 IF ABS(X1-X0) < .0001 THEN PRINT X1,1/A,I:STOP 100 X0 = X1 110 NEXT I } \bigskip\noindent Now we would like to have an iterative method for computing $\root n \of{a}$, $a > 0$, which converges locally in second order. We only want to employ addition, subtraction, multiplication and division (the four arithmetic operations of the real number field. \medskip\noindent Again, we will be successful in using Newton's method. The fixed point theorem will guarantee that the iterative method converges locally in second order, that is, quadratically. $$F(x)\ =\ a - x^n\eqno(25)$$ $$F'(x)\ =\ -nx^{n-1}\eqno(26)$$ $${F(x)\over F'(x)}\ =\ {a-x^n \over -nx^{n-1} }\ =\ -{a\over nx^{n-1}} + {x\over n}\eqno(27)$$ $$\eqalign{f(x)\ &=\ x - F(x)/F'(x)\cr &=\ x + {a\over nx^{n-1}} -{x\over n}\cr &=\ \big(1-{\textstyle{1\over n}}\big)x + {\textstyle {1\over n}}\big(a/x^{n-1}\big)\cr}\eqno(28)$$ We will ensure convergence by computing $f'$ and plugging in the value $s = \root n \of {a}$. $$f'(x)\ = (1-1/n) - [a(n-1)]/[nx^n]. \eqno(29)$$ $$\eqalign {f'\big(\root n\of {a}\big)\ &= \big(1+{\textstyle{1\over n}}\big) -{\textstyle{ n-1\over n}\,{a\over a}}\cr &=\ 1 - {\textstyle {1\over n}} - {\textstyle {n\over n}} + {\textstyle{1\over n}}\cr &=\ 0\cr}\eqno(30)$$ Theory assured us this would happen; our $f'$ has been correctly computed. $$f''(x)\ =\ [a(n-1)]/x^{n+1}\eqno(31)$$ For $a > 0$, $$f''\big(\root n\of{a}\big)\ =\ {a(n-1) \over a \root n\of{a} }\ = \ {n-1 \over \root n \of {a}}.\eqno(32)$$ $n \not= 1$ in any case. 
Therefore we have quadratic convergence in some (yet to be determined) interval. Again, requiring $|f'(x)| < 1$: $$-1\ <\ \left(1-{1\over n}\right) -{(n-1)a \over nx^n}\ <\ 1$$ $$-2 + {1\over n}\ <\ -{(n-1)a\over nx^n}\ <\ {1\over n}$$ $${n\over (n-1)a}\left(2-{1\over n}\right)\ >\ {1\over x^n}\ >\ {n\over (n-1)a}{1\over n}$$ $${(n-1)a\over 2n-1}\ <\ x^n\ <\ (n-1)a$$ Therefore, $$\root n \of {(n-1)a\over 2n-1}\ <\ x\ <\ \root n \of {(n-1)a}\eqno(33)$$ is an interval in which the iteration algorithm converges quadratically. Finally, we note that $x^2$ is just short-hand for $x\cdot x$ or $x\,\times\,x$, and $x^3$ stands for $x\cdot x\cdot x$ or $x\,\times\,x\,\times\,x$ and so on. For a given, arbitrary (but fixed) value of $n$, the above algorithm can be written without exponentiation, substituting $$\underbrace{x\,\cdot\,x\,\cdots\,x}_{n\ \hbox{\sevenrm times}}\ \hbox{ for } x^n.$$ \medskip\noindent BASIC computer program (fast): \medskip {\tt\obeylines\ttraggedright 10 REM ***************************************************** 20 REM * Harry A. Watson, Jr. 30 REM * Math 219b 40 REM * 27 April 1991 50 REM ***************************************************** 60 INPUT "Root n, n = "; N 70 INPUT "a, x\_0 = "; A, X0 80 FOR I = 1 TO 100 90 X1 = (1-1/N)*X0 +A/(N*X0{\char94}(N-1)) 100 IF ABS(X1-X0) < .0001 THEN PRINT X1,A{\char94}(1/N),I:STOP 110 X0 = X1 120 NEXT I 130 REM *************************************************** 140 REM * Recall that the cube root of 20 is approximately 150 REM * $e\ =\ 2.7182818\ldots$ 160 REM * Plug in $n\ =\ 3$, $a\ =\ 20$, 170 REM * and $x_0\ =\ 3$ 180 REM * Answer = 2.714418 2.714418 4 (steps) 190 REM *************************************************** } \vfill\eject \noindent{\bf\llap{2.3\quad}The Algorithm.} The algorithm associated with Newton's method may be seen graphically as follows: % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % % Graph showing Newton's method convergence. % % $$\beginpicture \setcoordinatesystem units <1cm,1cm> \setplotarea x from 0 to 8, y from 0 to 5 \axis bottom / % \axis left / % \put {$\scriptstyle\bullet$} at 1.1 0 % \plot 6 3.5 2.75 0 / % \putrule from 2.75 0 to 2.75 0.85 % \plot 2.75 0.85 1.5 0 / % \put {$\scriptstyle\bullet$} at 2.75 0 % \put {$\scriptstyle\bullet$} at 1.5 0 % \linethickness=.75pt \ifexpressmode \put {\bf EXPRESSMODE} at 4 2.5 % \else \setquadratic \plot .25 -.25 3 1 6 3.5 / % \fi \endpicture$$ \centerline{F{\sevenrm IGURE} 12} % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % \medskip $$\tan\beta = f'(x_0) = {f(x_0) \over x_0 - x_1}$$ $$x_1 = x_0 -{f(x_0) \over f'(x_0)}$$ Solve the equation $f(x)=0$. 
We assume that $f$ has a continuous first derivative, that is
$f \in C^1$, where
$$C^1 = \{\,\hbox{functions with continuous } f'\,\}.$$
\medskip\noindent
We will construct the following feedback diagram:
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Feedback diagram
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
$$\beginpicture %
\setcoordinatesystem units <1in,1in> %
\setplotarea x from 0 to 4, y from 0 to 3.5 %
\put {\lines{ Last estimate\cr $x_n$\cr} } at 2 3 %
\put {$\displaystyle x_{n+1} = x_n - {f(x_n)\over f'(x_n)}$} at 2 2 %
\put {\lines{ Next estimate\cr $x_{n+1}$\cr} } at 2 1 %
\ifexpressmode
\putrule from 2.0 2.75 to 2.0 2.25 %
\putrule from 2.0 1.75 to 2.0 1.25 %
\putrule from 3.5 3.00 to 3.0 3.00 %
\else
\arrow <10pt> [.2, .4] from 2.0 2.75 to 2.0 2.25 %
\arrow <10pt> [.2, .4] from 2.0 1.75 to 2.0 1.25 %
\arrow <10pt> [.2, .4] from 3.5 3.00 to 3.0 3.00 %
\fi
\putrule from 3.0 1.0 to 3.5 1.0 %
\putrule from 3.5 1.0 to 3.5 3.0 %
\linethickness=1pt
\putrectangle corners at 1 2.75 and 3 3.25 %
\putrectangle corners at 1 1.75 and 3 2.25 %
\putrectangle corners at 1 0.75 and 3 1.25 %
\put {$n := n+1$} [l] <5pt,0pt> at 3.5 2 %
\put {$ x_0 \longrightarrow $} [r] <-2pt,0pt> at 1.00 3.00 %
\put{F{\sevenrm IGURE} 13} [B] at 2.00 0.25 %
\endpicture$$
%
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
\noindent
We will see that each error is essentially proportional to the square
of the previous error. This means that the number of correct decimal
places roughly doubles with each successive approximation. This is
called quadratic convergence.
\vfill\eject
\noindent{\bf\llap{2.4\quad}A Second Opinion.}
We saw a derivation of Newton's method for finding roots in paragraph
2.1. There was an implicit assumption made that the function in
question, $f(x)$, had a Taylor series representation. This is a very
strong assumption and we may easily find examples which do not satisfy
this condition. One very common engineering example can be constructed
as follows:
$$R'(x)\ =\ \cases{ 1 & if $ 0 \le x < 1$;\cr 0 & otherwise.\cr}$$
$$R(x)\ =\ \cases{ 0 & if $x < 0$;\cr
x & if $ 0 \le x < 1$;\cr
1 & if $x \ge 1$;\cr}$$
The function $R(x)$ is the so-called {\it Ramp function\/} so highly
favored by electrical engineers. Clearly the derivative is
discontinuous at $0$ and $1$. We can construct $f(x) = R(x) - {1\over2}$
and this function has a root at $x_0 = {1\over2}$:
$f(1/2) = R(1/2)-1/2 = 1/2-1/2 = 0$. We can weaken the assumption that
$f(x)$ has a Taylor series representation and keep the same basic proof
if we employ the so-called {\bf Extended Law of the Mean (Mean Value
Theorem).}\footnote{$^{25}$}{John M. H. Olmsted, {\it Advanced
Calculus}, (NY: Appleton-Century-Crofts, Inc., 1956), pages 75-85.}
\proclaim Theorem. {\bf Extended Law of the Mean (Mean Value Theorem).}
If each of $f(x)$, $f'(x)$, $\ldots$, $f^{(n-1)}(x)$ is continuous on a
(closed) number interval $[a,b]$, and if $f^{(n)}(x)$ exists in the
(open) number interval $(a,b)$, then there exists a number
$\xi\,\in\,(a,b)$ such that
$$f(b) = f(a) + f'(a)(b-a) + {f''(a)\over2!}(b-a)^2 + \cdots
+ {f^{(n-1)}(a)\over(n-1)!}(b-a)^{n-1}
+ {f^{(n)}(\xi)\over n!}(b-a)^n.\eqno(34)$$
\medskip\noindent
We let $n=2$ and write
$$f(x) = f(x_0) + f'(x_0)(x-x_0) + {f''(\xi)\over2}(x-x_0)^2.\eqno(35)$$
We are looking for an $x_1$ such that $f(x_1) = 0$. This being the
case, we have
$$x_1 = x_0 - {f(x_0)\over f'(x_0)}
- {f''(\xi)\over2f'(x_0)}(x_1-x_0)^2.
\eqno(36)$$ \medskip\noindent We will examine other example of convergence, graphing the steps. In this case the function is not a polynomial, we will use $$f(x) = e^{-(x-1/4)} - 1/4.\eqno(37)$$ % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Plot showing convergence of Newton's method on transcendental function. % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % $$\beginpicture % \setcoordinatesystem units <5cm,5cm> % \setplotarea x from 0 to 2, y from -.25 to .75 % \axis left / % \axis bottom shiftedto y=0 / % \ifexpressmode \put {\bf EXPRESSMODE} at 1 .5 % \else % \savelinesandcurves on "chap2b.t01" % \plot .375 0 .375 .63249 1.09171 0 1.09171 .18097 1.51162 0 1.51162 0.03194 / % \linethickness=.75pt % \setquadratic \plot .25 .75 .375 .63249 .50 .52880 .625 .43728 .75 .35653 .875 .28526 1.00 .22236 1.25 .11787 1.50 .03650 1.75 -0.02686 2.00 -0.07622 / % % \replot "chap2b.t01" % \fi \put {$x_0$} <0pt,-6pt> at .375 0 % \put {$x_1$} <0pt,-6pt> at 1.09171 0 % \put {$x_2$} <0pt,-6pt> at 1.51162 0 % \put {$\swarrow$\raise6pt\hbox{$\big(x_0,f(x_0)\big)$}} % [lb] <1pt,1pt> at .375 .63249 % \put {$\swarrow$\raise6pt\hbox{$\big(x_1,f(x_1)\big)$}} % [lb] <1pt,1pt> at 1.09171 .18097 % \put {$\swarrow$\raise6pt\hbox{$\big(x_2,f(x_2)\big)$}} % [lb] <1pt,1pt> at 1.51162 .03194 % \put {$\scriptstyle\bullet$} at .375 .63249 % \put {$\scriptstyle\bullet$} at 1.09171 .18097 % \put {$\scriptstyle\bullet$} at 1.51162 .03194 % \put {$\scriptstyle\bullet$} at .375 0 % \put {$\scriptstyle\bullet$} at 1.09171 0 % \put {$\scriptstyle\bullet$} at 1.51162 0 % \put {$x \rightarrow$} [l] <2pt,0pt> at 2 0 % \put {$y=f(x)$} [r] <-2pt,0pt> at 0 0.75 % \endpicture$$ % \centerline{F{\sevenrm IGURE} 14} % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % \vfill\eject \noindent{\bf\llap{2.5\quad}The Quasi-Newton Method.} The quasi-Newton method, also known as the method of {\it regula falsi\/}, does not require the function for the derivative. It relies on approximation of the derivative by the ratio $${f(x_{n-1})-f(x_{n-2}) \over x_{n-1} - x_{n-2} }.$$ Making the substitution $$f'(x_{n-1}) \approx {f(x_{n-1})-f(x_{n-2}) \over x_{n-1} - x_{n-2} }, $$ we obtain $$x_n\ =\ x_{n-1} - {\big(x_{n-1}-x_{n-2}\big)\,f(x_{n-1}) \over f(x_{n-1}) - f(x_{n-2}) }.\eqno(38)$$ We will examine a problem from a standard textbook\footnote{$^{26}$}{Robert Eisberg and Resnick, Robert, {\it Quantum Physics, 2nd Edition}, (NY: John Wiley \& Sons, 1985), page 24.} on quantum physics which will employ this technique. In order to derive the Wien displacement law, $\lambda_{\hbox{max}}T = 0.2014\,hc/k$,\footnote{$^{27}$}{$h=6.626\times 10^{-34}$ joule-sec is the Planck constant; $c=2.998\times10^{8}$ m/sec is the speed of light in vacuum; $k=1.381\times10^{-23}$ joule/$^\circ$K is the Boltzmann's constant.} it is necessary to solve the transcendental equation $$e^{-x} + x/5 = 1.\eqno(39)$$ We know already that the answer is approximately 5, in fact, it is approximately 4.965. We will use the following elementary BASIC computer program to see how well this method works. \medskip {\tt\parindent=0pt\obeylines\ttraggedright 5 DEFDBL A-H, O-Z 6 OPEN "92\_12\_16.txt" FOR OUTPUT AS \#1 10 DEF fna (x) = EXP(-x) + .2 * x - 1! 
20 INPUT "Initial approximation = ", x0 25 PRINT \#1, "Initial approximation = ", x0 26 PRINT \#1, "x1 = x0 + .0001 = ", x0 + .0001 30 h = .0001 40 x1 = x0 + h 50 x2 = x1 - (h * fna(x1)) / (fna(x1) - fna(x0)) 60 PRINT x2 65 PRINT \#1, "Next approximation = ", x2 70 h = x2 - x1 80 IF INKEY\$ = "" THEN GOTO 80 90 x0 = x1: x1 = x2: GOTO 50 } \medskip\noindent Now, we will look at the output (starting with an initial guess of 5): \smallskip \centerline{\bf TYPE 92\_12\_16.TXT} \medskip {\tt\parindent=0pt\obeylines Initial approximation = 5 x1 = x0 + .0001 = 5.0001 Next approximation = 4.965135680044722 Next approximation = 4.965114168528612 Next approximation = 4.965114155083961 Next approximation = 4.965114155083955 Next approximation = 4.965114155083955 } \medskip\noindent So, it didn't take but four steps to converge! Now, let's draw a graph and see just how this looks graphically. Incidently, in the real world this particular experimental value is rarely calculated with accuracy better than 10\%. There is a good approximation to 0.2014, namely, $${1\over2\pi\big(1-\cos(e/2)\big)} = {1\over4.963222169} = .20148\ldots \eqno(40)$$ (It is accurate to within 0.019\%.) \vfill\eject % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % % Figure 10 --- Exponential curve with intersecting straight line. % $$\beginpicture \setcoordinatesystem units <.75in,1in> \setplotarea x from 0 to 6, y from 0 to 1 \normalgraphs \axis bottom ticks numbered from 0 to 6 by 1 length <0pt> withvalues $x$ / at .75 / / \linethickness=.25pt \putrule from .75 0 to .75 .47237 \setdots \putrule from .75 .47237 to .75 .85 \setsolid \put {$\scriptstyle\bullet$} at .75 .47237 \put {$\scriptstyle\bullet$} at .75 .85 \put {$y_1=e^{-x}$} [lb] <4pt,0pt> at .75 .47236 \put {$\swarrow$\raise6pt\hbox{$y_2=1-x/5$}} [lb] <2pt,2pt> at .75 .85 \put {$\scriptstyle\bullet$} at 4.96511 0 \put {$\swarrow$\raise6pt\hbox{$y_1 = y_2$}} [lb] <2pt,2pt> at 4.96511 0 % \ifexpressmode \put {\bf EXPRESSMODE} at 1.5 0.5 \else \setquadratic \plot 0 1 .25 .77880 .50 .60653 .75 .47237 1.00 .36788 1.25 .28650 1.50 .22313 1.75 .17377 2.00 .13534 2.25 .10540 2.50 .08208 2.75 .06393 3.00 .04979 3.25 .03877 3.50 .03019 3.75 .02352 4.00 .01832 4.25 .01426 4.50 .01109 4.75 .00865 5.00 .00674 5.25 .00525 5.50 .00409 5.75 .00318 6.00 .00248 / % \setlinear \plot 0 1 5.5 -0.1 / % \fi \endpicture $$ \centerline {F{\sevenrm IGURE} 15} % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \medskip\noindent Let's take a look at another example, this time an iteration for a transcendental equation. We want to find the positive solution for $2\sin(x) = x$. The first thing we will want to do is to draw a graph and estimate the root. % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % % Graph of y_1=2*sin(x) and y_2 = x % % Figure 11. 
% $$\beginpicture \setcoordinatesystem units <1in,.5in> \setplotarea x from 0 to 4, y from 0 to 4 \axis left ticks numbered from 0 to 4 by 1 / \axis bottom label {F{\sevenrm IGURE} 16} ticks numbered from 0 to 3 by 1 / \ifexpressmode \put {\bf EXPRESSMODE } at 2 2 % \else \setquadratic \plot 0.0 0.00000 0.1 0.19967 0.2 0.39734 0.3 0.59104 0.4 0.77884 0.5 0.95885 0.6 1.12928 0.7 1.28844 0.8 1.43471 0.9 1.56665 1.0 1.68294 1.1 1.78241 1.2 1.86408 1.3 1.92712 1.4 1.97090 1.5 1.99499 1.6 1.99915 1.7 1.98333 1.8 1.94770 1.9 1.89260 2.0 1.81859 2.1 1.72642 2.2 1.61699 2.3 1.49141 2.4 1.35093 2.5 1.19694 2.6 1.03100 2.7 0.85476 2.8 0.66998 2.9 0.47850 3.0 0.28224 / % \setlinear \plot 0 0 4 4 / \fi \put {$\longleftarrow y_1 = 2\cdot\sin(x)$} [l] <2pt,0pt> at 2.0 1.81859 % \put {$\nwarrow$\lower6pt\hbox{$y_2=x$}} [lt] <-1pt,-1pt> at 2.5 2.5 % \put { $y_1=y_2$\lower6pt\hbox{$\downarrow$} } % [rb] <5pt,1pt> at 1.8955 1.8955 % \setdots \putrule from 1.8955 0.0 to 1.8955 1.8955 / \endpicture$$ % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % \medskip\noindent Of course, we will examine both the quasi-Newton and the Newton method. The Newton method is listed to the right and we see at once that it has converged both faster and to greater precision. The important point to note here is that with a modern (fast) computer both methods converge rapidly. We can use the two in conjunction to check each other's results. Moreover, with the Simpson's rule procedure to check the derivative function, we can be assured that the derivative $f'(x)$ of $f(x)$ is the correct function. This comparison of two different methods is similar to the adaptive quadrature routine (AQR), which will be studied in a later section. \medskip % % {\tt\parindent=0pt\obeylines Initial approximation = 2 x1 = x0 - .0001 = 1.99990 \quad 1.9009955942039090 Next approximation = 1.90099 \quad 1.8955116453795950 Next approximation = 1.89580 \quad 1.8954942672087130 Next approximation = 1.89550 \quad 1.8954952670339810 Next approximation = 1.89549 \quad 1.8954952670339810 Next approximation = 1.89549 \quad 1.8954925670339810 Next approximation = 1.89549 \quad 1.8954925670339810 } \medskip\noindent \vfill\eject \noindent{\bf\llap{2.6\quad}Pathological examples.} Newton's method will fail for certain situations. There are an excellent developments of such examples in many elementary textbooks\footnote{$^{28}$}{George B. Thomas, Jr., {\it Calculus and Analytic Geometry, 3rd Edition}, (Reading Massachusetts: Addison-Wesley Publishing Company, 1966), page 455.} We won't dwell on the analytic details of the failure of the Newton method; on the other hand, we will present graphs to illustrate the point. It has already been pointed out that the very first thing to do is to construct a graph and then estimate the root. With all the excellent graphics software available, there is no excuse for not doing so. Without further ado, let's first look at a graph where Newton's method fails to converge. $$f(x)\ =\ \cases{\sqrt{\left| x-r\right|}& if $x > r$\cr\noalign{\vskip2pt} 0 & if $x = 0$\cr -\sqrt{\left| x-r\right|}& if $x < r$\cr}\eqno(41)$$ % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Figure 17. 
% $$\beginpicture \setcoordinatesystem units <2in,1in> \setplotarea x from -.125 to 2, y from -1 to 1 % \plotheading {\lines{ Successive approximants\cr oscillate back and forth.\cr} } % \axis bottom shiftedto y=0 / % \axis left shiftedto x=0 / \ifexpressmode \put {\bf EXPRESSMODE} at 2 1 % \else \linethickness=1pt \setquadratic \plot 0.35 -0.80623 0.40 -0.77460 0.45 -0.74162 0.50 -0.70711 0.55 -0.67082 0.60 -0.63246 0.65 -0.59161 0.70 -0.54772 0.75 -0.50000 0.80 -0.44721 0.85 -0.38730 0.90 -0.31623 0.95 -0.22361 0.975 -0.15811 1.00 0.00000 1.025 0.15811 1.05 0.22361 1.10 0.31623 1.15 0.38730 1.20 0.44721 1.25 0.50000 1.30 0.54772 1.35 0.59161 1.40 0.63246 1.45 0.67082 1.50 0.70711 1.55 0.74162 1.60 0.77460 1.65 0.80623 / % \fi \linethickness=0.4pt \put {$\scriptstyle\bullet$} at 0.50 -0.70711 % \put {$\scriptstyle\bullet$} at 1.50 0.70711 % \put {$\scriptstyle\bullet$} at 0.50 0.0 % \put {$\scriptstyle\bullet$} at 1.50 0.0 % \put {$x_0$} [b] <0pt,4pt> at 0.50 0.0 % \put {$x_1$} [t] <0pt,-4pt> at 1.50 0.0 % \put {$O$} [rt] <-2pt,-2pt> at 0 0 % \put {$r$} [lb] <4pt,2pt> at 1.0 0.00 % \put {$x$} at 2.1 0.0 % \put {$y$} at 0.0 1.1 % \setdashes \putrule from 0.5 0.0 to 0.5 -0.70711 % \putrule from 1.5 0.0 to 1.5 0.70711 % \setlinear \setsolid \linethickness=.25pt \plot 0.5 -0.70711 1.5 0.0 / % \plot 0.5 0.0 1.5 0.70711 / % \endpicture$$ \centerline {F{\sevenrm IGURE} 17} % \medskip\noindent Now, let's look at a graph that really gets crazy! This time the successive approximations, $x_0$ and $x_1$ don't oscillate, they get worse! $$\beginpicture \setcoordinatesystem units <2in,1in> \setplotarea x from -.125 to 2, y from -1 to 1 % \plotheading {\lines{ Successive approximants\cr diverge from root.\cr} } % \axis bottom shiftedto y=0 / % \axis left shiftedto x=0 / \ifexpressmode \put {\bf EXPRESSMODE} at 2 1 % \else \linethickness=1pt \setquadratic \plot 0.35 -0.86624 0.40 -0.84343 0.45 -0.81932 0.50 -0.79370 0.55 -0.76631 0.60 -0.73681 0.65 -0.70473 0.70 -0.66943 0.75 -0.62996 0.80 -0.58480 0.85 -0.53133 0.90 -0.46416 0.95 -0.36840 0.975 -0.29240 1.00 0.00000 1.025 0.29240 1.05 0.36840 1.10 0.46416 1.15 0.53133 1.20 0.58480 1.25 0.62996 1.30 0.66943 1.35 0.70473 1.40 0.73681 1.45 0.76631 1.50 0.79370 1.55 0.81932 1.60 0.84343 1.65 0.86624 / % \fi \linethickness=0.4pt \put {$\scriptstyle\bullet$} at 0.50 -0.79370 % \put {$\scriptstyle\bullet$} at 1.50 0.79370 % \put {$\scriptstyle\bullet$} at 0.50 0.0 % \put {$\scriptstyle\bullet$} at 1.50 0.0 % \put {$x_0$} [b] <0pt,4pt> at 0.50 0.0 % \put {$x_1$} [t] <0pt,-4pt> at 1.50 0.0 % \put {$O$} [rt] <-2pt,-2pt> at 0 0 % \put {$r$} [lb] <4pt,2pt> at 1.0 0.00 % \put {$x$} at 2.1 0.0 % \put {$y$} at 0.0 1.1 % \setdashes \putrule from 0.5 0.0 to 0.5 -0.79370 % \putrule from 1.5 0.0 to 1.5 0.79370 % % \setlinear % \setsolid % \linethickness=.25pt % \plot 0.5 -0.79370 1.5 0.0 / % % \plot 0.5 0.0 1.5 0.79370 / % \endpicture$$ \centerline {F{\sevenrm IGURE} 18} % \vfill\eject % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % \headline={\tenrm\hfill Linear Least-squares Line} \centerline{\bf Chapter 3} \bigskip \noindent{\bf\llap{3.0\quad}Introduction.} One of the best introductions to the so-called methods of least-squares follows from a simple display of a least-squares linear fit graph properly done (with the error bars and all the statistical appendages) and a dissection the components of the graph. 
The points in the given point set are graphed as central dots ({\raise1pt\hbox{$\scriptstyle\bullet$}}) in the error bars $\big(\,{{\lower1pt\hbox{$\scriptstyle\top$}} \atop{\raise1pt\hbox{$\scriptstyle\bot$}}}\,\big)$. The length of the error bar is determined by the standard deviation, $\sigma_y$. From the central dot to the cross-bar is one unbiased standard deviation. % \newdimen\xposition \newdimen\yposition \newdimen\dyposition \newdimen\crossbarlength \def\puterrorbar at #1 #2 with fuzz #3 {% \xposition=\Xdistance{#1} \yposition=\Ydistance{#2} \dyposition=\Ydistance{#3} \setdimensionmode \put {$\bullet$} at {\xposition} {\yposition} \dimen0 = \yposition % ** Determine the y-location of \advance \dimen0 by -\dyposition % ** the lower cross bar. \dimen2 = \yposition % ** Determine the y-location of \advance \dimen2 by \dyposition % ** the upper cross bar. \putrule from {\xposition} {\dimen0} % ** Place vertical rule. to {\xposition} {\dimen2} \dimen4 = \xposition \advance \dimen4 by -.5\crossbarlength \dimen6 = \xposition \advance \dimen6 by .5\crossbarlength \putrule from {\dimen4} {\dimen0} to {\dimen6} {\dimen0} \putrule from {\dimen4} {\dimen2} to {\dimen6} {\dimen2} \setcoordinatemode} % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % % Graph for least-squares curve fit with shading, areas, etc. % $$\beginpicture \setcoordinatesystem units <.4in,.45in> \crossbarlength=5pt \setplotarea x from -1 to 10, y from 85 to 95 \plotheading { \lines{Missile `A'\cr Warrantied in shaded area\cr} } % \linethickness=.25pt \axis bottom label { \lines{ years $\longrightarrow$\cr \noalign{\vskip3pt} % F{\sevenrm IGURE} 19\cr } } % ticks in short withvalues $1981$ $1982$ $1983$ $1984$ $1985$ $1986$ $1987$ $1988$ $1989$ $1990$ $1991$ / at 0 1 2 3 4 5 6 7 8 9 10 / / \axis left label {\lines{$\uparrow$\cr $R$\/\%\cr }} ticks in short numbered from 85 to 95 by 1 / \puterrorbar at 0 90.08 with fuzz 0.180 \puterrorbar at 1 90.57 with fuzz 0.180 \puterrorbar at 2 90.76 with fuzz 0.180 \puterrorbar at 3 91.30 with fuzz 0.180 \puterrorbar at 4 91.57 with fuzz 0.180 \puterrorbar at 5 92.44 with fuzz 0.180 \puterrorbar at 6 92.87 with fuzz 0.180 \puterrorbar at 7 93.68 with fuzz 0.180 \puterrorbar at 8 93.85 with fuzz 0.180 \puterrorbar at 9 94.49 with fuzz 0.180 \puterrorbar at 10 94.88 with fuzz 0.180 \ifexpressmode \put {\bf EXPRESSMODE} at 5 90 \else \linethickness=.4pt \plot 0. 89.9077 10. 94.9086 / \setlinear \setshadegrid span <4pt> \vshade 4 85 95 7 85 95 / \vshade 7 85 86.5 9.1 85 86.5 / \vshade 9 85 95 10 85 95 / \setlinear \setshadegrid span <4pt> \hshade 87.5 7 9.1 95 7 9.1 / \fi \put {\frame <3pt> {\lines {Unwarrantied\cr Region\cr}}} at 2 93.5 \put {\frame <4pt> {$\sigma_y = 0.180$}} at 8 87 \endpicture$$ % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % The underlying principle of the method of least squares is the determination of a curve which minimizes the squares of the deviations of a given point set to that curve. This is another way of saying that we will determine a curve such that the sum of the squares of the distances from the points in a given point set to the curve is a minimum. Since the sought-for curve is a straight line, this case is frequently referred to as a linear least-squares curve fit. The determined line is sometimes called a trend line, sometimes called a regression line, and sometimes called a line of best fit. The technique is known as the {\bf maximum likelihood criterion}. 
\footnote{$^{29}$}{Glenn James and James, Robert C., {\it Mathematics
Dictionary}, (Princeton, NJ: D. Van Nostrand Company, Inc., 1964),
page 253.}
The theory involves an understanding of partial derivatives; it will be
covered later in an optional paragraph. The computer programs are
straightforward and easy to apply; they will be covered first.
\vfill\eject
\noindent{\bf\llap{3.1\quad}Dissection of a graph.}
The first thing to consider in any least-squares construction is the
underlying point set. This point set is sometimes called the sample
space, sometimes called the sample data, and sometimes called the
sample population. Confusion arises because of the various names that
are attached to the data points.
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% Graph giving only the data points
%
$$\beginpicture
\setcoordinatesystem units <.3in,.2in>
\setplotarea x from -1 to 10, y from 85 to 95
\plotheading { Sample Data } %
\linethickness=.25pt
\axis bottom %
% label { \lines{ abscissa ($x$-axis) $\longrightarrow$\cr
% \noalign{\vskip3pt}
% F{\sevenrm IGURE} 20\cr} } %
ticks in short withvalues {} {} {} {} {} {} {} {} {} {} {} /
at 0 1 2 3 4 5 6 7 8 9 10 / / %
\axis left label {\lines{$\uparrow$\cr $y(x)$\cr }}
ticks in short withvalues {} {} {} {} {} {} {} {} {} {} {} /
at 85 86 87 88 89 90 91 92 93 94 95 / / %
\put {$\scriptstyle\bullet$} at 0 90.08
\put {$\scriptstyle\bullet$} at 1 90.57
\put {$\scriptstyle\bullet$} at 2 90.76
\put {$\scriptstyle\bullet$} at 3 91.30
\put {$\scriptstyle\bullet$} at 4 91.57
\put {$\scriptstyle\bullet$} at 5 92.44
\put {$\scriptstyle\bullet$} at 6 92.87
\put {$\scriptstyle\bullet$} at 7 93.68
\put {$\scriptstyle\bullet$} at 8 93.85
\put {$\scriptstyle\bullet$} at 9 94.49
\put {$\scriptstyle\bullet$} at 10 94.88
\ifexpressmode
\put {\bf EXPRESSMODE} at 5 90
\else
% \linethickness=.4pt
% \plot 0. 89.9077 10. 94.9086 /
\fi
\endpicture$$
%\centerline{F{\sevenrm IGURE} 21}
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
The next things to consider are the arithmetic quantities associated
with the data points (that is, with the so-called point set). If we
simply add up all the $x$-values (the abscissas) and divide by the
count of the number of points, we will get the $x$-average, also called
the $x$-mean, and usually denoted as
$$\frame <5pt> {$\displaystyle \mu_x\ =\ {\sum_{i=1}^n x_i \over n}.$}
\eqno(42)$$
In a similar manner, we compute the $y$-average, $\mu_y$,
$$\frame <5pt> {$\displaystyle \mu_y\ =\ {\sum_{i=1}^n y_i \over n}.$}
\eqno(43)$$
The averages $\mu_x$ and $\mu_y$ are familiar. Other computations are
also necessary to obtain the ingredients needed to calculate the trend
line, shown below. Because this line is the best possible fit to the
data, it is also known as the {\sl line of best fit}.
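\medskip\noindent
For the record (the derivation by partial derivatives is deferred, as
promised, to an optional paragraph), the remaining ingredients are the
standard ones. Writing the trend line as $y = A + Bx$ and minimizing
the sum of the squared deviations
$\sum_{i=1}^n\big(y_i - A - Bx_i\big)^2$ leads to
$$\Delta\ =\ n\sum x_i^2 - \Big(\sum x_i\Big)^2, \qquad
A\ =\ {\sum x_i^2\,\sum y_i \ -\ \sum x_i\,\sum x_iy_i \over \Delta},
\qquad
B\ =\ {n\sum x_iy_i \ -\ \sum x_i\,\sum y_i \over \Delta},$$
where every sum runs from $i=1$ to $n$. These are exactly the
quantities computed as {\tt DELTA}, {\tt A}, and {\tt B} in the BASIC
program below.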
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % % Graph showing the trend line and data only % $$\beginpicture \setcoordinatesystem units <.3in,.2in> \setplotarea x from -1 to 10, y from 85 to 95 \plotheading { Sample Data } % \linethickness=.25pt \axis bottom % label {\lines{ abscissa ($x$-axis) $\longrightarrow$\cr % \noalign{\vskip3pt} % F{\sevenrm IGURE} 21\cr } } % ticks in short withvalues {} {} {} {} {} {} {} {} {} {} {} / at 0 1 2 3 4 5 6 7 8 9 10 / / % \axis left label {\lines{$\uparrow$\cr $y(x)$\cr }} ticks in short withvalues {} {} {} {} {} {} {} {} {} {} {} / at 85 86 87 88 89 90 91 92 93 94 95 / / % \put {$\scriptstyle\bullet$} at 0 90.08 \put {$\scriptstyle\bullet$} at 1 90.57 \put {$\scriptstyle\bullet$} at 2 90.76 \put {$\scriptstyle\bullet$} at 3 91.30 \put {$\scriptstyle\bullet$} at 4 91.57 \put {$\scriptstyle\bullet$} at 5 92.44 \put {$\scriptstyle\bullet$} at 6 92.87 \put {$\scriptstyle\bullet$} at 7 93.68 \put {$\scriptstyle\bullet$} at 8 93.85 \put {$\scriptstyle\bullet$} at 9 94.49 \put {$\scriptstyle\bullet$} at 10 94.88 \ifexpressmode \put {\bf EXPRESSMODE} at 5 90 \else \linethickness=.4pt \plot 0. 89.9077 10. 94.9086 / \fi \put {$\nwarrow\lower6pt\hbox{Line of Best Fit}$} [lt] <1pt,1pt> at 4.5 92.0 \endpicture$$ %\centerline{F{\sevenrm IGURE} 21} % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % \vfill\eject \noindent Now is a good time to introduce the BASIC computer program. Since we have seen several BASIC and ``C'' programs already, this listing will be provided {\it without comments}. The statistical parameters $\mu_x$ and $\mu_y$ will be denoted as {\tt XMU} and {\tt YMU}, respectively. The statistical parameters for the standard deviations, $\sigma_x$, $\sigma_y$, and $\sigma_{xy}$ (biased and unbiased), will be denoted {\tt SIGMAX}, {\tt SIGMAY}, and {\tt SIGMAXY}, respectively. \smallskip {\tt\obeylines\parindent=0pt\ttraggedright 100 CONST N = 11 110 DEFDBL A-H, O-Z: REM Define everything double precision except the indices. 120 DIM X(N), Y(N) 130 DATA 1981.,1982.,1983.,1984.,1985.,1986.,1987.,1988.,1989.,1990.,1991. 140 DATA 90.08,90.57,90.76,91.30,91.57,92.44,92.87,93.68,93.85,94.49,94.88 150 FOR I = 1 TO N: READ X(I): NEXT I 160 FOR I = 1 TO N: READ Y(I): NEXT I 170 SUMX = 0!: SUMY = 0!: SUMX2 = 0!: SUMXY = 0!: SUMX2 = 0! 180 SIGMAX = 0!: SIGMAY = 0!: SIGMAXY = 0! 190 FOR I = 1 TO N 200 SUMX = SUMX + X(I): SUMY = SUMY + Y(I): SUMX2 = SUMX2 + X(I) * X(I) 210 SUMXY = SUMXY + X(I) * Y(I): SUMY2 = SUMY2 + Y(I) * Y(I) 220 NEXT I 230 DELTA = CDBL(N) * SUMX2 - SUMX * SUMX 240 A = (SUMX2 * SUMY - SUMX * SUMXY) / DELTA 250 B = (CDBL(N) * SUMXY - SUMX * SUMY) / DELTA 260 CLS : PRINT "A (x-intercept) = "; A 270 PRINT "B (slope) = "; B 280 XMU = SUMX / CDBL(N): YMU = SUMY / CDBL(N) 290 FOR I = 1 TO N 300 SIGMAX = SIGMAX + (XMU - X(I)) * (XMU - X(I)) 310 SIGMAY = SIGMAY + (YMU - Y(I)) * (YMU - Y(I)) 320 SIGMAXY = SIGMAXY + (XMU - X(I)) * (YMU - Y(I)) 330 NEXT I 340 R = SIGMAXY / SQR(SIGMAX * SIGMAY) 350 PRINT "Correlation Coefficient r = "; R: SIGMAY = 0! 
360 FOR I = 1 TO N
370 SIGMAY = SIGMAY + (Y(I) - A - B * X(I)) {\char94} 2
380 PRINT USING "\#\#\#\# \#\#.\#\#\#\#\# \#\#.\#\#\#\#\#"; X(I); Y(I); A + B * X(I)
390 NEXT I
400 SIGMAY = SQR(SIGMAY / (CDBL(N) - 2!))
410 PRINT "sigma\_y (unbiased) = "; SIGMAY
420 PRINT "sigma\_x (unbiased) = "; SIGMAY * SQR(SIGMAX / DELTA)
430 END
} \smallskip\noindent After the above BASIC computer program was executed, the following data were obtained: \smallskip {\tt\parindent=0pt\obeylines\ttraggedright
A (y-intercept) = -900.77236
B (slope) = .50009
Correlation Coefficient r = .99475
Sigma\_y (unbiased) = .17980
Sigma\_x (unbiased) = .05421
} \smallskip\noindent The next important topic is the construction of the error bars. This is often the most forgotten item in the creation of a data presentation. It is the only way to properly introduce statistical uncertainty into a display graph. In the hard science of Physics, such information is mandatory. The distance from the data point to the crossbars (top and bottom of the error bar) must equal one unbiased standard deviation, $\sigma_y$. There may also be uncertainty in the abscissa. In this case, it is necessary to construct an error bar parallel to the $x$-axis of length $\sigma_x$. The error bars tell us more than just some statistical fact about the individual data points. If we examine Fig. 33 closely, we'll note that the line of best fit passes through most of the error bars. This tells us that there is a good probability that there is an underlying functional relationship between $x$ and $f(x)$, namely $f(x) \approx a\cdot x + b$. If, on the other hand, the best straight line missed a high proportion of the error bars, or if it missed any by a large distance (relative to the length of the error bar), then the relationship between $x$ and $y \approx f(x)$ would probably not be that of a straight line (if there was any relationship at all). % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % % Error bar graph % $$\beginpicture % Always put % here because some word processor \setcoordinatesystem units <1cm,1cm> % truncate final space(s). \setplotarea x from -2 to 8, y from -2 to 2 % \crossbarlength = 5pt % \plotheading {Error bars with uncertainties} % \puterrorbar at 0 0 with fuzz 1.2 % \put { \lines{$\uparrow$\cr $\sigma_y$\cr $\downarrow$\cr} } % <0pt,3pt> at .5 .5 % \puterrorbar at 5 0 with fuzz 1.2 % \put { \lines{$\uparrow$\cr $\sigma_y$\cr $\downarrow$\cr} } % <0pt,3pt> at 5.5 .5 % \putrule from 3.6 0.0 to 6.4 0.0 % \putrule from 3.6 0.1 to 3.6 -0.1 % \putrule from 6.4 0.1 to 6.4 -0.1 % \put { $\leftarrow\sigma_x\rightarrow$ } at 5.7 -0.5 % \endpicture$$ % \centerline{F{\sevenrm IGURE} 22} \smallskip\noindent We'll conclude this long chapter with another graph showing the line of best fit and the companion ``C'' program for the BASIC program above. The next graph was suggested by the textbook {\sl An Introduction to Error Analysis\/} by John R. Taylor (Mill Valley, CA: University Science Books, 1982), page 26. It indicates the uncertainties found in any physical measurement. The underlying physical law is known as {\bf Hooke's Law}, published by Robert Hooke in 1678. % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % % A plot of an extension of a spring $x$ versus the load $m$.
% % $$\beginpicture \setcoordinatesystem units <.002in,.4in> \crossbarlength=5pt \setplotarea x from 0 to 1000, y from 0 to 5 \linethickness=.25pt \axis bottom % label {\lines{ $m$ (gm) $\longrightarrow$ \cr \noalign{\vskip2pt} % F{\sevenrm IGURE} 23\cr} } % ticks in numbered from 0 to 1000 by 500 unlabeled short quantity 11 / \axis left label {\lines{$\uparrow$\cr $x$\cr (cm)\cr}} ticks in withvalues $5$ / at 5 / quantity 6 / \plot 0 5 0 5.5 / \plot 1000 0 1050 0 / \linethickness=.4pt \plot 0 0 1000 5.5 / \puterrorbar at 200 1.1 with fuzz 0.3 \puterrorbar at 300 1.5 with fuzz 0.3 \puterrorbar at 400 1.9 with fuzz 0.3 \puterrorbar at 500 2.8 with fuzz 0.3 \puterrorbar at 600 3.4 with fuzz 0.3 \puterrorbar at 700 3.5 with fuzz 0.3 \puterrorbar at 800 4.6 with fuzz 0.3 \puterrorbar at 900 5.4 with fuzz 0.3 \endpicture$$ \vfill\eject {\tt\parindent=0pt\obeylines\ttraggedright
/* Program for mean, standard deviation */
/* Program for least-squares fit to a straight line */
/* Program for computing correlation coefficient "r" */
\medskip
\#include <stdio.h>
\#include <math.h>
\#include <float.h>
\medskip
int main() {\char123} /* Declare all the parameters and variable types */
\quad FILE *fp; /* file pointer */
\quad static double x[] = { 1981., 1982., 1983., 1984., 1985.,
\qquad 1986., 1987., 1988., 1989., 1990.,
\qquad 1991. }; /* x values (years) */
\quad static double y[] = { 90.08, 90.57, 90.76, 91.30, 91.57,
\qquad 92.44, 92.87, 93.68, 93.85, 94.49,
\qquad 94.88 }; /* y values (reliabilities) */
\quad /* The variables sum\_{something} are used as accumulators. */
\quad /* a, b are the y-intercept and slope, respectively, of the
\qquad least-squares line. */
\quad /* r is the correlation coefficient (the so-called Pearson's r). */
double Delta, a, b, r, sum\_x, sum\_x2, sum\_y, sum\_y2, sum\_xy;
/* mu\_{something} is a mean, sigma\_{something} is a standard deviation (unbiased for least-squares = n-2 degrees of freedom). */
double mu\_x, mu\_y, sigma\_x, sigma\_y, sigma\_xy;
/* The integer i is an index. */
int i;
/* The long integer n is the count of the number of data points. */
long int n;
/* Begin computations */
/* Initialize n, x[n], y[n], and give the output file a name. Then you are ready to begin execution. */
n = 11;
/* Make sure the accumulators are all initialized to zero. */
sum\_x = (double) 0.0; sum\_y = (double) 0.0; sum\_y2 = (double) 0.0;
sum\_x2 = (double) 0.0; sum\_xy = (double) 0.0;
sigma\_x = (double) 0.0; sigma\_y = (double) 0.0; sigma\_xy = (double) 0.0;
/* The so-called "for loop" computes the sum\_{something}s */
for (i=0; i < n; ++i){\char123} /* i marches from 0 to n-1. */
sum\_y += (double) y[i]; /* All the arrays in "C" */
sum\_y2 += (double) y[i]*y[i]; /* programs begin with index */
sum\_x += (double) x[i]; /* zero. */
sum\_x2 += (double) x[i]*x[i];
sum\_xy += (double) x[i]*y[i];
{\char125}
/* Display the results of the computation of all the sums, Delta,
\quad the slope, and the y-intercept for
\quad y = a + b * x, the least-squares straight line */
printf("{\char92}n sum\_x = \%.16lf", sum\_x);
printf("{\char92}n sum\_y = \%.16lf", sum\_y);
printf("{\char92}n sum\_x2 = \%.16lf", sum\_x2);
printf("{\char92}n sum\_y2 = \%.16lf", sum\_y2);
printf("{\char92}n sum\_xy = \%.16lf", sum\_xy);
Delta = (double) n*sum\_x2 - sum\_x*sum\_x;
printf("{\char92}n Delta = \%.16lf", Delta);
a = (double) (sum\_x2*sum\_y - sum\_x*sum\_xy)/Delta;
b = (double) (n*sum\_xy - sum\_x*sum\_y)/Delta;
printf("{\char92}n A (y-intercept) = \%.16lf", a); /* Use Capital "A" here */
printf("{\char92}n B (slope) = \%.16lf", b); /* to be consistent with */
mu\_y = (double) sum\_y/n; /* the text book (Taylor). */
mu\_x = (double) sum\_x/n;
/* Display (print to screen) the means only if needed. */
/* printf("{\char92}n mu\_y = \%.16lf", mu\_y); */
/* printf("{\char92}n mu\_x = \%.16lf", mu\_x); */
for (i=0; i < n; ++i){\char123}
\quad /* Biased estimates for sigma\_x and sigma\_y */
sigma\_y += (mu\_y-y[i])*(mu\_y-y[i]); /* We are just using the */
sigma\_x += (mu\_x-x[i])*(mu\_x-x[i]); /* sigma\_{something}s here */
sigma\_xy += (mu\_x-x[i])*(mu\_y-y[i]); /* as accumulators. */
{\char125}
/* Compute the correlation coefficient from the values in the accumulators. (The accumulators hold raw sums, not the actual sigmas; the normalizing factors cancel in the ratio.) */
r = sigma\_xy/sqrt(sigma\_x*sigma\_y);
printf("{\char92}n correlation coefficient = \%.16lf", r);
sigma\_y = 0.0;
for (i=0; i < n; ++i){\char123}
sigma\_y += (y[i] - a - b*x[i])*(y[i] - a - b*x[i]);
/* This is the least-squares data to be plotted. */
printf("{\char92}n \%d \%.4lf \%.4lf \%.4lf", i, x[i], y[i],a + b*x[i]);
{\char125}
/* Unbiased estimates for sigma\_x and sigma\_y */
/* Note that there are n-2 degrees of freedom (not n-1). */
sigma\_y = sqrt((double) sigma\_y/(n-2.0));
printf("{\char92}n sigma\_y (unbiased) = \%.16lf", sigma\_y);
sigma\_x = sigma\_y * sqrt((double) sum\_x2/Delta);
printf("{\char92}n sigma\_x (unbiased) = \%.16lf", sigma\_x);
/* Output to file. Make sure you have a unique filename. */
fp = fopen("pgm3.txt","w"); /* Open the file */
fprintf(fp,"{\char92}n A (y-intercept) = \%.16lf", a);
fprintf(fp,"{\char92}n B (slope) = \%.16lf", b);
fprintf(fp,"{\char92}n correlation coefficient = \%.16lf", r);
fprintf(fp,"{\char92}n sigma\_y (unbiased) = \%.16lf", sigma\_y);
fprintf(fp,"{\char92}n x[i] y[i] a+b*x[i]");
for (i=0; i < n; ++i){\char123}
fprintf(fp,"{\char92}n \%.4lf \%.4lf \%.4lf",x[i],y[i], a + b*x[i]);
{\char125}
fclose(fp); /* Close the file */
return(0);
{\char125}
\medskip
/* End Of File */
} \vfill\eject % End of first portion for chapter 3. % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % \noindent{\bf\llap{3.2\quad}Formula Derivation.} Whenever mathematicians are faced with the chore of explaining some difficult point, they just make a definition. Then, by choosing a definition to fit their needs, they can create a theorem which employs the definition and declare the problem solved. That's basically what we'll do here. There is a formal derivation based on determining a minimum of the formula $$P_{a,b}\big[y_i\big]\ \propto\ {1\over\sigma_y} e^{-(y_i-ax_i-b)^2/2\sigma_y^2};\eqno(44)$$ however, we'll approach the problem from a simpler tack and arrive at the same result. Now, suppose we have a point set consisting of $n$ points.
Let's call this point set $S$ and observe that it can be written several ways $$\eqalign{S\ &=\ \big\{\,(x_1,y_1), (x_2,y_2), \ldots, (x_n, y_n)\,\big\}\cr &=\ \big\{\,(x_i,y_i)\,\mid\,i=1,\ldots,n\,\big\}.\cr}$$ If we call the difference $\big| y_i - a\cdot x_i - b\big| = \sqrt{ \big(y_i - ax_i -b\big)^2}$ the deviation, or residual, of $y_i$ from the value $a\cdot x_i + b$, then we may define $$\dev^2\ =\ \dev^2(a,b)\ =\ \sum_{i=1}^n \left( y_i-a\cdot x_i -b\right)^2. \eqno(45)$$ There are many different ways to approach this subject matter. We will choose the most straightforward. If one looks in a textbook, the derivation will be much more elaborate. This is because mathematicians love to generalize things. The line $y = f(x) = ax + b$ is also a point set (not finite) and sometimes it will be written as $$L\ =\ \big\{\,(x,y)\,\mid\,y=a\cdot x + b,\ x\in{\bf R}\,\big\}.$$ In any case, we simply want to {\sl minimize\/} the value for $\dev^2(a,b)$. In so doing, we will determine $a$ and $b$ in terms of the values of $x_i$, $y_i$, and the integer $n$. There's no way out of taking a partial derivative here. Just take it on faith that the minimum occurs exactly when $${\partial \dev^2(a,b)\over \partial a} = {\partial \dev^2(a,b)\over\partial b} = 0.\eqno(46)$$ We compute $${\partial \dev^2(a,b)\over\partial a} = \sum_{i=1}^n -2\big(y_i-ax_i-b\big)\cdot x_i\eqno(47)$$ and $${\partial \dev^2(a,b)\over\partial b} = \sum_{i=1}^n -2\big(y_i-ax_i-b\big).\eqno(48)$$ Assume that equations (46), (47), and (48) are correct.\footnote{$^{30}$}{For the derivation, see John R. Taylor, {\it An Introduction to Error Analysis}, (Mill Valley, CA: University Science Books, 1982), page 156.} Let's go through the algebra (carefully) and see what happens. Setting equation (47) equal to zero and expanding gives $$\sum_{i=1}^n x_iy_i\ - a\,\sum_{i=1}^n x_i^2 \ -\ b\,\sum_{i=1}^n x_i\ =\ 0.\eqno(49)$$ Setting equation (48) equal to zero and expanding gives $$\sum_{i=1}^n y_i\ -\ a\,\sum_{i=1}^n x_i \ -\ b\,\sum_{i=1}^n 1\ =\ 0.\eqno(50)$$ A very common error is not realizing that $$\sum_{i=1}^n 1\ =\ n.$$ We have been writing $\sum_{i=1}^n$ again and again. Let's simplify our notation by agreeing that whenever we use $\sum_{i=1}^n$ we can just write capital sigma, $\sum$. Equations (49) and (50) can now be re-written as $$\sum x_iy_i\ - a\,\sum x_i^2 \ -\ b\,\sum x_i\ =\ 0\eqno(49')$$ and $$\sum y_i\ -\ a\,\sum x_i \ -\ nb\ =\ 0.\eqno(50')$$ We want to solve for the values of $a$ and $b$. To solve for $a$, we eliminate $b$. That is done by multiplying equation (49') by $n$ and by multiplying equation (50') by $\sum x_i$ and subtracting one from the other (clever). Let's do just that: $$\eqalign{n\sum x_iy_i\ - na\,\sum x_i^2 \ -\ nb\,\sum x_i\ &=\ 0\cr \sum y_i\sum x_i\ -\ a\,\left(\sum x_i\right)^2 \ -\ nb\sum x_i\ &=\ 0.\cr a\,\left(\left(\sum x_i\right)^2-n\,\sum x_i^2\right) \ &=\ \sum x_i\sum y_i\ -\ n\,\sum x_iy_i \cr}$$ So, $$ a\ =\ { n\sum x_iy_i - \sum x_i\sum y_i \over n\sum x_i^2 - \big(\sum x_i\big)^2 }.\eqno(51)$$ Now all we'll have to do is to plug the value for $a$ into either (49') or (50') to solve for $b$. (50') is easier.
$$\eqalign{\sum y_i - a\sum x_i - nb \ &=\ 0,\cr \sum y_i\ -\ { n\sum x_iy_i- \sum x_i\sum y_i \over \ n\sum x_i^2 - \left(\sum x_i\right)^2 } \sum x_i\ -\ nb\ &=\ 0,\cr n \sum y_i \sum x_i^2 - \sum y_i\left(\sum x_i\right)^2 - n\sum x_i\sum x_iy_i\qquad \cr +\ \sum x_i\sum x_i\sum y_i -nb\left(n\sum x_i^2 -\left(\sum x_i\right)^2\right) \ &=\ 0,\cr n\sum y_i\sum x_i^2 -n \sum x_i\sum x_iy_i - nb \Delta\ &=\ 0,\cr}$$ where $$\Delta\ =\ n\sum x_i^2 -\left(\sum x_i\right)^2.$$ Thus, $$ b\ =\ {\sum y_i\sum x_i^2 - \sum x_i\sum x_iy_i\over n\sum x_i^2 -\left(\sum x_i\right)^2 }.\eqno(52)$$ In terms of $\Delta$, equations (51) and (52) may be written more compactly as $$ a\ =\ {n\sum x_iy_i- \sum x_i\sum y_i \over \Delta}\eqno(51')$$ $$ b\ =\ {\sum y_i\sum x_i^2 - \sum x_i\sum x_iy_i \over \Delta}\eqno(52')$$ This is exactly what we saw in the BASIC computer code: {\tt\obeylines\parindent=0pt\ttraggedright
230 DELTA = CDBL(N) * SUMX2 - SUMX * SUMX
240 A = (SUMX2 * SUMY - SUMX * SUMXY) / DELTA
250 B = (CDBL(N) * SUMXY - SUMX * SUMY) / DELTA
260 CLS : PRINT "A (y-intercept) = "; A
270 PRINT "B (slope) = "; B
} \smallskip\noindent (Here {\tt A} and {\tt B} stand for $b$ and $a$, respectively! {\tt DELTA} is still $\Delta$.) \vfill\eject \noindent{\bf\llap{3.3\quad}Goodness of Fit.} We saw how to derive the least-squares line of best fit; now the question is ``just how good a fit is it?'' Two things are needed: (1) a correlation coefficient; and (2) a quantitative measure of the degree of correlation for a given sample size. The formula for the correlation coefficient is simply $$r\ =\ {\sum_{i=1}^n \big(x_i-\mu_x\big)\big(y_i-\mu_y\big)\over \left(\sum_{i=1}^n \big(x_i-\mu_x\big)^2 \sum_{i=1}^n\big(y_i-\mu_y\big)^2 \right)^{1/2} }.\eqno(53)$$ This correlation coefficient is also called the linear correlation coefficient or {\it Pearson's\/} $r$. It turns out that $r$ is a number lying in the interval $[-1, 1]$. The closer $r$ is to $\pm 1$, the more closely the data points lie on a straight line. On the other hand, a value of $r$ near zero implies little or no linear correlation. But this is vague. What does one mean by saying ``$|r|$ lies close to 1?'' There must be some statistical test to determine goodness of fit quantitatively. There is! We can calculate the probability that a coefficient of linear correlation will be greater than some given value, say $r_0$, by the integral\footnote{$^{31}$}{E. M. Pugh and G. H. Winslow, {\it The Analysis of Physical Measurement\/} (Addison-Wesley, 1966), Section 12--8.} $$P_N\left(|r|\ge |r_0|\right) \ =\ {2\Gamma\left[\big(N-1\big)/2\right] \over \sqrt{\pi} \Gamma\left[\big(N-2\big)/2\right] }\, \int\limits_{|r_0|}^1 \left(1-r^2\right)^{(N-4)/2}\,dr.\eqno(54)$$ But how to evaluate this integral? We might look for a table of values. But relax: if we employ our Simpson's rule we can evaluate this integral accurately and quickly. A BASIC program is listed below: \smallskip {\tt\obeylines\parindent=0pt\ttraggedright
100 DEFDBL A-H, O-Z
110 INPUT "Number of points, Correlation Coefficient "; P, R
130 IF (P < 4) THEN PRINT "Number of points too small": GOTO 110
140 DEF FNA (x AS DOUBLE) = (1 - x * x) {\char94} ((P - 4!) / 2!)
150 A = 0!: B = 1!: N = 256: SUM = 0!: H = (B - A) / N
160 FOR I = 1 TO N STEP 2
170 SUM = SUM + FNA(A + (I - 1) * H)
180 SUM = SUM + 4! * FNA(A + I * H)
190 SUM = SUM + FNA(A + (I + 1) * H)
200 NEXT I: REM N = 256 gives accuracy to 5 decimal places.
220 PSUM = SUM * H / 3!
240 A = R: B = 1!
250 SUM = 0!: H = (B - A) / N
260 FOR I = 1 TO N STEP 2
270 SUM = SUM + FNA(A + (I - 1) * H)
280 SUM = SUM + 4! * FNA(A + I * H)
290 SUM = SUM + FNA(A + (I + 1) * H)
300 NEXT I
310 RSUM = 100! * (SUM * H / 3!) / PSUM
320 PRINT USING "Probability |r| >= r\_0 is \#\#.\#\%"; RSUM
} \smallskip\noindent But where are all the $\Gamma()$'s and the coefficient $2\big/\sqrt{\pi}$? What we are doing is normalizing the value of the integral by observing that $$P_N\big( |r| \ge 0 \big) \ =\ 1.$$ This way the coefficient, $${2\Gamma\left[\big(N-1\big)/2\right] \over \sqrt{\pi} \Gamma\left[\big(N-2\big)/2\right] }$$ does not have to be computed explicitly.\footnote{$^{32}$}{This is nice because $\Gamma\big(m+{1\over2}\big)= {1\cdot3\cdot5\cdots(2m-1)\over 2^m}\,\sqrt{\pi}$ for $m=1,2,3,\dots$.} Working through this program will give us a deeper insight into why numerical integration is important and how it ties in to Probability Theory. In fact, this whole topic of calculating probabilities for correlation coefficients is one that working scientists and engineers frequently misunderstand. \medskip \noindent The ``C'' program is listed below. This program has internal documentation in the form of comments. A comment in ``C'' begins with the character combination ({\tt /*}) and ends with the reverse character combination ({\tt */}). \medskip {\tt\parindent=0pt\ttraggedright\obeylines
\#include <stdio.h> /* Header for input/output subroutines. */
\#include <math.h> /* Header for math subroutines. */
\#include <float.h> /* Header for floating point subroutines. */
\medskip
\#define pi 3.141592653589793238462643383279 /* Accurate value for pi. */
\medskip
/* Simpson's rule for approximating integrals.
\quad a: left endpoint
\quad b: right endpoint
\quad fc: pointer to function to integrate
\quad n: number of subintervals */
double fc (double x, double p);
\medskip
int main() {\char123}
double a,b,h,sum,x,y,p,r,psum; /* In 'C' all variables must be declared */
double p1, p2, p3;
int i, n, o;
printf("{\char92}007"); /* Sound bell in computer. */
a = (double) 0.0; /* The normalizing integral is */
b = (double) 1.0; /* from 0 to 1. */
n = 256; /* This has been tested and seen to give 5 decimal accuracy. */
printf("{\char92}nEnter number of points ");
scanf("\%d",\&o);
if (o < 5){\char123}
\quad printf("Too few points"); return(o);
{\char125}
p = (double) o;
h = (double) (b-a)/n;
for (i=1, sum=0.0; i<=n; i = i+ 2){\char123}
\quad p1 = fc((double) a+(i-1)*h, p);
\quad p2 = fc((double) a+i*h, p);
\quad p3 = fc((double) a+(i+1)*h, p);
\quad sum += p1 + 4.0 * p2 + p3;
{\char125}
\medskip
psum = (double) h*sum/3.0; /* This normalizes the integral */
/* For a given value of N, P\_N[|r|>=0] must = 1.0 */
/* We assure this (and eliminate the need to compute some */
/* Gamma function) in computing the value in "psum." */
\medskip
printf("{\char92}nEnter correlation coefficient r = ");
scanf("\%lf", \&r);
a = fabs((double) r);
if (a > 1) a = (double) 1.0;
\quad /* r\_0 must be between -1 and 1. Force a to assume a value */
\quad /* between 0 and 1. */
h = (double) (b-a)/n;
for (i=1, sum=0.0; i<=n; i = i+ 2){\char123}
\quad p1 = fc((double) a+(i-1)*h, p);
\quad p2 = fc((double) a+i*h, p);
\quad p3 = fc((double) a+(i+1)*h, p);
\quad sum += p1 + 4.0 * p2 + p3;
{\char125}
\medskip
y = (double) h*sum/3.0;
printf("{\char92}nPercentage probability that N measurements");
printf("{\char92}nof two uncorrelated variables would give an r");
printf("{\char92}nas large as r\_0, \%.2lf, that is ", a);
printf("{\char92}nP\_N[|r|>=r\_0] = \%.2lf (per cent).", (double) 100.*y/psum);
printf("{\char92}nIf the number \%.2lf is less than 5 (per cent),", 100.*y/psum);
printf("{\char92}nthen the correlation is called significant.");
printf("{\char92}nIf the number \%.2lf is less than 1 (per cent),", 100.*y/psum);
printf("{\char92}nthen the correlation is called highly significant.");
printf("{\char92}nGiven this simple program and its speedy calculation");
printf("{\char92}nof the significance of a correlation in absolute terms,");
printf("{\char92}nwe only have to plug in the number of data points and");
printf("{\char92}nthe value gotten from Pearson's correlation coefficient, r.");
printf("{\char92}n");
return(0);
{\char125}
\medskip
double fc (double x, double p) {\char123}
\quad double y;
\quad y = (double) pow((double)(1.0-x*x),(double) (p-4.0)/2.0);
\quad return (y);
{\char125}
\medskip
/* End of file */
} \medskip % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % % Figure 24---Kolmogorov-Smirnov % $$\beginpicture \setcoordinatesystem units <4in,2in> \setplotarea x from 0 to 1, y from 0 to 1 % \plotheading {\lines{ Curve $f(x) = \big( 1-x^2\big)^{(N-4)/2}$,\cr where $N=5$ and $|r_0| = {1\over2}$\cr } } % \axis left ticks numbered from 0 to 1 by .2 / % \axis bottom label {F{\sevenrm IGURE} 24} ticks numbered from 0 to 1 by .2 / % \ifexpressmode \put {\bf EXPRESSMODE} at .5 .5 % \else \setquadratic \linethickness=1pt \plot 0.00 1.00000 0.05 0.99875 0.10 0.99499 0.15 0.98869 0.20 0.97980 0.25 0.96825 0.30 0.95394 0.35 0.93675 0.40 0.91652 0.45 0.89303 0.50 0.86603 0.55 0.83516 0.60 0.80000 0.65 0.75993 0.70 0.71414 0.75 0.66144 0.80 0.60000 0.85 0.52678 0.90 0.43589 0.925 0.37997 0.95 0.31225 0.975 0.22220 1.00 0.00000 / % \fi \setlinear \linethickness=0.4pt \putrule from 0.50 0.0 to 0.50 0.86603 \putrule from 0.55 0.0 to 0.55 0.83516 \putrule from 0.60 0.0 to 0.60 0.80000 \putrule from 0.65 0.0 to 0.65 0.75993 \putrule from 0.70 0.0 to 0.70 0.71414 \putrule from 0.75 0.0 to 0.75 0.66144 \putrule from 0.80 0.0 to 0.80 0.60000 \putrule from 0.85 0.0 to 0.85 0.52678 \putrule from 0.90 0.0 to 0.90 0.43589 \putrule from 0.95 0.0 to 0.95 0.31225 \putrule from 0.525 0.0 to 0.525 0.85110 \putrule from 0.575 0.0 to 0.575 0.81815 \putrule from 0.625 0.0 to 0.625 0.78062 \putrule from 0.675 0.0 to 0.675 0.73782 \putrule from 0.725 0.0 to 0.725 0.68875 \putrule from 0.775 0.0 to 0.775 0.63196 \putrule from 0.825 0.0 to 0.825 0.56513 \putrule from 0.875 0.0 to 0.875 0.48412 \putrule from 0.925 0.0 to 0.925 0.37997 \putrule from 0.975 0.0 to 0.975 0.22220 \put {$\scriptstyle\bullet$} at 0.5 0.0 \put {$|r_0|$} [t] <0pt,-3pt> at 0.5 0.0 \endpicture$$ % % \vfill\eject % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % \noindent{\bf\llap{3.4\quad}More Theory.} We will examine the theory behind the ``method of least squares'' as applied to fitting a straight line $$y = a\cdot x + b \eqno(55)$$ to the data points $(x_1,y_1)$, $(x_2,y_2)$, $\ldots$, $(x_n,y_n)$.
As before, we might observe that associated with each value of the abscissa, that is, associated with each $x_i$ ($1 \le i \le n$) there are two values of $y$. There is the $y_i$ and the value $\tilde y_i = a\cdot x_i + b$. If we take the numerical difference of the two, $d_i = y_i - \tilde y_i$, we get the amount by which the predicted value of $y$, $\tilde y$, differs from the observed value of $y$. Let's write out some of these differences, which are called {\it deviations}: $$\eqalign{ d_1\ &=\ y_1 - (a\cdot x_1 + b) \cr d_2\ &=\ y_2 - (a\cdot x_2 + b) \cr \vdots\ &\ \qquad \vdots \cr d_n\ &=\ y_n - (a\cdot x_n + b) \cr}$$ The set of all the $d_i$'s ($1 \le i \le n$) may also be written $$\big\{\,d_1,\,d_2,\,\ldots,\,d_n\,\big\}$$ or $$\{\,d_i\,\mid\,1 \le i \le n\,\}.$$ If it turns out that $d_i = 0$ for all the integers $i$, $i = 1,2,\ldots,n$, then the straight line $y=ax+b$ passes through each of the $n$ points. If this is not the case, then we have to solve the problem of finding a line of best fit. A line of best fit is a line which best fits the data in some sense. One obvious sense is to minimize the values of the deviations. Let's look at the situation graphically: % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % $$\beginpicture \setcoordinatesystem units <1in,1in> \setplotarea x from -.25 to 3.00, y from -.25 to 1.50 \plotheading {Line of Best Fit} \axis bottom shiftedto y=0 label {F{\sevenrm IGURE} 25} / \axis left shiftedto x=0 / \put {$O$} [rt] <-2pt,-2pt> at 0 0 % \put {$x$} [l] <4pt,0pt> at 3.00 0.00 \put {$y$} [b] <0pt,4pt> at 0.00 1.50 \put {$\scriptstyle\times$} at 0.25 1.00 \put {$(x_1,y_1)$} [lb] <4pt,6pt> at 0.25 1.00 \put {$\scriptstyle\times$\lower6pt\hbox{$(x_2,y_2)$} } [lt] at 0.75 0.75 \put {$\scriptstyle\times$} at 1.6 1.2 \put {$\scriptstyle\times$\raise6pt\hbox{$(x_n,y_n)$} } [rb] at 2.85 1.45 \put {$\scriptstyle\bullet$} at 0.25 0.83421 \ifexpressmode \else \setlinear \plot 0.25 0.83421 0.25 1.00 / % \linethickness=1pt \plot 0.10 0.8 2.95 1.45 / % \fi \endpicture$$ % 10 x0 = .1 % 20 y0 = .8 % 30 x1 = 2.95 % 40 y1 = 1.45 % 50 x = .25 % 60 y = y0 + (y1 - y0) * (x - x0) / (x1 - x0) % 70 PRINT x, y % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% If a line comes close to some of the points, the value of the deviation from those points will be small. The values of the deviation for points lying above the line are positive, while the values of the deviation for points lying below the line are negative. To get a realistic picture of what is meant by the {\sl best\/} fit, we could take the absolute value; however, it is simpler to take the squares of the deviations. Written as a function, this becomes $$\dev^2 \ =\ (y_1-ax_1-b)^2 + (y_2-ax_2-b)^2 + \cdots + (y_n-ax_n-b)^2.\eqno(56)$$ Recall the ``Big-Sigma'' notation: $$\sum_{i=1}^n (y_i-ax_i-b)^2\ =\ (y_1-ax_1-b)^2 + (y_2-ax_2-b)^2 + \cdots + (y_n-ax_n-b)^2.\eqno(57)$$ So, we have a function, $\dev^2$. But what is it a function of? Clearly, the $x_i$'s and $y_i$'s are fixed data points. There are only two things to vary, the constants $a$ and $b$. And this is a general concept in much of advanced mathematics: ``hold the variables constant and vary the constants.'' Nowhere is this more often done than in thermodynamics.
So, we write $$\eqalign{\dev^2(a,b)\ &=\ (y_1-ax_1-b)^2 + (y_2-ax_2-b)^2+ \cdots + (y_n-ax_n-b)^2\cr &=\ \sum_{i=1}^n (y_i-ax_i-b)^2.\cr}\eqno(58)$$ Now, by definition, we will say that the line of best fit is that line $y = ax + b$ for which the sum of the squares of the deviations is a minimum. This is a philosophically correct definition as well. The problem is how to determine which values of $a$ and $b$ to choose. The minimum value for equation (56) can be obtained from partial differentiation. But we have already done that in a previous paragraph. We solved the equations $${\partial \dev^2(a,b) \over \partial a}\ =\ 0\qquad\hbox{and }\quad % {\partial \dev^2(a,b) \over \partial b}\ =\ 0.\eqno(59)$$ % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % % Least-squares plot after Thomas. % $$\beginpicture \setcoordinatesystem units <1cm,1cm> \setplotarea x from -.25 to 5.00, y from -.25 to 6.00 \plotheading {Line of Best Fit} \axis bottom shiftedto y=0 label {F{\sevenrm IGURE} 26} ticks numbered from 1 to 4 by 1 / \axis left shiftedto x=0 ticks numbered from 1 to 5 by 1 / \put {$O$} [rt] <-2pt,-2pt> at 0 0 % \put {$x$} [l] <4pt,0pt> at 5.00 0.00 \put {$y$} [b] <0pt,4pt> at 0.00 6.00 \put {$\scriptstyle\times$} at 0.0 1.0 \put {$\scriptstyle\times$} at 2.0 2.0 \put {$\scriptstyle\times$} at 1.0 3.0 \put {$\scriptstyle\times$} at 3.0 4.0 \put {$\scriptstyle\times$} at 4.0 5.0 \put {$(x_1,y_1)$} [l] <4pt,0pt> at 0.0 1.0 \put {$(x_2,y_2)$} [t] <0pt,-4pt> at 2.0 2.0 \put {$(x_3,y_3)$} [b] <0pt,4pt> at 1.0 3.0 \put {$(x_4,y_4)$} [rb] <-2pt,2pt> at 3.0 4.0 \put {$(x_5,y_5)$} [rb] <0pt,4pt> at 4.0 5.0 \ifexpressmode \else \setlinear \linethickness=1pt \plot 0.0 1.2 5.0 5.7 / % \fi \endpicture$$ % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% The data points belong to the set $\{\,(0,1),\,(2,2),\, (1,3),\,(3,4),\,(4,5)\,\}$. The equation of the line of best fit is $y = 0.9x + 1.2$. Since the data values are integers and the number of points is small, it is possible to evaluate the deviation function directly.\footnote{$^{33}$}{George B. Thomas, {\it Calculus and Analytic Geometry}, (Reading, Massachusetts: Addison-Wesley Publishing Company, Inc., 1966), page 699.} All of this can be generalized to a polynomial $p(x)$ of degree $m \ge 2$. We simply write $y=a_0+a_1x+a_2x^2+\cdots+a_mx^m$ to get $$\dev^2(a_0,\ldots,a_m) \ =\ \sum_{i=1}^n \big(y_i -a_0 -a_1x_i -\cdots-a_mx_i^m\big)^2.\eqno(60)$$ Then it is necessary to minimize $\dev^2({\bf a})$, where ${\bf a} = (a_0,a_1,\ldots,a_m)$, as follows: $${\partial \dev^2(a_0,\ldots,a_m) \over \partial a_i}\ =\ 0,\qquad i=0,1,\ldots,m.\eqno(61)$$ \vfill\eject \noindent{\bf\llap{3.5\quad}Problems.} In this paragraph we will work problems.\footnote{$^{34}$}{Erwin Kreyszig, {\it Advanced Engineering Mathematics, 5th Edition}, (NY: John Wiley \& Sons, 1983), pages 820-821.} One very important consideration is the plotting of the data. With the digital computer, the very first thing a scientist or engineer should do is plot the data. This ensures proper scaling and gives an intuitive feeling for the behavior of the line of best fit. Moreover, unusual characteristics (such as logarithmic dependence, non-linear behavior, or sectional continuity) can be determined before the machinery of the least-squares method comes into play. \medskip\noindent The first data set: $\{\,(5,10),\,(10,8.9),\,(15,8.2),\,(20,7.0)\,\}$. Since the $x$-values are given as integers, we will assume that $\sigma_x = 0$ for this problem.
The only error bar needed is parallel to the $y$-axis. First plot the data. Ten units to the inch looks like a good scale for the $x$-axis, and one-fourth inch per unit looks good for the $y$-axis. % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % $$\beginpicture \setcoordinatesystem units <.1in,.25in> \setplotarea x from 5 to 20, y from 5 to 10 % \axis bottom label {F{\sevenrm IGURE} 27} ticks numbered from 5 to 20 by 5 / \axis left ticks numbered from 5 to 10 by 1 / \put {$\scriptstyle\bullet$} at 5 10 \put {$\scriptstyle\bullet$} at 10 8.9 \put {$\scriptstyle\bullet$} at 15 8.2 \put {$\scriptstyle\bullet$} at 20 7.0 \endpicture$$ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Now, we run the program and get the following values for the data on the line of best fit: $\{\,(5,9.98),\,(10,9.01),\,(15,8.04),\,(20,7.07)\,\}$. The equation for the line of best fit is $$y = -0.194 x + 10.949 \eqno(62)$$ and the correlation coefficient is $r = -.99546$. This looks like it will be a good linear fit. Let's plot it now. % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % $$\beginpicture \setcoordinatesystem units <.1in,.25in> \setplotarea x from 0 to 25, y from 5 to 12 % \axis bottom label {F{\sevenrm IGURE} 28} ticks numbered from 0 to 25 by 5 / \axis left ticks numbered from 5 to 12 by 1 / \put {$\scriptstyle\bullet$} at 5 10 \put {$\scriptstyle\bullet$} at 10 8.9 \put {$\scriptstyle\bullet$} at 15 8.2 \put {$\scriptstyle\bullet$} at 20 7.0 \ifexpressmode \else \setlinear \linethickness=1pt \plot 0 10.949 25 6.099 / \fi \endpicture$$ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% This graph looks OK without error bars. They would just clutter up the picture, so let's not put them in. The correlation coefficient is nearly $-1$, so there is no need to do a goodness-of-fit analysis. In short, this data set very closely approximates a linear relationship {\sl over the region in question}. We cannot, however, extrapolate outside of the extreme values of the data; our conclusions are valid only within the specified range of data values. This is a very important point. Notice that the line of best fit has been drawn across the entire graph. This is standard mathematical practice; the line is drawn as the graph of a function defined for every $x$ in the plotting range, not merely at the data points. But this mathematical practice should not obscure the underlying assumptions. \vfill\eject % % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \noindent The next data set has a different appearance. The data points are $\{\,(4,-17)$, $(15,-4)$, $(30,-7)$, $(100,50)$, $(200,70)\,\}$. We will choose one printer's point per data unit.
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Kreyszig problem 3 % $$\beginpicture \setcoordinatesystem units <1pt,1pt> \setplotarea x from 0 to 200, y from -20 to 100 \axis bottom shiftedto y=0 label {F{\sevenrm IGURE} 29} / % \axis left / \put {$\scriptstyle\circ$} at 4 -17 \put {$\scriptstyle\circ$} at 15 -4 \put {$\scriptstyle\circ$} at 30 -7 \put {$\scriptstyle\circ$} at 100 50 \put {$\scriptstyle\circ$} at 200 70 \endpicture$$ % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% After running the computer program, the following data along the line of best fit is obtained: $$\{\,(4,-11.67),(15,-6.65),(30,0.21),(100,32.20),(200,77.91)\,\}.\eqno(63)$$ The equation of the line of best fit and the correlation coefficient $r$ are $$y = 0.457 x - 13.503 \quad\hbox{and}\quad r = .96074.\eqno(64)$$ Now, we will plot the data against the curve. It is now easy to scale the data and put in numbered tick marks. % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % $$\beginpicture \setcoordinatesystem units <1pt,1pt> \setplotarea x from 0 to 250, y from -20 to 100 \axis bottom shiftedto y=0 label {F{\sevenrm IGURE} 30 } ticks numbered from 50 to 250 by 50 / \axis left ticks numbered from -20 to 100 by 20 / \put {$\scriptstyle\circ$} at 4 -17 \put {$\scriptstyle\circ$} at 15 -4 \put {$\scriptstyle\circ$} at 30 -7 \put {$\scriptstyle\circ$} at 100 50 \put {$\scriptstyle\circ$} at 200 70 \ifexpressmode \else \setlinear \linethickness=1pt \plot 0 -13.503 250 100.747 / \fi \endpicture$$ % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Now we run the computer program to get the value for $\sigma_y$ and see how good our fit is. $\sigma_y = 12.47154$ and the probability that a random choice of values would give the same correlation coefficient is 0.9\%---we have a highly significant correlation. So, let's put in the error bars to see just how this graph looks in its final form. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% $$\beginpicture \crossbarlength=5pt \setcoordinatesystem units <1pt,1pt> \setplotarea x from 0 to 250, y from -20 to 100 \axis bottom shiftedto y=0 label {F{\sevenrm IGURE} 31} ticks numbered from 50 to 250 by 50 / \axis left ticks numbered from -20 to 100 by 20 / \ifexpressmode \put {$\scriptstyle\circ$} at 4 -17 \put {$\scriptstyle\circ$} at 15 -4 \put {$\scriptstyle\circ$} at 30 -7 \put {$\scriptstyle\circ$} at 100 50 \put {$\scriptstyle\circ$} at 200 70 \else \setlinear \linethickness=1pt \plot 0 -13.503 250 100.747 / \puterrorbar at 4 -17 with fuzz 12.47154 \puterrorbar at 15 -4 with fuzz 12.47154 \puterrorbar at 30 -7 with fuzz 12.47154 \puterrorbar at 100 50 with fuzz 12.47154 \puterrorbar at 200 70 with fuzz 12.47154 \fi \endpicture$$ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \vfill\eject \noindent Now things are starting to look routine. We will repeat this procedure again and again to ensure that the idea is clear. Again, let's start with a data set $$\{\,(2.8,30), (2.9,26), (3.0,33), (3.1,31), (3.2,33), (3.2,35), (3.2,37), (3.3,36), (3.4,33)\,\}.\eqno(65)$$ Notice that one $x$-value has several $y$-values. This won't cause the least-square program to have any problems. 
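\smallskip\noindent One way to see this: the least-squares formulas use the data only through the sums $\sum x_i$, $\sum y_i$, $\sum x_i^2$, and $\sum x_iy_i$, and the denominator $$\Delta\ =\ n\,\sum x_i^2\ -\ \Big(\sum x_i\Big)^2\ =\ \sum_{i<j}\big(x_i-x_j\big)^2$$ is strictly positive as long as at least two of the abscissas differ. A repeated $x$-value simply contributes more than one term at the same abscissa.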
First the graph % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Kreyszig, page 821, problem 5 % $$\beginpicture \setcoordinatesystem units <10cm,3mm> \setplotarea x from 2.8 to 3.4, y from 26 to 38 \axis bottom label {F{\sevenrm IGURE} 32} ticks numbered from 2.8 to 3.4 by 0.2 / \axis left ticks numbered from 26 to 38 by 2 / \put {$\scriptstyle\times$} at 2.8 30 \put {$\scriptstyle\times$} at 2.9 26 \put {$\scriptstyle\times$} at 3.0 33 \put {$\scriptstyle\times$} at 3.1 31 \put {$\scriptstyle\times$} at 3.2 33 \put {$\scriptstyle\times$} at 3.2 35 \put {$\scriptstyle\times$} at 3.2 37 \put {$\scriptstyle\times$} at 3.3 36 \put {$\scriptstyle\times$} at 3.4 33 \endpicture$$ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% We execute the computer program and get a line of best fit with an equation $$y = 12.06767 x - 5.01128 \quad\hbox{and}\quad r = .69155.\eqno(66)$$ With nine points and a value of the correlation coefficient of only $0.69$ just how significant is our data? ---3.9\%---significant (under 5\%) but not highly significant. Now the finished plot, with error bars: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% $$\beginpicture \setcoordinatesystem units <10cm,3mm> \setplotarea x from 2.5 to 3.5, y from 25 to 40 \crossbarlength=5pt \axis bottom label {F{\sevenrm IGURE} 33} ticks numbered from 2.5 to 3.5 by 0.5 / \axis left ticks numbered from 25 to 40 by 5 / \ifexpressmode \put {$\scriptstyle\times$} at 2.8 30 \put {$\scriptstyle\times$} at 2.9 26 \put {$\scriptstyle\times$} at 3.0 33 \put {$\scriptstyle\times$} at 3.1 31 \put {$\scriptstyle\times$} at 3.2 33 \put {$\scriptstyle\times$} at 3.2 35 \put {$\scriptstyle\times$} at 3.2 37 \put {$\scriptstyle\times$} at 3.3 36 \put {$\scriptstyle\times$} at 3.4 33 \else \puterrorbar at 2.8 30 with fuzz 2.59 \puterrorbar at 2.9 26 with fuzz 2.59 \puterrorbar at 3.0 33 with fuzz 2.59 \puterrorbar at 3.1 31 with fuzz 2.59 \puterrorbar at 3.2 33 with fuzz 2.59 \puterrorbar at 3.2 35 with fuzz 2.59 \puterrorbar at 3.2 37 with fuzz 2.59 \puterrorbar at 3.3 36 with fuzz 2.59 \puterrorbar at 3.4 33 with fuzz 2.59 \setlinear \linethickness=1pt \plot 2.5 25.16805 3.5 37.23572 / % \fi \endpicture$$ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Now try your hand at the data set $\{\,(0,100)$, $(3,130)$, $(5,140)$, $(8,170)$, $(10,190)\,\}$, which will give $y=8.8216 x + 100.12739$ and have a correlation coefficient of $r= +0.99666$. What is $\sigma_y$? This correlation is significant; is it highly significant? \vfill\eject %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%% %%%%% %%%%% Chapter 4 %%%%% %%%%% %%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % \headline={\tenrm\hfill Adaptive Quadrature Routine} \centerline{\bf Chapter 4} \bigskip \noindent{\bf\llap{4.0\quad}Introduction.} Sometimes it's not easy to tell just how many subdivisions are needed to do a numerical integration. The reasons for this uncertainty are many. 
Functions exhibiting bizarre behavior at a point or in a subinterval require special attention. Some continuous functions fail to have derivatives. Moreover, a function can be integrable even if it is not continuous at every point. For these cases and many others it's nice to have some kind of automatic system for determining just when an approximation is ``good enough.'' Even fast modern computers may be too slow. In particular, if a function is defined in terms of an integral, its numerical evaluation may be time consuming; moreover, functions so defined may be prone to significant error propagation. In general, a function defined over an interval may be easier to integrate over some subintervals than over others. The methods which have been devised to deal with this phenomenon are called {\sl adaptive quadrature routines}, or {\sl AQRs}. Adaptive methods make use of error estimates to automatically determine the number of subintervals, $n$, and the interval size $h$. These methods may also use smaller values of the interval size $h$ over subintervals where it is needed. There are many systematic ways of doing this. There is one method which is easily programmed (in BASIC), but which is theoretically involved. This method entails comparing Simpson's rule with the so-called trapezoid rule. Another method, not so easily programmed and requiring recursion, compares two different estimates from Newton-Cotes formulas. In the following paragraphs, we'll develop computer programs for the simplest computers---the programmable calculators---and compare results. \bigskip \bigskip \hrule \bigskip $$\beginpicture \setcoordinatesystem units <1in,.2in> \setplotarea x from -.25 to 3.25, y from -5 to 6 \plotheading {\lines{ An Adaptive Quadrature Routine, yielding unequal intervals, to integrate\cr the function $f(x)=2x\,\sin x^2$ over the interval $[0,\pi]$.\cr}} \axis bottom shiftedto y=0 / % \axis left shiftedto x=0 ticks withvalues {$-5$} {$-4$} {$-3$} {$-2$} {$-1$} {$1$} {$2$} {$3$} {$4$} {$5$} / % at -5 -4 -3 -2 -1 1 2 3 4 5 / / % \put {$x$} at 3.35 0 \put {$y=2x\,\sin(x^2)$} [l] at 0.1 6.0 \putrule from 0.398 0.0 to 0.398 0.12556 \putrule from 0.635 0.0 to 0.635 0.49833 \putrule from 0.832 0.0 to 0.832 1.06205 \putrule from 1.012 0.0 to 1.012 1.72904 \putrule from 1.193 0.0 to 1.193 2.36008 \putrule from 1.439 0.0 to 1.439 2.52579 \putrule from 1.586 0.0 to 1.586 1.85901 \putrule from 1.717 0.0 to 1.717 0.66035 \putrule from 1.843 0.0 to 1.843 -0.92998 \putrule from 1.976 0.0 to 1.976 -2.73115 \putrule from 2.143 0.0 to 2.143 -4.25521 \putrule from 2.255 0.0 to 2.255 -4.20048 \putrule from 2.357 0.0 to 2.357 -3.13566 \putrule from 2.444 0.0 to 2.444 -1.49136 \putrule from 2.545 0.0 to 2.545 0.98048 \putrule from 2.699 0.0 to 2.699 4.54638 \putrule from 2.795 0.0 to 2.795 5.58508 \putrule from 2.880 0.0 to 2.880 5.21203 \putrule from 2.962 0.0 to 2.962 3.59141 \putrule from 3.045 0.0 to 3.045 0.92665 \putrule from 3.14159 0.0 to 3.14159 -2.70366 \ifexpressmode \put {\bf EXPRESSMODE} at 1.5 2.0 \else \linethickness=1.5pt \setquadratic \plot 0 0 .398 .12556 .635 .49833 .832 1.06205 1.012 1.72904 1.193 2.36008 1.439 2.52579 1.586 1.85901 1.717 .66035 1.843 -.92998 1.976 -2.73115 2.143 -4.25521 2.255 -4.20048 2.357 -3.13566 2.444 -1.49136 2.545 .98048 2.699 4.54638 2.795 5.58508 2.880 5.21203 2.962 3.59141 3.045 .92665 3.07 0 3.14159 -2.70366 / % \fi \endpicture$$ \centerline{\rm F{\sevenrm IGURE} 34} \vfill\eject \noindent{\bf\llap{4.1\quad}A Simple Approach.} The best introduction to this subject
is to work simple examples and then develop the theory as needed. Excellent numerical results can be obtained using a very simple approach. We are given a function, $f(x)$, an interval $[\alpha,\,\beta\,]$, and an error bound $\epsilon>0$. First, approximate the integral with two (different) approximations: the trapezoid rule and Simpson's rule. Begin with $n=10$ subintervals. Second, compute the error estimate (relative) as follows $$E\ =\ \left| {A_T - A_S \over A_T + A_S} \right|, \eqno(67)$$ where $A_T$ is the approximation obtained from the trapezoid rule and $A_S$ is the value from Simpson's rule. If $E < \epsilon$, then the results are OK. If not, then set $n=20$ ($n=2\cdot n$) subintervals and repeat the process. Continue until either $E < \epsilon$ or until $n$ gets too large, say $n > 1000$. This is easy to program and fast to run. The first prototype would look like this: \medskip {\tt\parindent=0pt\obeylines
10 \ DEF FNA(X) = SQR(1.0-SIN(EXP(1)/2){\char94}2*SIN(X){\char94}2)
20 \ N=10 : A=0.0 : B=2.0*ATN(1.0)
30 \ H=(B-A)/N : S = 0.0 : T = 0.0
40 \ FOR I=1 TO N STEP 2
50 \ X=A+(I-1)*H : S = S + FNA(X) : T = T + FNA(X)
60 \ X=A+I*H : T = T + 4.0*FNA(X)
70 \ X=A+(I+1)*H : S = S + FNA(X) : T = T + FNA(X)
80 \ NEXT I : S1 = H*S : S2 = H*T/3.0 : E = ABS((S1-S2)/(S1+S2))
90 \ IF (E>0.000001) THEN N = 2*N : GOTO 30
100 PRINT "Integral = "; S2, N
} \medskip\noindent The answer is given immediately as \smallskip {\tt\parindent=0pt\obeylines
Integral = 1.054686 \qquad 20
} \smallskip\noindent From our previous calculations, we know that this is precisely the correct number of subintervals for this precision. It is a good idea to examine an alternative means of coding the same program. It runs faster if we use a {\tt GOSUB} construction instead of the defined function. The answers will be identical and we can consult the table prepared in paragraph 1.2 for numerical results. \medskip {\tt\parindent=0pt\obeylines
10 \ D=0.0001 : INPUT "A,B = ";A,B : N=10
20 \ N=N+10 : H=(B-A)/N : X=A : S=0.0 : T=0.0
30 \ FOR I=1 TO N STEP 2
40 \ GOSUB 100 : S = S + Y*H/3 : T = T + Y*H/2 : X = X + H
50 \ GOSUB 100 : S = S + 4*Y*H/3 : T = T + Y*H : X = X + H
60 \ GOSUB 100 : S = S + Y*H/3 : T = T + Y*H/2 : NEXT I
70 \ IF ABS((S-T)/(S+T)) > D THEN 20
80 \ PRINT "Integral = "; T, N
90 \ PAUSE : INPUT "Error \% = ";D : GOTO 20
100 Y = SQR(1.0-SIN(EXP(1)/2){\char94}2*SIN(X){\char94}2) : RETURN
} \medskip\noindent This program gives one the ability to increase the error estimate and thereby determine closely the number of steps required. Using this program with a hand-held calculator will give a very good qualitative insight into the workings of the algorithm. The interested reader will later be invited to experiment with other adaptive quadrature routines to determine which is better. \vfill\eject \noindent{\bf\llap{4.2\quad}An Adaptive Quadrature Routine.} Suppose that we have a function $f(x)$ defined over an interval $[a,b]$. We would like to compute a value of the integral close to the actual value. Let's assume that we know that the function is bounded, $|f(x)| \le M$ for all $x\in[a,b]$. Then, for an answer within, say, 0.01\% of the actual value, we would choose $$\epsilon = {0.0001 \over M + 1}.\eqno(68)$$ With an overall error of $\epsilon$, we want to divide up the interval $[a,b]$ so that the total error of integration is less than $\epsilon$. There is an error formula which depends on the value of the fourth derivative of $f(x)$ on $[a,b]$, but we will depend on another criterion.
If we choose Simpson's rule $$A_2 = {h\over3}\big( y_0 + 4y_1 + 2y_2 + 4y_3 + y_4\big),\eqno(69)$$ then the doubled interval rule $$A_1 = {2h\over3}\big( y_0 + 4y_2 + y_4\big),\eqno(70)$$ is a good approximant. The estimate of the error is\footnote{\dag}{Francis Scheid, {\it Schaum's Outline Series Theory and Problems of Numerical Analysis 2nd edition}, (NY: McGraw-Hill Book Company, 1988), page 124.} $$E_2 \simeq {|A_2-A_1| \over 15 }.\eqno(71)$$ This follows from a comparison of the two error estimates: $$E_1 = -{(b-a)(2h)^4y^{(4)}(\xi_1)\over180}\qquad E_2 = -{(b-a)h^4y^{(4)}(\xi_2)\over180},\eqno(72)$$ where $E_1$ is the error estimate for Simpson's rule with an interval length of $2h$ and $E_2$ is the error estimate for Simpson's rule with interval length of $h$. We can accept a value of $A_2$ whenever $|A_2-A_1| \le 15\epsilon/2^k$ is reached, for an interval being halved $k$ times. An alternate derivation\footnote{\ddag}{Mike Stimpson, ``Numerical Integration Using Adaptive Quadrature,'' {\it The C Users Journal}, May 1992, page 35.} supposes that $A$ is the actual value of the integral and requires $E_1 = |A-A_1|$, $E_2 = |A-A_2|$, where $A_1$ is Simpson's rule and $A_2$ is the so-called five-point approximation (Simpson's rule with interval length of $h$). From the triangle inequality and the inequality $$E_2 \le {1\over16}\,E_1,\eqno(73)$$ the result follows. The programming makes use of a technique known as {\sl recursion}. Recursion occurs when a computer subroutine calls itself. The analogous situation in analysis occurs when a function, say $f(x)=\sin(x)$, is used to define $g(x) =\sin\big(\sin(x)\big)=f\big(f(x)\big)$. Now we will examine a ``C'' program which uses recursion to do adaptive quadrature: \medskip {\tt\obeylines\parindent=0pt
\#include <stdio.h>
\#include <math.h>
\#include <float.h>
double adaptive (double a, double b, double (*f)(double x), double *err);
double f(double x);
int main() {\char123}
\ double a,b,err,int1;
\ a = (double) 0.0;
\ b = (double) 2.0*atan(1.0);
\ err = (double) 0.00000001;
\ printf("{\char92}nleft endpoint \ = \%.16lf",a);
\ printf("{\char92}nright endpoint = \%.16lf",b);
\ printf("{\char92}nerror estimate = \%.16lf",err);
\ int1 = adaptive(a, b, f, \&err);
\ printf("{\char92}nIntegral \ \ \ \ \ \ = \%.16lf",int1);
\ printf("{\char92}nTheoretical \ \ \ = \%.16lf",1.0);
\ return(0);
{\char125}
double f(double x) {\char123}
\ double y;
\ y = sin(x);
\ return (y);
{\char125}
double adaptive (double a, double b, double (*f)(double x), double *err) {\char123}
\ double h, s1, s2, s3, s4, s5, error, err1, err2, t1, t2;
\ h = b - a;
\ s1 = (*f)(a); s2 = (*f)(a+0.25*h); s3 = (*f)(a+0.5*h);
\ s4 = (*f)(b-0.25*h); s5 = (*f)(b);
\ t1 = h*(s1+4.0*s3+s5)/6.0;
\ t2 = h*(s1+4.0*s2+2.0*s3+4.0*s4+s5)/12.0;
\ error = fabs(t1-t2)/15.0;
\ if (error < *err) {\char123}
\quad *err = error; return(t2);
\quad {\char125}
\ else {\char123}
\quad err1 = err2 = *err/2.0; t2 = adaptive(a, a+0.5*h, f, \&err1);
\quad t2 += adaptive(a+0.5*h, b, f, \&err2); return(t2);
\quad {\char125}
{\char125}
/* End of File */
} \medskip \noindent The output of the program consists of the endpoints, the error estimate (maximum), the computed answer and the theoretical answer. Here, we know that $\int_0^{\pi/2} \sin(x)\,dx = 1$. The quadrature should return an answer close to 1; we see that it does.
\medskip {\tt\obeylines
left endpoint \ = 0.0000000000000000
right endpoint = 1.5707963267948970
error estimate = 0.0000000100000000
Integral \ \ \ \ \ \ = 1.00000000025973520
Theoretical \ \ \ = 1.0000000000000000
} % \medskip \noindent While the adaptive quadrature routine (AQR) is impressive, it often fails to evaluate improper integrals. If the integrand of an improper integral experiences an anomaly (``blows up'') at a point, it is necessary to compute successive approximations for values approaching that point. There is no ``quick and easy'' solution to improper integrals. One cardinal rule is to graph the function first. It is frequently possible to determine the correct approach from a graph when so-called adaptive methods fail. \vfill\eject %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%% %%%%% %%%% Chapter 5---Metafont %%%%% %%%% %%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \headline={\tenrm\hfill METAFONT} \centerline{\bf Chapter 5} \bigskip \noindent{\bf\llap{5.0\quad}Introduction.} The creation of fonts is usually something not approached lightly. With the typesetting software of \TeX\ this is especially true. The problem of merging text with graphics has been resolved in the earlier chapters using the macros of \PiCTeX, with varying degrees of success. Now it is necessary to investigate the more sophisticated means of doing graphics. \PiCTeX\ can be compared to assembly language; {\bf METAFONT} to machine code. \PiCTeX\ is straightforward and easy to use; however, it uses a great deal of random access memory (RAM) and it is slow. {\bf METAFONT}, on the other hand, creates a font which is read in directly. It uses little memory and is fast. There is a price which must be paid for using {\bf METAFONT}---it has to be prepared in its own language and pre-processed through its own processor. \medskip \noindent The public domain version of {\bf METAFONT} does not include a user-friendly view program. It is necessary to produce files and then process them through a \TeX\ program named {\tt testfont.tex}. The program {\tt testfont.tex} is well-written and clever. It produces excellent output; however, one must have a commercial previewer to use it on-line. Otherwise, the output must be printed on a printer. The public domain previewer, {\tt cdvi12.arc}, only employs the sixteen basic \TeX\ fonts; {\bf METAFONT} actually produces {\sl new\/} fonts---that is, the graphics from {\bf METAFONT} are fonts. Each graph is actually a single letter or blocks of letters ``tiled'' together. Donald E. Knuth is correct in saying, ``Geometric designs are rather easy; $\dots$.'' {\bf METAFONT} does allow the user a great deal of freedom in constructing graphs. There are pitfalls, however, and some experimentation is needed to construct the desired output. \medskip \noindent In {\bf METAFONT}, the subscripted variables $x_1$ and $y_1$ may be written either as $x[1]$, $y[1]$ or as $x1$, $y1$; likewise, $x_2$, $y_2$ may be written either as $x[2]$, $y[2]$ or $x2$, $y2$, and so on. The letter $z$ stands for the ordered pair $(x,y)$ (just as is the case in complex analysis).
Thus $z_1$ is equivalent to $(x_1, y_1)$, $z_2$ is equivalent to $(x_2, y_2)$, etc. When we write $z3$ we mean $(x3, y3)$ or $(x[3], y[3])$. This shorthand aids in construction. For those who recall FORTRAN (the FORmula TRANslator of the 1950's), the symbol {\tt **} meant exponentiation. $x${\tt **}$y$ meant $x^y$. This was the first encounter many of us had with a ``two character'' operator. (The binary operators $+$, $-$, $*$ (times), and $/$ were well understood.) In BASIC (Beginner's All-purpose Symbolic Instruction Code), FORTRAN's {\tt **} has been replaced by the single character operator {\tt \char94}. {\bf METAFONT} has re-introduced the two character operator. This time $z1${\tt..}$z2$ means draw a (curved) line from $z1$ ($z[1]$) to $z2$ ($z[2]$) and $z1--z2$ means draw a straight line from $z1$ to $z2$. The symbol ``$:=$'' means assignment, while ``$=$'' states an equation that {\bf METAFONT} is to satisfy. Variables followed by the hash mark (American pound sign) ({\tt\#}) are called ``sharped'' and have absolute values (values that are resolution-independent) as opposed to other variables whose values are derived and determined programmatically. \vfill\eject \noindent{\bf\llap{5.1\quad}A Metafont File.} We will begin by listing a simple {\bf METAFONT} file that will produce a small Gaussian distribution. \medskip \noindent {\tt\obeylines\parindent=0pt\ttraggedright
\% The statement "mode\_setup" adapts METAFONT to the current task.
mode\_setup;
\ em\#:=30pt\#; cap\#:=10pt\#;
\% The statement "define\_pixels( , )" converts the values
\% \quad of its (two) arguments into pixel (picture element) units.
define\_pixels(cap, cap);
\% The "beginchar" operation assigns values to the variables
\% \quad w, h, and d, which represent the width, height,
\% \quad and depth of the current character's bounding rectangle.
beginchar("A",em\#,em\#,0); "A for Gaussian Distribution";
\ pickup pencircle scaled 0.4pt; \quad \% Use a pen of diameter 0.4 point.
\ z1=(0,0.39894b);
\ z2=(0.1a,0.39695b);
\ z3=(0.2a,0.39104b);
\ z4=(0.3a,0.38138b);
\ z5=(0.4a,0.36827b);
\ z6=(0.5a,0.35206b);
\ z7=(0.6a,0.33322b);
\ z8=(0.7a,0.31225b);
\ z9=(0.8a,0.28969b);
\ z10=(0.9a,0.26608b);
\ z11=(a,0.24197b);
\ z12=(1.2a,0.19418b);
\ z13=(1.4a,0.14972b);
\ z14=(1.6a,0.11092b);
\ z15=(1.8a,0.07895b);
\ z16=(2a,0.05399b);
\ z17=(2.5a,0.01752b);
\ z18=(3a,0.00443b);
\ z19=(3.5a,0.00087b);
\ z20=(4a,0.0001338b);
\ a=w; b=8h;
\ draw z1..z2..z3..z4..z5..z6..z7..z8..z9..z10
\qquad ..z11..z12..z13..z14..z15..z16..z17..z18..z19..z20;
\ for k=2 step 1 until 20: \ z[20+k]=(-x[k],y[k]); endfor
\ draw z1..z22..z23..z24..z25..z26..z27..z28..z29..z30
\qquad ..z31..z32..z33..z34..z35..z36..z37..z38..z39..z40;
\ draw (0,-0.02b)--(0,0);
\ draw (1a,-0.02b)--(1a,0);
\ draw (2a,-0.02b)--(2a,0);
\ draw (3a,-0.02b)--(3a,0);
\ draw (4a,-0.02b)--(4a,0);
\ draw (-1a,-0.02b)--(-1a,0);
\ draw (-2a,-0.02b)--(-2a,0);
\ draw (-3a,-0.02b)--(-3a,0);
\ draw (-4a,-0.02b)--(-4a,0);
\ draw (-4a,0)--(4a,0);
\ draw (1.5a,0)--(1.5a,0.12952b);
\ endchar; \quad \% Write character to output file.
} \medskip\noindent It is not possible to display the above graph using the public domain view program {\bf CDVI12.arc}. {\tt CDVI-2.com} and the other previewers use only the sixteen basic \TeX\ fonts. Some discipline is needed to use public domain software; it is not as flexible as the commercially procured products. However, there is much to be said for the persons and organizations that devote their time and energy to the creation of public domain instruments.
Without such volunteerism, many scholars and scientists would simply be unable to afford needed tools. Certain activities are not able to budget for commercial software; governmental agencies are only able to procure software within the strict limits of their charters---and then only with ADPE (Automatic Data Processing Equipment) approval from their MIS (Management Information System) directors. Public domain materials are an invaluable tool for research and development. \medskip\noindent Since a public domain \TeX\ previewer is to be employed and only sixteen fonts are allowed, one cannot view a newly created font. The graphs produced by {\bf METAFONT} are, essentially, new fonts. In particular, each graph is one letter in a font. Some display of the {\bf METAFONT} graph is needed; therefore, a \PiCTeX\ picture is provided to give the reader an idea of the construction. %\vbox{\narrower\narrower\narrower\noindent{\bf 25.} The main point here %is that you can't have \PiCTeX\ draw a curve by simply giving it the formula %for that curve --- you have to supply explicit coordinate points. Here %is the code that produced the figure.} % \bigskip \centerline{% \beginpicture % \setcoordinatesystem units <.5in,2.5in> % \setplotarea x from -3 to 3, y from 0 to .4 % \plotheading {\lines {% The density $\varphi(\zeta) = e^{-\zeta^2\!/2}/\sqrt{2\pi}$ of the\cr % standard normal distribution.\cr}} % \axis bottom ticks numbered from -3 to 3 by 1 % length <0pt> withvalues $\zeta$ / at 1.5 / / % \linethickness=.25pt % \putrule from 1.5 0 to 1.5 .12952 % (.12952 = density at 1.5) \setbox0 = \hbox{$\swarrow$}% \put {$\swarrow$ \raise6pt\hbox{$\varphi(\zeta)$}} % [bl] at 1.5 .12952 % \ifexpressmode \put {\bf EXPRESSMODE} at 0 .2 % \else \setquadratic \plot 0.0 .39894 0.16667 .39344 0.33333 .37738 0.5 .35207 0.66667 .31945 0.83333 .28191 1. .24197 1.25 .18265 1.5 .12952 1.75 .08628 2. .05399 2.25 .03174 2.5 .01753 2.75 .00909 3.0 .00443 / \setquadratic \plot 0.0 .39894 -0.16667 .39344 -0.33333 .37738 -0.5 .35207 -0.66667 .31945 -0.83333 .28191 -1. .24197 -1.25 .18265 -1.5 .12952 -1.75 .08628 -2. .05399 -2.25 .03174 -2.5 .01753 -2.75 .00909 -3.0 .00443 / \fi \endpicture } % \smallskip \centerline{\rm F{\sevenrm IGURE} 35} % % \vfill\eject % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%% %%%%% %%%% Chapter 6---An Important Improper Integral %%%%% %%%% %%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \def\Res{\mathop{\rm Res}} % Residue defined for use in integral. \headline={\tenrm\hfill An Important Improper Integral} \centerline{\bf Chapter 6} \bigskip \noindent{\bf\llap{6.0\quad}Introduction.} An integral can be interpreted geometrically as the area under a curve. More specifically, it is the area bounded by the $x$-axis, the lines $x=\alpha$ and $x=\beta$, and the curve. Several things might go ``wrong'' with this idea: (1) the function might not be continuous; (2) either $\alpha=-\infty$ or $\beta=+\infty$, or both; or (3) the function to be integrated, known as the {\sl integrand}, might ``blow up'' (the function might increase without bound, behaving as $1/x$ near $0$).
In the first case, if the function is composed of several continuous pieces and there is only a finite collection of ``jump'' discontinuities, that is, if the function is piece-wise continuous, then our usual methods apply. For pathological cases of discontinuous functions, there is an entire discipline of mathematics, Real Analysis. The principal thrust of Real Analysis is to answer the question: ``under what conditions does $\int_\alpha^\beta f(x)\,dx$ make sense?'' Aside from some graphs and geometric persuasions, we won't concern ourselves with this matter. This chapter is concerned with the concept of the improper integral. One improper integral of particular interest has as an integrand the function $$f(x) = {1\over\sqrt{2\pi}}\,e^{-x^2/2}.\eqno(6.1)$$ At first glance, this would seem to be the ideal integral to focus on. However, this integral has been tabulated and is available in standard tables books. Another important integral, which will be the motivation for this chapter, is $$\int\limits_{\quad0}^{\qquad\infty}\, {\ln x\,dx \over x^2+a^2}. \eqno(6.2)$$ This integral is important in physics, engineering, and applied mathematics. Its evaluation is found in some tables books; however, it is not treated as a classical example in most texts. For this reason, an evaluation and an approximation will be done. The computed value for this {\sl definite\/} integral can be found in Schaum's Outline {\it Complex Variables}.\footnote{\dag}{Spiegel, Murray R., {\it Schaum's Outline Series Theory and Problems of Complex Variables with an application to conformal mapping and its applications}, (NY: McGraw-Hill, Inc., 1964), page 197.} \smallskip\noindent 84. Prove that $$\int_0^\infty {\ln x\,dx\over x^2+a^2}\ =\ {\pi\ln a\over 2a}.$$ \medskip \hrule \medskip $$\beginpicture \setcoordinatesystem units <1in,1in> \setplotarea x from 0 to 3.0, y from 0 to 1.95 \plotheading {An Integrable, Discontinuous Function} % \axis bottom label {F{\sevenrm IGURE} 36} ticks withvalues $\alpha$ $\beta$ / at .25 2.75 / / % \axis left ticks withvalues $y_{\rm min}$ $y_{k-1}$ $y_k$ {\raise4pt\hbox{$y_{k+1}$}} $y_{\rm max}$ / at .5 1.4 1.5 1.6 1.9 / / % \setshadegrid span <.02in> % \vshade 2.0 0.0 1.5 2.1 0.0 1.5 / % \vshade 1.25 0.9 1.0 1.5 0.9 0.9 / % \setdashes <2.5pt> \putrule from 2.75 0 to 2.75 1.9 \putrule from 0 1.4 to 2.75 1.4 \putrule from 0 1.5 to 2.75 1.5 \putrule from 0 1.6 to 2.75 1.6 \putrule from 0 1.9 to 2.75 1.9 \putrule from 0 0.5 to 2.75 0.5 \putrule from 2.0 0 to 2.0 1.5 \putrule from 2.1 0 to 2.1 1.5 \putrule from 0.25 0 to 0.25 1.9 \put {$x$} at 3.15 0 \put {$y$} at 0.1 2.0 \setsolid \setquadratic \plot 0.75 1.0 1.0 1.9 1.25 1.0 / % \plot 0.25 1.0 0.5 0.5 0.75 1.0 / % \setlinear \plot 1.5 0.9 2.3 1.7 / % \putrule from 2.3 1.7 to 2.6 1.7 \putrule from 2.6 1.8 to 2.75 1.8 \endpicture$$ \vfill\eject \noindent{\bf\llap{6.1\quad}Theoretical Evaluation.} $$\beginpicture \setcoordinatesystem units <1in,1in> \setplotarea x from -2 to 2, y from -.5 to 2 \axis bottom shiftedto y=0.0 ticks length <0pt> % withvalues $-R$ $\rightarrow$ $-\epsilon$ $\epsilon$ % $\rightarrow$ $R$ $x$ / at -1.5 -.75 -.25 .25 .75 1.5 2.0 / / % \axis left shiftedto x=0.0 / \linethickness=1pt \ifexpressmode \put {\bf EXPRESSMODE} at 1 1 % \else \circulararc 180 degrees from 1.5 0 center at 0 0 % \circulararc 180 degrees from .25 0 center at 0 0 % \fi \putrule from -1.5 0 to -.25 0 % \putrule from .25 0 to 1.5 0 % \put {$y$} [rb] at 0.1 2.0 % \put {Contour $\Gamma$} [lB] at 0.25 -0.50 \put {F{\sevenrm IGURE} 37} [rB] at
-0.25 -0.50 \put {$\nwarrow$} at 1.25 1.25 \put {$\swarrow$} at -1.25 1.25 \endpicture$$ Evaluate the (improper) integral $$\int_0^\infty {\ln x\,dx\over x^2+a^2}\eqno(6.3)$$ using the methods of complex analysis. This is the preferred means of evaluation of such integrals by physicists and engineers. We will consider the above contour, $\Gamma$. By a contour, we mean a simple closed curve, in this case $\Gamma$, which is traversed in the positive sense. If someone walked around this curve, he would not cross his own path (the curve is simple); he would eventually return to his starting point (the curve is closed); and, he would notice that the area contained within the curve would always be to his left (traversing in the positive sense). The curve is also made up of pieces, each of which is well-behaved in some mathematical sense. Let $C_1$ denote the outer semi-circle (of radius $R$) and let $C_2$ denote the inner semi-circle (of radius $\epsilon$). Then $\Gamma$ is composed of four parts: (1) the interval along the $x$-axis from $-R$ to $-\epsilon$, which is denoted as $[-R,-\epsilon]$; (2) the semi-circle $C_2$; (3) the interval $[\epsilon,R]$; and, (4) the semi-circle $C_1$. We will use the so-called calculus of residues to compute $$\int_{\Gamma}{\ln z\,dz\over z^2+a^2}\ =\ 2\pi i \Res_{z=ia}\left[{\ln(z)\over z^2+a^2}\right] \ =\ {i\pi^2\over2a}\ +\ {\pi\cdot\ln a\over a}.\eqno(6.4)$$ $$\Res_{z=ia}\left[{\ln(z)\over z^2+a^2}\right]\ =\ \lim_{z\to ia} (z-ia){\ln(z)\over(z-ia)(z+ia)}\ =\ {\ln(ia)\over2ia} \ =\ {\ln(i)\over2ia}+{\ln(a)\over2ia}\ =\ {\pi\over4a}+{\ln a\over2ia}.$$ $$\int_{\Gamma}{\ln z\,dz\over z^2+a^2} \ =\ \int_{-R}^{-\epsilon}{\ln z\,dz\over z^2+a^2} \ +\ \int_{C_2}{\ln z\,dz\over z^2+a^2} \ +\ \int_{\epsilon}^{R}{\ln x\,dx\over x^2+a^2} \ +\ \int_{C_1}{\ln z\,dz\over z^2+a^2}.\eqno(6.5)$$ $$\int_{-R}^{-\epsilon}{\ln z\,dz\over z^2+a^2} \ =\ \int_{-R}^{-\epsilon}{\ln|z|+i\arg(z)\over z^2+a^2}\,dz \ =\ \int_{\epsilon}^{R}{\ln x\,dx\over x^2+a^2} \ +\ i\pi\,\int_{\epsilon}^{R}{dx\over x^2+a^2},\eqno(6.6)$$ where $\arg z = \theta$. We apply the well-known formula\footnote{$^*$}{Handbook of Mathematical, Scientific, and Engineering, {\it etc.}, (Piscataway, NJ: REA, 1980), page 323} % $$\eqalignno{\int {dx\over x^2+a^2} \ &=\ {1\over a}\,\arctan\left({x\over a}\right)&(6.7)\cr \lim_{\epsilon\to0}\,\lim_{R\to\infty}\,\int_\epsilon^R {dx\over x^2+a^2}\ &=\ \lim_{R\to\infty}{1\over a}\arctan % \left({R\over a}\right)\ -\ \lim_{\epsilon\to0}{1\over a}\arctan % \left({\epsilon\over a}\right)\cr &=\ {1\over a}\,\arctan\left(\lim_{R\to\infty}{R\over a}\right) \ =\ {\pi\over2a}&(6.8)\cr}$$ \bigskip \noindent Claim $$\lim_{\epsilon\to0}\,\int_{C_2} {\ln z\,dz\over z^2+a^2}\ =\ 0 \qquad\hbox{and}\qquad \lim_{R\to\infty}\,\int_{C_1}{\ln z\,dz\over % z^2+a^2}\ =\ 0.\eqno(6.9)$$ This is an interesting problem because one inequality will suffice for both integral estimates: $$\left|\ln z\right|\ =\ \big|\ln\left|z\right| + i\arg z\big| \ =\ \sqrt{(\ln\rho)^2 + \theta^2},\eqno(6.10)$$ where $z=\rho e^{i\theta}$ and $\theta=\arg z$. The variable $\theta$ lies in the interval $[0,\pi]$, that is $0\le\theta\le\pi$. When $\rho=R$, $z=Re^{i\theta}$, we choose $R>e^\pi$ (so that $\ln R>\pi$), which gives $$\big|\ln z\big|\ =\ \sqrt{(\ln R)^2 + \theta^2} \le \sqrt{(\ln R)^2+\pi^2} \le \sqrt{2}\,\ln R.\eqno(6.11)$$ We also require that $R>|a|$ so that $$0 < {1\over\big|R^2e^{2i\theta}+a^2\big|} \le {1\over R^2-|a|^2},\eqno(6.12)$$ which follows from the inequality $\big|R^2-|a|^2\big| \le \big|R^2e^{2i\theta} + a^2\big|$.
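The last inequality above is the reverse triangle inequality applied on $C_1$, where $z=Re^{i\theta}$: $$\big|R^2e^{2i\theta} + a^2\big|\ \ge\ \big|R^2e^{2i\theta}\big| - \big|a^2\big|\ =\ R^2 - |a|^2\ >\ 0 \qquad\hbox{whenever}\quad R>|a|.$$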
The modulus (absolute value) of the contour integral $$\int_0^\pi {\ln\big(Re^{i\theta}\big)\,iRe^{i\theta}\,d\theta\over R^2e^{2i\theta} + a^2} \eqno(6.13)$$ can thus be bounded $$\eqalign{ \left|\int_{C_1} {\ln z\,dz\over z^2+a^2}\right| \ &=\ \left|\int_0^\pi {\ln(z)\,iRe^{i\theta}\,d\theta \over R^2e^{2i\theta}+a^2}\right|\cr &\le\ \int_0^\pi {|\ln(z)|\cdot R\,d\theta\over R^2-|a|^2}\cr &\le\ \int_0^\pi {\sqrt{2}\ln(R)\cdot R\,d\theta \over R^2 -|a|^2}\cr &=\ {\pi\sqrt{2}\,\ln(R)\cdot R\over R^2-|a|^2}.\cr}\eqno(6.14)$$ And, taking the limit $$\lim_{R\to\infty} {\pi\sqrt{2}\,\ln(R)\cdot R\over R^2-|a|^2} \ =\ 0.\eqno(6.15)$$ In the second contour, $$\eqalign{\left|\int_{C_2} {\ln z\,dz\over z^2+a^2}\,\right| \ &=\ \left|\int_0^\pi {\ln(\epsilon e^{i\theta})\cdot \epsilon e^{i\theta}\,d\theta \over \big(\epsilon^2e^{2i\theta}+a^2\big)}\right| \ \le\ \int_0^\pi {\sqrt{2}\,|\ln(1/\epsilon)|\cdot \epsilon\,d\theta \over \big(|a|^2 - \epsilon^2\big)}\cr &=\ {\pi\sqrt{2}\cdot\ln(1/\epsilon)\cdot\epsilon \over \big(|a|^2 - \epsilon^2\big)},\cr}\eqno(6.16)$$ by choosing $1 > \epsilon > 0$ so that $|a| > \epsilon$ and $\ln\epsilon < -\pi$. This follows from $$z=\epsilon e^{i\theta},\qquad \big|\ln z\big| = \sqrt{|\ln\epsilon|^2+\theta^2}, \quad \hbox{for}\quad 0\le\theta\le\pi$$ and $$\big|\ln z\,\big| = \sqrt{|\ln\epsilon|^2+\theta^2} \le \sqrt{|\ln\epsilon|^2+\pi^2} \le \sqrt{|\ln\epsilon|^2+|\ln\epsilon|^2} = \sqrt{2}\,\ln(1/\epsilon).\eqno(6.17)$$ And, in the limit as $\epsilon\to0$, $$\lim_{\epsilon\to0}{\pi\sqrt{2}\cdot\ln(1/\epsilon)\cdot\epsilon \over\big(|a|^2-\epsilon^2\big)}\ =\ 0\quad\equiv\quad \lim_{\delta\to0}{\ln(1/\delta)\over1/\delta}\ =\ 0.\eqno(6.18)$$ Collecting the pieces: letting $\epsilon\to0$ and $R\to\infty$ in equation (6.5), the two semi-circular contributions vanish by (6.9), and equations (6.6) and (6.8) give $$2\,\int_0^\infty {\ln x\,dx\over x^2+a^2}\ +\ i\pi\cdot{\pi\over2a} \ =\ {i\pi^2\over2a}\ +\ {\pi\ln a\over a},$$ so that $$\int_0^\infty {\ln x\,dx\over x^2+a^2}\ =\ {\pi\ln a\over 2a},$$ which is the result stated in Problem 84. \vfill\eject \noindent{\bf\llap{6.2\quad}Numerical Evaluation.} We will apply a program using the adaptive quadrature routine.\footnote{\dag} {Mike Stimpson, ``Numerical Integration Using Adaptive Quadrature,'' {\it The C Users Journal}, May 1992, pages 31--36.} For the value of $a$ in equation (6.3), let $a=2$ so that the theoretical solution will be $\pi\ln(a)/(2a)$ $=$ $\arctan(1)\cdot\ln(2)$. The output from a computer program is \medskip {\tt\obeylines left endpoint \ = 0.000000500000000 right endpoint = 10000.00000000000000000 error estimate = 0.0000000010000000 Integral \ \ \ \ \ \ = 0.5442733355868506 Theoretical \ \ \ = 0.5443965225759005 Program returned \ (0). \ Press any key } \medskip \noindent We must employ ``C'' rather than BASIC because the routine is recursive, and GW-BASIC does not support recursion.
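Before reading the listing, it may help to have the underlying scheme in front of us; what follows is only a thumbnail sketch of standard adaptive Simpson quadrature, which is the scheme the routine below follows. On an interval $[a,b]$ with midpoint $m=(a+b)/2$, Simpson's rule gives $$S(a,b)\ =\ {b-a\over6}\,\big[\,f(a)+4f(m)+f(b)\,\big],$$ and the two-panel (composite) value is $S(a,m)+S(m,b)$. The quantity $$E\ =\ {\big|S(a,b)-S(a,m)-S(m,b)\big|\over15}$$ estimates the error in the two-panel value; the divisor $15$ reflects the fact that halving the step length reduces Simpson's-rule error by a factor of about sixteen. If $E$ is smaller than the requested tolerance, the two-panel value is accepted; otherwise the interval is halved, half the tolerance is assigned to each half, and the routine calls itself on each half.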
The program listing is as follows: \smallskip {\tt\parindent=0pt\obeylines \#include <stdio.h> \#include <math.h> \#include <stdlib.h> double adaptive (double a, double b, double (*f)(double x), double *err); double f(double x); int main() {\char123} \ double a,b,err,int1; \ a = (double) 0.0000005; b = (double) 100000.0; \ err = (double) 0.00000001; \ printf("{\char92}nleft endpoint \ = \%.16lf",a); \ printf("{\char92}nright endpoint = \%.16lf",b); \ printf("{\char92}nerror estimate = \%.16lf",err); \ int1 = adaptive( a, b, f, \&err); \ printf("{\char92}nIntegral \ \qquad \ = \%.16lf",int1); \ printf("{\char92}nTheoretical \quad\ = \%.16lf",atan(1.0)*log(2.0)); \ return(0); {\char125} double f(double x) {\char123} \quad double y; \quad y = log(x)/(x*x+4.0); \quad return (y); {\char125} double adaptive (double a, double b, double (*f)(double x), double *err) {\char123} \ double h, s1, s2, s3, s4, s5, error, err1, err2, t1, t2; \ h = b - a; \ s1 = (*f)(a); s2 = (*f)(a+0.25*h); s3 = (*f)(a+0.5*h); \ s4 = (*f)(b-0.25*h); s5 = (*f)(b); \ t1 = h*(s1+4.0*s3+s5)/6.0; \ t2 = h*(s1+4.0*s2+2.0*s3+4.0*s4+s5)/12.0; \ error = fabs(t1-t2)/15.0; \ if (error < *err) {\char123} *err = error; return(t2); {\char125} \ else {\char123} \quad err1 = err2 = *err/2.0; t2 = adaptive(a, a+0.5*h, f, \&err1); \quad t2 += adaptive(a+0.5*h, b, f, \&err2); return(t2); {\char125} {\char125} /* End of File */ } \vfill\eject %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%% %%%%% %%%% APPENDICES %%%%% %%%% %%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \headline{\tenrm\hfill Mathematical Preliminaries} \centerline{\bf Appendix A} \bigskip \noindent Sometimes the most advanced concepts in mathematics are most quickly grasped by those with the least formal training. This may be because the tyro has less conditioning and has formed fewer bad habits. This is particularly true as regards variables with subscripts and superscripts. The algebra student who first encounters subscripted and superscripted variables is drilled on the idea that the superscript denotes successive multiplication, that is $$x^n = \underbrace {x\cdot x\cdots x}_{n\ \hbox{\sevenrm times}}. \eqno{\rm(A.1)}$$ This follows from the explanation that $x^2 = x\cdot x$, $x^3 = x\cdot x \cdot x$, and so on. The idea that $x^i$, for $i$ an {\it index\/} and not an exponent, appears later (in tensor analysis). Then there's the subscript. If we have a collection of data, where $x$ stands for a single data element, then it is customary to denote the data by subscripting the symbol $x$. So, the set of data is formed by collecting together $x_1$, $x_2$, $\dots$. There is a need to group this set, so braces ({\tt \char123 \char125}) are used. And, for appearance's sake, the braces are typeset slightly larger than the symbols, ($\{ \}$). The variables and other mathematical entities are usually set in {\it italics\/} and the vectors in {\bf bold face}. Functions, such as $\sin$ and $\cos$, are usually typeset in the standard roman font. All these things are recognized by the student and accepted, usually without needing to be verbalized.
As a student progresses from algebra through calculus, she/he develops more and more of a mind-set towards the symbols and their appearance. Then there is the rude awakening of tensor analysis. Here superscript means coordinate and not power and there is a special system for arranging the symbols. The beginning algebra student would have no problem whatsoever with this arrangement; however, the math major suffers no end of frustration. Those familiar with chemistry and physics are aware of the conventions associated with isotopes. There are not only subscripts and superscripts following the element symbol, but preceeding it as well. Something like ${}^2_1{\rm H}^{+}$ means the hydrogen (deuterium) ion in chemistry. The idea that elements can be arranged according to their atomic number via subscripts preceeding the element symbol is easy for the beginner. Later, learning such conventions becomes harder and harder. $$ {}_1{\rm H},\,{}_2{\rm He},\,{}_3{\rm Li},\,\hbox{\it etc.} \qquad {}^1_1{\rm H},\,{}^2_1{\rm H},\,{}^3_1{\rm H},\, {}^3_2{\rm He},\,{}^4_2{\rm He},\,\dots.\eqno{\rm(A.2)}$$ Mathematicians and theoretical physicists sometimes align the the subscripts and superscripts in tensor notation to indicate a priority. The symbol here is $S$ and it's contravariant parts are indexed $j,k$ while its covariant parts are indexed $i,,,\ell$. $$S_i{}^{jk}{}_\ell.$$ This is very different from $$S_i{}^j{}_{k\ell}.$$ \bigskip \noindent Enough on the topic of subscripts and superscripts. Now let's consider a more difficult situation. Small children understand so well that a number such as 0.9999999999 is close to 1---but it is not exactly one. Only with the introduction of the ellipsis ($\dots$) comes true confusion and misunderstanding. And just what is this ellipsis? An ellipsis is a useful notation for something, clearly understood, that is left out. If we wish to write the alphabet, for example, we all know the letters $$\hbox{\tt ABCDEFGHIJKLMNOPQRSTUVWXYZ}.$$ So, we usually just write {\tt ABC$\ldots$XYZ} or even {\tt ABC$\ldots$Z}. And what permits us to do this? Well, as long as we know exactly what the omitted objects are, there is no problem. Let's dwell on this matter for a while. If someone writers {\tt ABC$\ldots$Z}, then it's clear that the first letter omitted in the ellipsis is {\tt D} and the last letter omitted is {\tt Y}. So, we have the situation that {\tt ABC$\dots$Z} is exactly the same as {\tt ABCD$\ldots$YZ}. But this is a giant step. The next big jump in using an ellipsis is leaving out the closing element. Consider {\tt ABC$\dots$}. Now {\tt ABC$\dots$} means {\tt ABCD$\dots$}, and so on. But there is no terminal letter present, no {\tt Z}. Maybe the last letter in the ellipsis is {\tt Z}, maybe not. There is uncertainty. So we feel uncertain about {\tt ABC$\dots$} but not about {\tt ABC$\ldots$XYZ}. And this uncertainty carries over to such things as $x_1, x_2, \dots$ as opposed to $x_1, x_2, \ldots, x_{12}, x_{13}$. Now let's apply all of this to numbers. Consider $0.4999\dots$. I claim that this number is exactly the same as $0.5$ and $0.5000\dots$. Maybe it's not correct to say {\sl exactly the same}, maybe it's more correct to say ``equivalent to.'' $$\eqalign{x\ &=\ 0.4999\dots\cr 10x\ &=\ 4.9999\dots\cr 10x-x\ &=\ 4.5000\dots\cr 9x\ &=\ 4.5\dots\cr x\ &=\ {4.5\over9}\ =\ 0.5\cr}\eqno{\rm(A.3)}$$ Is there any difference between $0.5000\dots$ and $0.5$? Clearly, there is a difference in the way the two are {\sl presented}. 
But, aside from that, is there any real difference? Anyone would say, ``no.'' In fact, scientific calculators all substitute trailing blanks for trailing zeros for ``exact calculations.'' So, it's easy to understand that $0.5000\dots$ and $0.5$ are exactly the same---it is not at all transparent that $0.5$ and $0.4999\dots$ are the same. There seems to be something else going on here. And that's just what we're going to explore. But first, there is never any problem understanding that $0.5$, $.5$, $+0.5$, and $+.5$ represent exactly the same number, $1\over2$. The leading plus sign ($+$) is always understood, in fact it looks funny when it is present---like a printout from some old manual calculator. There is a problem with understanding why $0.5$ and $0.4999\dots$ are equivalent. It is deep, subtle, and elusive. The mathematician has a name for this phenomenon---{\it limit}. This is some finality in that word also, as in the expression ``that's the limit.'' It's easy to see that $0.5$ can also be written $1\over2$. What about $0.4999\dots$? This will prove to be the key: $$\eqalign{0.49\ &=\ {1\over2} - {1\over100}\cr 0.499\ &=\ {1\over2} - {1\over1000}\cr 0.4999\ &=\ {1\over2} - {1\over10000}\cr \vdots\ &=\ \vdots\cr 0.4999\dots\ &=\ {1\over2} -\lim_{n\to\infty} {1\over10^n}\cr} \eqno{\rm(A.4)}$$ This all looks very complicated, especially the gaggle of symbols $$\lim_{n\to\infty} {1\over10^n}.\eqno{\rm(A.5)}$$ Here is that funny symbol ``$\infty$'' which is known as the infinity sign, or lemniscate, or ``lazy-eight'' in Texas-talk. This lemniscate is also seen in the occult hovering over the head of the magician. And what is one to make of the arrow in the expression $n\to\infty$? Well, if it is true that $0.4999\dots$ $\equiv$ $0.5$, then that cluster of symbols better be equal to zero. It turns out that it is: $$\lim_{n\to\infty} {1\over10^n}\ =\ 0.\eqno{\rm(A.6)}$$ This looks too deep for mathematical preliminaries. The point is that the innocuous, harmless equivalence $0.4999\dots$ $\equiv$ $1\over2$ is more than an oddity. It is a model, or paradigm, of a much deeper mathematical situation. And this reasoning is at the very foundation of Simpson's Rule and Newton's Method. As soon as the ellipsis is removed, the answer is no longer exact, it is an approximation. Only in the world of theory does the ellipsis make sense. What has happened is that mathematicians have taken a literary device and extended it's application into another realm. And it all looked so simple with {\tt ABC\dots}! \medskip \noindent It's easy to convince engineers that $1\over3$ is $0.333\dots$ and $2\over3$ is $0.666\dots$. However, it's no so easy to convince them that $1\equiv0.999\dots$. Even so, consider $$\eqalign{ {1\over3}\ &=\ 0.333\dots\cr {2\over3}\ &=\ 0.666\dots\cr 1\ =\ {1\over3}+{2\over3}\ &=\ 0.333\dots\ +\ 0.666\dots \ =\ 0.999\dots\cr}\eqno{\rm(A.7)}$$ \vfill\eject \headline{\tenrm\hfill References} \centerline{\bf Appendix B} \medskip \centerline{\bf References} \bigskip {\narrower\noindent \llap{[1]\quad}Eisberg, Robert and Robert Resnick, {\it Quantum Physics of Atoms, Molecules, Solids, Nuclei, and Particles, Second Edition}. NY: John-Wiley \& Sons, 1985. \medskip\noindent \llap{[2]\quad}{\it Handbook of Mathematical, Scientific, and Engineering Formulas, Tables, Functions, Graphs, Transforms}. Piscataway, NJ: Research \& Education Assoc., 1972. \medskip\noindent \llap{[3]\quad}James, Glenn and Robert C. James, {\it Mathematics Dictionary---Multilingual Edition}. Princeton, NJ: D. 
Van Nostrand Company, Inc., 1959. \medskip\noindent \llap{[4]\quad}Knuth, Donald E., {\it The \TeX book}. Reading, Massachusetts: Addison-Wesley Publishing Company, 1970. \medskip\noindent \llap{[5]\quad}Kreyszig, Erwin, {\it Advanced Engineering Mathematics, 5th Edition}. NY: John Wiley \& Sons, 1983 \medskip\noindent \llap{[6]\quad}{\it Numerical Analysis, 2nd Edition}. Schaum's Outline Series, edited by Francis Scheid. NY: McGraw-Hill Book Company, 1968. \medskip\noindent \llap{[7]\quad}Olmsted, John M. H., {\it Advanced Calculus}. NY: Appleton-Century-Crofts, Inc., 1961. \medskip\noindent \llap{[8]\quad}Stoer, J. and R. Bulirsch, {\it Introduction to Numerical Analysis}. NY: Springer-Verlag, 1980. \medskip\noindent \llap{[9]\quad}Taylor, John R., {\it An Introduction to Error Analysis---The Study of Uncertainties in Physical Measurements}. Mill Valley, CA: University Science Books, 1982. \medskip\noindent \llap{[10]\quad}Thomas, George B., {\it Calculus and Analytic Geometry, 2nd Edition}. Reading, Massachusetts: Addison-Wesley Publishing Company, Inc., 1966. \medskip\noindent \llap{[11]\quad}{\it Webster's Ninth New Collegiate Dictionary}. Springfield, Massachusetts: Merriam-Webster, Inc., 1984. \medskip } {\noindent\hskip-6pt [12]\quad{Wichura, Michael J., {\it The\/} \PiCTeX\ {\it Manual}, \TeX\-niques, Publication for the \TeX\ Com- \hfill}\break \indent munity, Number 6. Providence, RI: \TeX\ User Group, 1987. } \vfill\eject %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%% %%%%% %%%% Appendix C %%%%% %%%% %%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \headline{\tenrm\hfill A Geometric Persuasion} \centerline{\bf Appendix C} \medskip \centerline{\bf A Geometric Persuasion} \bigskip \rightline{ 6/28/89} \rightline{ H. Watson} \rightline{ Corona, CA} \medskip \noindent Assuming that mass is volume times density, consider first a sphere of radius $r_0$, $$m = \rho V = \rho\, {4\over3}\, \pi r_0^3. \eqno{\rm(C.1)}$$ $$\beginpicture \setcoordinatesystem units <0.5in,0.5in> \setplotarea x from -1.0 to 1.0, y from -1.0 to 1.0 \ifexpressmode \put {\bf EXPRESSMODE} at 0 0 \else \circulararc 360 degrees from 1.0 0.0 center at 0.0 0.0 \arrow <10pt> [.2,.4] from 0 0 to 0.7071 0.7071 \fi \put {$r_0$} [b] at 0.0 0.2 \endpicture $$ For a charged sphere of radius $r_0$, suppose a standing wave exists on its surface. Then, due to energy absorption the wave ``splits'' forming two surfaces: an {\sl inner surface\/} whose circumference is twice the radius, $r_0$, and an {\sl outer\/} whose radius is twice the circumference, $2\pi r_0$. Thus $$R = 2\cdot2\pi r_0 = 4\pi r_0 \eqno{\rm(C.2)}$$ and $$r = {2 r_0 \over 2 \pi} = {r_0 \over \pi}. 
\eqno{\rm(C.3)}$$ $$\beginpicture \setcoordinatesystem units <1in,1in> \setplotarea x from -1.0 to 1.0, y from -1.0 to 1.0 \ifexpressmode \put {\bf EXPRESSMODE} at 0 0 \else \circulararc 360 degrees from 1.0 0.0 center at 0.0 0.0 \circulararc 360 degrees from 0.2 0.0 center at 0.0 0.0 \arrow <10pt> [.2,.4] from 0 0 to 0.7071 0.7071 \arrow <5pt> [.2,.4] from 0 0 to 0.2 0.0 \setdashes \circulararc 360 degrees from 0.5 0.0 center at 0.0 0.0 \fi \put {$R$} [rb] <-2pt,0pt> at 0.5 0.5 \put {$r$} [t] <0pt,-3pt> at 0.1 0.0 \endpicture $$ The resulting configuration ``collapses'' to a figure whose radius is $R_0 = 4\pi r_0 - {r_0\over\pi}$ and containing a ``cone'' whose base is $4\pi r^2 = 4\pi \big(r_0/\pi\big)^2 = 4r_0^2/\pi$ and height $R_0$. The (area of the) base of the cone equals the surface (area) of the inner sphere: $4\pi r^2$ $=$ $4\pi\big(r_0/\pi\big)^2$ $=$ $4r_0^2/\pi$. The volume of the collapsed figure is, then, $$ V_1 = {4\over3}\pi R_0^3 - {1\over3}bh = {4\over3}\pi\left[ 4\pi r_0 - {r_0\over\pi}\right]^3 - {4\over3}\pi\left[{\big(r_0/\pi\big)}^2\right] \cdot\left[4\pi r_0 - {r_0\over\pi}\right]. \eqno{\rm(C.4)}$$ $$V_1 = {4\over3}\pi\left[{\left(4\pi - {1\over\pi}\right)}^3 - \left(4\pi-{1\over\pi}\right){\left({1\over\pi}\right)}^2\right] \,r_0^3\eqno{\rm(C.5)}$$ Define $V_0 = {4\over3}\pi r_0^3$ and look at the ratio $$\eqalign{{V_1\over V_0}\ &=\ \left(4\pi-{1\over\pi}\right)^3 - \left(4\pi-{1\over\pi}\right)\left({1\over\pi^2}\right)\cr \ &=\ \left(64\pi^3 -48\pi+{12\over\pi}-{1\over\pi^3}\right) - {4\over\pi} + {1\over\pi^3}\cr \ &=\ 64 \pi^3 - 48 \pi + {8\over\pi}.\cr}\eqno{\rm(C.6)}$$ Compute (C.6) numerically: $${V_1\over V_0} = 1836.151739256348 \eqno{\rm(C.7)}$$ which lies between the (two) most recent values for the proton-electron mass ratio: $$1836.15152\ <\ {V_1\over V_0}\ <\ 1836.152701.\eqno{\rm(C.8)}$$ \medskip \noindent $$ 64\pi^3 - 48\pi + {8\over\pi} = \left(4\pi\right)\cdot\left(4\pi-{1\over\pi}\right)\cdot \left(4\pi-{2\over\pi}\right) \eqno{\rm(C.9)}$$ \medskip \hrule \medskip \noindent Looking at $A= 4r_0^2/\pi$ as the surface area of the inner sphere and $S=4\pi R_0^2$ as the surface area of the larger sphere, we can also compute $V_1/V_0$ by assuming that that portion of the solid sphere of radius $R_0$ subtended from the center to the area $A$ does not contribute to the total volume. This is to say that the volume element whose base region {\sl lies on the surface of the sphere\/} has the same volume as a right circular cone with equal base area and the same height. $$ {V_1\over V_0}\ =\ \left(1-{A\over S}\right)\cdot \left({R_0\over r_0}\right)^3.\eqno{\rm(C.10)}$$ Direct numerical calculation will reveal that $$\left(1-{4/\pi\over4\pi(4\pi-1/\pi)^2}\right)\cdot \left(4\pi-{1\over\pi}\right)^3 \ \approx\ 1836.151738,\eqno{\rm(C.11)}$$ or the same result as equation (C.7).
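The reader who wishes to check these numbers can do so with a few lines of ``C'' in the same spirit as the listing of Chapter 6; the short program below is offered only as a sketch (the variable names are arbitrary and no error checking is done). It evaluates (C.6), (C.9), and (C.11) directly. \medskip {\tt\parindent=0pt\obeylines \#include <stdio.h> \#include <math.h> int main() {\char123} \ double pi, v; \ pi = 4.0*atan(1.0); \ /* equation (C.6) */ \ v = 64.0*pi*pi*pi - 48.0*pi + 8.0/pi; \ printf("{\char92}nEquation (C.6) \ = \%.12lf", v); \ /* equation (C.9) */ \ v = (4.0*pi)*(4.0*pi - 1.0/pi)*(4.0*pi - 2.0/pi); \ printf("{\char92}nEquation (C.9) \ = \%.12lf", v); \ /* equation (C.11) */ \ v = (1.0 - (4.0/pi)/(4.0*pi*pow(4.0*pi - 1.0/pi, 2.0))) * pow(4.0*pi - 1.0/pi, 3.0); \ printf("{\char92}nEquation (C.11) = \%.12lf{\char92}n", v); \ return(0); {\char125} } \medskip \noindent Each of the three printed values should agree with equation (C.7) to within rounding in the last digit or two.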
\vfill\eject %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%% %%%%% %%%% Appendix D %%%%% %%%% %%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \headline{\tenrm\hfill A Contour Integral} \centerline{\bf Appendix D} \medskip \centerline{\bf A Contour Integral} \bigskip \rightline{20 Feb 92}\par \vskip6pt\noindent The residue of $f(z)$ at infinity is defined as follows: If $f(z)$ has an isolated singularity at infinity, and if $C$ is a large circle which encloses all the singularities of $f(z)$ except $z=\infty$, then the residue at $z=\infty$ is defined to be $${1\over2\pi i}\,\int_C f(z)\,dz\eqno{\rm (D.1)}$$ taken around $C$ in the negative sense (negative with respect to the origin), provided the integral has a definite value. If we apply the transformation $z=1/\zeta$ to the integral it becomes $${1\over2\pi i}\,\int -f\left({1\over\zeta}\right)\,{d\zeta\over\zeta^2} \eqno{\rm (D.2)}$$ taken positively around a small circle centered at the origin. % $$\beginpicture %\ninepoint \setcoordinatesystem units <1.00in,1.00in> \setplotarea x from -2.25 to 2.25, y from -2.25 to 2.25 % \plotheading {\lines{% The Beta function \cr $\int_0^1 x^\alpha(1-x)^{1-\alpha}\,dx$.\cr}} \put {$\scriptscriptstyle\cdot$} at 0 0 % \put {$\scriptscriptstyle\cdot$} at 1 0 % % \put {$\curvearrowleft$} at -.01 0.02 % % \put {$\curvearrowleft$} at 0.99 0.02 % % \replot "beta.t01" % % \savelinesandcurves on "beta.t01" % \writesavefile {Feb 20, 1992 Homework} \ifexpressmode \put {\bf EXPRESSMODE} at -1 -1 \else \circulararc 360 degrees from 2 0 center at 0 0 % \circulararc 350 degrees from 0.9 -.01 center at 1 0 % \circulararc 350 degrees from 0.1 0.01 center at 0 0 % \plot 1.1 0.0 2 0.0 / % \plot 0.9 0.01 0.1 0.01 / % \plot 0.9 -.01 0.1 -.01 / % \fi \put {$\leftharpoonup$} [b] at 1.5 .015 % \put {$\rightharpoondown$} [t] at 1.5 -.02 % \put {$\leftharpoonup$} [b] at 0.5 .015 % \put {$\rightharpoondown$} [t] at 0.5 -.03 % \put {$\uparrow$} [l] at -2 0 % \put {$\searrow$} at 1.38 1.38 % \put {$\swarrow$} at 1.38 -1.38 % \put {${\bf C}$ $=$ Complex Plane} at 2 2 % \put {$f(z)=z(1/z-1)^{1-\alpha}$} at 0 1 % \put {$\partial D(0,R)$\lower6pt\hbox{$\searrow$}} [rb] at -1.42 1.42 % \endpicture $$ \vskip6pt\noindent $$\int\limits_{\partial D(0,R)} z^\alpha(1-z)^{1-\alpha}\,dz - \int_0^1 x^\alpha(1-x)^{1-\alpha}\,dx + e^{2\pi i\alpha}\, \int_0^1 x^\alpha(1-x)^{1-\alpha}\,dx = 0. \eqno({\rm D.3})$$ Let $I= \int_0^1 x^\alpha(1-x)^{1-\alpha}\,dx$. $$\eqalign{I - e^{2\pi i\alpha}\,I \ &=\ \int\limits_{\partial D(0,R)} z^\alpha(1-z)^{1-\alpha}\,dz\cr &=\ 2\pi i \,\Res_{\infty}(f).\cr}\eqno{\rm (D.4)}$$ Let $h(w)=(w-1)^\beta$, where $w=1/z$ and $\beta=1-\alpha$. 
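A word about the strategy first. From the definition (D.1)--(D.2), the residue at infinity is the negative of the coefficient of $1/z$ in the expansion of $f$ valid for large $|z|$: if $$f(z)\ =\ \cdots\ +\ a_1z\ +\ a_0\ +\ {a_{-1}\over z}\ +\ {a_{-2}\over z^2}\ +\ \cdots \qquad (|z|\ \hbox{large}),$$ then $\Res_\infty(f) = -a_{-1}$. Since the integrand can be written $f(z) = z\,h(1/z)$ (as noted in the figure), the coefficient $a_{-1}$ can be read off from the Taylor expansion of $h(w)$ about $w=0$, which is computed next.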
$$\eqalign{h(w)\ &=\ (w-1)^\beta\cr h'(w)\ &=\ \beta(w-1)^{\beta-1}\cr h''(w)\ &=\ \beta(\beta-1)(w-1)^{\beta-2}\cr &\vdots\cr h^{(n)}(w)\ &=\ \beta(\beta-1)\cdots \cr &\qquad(\beta-n+1)(w-1)^{\beta-n}\cr &\vdots\cr} \qquad \eqalign{h(0)\ &=\ (-1)^\beta\cr h'(0)\ &=\ (-1)^{\beta-1}\beta\cr h''(0)\ &=\ (-1)^{\beta-2}\beta(\beta-1)\cr &\vdots\cr h^{(n)}(0)\ &=\ (-1)^{\beta-n}\beta(\beta-1)\cdots\cr &\qquad(\beta-n+1)\cr &\vdots\cr}$$ $$h(w)=h(0)+wh'(0) + {w^2\over2}h''(0)+ \ldots+{w^n\over n!}h^{(n)}(0)+\dots$$ $$\eqalign{h(w) \ &=\ (-1)^\beta + w\beta(-1)^{\beta-1}+{w^2\over2}\beta(\beta-1)+\ldots +(-1)^{\beta-n}{w^n\over n!}\beta(\beta-1)\cdots(\beta-n+1)+\dots\cr \ &=\ (-1)^\beta\left(1 -w\beta+{w^2\over2}\beta(\beta-1) +\ldots+(-1)^n{w^n\over n!} \beta(\beta-1)\cdots(\beta-n+1)+\dots\right)\cr}$$ $$h(1/z) =\ (-1)^\beta\left(1 -{\beta\over z}+{\beta(\beta-1)\over 2z^2} +\ldots+(-1)^n{\beta(\beta-1)\cdots(\beta-n+1)\over z^n n!} +\dots\right)$$ $$\eqalign{f(z)\ &=\ zh(1/z)\cr \ &=\ (-1)^\beta\left(z -\beta+{\beta(\beta-1)\over2}{1\over z} +\ldots+(-1)^n{\beta(\beta-1)\cdots(\beta-n+1)\over z^{n-1}n!} +\dots\right)\cr}$$ Therefore, $$\eqalign{\Res_{z=\infty} f(z)\ =\ \Res_{\infty} (f) \ &=\ -a_{-1}=(-1)^{\beta-1}{\beta(\beta-1)\over2}\cr \ &=\ (-1)^{-\alpha}{\alpha(1-\alpha)\over2}\cr \ &=\ e^{-i\pi\alpha}{\alpha(1-\alpha)\over2}}\eqno{\rm (D.5)}$$ Now we apply the results of our calculation of the residue in equation (D.5) to equation (D.1): $$I\cdot(1-e^{i2\pi\alpha}) = 2\pi i\cdot e^{-i\pi\alpha}\cdot\left({\alpha(1-\alpha)\over2}\right)$$ Now we equate the real parts: $$I\cdot\big(1-\cos(2\pi\alpha)\big) = \pi\sin(\pi\alpha)\alpha(1-\alpha)$$ $$1-\cos 2\theta = 2\sin^2\theta$$ $$I\cdot\big(2\sin^2(\pi\alpha)\big) = \pi\sin(\pi\alpha)\alpha(1-\alpha)$$ $$I = {\pi\alpha(1-\alpha)\over2\sin(\pi\alpha)}.\eqno{\rm (D.6)}$$ Equation (D.6) is the sought-for result. \vfill\eject %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%% %%%%% %%%% Appendix E %%%%% %%%% %%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \headline{\tenrm\hfill Malgrange-Ehrenpreis Theorem} \centerline{\bf Appendix E} \medskip \centerline{\bf The Malgrange-Ehrenpreis Theorem} \bigskip \mathsurround=1pt \def\bull{\vrule height .9ex width .8ex depth -.1ex} % square bullet \proclaim The Malgrange-Ehrenpreis Theorem. If $p(D)$ is a constant coefficient partial differential operator on ${\bf R}^n$, then there is a distribution $E\in\cal D'$ such that $p(D)E =\delta$. {\rm (See [5], page 48.)}\par \vskip6pt\noindent {\bf Definitions:} We are using the standard multi-index notation. A multi-index $$\alpha = \langle \alpha_1,\ldots,\alpha_n\rangle$$ is an $n$-tuple of nonnegative integers. The collection of all such multi-indices will be denoted by $I_{+}^n$. 
The symbols $|\alpha|$, $x^\alpha$, $D^{\alpha}$, and $x^2$ are defined thusly: $$\eqalign{|\alpha|\ &=\ \sum_{i=1}^n \alpha_i\cr x^\alpha\ &=\ x_1^{\alpha_1}x_2^{\alpha_2}\cdots x_n^{\alpha_n}\cr D^{\alpha}\ &=\ {\partial^{|\alpha|} \over \partial x_1^{\alpha_1} \partial x_2^{\alpha_2}\cdots\partial x_n^{\alpha_n}}\cr x^2\ &=\ \sum_{i=1}^n x_i^2\cr}$$ \vskip6pt\noindent We define the set ${\cal S}({\bf R}^n)\subset C^{\infty}({\bf R}^n)$ such that $f\in{\cal S}$ iff $$\sup_{x\in{\bf R}^n}\left|\, x^\beta D^\alpha f(x)\right| < \infty\qquad\left(x^\beta = \prod_{j=1}^n x_j^{\beta_j}\right)$$ for every $\alpha=\langle \alpha_1,\alpha_2,\ldots,\alpha_n\rangle$ and $\beta=\langle\beta_1,\beta_2,\ldots,\beta_n\rangle$ with nonnegative integers $\alpha_j$ and $\beta_k$. Such functions are called rapidly decreasing (at $\infty$) or functions of rapid decrease. We call ${\cal S}({\bf R}^n)$ the Schwartz space of $C^{\infty}$ functions of rapid decrease or simply the Schwartz space. \vskip6pt\noindent {\bf Definition:} Let $f\in{\cal S}({\bf R}^n)$. The Fourier Transform of $f$ is the function $$\hat f(\lambda) = {1\over(2\pi)^{n/2}}\int_{{\bf R}^n} e^{-ix\cdot\lambda} f( x)\,dx\eqno{\rm (E.1)}$$ where $x\cdot\lambda = \sum_{i=1}^n x_i\lambda_i$. On occasion it will be necessary to write $\hat f= {\cal F} f$. The inverse Fourier Transform of $f$ is the function $$\check f(\lambda) = {1\over(2\pi)^{n/2}}\int_{{\bf R}^n} e^{ix\cdot\lambda} f(x)\,dx.\eqno{\rm (E.2)}$$ We will occasionally write $\check f$ $=$ ${\cal F}^{-1} f$. \vskip6pt\noindent We define the polynomial $p(x) = p(x_1,\ldots,x_n)$ to be a polynomial of total degree $k$ in $n$ variables. Recall that the total degree, $k$, of a polynomial $p$ is the maximum of the degrees, $k_i$, of the factors $a_ix_1^{k_{i1}}\cdots x_n^{k_{in}}$ such that $k_i = k_{i1}+\ldots+k_{in}$ and $a_i\not= 0$. If $x=\langle x_1,\ldots,x_n\rangle$, then $p(D)$ denotes the partial differential operator obtained by substituting $\partial/\partial x_i$ for $x_i$ whenever it occurs in the polynomial $p(x)$. \vskip6pt\noindent{\bf Definition:} Let $b\in{\bf R}$. The symbol $\delta_b$ is the linear functional $\delta_b(\varphi) = \varphi(b)$. $\delta_b\in {\cal S}'({\bf R})$. $$\delta_b(\varphi) = \int \varphi(x)\delta(x-b)\,dx\eqno{\rm (E.3)}$$ $\delta(x)$ is not a function; it is a symbolic expression known as the (Dirac) delta function, sometimes written $T_{\delta_0}$. \vskip6pt We will now discuss a {\sl distribution\/} or {\sl generalized function}. To do this, some background is necessary. Let $\Omega$ be an open connected set in ${\bf R}^n$ and $C_{0}^{\infty}(\Omega)$ be the family of functions with compact support in $\Omega$. (Recall that if $f(x)$ is a complex-valued function defined on $\Omega$, we call the support of $f$, written supp($f$), to be the closure of $\big\{\,x\in\Omega\,\big|\,f(x)\not=0\,\big\}$ in the space $\Omega$. It may be equivalently defined as the smallest closed set of $\Omega$ outside which $f$ vanishes identically.) Let $K_n$ be an increasing family of compact sets with $\bigcup K^{o}_n = \Omega$, that is $$K_1 \subseteq K_2 \subseteq K_3 \subseteq \ldots K_n \subseteq \dots$$ where $K^{o}_n$ is the interior of $K_n$ (the largest open subset of $K_n$). It is not hard to show that each of the $C_0^\infty(K_n)$ is a linear space, where $$(f_1+f_2)(x) = f_1(x) + f_2(x), \qquad (\alpha f)(x)=\alpha f(x). 
\eqno{\rm(E.4)}$$ For each compact set $K_n$ and each multi-index $\alpha$, we can define the semi-norm $\Vert\cdot\Vert$ by $$\Vert f\Vert_{\alpha,\infty} = \sup \big\{\,|\,D^\beta f(x)|\, \big|\, |\beta| \le \alpha, f\in C_0^\infty, x\in K_n\,\big\}. \eqno{\rm (E.5)}$$ Define $$\Vert D^\alpha f\Vert_\infty = \Vert f\Vert_{\alpha,\infty}. \eqno{\rm (E.6)}$$ Put the Fr\'echet topology on $C_0^\infty(K_n)$ generated by the $\Vert D^\alpha f\Vert_\infty$ norms. The set $C_0^\infty(\Omega)$ with the inductive limit topology obtained by $C_0^\infty(\Omega)$ $=$ $\bigcup C_0^\infty(K_n)$ is denoted ${\cal D}_\Omega$. If $\Omega$ is the space ${\bf R}^n$, then we denote ${\cal D}_{{\bf R}^n}$ by ${\cal D}$. The space of all continuous linear functionals on ${\cal D}_\Omega$ is denoted ${\cal D}'_\Omega$, where ${\cal D}'_{{\bf R}^n}$ $=$ ${\cal D}'$. It can be shown that the topology on ${\cal D}_\Omega$ is independent of the choice of the $K_n$. It can also be shown that a sequence $f_m\in{\cal D}$ converges to $f\in{\cal D}$ iff all the $f_m$ are in some $K_n$ and $f_m\to f$ in the topology of $K_n$. We finally have all the preliminaries to define a distribution. \vskip6pt \proclaim Definition. A distribution or generalized function is a continuous linear functional on ${\cal D}_\Omega$ (${\cal D}$ when $\Omega = {\bf R}^n$). The space of all continuous linear functionals on ${\cal D}_\Omega$ is denoted by ${\cal D}'_\Omega$ (${\cal D}'$ when $\Omega = {\bf R}^n$). \vskip6pt \proclaim Definition. Suppose $f,g$ $\in$ ${\cal S}\big({\bf R}^n\big)$. Then the convolution of $f$ and $g$, denoted $f\ast g$ is the function $$(f\ast g)(y) = \int_{{\bf R}^n} f(y-x)g(x)\,dx. \eqno{\rm (E.7)}$$\par \vskip6pt\noindent The partial differential operator $p(D) =\sum_{|\alpha|\le k} a_\alpha x^\alpha$ extends to ${\cal D}'$ by the formula $$\big(p(D)T\big)(\varphi) = T\left(\sum_{|\alpha|\le k} (-1)^{|\alpha|} D^\alpha\big(a_\alpha \varphi\big)\right).\eqno{\rm (E.8)}$$ \vskip6pt \proclaim Definition. A fundamental solution for the partial differential operator $p(D)$ is a distribution $E\in{\cal D}'$ such that $p(D)E=\delta$.\par \vskip6pt\noindent The reason for studying fundamental solutions is that if we define $u=E\ast f$, where $f\in C_0^\infty({\bf R}^n)$, then $$\eqalign{p(D)u\ &=\ p(D)(E\ast f)\ =\ p(D)E\ast f\cr &=\ \delta\ast f\ =\ f.\cr}\eqno{\rm (E.9)}$$ Thus, if we can find a fundamental solution, then we have an existence theorem for all the partial differential equations $p(D)u=f$ where $f\in C_0^\infty({\bf R}^n)$. Furthermore, if we can find an expression for the distribution $E$, then we have an explicit representation of a solution, namely $$u = E\ast f.\eqno{\rm (E.10)}$$ The convolution operation, $E\ast f$, for distributions and functions can be accomplished on digital computers. Note that, in general, partial differential equations will have many fundamental solutions in ${\cal D}'$, and sometimes have more than one in ${\cal S}'({\bf R}^n)$. Convolutions frequently occur when one uses Fourier transforms because the Fourier transform takes products into convolutions. In particular, for distributions and functions, if $T$ is a distribution ($T\in{\cal D}'$) and $f\in{\cal D}$, let $f^{\dagger}(x)$ $=$ $f(-x)$. $$\big( T\ast f\big)(\varphi) = T(f^{\dagger}\ast\varphi)\qquad \forall\varphi\in{\cal D}.\eqno{\rm (E.11)}$$ The Malgrange-Ehrenpreis theorem states that every constant coefficient partial differential operator $p(D)$ has a fundamental solution. 
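Before turning to the proof, a one-dimensional example may make the definition concrete. Take $n=1$ and $p(D)=d/dx$, and let $E$ be the distribution given by the Heaviside function $H$ ($H(x)=1$ for $x\ge0$ and $H(x)=0$ for $x<0$). Using formula (E.8), for every $\varphi\in{\cal D}$, $$\big(p(D)E\big)(\varphi)\ =\ E\big(-\varphi\,'\big)\ =\ -\int_0^\infty \varphi\,'(x)\,dx\ =\ \varphi(0)\ =\ \delta(\varphi),$$ since $\varphi$ has compact support; so $E$ is a fundamental solution for $d/dx$.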
The proof of this theorem breaks into two parts: the first part is a complex variables argument and the second is an application of the Hahn-Banach theorem. \vskip12pt\noindent {\bf Proof of the Malgrange-Ehrenpreis Theorem:} Define $p^{\dagger}(x) = p(-x)$ and $q(x) = p^{\dagger}(ix)$, with $i=\sqrt{-1}$. Let $\varphi \in C_0^\infty({\bf R}^n)$, where $C_0^\infty({\bf R}^n)$ is the space of all infinitely differentiable functions with compact support. The first thing we observe is that $${\cal F}\big(p^{\dagger}(D)\varphi(x)\big) = \hat\varphi(s)q(s). \eqno{\rm (E.12)}$$ This beautiful and exciting result follows immediately from the following: \proclaim Lemma 1. The maps ${\cal F}$ and ${\cal F}^{-1}$ are continuous linear transformations of ${\cal S}({\bf R}^n)$ into ${\cal S}({\bf R}^n)$. Furthermore, if $\alpha$ and $\beta$ are multi-indices, then $$\big((i\lambda)^\alpha D^\beta\hat f\big)(\lambda) = {\cal F}\left(\,D^\alpha\big((-ix)^\beta f(x)\big)\right) \eqno{\rm (E.13)}$$\par \vskip6pt\noindent If we assume that $f\in C_0^\infty({\bf R}^n$), then for all $\zeta = \langle\zeta_1,\ldots,\zeta_n\rangle\in {\bf C}^n$, the integral $$\hat f(\zeta) = (2\pi)^{n/2}\int\limits_{{\bf R}^n} e^{-i\zeta\cdot x}f(x)\,dx \eqno{\rm (E.14)}$$ is well defined. Furthermore, $\hat f(\zeta)$ is an entire analytic function of the $n$ complex variables $\zeta_1$, $\zeta_2$, $\ldots$, $\zeta_n$ since we can differentiate under the integral sign. The formal differentiation under the integral sign is permitted since the right hand side is uniformly convergent in $\zeta$. This is contained in the following: \proclaim Paley-Weiner theorem. An entire analytic function of $n$ complex variables $g(\zeta)$ is the Fourier transform of a $C_0^\infty({\bf R}^n)$ function with support in the ball $\{\,x\,|\,|x|\le R\,\}$ if and only if for each $N$ there is a $C_N$ such that $$|g(\zeta)| \le {C_N e^{R|\hbox{\sevenrm Im}(\zeta)|} \over\left(a+|\zeta|\right)^N} \eqno{\rm (E.15)}$$ for all $\zeta\in{\bf C}^n$.\ {\rm (See [5], page 16.)}\par \vskip6pt\noindent We apply the Paley-Weiner theorem to the partial differential operator $p^{\dagger}(D)$ acting on the function $\varphi\in C_0^\infty({\bf R}^n)$. For each $y=\langle y_1,\ldots,y_n\rangle\in {\bf R}^n$, we have $$\left({\cal F}\big(p^{\dagger}(D)\varphi\big)\right)(y+\zeta) = \hat\varphi(y+\zeta)\,q(y+\zeta)\eqno{\rm (E.16)}$$ is an entire (analytic) function of $\zeta\in{\bf C}^n$. Now define $Q(x)$ as follows: $$Q(x) = \sum_\alpha |\,D^\alpha q(x)|;\eqno{\rm (E.17)}$$ notice that $Q$ is positive and bounded away from zero. If $k$ is the total degree of $p$, then there is some multi-index $\alpha$ such that $|\alpha|=k$. $D^\alpha q(x) = a_\alpha$, where $a_\alpha$ is the coefficient of $x^\alpha$, hence $\alpha_k\not=0$. Therefore, $$\sum_\alpha |D^\alpha q(x)| \ge |a_\alpha| > 0.\eqno{\rm (E.18)}$$ Having defined $Q$, we will have to use some results from complex analysis. We will state four lemmas: \proclaim Lemma 2 (Malgrange). If $f(z)$ is a holomorphic function of a complex variable $z$ for $|z|\le 1$ and $p(z)$ is a polynomial $$p(z)=a_mz^m + a_{m-1}z^{m-1} + \ldots + a_1z + a_0\eqno{\rm (E.19)}$$ of degree $m$, then $$\left| a_mf(0) \right| \le {1\over 2\pi}\,\int_0^{2\pi} \left|f\big(e^{i\theta}\big)p\big(e^{i\theta}\big)\right|\,d\theta. \eqno{\rm (E.20)}$$\par \proclaim Lemma 3. 
If $f(z)$ is a holomorphic function of a complex variable $z$ for $|z|\le 1$ and $p(z)$ is a polynomial $$p(z)=a_mz^m + a_{m-1}z^{m-1} + \ldots + a_1z + a_0\eqno{\rm(E.21)}$$ of degree $m$, then $$\left| f(0)\, p^{(k)}(0) \right| \le {m!\over (m-k)!\,2\pi}\,\int_0^{2\pi} \left|f\big(e^{i\theta}\big)p\big(e^{i\theta}\big)\right|\,d\theta. \eqno{\rm (E.22)}$$\par \proclaim Lemma 4. Let $F(\zeta)$ be an entire function of $n$ complex variables and let $P(\zeta)$ be a polynomial of degree $m$. Suppose that $g(\zeta)$ is a nonnegative integrable function with compact support depending only on $|\zeta_1|$, $|\zeta_2|$, $\ldots$, $|\zeta_n|$. $$\left|F(0)D^\alpha P(0)\right|\,\int_{{\bf C}^n} |\zeta|^\alpha g(\zeta)\,d\zeta \le C_0\,\int_{{\bf C}^n} \left|F(\zeta)P(\zeta)\right| \,g(\zeta)\,d\zeta,\eqno{\rm(E.23)}$$ where $d\zeta$ denotes the Lebesgue measure on ${\bf C}^n$ and $C_0$ is a constant depending on $\alpha$ and $m$. $d\zeta$ is the measure denoted $d\xi_1d\eta_1\cdots d\xi_nd\eta_n$, where $\zeta_k = \xi_k+i\eta_k$; and, $m$ is regarded as the multi-index $\langle m,m,\ldots,m\rangle$.\par \proclaim Lemma 5. Suppose $\tilde q(x)$ $=$ $\big(\sum_{|\alpha|\ge k}|q^{(\alpha)}(x)|^2\big)^{1/2}$, where $q^{(\alpha)}(x)=D_x^{\alpha}q(x)$, $k$ is the total degree of $q$, $\varphi\in C_0^{\infty}({\bf R}^n)$, and $\epsilon>0$. Then $$\left|\tilde q(x)\hat\varphi(x)\right| \le C_1\,\int_{|\zeta|\le\epsilon} \left|\hat\varphi(x+\zeta)\,q(x+\zeta)\right|\,d\zeta. \eqno{\rm (E.24)}$$\par \vskip6pt\noindent The first step in the proof uses the Cauchy integral formula to show that $$\eqalign{\left|Q(x)\hat\varphi(x)\right| \ &\le\ C_1\,\int_{|\zeta|\le\epsilon} \left|\hat\varphi(x+\zeta)\,q(x+\zeta)\right|\,d\zeta\cr &=\ C_1\,\int_{|\zeta|\le\epsilon}\left|{\cal F}\big(p^{\dagger}(D)\varphi\big)(x+\zeta)\right|\,d\zeta\cr} \eqno{\rm (E.25)}$$ where $C_1$ depends on $\epsilon$ but is independent of $\varphi$. As defined in Lemma 4, $d\zeta$ is the Lebesgue measure on ${\bf C}^n$. To derive equation (E.25), first let $u\in C_0^\infty({\bf R}^n)$ and $\varphi = p(D)u$. We will make the following substitutions in equation (E.23): $$g(\zeta)\ =\ \cases{1, & if $|\zeta| \le \epsilon$,\cr 0, & otherwise;\cr}\eqno{\rm (E.26)}$$ $$F(\zeta)=\hat\varphi(x+\zeta)\quad\hbox{and }\ P(\zeta)=q(x+\zeta). \eqno{\rm (E.27)}$$ Equation (E.23) immediately becomes $$\left|\hat\varphi(x)D^\alpha q(x)\right|\,\int_{|\zeta|\le\epsilon} |\zeta|^\alpha \,d\zeta \le C_0\,\int_{|\zeta|\le\epsilon} \left|\hat\varphi(x+\zeta)q(x+\zeta)\right|\,d\zeta. \eqno{\rm (E.28)}$$ Now we may define the constant (scalar) $C_1$ as $$C_1\ =\ {C_0\cdot|\alpha|\over\int_{|\zeta|\le\epsilon} |\zeta|^\alpha\,d\zeta}, \eqno{\rm (E.29)}$$ where $|\alpha|$ $=$ $\sum_\alpha 1$ $=$ $\alpha_1+\alpha_2+\ldots+\alpha_n$. Recall that $C_0$ is a constant depending only on $\alpha$ and $m$; therefore, $C_1$ is a constant depending only on $\alpha$, $m$, and $\epsilon$. In particular, $C_1$ does {\sl not\/} depend on the choice of the function $\varphi$. Now, re-write equation (E.28): $$\left|\hat\varphi(x)D^\alpha q(x)\right| \le {C_1\over|\alpha|}\,\int_{|\zeta|\le\epsilon} \left|\hat\varphi(x+\zeta)q(x+\zeta)\right|\,d\zeta. 
\eqno{\rm (E.30)}$$ $$\eqalign{\left|\hat\varphi(x)Q(x)\right| \ &\le\ \sum_\alpha \left|\hat\varphi(x)D^\alpha q(x)\right|\cr &\le {C_0\over\int_{|\zeta|\le\epsilon} |\zeta|^\alpha\,d\zeta} \,\left(\sum_\alpha 1\right)\,\int_{|\zeta|\le\epsilon} \left|\hat\varphi(x+\zeta)q(x+\zeta)\right|\,d\zeta\cr &=\ C_1\,\int_{|\zeta|\le\epsilon} \left|\hat\varphi(x+\zeta)q(x+\zeta)\right|\,d\zeta\cr} \eqno{\rm (E.31)}$$ % We will substitute formula (E.10) into the right-hand side of equation (E.32) to get $$\left|\hat\varphi(x)Q(x)\right| \le C_1\,\int_{|\zeta|\le\epsilon} \left|{\cal F}\big(p^{\dagger}(D)\varphi\big)(x+\zeta)\right|\,d\zeta. \eqno{\rm (E.32)}$$ \vskip12pt\noindent From Fourier's (inversion) theorem, we have that $$\varphi(0) = {1\over(2\pi)^{n/2}}\int_{{\bf R}^n} \hat \varphi(y)\,dy\eqno{\rm (E.33)}$$ $$|\varphi(0)| \le {1\over(2\pi)^{n/2}}\int_{{\bf R}^n} |\hat \varphi(y)|\,dy \eqno{\rm (E.34)}$$ $$\eqalign{|\varphi(0)|\ &\le\ {1\over(2\pi)^{n/2}}\int_{{\bf R}^n} |\hat \varphi(y)|\,dy\cr &\le\ C_2\,\int_{{\bf R}^n}\left(\int_{|\zeta|\le\epsilon} \left| {\cal F}\big(p^{\dagger}(D)\varphi\big)(y+\zeta)\right| \,\left|Q(y)\right|^{-1}\,d\zeta\right)\,dy\cr &=\ C_2\,\int_{{\bf R}^n}\int_{|\lambda|^2+|\mu|^2\le\epsilon^2} \left| {\cal F}\big(p^{\dagger}(D)\varphi\big)(y+\lambda+i\mu)\right| \,\left|Q(y)\right|^{-1}\,d\lambda d\mu dy\cr} \eqno{\rm (E.35)}$$ For $|\lambda|\le\epsilon$, we will demonstrate a constant $C_3$ such that $Q(y+\lambda)\big(Q(y)\big)^{-1}\le C_3$ independently of $y$. Take the generalized Taylor series expansion for $D^\alpha q(y+\lambda)$. $\lambda = \langle \lambda_1, \lambda_2, \ldots, \lambda_n\rangle$ $\in$ ${\bf R}^n$ and $|\lambda| \le \epsilon$. $$D^\alpha q(y+\lambda) = \sum_\beta {\lambda^\beta\over\beta!}\ D^{\alpha+\beta} q(y), \eqno{\rm (E.36)}$$ where $q$ is a polynomial of total degree $k$; therefore, $D^{\alpha+\beta} q(y) = 0$ whenever $|\alpha+\beta| > k$. Therefore, for all $\alpha$, $|\alpha| \le k$ $$\left|D^\alpha q(y+\lambda)\right| \le \sum_{|\alpha+\beta|\le k} \left|{\lambda^\beta \over \beta !} D^{\alpha+\beta} q(y)\right|; \eqno{\rm (E.37)}$$ $$\lambda^\beta = \lambda_1^{\beta_1}\lambda_2^{\beta_2}\cdots\lambda_n^{\beta_n}.$$ $$|\lambda| \le \epsilon \ \Longrightarrow\ |\lambda|^{|\beta|} \le \epsilon^{|\beta|}$$ $$|\beta| \le k \ \Longrightarrow\ |\lambda|^{|\beta|} \le \epsilon^k\ \hbox{for all multi-indices } \beta.$$ Therefore, $$\eqalign{\left|D^\alpha q(y+\lambda)\right| \ &\le\ \sum_{|\alpha+\beta|\le k} \left|{\lambda^\beta \over \beta !}\right| \left| D^{\alpha+\beta} q(y)\right|\cr \ &\le\ \sum_{|\alpha+\beta|\le k} \left|{\epsilon^k \over \beta !}\right| \left| D^{\alpha+\beta} q(y)\right|\cr \ &\le\ \epsilon^k\cdot \left(\sum_{|\alpha+\beta|\le k} \left|D^{\alpha+\beta} q(y)\right|\right)\cr}\eqno{\rm (E.38)}$$ $$\eqalign{\left|Q(y+\lambda)\right| \ &=\ \sum_{|\alpha|\le k} \left|D^\alpha q(y+\lambda)\right|\cr &\le\ \sum_{|\alpha|\le k}\ \sum_{|\alpha+\beta|\le k} \left|{\lambda^\beta\over\beta!}\right|\cdot\left|D^{\alpha+\beta} q(y)\right|\cr &\le\ \sum_{|\alpha|\le k} \epsilon^k\sum_{|\alpha+\beta|\le k} \left|D^{\alpha+\beta} q(y)\right|\cr &\le\ \left(\epsilon^k\,\sum_{|\beta|\le k} 1\right)\cdot \sum_{|\alpha|\le k} |D^\alpha q(y)|\cr &=\ \left(\epsilon^k\,\sum_{|\beta|\le k} 1\right)\cdot Q(y).\cr}\eqno{\rm (E.39)}$$ Therefore, since $|Q(y+\lambda)| = Q(y+\lambda)$, $${|Q(y+\lambda)|\over |Q(y)|} \le \epsilon^k\cdot\sum_{|\beta|\le k} 1.$$ Let $C_3 = \epsilon^k\cdot\sum_{|\beta|\le k} 1$. 
Setting $C_4=C_2\cdot C_3$ and substituting, $$\eqalign{|\varphi(0)| \ &\le\ C_4\,\int_{{\bf R}^n}\int_{|\lambda|^2+|\mu|^2\le\epsilon^2} \left|{\cal F}\big(p^{\dagger}(D)\varphi\big)(y+\lambda+i\mu)\right| \,\big(Q(y+\lambda)\big)^{-1}\,d\lambda d\mu dy.\cr &\le\ C_5\,\int_{{\bf R}^n}\int_{|\mu|^2\le\epsilon^2} \left|{\cal F}\big(p^{\dagger}(D)\varphi\big)(y+i\mu)\right| \,\big(Q(y)\big)^{-1}\,d\mu dy.\cr}\eqno{\rm (E.40)}$$ We now define the norm $\Vert\cdot\Vert_Q$ as follows $$\Vert\varphi\Vert_Q = \int_{{\bf R}^n} \int_{|\mu|\le\epsilon} \left|\hat\varphi(y+i\mu)\big(Q(y)\big)^{-1}\right|\,d\mu dy. \eqno{\rm (E.41)}$$ We will first show that $\Vert\cdot\Vert_Q$ is a continuous norm on ${\cal D}$. To see that $\Vert\cdot\Vert_Q$ is a norm, observe that $$\Vert\cdot\Vert_Q\colon\ C_0^\infty({\bf R}^n) \to [0,\infty) \eqno{\rm(E.42)}$$ and that $\varphi\not=0$ implies that $$\Vert\varphi\Vert_Q > 0.\eqno{\rm (E.43)}$$ This follows immediately from the fact that $\Vert\hat\varphi\Vert_2$ $=$ $\Vert\varphi\Vert_2$. This is a corollary to the Fourier inverson theorem. If each of $\varphi$ and $\psi$ are in $C_0^\infty({\bf R}^n)$, then we know (from the Paley-Wiener theorem) that each of the following integral exist: $$\hat\varphi(\lambda) = {1\over2\pi}\,\int_{{\bf R}^n} e^{-i\lambda\cdot x} \varphi(x)\,dx,$$ $$\hat\psi(\lambda) = {1\over2\pi}\,\int_{{\bf R}^n} e^{-i\lambda\cdot x} \varphi(x)\,dx,$$ $$\Vert\varphi\Vert_Q = \int_{{\bf R}^n}\int_{|\mu|\le\epsilon} \left|\hat\varphi(y+i\mu) \big(Q(y)\big)^{-1}\right|\,d\mu dy,$$ $$\Vert\psi\Vert_Q = \int_{{\bf R}^n}\int_{|\mu|\le\epsilon} \left|\hat\psi(y+i\mu) \big(Q(y)\big)^{-1}\right|\,d\mu dy,$$ $$\Vert\varphi+\psi\Vert_Q = \int_{{\bf R}^n}\int_{|\mu|\le\epsilon} \left|\big(\hat\varphi+\hat\psi\big)(y+i\mu) \big(Q(y)\big)^{-1}\right|\,d\mu dy.\eqno{\rm (E.44)}$$ From the so-called triangle inequality, we have that $|\hat\varphi(\zeta) + \hat\psi(\zeta)|$ $\le$ $|\hat\varphi(\zeta)|$ $+$ $|\hat\psi(\zeta)|$. This implies immediately that $$\Vert\varphi+\psi\Vert_Q \le \Vert\varphi\Vert_Q + \Vert\psi\Vert_Q \eqno{\rm (E.45)}$$ Next, we have to show that $\Vert\cdot\Vert_Q$ is continuous. Since $Q$ is bounded from below, there is a positive number $C_\ell$ such that $C_\ell < Q(y)$ for all $y\in {\bf R}^n$. $$\big(Q(y)\big)^{-1} < {1\over C_\ell} = C_6\qquad\hbox{for all }y\in C_0^\infty({\bf R}^n). \eqno{\rm (E.46)}$$ $$\eqalign{\Vert\varphi\Vert_Q \ &\le\ C_6\,\int_{{\bf R}^n}\int_{|\mu|\le\epsilon} \left|\hat\varphi(y+i\mu)\right|\,d\mu dy\cr \ &\le\ C_7\, \sup_{\scriptstyle y\in{\bf R}^n \atop \scriptstyle|\mu|\le\epsilon} \left|(1+y^2)^{n+1}\hat\varphi(y+i\mu)\right|\cr \ &\le\ C_7\,\sup_{|\mu|\le\epsilon} \Vert (I-\Delta)^{n+1}e^{\mu\cdot x} \varphi(x)\Vert_1.\cr}\eqno{\rm (E.47)}$$ If we define $p_1(D)$ $=$ $1-\Delta$ $=$ $1-\sum_{j=1}^n \partial^2/\partial x_j^2$, then we will have $p_1(x)$ $=$ $1-x^2$, $q_1(x)=p_1^{\dagger}(ix) = 1+x^2$ and $${\cal F}\left((1-\Delta)^{n+1}e^{\mu\cdot x}\varphi(x)\right) = (1+y^2)^{n+1}\hat\varphi(y+i\mu).\eqno{\rm (E.48)}$$ We implicitly made use of the fact that for $\alpha\in{\bf C}$ if $h(x)=g(x)e^{i\alpha x}$ then $\hat h(s) = \hat g(t-\alpha)$. In equation (E.47), we have used the fact that if $g\in L^1$, then $\hat g\in C_0$ and $$\Vert\hat g\Vert_\infty \le \Vert g\Vert_1.\eqno{\rm (E.49)}$$ The inequality is a direct consequence of the definition of the Fourier transform. The function space inclusions are is consequence of the Lebesgue dominated convergence theorem. 
The right-hand side is a continuous norm on $C_0^\infty(K)$ for each compact set $K\subset {\bf R}^n$. Since ${\cal D}$ has the inductive limit topology, $\Vert\cdot\Vert_Q$ is a continuous norm on ${\cal D}$. The basic estimate (equation (E.34)) shows that the map $$\tilde E\colon p^{\dagger}(D)\varphi \to \varphi(0) \eqno{\rm(E.50)}$$ is well-defined. That is, if $p^{\dagger}(D)\varphi_1$ $=$ $p^{\dagger}(D)\varphi_2$, then $\varphi_1(0) = \varphi_2(0)$. $\tilde E$ is continuous since $\Vert\cdot\Vert_Q$ is a continuous norm. We now recall the Hahn-Banach Theorem: \proclaim The Hahn-Banach Theorem. If $M$ is a subspace of a normed linear space $X$ and if $\tilde T$ is a bounded linear functional on $M$, then $\tilde T$ can be extended to a bounded linear functional $T$ on $X$ such that $\Vert T\Vert$ $=$ $\Vert\tilde T\Vert$. \ {\rm (See [6], page 104.)}\par \noindent Thus by the Hahn-Banach theorem there is an $E$ in ${\cal D}'$ which extends $\tilde E$. Since $$\big(p(D)E\big)(\varphi) = E\big(p^{\dagger}(D)\varphi\big) = \varphi(0), \eqno{\rm(E.51)}$$ we have found a fundamental solution for $p(D)$. \bull \bigskip \proclaim Lemma 1. The maps ${\cal F}$ and ${\cal F}^{-1}$ are continuous linear transformations of ${\cal S}({\bf R}^n)$ into ${\cal S}({\bf R}^n)$. Furthermore, if $\alpha$ and $\beta$ are multi-indices, then $$\big((i\lambda)^\alpha D^\beta\hat f\big)(\lambda) = {\cal F}\left(\,D^\alpha\big((-ix)^\beta f(x)\big)\right) \eqno{\rm(E.52)}$$\par \vskip6pt\noindent {\bf Proof:} The map ${\cal F}$ is clearly linear. We directly compute the Fourier transform of $\big(\lambda^\alpha D^{\beta} \hat f\big)(\lambda)$ from the definition. $$\eqalign{\big(\lambda^\alpha D^{\beta} \hat f\big)(\lambda)\ &=\ {1\over(2\pi)^{n/2}}\,\int_{{\bf R}^n} \lambda^\alpha_x(-ix)^\beta e^{-i\lambda\cdot x}f(x)\,dx\cr &=\ {1\over(2\pi)^{n/2}}\,\int {1\over(-i)^\alpha} \big(D_x^\alpha e^{-i\lambda\cdot x}\big)(-ix)^\beta f(x)\,dx\cr &=\ {(-i)^\alpha\over(2\pi)^{n/2}}\,\int_{{\bf R}^n} e^{-i\lambda\cdot x}D_x^\alpha\big((-ix)^\beta f(x)\big)\,dx\cr} \eqno{\rm(E.53)}$$ \vskip6pt\noindent The above formulas require some justification. First of all, we observe that for $$\hat f(\lambda) = {1\over(2\pi)^{n/2}}\,\int_{{\bf R}^n} e^{-ix\cdot\lambda}f(x)\,dx,\eqno{\rm(E.54)}$$ formal differentiation under the integral sign is permitted because the right hand side is uniformly convergent. $$D^\beta\hat f(\lambda) = {1\over(2\pi)^{n/2}}\,\int_{{\bf R}^n} e^{-ix\cdot\lambda}(-i)^{|\beta|} x^\beta f(x)\,dx. \eqno{\rm(E.55)}$$ We now multiply both sides by $(i\lambda)^\alpha$. $$(i\lambda)^\alpha D^\beta\hat f(\lambda) = {1\over(2\pi)^{n/2}}\,\int_{{\bf R}^n} (i\lambda)^\alpha\, e^{-ix\cdot\lambda}(-i)^{|\beta|} x^\beta f(x)\,dx. \eqno{\rm(E.56)}$$ The next operation is integration by parts. We define a function $\phi^{(\alpha)}$ $=$ $(-i\lambda)^\alpha e^{-i\lambda\cdot x}$ $=$ $D_x^\alpha e^{-i\lambda\cdot x}$. Again, by uniform convergence we have $${1\over(2\pi)^{n/2}}\,\int_{{\bf R}^n} (i\lambda)^\alpha\, e^{-ix\cdot\lambda}(-i)^{|\beta|} x^\beta f(x)\,dx$$ $$\qquad\ =\ (-1)^\alpha{1\over(2\pi)^{n/2}}\,\int_{{\bf R}^n} (-1)^{-\alpha}\, e^{-i\lambda\cdot x}\, D_x^\alpha \big((-ix)^\beta f(x)\big)\,dx.\eqno{\rm(E.57)}$$ The final result is $$\eqalign{(i\lambda)^\alpha D^\beta\hat f(\lambda) \ &=\ {1\over(2\pi)^{n/2}}\,\int_{{\bf R}^n} e^{-i\lambda\cdot x}\, D_x^\alpha\big( (-ix)^\beta f(x)\big)\,dx\cr &=\ {\cal F}D_x^\alpha\big( (-ix)^\beta f(x)\big).\cr} \eqno{\rm(E.58)}$$ This is the sought-for result. 
Now we observe that $$\eqalign{\Vert\hat f\Vert_{\alpha,\beta}\ &=\ \sup_\lambda \left|\lambda^\alpha\big(D^\beta\hat f\big)(\lambda)\right|\cr &\le\ {1\over(2\pi)^{n/2}}\int_{{\bf R}^n} \big| D_x^\alpha (x^\beta f)\big|\,dx\ <\ \infty.\cr}\eqno{\rm(E.59)}$$ Thus ${\cal F}$ takes ${\cal S}({\bf R}^n)$ into ${\cal S}({\bf R}^n)$. There is a positive integer $k$ such that $$\int_{{\bf R}^n} \big(1+x^2)^{-k}\,dx\ <\ \infty,\eqno{(E.60)}$$ where $dx=dx_1dx_2\cdots dx_n$, $x=\langle x_1,x_2,\ldots,x_n\rangle$. $$\eqalign{\Vert \hat f\Vert_{\alpha,\beta} \ &\le\ {1\over(2\pi)^{n/2}}\,\int_{{\bf R}^n} {\big(1+x^2\big)^{-k}\over\big(1+x^2\big)^{-k}}\left| D_x^\alpha(-ix)^\beta f(x)\right|\,dx\cr &\le\ {1\over(2\pi)^{n/2}}\,\left(\int_{{\bf R}^n} \big(1+x^2\big)^{-k} \,dx\right)\ \sup_{x}\left\{\big(1+x^2\big)^{+k}\left|D_x^\alpha (-ix)^\beta f(x)\right|\,\right\}\cr}\eqno{\rm(E.61)}$$ Using Leibnitz's rule we easily conclude that there exist multi-indices $\alpha_j$, $\beta_j$ and constants $c_j$ such that $$\Vert\hat f\Vert_{\alpha,\beta} \ \le\ \sum_{j=1}^M c_j\cdot\Vert f\Vert_{\alpha_j,\beta_j}\eqno{\rm(E.62)}$$ Therefore, ${\cal F}$ is bounded. A bounded linear map is always continuous. A symmetric argument holds for the inverse operator ${\cal F}^{-1}$.\quad \bull \bigskip \proclaim Lemma 2 (Malgrange). If $f(z)$ is a holomorphic function of a complex variable $z$ for $|z|\le 1$ and $p(z)$ is a polynomial $$p(z)=a_mz^m + a_{m-1}z^{m-1} + \ldots + a_1z + a_0\eqno{\rm(E.63)}$$ of degree $m$, then $$\left| a_mf(0) \right| \le {1\over 2\pi}\,\int_0^{2\pi} \left|f\big(e^{i\theta}\big)p\big(e^{i\theta}\big)\right|\,d\theta. \eqno{\rm(E.64)}$$\par \vskip6pt\noindent {\bf Proof:} For $j=1,\ldots,m$, let $z_j$ be a zero of the polynomial $p(z)$. Order the roots such that for $j=1,\ldots,k$, $z_j$ is a zero of the polynomial interior to the unit circle, that is $|z_j| < 1$ for $j=1,\ldots,k$ and $|z_j|\ge 1$ for $k < j \le m$. $$\beginpicture \setcoordinatesystem units <1in,1in> \setplotarea x from -1.75 to 1.75, y from -1.25 to 1.25 % \plotheading {\lines{% $p(z) = a_7z^7+\ldots+a_1z+a_0$ \cr a polynomial of degree $7$.\cr}} \put {$\scriptstyle\bullet$} at 0 0 % \put {$0$} [rt] <-2pt,-2pt> at 0 0 % % \replot "feb15.t01" % % \savelinesandcurves on "feb15.t01" % \writesavefile {Feb 15, 92 Homework} \ifexpressmode \put {\bf EXPRESSMODE } at .7071 .7071 \else \circulararc 360 degrees from 1 0 center at 0 0 % \arrow <10pt> [.1, .4] from 0 0 to .7071 .7071 % \fi \put {$\scriptstyle\circ$} at -.6 0 % \put {$\scriptstyle\circ$} at .6 -.2 % \put {$\scriptstyle\circ$} at -.1 -.6 % \put {$\scriptstyle\circ$} at 0 .5 % \put {$\scriptstyle\circ$} at 1.2 1.0 % \put {$\scriptstyle\circ$} at -1.2 .8 % \put {$\scriptstyle\circ$} at -1.6 -.6 % \put {$z_1$\raise3pt\hbox{$\,$}} [rt] at -.6 0 % \put {$z_2$\raise3pt\hbox{$\,$}} [rt] at .6 -.2 % \put {$z_3$\raise3pt\hbox{$\,$}} [rt] at -.1 -.6 % \put {$z_4$\raise3pt\hbox{$\,$}} [rt] at 0 .5 % \put {$z_5$\raise3pt\hbox{$\,$}} [rt] at 1.2 1.0 % \put {$z_6$\raise3pt\hbox{$\,$}} [rt] at -1.2 .8 % \put {$z_7$\raise3pt\hbox{$\,$}} [rt] at -1.6 -.6 % \put {$1$} at .4 .2 % \put {$\scriptstyle\circ$ zeros } [lb] at 1.5 .4 % \put {$m = 7 $} [lb] at 1.5 0 % \put {$k= 4 $} [lb] at 1.5 -.4 % \put {${\bf D} = $ unit disk} [lb] at 1 -1.2 % \endpicture $$ \vskip6pt $$q(z) = p(z)\cdot\prod_{j=1}^k {\bar z_jz-1 \over z-z_j}$$ $$q(z) = p_1(z)\,\prod_{j=1}^k(\bar z_jz-1)$$ $$p_1(z) = a_m\cdot\prod_{j=k+1}^m (z-z_j)\quad\hbox{(above, $p_1(z)= a_7(z-z_5)(z-z_6)(z-z_7)$)}$$ $$p_1(0) = a_m\cdot\prod_{j=k+1}^m z_j$$ $$|p_1(0)| =
$$|p_1(0)| = |a_m|\cdot\prod_{j=k+1}^m|z_j| \ge |a_m|\cdot 1= |a_m|.$$ Since $q(0) = (-1)^k\,p_1(0)$, this gives $$|q(0)| \ge |a_m|.$$ Because $|q| = |p|$ on the unit circle and $f(z)q(z)$ is holomorphic for $|z|\le 1$, the mean value property yields $${1\over2\pi}\int_0^{2\pi}|f(e^{i\theta})p(e^{i\theta})|\,d\theta \ge {1\over2\pi}\left|\int_0^{2\pi} f(e^{i\theta})q(e^{i\theta})\,d\theta\right| = |f(0)q(0)| \ge |a_mf(0)|.$$ This proves (E.64).\quad\bull \bigskip \proclaim Lemma 3. If $f(z)$ is a holomorphic function of a complex variable $z$ for $|z|\le 1$ and $p(z)$ is a polynomial $$p(z)=a_mz^m + a_{m-1}z^{m-1} + \ldots + a_1z + a_0\eqno{\rm(E.70)}$$ of degree $m$, then $$\left| f(0)\, p^{(k)}(0) \right| \le {m!\over (m-k)!\cdot2\pi}\,\int_0^{2\pi} \left|f\big(e^{i\theta}\big)p\big(e^{i\theta}\big)\right|\,d\theta. \eqno{\rm(E.71)}$$\par \vskip6pt\noindent {\bf Proof:} Let $p$ be a monic polynomial of degree $m$ such that $$\eqalign{p(z)\ &=\ z^m + a_{m-1}z^{m-1} + \ldots + a_1z +a_0\cr &=\ \prod_{j=1}^m (z-z_j),\cr}\eqno{\rm(E.72)}$$ where $\big\{\,z_j\,\big|\,j=1,2,\ldots,m\,\big\}$ are the roots of $p$. Apply the previous lemma to the polynomial of degree $k$ $$\prod_{j=1}^k (z-z_j)\eqno{\rm(E.73)}$$ and the holomorphic (analytic) function $$f(z)\cdot\prod_{j=k+1}^m (z-z_j),\eqno{\rm(E.74)}$$ for some $1\le k\le m$. This yields the formula $$\left| f(0)\prod_{j=k+1}^m z_j\right| \le {1\over2\pi}\,\int_0^{2\pi} \left|f\big(e^{i\theta}\big) p\big(e^{i\theta}\big)\right|\,d\theta.\eqno{\rm(E.75)}$$ We need to consider the derivatives of $p(z)$. $$p'(z) = p^{(1)}(z) = (z-z_2)\cdots(z-z_m) + (z-z_1)(z-z_3)\cdots(z-z_m) +\ldots$$ $$\qquad\ + (z-z_1)(z-z_2)\cdots(z-z_{m-1});\eqno{\rm(E.76)}$$ the first derivative has $m$ terms and each term has $m-1$ factors of the form $(z-z_j)$. The second derivative can be written $$\eqalign{p''(z)\ &=\ p^{(2)}(z)\cr &=\ (z-z_3)\cdots(z-z_m)+(z-z_1)(z-z_4)\cdots(z-z_m)\cr &\qquad +\ldots+(z-z_1)\cdots(z-z_{m-2})+(z-z_2)(z-z_4)\cdots(z-z_m)\cr &\qquad +(z-z_1)(z-z_3)(z-z_5)\cdots(z-z_m)+\dots.\cr}\eqno{\rm(E.77)}$$ The second derivative has $m(m-1)$ terms each of which is a product of $(m-2)$ factors of the form $(z-z_j)$. $$p^{(k)}(z) = (z-z_{k+1})(z-z_{k+2})\cdots(z-z_m) +\ldots +(z-z_1)(z-z_2)\cdots(z-z_{m-k}).\eqno{\rm(E.78)}$$ The $k$th derivative has $m\cdot(m-1)\cdots(m-k+1)$ terms and each term has $m-k$ factors. Recall that $$m\cdot(m-1)\cdots(m-k+1) = {m!\over(m-k)!}.\eqno{\rm(E.79)}$$ Each of the $m!/(m-k)!$ terms has the form $$\prod_{j=k+1}^m \left(z-w_j\right),\eqno{\rm(E.80)}$$ where $\{ w_{k+1},\ldots,w_m\}$ is one of the $m!/(m-k)!$ possible selections of $m-k$ elements from $\{z_1,z_2,\ldots,z_m\}$. Thus, we have that $p^{(k)}(0)$ is the sum of $|L|$ $=$ $m!/(m-k)!$ terms of the form $$(-1)^{m-k}\,\prod_{j=k+1}^m w_{\ell,j},\eqno{\rm(E.81)}$$ one for each $\ell$ in the index set $L$. This yields $$\eqalign{\left|f(0)\cdot p^{(k)}(0)\right|\ &=\ \left|\, f(0)\cdot \sum_{\ell\in L} (-1)^{m-k} \prod_{j=k+1}^m w_{\ell,j}\right|\cr &\le\ \sum_{\ell\in L} \left|\,f(0)\cdot \prod_{j=k+1}^m w_{\ell,j}\right|\cr &\le\ {m!\over(m-k)!}\,\max_{\ell\in L}\left|\,f(0)\cdot\prod_{j=k+1}^m w_{\ell,j}\right|\cr &\le\ {m!\over(m-k)!\cdot2\pi}\,\int_0^{2\pi} \left|f\big(e^{i\theta}\big) p\big(e^{i\theta}\big)\right|\,d\theta,\cr}\eqno{\rm(E.82)}$$ the last step being (E.75) applied to the selected roots $w_{\ell,j}$. For a general polynomial $p$, multiply both sides of the inequality by $|a_m|>0$. This proves the lemma.\quad\bull \bigskip
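As a quick consistency check on Lemma 3, take $k=m$: then $p^{(m)}(z) = m!\,a_m$ is constant, and (E.71) reduces to $$m!\,\left| a_mf(0) \right| \le {m!\over2\pi}\,\int_0^{2\pi} \left|f\big(e^{i\theta}\big)p\big(e^{i\theta}\big)\right|\,d\theta,$$ which is Lemma 2 again. At the other extreme, $k=0$ gives $\left|f(0)p(0)\right| \le {1\over2\pi}\int_0^{2\pi}\left|f\big(e^{i\theta}\big)p\big(e^{i\theta}\big)\right|\,d\theta$, the mean value property applied to the holomorphic function $fp$. \medskip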
\proclaim Lemma 5. Suppose $\tilde q(x)$ $=$ $\big(\sum_{|\alpha|\le k}|q^{(\alpha)}(x)|^2\big)^{1/2}$, where $q^{(\alpha)}(x)=D_x^{\alpha}q(x)$, $k$ is the total degree of $q$, $\varphi\in C_0^{\infty}({\bf R}^n)$, and $\epsilon>0$. Then $$\left|\tilde q(x)\hat\varphi(x)\right| \le C_1'\,\int_{|\zeta|\le\epsilon} \left|\hat\varphi(x+\zeta)\,q(x+\zeta)\right|\,d\zeta.\eqno{\rm(E.83)}$$\par \vskip6pt\noindent {\bf Proof:} Put $q(D) u = v$, where $u\in C_0^{\infty}({\bf R}^n)$. Then $q(\zeta)\hat u(\zeta) = \hat v(\zeta)$. Apply the previous lemma (Lemma 4) with $F(\zeta) = \hat v(\xi+\zeta)$, with $q(\zeta)$ replaced by $q(\xi+\zeta)$ and with $|g(\zeta)|=1$ when $|\zeta|\le\epsilon$ and $=0$ otherwise. Since $\tilde q(\xi)$ $\le$ $\sum_\alpha |D^\alpha q(\xi)|$, we obtain, from the previous lemma, $$\eqalign{\left|\hat u(\xi)\tilde q(\xi)\right| \ &\le\ C_1'\,\int_{|\zeta|\le\epsilon} \left|\hat u(\xi+\zeta)\, q(\xi+\zeta)\right|\,d\zeta\cr &=\ C_1'\,\int_{|\zeta|\le\epsilon} \left|\hat v(\xi+\zeta)\right|\,d\zeta. \cr}\eqno{\rm(E.84)}$$ We then apply Fourier's integral theorem, $$\eqalign{|u(0)| \ &\le\ (2\pi)^{-n/2}\,\int_{{\bf R}^n} |\hat u(\xi)|\,d\xi\cr &\le\ C_1''\,\int_{|\zeta|\le\epsilon}\left(\int_{{\bf R}^n} |\hat v(\xi+\zeta)|/\tilde q(\xi)\,d\xi\right)\,d\zeta\cr &\le\ C_1''\,\int_{{\bf R}^n} \left(\int_{\xi^{\prime2}+\eta^{\prime2}\le\epsilon^2} |\hat v(\xi+\xi'+i\eta')|/\tilde q(\xi)\,d\xi' d\eta'\right)\,d\xi. \cr}\eqno{\rm(E.85)}$$ On the other hand, we have $$\tilde q(\xi+\xi')/\tilde q(\xi) \le C_2'\quad\hbox{when}\quad |\xi'|\le\epsilon,\eqno{\rm(E.86)}$$ because $$D^\alpha q(\xi+\xi') = \sum_\beta {(\xi')^\beta\over\beta !}\, D^{\alpha+\beta} q(\xi),\eqno{\rm(E.87)}$$ so that $|D^\alpha q(\xi+\xi')|/\tilde q(\xi)$ is bounded when $|\xi'|\le\epsilon$. Therefore, $$\eqalign{|u(0)| \ &\le\ C_1''C_2'\,\int_{{\bf R}^n} \left(\int_{\xi^{\prime2}+\eta^{\prime2}\le\epsilon^2} |\hat v(\xi+\xi'+i\eta')|/\tilde q(\xi+\xi')\,d\xi' d\eta'\right) \,d\xi\cr \ &\le\ C_3'\,\Vert v\Vert^{\prime}\cr}\eqno{\rm(E.88)}$$ where $$\Vert v\Vert^{\prime} = \int_{|\eta|\le\epsilon} \left(\int_{{\bf R}^n} |\hat v(\xi+i\eta)|\big/\tilde q(\xi)\,d\xi\right) \,d\eta.\eqno{\rm(E.89)}$$ Recall that $u\in C_0^\infty({\bf R}^n)$. Now observe that $C_3'$ is a constant which depends only on $\epsilon$ (and, of course, on $k$ and $m$, the fixed parameters).\quad\bull \vfill\eject \centerline{\bf References for Appendix E} \vskip24pt\noindent \llap{[1]\quad }Ehrenpreis, Leon; Solutions of some problems of division. {\sl Amer. J. Math.} {\bf 76}, 883--903 (1954)\par \vskip6pt\noindent \llap{[2]\quad }H\"ormander, Lars; Local and global properties of fundamental solutions. {\sl Math. Scand.} {\bf 5}, 27--39 (1957)\par \vskip6pt\noindent \llap{[3]\quad }Lewy, Hans; An example of a smooth linear partial differential equation without solution. {\sl Ann. of Math.} {\bf 66}, 155--158 (1957)\par \vskip6pt\noindent \llap{[4]\quad }Malgrange, B.; Existence et approximation des solutions des \'equations aux d\'eriv\'ees partielles et des \'equations de convolution. {\sl Ann. Inst.
Fourier} {\bf 6}, 271--355 (1955--56).\par \vskip6pt\noindent \llap{[5]\quad }Reed, Michael and Barry Simon; {\sl Methods of Modern Mathematical Physics II: Fourier Analysis, Self-Adjointness} (San Diego: Academic Press, Inc., 1975)\par \vskip6pt\noindent \llap{[6]\quad }Rudin, Walter; {\sl Real and Complex Analysis---Third Edition\/} (NY: McGraw-Hill Book Company, 1987)\par \vskip6pt\noindent \llap{[7]\quad }Yosida, K\^osaku; {\sl Functional Analysis---Sixth Edition\/} (Berlin: Springer-Verlag, 1980)\par \vfill\eject\end %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%% %%%%% %%%%% Chapter 4 %%%%% %%%%% %%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % \headline={\tenrm\hfill Adaptive Quadrature Routine} \centerline{\bf Chapter 4} \bigskip \noindent{\bf\llap{4.0\quad}Introduction.} Sometimes it's not easy to tell just how many subdivisions are needed to do a numerical integration. The reasons for this uncertainty are many. Functions exhibiting bizarre behavior at a point or in a subinterval require special attention. Some continuous functions fail to have derivatives. Moreover, a function can be integrable even if it is not continuous at every point. For these cases and many others it's nice to have some kind of automatic system for determining just when an approximation is ``good enough.'' Fast modern computers may still be too slow. In particular, if a function is defined in terms of an integral, its numerical evaluation may be time consuming; moreover, functions so defined may be prone to significant error propagation. In general, a function defined over an interval may be easier to integrate over some subintervals than over others. The methods which have been devised to deal with this phenomenon are called {\sl adaptive quadrature routines}, or {\sl AQRs}. Adaptive methods make use of error estimates to automatically determine the number of subintervals, $n$, and the interval size $h$. These methods may also use smaller values of the interval size $h$ over subintervals where greater accuracy is needed. There are many systematic ways of doing this. There is one method which is easily programmed (in BASIC), but which is theoretically involved. This method entails comparing Simpson's rule with the so-called trapezoid rule. Another method, not so easily programmed and requiring recursion, compares two different estimates from the Newton--Cotes formulas. In the following paragraphs, we'll develop computer programs for the simplest computers---the programmable calculators---and compare results. \vfill\eject \noindent{\bf\llap{4.1\quad}A Simple Approach.} The best introduction to this subject is to work simple examples and then develop the theory as needed. Excellent numerical results can be obtained using a very simple approach. We are given a function, $f(x)$, an interval $[\alpha,\,\beta\,]$, and an error bound $\epsilon>0$. First, approximate the integral with two (different) approximations: the trapezoid rule and Simpson's rule. Begin with $n=10$ subintervals.
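For reference, with $h=(\beta-\alpha)/n$ and $y_i=f(\alpha+ih)$, the two composite rules used here are the standard formulas $$\eqalign{A_T\ &=\ h\left({y_0\over2} + y_1 + y_2 + \cdots + y_{n-1} + {y_n\over2}\right),\cr A_S\ &=\ {h\over3}\big(y_0 + 4y_1 + 2y_2 + 4y_3 + \cdots + 4y_{n-1} + y_n\big),\cr}$$ the latter requiring an even number of subintervals. (The first program below actually evaluates the trapezoid rule with the doubled step $2h$; nothing essential changes.)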
Second, compute the error estimate (relative) as follows: $$E\ =\ \left| {A_T - A_S \over A_T + A_S} \right|, \eqno(67)$$ where $A_T$ is the approximation obtained from the trapezoid rule and $A_S$ is the value from Simpson's rule. If $E < \epsilon$, then the results are OK. If not, then set $n=20$ ($n=2\cdot n$) subintervals and repeat the process. Continue until either $E < \epsilon$ or $n$ gets too large, say $n > 1000$. This is easy to program and fast to run. The first prototype would look like this: \medskip {\tt\parindent=0pt\obeylines 10 \ DEF FNA(X) = SQR(1.0-SIN(EXP(1)/2){\char94}2*SIN(X){\char94}2) 20 \ N=10 : A=0.0 : B=2.0*ATN(1.0) 30 \ H=(B-A)/N : S = 0.0 : T = 0.0 40 \ FOR I=1 TO N STEP 2 50 \ X=A+(I-1)*H : S = S + FNA(X) : T = T + FNA(X) 60 \ X=A+I*H : T = T + 4.0*FNA(X) 70 \ X=A+(I+1)*H : S = S + FNA(X) : T = T + FNA(X) 80 \ NEXT I : S1 = H*S : S2 = H*T/3.0 : E = ABS((S1-S2)/(S1+S2)) 90 \ IF (E>0.000001) THEN N = 2*N : GOTO 30 100 PRINT "Integral = "; S2, N } \medskip\noindent The answer is given immediately as \smallskip {\tt\parindent=0pt\obeylines Integral = 1.054686 \qquad 20 } \smallskip\noindent From our previous calculations, we know that this is precisely the correct number of subintervals for this precision. It is a good idea to examine an alternative means of coding the same program. It runs faster if we use a {\tt GOSUB} construction instead of the defined function. The answers will be identical and we can consult the table prepared in paragraph 1.2 for numerical results. \medskip {\tt\parindent=0pt\obeylines 10 \ D=0.0001 : INPUT "A,B = ";A,B : N=10 20 \ N=N+10 : H=(B-A)/N : X=A : S=0.0 : T=0.0 30 \ FOR I=1 TO N STEP 2 40 \ GOSUB 100 : S = S + Y*H/3 : T = T + Y*H/2 : X = X + H 50 \ GOSUB 100 : S = S + 4*Y*H/3 : T = T + Y*H : X = X + H 60 \ GOSUB 100 : S = S + Y*H/3 : T = T + Y*H/2 : NEXT I 70 \ IF ABS((S-T)/(S+T)) > D THEN 20 80 \ PRINT "Integral = "; S, N 90 \ PAUSE : INPUT "Error \% = ";D : GOTO 20 100 Y = SQR(1.0-SIN(EXP(1)/2){\char94}2*SIN(X){\char94}2) : RETURN } \medskip\noindent This program gives one the ability to increase the error estimate and thereby determine closely the exact number of steps required. Using this program with a hand-held calculator will give a very good qualitative insight into the workings of the algorithm. The interested reader will later be invited to experiment with other adaptive quadrature routines to determine which is better. \vfill\eject \noindent{\bf\llap{4.2\quad}An Adaptive Quadrature Routine.} Suppose that we have a function $f(x)$ defined over an interval $[a,b]$. We would like to compute the integral close to the actual value. Let's assume that we know that the function is bounded, $|f(x)| \le M$ for all $x\in[a,b]$. Then, for an answer within say 0.01\% of the actual value, we would choose $$\epsilon = {0.0001 \over M + 1}.\eqno(68)$$ With an overall error of $\epsilon$, we want to divide up the interval $[a,b]$ so that the total error of integration is less than $\epsilon$. There is an error formula which depends on the value of the fourth derivative of $f(x)$ on $[a,b]$, but we will depend on another criterion. If we choose Simpson's rule $$A_2 = {h\over3}\big( y_0 + 4y_1 + 2y_2 + 4y_3 + y_4\big),\eqno(69)$$ then the doubled interval rule $$A_1 = {2h\over3}\big( y_0 + 4y_2 + y_4\big),\eqno(70)$$ is a good approximant.
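Before formalizing the error estimate, it helps to see the two numbers side by side. As a purely illustrative case (not one of the examples treated elsewhere in this text), take $f(x)=\sin x$ on $[\,0,\pi/2\,]$ with $h=\pi/8$, so that the true value of the integral is exactly $1$. The five ordinates are $y_0=0$, $y_1=0.38268$, $y_2=0.70711$, $y_3=0.92388$, and $y_4=1$, giving $$\eqalign{A_1\ &=\ {2h\over3}\big(y_0+4y_2+y_4\big)\ \approx\ 1.002280,\cr A_2\ &=\ {h\over3}\big(y_0+4y_1+2y_2+4y_3+y_4\big)\ \approx\ 1.000134.\cr}$$ The discrepancy $|A_2-A_1|\approx0.00215$ is about sixteen times the actual error of $A_2$ (about $0.000134$); the next paragraph turns this comparison into a working error estimate with the factor $15$.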
The estimate of the error is\footnote{\dag}{Francis Scheid, {\it Schaum's Outline Series: Theory and Problems of Numerical Analysis}, 2nd edition (NY: McGraw-Hill Book Company, 1988), page 124.} $$E_2 \simeq {|A_2-A_1| \over 15 }.\eqno(71)$$ This follows from a comparison of the two error estimates $$E_1 = -{(b-a)(2h)^4y^{(4)}(\xi_1)\over180}\qquad E_2 = -{(b-a)h^4y^{(4)}(\xi_2)\over180},\eqno(72)$$ where $E_1$ is the error estimate for Simpson's rule with an interval length of $2h$ and $E_2$ is the error estimate for Simpson's rule with interval length of $h$. If $y^{(4)}$ is nearly constant on $[a,b]$, then $E_1\approx 16E_2$, so that $A_2-A_1 = E_1-E_2 \approx 15E_2$, which is (71). We can accept a value of $A_2$ whenever $|A_2-A_1| \le 15\epsilon/2^k$, where $k$ is the number of times the original interval has been halved. An alternate derivation\footnote{\ddag}{Mike Stimpson, ``Numerical Integration Using Adaptive Quadrature,'' {\it The C Users Journal}, May 1992, page 35.} supposes that $A$ is the actual value of the integral and requires $E_1 = |A-A_1|$, $E_2 = |A-A_2|$, where $A_1$ is Simpson's rule and $A_2$ is the so-called five-point approximation (Simpson's rule with interval length of $h$). From the triangle inequality and the inequality $$E_2 \le {1\over16}\,E_1,\eqno(73)$$ the result follows: $|A_2-A_1| \ge E_1 - E_2 \ge 16E_2 - E_2 = 15E_2$. The programming makes use of a technique known as {\sl recursion}. Recursion occurs when a computer subroutine calls itself. The analogous situation in analysis occurs when a function, say $f(x)=\sin(x)$, is composed with itself to define $g(x) =\sin\big(\sin(x)\big)=f\big(f(x)\big)$. Now we will examine a ``C'' program which uses recursion to do adaptive quadrature: \medskip {\tt\obeylines\parindent=0pt \#include <stdio.h> \#include <stdlib.h> \#include <math.h> double adaptive (double a, double b, double (*f)(double x), double *err); double f(double x); int main() {\char123} \ double a,b,err,int1; \ a = (double) 0.0; \ b = (double) 2.0*atan(1.0); \ err = (double) 0.00000001; \ printf("{\char92}nleft endpoint \ = \%.16lf",a); \ printf("{\char92}nright endpoint = \%.16lf",b); \ printf("{\char92}nerror estimate = \%.16lf",err); \ int1 = adaptive(a, b, f, \&err); \ printf("{\char92}nIntegral \ \ \ \ \ \ = \%.16lf",int1); \ printf("{\char92}nTheoretical \ \ \ = \%.16lf",1.0); \ return(0); {\char125} double f(double x) {\char123} \ double y; \ y = sin(x); \ return (y); {\char125} double adaptive (double a, double b, double (*f)(double x), double *err) {\char123} \ double h, s1, s2, s3, s4, s5, error, err1, err2, t1, t2; \ h = b - a; \ s1 = (*f)(a); s2 = (*f)(a+0.25*h); s3 = (*f)(a+0.5*h); \ s4 = (*f)(b-0.25*h); s5 = (*f)(b); \ t1 = h*(s1+4.0*s3+s5)/6.0; \ t2 = h*(s1+4.0*s2+2.0*s3+4.0*s4+s5)/12.0; \ error = fabs(t1-t2)/15.0; \ if (error < *err) {\char123} \quad *err = error; return(t2); \quad {\char125} \ else {\char123} \quad err1 = err2 = *err/2.0; t2 = adaptive(a, a+0.5*h, f, \&err1); \quad t2 += adaptive(a+0.5*h, b, f, \&err2); return(t2); \quad {\char125} {\char125} /* End of File */ } \medskip \noindent The output of the program consists of the endpoints, the error estimate (maximum), the computed answer and the theoretical answer. Here, we know that $\int_0^{\pi/2} \sin(x)\,dx = 1$. The quadrature should return an answer close to 1; we see that it does. \medskip {\tt\obeylines left endpoint \ = 0.0000000000000000 right endpoint = 1.5707963267948970 error estimate = 0.0000000100000000 Integral \ \ \ \ \ \ = 1.00000000025973520 Theoretical \ \ \ = 1.0000000000000000 } % \medskip \noindent While the adaptive quadrature routine (AQR) is impressive, it often fails to evaluate improper integrals. If the integrand of an improper integral experiences an anomaly (``blows up'') at a point, it is necessary to compute successive approximations for values approaching that point.
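A minimal, self-contained sketch in ``C'' of that successive-approximation strategy follows; it is not one of this chapter's listings. The recursive routine {\tt quad} is a condensed variant of {\tt adaptive} above (the same five-point versus three-point test with the factor 15), and the integrand $1/\sqrt{x}$ on $(0,1\,]$, whose improper integral equals $2$, is assumed only as an example of an endpoint singularity. \medskip {\tt\obeylines\parindent=0pt
\#include <stdio.h>
\#include <math.h>
/* Recursive Simpson quadrature: accept the five-point value t2 when */
/* it agrees with the three-point value t1 to within 15*tol;         */
/* otherwise halve the interval and the tolerance and recurse.       */
double quad (double a, double b, double (*f)(double x), double tol)
{\char123}
\ double h, t1, t2;
\ h = b - a;
\ t1 = h*((*f)(a) + 4.0*(*f)(a+0.5*h) + (*f)(b))/6.0;
\ t2 = h*((*f)(a) + 4.0*(*f)(a+0.25*h) + 2.0*(*f)(a+0.5*h)
\qquad + 4.0*(*f)(b-0.25*h) + (*f)(b))/12.0;
\ if (fabs(t2-t1) <= 15.0*tol) return(t2);
\ return(quad(a, a+0.5*h, f, 0.5*tol) + quad(a+0.5*h, b, f, 0.5*tol));
{\char125}
double g(double x)
{\char123}
\ return(1.0/sqrt(x)); \quad /* the integrand blows up at x = 0 */
{\char125}
int main()
{\char123}
\ double a;
\ for (a = 0.01; a >= 0.9e-6; a = 0.1*a) /* back away from the singular point */
\quad printf("{\char92}na = \%.8f \ integral on [a,1] = \%.6f", a, quad(a, 1.0, g, 1.0e-9));
\ return(0);
{\char125}
} \medskip \noindent Each pass integrates over the proper interval $[\,a,1\,]$, whose exact value is $2-2\sqrt{a}$; as $a$ marches toward the singular point, the printed values approach the improper value $2$.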
There is no ``quick and easy'' solution to improper integrals. One cardinal rule is to graph the function first. It is frequently possible to determine the correct approach from a graph when so-called adaptive methods fail. \vfill\eject %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%% %%%%% %%%% Chapter 5---Metafont %%%%% %%%% %%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \headline={\tenrm\hfill METAFONT} \centerline{\bf Chapter 5} \bigskip \noindent{\bf\llap{5.0\quad}Introduction.} The creation of fonts is usually something not approached lightly. With the typesetting software of \TeX\ this is especially true. The problem of merging text with graphics has been resolved in the earlier chapters using the macros of \PiCTeX, with varying degrees of success. Now it is necessary to investigate the more sophisticated means of doing graphics. \PiCTeX\ can be compared to assembly language; {\bf METAFONT} to machine code. \PiCTeX\ is straightforward and easy to use; however, it uses a great deal of random access memory (RAM) and it is slow. {\bf METAFONT}, on the other hand, creates a font which is read in directly. It uses little memory and is fast. There is a price which must be paid for using {\bf METAFONT}---it has to be prepared in its own language and pre-processed through its own processor. \medskip \noindent The public domain version of {\bf METAFONT} does not include a user-friendly view program. It is necessary to produce files and then process them through a \TeX\ program named {\tt testfont.tex}. The program {\tt testfont.tex} is well-written and clever. It produces excellent output; however, one must have a commercial previewer to use it on-line. Otherwise, the output must be printed on a printer. The public domain previewer, {\tt cdvi12.arc}, only employs the sixteen basic \TeX\ fonts; {\bf METAFONT} actually produces {\sl new\/} fonts---that is, the graphics from {\bf METAFONT} are fonts. Each graph is actually a single letter or blocks of letters ``tiled'' together. Donald E. Knuth is correct in saying, ``Geometric designs are rather easy; $\dots$.'' {\bf METAFONT} does allow the user a great deal of freedom in constructing graphs. There are pitfalls, however, and some experimentation is needed to construct the desired output. \medskip \noindent In {\bf METAFONT}, the subscripted variables $x_1$ and $y_1$ may be written either as $x[1]$, $y[1]$ or as $x1$, $y1$; likewise, $x_2$, $y_2$ may be written either as $x[2]$, $y[2]$ or $x2$, $y2$, and so on. The letter $z$ stands for the ordered pair $(x,y)$ (just as is the case in complex analysis). Thus $z_1$ is equivalent to $(x_1, y_1)$, $z_2$ is equivalent to $(x_2, y_2)$, etc. When we write $z3$ we mean $(x3, y3)$ or $(x[3], y[3])$. This shorthand aids in construction. For those who recall FORTRAN (the FORmula TRANslator of the 1960's), the symbol {\tt **} meant exponentiation. $x${\tt **}$y$ meant $x^y$.
This was the first encounter many of us had with a ``two character'' operator. (The binary operators $+$, $-$, $*$ (times), and $/$ were well understood.) In BASIC (Beginner's All-purpose Symbolic Instruction Code), FORTRAN's {\tt **} has been replaced by the single character operator {\tt \char94}. {\bf METAFONT} has re-introduced the two character operator. This time $z1${\tt..}$z2$ means draw a (curved) line from $z1$ ($z[1]$) to $z2$ ($z[2]$) and $z1${\tt--}$z2$ means draw a straight line from $z1$ to $z2$. The symbol ``$:=$'' means assignment (replacement of the current value), while ``$=$'' asserts an equation to be solved. Variables followed by the hash mark (American pound sign) ({\tt\#}) are called ``sharped'' and hold ``true,'' resolution-independent values, as opposed to unsharped variables, whose values are in pixels and therefore depend on the resolution of the output device. \vfill\eject \noindent{\bf\llap{5.1\quad}A Metafont File.} We will begin by listing a simple {\bf METAFONT} file that will produce a small Gaussian distribution. \medskip \noindent {\tt\obeylines\parindent=0pt\ttraggedright \% The statement "mode\_setup" adapts METAFONT to the current task. mode\_setup; \ em\#:=30pt\#; cap\#:=10pt\#; \% The statement "define\_pixels( , )" converts the values \% \quad of its (two) arguments into pixel (picture element) units. define\_pixels(em, cap); \% The "beginchar" operation assigns values to the variables \% \quad w, h, and d, which represent the width, height, \% \quad and depth of the current character's bounding rectangle. beginchar("A",em\#,em\#,0); "A for Gaussian Distribution"; \ pickup pencircle scaled 0.4pt; \quad \% Use a pen of diameter 0.4 point. \ z1=(0,0.39894b); \ z2=(0.1a,0.39695b); \ z3=(0.2a,0.39104b); \ z4=(0.3a,0.38138b); \ z5=(0.4a,0.36827b); \ z6=(0.5a,0.35206b); \ z7=(0.6a,0.33322b); \ z8=(0.7a,0.31225b); \ z9=(0.8a,0.28969b); \ z10=(0.9a,0.26608b); \ z11=(a,0.24197b); \ z12=(1.2a,0.19418b); \ z13=(1.4a,0.14972b); \ z14=(1.6a,0.11092b); \ z15=(1.8a,0.07895b); \ z16=(2a,0.05399b); \ z17=(2.5a,0.01752b); \ z18=(3a,0.00443b); \ z19=(3.5a,0.00087b); \ z20=(4a,0.0001338b); \ a=w; b=8h; \ draw z1..z2..z3..z4..z5..z6..z7..z8..z9..z10 \qquad ..z11..z12..z13..z14..z15..z16..z17..z18..z19..z20; \ for k=2 step 1 until 20: \ z[20+k]=(-x[k],y[k]); endfor \ draw z1..z22..z23..z24..z25..z26..z27..z28..z29..z30 \qquad ..z31..z32..z33..z34..z35..z36..z37..z38..z39..z40; \ draw (0,-0.02b)--(0,0); \ draw (1a,-0.02b)--(1a,0); \ draw (2a,-0.02b)--(2a,0); \ draw (3a,-0.02b)--(3a,0); \ draw (4a,-0.02b)--(4a,0); \ draw (-1a,-0.02b)--(-1a,0); \ draw (-2a,-0.02b)--(-2a,0); \ draw (-3a,-0.02b)--(-3a,0); \ draw (-4a,-0.02b)--(-4a,0); \ draw (-4a,0)--(4a,0); \ draw (1.5a,0)--(1.5a,0.12952b); \ endchar; \quad \% Write character to output file. } \medskip\noindent It is not possible to display the above graph using the public domain view program {\bf CDVI12.arc}. {\tt CDVI-2.com} and the other previewers use only the sixteen basic \TeX\ fonts. Some discipline is needed to use public domain software; it is not as flexible as the commercially procured products. However, there is much to be said for the persons and organizations that devote their time and energy to the creation of public domain instruments. Without such volunteerism, many scholars and scientists would simply be unable to afford needed tools.
Certain activities are not able to budget for commercial software; governmental agencies are only able to procure software within the strict limits of their charters---and then only with ADPE (Automatic Data Processing Equipment) approval from their MIS (Management Information System) directors. Public domain materials are an invaluable tool for research and development. \medskip\noindent Since a public domain \TeX\ previewer is to be employed and only sixteen fonts are allowed, one cannot view a newly created font. The graphs produced by {\bf METAFONT} are, essentially, new fonts. In particular, each graph is one letter in a font. Some display of the {\bf METAFONT} graph is needed; therefore, a \PiCTeX\ picture is provided to give the reader an idea of the construction. %\vbox{\narrower\narrower\narrower\noindent{\bf 25.} The main point here %is that you can't have \PiCTeX\ draw a curve by simply giving it the formula %for that curve --- you have to supply explicit coordinate points. Here %is the code that produced the figure.} % \bigskip \centerline{% \beginpicture % \setcoordinatesystem units <.5in,2.5in> % \setplotarea x from -3 to 3, y from 0 to .4 % \plotheading {\lines {% The density $\varphi(\zeta) = e^{-\zeta^2\!/2}/\sqrt{2\pi}$ of the\cr % standard normal distribution.\cr}} % \axis bottom ticks numbered from -3 to 3 by 1 % length <0pt> withvalues $\zeta$ / at 1.5 / / % \linethickness=.25pt % \putrule from 1.5 0 to 1.5 .12952 % (.12952 = density at 1.5) \setbox0 = \hbox{$swarrow$}% \put {$\swarrow$ \raise6pt\hbox{$\varphi(\zeta)$}} % [bl] at 1.5 .12952 % \ifexpressmode \put {\bf EXPRESSMODE} at 0 .2 % \else \setquadratic \plot 0.0 .39894 0.16667 .39344 0.33333 .37738 0.5 .35207 0.66667 .31945 0.83333 .28191 1. .24197 1.25 .18265 1.5 .12952 1.75 .08628 2. .05399 2.25 .03174 2.5 .01753 2.75 .00909 3.0 .00443 / \setquadratic \plot 0.0 .39894