That is, the integral from -infinity to +infinity of e^(-x^2) dx = sqrt(pi).
I remember being given this as an exercise and just being totally shocked by how beautiful it was as a result (when I eventually managed to work out how to evaluate it).
It's the gateway drug to Laplace's method (Laplace approximation), mean field theory, perturbation theory, ... QFT.
The Wikipedia link would have made things quite clear :)
What is also worth pointing out, and which was somewhat glossed over, is the close connection between the weight function and the polynomials. Different weight functions give you different classes of orthogonal polynomials. "Orthogonal" has to be understood with respect to the scalar product defined by integrating against the weight function.
Interestingly, Gauss-Hermite integrates over the entire real line, from -infinity to infinity. So the choice of weight function also influences the choice of integration domain.
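To make the weight-function point concrete, here is a minimal sketch using NumPy's Gauss-Hermite rule. It assumes the physicists' Hermite polynomials and the weight w(x) = e^(-x^2); the node count of 10 and the helper name `hermite` are arbitrary choices for illustration.

```python
import math
import numpy as np
from numpy.polynomial.hermite import hermgauss, hermval

# Nodes and weights for the weight function w(x) = exp(-x^2) on (-inf, inf).
nodes, weights = hermgauss(10)

# Integrating f(x) = 1 against the weight recovers the Gaussian integral.
gaussian_integral = weights.sum()

def hermite(n, x):
    """Evaluate the physicists' Hermite polynomial H_n at x (illustrative helper)."""
    coeffs = np.zeros(n + 1)
    coeffs[n] = 1.0
    return hermval(x, coeffs)

# Weighted scalar products <H_m, H_n> = integral of H_m(x) H_n(x) exp(-x^2) dx.
inner_2_3 = np.sum(weights * hermite(2, nodes) * hermite(3, nodes))
inner_3_3 = np.sum(weights * hermite(3, nodes) * hermite(3, nodes))

print(gaussian_integral, math.sqrt(math.pi))
print(inner_2_3, inner_3_3)
```

The sum of the weights should match sqrt(pi), the scalar product of H_2 and H_3 should vanish up to rounding, and the scalar product of H_3 with itself should not.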
Like, is it possible to infer that Chebyshev polynomials would be useful in approximation theory using only the fact that they're orthogonal wrt the Wigner semicircle (U_n) or arcsine (T_n) distribution?
The weight function shows the Chebyshev polynomials' relation to the Fourier series. But they are not what you would usually think of as a good candidate for L2 approximation on the interval. Normally you'd use Legendre polynomials, since they have w = 1, but they are a much less convenient basis than Chebyshev for numerics.
But I guess what I was asking was: is there some kind of abstract argument why the semicircle distribution would be appropriate in this context?
For example, you have abstract arguments like the central limit theorem that explain (in some loose sense) why the normal distribution is everywhere.
I guess the semicircle might more-or-less be the only way to get something where interpolation uses the DFT (by projecting points evenly spaced on the complex unit circle onto [-1, 1]), but I dunno, that motivation feels too many steps removed.
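The projection picture can be checked in a few lines. This is only a sketch under stated assumptions: the degree N = 8 is arbitrary, the helper `chebyshev_t` is hypothetical, and the points used are the ones on the upper half of the unit circle (which project to the Type-II Chebyshev nodes).

```python
import cmath
import math

N = 8  # polynomial degree; an arbitrary choice for illustration

# Points evenly spaced on the upper half of the complex unit circle.
circle_points = [cmath.exp(1j * math.pi * k / N) for k in range(N + 1)]

# Projecting onto the real axis gives the N+1 Type-II Chebyshev nodes cos(pi*k/N).
nodes = [z.real for z in circle_points]

def chebyshev_t(n, x):
    """Evaluate T_n(x) on [-1, 1] via T_n(cos(theta)) = cos(n*theta)."""
    x = max(-1.0, min(1.0, x))  # guard against rounding just outside [-1, 1]
    return math.cos(n * math.acos(x))

# On the circle, T_n(Re z) = Re(z^n), so a Chebyshev series in x
# is a cosine (Fourier) series in the angle theta.
max_mismatch = max(
    abs(chebyshev_t(n, z.real) - (z ** n).real)
    for z in circle_points
    for n in range(N + 1)
)

print(nodes)
print(max_mismatch)
```

The nodes cluster toward the endpoints of [-1, 1], and the mismatch should be at the level of rounding error.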
But your last paragraph is exactly it... it is a "basic" fact but the consequences are profound.
If you are familiar with the Fourier series, the same principle can be applied to approximating with polynomials.
In both cases the crucial point is that you can form an orthogonal subspace, onto which you can project the function to be approximated.
For polynomials it is this: https://en.m.wikipedia.org/wiki/Polynomial_chaos
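As a rough illustration of projecting onto an orthogonal polynomial subspace, here is a sketch with Legendre polynomials (weight w = 1 on [-1, 1]). The target function exp, the degree 3, the 50-point quadrature rule, and the helper names `inner` and `legendre` are all assumptions made for the example.

```python
import numpy as np
from numpy.polynomial.legendre import leggauss, legval

# Gauss-Legendre quadrature approximates integrals over [-1, 1] with weight 1.
nodes, weights = leggauss(50)

def inner(u, v):
    """Scalar product <u, v> = integral of u(x) v(x) dx, given samples at the nodes."""
    return np.sum(weights * u * v)

def legendre(n, x):
    """Evaluate the Legendre polynomial P_n at x (illustrative helper)."""
    coeffs = np.zeros(n + 1)
    coeffs[n] = 1.0
    return legval(x, coeffs)

f_vals = np.exp(nodes)  # samples of the function to approximate
degree = 3

# Projection coefficients: c_n = <f, P_n> / <P_n, P_n>.
proj_coeffs = np.array([
    inner(f_vals, legendre(n, nodes)) / inner(legendre(n, nodes), legendre(n, nodes))
    for n in range(degree + 1)
])

# The residual f - projection should be orthogonal to every basis polynomial.
residual = f_vals - legval(nodes, proj_coeffs)
residual_inner = [inner(residual, legendre(n, nodes)) for n in range(degree + 1)]

print(proj_coeffs)
print(residual_inner)
```

The entries of `residual_inner` should all be zero up to rounding, which is what makes this an orthogonal projection rather than just some approximation.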
There are polynomial bases that aren't orthogonal but are suitable for numerics: both the Bernstein basis and the monomial basis are used very often and neither is orthogonal. (Well, you could pick a weight function that makes them orthogonal, but...!)
The fact of their orthogonality is crucial, but when you work with Chebyshev polynomials, it is very unlikely you are doing an orthogonal (L2) projection! Instead, you would normally use Chebyshev interpolation: 1) interpolate at either the Type-I or Type-II Chebyshev nodes, 2) use the DCT to compute the Chebyshev series coefficients. The fact that you can do this is related to the weight function, but it isn't an L2 procedure. Like I mentioned in my other post, the Chebyshev weight function is maybe more of an artifact of the Chebyshev polynomials' intimate relation to the Fourier series.
I am also not totally sure what polynomial chaos has to do with any of this. PC is a term of art in uncertainty quantification, and this is all just basic numerical analysis. If you have a series in orthogonal polynomials and you want to call it something fancy, you might call it a Fourier series, but usually there is no fancy term...
In this case it is about the principle of approximation by orthogonal projection, which is quite common in different fields of mathematics. Here you create an approximation of a target by projecting it onto an orthogonal subspace. This is what the Fourier series is about, an orthogonal projection. Choosing e.g. the Chebyshev polynomials instead of the complex exponentials gives you an approximation in the orthogonal space spanned by e.g. Chebyshev polynomials.
The same principle applies e.g. when you are computing an SVD for a low rank approximation. That is another case of orthogonal projection.
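A small sketch of the SVD case, assuming a random 6x4 matrix and rank k = 2 purely for illustration: the truncated SVD coincides with projecting the matrix onto the span of its top k left singular vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))  # arbitrary test matrix

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2  # target rank, chosen for illustration

# Rank-k approximation from the truncated SVD.
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Orthogonal projector onto the span of the top-k left singular vectors.
P = U[:, :k] @ U[:, :k].T
projected = P @ A

print(np.allclose(A_k, projected))
```

Here `P` is symmetric and idempotent, so applying it is an orthogonal projection in the usual linear algebra sense.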
>Instead, you would normally use Chebyshev interpolation
What you do not understand is that this is the same thing. The distinction you describe does not exist; these are the same things, just different perspectives. That they are the same easily follows from the uniqueness of polynomials, which are fully determined by their interpolation points. These aren't distinct ideas; there is a greater principle behind them, and the fact that you are using some other algorithm to compute the approximation does not matter at all.
>I am also not totally sure what polynomial chaos has to do with any of this.
It is the exact same thing. Projection onto an orthogonal subspace of polynomials. Just that you choose the polynomials with regard to a random variable. So you get an approximation with good statistical properties.
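For what it's worth, a one-dimensional version of this can be sketched as follows. It assumes a standard normal random variable X, the probabilists' Hermite polynomials He_n, and the example function g(X) = exp(X); the truncation order 6, the 40-point rule, and the helper name `hermite_e` are arbitrary illustrative choices.

```python
import math
import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval

# Quadrature for the weight exp(-x^2/2); rescale so the weights sum to 1
# and the rule computes expectations with respect to N(0, 1).
nodes, weights = hermegauss(40)
weights = weights / math.sqrt(2.0 * math.pi)

def hermite_e(n, x):
    """Evaluate the probabilists' Hermite polynomial He_n at x (illustrative helper)."""
    coeffs = np.zeros(n + 1)
    coeffs[n] = 1.0
    return hermeval(x, coeffs)

g = np.exp   # example quantity of interest g(X), an assumption for this sketch
order = 6    # truncation order of the expansion

# Projection coefficients c_n = E[g(X) He_n(X)] / E[He_n(X)^2], with E[He_n^2] = n!.
pc_coeffs = [
    np.sum(weights * g(nodes) * hermite_e(n, nodes)) / math.factorial(n)
    for n in range(order + 1)
]

# Statistics fall out of the coefficients by orthogonality.
mean_estimate = pc_coeffs[0]
variance_estimate = sum(
    c ** 2 * math.factorial(n) for n, c in enumerate(pc_coeffs) if n >= 1
)

print(mean_estimate, variance_estimate)
```

The mean is exactly the zeroth coefficient, and the variance estimate improves as the truncation order grows.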
> What you do not understand is that this is the same thing.
It is not the same thing.
You can express an analytic function f(x) in a convergent (on [-1, 1]) Chebyshev series: f(x) = \sum_{n=0}^\infty a_n T_n(x). You can then truncate it keeping N+1 terms, giving a degree N polynomial. Call it f_N.
Alternatively, you can interpolate f at N+1 Chebyshev nodes and use a DCT to compute the corresponding Chebyshev series coefficients. Call the resulting polynomial p_N.
In general, f_N and p_N are not the same polynomial.
Furthermore, computing the coefficients of f_N is much more expensive than computing the coefficients of p_N. For f_N, you need to evaluate N+1 integrals, which may be quite expensive indeed if you want to get digits. For p_N, you simply evaluate f at N+1 nodes, compute a DCT in O(N log N) time, and the result is the coefficients of p_N up to rounding error.
In practice, people do not compute the coefficients of f_N, they compute the coefficients of p_N. Nevertheless, f_N and p_N are essentially as good as each other when it comes to approximation.
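Here is a small sketch of the difference between f_N and p_N, in plain Python. The assumptions: f = exp, N = 4, Type-I nodes, a direct O(N^2) cosine sum in place of a fast DCT, and a 200-node sum used as a stand-in for the exact coefficient integrals (for an analytic f this should be accurate to roughly machine precision, but it is not adaptive integration). The function names are hypothetical.

```python
import math

def chebyshev_coefficients(f, num_nodes, num_coeffs):
    """Chebyshev coefficients from samples at Type-I nodes.

    This is a direct cosine sum (a DCT-II written out naively), not a fast transform.
    """
    thetas = [math.pi * (j + 0.5) / num_nodes for j in range(num_nodes)]
    samples = [f(math.cos(t)) for t in thetas]
    coeffs = []
    for n in range(num_coeffs):
        total = sum(s * math.cos(n * t) for s, t in zip(samples, thetas))
        scale = 1.0 if n == 0 else 2.0
        coeffs.append(scale * total / num_nodes)
    return coeffs

def evaluate_series(coeffs, x):
    """Evaluate sum_n coeffs[n] * T_n(x) using T_n(cos(theta)) = cos(n*theta)."""
    theta = math.acos(x)
    return sum(c * math.cos(n * theta) for n, c in enumerate(coeffs))

f = math.exp
N = 4

# p_N: interpolant through N+1 Type-I Chebyshev nodes (N+1 function evaluations).
p_coeffs = chebyshev_coefficients(f, N + 1, N + 1)

# f_N: truncated Chebyshev series; coefficients approximated with many more nodes.
f_coeffs = chebyshev_coefficients(f, 200, N + 1)

coeff_gap = max(abs(a - b) for a, b in zip(p_coeffs, f_coeffs))

grid = [-1.0 + 2.0 * i / 400 for i in range(401)]
err_p = max(abs(f(x) - evaluate_series(p_coeffs, x)) for x in grid)
err_f = max(abs(f(x) - evaluate_series(f_coeffs, x)) for x in grid)

print(coeff_gap, err_p, err_f)
```

The coefficients differ by an amount well above rounding error (aliasing from the discarded higher-order terms), yet the two maximum errors on [-1, 1] should be of the same order of magnitude.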
If you would like to read what I'm saying but from a more authoritative reference that you feel you can trust, you can just take a look at Trefethen's "Approximation Theory and Approximation Practice". I'm just quoting contents of Chapter 4 at you.
Again, like I said in my first response to you, what you're saying isn't wrong, it just misses the mark a bit. If you want to compute the L2 projection of a function onto the orthogonal subspace of degree N Chebyshev polynomials, you would need to evaluate a rather expensive integral to compute the coefficients. It's expensive because it requires the use of adaptive integration... many function evaluations per coefficient! Bad!
On the other hand, you could just do polynomial interpolation using either set of degree N Chebyshev nodes (Type-I or Type-II). This requires only N+1 function evaluations. Only one function evaluation per coefficient. Good!
And, again, since the polynomial so constructed is not the same polynomial as the one obtained via L2 projection mentioned in paragraph 3 above, this interpolation procedure cannot be regarded as a projection! I guess you could call it an "approximate projection". It agrees quite closely with the L2 projection, and has essentially the same approximation power. This is why Chebyshev polynomials are so useful in practice for approximation, and why e.g. Legendre polynomials are much less useful (they do not have a convenient fast transform).
Anyway, I hope this helps! It's a beautiful subject and a lot of fun to work on.
It would be better shown as a table with 3 numbers. Or, maybe two columns, one for integral value and one for error, as you suggest.