To some degree all of these questions require knowing the spectra of atoms, which can in theory be calculated by quantum mechanics. However, calculating these spectra for arbitrary systems from first principles is prohibitively difficult and computationally intensive (which is why techniques such as Density Functional Theory are used).
This post roughly outlines how to calculate the spectra of the smaller atoms by explicitly diagonalising a matrix whose elements are simple combinatorial quantities.
The non-relativistic Hamiltonian of an n-electron atom in the Born-Oppenheimer approximation, in SI units, is

$$H = \sum_{i=1}^{n} \left( \frac{p_i^2}{2m} - \frac{Z e^2}{4\pi\epsilon_0 r_i} \right) + \sum_{i<j} \frac{e^2}{4\pi\epsilon_0 r_{ij}},$$

where $p_i$ is the momentum of the ith electron, $r_i$ is its distance from the nucleus, $r_{ij}$ is the distance between the ith and jth electrons, m is the mass of an electron, Z is the atomic number (so the nucleus has charge Ze), and e is the elementary charge.
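To simplify matters, choose units such that $\hbar = m = 1$ and $e^2/4\pi\epsilon_0 = 1$ (atomic units); these will be used in the rest of this article. Then

$$H = \sum_{i=1}^{n} \frac{p_i^2}{2} - \sum_{i=1}^{n} \frac{Z}{r_i} + \sum_{i<j} \frac{1}{r_{ij}}.$$

The terms correspond to the kinetic energy of the electrons, the electron-nucleus interaction and the electron-electron interaction respectively. If we neglect the third term, the electrons decouple and each one satisfies the equation of a Hydrogenic atom, which can be solved algebraically.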
Our approach is to calculate the elements of the Hamiltonian matrix in the Hydrogenic basis. We can then explicitly diagonalise the matrix in this basis. If the electron-electron term is small, the matrix will be almost diagonal. I will only cover the bound states. The unbound states will need to be considered eventually, but at least near the ground state their contribution should be negligible.
The Hydrogenic atom can be diagonalised in a number of different bases (corresponding to different coordinate systems), and we need a basis whose symmetry is preserved by the perturbation. Since the perturbed Hamiltonian is spherically symmetric, we choose the simultaneous eigenbasis of the Hydrogenic Hamiltonian $H_0$, $L^2$ and $L_z$ (together with the spin $S_z$). The bound states are characterised by the quantum numbers n, l, m, s, with corresponding eigenvalues $-Z^2/2n^2$, $l(l+1)$, $m$ and $s = \pm 1/2$. The electronic states of an n-electron atom, neglecting electron-electron interactions, are then the antisymmetric tensor products (Slater determinants) of these states.
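The following Python sketch makes the basis concrete. The language and function names are my own choice, not part of the original method. It lists the single-electron states with $n \le n_{\max}$ together with their energies $-Z^2/2n^2$. It then forms the antisymmetric two-electron states as unordered pairs of distinct single-electron states, which is what the Pauli principle leaves of the tensor product:

```python
from fractions import Fraction
from itertools import combinations

def hydrogenic_basis(n_max, Z=1):
    """Single-electron states (n, l, m, s) with n <= n_max, paired with the
    Hydrogenic energy -Z^2 / (2 n^2) in atomic units."""
    states = []
    for n in range(1, n_max + 1):
        energy = Fraction(-Z * Z, 2 * n * n)
        for l in range(n):                      # 0 <= l < n
            for m in range(-l, l + 1):          # -l <= m <= l
                for s in (Fraction(-1, 2), Fraction(1, 2)):
                    states.append(((n, l, m, s), energy))
    return states

def two_electron_states(n_max, Z=1):
    """Antisymmetric two-electron states: unordered pairs of distinct orbitals.
    The unperturbed energy is the sum of the two single-electron energies."""
    return [((a, b), ea + eb)
            for (a, ea), (b, eb) in combinations(hydrogenic_basis(n_max, Z), 2)]

print(len(hydrogenic_basis(2)))      # 2 + 8 = 10 single-electron states
print(two_electron_states(1, Z=2))   # only the 1s^2 configuration, with energy -4
```

Each level n contributes $2n^2$ states, so a truncated basis grows quickly with $n_{\max}$.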
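We now proceed to calculate the matrix elements of the full Hamiltonian in this basis. The only non-trivial parts of the calculation are the electron-electron terms $1/r_{st}$. The spins, and the states of the electrons other than s and t, factor through and give Kronecker deltas. The remaining calculation is the two-electron matrix element $\langle n_1 l_1 m_1, n_2 l_2 m_2 | 1/r_{12} | n_1' l_1' m_1', n_2' l_2' m_2' \rangle$. Since $1/r_{12}$ commutes with the total $L^2$ and $L_z = L_{1z} + L_{2z}$, this element is also proportional to $\delta_{m_1 + m_2,\, m_1' + m_2'}$.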
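Integrate in spherical coordinates over $\mathbf{r}_1$ first, then over $\mathbf{r}_2$, setting the z-axis of the second coordinate system along the vector $\mathbf{r}_1$. Then $\theta_{12}$, the polar angle of $\mathbf{r}_2$, is the angle between the two electrons at the nucleus, and $r_{12} = \sqrt{r_1^2 + r_2^2 - 2 r_1 r_2 \cos\theta_{12}}$. Consequently the first solid angle integral is trivial.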
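Thus we just need to evaluate

$$\int\!\!\int \psi^*_{n_1 l_1 m_1}(\mathbf{r}_1)\, \psi^*_{n_2 l_2 m_2}(\mathbf{r}_2)\, \frac{1}{r_{12}}\, \psi_{n_1' l_1' m_1'}(\mathbf{r}_1)\, \psi_{n_2' l_2' m_2'}(\mathbf{r}_2)\, d^3\mathbf{r}_1\, d^3\mathbf{r}_2$$

Notice that the integral must be invariant under interchange of all 1 labels with 2 labels. In practice we make whichever choice makes the integral easiest.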
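We now separate the integral into two regions. Where $r_2 < r_1$, the term $1/r_{12}$ can be expanded as

$$\frac{1}{r_{12}} = \frac{1}{r_1} \sum_{k=0}^{\infty} \left(\frac{r_2}{r_1}\right)^k P_k(\cos\theta_{12}),$$

where $P_k$ is a Legendre polynomial. In the other region we swap $r_1$ with $r_2$.
The integral over the region $r_2 < r_1$ then becomes

$$\sum_{k=0}^{\infty} \left[ \int_0^\infty \!\! \int_0^{r_1} R_{n_1 l_1}(r_1) R_{n_1' l_1'}(r_1) R_{n_2 l_2}(r_2) R_{n_2' l_2'}(r_2)\, \frac{r_2^k}{r_1^{k+1}}\, r_1^2 r_2^2 \, dr_2\, dr_1 \right] \left[ \int\!\!\int Y^*_{l_1 m_1}(\Omega_1)\, Y^*_{l_2 m_2}(\Omega_2)\, P_k(\cos\theta_{12})\, Y_{l_1' m_1'}(\Omega_1)\, Y_{l_2' m_2'}(\Omega_2)\, d\Omega_1\, d\Omega_2 \right],$$

where $R_{nl}$ are the (real) Hydrogenic radial functions and $Y_{lm}$ are the spherical harmonics. To this we add the corresponding integral with $r_1$ and $r_2$ swapped (after a fiddly change of coordinates).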
The angular integral is a combinatorial quantity, which can be expressed in terms of Clebsch-Gordan coefficients. These satisfy recurrence relations that can be used to compute this part of the integral. [In fact there is an explicit combinatorial representation, although in practice it would be quicker to compute it using recurrence.]
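The text's route is via Clebsch-Gordan recurrences. As a swapped-in alternative, here is a minimal Python sketch of the explicit route. It uses the Racah formula for the Wigner 3j symbol, which is equivalent to a Clebsch-Gordan coefficient up to a phase and a normalisation. It also uses Gaunt's formula for the integral of three spherical harmonics. Expanding $P_k(\cos\theta_{12})$ with the spherical harmonic addition theorem reduces the angular factor to products of such Gaunt integrals. The sketch assumes integer angular momenta only, and the function names are mine:

```python
from fractions import Fraction
from math import factorial, pi, sqrt

def wigner_3j(j1, j2, j3, m1, m2, m3):
    """Wigner 3j symbol for integer arguments, via the explicit Racah formula."""
    if m1 + m2 + m3 != 0 or not abs(j1 - j2) <= j3 <= j1 + j2:
        return 0.0
    if abs(m1) > j1 or abs(m2) > j2 or abs(m3) > j3:
        return 0.0
    # Triangle coefficient Delta(j1 j2 j3)
    delta = Fraction(factorial(j1 + j2 - j3) * factorial(j1 - j2 + j3) * factorial(j2 + j3 - j1),
                     factorial(j1 + j2 + j3 + 1))
    norm = (factorial(j1 + m1) * factorial(j1 - m1) * factorial(j2 + m2)
            * factorial(j2 - m2) * factorial(j3 + m3) * factorial(j3 - m3))
    # Sum over every k for which all factorial arguments are non-negative (exact arithmetic)
    total = Fraction(0)
    for k in range(max(0, j2 - j3 - m1, j1 - j3 + m2), min(j1 + j2 - j3, j1 - m1, j2 + m2) + 1):
        total += Fraction((-1) ** k,
                          factorial(k) * factorial(j3 - j2 + k + m1) * factorial(j3 - j1 + k - m2)
                          * factorial(j1 + j2 - j3 - k) * factorial(j1 - k - m1) * factorial(j2 - k + m2))
    sign = -1 if (j1 - j2 - m3) % 2 else 1
    return sign * sqrt(delta * norm) * float(total)

def gaunt(l1, l2, l3, m1, m2, m3):
    """Integral over the sphere of Y_{l1 m1} Y_{l2 m2} Y_{l3 m3} (Gaunt's formula)."""
    return (sqrt((2 * l1 + 1) * (2 * l2 + 1) * (2 * l3 + 1) / (4 * pi))
            * wigner_3j(l1, l2, l3, 0, 0, 0) * wigner_3j(l1, l2, l3, m1, m2, m3))
```

The factor $\begin{pmatrix} l_1 & l_2 & l_3 \\ 0 & 0 & 0 \end{pmatrix}$ vanishes unless $l_1 + l_2 + l_3$ is even and the triangle inequality holds. This is why only finitely many k contribute to the Legendre sum.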
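The inner radial integral can be calculated by expanding the associated Laguerre polynomials (which appear in the Hydrogenic radial functions) as a power series and using the relation

$$\int_R^\infty r^n e^{-ar}\,dr = e^{-aR} \sum_{j=0}^{n} \frac{n!}{j!} \frac{R^j}{a^{n-j+1}}.$$

The outer integral can then be calculated using this relation again with R=0, where it reduces to $n!/a^{n+1}$. The result is simply a combinatorial factor that needs to be determined.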
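The following is a minimal Python sketch of this bookkeeping, using exact rational arithmetic. A "term" is a polynomial in r multiplied by $e^{-ar}$. `upper_tail` applies the tail relation $\int_R^\infty r^n e^{-ar}dr$ for the inner integral, and `full_integral` applies it again with R = 0. As a check, it computes the 1s² repulsion $\langle 1/r_{12} \rangle$ from $R_{10}(r) = 2Z^{3/2}e^{-Zr}$, where only the k = 0 term survives. The function names are illustrative and not taken from any library:

```python
from fractions import Fraction
from math import factorial

# Polynomials are coefficient lists, lowest power of r first; every "term"
# below is such a polynomial multiplied by exp(-a r).

def upper_tail(poly, a):
    """Return q with  int_R^inf poly(r) e^{-a r} dr = q(R) e^{-a R}, using
    int_R^inf r^n e^{-a r} dr = e^{-a R} sum_{j<=n} n!/j! R^j / a^(n-j+1)."""
    a = Fraction(a)
    q = [Fraction(0)] * len(poly)
    for n, c in enumerate(poly):
        for j in range(n + 1):
            q[j] += c * Fraction(factorial(n), factorial(j)) / a ** (n - j + 1)
    return q

def full_integral(poly, a):
    """int_0^inf poly(r) e^{-a r} dr: the same relation with R = 0."""
    return upper_tail(poly, a)[0]

def poly_mul(p, q):
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, x in enumerate(p):
        for j, y in enumerate(q):
            out[i + j] += x * y
    return out

def helium_1s2_repulsion(Z):
    """<1s 1s | 1/r12 | 1s 1s> with R_10(r) = 2 Z^(3/2) e^{-Z r}."""
    Z = Fraction(Z)
    density = [Fraction(0), Fraction(0), 4 * Z ** 3]  # R_10(r)^2 r^2 = 4 Z^3 r^2 e^{-2 Z r}
    inner = upper_tail(density[1:], 2 * Z)           # region r2 > r1: 1/r_> = 1/r2
    outer = poly_mul(density, inner)                 # exponents add: e^{-4 Z r1}
    return 2 * full_integral(outer, 4 * Z)           # doubled for the region r1 > r2

Z = 2
print(helium_1s2_repulsion(Z))               # 5/4, i.e. 5Z/8
print(-(Z ** 2) + helium_1s2_repulsion(Z))   # -11/4: 1s^2 energy to first order
```

This reproduces the textbook value $5Z/8$. In a 1×1 truncation it gives the first-order helium estimate of $-2.75$ in atomic units.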
Thus, once we have evaluated these combinatorial quantities and combined them into an expression for the matrix elements of the total Hamiltonian H, we can truncate H to a finite basis and diagonalise it computationally. How the truncation affects the eigenvalues is a very interesting question.
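For the final step any symmetric eigensolver will do. In practice one would call a LAPACK-backed routine. As a self-contained sketch, here is the cyclic Jacobi method in plain Python. The 3×3 matrix is a made-up toy, not computed matrix elements. The sketch also illustrates one thing that can be said about truncation. The lowest eigenvalue of a leading principal block is never below the lowest eigenvalue of the full matrix (Cauchy interlacing, i.e. the variational principle), so enlarging the basis can only lower the ground-state estimate:

```python
import math

def matmul(x, y):
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def jacobi_eigenvalues(a, tol=1e-22, max_sweeps=50):
    """Eigenvalues (ascending) of a real symmetric matrix by cyclic Jacobi rotations."""
    a = [list(map(float, row)) for row in a]   # work on a float copy
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the rotated (p, q) entry vanishes
                theta = 0.5 * math.atan2(2 * a[p][q], a[p][p] - a[q][q])
                c, s = math.cos(theta), math.sin(theta)
                g = [[float(i == j) for j in range(n)] for i in range(n)]
                g[p][p], g[q][q], g[q][p], g[p][q] = c, c, s, -s
                gt = [list(row) for row in zip(*g)]
                a = matmul(gt, matmul(a, g))     # a <- G^T a G
    return sorted(a[i][i] for i in range(n))

# Toy symmetric "Hamiltonian" (made-up numbers), truncated to its leading k x k block
H = [[-2.0, 0.3, 0.1],
     [0.3, -0.5, 0.2],
     [0.1, 0.2, -0.1]]
for k in (1, 2, 3):
    block = [row[:k] for row in H[:k]]
    print(k, jacobi_eigenvalues(block)[0])
```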
An automaton is (roughly) a set of symbols and a set of states, along with transitions for each state that take a symbol and return another state. Automata can be used to model (and verify) simple processes.
Automata can be brought into correspondence with formal languages in a very natural way. Given an initial state s and a sequence of symbols (a1, a2, …, an), the automaton has a naturally assigned state (… ((s a1) a2) … an), where "(state symbol)" represents the state obtained from state by the transition on symbol. If we nominate an initial state and a set of "accepting" states, we say a string is in the language of the automaton if and only if, when applied to the initial state, it ends in an accepting state.
This gives a very useful pairing in computer science; formal languages are useful tools, and automata (often) give an efficient way to implement them on a computer.
To get a little more mathematical: a semigroup is a set with a closed associative binary operation. If we add a two-sided identity it is called a monoid, and if we additionally add inverses it becomes a group. For instance, under addition the set of positive integers (1, 2, …) is a semigroup, the set of non-negative integers is a monoid with identity 0, and the set of integers is a group with -a being the inverse of a. Clearly every group is a monoid (forgetting about the inverses) and every monoid is a semigroup (forgetting about the identity).
In the same way that a group often arises as a set of invertible transformations (isomorphisms), a monoid often arises as a set of transformations (morphisms). Another useful example of a monoid is the subsets of a fixed set under union, with the empty set as the identity.
The free monoid generated by a set S, denoted S*, is the set of all finite sequences of elements of S, with multiplication defined as concatenation.
For example {x}* is the set {ε, x, xx, xxx, …}, where ε represents the sequence with no elements, and multiplication is given by $x^m x^n = x^{m+n}$. As a further example, some elements of {x, y}* are ε, x, y, xx, xy, yx, yy, xxx, xxy, xyx, yxx, xyy, yxy, yyx, yyy, …
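One quick way to see {x, y}* concretely is to enumerate it. This Python sketch lists words by length and then lexicographically (so-called shortlex order). It assumes a non-empty alphabet, and the names are illustrative:

```python
from itertools import count, islice, product

def words(alphabet):
    """Enumerate the free monoid alphabet* in shortlex order (assumes a non-empty alphabet)."""
    for n in count():                                   # length 0, 1, 2, ...
        for letters in product(alphabet, repeat=n):     # lexicographic within a length
            yield "".join(letters)

print(list(islice(words("xy"), 7)))   # ['', 'x', 'y', 'xx', 'xy', 'yx', 'yy']
```

The empty string plays the role of ε, and string concatenation is the monoid product.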
In computer science terms the free monoid generated by the set S is precisely the set of all strings (or words) in the alphabet S, with the monoid product corresponding to string concatenation.
Using this notation, a language over a finite set S is a subset of S*, that is, an element of the power set $\mathcal{P}(S^*)$. More generally, we can define a language over a monoid M as an element of the power set of M.
There is a natural product on the power set of a monoid, $AB = \{ab : a \in A,\ b \in B\}$, and so the power set is itself a monoid, with identity $\{e\}$. There is another natural monoidal structure on any collection of subsets: union, with the empty set as identity. Notice that $A(B \cup C) = AB \cup AC$, similarly $(B \cup C)A = BA \cup CA$, and $\emptyset A = A \emptyset = \emptyset$. Consequently the power set of a monoid naturally has the structure of a semiring.
Given a subset S of a monoid M, denote by S* (the Kleene star) its monoidal closure, the smallest submonoid of M containing S. The regular languages over a set (alphabet) Σ are the subsets of Σ* generated from ∅, {ε} and the singletons {a} (a ∈ Σ) by union, product and Kleene star. In other words, they are generated using the Kleene algebra formed by the semiring $\mathcal{P}(\Sigma^*)$ with the Kleene star, and regular expressions are the notation for these generating operations.
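For finite languages these semiring operations can be tried out directly. Here is a minimal Python sketch that represents languages as sets of strings, with helper names of my own. Since S* is usually infinite, `star_up_to` only computes its elements up to a given length, a truncation introduced purely for illustration:

```python
def product_lang(a, b):
    """AB = {xy : x in A, y in B} for finite languages (sets of strings)."""
    return {x + y for x in a for y in b}

ONE = {""}      # multiplicative identity {epsilon}
ZERO = set()    # additive identity (the empty language); union is the addition

def star_up_to(a, max_len):
    """The elements of the Kleene star A* of length <= max_len."""
    result, frontier = {""}, {""}
    while frontier:
        frontier = {w for w in product_lang(frontier, a) if len(w) <= max_len} - result
        result |= frontier
    return result

A, B, C = {"a", "b"}, {"c"}, {"d", ""}
print(product_lang(A, B | C) == product_lang(A, B) | product_lang(A, C))  # True: distributivity
```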
On the other hand, a deterministic finite automaton (DFA) over a set (alphabet) Σ is a finite set of states S, an initial state s in S, a subset F of accepting states of S, and a transition map $t: S \times \Sigma \to S$. The language of a DFA is the set of all strings (a b … x) of symbols in Σ such that $t(\dots t(t(s, a), b) \dots, x) \in F$. A DFA is often represented diagrammatically, using circles for states and labelled arrows for the transitions between states [this looks rather like a category theory diagram]. The initial state is marked by a horizontal arrow pointing to it, and the accepting states are drawn as double circles.
In the example above, the alphabet is {0, 1}, the states are {S1, S2}, the initial state is S1 and the accepting states are {S1}. The transitions are t(S1, 0) = S2, t(S1, 1) = S1, t(S2, 1) = S2 and t(S2, 0) = S1. So this DFA accepts exactly the strings containing an even number of 0s.
The transitions are often represented as a table, with states listed vertically and symbols listed horizontally, e.g.

State | 0 | 1 |
---|---|---|
S1 | S2 | S1 |
S2 | S1 | S2 |
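The table translates directly into a lookup structure, and the nested expression (… ((s a1) a2) … an) is just a left fold. Here is a minimal Python sketch of the example DFA; the dictionary encoding is my own choice:

```python
from functools import reduce

# TRANSITIONS[state][symbol] -> next state, copied from the table above
TRANSITIONS = {
    "S1": {"0": "S2", "1": "S1"},
    "S2": {"0": "S1", "1": "S2"},
}
INITIAL = "S1"
ACCEPTING = {"S1"}

def run(state, word):
    """(... ((s a1) a2) ... an): fold the transition function over the word."""
    return reduce(lambda st, sym: TRANSITIONS[st][sym], word, state)

def accepts(word):
    return run(INITIAL, word) in ACCEPTING

print([w for w in ["", "0", "00", "010", "1001", "000"] if accepts(w)])
```

The accepted words are exactly those with an even number of 0s.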
More algebraically, we can regard the transition as a monoid action. Since the elements of Σ generate Σ* freely, the transition extends uniquely to a function $T: S \times \Sigma^* \to S$ such that T(t, ε) = t and T(t, xy) = T(T(t, x), y) for any state t and any elements x and y of Σ*.
Rephrasing and generalising slightly, a DFA over a monoid M consists of four pieces of data. The first is a set of states S. The second is a (contravariant) monoid homomorphism $\rho: M \to \mathrm{Map}(S)$, where Map(S) is the set of all functions from S to S (that is, $S^S$), so that $\rho(xy) = \rho(y) \circ \rho(x)$. The third is an initial state s from S, and the fourth is a subset F of accepting states in S. Then the language of the DFA is precisely $\{m \in M : \rho(m)(s) \in F\}$.
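In the example DFA each symbol acts on {S1, S2} as a function. The image of ρ (the transition monoid) can be computed by closing the identity under those functions. Here is a minimal Python sketch. It uses my own encoding, in which a function is stored as the tuple of images of (S1, S2):

```python
STATES = ("S1", "S2")
# Each symbol acts on the states as a function (the example DFA's transitions)
ACTIONS = {
    "0": {"S1": "S2", "S2": "S1"},
    "1": {"S1": "S1", "S2": "S2"},
}

def act(word):
    """rho(word) as a dict state -> state, reading the word left to right."""
    f = {s: s for s in STATES}                 # rho(empty word) is the identity
    for symbol in word:
        g = ACTIONS[symbol]
        f = {s: g[f[s]] for s in STATES}       # rho(x a) = rho(a) after rho(x)
    return f

def transition_monoid():
    """All functions rho(w), w in Sigma*, found by closing the identity under the generators."""
    identity = tuple(STATES)
    found, frontier = {identity}, [identity]
    while frontier:
        new = []
        for f in frontier:                     # f[i] is the image of STATES[i]
            for g in ACTIONS.values():
                h = tuple(g[x] for x in f)     # apply f, then the symbol's action g
                if h not in found:
                    found.add(h)
                    new.append(h)
        frontier = new
    return found

def accepts(word, initial="S1", accepting=("S1",)):
    return act(word)[initial] in accepting

print(sorted(transition_monoid()))   # [('S1', 'S2'), ('S2', 'S1')]: identity and swap
```

Here 0 swaps the two states and 1 fixes them, so the transition monoid is {identity, swap} ≅ Z/2. The language is the preimage of the functions that send S1 into F.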
Theorem: The regular languages are exactly the languages recognised by DFAs.
This theorem can be proved as follows. A DFA is transformed inductively into a regular expression by considering paths through the DFA that may only pass through an increasingly large subset of the states. A regular expression is transformed into a nondeterministic finite automaton, which is in turn transformed into a DFA.
A nondeterministic finite automaton (NFA) over a set Σ is a set S, an initial state s, a set of accepting states F and a transition map $t: S \times \Sigma \to \mathcal{P}(S)$. A string (a b … c) is in the language of the NFA if and only if there is some sequence of states $s = q_0, q_1, \dots, q_k$ with $q_1 \in t(q_0, a)$, $q_2 \in t(q_1, b)$, …, $q_k \in t(q_{k-1}, c)$ and $q_k \in F$.
Sometimes epsilon transitions are allowed. These are transitions that take no input, so $t: S \times (\Sigma \cup \{\epsilon\}) \to \mathcal{P}(S)$, and we allow arbitrary epsilon insertions in the string. As with DFAs, NFAs can be represented diagrammatically and implemented efficiently as a table (though in this case we need to trace every possible path of a transition).
These can be extended more algebraically as follows. An NFA over a monoid M is a set S, a set of initial states I, a set of accepting states F, and a monoid/semigroup homomorphism $\rho: M \to \mathrm{Rel}(S)$ into the relations on S (equivalently, maps $S \to \mathcal{P}(S)$). The language of an NFA is $\{m \in M : \rho(m)(I) \cap F \neq \emptyset\}$. (The previous definition is simply M = Σ*, I = {s} and ρ the extension of t to strings.)
The monoid case corresponds to no epsilon transitions, and the semigroup case allows epsilon transitions (for then ρ(ε) need not be the identity). To promote a semigroup homomorphism ρ to a monoid homomorphism η, we take the epsilon closure using the reflexive-transitive closure ρ(ε)* (regarding the relations on S as a Kleene algebra in the obvious way).
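It is almost trivial to represent a given NFA as a DFA. We take the set of states of the DFA to be the power set $\mathcal{P}(S)$ and the initial state to be the set of initial states I (that is, {s}). The accepting states are the subsets intersecting F, and the transition sends a subset A to $\bigcup_{q \in A} t(q, a)$, which is essentially the same transition function. This is the so-called power-set construction.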
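Here is a minimal Python sketch of the power-set construction. It builds only the subsets reachable from the initial set, which is usually far fewer than $2^{|S|}$. The example NFA is made up for illustration: it accepts words over {a, b} that end in ab, and it has no epsilon transitions:

```python
# Hypothetical example NFA over {a, b} accepting the words that end in "ab"
NFA = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
}
NFA_INITIAL = frozenset({0})
NFA_ACCEPTING = {2}
ALPHABET = "ab"

def step(states, symbol):
    """DFA transition on subsets: the union of t(q, symbol) over q in states."""
    return frozenset(r for q in states for r in NFA.get((q, symbol), ()))

def subset_construction():
    """Build the reachable part of the power-set DFA: (start, table, accepting)."""
    table, todo = {}, [NFA_INITIAL]
    while todo:
        current = todo.pop()
        if current in table:
            continue
        table[current] = {}
        for sym in ALPHABET:
            nxt = step(current, sym)
            table[current][sym] = nxt
            if nxt not in table:
                todo.append(nxt)
    accepting = {s for s in table if s & NFA_ACCEPTING}   # subsets meeting F
    return NFA_INITIAL, table, accepting

def dfa_accepts(word):
    start, table, accepting = subset_construction()
    state = start
    for sym in word:
        state = table[state][sym]
    return state in accepting
```

Only three of the eight subsets of {0, 1, 2} are reachable: {0}, {0, 1} and {0, 2}.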
So DFA=NFA=Regular Languages.
The defining reference for DVI files is David R. Fuchs’s article in TUGboat Vol 3 No 2.
To find out what information is contained in a particular DVI file use Knuth’s dvitype, which outputs the operations contained in the bytecode in human readable format.
This article goes into gory detail about the instructions contained in a very simple DVI file.
DVI is designed to describe how to typeset a black and white document (in a left-to-right alphabetic language) horizontally and vertically for printing.
The basic operations are to typeset a character (in the specified font) optionally advancing by the character’s width, to typeset a rectangular box optionally advancing the box’s width, and setting variables (including the font and position where to typeset).
The file is encoded as a series of 8-bit bytes. Each command is an opcode byte followed by a given number of arguments, and each argument is either a fixed number of bytes long or has a length given by a prior argument. There are four kinds of parameter. Unsigned integers are represented by their bytes as a binary number, most significant byte first. Signed integers use two’s complement. The other two kinds are pointers, and strings (a series of 1-byte character codes) used for information and filesystem paths.
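In Python (my choice; the post has no code of its own), these integer parameters map directly onto `int.from_bytes`. The helper names are mine. The big-endian byte order is what the walk-through below relies on:

```python
def read_unsigned(data: bytes, pos: int, n: int) -> int:
    """n-byte unsigned integer starting at pos, most significant byte first."""
    return int.from_bytes(data[pos:pos + n], "big", signed=False)

def read_signed(data: bytes, pos: int, n: int) -> int:
    """n-byte two's-complement signed integer starting at pos."""
    return int.from_bytes(data[pos:pos + n], "big", signed=True)

# A few parameters that appear in the Hello.dvi walk-through
print(read_unsigned(bytes.fromhex("018392c0"), 0, 4))   # 25400000
print(read_signed(bytes.fromhex("fd86cc26"), 0, 4))     # -41497562
print(read_signed(bytes.fromhex("ff2aaa"), 0, 3))       # -54614
```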
Internally there are a number of state parameters; the current font f (a 4-byte signed integer), the position and spacing variables (h, v, w, x, y, z) (each a 4-byte signed integer) and a stack of position and spacing variables. (h, v) represents the point h units to the right and v units down from the top left corner of the page. w, x are horizontal spacing parameters and y, z are vertical spacing parameters. The units are determined in the file itself.
Below are the operations. I will use the notation n[4] to represent a parameter of 4 bytes, and curly braces {} to represent a range of commands. I will use the characters a, b, c, … to represent signed integers; i, j, k, l, m, n, … to represent unsigned integers; p, q, … to represent pointers; A, B, … to represent characters; and X to represent a custom (user implemented) type. These types have been inferred and not checked, so use them with caution.
Hex code | Name | Params | Function |
---|---|---|---|
{0-7F} | set_char_{0-127} | | Typeset character {0-127} in font f at (h, v), then advance h by the width of that character |
{80-83} | set{1-4} | m[{1-4}] | Typeset character m in font f at (h, v), then advance h by the width of that character |
84 | set_rule | a[4], b[4] | Typeset a box of height a and width b with its bottom left corner at (h, v), then advance h by b |
{85-88} | put{1-4} | m[{1-4}] | Typeset character m in font f at (h, v), without advancing h |
89 | put_rule | a[4], b[4] | Typeset a box of height a and width b with its bottom left corner at (h, v), without advancing h |
8A | nop | | No operation |
8B | bop | a{0-9}[4] p[4] | New page. a{0-9} are the TeX registers \count{0-9}, used to identify the page for reference, and p is a pointer to the previous page (or -1 for the first page). All state is reset |
8C | eop | | End of page, output the page. The stack should be empty |
8D | push | | Push (h, v, w, x, y, z) onto the stack |
8E | pop | | Pop from the stack, and set the variables |
{8F-92} | right{1-4} | a[{1-4}] | Advance h by a |
93 | w0 | | Advance h by w |
{94-97} | w{1-4} | a[{1-4}] | Set w to a and advance h by w |
98 | x0 | | Advance h by x |
{99-9C} | x{1-4} | a[{1-4}] | Set x to a and advance h by x |
{9D-A0} | down{1-4} | a[{1-4}] | Advance v by a |
A1 | y0 | | Advance v by y |
{A2-A5} | y{1-4} | a[{1-4}] | Set y to a and advance v by y |
A6 | z0 | | Advance v by z |
{A7-AA} | z{1-4} | a[{1-4}] | Set z to a and advance v by z |
{AB-EA} | fnt_num_{0-63} | | Set f to {0-63} |
{EB-ED} | fnt{1-3} | m[{1-3}] | Set f to m |
EE | fnt4 | a[4] | Set f to a |
{EF-F2} | xxx{1-4} | m[{1-4}] X[m] | Implementation dependent; nop in general. Sent via TeX’s \special |
{F3-F5} | fnt_def{1-3} | i[{1-3}] j[4] k[4] l[4] m[1] n[1] A[m+n] | Defines font i to be the font loaded from the subpath "A[0:m]/A[m:m+n]" of the standard fonts directory, with checksum j, scaled by k/l. k and l must be less than 2^27 |
F6 | fnt_def4 | a[4] j[4] k[4] l[4] m[1] n[1] A[m+n] | Defines font a as for F3-F5 |
F7 | pre | i[1] j[4] k[4] l[4] m[1] A[m] | Preamble. i is the DVI version number, which is 2. 1 unit is $j/k \times 10^{-7}$ metres. l is the magnification: the entire document is scaled by a factor of l/1000. A is an information header |
F8 | post | See below | Postamble; see below |
F9 | post_post | See below | Post-postamble; see below |
{FA-FF} | undefined | | |
A DVI file must start with a preamble, followed by one or more pages, and end with a postamble. A page is a bop followed by any instructions and terminated by an eop. The only operations that can go between these chunks are nop and font definitions.
The postamble starts with a 4-byte pointer to the beginning of the last page. This is followed by the parameters j, k and l from the preamble (called numerator, denominator and magnification respectively). Next come a 4-byte signed integer giving the height+depth of the tallest page and a 4-byte signed integer giving the width of the widest page. It ends with a 2-byte unsigned integer giving the maximum stack depth in the DVI and a 2-byte unsigned integer giving the total number of pages (bop commands).
Then each font must be defined. Each font must be defined exactly twice in the document, once before its first use (before the postamble) and once in the postamble.
The postamble concludes with the post-postamble. This contains a 4-byte pointer to the beginning of the postamble, followed by the version number i from the preamble, followed by 4 or more DF bytes (why not 8As? I have no idea).
The file is thus designed to be read forwards (one operation at a time) or backwards (one page at a time), and useful set-up information (page size and maximum stack depth) is at the back. The files are typically very compact, and because of their linear nature they can be processed rapidly. Notice that the page is the minimum displayable unit. A page may be typeset non-linearly, but after a bop the previous page can no longer be affected.
I typeset the following file, Hello.tex:

```tex
Hello World!
\bye
```

and then ran the commands

```shell
tex Hello.tex
xxd Hello.dvi
```

which yielded
```text
0000000: f702 0183 92c0 1c3b 0000 0000 03e8 1b20  .......;....... 
0000010: 5465 5820 6f75 7470 7574 2032 3031 332e  TeX output 2013.
0000020: 3038 2e31 323a 3138 3034 8b00 0000 0100  08.12:1804......
0000030: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0000040: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0000050: 0000 00ff ffff ff8d 9ff2 0000 8ea0 0283  ................
0000060: 33da 8da0 fd86 cc26 8d91 1400 00f3 004b  3......&.......K
0000070: f160 7900 0a00 0000 0a00 0000 0563 6d72  .`y..........cmr
0000080: 3130 ab48 656c 6c6f 9103 5555 5791 ff2a  10.Hello..UUW..*
0000090: aa6f 726c 6421 8e8e 9f18 0000 8d92 00e8  .orld!..........
00000a0: 60a3 318e 8cf8 0000 002a 0183 92c0 1c3b  `.1......*.....;
00000b0: 0000 0000 03e8 029b 33da 01d5 c147 0002  ........3....G..
00000c0: 0001 f300 4bf1 6079 000a 0000 000a 0000  ....K.`y........
00000d0: 0005 636d 7231 30f9 0000 00a5 02df dfdf  ..cmr10.........
00000e0: dfdf dfdf                                ....
```
Let’s walk through this byte by byte.
f7 02 018392c0 1c3b0000 000003e8

1b 20 54 65 58 20 6f 75 74 70 75 74 20 32 30 31 33 2e 30 38 2e 31 32 3a 31 38 30 34
The first line starts with the pre opcode, followed by the version 02, numerator = 25400000, denominator = 473628672 and magnification = 1000. This means 1 unit is $\frac{25400000}{473628672} \times 10^{-7}$ m. There are 72.27 standard points in an inch and 2.54 cm in an inch, so a standard point is $\frac{2.54}{72.27}$ cm. Thus a unit is $\frac{1}{65536}$ of a standard point, which TeX calls a scaled point (sp). The document is then scaled by 1000/1000 = 1.
The second line is the documentation string; 1b states it consists of 27 bytes. The bytes then form the ASCII string
TeX output 2013.08.12:1804
Evidently I ran TeX at 18:04 (6:04 pm) on the 12th of August 2013 A.D.
8b 00000001 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ffffffff
The bop opcode is followed by the values in the registers \count0 through \count9. By default TeX uses \count0 for the page number and doesn’t affect the other counts, so we get \count0 = 1 (first page) and \count{1-9} = 0. Finally, since this is the first page, the pointer to the previous page is -1 (ffffffff).
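The unit arithmetic can be checked exactly with rational numbers. The following Python sketch uses variable names of my own choosing. It also converts a few of the scaled-point distances from the walk-through into inches:

```python
from fractions import Fraction

NUM, DEN, MAG = 25400000, 473628672, 1000                     # from the preamble

unit_in_metres = Fraction(NUM, DEN) / 10**7                  # num/den x 10^-7 m
point_in_metres = Fraction(254, 10000) / Fraction(7227, 100)  # 0.0254 m per inch / 72.27 pt per inch

def sp_to_inches(sp):
    """Convert scaled points to inches (65536 sp per pt, 72.27 pt per inch)."""
    return sp / 65536 / 72.27

print(unit_in_metres / point_in_metres)   # 1/65536
print(Fraction(MAG, 1000))                # 1
print(round(sp_to_inches(42152922), 3))   # 8.9
```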
8d 9f f20000 8e
Push the current (h, v, w, x, y, z) = (0, 0, 0, 0, 0, 0) onto the stack, then move v down by -917504 sp (that is, up 14 standard points), then pop the stack, resetting v. Overall this achieves nothing. I presume it would put a header in a more complex document.
a0 028333da 8d a0 fd86cc26 8d
Move v down by 42152922 sp = 8.9 inches and push (0, 42152922, 0, 0, 0, 0) onto the stack. Then move v down by -41497562 sp and push (0, 655360, 0, 0, 0, 0) onto the stack (this is 10 standard points down from the top of the page).
91 140000
Moves h right by 1310720 sp = 20 standard points.
f3 00 4bf16079 000a0000 000a0000 00 05 63 6d 72 31 30
f3 is fnt_def1, and we define font number 0. The checksum is 4bf16079, and the "scale size" and "design size" are both 655360 sp = 10 standard points, so the font comes out at its default size of 10 points.
The next line is the path reference: the length of the directory name is 0, the length of the file name is 5, and the file name is cmr10.
ab
Set the font to font 0 (cmr10)
48 65 6c 6c 6f
Typeset Hello, advancing at each character.
91 035555 57
Move right 3 1/3 standard points and typeset W, advancing.
91 ff2aaa 6f 72 6c 64 21
We now move left by 5/6ths of a standard point (this is TeX performing kerning) and then typeset orld!
8e 8e 9f 180000 8d
Pop twice, giving the current state (0, 42152922, 0, 0, 0, 0), so we are 8.9 inches down the page. Then move down another 1572864 sp = 24 standard points and push this position onto the stack.
92 00e860a3 31 8e 8c
Move right 15229091 sp ≈ 3.2 inches, which is almost halfway across the 6.5 inch page width, and typeset 1 (the page number). Then pop the stack and end the page.
f8 0000002a 018392c0 1c3b0000 000003e8 029b33da 01d5c147 0002 0001
Begin the postamble; the last page begins at byte offset 42 (2a).
Restate the numerator, denominator and magnification.
The maximum page height+depth is 43725786 sp ≈ 9.23 inches, and the maximum page width is 30785863 sp ≈ 6.5 inches.
The maximum stack depth is 2.
There is 1 page.
f3 00 4bf16079 000a0000 000a0000 00 05 63 6d 72 31 30
Repeat the font definition.
f9 000000a5 02
The post-postamble declares that the postamble starts at byte 165 (a5), and reiterates that this is version 02 of DVI.
df df df df df df df
Finally the file is padded with df’s (at least four of them), so that its length is a multiple of 4 bytes; here it is 228 bytes.
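To double-check the walk-through mechanically, here is a minimal disassembler sketch in Python. It only handles the opcodes that actually occur in Hello.dvi; anything else raises `NotImplementedError`. It names opcodes as in the table above and stops at post_post, ignoring the df padding. It is an illustration under those assumptions, not a replacement for `dvitype`:

```python
def _int(data, pos, n, signed):
    """n-byte big-endian integer at pos."""
    return int.from_bytes(data[pos:pos + n], "big", signed=signed)

def disassemble(data):
    """Yield (offset, name, args) for the subset of DVI opcodes used in Hello.dvi."""
    pos = 0
    while pos < len(data):
        start, op = pos, data[pos]
        pos += 1
        if op <= 0x7F:                                     # set_char_0..127
            yield start, "set_char", (op,)
        elif op == 0x8B:                                   # bop c0..c9 p
            yield start, "bop", tuple(_int(data, pos + 4 * i, 4, True) for i in range(11))
            pos += 44
        elif op in (0x8C, 0x8D, 0x8E):                     # eop, push, pop
            yield start, {0x8C: "eop", 0x8D: "push", 0x8E: "pop"}[op], ()
        elif 0x8F <= op <= 0x92 or 0x9D <= op <= 0xA0:     # right1-4, down1-4
            base, name = (0x8E, "right") if op <= 0x92 else (0x9C, "down")
            n = op - base
            yield start, f"{name}{n}", (_int(data, pos, n, True),)
            pos += n
        elif 0xAB <= op <= 0xEA:                           # fnt_num_0..63
            yield start, f"fnt_num_{op - 0xAB}", ()
        elif 0xF3 <= op <= 0xF6:                           # fnt_def1-4 k c s d a l n
            n = op - 0xF2
            k = _int(data, pos, n, n == 4)
            checksum = _int(data, pos + n, 4, False)
            scale = _int(data, pos + n + 4, 4, True)
            design = _int(data, pos + n + 8, 4, True)
            a, l = data[pos + n + 12], data[pos + n + 13]
            name_start = pos + n + 14
            name = data[name_start:name_start + a + l].decode("ascii")
            pos = name_start + a + l
            yield start, f"fnt_def{n}", (k, checksum, scale, design, name)
        elif op == 0xF7:                                   # pre i num den mag k x[k]
            i = data[pos]
            num, den, mag = (_int(data, pos + 1 + 4 * j, 4, True) for j in range(3))
            k = data[pos + 13]
            comment = data[pos + 14:pos + 14 + k].decode("ascii")
            pos += 14 + k
            yield start, "pre", (i, num, den, mag, comment)
        elif op == 0xF8:                                   # post p num den mag l u s t
            args = tuple(_int(data, pos + 4 * j, 4, True) for j in range(6))
            args += (_int(data, pos + 24, 2, False), _int(data, pos + 26, 2, False))
            pos += 28
            yield start, "post", args
        elif op == 0xF9:                                   # post_post q i, then padding
            yield start, "post_post", (_int(data, pos, 4, True), data[pos + 4])
            return
        else:
            raise NotImplementedError(f"opcode {op:#04x} at offset {start}")
```

Fed the 228 bytes of the hex dump (for example via `bytes.fromhex`, which skips whitespace), it produces pre, bop, push, down3, pop, …, post, fnt_def1, post_post. The set_char arguments spell out HelloWorld!1.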
Keep in mind that the DVI format was invented in 1979; it’s amazing how well it has stood up. Most parameters can be specified by a 32-bit number. This means signed distances can be specified to 1 part in about 2 billion (2^31), and unsigned numbers to 1 part in about 4 billion (2^32).
In particular, using the scaled point (about 5.4 nanometres) we can specify positions over a range of about 23 metres! (And we could magnify this by more than 16 orders of magnitude!)
Our fonts can have up to 2^32 characters, so they can cover all of Unicode (whose code points fit comfortably in the 32 bits of UTF-32). We can also have up to 2^32 fonts. We could use this to hold fonts in different colours, styles (e.g. bold, italic, …) or perhaps something more exotic like angled lines of different slopes and thicknesses.
However, we can’t embed our fonts. If someone wants to view the DVI, they have to ensure they have all the correct fonts in the correct directory (which is determined in part by the DVI viewer), and that directory can be no more than 1 level deep. Also, the DVI specification offers no way of fetching information about a font (height, width, etc.), so if we want to put an accent on a character we have to hard code its properties into the DVI. This would be OK if set_char didn’t advance by the character’s width, a width the DVI file itself never states; as it is, this badly breaks the orthogonality of the system. Right-to-left text is another bad example: you would have to hard code jumps relating to each character’s width.
Because we have to have a pointer to the postamble, a DVI file can’t be much more than 4 GB in size. Even if all our characters are Unicode, with 80 characters per line and 50 lines per page, we should still be able to produce a document of over a hundred thousand pages. The stack can be at most 65535 levels deep (which I think you’d only exceed if you were trying). The 10 counters also allow us to collate our document in a highly non-linear fashion.
It’s odd that there are two widths, w and x, and two heights, y and z; in fact none would be needed, since we could add directly to h and v. I think the reason for these variables must be buried in TeX. Presumably it is compactness: w0, x0, y0 and z0 are single-byte commands, so a repeated spacing such as the inter-word space costs only one byte each time it is used.
There’s no way to insert images, change the colour of text, and for computer viewers no way to insert hyperlinks, videos and sounds or take user input. There’s also no way to draw complex objects without either rendering them specially as fonts, or explicitly constructing them using rectangular boxes as pixels. However the instruction xxx (from TeX’s \special{}) allows us to implement these things on an ad hoc basis. So it is extremely extensible; but we have to work to make those extensions portable.
It uses Descartes’ rule of signs. Given a polynomial $p(x) = a_n x^n + \dots + a_1 x + a_0$, the number of positive real roots (counted with multiplicity) is bounded above by the number of sign variations in the sequence $(a_n, \dots, a_1, a_0)$.
So, as an example, $x^2 - 1$ has the sequence of coefficients (1, 0, -1). This contains 1 sign change (we ignore zeros), so $x^2 - 1$ has at most one positive root; in fact we know it has exactly one positive root, 1. On the other hand, the bound need not be attained: $x^2 - x + 1$ has 2 sign changes but no positive real roots.
Descartes’ theorem tells us about the number of zeros of a degree n polynomial on the open interval $(0, \infty)$. What if we wanted to know about the number of zeros on some other interval $(u, v)$? We could perform a projective transformation $x \mapsto \frac{ax + b}{cx + d}$. In order to still have a polynomial we need to multiply out the denominator, giving $q(x) = (cx + d)^n\, p\!\left(\frac{ax + b}{cx + d}\right)$. The positive zeros of q(x) are $-d/c$ (if that is positive) together with the preimages of the zeros of p(x) lying between $b/d$ and $a/c$. In particular, if we choose a=u, b=v, c=1, d=1, the number of positive zeros of q(x) is precisely the number of zeros of p(x) in the interval $(u, v)$.
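Both ingredients are a few lines of exact arithmetic. The Python sketch below assumes coefficient lists ordered lowest degree first and the choice a = u, b = v, c = d = 1. The helper names are mine:

```python
from fractions import Fraction

def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists, lowest degree first."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def sign_variations(coeffs):
    """Number of sign changes in a coefficient sequence, ignoring zeros."""
    signs = [c > 0 for c in coeffs if c != 0]
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)

def to_interval(p, u, v):
    """Coefficients of q(x) = (x + 1)^n p((u x + v) / (x + 1)), lowest degree first.
    As x runs over (0, inf), (u x + v) / (x + 1) runs over the open interval between u and v."""
    n = len(p) - 1
    q = [Fraction(0)] * (n + 1)
    for k, a in enumerate(p):
        term = [Fraction(a)]
        for _ in range(k):
            term = poly_mul(term, [Fraction(v), Fraction(u)])   # (u x + v)^k
        for _ in range(n - k):
            term = poly_mul(term, [Fraction(1), Fraction(1)])   # (x + 1)^(n - k)
        for i, c in enumerate(term):
            q[i] += c
    return q

# p(x) = (x - 1)(x - 2)(x - 3) = x^3 - 6x^2 + 11x - 6
p = [-6, 11, -6, 1]
print(sign_variations(p))                      # 3: at most three positive roots
print(sign_variations(to_interval(p, 0, 2)))   # 1: exactly one root in (0, 2)
```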
By the rule of signs if q(x) has zero sign variations then p(x) has no root in . This leads to our iterative bisection strategy for finding the zeros of a polynomial on an interval. Given a sequence of intervals bisect each interval and find the sign variations of the polynomial projected on each subinterval; if it is zero then discard it, otherwise add it to the next sequence of intervals. This yields a sequence of intervals which may contain zeros of p. However we don’t know which intervals contain no zeros or multiple zeros.
Consider the case of one sign variation. For sufficiently small positive x, p(x) has the sign of its lowest-order nonzero coefficient (the end of the sequence, towards the constant term). For sufficiently large x, p(x) has the sign of the leading coefficient. With one sign variation these signs are opposite, so by the intermediate value theorem there exists a positive real x such that p(x) is zero. By Descartes’ rule of signs there is at most one positive real zero; hence there must be exactly one.
Hence we can adjust our algorithm: if there is one sign variation, we add the interval to a list of intervals definitely containing a zero. However, we’re still not sure that the intervals containing no zeros will be eliminated; we need a sort of converse to Descartes’ theorem.
This converse is given by a pair of theorems due to Obreshkoff: Given a degree n polynomial p(x)
When we translate this by a projective transformation we get a picture like this (taken from Arno Eigenwillig’s thesis)
If p(x) has at least p roots in the corresponding region above, then the transformed sign variations are bounded below by p. If p(x) has at most q roots in the corresponding region above, then the transformed sign variations are bounded above by q.
Essentially the sign variations can only see zeros “nearby” within these arcs. Since these arcs get smaller as the interval gets smaller it is guaranteed that for sufficiently small intervals (depending on the distance between the roots of the polynomial) the number of sign variations will equal the number of roots.
In particular if all the real roots are simple then the bisection process above will eventually terminate; all intervals will eventually have zero sign variations (in which case there are no roots) or one sign variation (in which case they contain a root).
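Putting the pieces together gives the bisection algorithm. This is a minimal Python sketch under stated assumptions. The polynomial p has integer coefficients and no repeated roots, and neither u nor v is a root. Roots that land exactly on a bisection point are detected by evaluating p there, since neither open half-interval contains them. A depth limit guards against non-squarefree input. The sketch is in the spirit of the Vincent-Collins-Akritas bisection method rather than a tuned implementation, and the names are mine:

```python
from fractions import Fraction

def poly_mul(p, q):
    """Product of coefficient lists (lowest degree first)."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def sign_variations(coeffs):
    signs = [c > 0 for c in coeffs if c != 0]
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)

def to_interval(p, u, v):
    """q(x) = (x + 1)^n p((u x + v) / (x + 1)): positive roots of q <-> roots of p in (u, v)."""
    n = len(p) - 1
    q = [Fraction(0)] * (n + 1)
    for k, a in enumerate(p):
        term = [Fraction(a)]
        for _ in range(k):
            term = poly_mul(term, [Fraction(v), Fraction(u)])
        for _ in range(n - k):
            term = poly_mul(term, [Fraction(1), Fraction(1)])
        for i, c in enumerate(term):
            q[i] += c
    return q

def evaluate(p, x):
    """Horner evaluation; coefficients lowest degree first."""
    result = Fraction(0)
    for c in reversed(p):
        result = result * x + c
    return result

def isolate_roots(p, u, v, max_depth=64):
    """Return (exact_roots, intervals): roots found exactly at bisection points,
    and open intervals each containing exactly one root of p in (u, v)."""
    exact, isolated = [], []
    todo = [(Fraction(u), Fraction(v), 0)]
    while todo:
        a, b, depth = todo.pop()
        var = sign_variations(to_interval(p, a, b))
        if var == 0:
            continue                      # no root in (a, b)
        if var == 1:
            isolated.append((a, b))       # exactly one root in (a, b)
            continue
        if depth == max_depth:
            raise ValueError("roots not separated; is p squarefree?")
        m = (a + b) / 2
        if evaluate(p, m) == 0:
            exact.append(m)               # root exactly at the bisection point
        todo.append((m, b, depth + 1))
        todo.append((a, m, depth + 1))
    return sorted(exact), sorted(isolated)

exact, intervals = isolate_roots([-6, 11, -6, 1], 0, 5)   # roots 1, 2 and 3
print(exact, [(float(a), float(b)) for a, b in intervals])
```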
Hence we have an algorithm for isolating the distinct real roots of a polynomial p(x) over the integers on a bounded interval I, after first dividing p by gcd(p, p′) so that all its roots are simple.
It’s worth noting that transforming the polynomial can be done using only the operations multiply by two, divide by two and add (see On the Various Bisection Methods Derived From Vincent’s Theorem).
Why just the integers? Polynomials over the rationals can be solved by the same method, by first factoring out the denominators. The real numbers are much more subtle: we can’t calculate the gcd, and worse we can’t even necessarily calculate the sign of a coefficient! (I mean in an algorithmic manner; c.f. Richardson’s Theorem).
One excellent thing about this is the intervals are guaranteed to contain exactly one root; we can then use something like the bisection method to find the zeros to any desired accuracy.
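Suppose an interval (a, b) holds exactly one simple root and neither endpoint is a root. Then p changes sign across the interval, so plain bisection on the sign of p refines it. Here is a minimal sketch with exact rationals; the names are mine:

```python
from fractions import Fraction

def evaluate(p, x):
    """Horner evaluation; coefficients lowest degree first."""
    result = Fraction(0)
    for c in reversed(p):
        result = result * x + c
    return result

def refine(p, a, b, width):
    """Shrink an interval (a, b) on which p changes sign until b - a <= width."""
    a, b = Fraction(a), Fraction(b)
    fa = evaluate(p, a)
    if fa * evaluate(p, b) >= 0:
        raise ValueError("p must change sign on (a, b)")
    while b - a > width:
        m = (a + b) / 2
        fm = evaluate(p, m)
        if fm == 0:
            return m, m                  # hit the root exactly
        if (fm > 0) == (fa > 0):
            a, fa = m, fm                # root lies in (m, b)
        else:
            b = m                        # root lies in (a, m)
    return a, b

lo, hi = refine([-2, 0, 1], 1, 2, Fraction(1, 10**6))   # brackets sqrt(2)
print(float(lo), float(hi))
```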
I haven’t been sufficiently precise with my algorithm to analyse it, but there are implementations that use binary operations on polynomials of degree n on integer t bit coefficients.
There are, of course, other methods of finding all the roots of a real polynomial; but few of them are global and stable like this one. (Though a variation of Newton’s method isn’t a bad candidate, albeit without precise bounds).
Suppose we have an n-dimensional affine space over a skewfield k. We can construct an (n-1)-dimensional projective space by taking the pencils of lines as points (a pencil of lines is a complete set of parallel lines; that is the lines under the equivalence relation of parallelism), and 2-pencils of planes (complete sets of parallel planes) as lines, and so on. A less invariant way is to choose an origin, so we get a vector space; then each pencil is represented by a unique line through the origin, and each 2-pencil is represented by a unique plane through the origin; thus the points of projective space are lines through the origin and the lines are planes through the origin. Thus a projective space of dimension n can be constructed as the quotient of an affine or vector space of dimension n+1.
Now, given an n-dimensional projective space, consider an arbitrary (n-1)-dimensional projective subspace P. We can use this to form an affine space of dimension n. Its points are the projective points not on P, its lines are all the lines not in P, and two lines are parallel if they intersect P at the same point. Given any line l and a point L not on that line, the line through L and the intersection of l with P is the only line parallel to l through L. Thus a projective space of dimension n can be decomposed into an affine space of dimension n and a projective space of dimension (n-1).
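This second construction can in fact be extended to all projective planes. Given a skewfield, we have a natural way of representing the lines of the two-dimensional (left) affine plane: as the sets of points (x, y) satisfying an equation y = xa + b for some a and b, together with the vertical lines x = c. For general planes we extend the notion of this ternary operation T(x, a, b) = xa + b. A planar ternary ring is a set R with at least two elements and a ternary operation T that satisfies:
1. For all x, a, c in R there is a unique b with T(x, a, b) = c.
2. For all a, b, c, d in R with a ≠ c there is a unique x with T(x, a, b) = T(x, c, d).
3. For all x, y, x′, y′ in R with x ≠ x′ there is a unique pair (a, b) with T(x, a, b) = y and T(x′, a, b) = y′.

The following construction, roughly following Weibel (who is following Hall), then yields a projective plane by approximately the reverse of the construction above. The points are the pairs (x, y) ∈ R × R (the ordinary points), together with a point (m) for each m in R and a point (∞); these last form the projective line at infinity. The ordinary lines are of two kinds. For each a and b in R there is the line $\{(x, T(x, a, b)) : x \in R\} \cup \{(a)\}$, and for each c in R there is the line $\{(c, y) : y \in R\} \cup \{(\infty)\}$. Finally there is the line at infinity, $\{(m) : m \in R\} \cup \{(\infty)\}$. One can check that this does in fact define a projective plane.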
Two ternary rings (R, T) and (R, T’) are comparative if there exist permutations on R such that for all a,b, x in R. It is easy to show comparative rings yield isomorphic projective planes.
Conversely, we can construct a ternary ring from a projective plane, but there are clearly a lot of choices to be made. We need to choose a line at infinity and a y-axis, together with some isomorphisms between the y-axis minus its intersection with the line at infinity and other lines. We then use the lines and these isomorphisms to define T(x, a, b). Different choices need not yield comparative ternary rings (see here for a necessary and sufficient condition).
With additional structure the ternary ring is sometimes unique. For instance, for projective planes coordinatised by alternative division rings, isomorphic projective planes yield isomorphic alternative division rings (see Bruck and Kleinfeld, The structure of alternative division rings, for a proof). I’m not sure if this is the best result one can obtain. In the finite case, the coordinate rings are isomorphic (in the sense of Hall ternary rings, i.e. ternary rings with a 1 and a 0) if and only if they are a finite field.
It’s interesting to note a generalised construction of projective planes of the first type for every division algebra over the real numbers (including the octonions). The points and lines of the space are each the 3-dimensional Hermitian idempotents of unit trace, that is, 3×3 Hermitian matrices P satisfying $P^2 = P$ (so P is a projection) and trace(P) = 1. A point P lies on a line Q when PQ + QP = 0. It can be shown that this is a projective plane (see Conway and Smith, On Quaternions and Octonions, for the details).
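For the real numbers this is easy to check numerically. The following is a minimal Python sketch that assumes the real case only; the function names are mine. A nonzero vector u gives the projection $P = uu^T/(u \cdot u)$. For lines represented by w we have $PQ + QP = \frac{(u \cdot w)(uw^T + wu^T)}{(u \cdot u)(w \cdot w)}$, so P lies on Q exactly when u ⊥ w. This recovers the usual picture in which a line of $\mathbb{RP}^2$ is a plane through the origin, here represented by its normal:

```python
from fractions import Fraction

def projector(v):
    """P = v v^T / (v . v): a symmetric idempotent 3x3 matrix of trace 1 (real case)."""
    norm2 = sum(Fraction(c) * c for c in v)
    return [[Fraction(v[i]) * v[j] / norm2 for j in range(3)] for i in range(3)]

def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def incident(p, q):
    """Point P lies on line Q when PQ + QP = 0."""
    pq, qp = matmul(p, q), matmul(q, p)
    return all(pq[i][j] + qp[i][j] == 0 for i in range(3) for j in range(3))

def cross(u, w):
    """Cross product: represents the line through the points u and w."""
    return (u[1] * w[2] - u[2] * w[1], u[2] * w[0] - u[0] * w[2], u[0] * w[1] - u[1] * w[0])

u1, u2 = (1, 0, 0), (0, 1, 1)
line = projector(cross(u1, u2))
print(incident(projector(u1), line), incident(projector(u2), line))   # True True
```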
Then a projective space of dimension $d \geq 3$ is equivalent to the projective space of lines in a $(d+1)$-dimensional vector space over a division ring (or skew field).
Kolmogorov asked the question: on which projective spaces can we do analysis? In order to do things such as find tangent lines we are going to need some sort of topology.
Kolmogorov apparently proved that for a (Desarguesian) projective space, if the set of points is compact and infinite, the set of lines is compact and the function mapping two distinct points to the line they lie on is continuous, then the underlying division ring is infinite and locally compact (in a paper translated as “The Axiomatics of Projective Geometry” in Selected works of A. N. Kolmogorov edited by V. M. Tikhomirov). Such an object is called a continuous projective geometry.
In response Pontryagin proved (see his book “Topological Groups”) that every locally compact infinite division ring contains one of: the real numbers, the p-adic numbers, or the formal Laurent series over the integers modulo p (p prime). Moreover we can classify these by their connectedness and characteristic: if the division ring is connected it contains the real numbers; otherwise it is totally disconnected, and it contains the p-adic numbers if it has characteristic 0 and the formal Laurent series over the integers modulo p if it has characteristic p.
Combining this with the Frobenius theorem we have the following: a locally compact connected division ring is isomorphic to the real numbers, the complex numbers or the quaternions.
Separation theorems allow us to define regions and boundaries of regions, so we can start to talk about ‘relative lengths’ and ‘relative areas’. One way to approach the separation theorems in projective geometry is via ordered fields: Veblen and Young pursue such an approach; of course this doesn’t apply to an unordered field such as the complex numbers. Another is via topology; e.g. a line in the real affine plane separates the plane into two (topologically) connected sets.
In some sense all this indicates the “natural” projective spaces to do calculus in are precisely the projective spaces over the real numbers, complex numbers or quaternions (and maybe the octonions?).
The calculus of real and complex numbers is well known; is there a corresponding exterior differential calculus of quaternions? Given two n-simplices in an n-dimensional real affine space, there is a unique affine transformation from one to the other (once we fix which vertex goes to which). The ratio of their hypervolumes is the absolute value of the determinant of the linear part of this transformation. Is there an analogous determinant for quaternions (or octonions)?
Essentially no; Dieudonné extended the determinant to a non-commutative field by defining it as a map from matrices to the multiplicative group of the division ring modulo its commutator subgroup (see Artin’s Geometric Algebra for details). This is about as good as you can do; any map from the general linear group on an n-dimensional (right) quaternion vector space to the quaternions that satisfies
then the image of the determinant is commutative.
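To see an example of such an obstruction, consider the 2×2 quaternion matrices. Given a diagonal matrix $\operatorname{diag}(a,b)$, would the determinant be $ab$ or $ba$? For a commutative ring, a 2×2 matrix satisfies the Cayley-Hamilton equation $A^2 - (\operatorname{tr} A)A + (\det A)I = 0$. A little experimentation shows there isn’t a similar formula for the quaternions (we can’t get rid of the off-diagonal elements). In fact taking the trace gives the formula for the determinant $\det A = \tfrac{1}{2}\left[(\operatorname{tr} A)^2 - \operatorname{tr}(A^2)\right]$. If we try to apply this to a quaternion matrix $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ we get $\tfrac{1}{2}(ad + da - bc - cb)$. Notice that since ij=-ji, ik=-ki, jk=-kj, the non-commuting (cross-product) parts of these products cancel; for a Hermitian quaternion matrix ($a$, $d$ real and $c = \bar{b}$) this yields the real number $ad - |b|^2$. (The parallel actually extends into the spectral theory of quaternionic matrices.)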
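To make the obstruction concrete, here is a small sketch (my own illustration, not from the sources above). Quaternions are stored as 4-tuples, which is an arbitrary representational choice, and all helper names are hypothetical. It checks that $ij \neq ji$, that the trace formula $\tfrac12[(\operatorname{tr}A)^2 - \operatorname{tr}(A^2)]$, evaluated with ordered quaternion products, vanishes on the invertible matrix $\operatorname{diag}(i, j)$, and that on a Hermitian example it returns the real number $ad - |b|^2$.

```python
# Quaternions as 4-tuples (w, x, y, z) meaning w + x i + y j + z k.
# The representation and all helper names are illustrative choices.

def qmul(p, q):
    """Hamilton product p q (order matters: i j = k but j i = -k)."""
    w1, x1, y1, z1 = p
    w2, x2, y2, z2 = q
    return (
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    )


def qadd(*qs):
    """Componentwise sum of any number of quaternions."""
    return tuple(sum(parts) for parts in zip(*qs))


def qscale(s, q):
    """Multiply a quaternion by a real scalar s."""
    return tuple(s * part for part in q)


def qconj(q):
    """Quaternion conjugate w - x i - y j - z k."""
    w, x, y, z = q
    return (w, -x, -y, -z)


def mat_mul(A, B):
    """Product of 2x2 quaternion matrices, keeping the order of every product."""
    return [[qadd(qmul(A[r][0], B[0][s]), qmul(A[r][1], B[1][s])) for s in range(2)]
            for r in range(2)]


def trace(A):
    return qadd(A[0][0], A[1][1])


def trace_formula(A):
    """(tr(A)^2 - tr(A^2)) / 2, the commutative formula for det A."""
    t = trace(A)
    return qscale(0.5, qadd(qmul(t, t), qscale(-1, trace(mat_mul(A, A)))))


ZERO = (0, 0, 0, 0)
I, J, K = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)

# For diag(i, j), "ab" and "ba" disagree: i j = k but j i = -k.
print(qmul(I, J), qmul(J, I))
# The trace formula even vanishes on the invertible matrix diag(i, j).
print(trace_formula([[I, ZERO], [ZERO, J]]))
# Hermitian example [[2, b], [conj(b), 3]] with |b|^2 = 3: gives the real number 2*3 - 3.
b = (1, 1, 1, 0)
print(trace_formula([[(2, 0, 0, 0), b], [qconj(b), (3, 0, 0, 0)]]))
```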
In fact given any two distinct maps satisfying axioms 1-3, one is a real power of the other. One way to construct such a determinant is to notice that quaternions can be represented by 2×2 complex matrices of the form $\begin{pmatrix} a & b \\ -\bar{b} & \bar{a} \end{pmatrix}$ where a and b are complex numbers. We can then take the absolute value of the complex determinant (this is called the Study determinant, which is the square of the Dieudonné determinant). Alternatively we could repeat a similar expansion for complex numbers in terms of real numbers, giving a quaternion as a 4×4 real matrix. We then define the determinant of an n×n quaternion matrix as the determinant of the corresponding 4n×4n real matrix; this is called the q-determinant and is the square of the Study determinant.
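The 2×2 complex embedding can be checked directly. The sketch below (helper names are hypothetical; the determinant is a plain Gaussian elimination, not a library routine) writes a quaternion as $a + bj$ with complex $a, b$, embeds an n×n quaternion matrix as a 2n×2n complex matrix, and takes its determinant. For a 1×1 matrix $(q)$ it returns $|q|^2$, and $\operatorname{diag}(i,j)$ and $\operatorname{diag}(j,i)$ give the same value, so the $ab$ versus $ba$ ambiguity disappears.

```python
# Quaternions as 4-tuples (w, x, y, z) meaning w + x i + y j + z k (an illustrative choice).

def qmul(p, q):
    """Hamilton product p q."""
    w1, x1, y1, z1 = p
    w2, x2, y2, z2 = q
    return (
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    )


def to_complex_2x2(q):
    """Write q = a + b j (a = w + x i, b = y + z i) and return [[a, b], [-conj(b), conj(a)]]."""
    w, x, y, z = q
    a, b = complex(w, x), complex(y, z)
    return [[a, b], [-b.conjugate(), a.conjugate()]]


def embed(M):
    """Replace each entry of an n x n quaternion matrix by its 2x2 complex block."""
    n = len(M)
    C = [[0j] * (2 * n) for _ in range(2 * n)]
    for r in range(n):
        for s in range(n):
            block = to_complex_2x2(M[r][s])
            for u in range(2):
                for v in range(2):
                    C[2 * r + u][2 * s + v] = block[u][v]
    return C


def complex_det(C):
    """Determinant by Gaussian elimination with partial pivoting."""
    C = [row[:] for row in C]
    n = len(C)
    det = 1 + 0j
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(C[r][col]))
        if C[pivot][col] == 0:
            return 0j
        if pivot != col:
            C[col], C[pivot] = C[pivot], C[col]
            det = -det
        det *= C[col][col]
        for r in range(col + 1, n):
            factor = C[r][col] / C[col][col]
            for c in range(col, n):
                C[r][c] -= factor * C[col][c]
    return det


def study_det(M):
    """Study determinant: complex determinant of the embedded 2n x 2n matrix."""
    return complex_det(embed(M))


I, J, Z = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 0)
print(study_det([[(1, 2, 3, 4)]]))   # approximately 30 = |q|^2 = 1 + 4 + 9 + 16
print(study_det([[I, Z], [Z, J]]), study_det([[J, Z], [Z, I]]))
```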
Interestingly it’s just as possible to go the other way, if we’re careful about what we mean by a geometry. I will loosely follow Artin’s book Geometric Algebra. In particular we have the undefined terms of point and line, and the undefined relation of lies on. Then, for a fixed positive integer $d$, the axioms are:
There are obviously a couple of definitions wanting. A linear manifold is a collection of points such that given any pair of distinct points in the collection, every point that lies on the line through them is also in the collection. The span of a set of points is the smallest linear manifold containing each of the points (that such exists follows from the fact the collection of all points is a linear manifold, and the intersection of any family of linear manifolds is a linear manifold). A plane is a set that is spanned by 3 points and no fewer.
We can define a dilation to be a mapping of points to points such that the images of all points that lie on a given line lie on a line parallel to it (that is, a transformation that maps lines into parallel lines). We define a translation to be an injective dilation with no fixed points, or the identity.
Any line containing a point and its image under the translation (or more generally under an injective dilation) is called a trace of the translation. A scalar multiplication is a group homomorphism from the translations to themselves such that each trace of a translation is a trace of its image.
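Given two scalar multiplications $\alpha$ and $\beta$ we define their sum on a translation T by $(\alpha + \beta)(T) = \alpha(T)\,\beta(T)$ (composition of translations) and their product by $(\alpha\beta)(T) = \alpha(\beta(T))$, define 0 to be the scalar multiplication sending all translations to the identity, and 1 to be the identity scalar multiplication.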
Theorem: The scalar multiplications form a division ring under the operations given above if and only if .
Then we can go ahead and choose any point, which we call the origin, and d other points which, with the origin, span the space. We denote the d translations from the origin to the other points by $T_1, \dots, T_d$. Then, writing the group of translations additively, any vector can be uniquely written as $\alpha_1 T_1 + \dots + \alpha_d T_d$ for unique scalars $\alpha_1, \dots, \alpha_d$. Now given there is a unique translation from the origin to any point, we can identify a vector with its action on the origin. Thus we obtain a coordinatisation of the space; a correspondence with $K^d$ for some division ring K. This is of course not canonical; our choice of the d + 1 points was arbitrary.
It is interesting to note how such geometric axioms (although carefully chosen) correspond so exactly with the algebraic notions of division ring and vector space.
What about d = 1 and 2? For d=1 there is no hope, since axiom 3 is trivial and we could just take the line with n points for n not a power of a prime number. Since there is no division ring with n elements the theorem could not be true.
The case d = 2 turns out to be quite interesting. In dimensions at least 3 we can prove Desargues theorem. This has two parts:
1. Let p, q, r be parallel lines, and let P, P’ be distinct points on p, Q, Q’ be distinct points on q and R, R’ be distinct points on r. If PQ is parallel to P’Q’ and QR is parallel to Q’R’ then PR is parallel to P’R’.
This is shown in the figure below (where line segments are lines). To prove this one first assumes the lines do not all lie in one plane, and then deduces the planar case by projecting into the plane.
Artin proves that this is equivalent to the statement that for every pair of distinct points there exists a translation taking one to the other. This underlies the vector structure.
2.(X) Let p, q, r be three lines which all meet in a point X and P, P’ lie on p, Q, Q’ lie on q and R, R’ lie on r. If PQ is parallel to P’Q’ and QR is parallel to Q’R’ then PR is parallel to P’R’.
A diagram is shown below. If it is true for one point X then it can be shown to be true for all points. Again the proof in dimensions at least 3 is done by first proving the case where the 3 lines are not coplanar and then projecting them into the same plane.
Geometrically Artin shows this is equivalent to: Given three collinear, distinct points P, Q, R there exists a dilation with fixed point P mapping Q onto R. This is essentially saying we can get to any point by a scaling operation, and underlies the scalar multiplication structure.
One may ask whether this axiom necessarily holds in dimension 2. It doesn’t. An interesting counterexample is the octonionic plane (the octonions are a non-associative (but alternative) division algebra over the reals). Because the octonions are non-associative you can’t really do linear algebra over them; in particular, consider a potential line through the origin consisting of the octonionic multiples of some fixed point. If we now take the ‘line’ through the origin and another point on this ‘line’, it need not coincide with the original, because in general it would contain different points.
For a plethora of examples of non-Desarguesian planes see this review.
Reflecting back there are a couple of interesting things to note about this construction. Axiom 3 is inherently 2-dimensional, so all the geometry of a d-dimensional affine space is determined by the geometry of its planes. Notice how the structure of 2 and 3 dimensions completely determines the structure of higher dimensions; this may have something to do with our familiarity with 2 and 3 dimensions in the choice of our axioms.
Axiom 3 can be replaced by a projective equivalent, such as
3P. (Veblen and Young) Given a triangle (that is three non-collinear points) any line that intersects two sides of the triangle (a side of a triangle is the line between two of the points, excluding the points themselves) intersects the third.
With any of these replacements all the appropriately projectivised statements above are true; in particular Desargues’ theorem has a more elegant statement. We can recover the affine space by choosing a line (more generally, a hyperplane) at infinity; an interesting question is whether the division rings constructed from two different choices at infinity are canonically isomorphic.
It’s also remarkable that there is a geometric proposition, Pappus’ theorem, that is satisfied if and only if the division ring is commutative.
Above all the geometric approach shows the space for geometry (as I’ve argued before) is the affine/projective plane itself, and not its group of transformations, the vector space. From an algebraic perspective that is to say the geometric element is not the vector space itself but the set on which the vector space acts transitively and freely (it can be represented by the same underlying set as the vector space, but does not have the same algebraic structure).
The modern concept of a limit, central to how we understand analysis today, was not formulated until (arguably) 1821 by Cauchy, despite calculus being invented in the late 17th century and limiting approaches extending back into Greek mathematics. A major reason for the time it took for the rigorous foundations of analysis to develop is that they were not really necessary: most manipulations in mathematics and science involved “well-behaved” analytic functions. (This was well before the notion of a set was in vogue, so a function was generally considered to be a “formula” or a geometric concept, not an arbitrary mapping of one set into another.) However there were exceptions to this; for instance trigonometric series, used by Bernoulli to solve a vibrating string problem in 1753 and by Fourier in 1821 to solve a heat equation.
Trigonometric (or Fourier) series were useful for solving a wide range of physical problems, and at the same time challenged mathematical intuition; for instance Fourier found a trigonometric series that converged to a square wave, a discontinuous function.
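As an illustration of how such series behave, the sketch below sums the sine series $\sin x + \tfrac13\sin 3x + \tfrac15\sin 5x + \cdots$, which converges to $\pi/4$ on $(0,\pi)$, to $-\pi/4$ on $(-\pi,0)$ and to $0$ at $x = 0$. I chose this series as a standard example of a square wave; it is not necessarily the series Fourier himself used, and the function name is mine.

```python
import math


def square_wave_partial_sum(x, terms):
    """sin x + sin 3x / 3 + sin 5x / 5 + ... (first `terms` terms)."""
    return sum(math.sin((2 * k + 1) * x) / (2 * k + 1) for k in range(terms))


for x in (-math.pi / 2, -1.0, 0.0, 1.0, math.pi / 2):
    # Should approach -pi/4 for x < 0, 0 at x = 0 and +pi/4 for 0 < x < pi
    print(x, square_wave_partial_sum(x, 10_000), math.pi / 4)
```

Each partial sum is a smooth function, yet the limit jumps from $-\pi/4$ to $\pi/4$ at the origin.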
In investigating these sorts of pathological functions, rigorous notions of continuity, differentiability, uniform convergence and integrability arose. In particular there is Riemann’s definition of an integral (I can’t find his original definition, so this is a modern version):
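A partition of an interval $[a,b]$ is a sequence of points $a = x_0 < x_1 < \dots < x_n = b$. The mesh of such a partition is the maximum of $x_i - x_{i-1}$ for $i = 1, \dots, n$. A tagging of such a partition is points $t_1, \dots, t_n$ satisfying $x_{i-1} \le t_i \le x_i$ for $i = 1, \dots, n$.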
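A function $f$ on $[a,b]$ is integrable with integral $A$ if and only if for every positive quantity $\varepsilon$ there is a positive quantity $\delta$ such that for any tagged partition with mesh less than $\delta$, $\left|\sum_{i=1}^{n} f(t_i)(x_i - x_{i-1}) - A\right| < \varepsilon$.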
Informally the integral of a positive function is the area under its graph. The idea behind Riemann’s definition is to approximate the area by adding together rectangles, with widths given by the partition and heights given by the value of the function at the tags. Riemann’s definition essentially says that if, as the width of these rectangles approaches zero, their sum approaches a fixed number independent of the way we choose the rectangles, this must be the area under the graph.
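A quick numerical sketch of the definition: random partitions of $[0,1]$ with random tags, applied to $f(x) = x^2$ (integral $1/3$). The helper names are mine. Since $|f'| \le 2$ on $[0,1]$, the oscillation of $f$ on a subinterval is at most twice its width, so any tagged Riemann sum is within twice the mesh of $1/3$, whatever the tags.

```python
import random


def random_tagged_partition(a, b, n, rng):
    """Illustrative helper: n random subintervals of [a, b], each with a random tag."""
    cuts = sorted(rng.uniform(a, b) for _ in range(n - 1))
    points = [a] + cuts + [b]
    tags = [rng.uniform(points[i - 1], points[i]) for i in range(1, n + 1)]
    return points, tags


def mesh(points):
    """Largest subinterval length x_i - x_{i-1}."""
    return max(points[i] - points[i - 1] for i in range(1, len(points)))


def riemann_sum(f, points, tags):
    """Sum of f(t_i) (x_i - x_{i-1}) over the tagged partition."""
    return sum(f(t) * (points[i + 1] - points[i]) for i, t in enumerate(tags))


rng = random.Random(0)
for n in (10, 100, 10_000):
    points, tags = random_tagged_partition(0.0, 1.0, n, rng)
    print(n, mesh(points), riemann_sum(lambda x: x * x, points, tags))
```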
There are ‘problems’ with Riemann’s definition though, one of which is that not every derivative is integrable. To demonstrate this we construct Volterra’s function.
We begin by constructing a Smith-Volterra-Cantor set (a.k.a. a fat Cantor set). Start (step 0) with the interval [0,1] and inductively at step n remove an open interval of length $1/4^n$ from the middle of each of the $2^{n-1}$ remaining closed intervals. The intersection of all these sets forms a nowhere-dense set. Since the total length removed is $\sum_{n \ge 1} 2^{n-1}/4^n = 1/2$, its ‘outer measure’ is $1/2$.
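The construction can be checked with exact rational arithmetic. The sketch below (function names are illustrative) assumes the variant that removes a middle gap of length $1/4^n$ from each remaining interval at step n, and confirms that after N steps there are $2^N$ intervals of total length $\tfrac12 + 2^{-(N+1)}$, which tends to $1/2$.

```python
from fractions import Fraction


def fat_cantor(steps):
    """Closed intervals left after `steps` steps; step n removes a middle gap of length 1/4**n."""
    intervals = [(Fraction(0), Fraction(1))]
    for n in range(1, steps + 1):
        gap = Fraction(1, 4 ** n)
        next_intervals = []
        for lo, hi in intervals:
            mid = (lo + hi) / 2
            next_intervals.append((lo, mid - gap / 2))   # left piece
            next_intervals.append((mid + gap / 2, hi))   # right piece
        intervals = next_intervals
    return intervals


def total_length(intervals):
    return sum((hi - lo for lo, hi in intervals), Fraction(0))


for steps in range(6):
    print(steps, len(fat_cantor(steps)), total_length(fat_cantor(steps)))
```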
I won’t explicitly detail the construction of the Volterra function. It uses the function $x^2 \sin(1/x)$ (extended by 0 at $x = 0$), which is differentiable, but whose derivative is not continuous at 0. The Volterra function then uses this function to construct a function that is differentiable with a bounded derivative that is discontinuous at every point of the Smith-Volterra-Cantor set; since this set has positive measure, the derivative (by Lebesgue’s criterion for Riemann integrability) isn’t Riemann integrable.
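A short numerical check of the claim about $x^2\sin(1/x)$: the difference quotient at 0 is $h\sin(1/h)$, which tends to 0, while for $x \neq 0$ the derivative is $2x\sin(1/x) - \cos(1/x)$, which equals $-1$ at $x = 1/(2\pi m)$ and $+1$ at $x = 1/((2m+1)\pi)$. So $f'(0) = 0$ exists but $f'$ has no limit at 0. The function names are illustrative.

```python
import math


def f(x):
    """x^2 sin(1/x), extended by f(0) = 0."""
    return x * x * math.sin(1 / x) if x != 0 else 0.0


def f_prime(x):
    """2x sin(1/x) - cos(1/x) for x != 0; f'(0) = 0 from the difference quotient."""
    if x == 0:
        return 0.0
    return 2 * x * math.sin(1 / x) - math.cos(1 / x)


# The difference quotient at 0 is h sin(1/h), squeezed to 0 ...
for h in (1e-2, 1e-4, 1e-6):
    print(h, f(h) / h)
# ... yet f' takes the values -1 and +1 at points arbitrarily close to 0.
for m in (10, 100, 1000):
    print(f_prime(1 / (2 * math.pi * m)), f_prime(1 / ((2 * m + 1) * math.pi)))
```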
So it’s worth looking for a better integral. Since integration corresponds to finding the area under a graph, one method is to try to assign a size to sets in the plane. But even assigning lengths to subsets of the line is difficult.
Peano and Jordan assigned a size to sets using intervals (or in 2 dimensions, rectangles) using a method similar to the “proof by exhaustion” used by the Greeks. The idea behind proof by exhaustion is to prove the area of an object is A by proving it is not less than A and proving that it is not more than A. The inner content of a set is the supremum of the finite sums of lengths of intervals with non-intersecting interiors that are contained in the set. The outer content of a set is the infimum of the finite sums of lengths of intervals with non-intersecting interiors whose union contains the set. A set is Jordan measurable if its inner content is equal to its outer content and this value is called the Jordan measure.
Unfortunately the Smith-Volterra-Cantor set is not Jordan measurable: it contains no intervals so its inner content is 0, but its outer content is 1/2.
Borel took a different approach, by defining the lengths of countable disjoint unions of intervals to be the sums of the lengths of the intervals, and the length of B\A to be the length of B minus the length of A. Consequently the Smith-Volterra-Cantor set is measurable.
However the cardinality of the Borel measurable sets is $\mathfrak{c} = 2^{\aleph_0}$ (the cardinality of the reals), since every Borel measurable set can be constructed from the intervals (of which there are $\mathfrak{c}$) by complementation and countable unions. By contrast the Cantor set has outer content zero, so every subset will have outer content zero, and hence be Jordan measurable. Since the Cantor set has cardinality $\mathfrak{c}$ this implies the cardinality of the Jordan measurable sets is at least $2^{\mathfrak{c}}$, strictly larger than $\mathfrak{c}$.
Lebesgue’s criterion for a subset of [a,b] to be measurable is a subtle play on Borel’s and Peano-Jordan’s. The outer measure of a set S, $m^*(S)$, is the infimum of the total length of countably many intervals whose union contains the set. The inner measure of a set is the length of [a,b] minus the outer measure of the set’s complement in [a,b], that is $m_*(S) = (b - a) - m^*([a,b] \setminus S)$. A subset of [a,b] is Lebesgue measurable if its inner measure and outer measure are equal. A subset of the real line is Lebesgue measurable if its intersection with [-n,n] is for every positive integer n. Carathéodory came up with the equivalent criterion: a set E is Lebesgue measurable if $m^*(A) = m^*(A \cap E) + m^*(A \setminus E)$ for all sets A.
This final result is normally what is presented early in a course on Lebesgue integration, but I hope it seems a little less mysterious now. Allowing countable unions of intervals is essential to measure sets such as the Smith-Volterra-Cantor set. Using the outer measure ensures we automatically get the non-Borel sets of measure zero (and in fact a subset of the real line is Lebesgue measurable if and only if it is the disjoint union of a Borel measurable set and a set of measure zero).
The Lebesgue integral is great; it can integrate a much larger suite of functions than the Riemann integral, it has strong convergence theorems (dominated convergence theorem, Fubini’s theorem) and it readily abstracts (and forms the basis for modern probability theory). However it still can’t integrate every derivative: consider the sinc function $\operatorname{sinc} x = \sin(x)/x$; it is the derivative of the function defined on the whole real line with Taylor expansion $\sum_{n=0}^{\infty} \frac{(-1)^n x^{2n+1}}{(2n+1)(2n+1)!}$, but it is not Lebesgue integrable over the whole real line. The problem is that it decays too slowly while oscillating: the total area above the x-axis is infinite and the total area below the x-axis is infinite, but if you add them from the origin out they cancel to a finite sum (it is the limit of Riemann integrals). (There are examples on bounded intervals too.)
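To see the cancellation numerically, the sketch below integrates sinc over the half-periods $[k\pi,(k+1)\pi]$ with a hand-rolled composite Simpson rule (an arbitrary choice of quadrature, with a hypothetical resolution of 200 subintervals per half-period). The signed integral over $[0, N\pi]$ settles near $\pi/2$, while the integral of $|\operatorname{sinc}|$ keeps growing, roughly like $\tfrac{2}{\pi}\log N$.

```python
import math


def sinc(x):
    return math.sin(x) / x if x != 0 else 1.0


def simpson(f, a, b, m=200):
    """Composite Simpson's rule on [a, b] with m (even) subintervals."""
    h = (b - a) / m
    total = f(a) + f(b)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def sinc_integrals(periods):
    """Return (signed, absolute) integrals of sinc over [0, periods * pi]."""
    pieces = [simpson(sinc, k * math.pi, (k + 1) * math.pi) for k in range(periods)]
    # sinc has constant sign on each [k pi, (k+1) pi], so |piece| integrates |sinc| there
    return sum(pieces), sum(abs(p) for p in pieces)


for periods in (10, 100, 1000):
    print(periods, sinc_integrals(periods))
```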
There is an integral more powerful than the Lebesgue integral on the real line, the Henstock-Kurzweil-Denjoy-Perron integral, or as it is sometimes known, the generalized Riemann integral, defined on a bounded interval as follows:
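The function f is integrable on [a,b] with integral A if for every positive number $\varepsilon$ there exists a positive function $\delta$ on [a,b] (a gauge) such that every tagged partition satisfying $x_i - x_{i-1} < \delta(t_i)$ for each $i$ satisfies $\left|\sum_{i=1}^{n} f(t_i)(x_i - x_{i-1}) - A\right| < \varepsilon$.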
This integral (once extended to the whole real line) contains every Lebesgue integrable function, limits of Lebesgue integrable functions and some new functions to boot! In particular every derivative is integrable. (Robert Bartle’s A modern theory of integration gives an accessible exposition on the subject).
Of course there’s more: Cesaro-Denjoy integrals, approximate Perron integrals and generalizations to higher dimensions (where finding an integral for which ‘every derivative’ is integrable is, to my knowledge, unsolved).
I’d like to conclude by reflecting on how solving problems in physics by questionably performing operations on infinite trigonometric series was a major source of inspiration for mathematics, and I wonder how much impact resolving the questionable path-integrals in Quantum Field Theory will have (and has already had) on mathematics.
To begin I want to consider linear representations of the cyclic group of order n: that is I want to assign to each element of the group a linear operator on an inner product space in a way consistent with the group structure [or if you prefer, to find a homomorphism from the cyclic group to the group of automorphisms of an inner product space (an orthogonal group)]. There are lots of ways to do this, for lots of different vector spaces – the simplest is to map every group element to the identity (the trivial (linear) representation).
It would be nice to have some sort of canonical linear representation. Given a set we can form a vector space by taking all formal linear combinations of its elements (that is we consider the elements of the set to be linearly independent vectors, and the vector space is their span). If a group acts on that set we can extend the action linearly to a linear representation on the induced vector space; this is called the permutation representation.
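For example if the set is $\{a, b, c\}$ the vector space is three-dimensional and consists of all elements of the form $\alpha a + \beta b + \gamma c$. The group of all permutations on three elements acts on the set, and such a permutation $\sigma$ is represented by the linear mapping $\alpha a + \beta b + \gamma c \mapsto \alpha\,\sigma(a) + \beta\,\sigma(b) + \gamma\,\sigma(c)$.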
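The permutation representation of this example can be written out as matrices. In this sketch a, b, c are labelled 0, 1, 2 (an arbitrary choice), a permutation is a tuple, and the helper names are mine. It builds the matrix sending $e_s$ to $e_{\sigma(s)}$ and checks that composing permutations corresponds to multiplying matrices, which is what makes it a representation.

```python
from itertools import permutations


def permutation_matrix(sigma):
    """Matrix of e_s -> e_{sigma(s)}: column s has a single 1, in row sigma(s)."""
    n = len(sigma)
    return [[1 if sigma[col] == row else 0 for col in range(n)] for row in range(n)]


def compose(sigma, tau):
    """(sigma o tau)(s) = sigma(tau(s))."""
    return tuple(sigma[tau[s]] for s in range(len(tau)))


def mat_mul(A, B):
    return [[sum(A[r][t] * B[t][c] for t in range(len(B))) for c in range(len(B[0]))]
            for r in range(len(A))]


def act(sigma, coeffs):
    """Send alpha a + beta b + gamma c to alpha sigma(a) + beta sigma(b) + gamma sigma(c)."""
    out = [0] * len(coeffs)
    for s, coefficient in enumerate(coeffs):
        out[sigma[s]] += coefficient
    return out


group = list(permutations(range(3)))   # all six permutations of {a, b, c} = {0, 1, 2}
print(act((1, 2, 0), [5, 7, 11]))      # a -> b, b -> c, c -> a gives [11, 5, 7]
```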
Now the group G acts on the set G by left multiplication, and so we can construct a permutation representation. This is called the regular representation of G.
What does this look like for a cyclic group of order n? The vector space has a basis $e_0, e_1, \dots, e_{n-1}$, and the group element 1 is represented by the linear transformation S satisfying $S e_j = e_{j+1}$ (where addition is modulo n). The group element k=1+1+…+1 is represented by $S^k$.
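There is also a natural inner product $\langle e_i, e_j \rangle = \delta_{ij}$ and this is invariant under S (that is S is unitary). As a matrix, S has entries $S_{ij} = 1$ if $i \equiv j + 1 \pmod n$ and $0$ otherwise: ones on the subdiagonal and in the top-right corner.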
Now since S is unitary it is normal and hence by the spectral theorem unitarily diagonalizable. So let’s look for its eigenvectors and eigenvalues: since $S^n = I$ its eigenvalues must be nth roots of unity, so denote $\omega = e^{-2\pi i/n}$ (the choice of sign, and to some extent root, is arbitrary). We can in fact easily see that $v_k = \frac{1}{\sqrt{n}} \sum_{j=0}^{n-1} \omega^{-jk} e_j$ is a normalised eigenvector of S with eigenvalue $\omega^k$ (go on, check it!). Actually the normalised eigenvectors are only determined up to an overall phase, so $e^{i\theta_k} v_k$ would work equally well, but I’ll stick to these phase conventions for convenience.
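The diagonalising matrix F is then the matrix whose kth row is $v_k^*$, that is $F_{kj} = \frac{1}{\sqrt{n}}\,\omega^{jk}$ for $0 \le j, k \le n-1$.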
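So $F S F^{-1} = \operatorname{diag}(1, \omega, \omega^2, \dots, \omega^{n-1})$. In fact F diagonalises every group element, since the representation respects multiplication: $F S^k F^{-1} = (F S F^{-1})^k = \operatorname{diag}(1, \omega^k, \dots, \omega^{(n-1)k})$.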
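F is precisely the discrete Fourier transform (up to a choice of normalisation): if $x = \sum_j x_j e_j$, then $F x = \sum_k \hat{x}_k e_k$ with $\hat{x}_k = \frac{1}{\sqrt{n}} \sum_{j=0}^{n-1} x_j\, e^{-2\pi i jk/n}$.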
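Here is a sketch that checks the diagonalisation numerically for n = 8 in plain Python (no FFT library; the helper names are mine). It assumes the sign convention $\omega = e^{-2\pi i/n}$ (the text notes the sign is arbitrary), builds the shift S and $F_{kj} = \omega^{jk}/\sqrt{n}$, and confirms that $F S F^{-1}$ is diagonal with entries $\omega^k$, that F is unitary, and that applying F to a vector agrees with the usual DFT sum.

```python
import cmath
import math


def shift_matrix(n):
    """S e_j = e_{j+1 mod n}: entry (i, j) is 1 exactly when i = j + 1 (mod n)."""
    return [[1 if i == (j + 1) % n else 0 for j in range(n)] for i in range(n)]


def fourier_matrix(n):
    """F[k][j] = omega^(jk) / sqrt(n) with omega = exp(-2 pi i / n)."""
    return [[cmath.exp(-2j * math.pi * j * k / n) / math.sqrt(n) for j in range(n)]
            for k in range(n)]


def mat_mul(A, B):
    return [[sum(A[r][t] * B[t][c] for t in range(len(B))) for c in range(len(B[0]))]
            for r in range(len(A))]


def conj_transpose(A):
    return [[complex(A[c][r]).conjugate() for c in range(len(A))] for r in range(len(A[0]))]


def dft(x):
    """Usual DFT sum with 1/sqrt(n) normalisation."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * math.pi * j * k / n) for j in range(n)) / math.sqrt(n)
            for k in range(n)]


n = 8
F, S = fourier_matrix(n), shift_matrix(n)
D = mat_mul(mat_mul(F, S), conj_transpose(F))   # F S F^{-1}, using F^{-1} = F^* (F is unitary)
omega = cmath.exp(-2j * math.pi / n)
for k in range(n):
    print(k, D[k][k], omega ** k)   # diagonal entries should match omega^k
```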
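Many of the properties of the discrete Fourier transform follow immediately; we know it is unitary by the spectral theorem, which is precisely the Plancherel theorem. In particular it is invertible, which gives completeness. One half of the shift theorem is also immediate: $F S = D F$ with $D = \operatorname{diag}(1, \omega, \dots, \omega^{n-1})$, that is $\widehat{Sx}_k = \omega^k \hat{x}_k$. One can see from the explicit form for F that $\sum_k F_{lk} F_{kj} = \frac{1}{n}\sum_k \omega^{k(l+j)}$ is 1 when $l + j \equiv 0 \pmod n$ and 0 otherwise, and so if we define the reflection operator $R e_j = e_{-j}$ then $F^2 = R$ (though this would be different if we had chosen a different normalisation condition), so applying F to the half of the shift theorem above gives the other half, $F D F^{-1} = R S R^{-1} = S^{-1}$ (is there an easier way to see this?).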
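The relations $F^2 = R$ and $F D F^{-1} = S^{-1}$ are easy to check numerically. This sketch assumes the conventions $\omega = e^{-2\pi i/n}$, $F_{kj} = \omega^{jk}/\sqrt{n}$, $S e_j = e_{j+1}$, $R e_j = e_{-j}$ and $D = \operatorname{diag}(\omega^k)$; the helper names are mine. With the unnormalised DFT one would get $F^2 = nR$ instead.

```python
import cmath
import math


def fourier_matrix(n):
    """F[k][j] = omega^(jk) / sqrt(n), omega = exp(-2 pi i / n)."""
    return [[cmath.exp(-2j * math.pi * j * k / n) / math.sqrt(n) for j in range(n)]
            for k in range(n)]


def permutation_matrix(n, image):
    """Matrix of the basis map e_j -> e_{image(j) mod n}."""
    return [[1 if i == image(j) % n else 0 for j in range(n)] for i in range(n)]


def mat_mul(A, B):
    return [[sum(A[r][t] * B[t][c] for t in range(len(B))) for c in range(len(B[0]))]
            for r in range(len(A))]


def conj_transpose(A):
    return [[complex(A[c][r]).conjugate() for c in range(len(A))] for r in range(len(A[0]))]


def close(A, B, tol=1e-9):
    return all(abs(A[r][c] - B[r][c]) < tol for r in range(len(A)) for c in range(len(A[0])))


n = 6
F = fourier_matrix(n)
S = permutation_matrix(n, lambda j: j + 1)        # shift
S_inv = permutation_matrix(n, lambda j: j - 1)    # inverse shift
R = permutation_matrix(n, lambda j: -j)           # reflection e_j -> e_{-j}
omega = cmath.exp(-2j * math.pi / n)
D = [[omega ** k if k == l else 0 for l in range(n)] for k in range(n)]

print(close(mat_mul(F, F), R))                                   # checks F^2 = R
print(close(mat_mul(mat_mul(F, D), conj_transpose(F)), S_inv))   # checks F D F^{-1} = S^{-1}
```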
What about convolutions? Given that each basis vector corresponds to a group element, there is a natural algebraic structure on the vector space, namely $e_j * e_k = e_{j+k}$ (where as usual addition is modulo n). This is precisely a convolution; Exercise: by requiring $*$ to be distributive and expanding in components prove $(x * y)_k = \sum_{j=0}^{n-1} x_j\, y_{k-j}$. What about the convolution theorem? Well we don’t really have an idea of a multiplicative structure (yet) so it doesn’t really make sense.
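The exercise can be checked mechanically: expand $x$ and $y$ in the basis, multiply basis elements by $e_j * e_k = e_{j+k \bmod n}$ using distributivity, and compare with the index formula $(x*y)_k = \sum_j x_j y_{k-j}$. The function names below are illustrative.

```python
def convolve_via_basis(x, y):
    """Expand x = sum x_j e_j, y = sum y_k e_k and use e_j * e_k = e_{(j + k) mod n}."""
    n = len(x)
    out = [0] * n
    for j in range(n):
        for k in range(n):
            out[(j + k) % n] += x[j] * y[k]   # distributivity: x_j y_k (e_j * e_k)
    return out


def convolve_formula(x, y):
    """(x * y)_k = sum_j x_j y_{k - j}, indices mod n."""
    n = len(x)
    return [sum(x[j] * y[(k - j) % n] for j in range(n)) for k in range(n)]


def basis(n, j):
    return [1 if i == j else 0 for i in range(n)]


x, y = [1, 2, 0, -1], [3, 0, 5, 2]
print(convolve_via_basis(x, y), convolve_formula(x, y))
print(convolve_via_basis(basis(4, 2), basis(4, 3)))   # e_2 * e_3 = e_1 when n = 4
```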
What is the exact structure on V? There’s an inner product, but there’s also a relative ordering of the basis elements; it doesn’t matter where we start numbering the basis elements (except in the definition of convolutions) but S defines an order for them relative to each other. So to say the Fourier transform is defined by a complex inner product space is lying a little, because there is this extra structure. [Also, considering the Fourier transform is only defined up to a phase it could be more natural to think of two vectors being equivalent if they differ only by a phase.] Actually there is a much more natural way to introduce this structure.
There is another way to think of a permutation representation. We form the vector space associated to a set as the vector space of all functions from the set to the complex numbers. The basis vector corresponding to the element s is the characteristic function $\delta_s$ of s, which maps s to 1 and every other element to 0. (Exercise: Show this is equivalent to the description given before, at least if the set is finite.) An arbitrary function can be decomposed into the basis of characteristic functions: $f = \sum_s f(s)\,\delta_s$. The action of a group element $g$ is $(g \cdot f)(s) = f(g^{-1} s)$.
Now let’s look back at the regular representation of the cyclic group through this lens. We consider functions $f: \mathbb{Z}_n \to \mathbb{C}$, with the inner product $\langle f, g \rangle = \sum_j \overline{f(j)}\, g(j)$ and we have the shift operator given by $(Sf)(j) = f(j-1)$. The Discrete Fourier Transform is given by $(Ff)(k) = \frac{1}{\sqrt{n}} \sum_j f(j)\, \omega^{jk}$. The diagonalisation property is that $F S F^{-1}$ is a multiplicative operator, equivalent to pointwise multiplication by the function $k \mapsto \omega^k$. (Indeed Halmos notes that the statement that any normal operator can be unitarily mapped to a multiplicative operator is one way of viewing the spectral theorem.)
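A convolution is then $(f * g)(k) = \sum_j f(j)\, g(k-j)$. Now taking the Fourier transform of a convolution of basis elements, $F(\delta_j * \delta_l) = F\delta_{j+l}$, which is the function $k \mapsto \frac{1}{\sqrt{n}}\omega^{(j+l)k}$, and using that the pointwise product (no sum) $(F\delta_j)(k)\,(F\delta_l)(k) = \frac{1}{n}\omega^{(j+l)k}$ means we can rewrite it as $F(\delta_j * \delta_l) = \sqrt{n}\,(F\delta_j)(F\delta_l)$. Applying linearity gives one half of the convolution theorem: $F(f * g) = \sqrt{n}\,(Ff)(Fg)$. The other half, $F(fg) = \frac{1}{\sqrt{n}}(Ff) * (Fg)$, is readily obtained using $F^2 = R$ where $(Rf)(j) = f(-j)$. Thus the Fourier transform maps the additional ring structure given by pointwise multiplication to the convolution structure given by the regular representation.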
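Both halves of the convolution theorem can be checked numerically. This sketch assumes the unitary normalisation $(Ff)(k) = \frac{1}{\sqrt n}\sum_j f(j)\,\omega^{jk}$ with $\omega = e^{-2\pi i/n}$, under which the theorem reads $F(f*g) = \sqrt{n}\,(Ff)(Fg)$ and $F(fg) = \frac{1}{\sqrt n}(Ff)*(Fg)$; other normalisations move the factors of $\sqrt n$ around. The function names are mine.

```python
import cmath
import math


def dft(f):
    """(Ff)(k) = (1/sqrt n) sum_j f(j) omega^(jk), omega = exp(-2 pi i / n)."""
    n = len(f)
    return [sum(f[j] * cmath.exp(-2j * math.pi * j * k / n) for j in range(n)) / math.sqrt(n)
            for k in range(n)]


def convolve(f, g):
    """(f * g)(k) = sum_j f(j) g(k - j), indices mod n."""
    n = len(f)
    return [sum(f[j] * g[(k - j) % n] for j in range(n)) for k in range(n)]


def pointwise(f, g):
    return [a * b for a, b in zip(f, g)]


f = [1, 2, 0, -1, 3]
g = [0, 1, 4, 1, -2]
n = len(f)
# First half: F(f * g) = sqrt(n) (Ff)(Fg)
lhs = dft(convolve(f, g))
rhs = [math.sqrt(n) * v for v in pointwise(dft(f), dft(g))]
# Second half: F(fg) = (1/sqrt(n)) (Ff) * (Fg)
lhs2 = dft(pointwise(f, g))
rhs2 = [v / math.sqrt(n) for v in convolve(dft(f), dft(g))]
print(max(abs(a - b) for a, b in zip(lhs, rhs)), max(abs(a - b) for a, b in zip(lhs2, rhs2)))
```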
So what have we got? We started looking at regular linear representations of the cyclic group, and to change to a basis in which the group operations were diagonal we invented the discrete Fourier transform.
The power in this idea is there are many generalisations. We could have a look at more complicated groups or even more general algebraic structures. The representation theory of cyclic groups is very simple since they are abelian; there’s a lot more involved in trying to diagonalize the representations of non-abelian groups. We could then have other notions of convolutions and Fourier-type transforms. We could also look at mapping to other vector spaces or even to different geometric structures. If instead of constructing vector spaces over the complex numbers we constructed them over finite fields we would get (for the right combination of dimension of the vector space and characteristic of the field) the finite Fourier transform which is important in coding theory. One could also look at what happens to direct sums, tensor products and the like of the regular representations.
As we were taught in high school the roots of the quadratic equation $x^2 + bx + c = 0$ can be found by completing the square, giving $x = \frac{-b \pm \sqrt{b^2 - 4c}}{2}$.
There are some problems with implementing it: firstly we need to be able to take square roots (that is solve $y^2 = d$). This isn’t too bad: there are lots of algorithms, geometrically we can do it with a compass and a straightedge (indeed this is where the first examples of irrational numbers came from) and recently it’s been done using DNA. A more serious problem is round-off error: if $|4c| \ll b^2$ then $\sqrt{b^2 - 4c} \approx |b|$, and so if you only calculate this using a few decimal places there will be significant round-off error in the root $\frac{-b + \operatorname{sgn}(b)\sqrt{b^2 - 4c}}{2}$, where the two terms nearly cancel (whether this is important depends on the application). A simple workaround is to notice that at most one of the roots will have this round-off error and the product of the roots is c, so we can use the accurate root to find the other one.
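The workaround can be sketched in a few lines. For $x^2 + bx + c = 0$ with real roots, compute the root in which $-b$ and the square root have the same sign (so nothing cancels), then get the other root as $c$ divided by it. The function names are mine, and this is the textbook trick rather than Blinn's more careful treatment. With $b = 10^8$, $c = 1$ the naive formula loses most of the digits of the small root $\approx -10^{-8}$, while the stable version keeps them.

```python
import math


def naive_roots(b, c):
    """Textbook formula for x^2 + b x + c = 0 (real roots assumed)."""
    s = math.sqrt(b * b - 4 * c)
    return (-b + s) / 2, (-b - s) / 2


def stable_roots(b, c):
    """Roots of x^2 + b x + c = 0 avoiding cancellation; raises ValueError if complex."""
    disc = b * b - 4 * c
    if disc < 0:
        raise ValueError("complex roots")
    # Give the square root the sign of b, so -b and -sign(b) sqrt(disc) add without cancelling
    q = -(b + math.copysign(math.sqrt(disc), b)) / 2
    if q == 0:          # only happens when b == 0 and c == 0
        return 0.0, 0.0
    return q, c / q     # the product of the roots is c


print(naive_roots(1e8, 1.0))
print(stable_roots(1e8, 1.0))
```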
A much more detailed analysis of algorithms for solving the quadratic is in a pair of articles by James Blinn. In fact he also has a series of articles on solving the cubic using the analogous cubic formula. One important thing to draw from this is the amount of work involved in recasting the classical quadratic and cubic formulae in a numerically stable way. Also, as Blinn points out, the utility of these methods depends heavily on your tools and application: depending on your computer (whether it be a pen and paper, an old fashioned calculator, a GPU or a molecular computer) it may be faster to iterate a solution than solve using a formula, and whether it makes a difference depends on how many polynomial equations you have to solve and how long each takes to solve. [There are specific zero-finding methods for polynomials, for example the Jenkins-Traub algorithm, that will converge much faster than generic methods such as Newton’s or gradient methods.] To solve a single cubic (or even a couple hundred) on a modern PC you wouldn’t notice a difference, but in some graphical applications you may need to solve thousands a second.
Incidentally Felix Klein found the quintic equation was tied up with the geometry of the icosahedron. In 1989 it was shown further that the quintic could be solved by an iterative algorithm, which (I think) means that to each quintic is assigned a rational function and the roots can be found by repeatedly applying the rational function. The whole kit – from the insolubility by radicals to this algorithm – is explained in detail in this excellent set of notes. (I would love an excuse to implement this algorithm in a stable manner).
Why limit ourselves to radicals? Why not consider a perfectly good solution (it’s in a familiar ‘nice’ form). This is the type of question Timothy Chow asks. More precisely he looks at the ‘EL numbers’, the smallest subfield of the complex numbers closed under exponentiation and its compositional inverse (taking logarithms), and asks what sorts of equations you can solve with them. The answer isn’t known; it lies in transcendence theory.
It’s possible, once you know the trick, to show and are transcendental. In fact the Lindemann-Weierstrass theorem states that given algebraic numbers that are linearly independent over the rationals, their exponentials are algebraically independent over the rationals. The Gelfond-Schneider theorem states that all values of $a^b$ are transcendental for algebraic $a \neq 0, 1$ and algebraic irrational $b$. There are some more theorems and a handful of other transcendental numbers known, but a great deal is still unknown, for example are and transcendental. The constant problem of determining when a given transcendental function is zero (useful for computer algebra) has only been solved or proven algorithmically undecidable in certain cases.
A huge conjecture in transcendence theory is Schanuel’s conjecture: given n complex numbers linearly independent over the rationals, some n of the 2n numbers formed by these numbers and their exponentials are algebraically independent. It would have strong implications: the Lindemann-Weierstrass and Gelfond-Schneider theorems are special cases, it would imply that Euler’s identity is (in an appropriate sense) essentially the only algebraic relationship between $e$ and $\pi$, and would bring us closer to understanding which algebraic and transcendental equations are solvable in the ‘EL numbers’ and their closure (the elementary numbers).
Of course one often talks as well about elementary functions (functions generated by constant functions, the identity function and exponentiation under addition, multiplication, composition and their inverse operations) and it’s often said that isn’t elementary. This apparently can be proved using Picard-Vessiot theory and differential Galois theory. This is very closely related to my previous posts on integrable systems, Lie algebras and symmetries.
One last trick to leave you with. Solving one linear equation is easy, and in first year mathematics courses we are taught how to solve systems of linear equations. Since we’ve discussed solving one polynomial equation it’s natural to ask how one would solve a system of polynomial equations… one approach is a Gröbner basis.