No account? Create an account # kistaro

## Kistaro Windrider, Reptillian Situation Assessor

### Unfortunately, I Really Am That Nerdy

Complicated programming things I'd like to write if and only if nobody else has yet
Here's something that could be useful in numerical analysis, as far as C++ is ever used for numerical aplications.

A common problem in all programming languages except for those highly specialized for mathematics (and it even then must happen for irrational numbers) is that of floating-point precision. There are only so many different numbers you can represent in eighty bits, and it provides only an approximation to the set of real numbers- but usually, a sufficient approximation.

The c++ data type double, the most popular floating point type, on an Intel x86 processor (current generation), holds a mantissa worth about 16 digits of precision and an exponent part that can offer rather large values. The problem is what happens when operations are done that exceed this precision- and then the value drops, so what looks like it should be within precision has actually long since died to what's known as the cancellation catastrophe. The classic example:

double x = 2.49E80; //that's 2.49 X 10^80
x = x + 1;
cout << x << endl; //print x
x = x - 2.49E80;
cout << x << endl;

This block of code makes a very large number for X, adds one to it, prints the result, subtracts the original number, and prints the final result. If this was with real numbers instead of doubles, the obvious result would be that the last line will print one. This is not the nature of reality; the 1 gets lost in the noise. The result is error of greater than 100%, it is a relative error of infinite magnitude: no matter what you multiply the zero by, you're not getting the right answer! The +/- error at this point is actually on the order of 1064, conservative estimate, and a 1 is long gone.

It is quite possible to be completely unaware of this error until it slaps you in the face. This sort of error has killed a number of programs at TopCoder, where answers have to be within 10^-9, either relative or absolute (plus or minus that OR plus or minus that times the value, whichever gives a more generous range) to be deemed correct, when accumulating error through a loop or recursion gives a program its failure.

What would be useful is a class that behaves like a long double, with eighty-bit precision (a double is 64 bits), but is actually an object and so has methods to report the error on the object. Operator overloading and implicit constructor casts would make it appear like a double in all basic mathematical operations, but it would keep track of the growing error in the value, a +/- range kept by proper statistical rules. This would start out at -inf (MIN_INT- given the limits of long double exponents, this can be a short int) if an integer has been represented exactly and cast to an ErrorDouble (for a lack of a better idae), or -18 (representing 10^-18, which for a long double is actually worse than the real value of the error and I need to find out what it is) relative to the exponent in base 10 of the double; realistically, instead of tens, the error will be relative to powers of 2. You can also construct it with an explicit level of error; if this error is less than the error of a long double, it will be kept only until actual math is done (same with an int-constructed "perfect double") at which point it will revert to the error of the math, capped at the best precision the long double could have. Error propogates, of course, and this would be kept; multiplying or adding values does appropriate things to the error.

The thing is, I'm a lazy sort of lizard, and I expect that this already exists. If it does, where is it? And if it doesn't, is it really worth the trouble to write? The floating-point headaches I've run into don't seem to be consistent; multiplying two doubles doesn't always yield the same result. With that in mind, I'm not sure how you measure the error level in a double value. I understand your desire, though - a thought I once had was to create a more precise double by custom-building a class of larger size, like you suggest, and then implementing the CPU's ALU functions in overloaded operators. Ouch, I know - I guess that's why I never went for it.

I've never heard of an existing improved precision class; I get around it by keeping "fresh" doubles (setting the value explicitly often), by using an epsilon for comparisons, and by accepting that I can have either precision or speed. At speed with an epsilon, I can keep precision to six decimal places, which is great for what I need. But if you want to build a fast-running precision class and then open-source it, well... I'll use it! For integer data, I've often considered making my own class of long numbers where the system keeps track of every single digit in a dynamically resizing array.

For example, the number 31415 would be stored as
 = 5
 = 1
 = 4
 = 1
 = 3

And when you add 86 to it, you get

 = 91
 = 1
 = 4
 = 1
 = 3

The numbers carry over, resulting in

 = 1
 = 10
 = 4
 = 1
 = 3

and finally

 = 1
 = 0
 = 5
 = 1
 = 3

This won't work for fractions, though. And it's quite memory-intensive. I haven't found many uses for it, so I haven't implemented it ever. But it's one way to do it. I use apmatrix for something like that, but I understand a linked list would probably do the job better. A linked list is much less effective for this usage; a Standard Template Library vector is best suited to this task.

You've described an inefficient variant of Java's BigInteger class, and I'm quite certain that Boost has a class for this already. (Boost is a popular C++ library that I really need to study more because it's useful.) BigInteger uses an ArrayList (java.util's name for a Vector; they also have Vector, but it's actually deprecated and an empty extension of an ArrayList- it's a synonym) of byte (C++ would use unsigned char; in Java, there are no unsigned types (making the code for this much nastier) and a char is the length of a short- two bytes- for Unicode reasons) to store the values in an analagous way. However, instead of using BCD as you are (binary-coded decimal), it does its math in a binary manner; each "digit"- element of the array- goes from -127 to 127, it's interpreted from 0 to 255, giving you pretty much a base-256 number, and with reasonable bit-twiddling, just a base two number in blocks of eight. (As usual.) This is the most efficient implementation in memory or space- to store in base 2- because there are no unused sets of values and no complicated data manipulation to make the carry operations work properly. Since (in C++) you're using unsigned, keep a bool to track the sign of the number.

If B is the size of the BigInteger in bits, addition and subtraction take O(B) time. Here's the trick: Can you make multiplication and division run in time better than O(B2)?

I already have a class for storing fractions within the limits of a long long int, with silent casts from integers and to doubles, with all applicable operators overloaded. More accurately, I have a lump of code that's supposed to do that saved in my Snippets database in case I need a Rational (the class name) for a TopCoder match (I think you'd like TopCoder- http://www.topcoder.com , I'm Windrider there- and see my recent posts about winning \$50 in the TCO; you can try for the TCCC, the tournament for only college students, when it starts and see how far you get, but they have weekly "regular matches" anyway)... I've never tried to compile it, mind you, and it's completely untested because I haven't had the time. For that matter, since I plan to use it in TopCoder, I don't quite want to open-source it. I will tell you that writing it is half an hour's work at worst if you're comfortable with operator overloading, are familiar with Euler's GCD algorithm (or can look it up quickly), and like that sort of thing. In addition to cast to long double (which casts down to double or float) and cast down from long long (implicit denominator of 1; this of course casts down to long, int, short, or char), there's a constructor from long long and long long (explicit numerator and denominator) and, perhaps suprisingly, string (which can be explicitly constructed from a char*, actually) in which it expects a string in the format [numerator] [zero or more whitespace characters] "/" [zero or more whitespace charactres] [denominator], a popular way of representing fractions, and an overloaded << to ostream to spit out a string in the same format (when, of course, fed into good ol' ostream- ready for output, or reusage on ostringstream). There are explicit calls to get the numerator and denominator, but no explicit calls to directly alter them. Also note that the thing keeps itself in reduced terms, automatically simplifying itself after all mathematics...

I'd show you mine, but I don't think I'd be permitted to use it in TopCoder if I do so, and you'd have to write your own to use it in TopCoder ("all code must be your own") anyway. I think I've described it well enough that you should be able to do it- and it's good practice to do so anyway. 