# Representing float number in computer

Some stupid results always occurred in calculating the value of two float numbers. For example, in JavaScript, `0.1+0.2=0.30000000000000004`

. This wild result is the feature of the float number while representing in computer, which is not a bug. Storing float number in computer has to encode and decode, the algorithm is not the same as integer. IEEE754 is wildly used in computer, which use a formular to encode and decode float number.

Sign * Exponent * Fraction

We can apply this formular to convert a decimal number 3.14 to:

(-1) * 10^{-2} * 314

Now, we can fill these three parts into a 32bits binary container(the number 10 can be ignored with convention).

```
[1bit][8bits][23bits]
[-1] [2] [314]
```

The same idea applies in IEEE754, but it has a little bit difference. IEEE754 based on power 2 and the fraction part must convert to binary. The fraction number always started with 1.xxxx, and it multiply the **Exponent** to get the decimal. The formular can be applied to:

Sign * 2^{n} * (1+Fraction)

First of all, we should divide 3.14 by 2, to get the **Exponent**.

3.14 = 3.14 / 2 = 1.57 * 2^{1}

Next, we can apply the formular to:

(-1) * 2^{1} * (1+0.57)

For convention, we don't need to store the base number 2 and the integer 1 in the fraction. Unfortunately, 0.57 is not a binary number. The next step, we have to convert 0.57 to binary. The calculating is simple, multiply it with 2, if the result large than 1 and set 1 to binary bit, if not, set 0. We can convert 0.57 to:

```
0.57 * 2 = 1.14 | 1
0.14 * 2 = 0.28 | 1
0.28 * 2 = 0.56 | 0
0.56 * 2 = 1.12 | 1
0.12 * 2 = 0.24 | 0
0.24 * 2 = 0.48 | 0
0.48 * 2 = 0.96 | 0
0.96 * 2 = 1.92 | 1
.....
```

This processing is infinity and finally get a approximate value. We will get the result 10010001111010111000011 to fill into the fraction part.

The next step is to fill the **Exponent** part. IEEE754 use a bias number(127) to represent -126 ~ 127 range in a 8bits binary number. If we want to store 2^{1}, we must add bias number 127 and then convert 128 to binary 1000000.

The third part is Sign, is the same as integer, 1 means negative and 0 means positive. Finally, we can store -3.14 in a float binary as three parts as:

1 10000000 10010001111010111000011 Sign Exponent Fraction

## Converting IEEE754 to decimal

We can apply the same formular `Sign * Exponent * (1+Fraction)`

to convert the above binary result to decimal. The first step is to divide it to three parts.

1 10000000 10010001111010111000011 Sign Exponent Fraction

We can map the Sign and Exponent in a simple way, but the fraction parts that we should convert each bit to multiply 2^{-n}. So, we can convert the fraction part to:

10010001111010111000011 = 1x2^{-1}+0x2^{-2}+0x2^{-3}+1x2^{-4} +....+ 1x2^{-23} = 0.57

Finally, we can apply the formular to calculate the decimal:

**(-1) * 2 ^{128-127} * (1+0.57) = 3.14**

## The puzzles

There are two puzzles in IEEE754:

- Why does IEEE754 use a bias number?
- Why does IEEE754 use power 2 instead of power 10?

For converting a negative number, integer use the first bit to indicate the sign, the same method applies in IEEE754. In the exponent part, we can still apply this method, but IEEE754 use a bias number to represent the negative number. When we need to compare two float numbers, storing exponent parts like integer must decode first and then compare them. Using a bias number can avoid decoding. For example, we want to compare 3.14 and -3.14. Their binary is representing:

```
11000000010010001111010111000011
10111111010010001111010111000011
```

We can compare them directly and don't need to decode them first.

IEEE754 use power 2 trade off the time and space against the precision. Because Calculating power 2 is fast than power 10, and storing the fraction to binary can reduce the space. For example, if we want to store 0.5, storing decimal to binary must occupy 3 bits. If we convert it to binary directly, we just need to occupy one bit:

101 //decimal fraction 1 //binary fraction