In this issue I'll explore the diverse sources of the rounding problems in unit conversions as well as how to fix them permanently. It's a bit of a long post, so I'll focus on the conversions that don't have an offset. There exist only two units that have offsets in the CLDR table (Celsius and Fahrenheit), and handling them won't be too difficult once we settle on the correct approach for units without an offset.
I hope that at some future time the language will include some kind of a Decimal type. When it does, Amount should be compatible with it; it would be bad if adding Decimal later caused breaking changes to Amount. To ensure that, I'll sketch an Amount design that includes Decimal inputs (in addition to Number and BigInt), and we can then decide which parts to omit because we don't want to implement them today.
For clarity:
- I will suffix Numbers by 𝔽 and Decimals by 𝔻. Real numbers will have no suffix.
- I'm skipping the conversion to exponential notation for Strings.
State Today
Let's look at a few conversions in today's implementation using Number inputs and outputs. Let's start with integral inputs that are representable exactly as Numbers:
| Input | Unit In | Unit Out | Conv Factor | Output |
| --- | --- | --- | --- | --- |
| 3𝔽 | m | mm | 1000 | 3000𝔽 |
| 5𝔽 | in | cm | 127/50 | 12.7𝔽 |
| 84𝔽 | in | ft | 1/12 | 7𝔽 |
| 5𝔽 | g | tonnes | 1/1000000 | 0.0000049999999999999996𝔽 |
| 825𝔽 | g | kg | 1/1000 | 0.8250000000000001𝔽 |
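The last two outputs above are incorrect because the computation rounds more than once (for example, rounding the conversion factor to a Number and then rounding the product again). As I'll show later, doing the conversion more precisely will fix these incorrect results.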
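A minimal sketch of the double rounding, assuming the conversion multiplies by a factor that has already been rounded to a Number (how today's implementation actually orders its operations is an assumption here, and the variable names are illustrative only):

```javascript
// Multiplying by a pre-rounded factor rounds twice:
// once when the factor 1/1000 becomes a Number, once for the product.
const gramsToKilograms = 1 / 1000; // 0.001 is not exactly representable
const twiceRounded = 825 * gramsToKilograms;

// A single correctly rounded division rounds only once.
const onceRounded = 825 / 1000;

console.log(twiceRounded); // 0.8250000000000001
console.log(onceRounded);  // 0.825

// The same effect for 5 g -> tonnes: the two paths give different Numbers.
console.log(5 * (1 / 1000000) === 5 / 1000000); // false
```

The division path happens to be exact enough here because both operands are exactly representable integers; it is not a general fix.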
Next let's take a look at some non-integral Number inputs:
| Input | MV(Input) | Unit In | Unit Out | Conv Factor | Output |
| --- | --- | --- | --- | --- | --- |
| 0.003𝔽 | 0.0030000000 0000000006 2450045135 1650553988 2928133010 8642578125 | m | mm | 1000 | 3𝔽 |
| 0.3𝔽 | 0.2999999999 9999998889 7769753748 4345957636 8331909179 6875 | yard | ft | 3 | 0.8999999999999999𝔽 |
| 0.352𝔽 | 0.3519999999 9999997957 1896346897 1196562051 7730712890 625 | m | cm | 100 | 35.199999999999996𝔽 |
Here the outputs are actually correct for Number arithmetic:
- 𝔽(0.299999999999999988897769753748434595763683319091796875 × 3) = 0.8999999999999999𝔽
- 𝔽(0.35199999999999997957189634689711965620517730712890625 × 100) = 35.199999999999996𝔽
Since in these examples the results are mathematical values correctly rounded to the nearest Number, that's the best we can do.
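To see the MV column for yourself, one option is `Number.prototype.toFixed`, which accepts up to 100 fraction digits. This is only an illustrative sketch; it relies on the engine producing exact digits, as the specification of `toFixed` requires:

```javascript
// 0.3 as a Number is a multiple of 2^-54, so 54 fraction digits show it exactly.
const mvOfPointThree = (0.3).toFixed(54);
console.log(mvOfPointThree);
// 0.299999999999999988897769753748434595763683319091796875

// The products from the table, each rounded once to the nearest Number.
const yardsToFeet = 0.3 * 3;
const metersToCentimeters = 0.352 * 100;
console.log(yardsToFeet);         // 0.8999999999999999
console.log(metersToCentimeters); // 35.199999999999996
```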
Desired State
Now let's imagine that we also have a Decimal type that's supported by Amount. What should the results look like for the above examples?
| Input | MV(Input) | Unit In | Unit Out | Conv Factor | Output |
| --- | --- | --- | --- | --- | --- |
| 3𝔻 | 3 | m | mm | 1000 | 3000𝔻 |
| 5𝔻 | 5 | in | cm | 127/50 | 12.7𝔻 |
| 84𝔻 | 84 | in | ft | 1/12 | 7𝔻 |
| 5𝔻 | 5 | g | tonnes | 1/1000000 | 0.000005𝔻 |
| 825𝔻 | 825 | g | kg | 1/1000 | 0.825𝔻 |
| 0.003𝔻 | 0.003 | m | mm | 1000 | 3𝔻 |
| 0.3𝔻 | 0.3 | yard | ft | 3 | 0.9𝔻 |
| 0.352𝔻 | 0.352 | m | cm | 100 | 35.2𝔻 |
What about Strings? The MV of a String is well-defined, so the results should be similar to the Decimal results:
| Input | MV(Input) | Unit In | Unit Out | Conv Factor | Output |
| --- | --- | --- | --- | --- | --- |
| "3" | 3 | m | mm | 1000 | "3000" |
| "5" | 5 | in | cm | 127/50 | "12.7" |
| "84" | 84 | in | ft | 1/12 | "7" |
| "5" | 5 | g | tonnes | 1/1000000 | "0.000005" |
| "825" | 825 | g | kg | 1/1000 | "0.825" |
| "0.003" | 0.003 | m | mm | 1000 | "3" |
| "0.3" | 0.3 | yard | ft | 3 | "0.9" |
| "0.352" | 0.352 | m | cm | 100 | "35.2" |
Amount also allows rounding before conversion. Let's say that one does:
let a = new Amount(0.3515𝔻, { unit: "meter", fractionDigits: 3, roundingMode: "halfEven" });
let b = a.convertTo({ unit: "cm" });
Here the correct answer would be for b's value to be "35.2" because 0.3515𝔻 rounded to 3 fraction digits in halfEven mode is 0.352, and 0.352 m is exactly 35.2 cm. It would be strange indeed (and incorrect) to do Number arithmetic here and produce "35.199999999999996".
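The rounding step on its own can be sketched with BigInt arithmetic on the decimal digits. `roundHalfEven` is a hypothetical helper, not part of Amount, and this sketch assumes a non-negative input in plain decimal notation:

```javascript
// Round a non-negative decimal string to `fractionDigits` places, ties to even.
function roundHalfEven(str, fractionDigits) {
  const [intPart, fracPart = ""] = str.split(".");
  if (fracPart.length <= fractionDigits) return str; // nothing to drop

  // scaled = value × 10^fracPart.length, an exact integer.
  const scaled = BigInt(intPart + fracPart);
  const divisor = 10n ** BigInt(fracPart.length - fractionDigits);
  let quotient = scaled / divisor; // truncating division
  const twiceRem = 2n * (scaled % divisor);

  // Round up when above the midpoint, or exactly at it with an odd quotient.
  if (twiceRem > divisor || (twiceRem === divisor && quotient % 2n === 1n)) {
    quotient += 1n;
  }

  // Re-insert the decimal point, padding so there is at least one integer digit.
  const digits = quotient.toString().padStart(fractionDigits + 1, "0");
  const cut = digits.length - fractionDigits;
  return fractionDigits === 0 ? digits : `${digits.slice(0, cut)}.${digits.slice(cut)}`;
}

console.log(roundHalfEven("0.3515", 3)); // "0.352" (tie, 351 is odd, rounds up)
console.log(roundHalfEven("0.3525", 3)); // "0.352" (tie, 352 is even, stays)
```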
Implementation
Implementing this so that it works well for everyday use cases is quite easy. Here's how I'd do it:
- Every unit in the CLDR table has a base unit conversion factor $f$ that is some rational number (the CLDR even expresses π as a rational number). Represent $f$ as a lowest-terms ratio of two positive integers $f = p/q$ where $p$ and $q$ have no common factors. These are CLDR constants so these would be precomputed and stored in a table, not calculated at run time.
- To convert a value $a$ whose current unit has conversion factor $f = p/q$ to a new value $b$ whose unit has conversion factor $g = r/s$, we need to compute $b = a \frac{f}{g} = a \frac{ps}{qr}$.
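As a sketch of the two steps above, the ratio $\frac{ps}{qr}$ can be formed with BigInt. The table below is hypothetical (hand-written, meter as the base unit, not read from CLDR), and `conversionRatio` is an illustrative name; reducing the result with a gcd is optional and only keeps the integers small:

```javascript
// Hypothetical precomputed table: unit -> [p, q] with factor f = p/q in meters.
const FACTORS = {
  meter: [1n, 1n],
  centimeter: [1n, 100n],
  inch: [127n, 5000n],  // 0.0254 m
  foot: [381n, 1250n],  // 0.3048 m
  yard: [1143n, 1250n], // 0.9144 m
};

// Euclid's algorithm on positive BigInts.
function gcd(a, b) {
  while (b !== 0n) [a, b] = [b, a % b];
  return a;
}

// Returns [num, den] = ps/qr in lowest terms for converting `from` -> `to`.
function conversionRatio(from, to) {
  const [p, q] = FACTORS[from];
  const [r, s] = FACTORS[to];
  const num = p * s;
  const den = q * r;
  const g = gcd(num, den);
  return [num / g, den / g];
}

console.log(conversionRatio("inch", "centimeter")); // [ 127n, 50n ]
console.log(conversionRatio("inch", "foot"));       // [ 1n, 12n ]
```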
Number
If the input $a$ is a Number:
- Let $num = 𝔽(ps)$ and $den = 𝔽(qr)$. These are integers, so no rounding will take place unless the conversion factors are very large, well beyond typical everyday usage. For example, a zettameter ($10^{21}$ m, which is over 100,000 lightyears) can be represented exactly as a Number, even though it's much larger than $2^{53}$; a yottameter ($10^{24}$ m) would get rounded.
- Let $b = 𝔽(a × num / den)$. To calculate this, do, for example (using C/C++ syntax):
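double hi = a * num;                   // product, rounded to double
double err = fma(a, num, -hi);         // exact rounding error of that product
double res = hi / den;                 // first approximation of the quotient
double rem = fma(res, -den, hi) + err; // remainder hi - res*den, plus the product error
double b = res + (rem / den);          // quotient corrected by the remainder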
If an intermediate overflow to ±∞ or NaN happens, revert to the simpler:
double ratio = num/den;
double b = a * ratio;
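JavaScript has no built-in `fma`, so a port of the snippet above needs a substitute. The sketch below swaps in Dekker's error-free product (with Veltkamp splitting) for the two `fma` calls; that substitution is my assumption, not part of the proposal, and the function names are illustrative. It also includes a small check of whether a BigInt such as $ps$ survives conversion to a Number unchanged.

```javascript
// Veltkamp split: x = hi + lo, with each half short enough that
// products of halves are exact.
function split(x) {
  const c = 134217729 * x; // 2^27 + 1
  const hi = c - (c - x);
  return [hi, x - hi];
}

// Dekker's product: p is the rounded product and p + err is the exact
// product, barring intermediate overflow or underflow.
function twoProduct(a, b) {
  const p = a * b;
  const [ah, al] = split(a);
  const [bh, bl] = split(b);
  const err = al * bl - (((p - ah * bh) - al * bh) - ah * bl);
  return [p, err];
}

// Approximates F(a × num / den); num and den are integer-valued Numbers.
function convertNumber(a, num, den) {
  const [hi, err] = twoProduct(a, num);  // hi + err = a × num
  const res = hi / den;                  // first approximation
  const [p2, e2] = twoProduct(res, den); // p2 + e2 = res × den
  const rem = ((hi - p2) - e2) + err;    // stands in for fma(res, -den, hi) + err
  const b = res + rem / den;
  // Simpler fallback from the text when an intermediate overflowed or was NaN.
  return Number.isFinite(b) ? b : a * (num / den);
}

// True when the BigInt converts to a Number without rounding
// (throws for values beyond the Number range).
const isExactNumber = (big) => BigInt(Number(big)) === big;

console.log(convertNumber(825, 1, 1000));  // 0.825
console.log(isExactNumber(10n ** 21n));    // true
console.log(isExactNumber(10n ** 24n));    // false
```

Unlike a true `fma`, the remainder here is assembled from several rounded operations, so this sketch may not match the C version bit for bit on every input.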
String
If the input $a$ is a String or BigInt, getting correct results is also quite easy, using integers or existing BigInt operations. All variables except $a$ in this section are integers or BigInts.
- Express $a = m × 10^e$, where $m$ is an integer or BigInt.
- Let $n = m × p × s$.
- Let $den = q × r$.
- Let $res$ be the integer arithmetic quotient $n / den$ truncated to an integer and let $rem$ be the remainder.
- If $rem = 0$, then $res$ is exact; otherwise:
  - Generate as many decimal fraction places of the result as we want by repeatedly multiplying $rem$ by a power of 10 and doing the division again. For example, to generate the next 20 decimal places at once:
    - Let $n_2 = rem × 10^{20}$.
    - Let $res_2$ be the integer arithmetic quotient $n_2 / den$ truncated to an integer and let $rem_2$ be the remainder.
    - The result now is $res + res_2 × 10^{-20}$. If more fraction digits are desired and $rem_2 ≠ 0$, keep doing this.
- Multiply the result by $10^e$; this is just an exponent shift.
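The steps above can be sketched with BigInt as follows. The function names and the `[p, q]` / `[r, s]` factor pairs are hypothetical. For brevity this sketch deviates from the list in two ways: it folds $10^e$ into the denominator instead of shifting the exponent at the end, and it generates one fraction digit per iteration instead of 20 at once. Extra digits are truncated, not rounded.

```javascript
// Parse plain decimal notation (no exponent part) into m × 10^e.
function parseDecimal(str) {
  const match = /^(-?)(\d+)(?:\.(\d+))?$/.exec(str);
  if (!match) throw new SyntaxError(`unsupported decimal string: ${str}`);
  const [, sign, intPart, fracPart = ""] = match;
  const m = BigInt(intPart + fracPart);
  return { negative: sign === "-" && m !== 0n, m, e: -fracPart.length };
}

// Convert a decimal string from a unit with factor p/q to one with factor r/s.
function convertDecimalString(str, [p, q], [r, s], maxFractionDigits = 20) {
  const { negative, m, e } = parseDecimal(str);
  const n = m * p * s;
  const den = q * r * 10n ** BigInt(-e); // e <= 0, so this absorbs 10^e

  const res = n / den; // truncating integer quotient
  let rem = n % den;
  let fraction = "";
  // Long division: each step yields the next decimal digit.
  while (rem !== 0n && fraction.length < maxFractionDigits) {
    rem *= 10n;
    fraction += (rem / den).toString();
    rem %= den;
  }

  const text = fraction ? `${res}.${fraction}` : `${res}`;
  return negative ? `-${text}` : text;
}

// m -> cm, g -> kg, and g -> tonnes from the String table above.
console.log(convertDecimalString("0.352", [1n, 1n], [1n, 100n]));  // "35.2"
console.log(convertDecimalString("825", [1n, 1000n], [1n, 1n]));   // "0.825"
console.log(convertDecimalString("5", [1n, 1000n], [1000n, 1n]));  // "0.000005"
```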
At this point we have a choice to make. The above will produce the correct results for the String cases using nothing more than existing BigInt functionality already built into the language. However, there is also a desire to not implement such arithmetic. So we have a tradeoff:
- Do the simple calculations above. There is already precedent for such computations in the Intl number formatting code.
- Do the calculations on Numbers only. If the user tries to do them on a String, that's an error; if the user gives us a String but really wants to use the faulty Number conversion arithmetic, they should first convert the String amount to a Number amount.
- Do the calculations incorrectly on Strings. Producing incorrect results is hostile to users and to future compatibility with Decimal, requiring either flags or breaking changes, so I would not be in favor of this alternative.