DEC64: Decimal Floating Point (2020)

(crockford.com)

77 points | by vinhnx 11 days ago

58 comments

  • lifthrasiir 2 days ago

    Previously:

    https://news.ycombinator.com/item?id=7365812 (2014, 187 comments)

    https://news.ycombinator.com/item?id=10243011 (2015, 56 comments)

    https://news.ycombinator.com/item?id=16513717 (2018, 78 comments)

    https://news.ycombinator.com/item?id=20251750 (2019, 37 comments)

    Also my past commentary about DEC64:

    > Most strikingly DEC64 doesn't do normalization, so comparison will be a nightmare (as you have to normalize in order to compare!). He tried to special-case integer-only arguments, which hides the fact that non-integer cases are much, much slower thanks to added branches and complexity. If DEC64 were going to be "the only number type" in future languages, it had to be much better than this.
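
    A toy model of the problem (my own sketch, not Crockford's reference code): in a coefficient × 10^exponent format without normalization, the same value can have many bit patterns, so equality cannot be a plain word compare.

        # Toy DEC64-like value: (coefficient, exponent), meaning coeff * 10**exp.
        def normalize(coeff, exp):
            # Shrink the coefficient while it is evenly divisible by 10.
            while coeff != 0 and coeff % 10 == 0:
                coeff //= 10
                exp += 1
            return (coeff, exp)

        a = (10, 0)  # 10 * 10**0 == 10
        b = (1, 1)   #  1 * 10**1 == 10
        print(a == b)                          # False: same value, different representation
        print(normalize(*a) == normalize(*b))  # True: must normalize before comparing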

    • indolering 2 days ago

      Wtf is up with the clown car that is floating point standards?

      • maxlybbert 2 days ago

        Well, for one thing, IEEE 754 was a significant improvement on the vendor-specific ways of handling floating point that it replaced ( https://people.eecs.berkeley.edu/~wkahan/ieee754status/754st... ).

        I wasn't a big fan of floating point until I worked with a former college professor who had taught astrophysics. When possible, he preferred to use respected libraries that would give accurate results fast. But when he had to implement things himself, he didn't necessarily want the fastest or the most accurate implementation; he'd intentionally make and document the tradeoffs for his implementation. He could analyze an algorithm to estimate the accumulated units-in-the-last-place error ( https://en.wikipedia.org/wiki/Unit_in_the_last_place ), but he recognized when that wasn't necessary.

      • stephencanon 2 days ago

        IEEE 754 is a floating point standard. It has a few warts that would be nice to fix if we had tabula rasa, but on the whole is one of the most successful standards anywhere. It defines a set of binary and decimal types and operations that make defensible engineering tradeoffs and are used across all sorts of software and hardware with great effect. In the places where better choices might be made knowing what we know today, there are historical reasons why different choices were made in the past.

        DEC64 is just some bullshit one dude made up, and has nothing to do with “floating-point standards.”

        • dbcurtis 2 days ago

          It is important to remember that IEEE 754 is, in practice, aspirational. It is very complex and nobody gets it 100% correct. There are so many edge cases around the sticky bit, quiet vs. signaling NaNs, etc., that a processor that gets it 100% correct for every special case simply does not exist.

          One of the most important things that IEEE 754 mandates is gradual underflow (denormals) in the smallest binade. Otherwise you have a giant non-monotonic jump between the smallest normal float and zero, which plays havoc with the stability of numerical algorithms.
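
          A quick illustration with Python floats (IEEE 754 binary64, which does have gradual underflow): without denormals, the difference of two distinct tiny values would flush to zero, breaking the invariant that x != y implies x - y != 0.

              import sys, math

              x = sys.float_info.min           # smallest positive normal double, ~2.225e-308
              y = 1.5 * x                      # a nearby, distinct value
              print(x != y)                    # True
              print(y - x)                     # ~1.113e-308: a denormal, not 0.0
              print(math.nextafter(0.0, 1.0))  # 5e-324, the smallest denormal step toward zero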

          • jcranmer 2 days ago

            Sorry, no. IEEE 754 is correctly implemented in pretty much all modern hardware [1], save for the fact that optional operations (e.g., the suggested transcendental operations) are not implemented.

            The problem you run into is that the compiler generally does not implement the IEEE 754 model fully strictly, especially under default flags--you have to opt into strict IEEE 754 conformance, and even there, I'd be wary of the potential for bugs. (Hence one of the things I'm working on, quite slowly, is a special custom compiler that is designed to have 100% predictable assembly output for floating-point operations so that I can test some floating-point implementation things without having to worry about pesky optimizations interfering with me).

            [1] The biggest stumbling block is denormal support: a lot of processors opted to support denormals only by trapping on it and having an OS-level routine to fix up the output. That said, both AMD and Apple have figured out how to support denormals in hardware with no performance penalty (Intel has some way to go), and from what I can tell, even most GPUs have given up and added full denormal support as well.

      • welferkj 2 days ago

        The set of real numbers is continuous and uncountably infinite. Any attempt to fit it into a discrete finite set necessarily requires severe tradeoffs. Different tradeoffs are desirable for different applications.

        • tialaramex 2 days ago

          Almost all real numbers are non-computable, so what we're most commonly reaching for is only the rationals. But DEC64 can't represent lots of those either, so it seems like a very niche type rather than, as the author asserts, the only numeric type you need.

      • CoastalCoder 2 days ago

        Can you be more specific?

      • rdtsc 2 days ago

        How would you fix the standards? Or, if your point is that there are too many, which one would you choose as the _one_?

  • mgaunard 2 days ago

    There are a bunch of different encodings for decimal floating-point. I fail to see how this is the standard that all languages are converging to.

    IEEE 754 standardizes two encodings, BID and DPD, for the decimal32, decimal64, and decimal128 precision presets. This is neither of those.

    Many libraries use a simple significand + exponent approach, similar to the article's, but the representation is not standardized; some use full integral types for this rather than specific bit fields (C# uses 96+32, Python uses a tuple of arbitrary integers). It's essentially closer to fixed point, but with a variable exponent.
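
    Python's decimal module, for instance, exposes exactly that significand + exponent view, trailing zeros and all:

        from decimal import Decimal

        d = Decimal("1.30") * Decimal("1.20")
        print(d)             # 1.5600 -- trailing zeros are preserved
        print(d.as_tuple())  # DecimalTuple(sign=0, digits=(1, 5, 6, 0, 0), exponent=-4)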

    The representation from the article is definitely a fairly good compromise though, specifically if you're dealing with mostly-fixed-point data.

    • moron4hire 2 days ago

      > I fail to see how this is the standard that all languages are converging to.

      Yes, you are failing to see it because it's not there.

      Crockford had a hot book amongst junior and amateur JavaScript developers 17 years ago. But he's never really been involved in any language standardization work. Even his self-described "invention" of JSON I wouldn't really call an invention so much as a discovery that one could send JS object literals instead of XML over the wire. That discovery also opened a new class of XSS exploits until browsers implemented JSON.parse.

      So, when he says "DEC64 is intended to be the only number type in the next generation of application programming languages," it's just the same old bloviation he has always employed in his writing.

      • ramses0 2 days ago

        He literally says he "discovered" it!?!?! https://nofluffjuststuff.com/blog/douglas_crockford/2008/01/...

        ...primary source: """I do not claim to have invented JSON. I claim only that I discovered it. It existed in nature. I identified it, gave it a name, and showed how it was useful. I never claimed to be the first to have discovered JSON. I made my discovery in the spring of 2001. There were other developers who were using it in 2000."""

        I see value in this semi-simplistic representation of DEC64, and to respond to a peer comment, consider "DEC64-norm", where non-normalized representations are "illegal" or must be tainted/tagged/parsed like UTF-8 before processing.

        His other "useful" contribution, which I lament not seeing anywhere else, is "The Crockford Keyboard", another "near-discovery" of existing useful properties in an endemic standard (the English alphabet): https://www.crockford.com/keyboard.html

        I really wish that the X-Box onscreen keyboard used this layout! You could imagine that if you could "hot-key" back to the left row (vowels), and have "tab-complete" (ie: shift-tab/tab) it could be a pretty comfortable typing experience compared to the existing "let's copy QWERTY(?!)".

        ...I feel he's a kindred spirit of pragmatism and complexity reduction (which necessarily introduces more complexity, since we live in a world where complicated things exist). Compared to IEEE floating point numbers, this DEC64 or "Number()" type seems like a breath of fresh air (apart from normalization, as mentioned!).

      • gwbas1c 2 days ago

        I do think DEC64 has merit: Most of the time, I'd prefer to be able to reason about how non-integers are handled; and I just can't reason about traditional floating point.

        Yes, Doug does bloviate quite a bit. (And when I asked him if there was a way to make callbacks in Node.js / Javascript simpler, he mocked me. A few years later, "await" was added to the language, which generally cleans up callback chains.)

  • RustyRussell 2 days ago

    A good friend of mine worked on decimal floating point for IBM Power chips (I think it was Power 7 which had hardware support).

    Anyway, he insisted on calling it just "Decimal Floating". Because there was "no point".

    • sevensor 2 days ago

      Hilarious and apt.

      Either you want fixed point for your minimum unit of accounting or you want floating point because you’re doing math with big / small numbers and you can tolerate a certain amount of truncation. I have no idea what the application for floating point with a weird base is. Unacceptable for accounting, and physicists are smart enough to work in base 2.

      • zokier 2 days ago

        I'm pretty confident that dfp is used for financial computation. Both because it has been pushed heavily by IBM (who certainly are very involved in financial industry) and because many papers describing dfp use financial applications as motivating example. For example this paper: https://speleotrove.com/decimal/IEEE-cowlishaw-arith16.pdf

        > This extensive use of decimal data suggested that it would be worthwhile to study how the data are used and how decimal arithmetic should be defined. These investigations showed that the nature of commercial computation has changed so that decimal floating-point arithmetic is now an advantage for many applications.

        > It also became apparent that the increasing use of decimal floating-point, both in programming languages and in application libraries, brought into question any assumption that decimal arithmetic is an insignificant part of commercial workloads.

        > Simple changes to existing benchmarks (which used incorrect binary approximations for financial computations) indicated that many applications, such as a typical Internet-based ‘warehouse’ application, may be spending 50% or more of their processing time in decimal arithmetic. Further, a new benchmark, designed to model an extreme case (a telephone company’s daily billing application), shows that the decimal processing overhead could reach over 90%

        • sevensor 2 days ago

          Wow. OK, I believe you. Still don’t see the advantages over using the same number of bits for fixed point math, but this definitely sounds like something IBM would do.

          Edit: Back of the envelope, you could measure 10^26 dollars with picodollar resolution using 128 bits
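
          Checking that envelope in Python:

              units_per_dollar = 10**12              # picodollar resolution
              max_units = 2**127 - 1                 # signed 128-bit fixed point
              print(max_units // units_per_dollar)   # ~1.7e26 dollars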

      • nyrikki 2 days ago

        Decimal128 has exact rounding under decimal rules and preserves trailing zeros.

        I don’t think Decimal64 has the same features, but it has been a while.

        But unless you hit the limit of 34 decimal digits of significand, Decimal128 will work for anything you would use fixed point for, but much faster if you have hardware support like on the IBM CPUs or some of the SPARC CPUs from Japan.

        OLAP aggregate functions are one example of an application.

        • jcranmer 2 days ago

          > I don’t think Decimal64 has the same features, but it has been a while.

          Decimal32, Decimal64, and Decimal128 all follow the same rules; they just have different values for the exponent range and number of significant figures.

          Actually, this is true for all of the IEEE 754 formats: the specification is parameterized on (base (though only 2 or 10 is possible), max exponent, number of significant figures), although there are a number of issues that only exist for IEEE 754 decimal floating-point numbers, like exponent quantum or BID/DPD encoding stuff.

          • nyrikki 2 days ago

            You are correct; the problem is that Decimal64 has 16 digits of significand, while items like apportioned per-call taxes need to be calculated with six digits past the decimal point before rounding, which requires about 20 digits.

            Other calculations like interest rates take even more, and COBOL requires 32 digits.

            As decimal128 format supports 34 decimal digits of significand, and has emulated exact rounding, it can meet that standard.

            While it is more complex, requiring ~15-20% more silicon space in the ALU plus a larger data size, it is more efficient for business applications than arbitrary-precision libraries like BigNum.

            This looks like a digestible cite:

            https://speleotrove.com/decimal/decifaq1.html
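
            To make the digit-count point concrete, here is a sketch with Python's decimal module, using context precision to stand in for the decimal64 and decimal128 significand sizes (this emulates the digit counts, not the IEEE encodings, and the rate and amount are invented):

                from decimal import Decimal, localcontext

                amount = Decimal("73914268412.79")  # 13 significant digits
                rate   = Decimal("0.002753")        # six digits past the decimal point

                with localcontext() as ctx:
                    ctx.prec = 16                   # decimal64-sized significand
                    print(amount * rate)            # 203485980.9404109  (last digit rounded away)

                with localcontext() as ctx:
                    ctx.prec = 34                   # decimal128-sized significand
                    print(amount * rate)            # 203485980.94041087 (exact: 17 digits)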

  • lambdaone 2 days ago

    If you want the job done properly, this already exists: https://en.wikipedia.org/wiki/Decimal128_floating-point_form...

  • tgv 2 days ago

    What's the point of saying that it is "very well suited to all applications that are concerned with money" and then writing 3.6028797018963967E+143, which is obviously missing a few gigamultiplujillion?

    • jcarrano 2 days ago

      No point whatsoever. If you have to deal with money you never use floating point. Either use arbitrary precision, or integers with a sufficiently small base, like blockchains do (which can also be thought of as fixed point). Also, you would never multiply two money values (there are no "square dollars").

  • shawn_w 2 days ago

    I was expecting something about floating point formats used by some DEC PDP series computer...

    • PopePompus 2 days ago

      Yes, DEC did use a non-IEEE floating point format at least through the VAX-11 era. I was fooled by the title too.

  • dahart 2 days ago

    > It can precisely represent decimal fractions with 16 decimal places, which makes it very well suited to all applications that are concerned with money.

    How many decimal places do people use for financial calculations in practice? Google search’s AI answer said 4, 6, or sometimes 8. Is that true for large financial institutions, like banks and hedge funds and governments and bitcoin exchanges?

    I’ve heard lots of people saying floating point isn’t good for financial calculations, and I believe it. Would DEC64 actually be good for money if it has 16 base 10 digits, and if so, why? If you need 8 decimal places, you have 8 digits left, and start losing decimal places at around 100M of whatever currency you’re working with. I’m just guessing that working with large sums is exactly when you actually need more decimal places, no?
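
    That guess is easy to check with Python's decimal module, using 16 digits of context precision as a stand-in for DEC64's 16-digit coefficient:

        from decimal import Decimal, localcontext

        with localcontext() as ctx:
            ctx.prec = 16
            # Adding 0 forces a context rounding; construction alone never rounds.
            # 8 integer digits + 8 decimal places = 16 digits: still exact.
            print(Decimal("99999999.00000001") + 0)   # 99999999.00000001
            # 9 integer digits + 8 decimal places = 17 digits: the last place is lost.
            print(Decimal("999999999.00000001") + 0)  # 999999999.0000000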

    • drob518 2 days ago

      Typically, you select a precision that you’re comfortable with and then use fixed point for everything. My broker seems to use 6 decimal places.

    • PaulKeeble 2 days ago

      I hate to break the news, but banks use floating point numbers all the time for financial calculations. They shouldn't, but they do, and they use all sorts of tricks like alternate rounding schemes and some pretty deep knowledge of floats to try and keep the books roughly right. I have never seen a Money type nor anything like DEC64 in a real commercial system; it's all floats, or ints/longs for pennies.

      • RaftPeople 2 days ago

        > I have never seen a Money type nor anything like DEC64 in a real commercial system; it's all floats, or ints/longs for pennies.

        I've been working in ERP land for a long time, with many different commercial systems of different sizes and market shares, and I've only seen NUMERIC/DECIMAL types used in the DB for things like Price, Cost, Amount, etc.

        The only time I've ever seen floating point is for storage of non-money type values, like latitude and longitude of store locations.

        Examples of a few top systems:

        SAP - Packed decimal

        Oracle NetSuite - Decimal

        Microsoft Dynamics - Decimal/Numeric

        Infor - Decimal/Numeric

      • dahart 2 days ago

        That’s not too surprising to me, I imagine that many number types are used, and it depends entirely on the task at hand? If you have deeper knowledge of what gets done in practice, I’m still curious what criteria and types and how many decimal points might get used in the most restrictive cases. What do people use in practice for, say, interest compounding on a trillion dollars? I can calculate how many decimal places I need at a minimum for any given transaction to guarantee the correct pennies value, but I don’t have first-hand experience with banks and I don’t know what the rules of thumb, or formal standards might be for safe & careful calculation on large sums of money. I would imagine they avoid doing floating point analysis in every case, since that’s expensive engineering?

        • jcranmer 2 days ago

          My electricity and gas bills both charge me $0.xxxxxx per unit price, or 10,000ths of a ¢, although the last digit is invariably a zero. I've also seen 4-5 digits on currency exchange places.

          I'd have to break out a calculator to be certain, but my guess is that most of these transactions amount to sum(round_to_nearest_cent(a * b)), where a is a value in thousandths of a cent--which is to say, there isn't a single "this is the fixed-point unit to use for all your calculations."

          For financial modeling, the answer is definitely "just use floats," because any rounding-induced error is going to be much smaller than the inherent uncertainty of your model values anyways. It's not like companies report their income to the nearest cent, after all.
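
          A guess at the shape of such a bill, sketched with Python decimals (the prices and quantities are invented):

              from decimal import Decimal, ROUND_HALF_UP

              CENT = Decimal("0.01")

              def line_total(unit_price, qty):
                  # Price quoted to ten-thousandths of a cent; each line item
                  # is rounded to the nearest cent before the lines are summed.
                  return (Decimal(unit_price) * qty).quantize(CENT, rounding=ROUND_HALF_UP)

              bill = line_total("0.123450", 812) + line_total("0.045670", 812)
              print(bill)   # 137.32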

          • dahart 2 days ago

            Right, the number of digits you need for any given calculation depends on the magnitudes of the numbers involved, and the acceptable precision of the result. BTW round to nearest might be unsafe, I’m certain that there are many situations where people will specifically avoid rounding to nearest, I would not assume that companies use that scheme.

            It seems like a decent assumption that financial companies are not spending engineering time (money) on a floating point analysis of every single computation. They must generally have a desired accuracy, some default numeric types, and workflows that use more bits than necessary for most calculations in exchange for not having to spend time thinking hard about every math op, right? That’s how it works everywhere else.

            The accuracy used for reporting doesn’t seem relevant to the accuracy used for internal calculations. It’s fine for large companies to report incomes to rounded millions, while it’s absolutely unacceptable to round compound interest calculations in a bank account to millions, regardless of the balance.

        • PaulKeeble 2 days ago

          Floats all the way. It's floats compounding errors, so they estimate the error along the way and then adjust the results. It's floats all the way down!

          • dahart 2 days ago

            Oh now that is somewhat surprising! Is this 64 bit or 128 bit floats? 32 bit floats aren’t accurate enough to represent $1B to the penny. Do error values get stored as deltas and passed around with the float values? Would love to hear more about this, do you know of any good summaries of the engineering workflow online?

      • rdtsc 2 days ago

        Wonder if it depends on which side of banking: investments, accounting, client side (web pages and UI) vs. the backend. Or whether we're talking about central banks vs. a small credit union.

        • PaulKeeble 2 days ago

          I have done retail and investment banking as well as hedge funds for front and back office. Everything from giant banks everyone knows through to small hedge funds no one has.

          • rdtsc 2 days ago

            Thanks. That’s fascinating. I have zero experience with any banking/finance stuff but through tropes and various anecdotal accounts was somehow sure floats were not used by banks. I guess it’s one of those persistent rumors that just refuses to die.

      • jcranmer 2 days ago

        The noob-expert meme is really apropos here, as floating-point is really the option to go for if you know nothing about it or you know a lot about it.

        The "problem" with floating-point is that you have to deal with rounding. But in financial contracts, you have to deal with rounding modes. Fixed-point (aka integers) gives you one rounding mode, and that rounding mode is wrong.
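
        Python's decimal module makes the contrast easy to see: the truncation you get from integer math is only one of several modes a contract might specify.

            from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP, ROUND_HALF_EVEN

            CENT = Decimal("0.01")
            x = Decimal("10.125")  # an amount exactly between two cents

            print(x.quantize(CENT, rounding=ROUND_DOWN))       # 10.12 (what integer truncation gives)
            print(x.quantize(CENT, rounding=ROUND_HALF_UP))    # 10.13 (common in contracts)
            print(x.quantize(CENT, rounding=ROUND_HALF_EVEN))  # 10.12 (banker's rounding)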

        • PaulKeeble 2 days ago

          With ints/longs they usually use something that alternates the rounding so that it spreads; that alternation can be done on a system-wide basis, on a per-machine basis, or within an individual account.

          For floats what they tend to do is carry around the estimated error when they are doing a lot of calculations and then adjust for the error at the end to bring the value back.

          Both of them are trying to deal with the inherent issues of a representation that is biased and ill-suited for money in practice. But oddly this is never pulled together into a Money type, because it really depends what you are doing at the time: sometimes you just round and move on, and sometimes the error is expected to impact the result because you are dealing with millions/billions in calculations on tens/hundreds, so it's going to matter.

          But the reality is that the books are basically off by pennies to pounds every day because of these representations, and that's part of the reason no one worries about being off a little bit: the various systems all do this differently.

  • pwdisswordfishy 2 days ago

    This seems optimized for fast integer operations. Except that if I only cared about integers, I'd use an actual integer type.

    • adamnew123456 2 days ago

      > Languages for scientific computing like FORTRAN provided multiple floating point types such as REAL and DOUBLE PRECISION as well as INTEGER, often also in multiple sizes. This was to allow programmers to reduce program size and running time. This convention was adopted by later languages like C and Java. In modern systems, this sort of memory saving is pointless.

      More than that, the idea that anyone would be confused about whether to use integer or floating-point types absolutely baffles me. Is this something anyone routinely has trouble with?

      Ambiguity around type sizes I can understand. Make int expand as needed to contain its value with no truncation, as long as you keep i32 when size and wrapping does matter.

      Ambiguity in precision I can understand. I'm not sure this admits of a clean solution beyond making decimal a built-in type that's as convenient (operator support is a must) and fast as possible.

      But removing the int/float distinction seems crazy. Feel free to argue about the meaning of `[1,2,3][0.5]` in your language spec - defining that and defending the choice is a much bigger drag on everyone than either throwing an exception or disallowing it via the type system.

      • PaulHoule 2 days ago

        There's something to say for languages like Python and Clojure, where plain ordinary math might involve ordinary integers, arbitrary-precision integers, floats, or even rationals.

        In grad school it was drilled into me to use floats instead of doubles wherever I could which cuts your memory consumption of big arrays in half. (It was odd that Intel chips in the 1990s were about the same speed for floats and doubles but all the RISC competitors had floats about twice the speed of doubles, something that Intel caught up with in the 2000s)

        Old books on numerical analysis, particularly Foreman Acton's

        https://www.amazon.com/Real-Computing-Made-Engineering-Calcu...

        teach the art of how to formulate calculations to minimize the effect of rounding errors which resolves some of the need for deep precision. For that matter, modern neural networks use specialized formats like FP4 because these save memory and are effectively faster in SIMD.

        ---

        Personally, when it comes to general-purpose programming languages, I've watched a lot of people have experiences that lead them to thinking that "programming is not for them". I think

           >>> 0.1+0.2
           0.30000000000000004
        
        is one of them. Accountants, for instance, expect certain invariants to be true and if they see some nonsense like

           >>> 0.1+0.2==0.3
           False
        
        it is not unusual for them to refuse to work, or leave the room, or have a sit-down strike until you can present them numbers that respect the invariants. You have a lot of people who could be productive lay programmers and put their skills on wheels, and if you are using the trash floats that we usually use instead of DEC64, you are hitting them in the face with pepper spray as soon as they start.
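
        For contrast, the same session with a decimal type (Python's decimal module here; DEC64 behaves the same way for these particular values) keeps the accountant's invariants:

           >>> from decimal import Decimal
           >>> Decimal("0.1") + Decimal("0.2")
           Decimal('0.3')
           >>> Decimal("0.1") + Decimal("0.2") == Decimal("0.3")
           True
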
      • bux93 2 days ago

        People routinely have trouble picking ints for monetary amounts, leading to all kinds of lovely rounding errors.

    • sfoley 2 days ago

      Near the end of the article, under Motivation:

      > The BASIC language eliminated much of the complexity of FORTRAN by having a single number type. This simplified the programming model and avoided a class of errors caused by selection of the wrong type. The efficiencies that could have gained from having numerous number types proved to be insignificant.

      DEC64 was specifically designed to be the only number type a language uses (not saying I agree, just explaining the rationale).

    • cozzyd 2 days ago

      Well I suppose it might be preferable if javascript used this type...

      • sjrd 2 days ago

        JavaScript engines do optimize integers. They usually represent integers up to +-2^30 as integers and apply integer operations to them. But of course that's not observable.

        • tgv 2 days ago

          I think it's up to 2^53.

          • hajile 2 days ago

            You are half correct: 2^53-1 (around 9 quadrillion) is the largest integer n such that n and every integer below it are exactly representable in a 64-bit float. JS even includes a `Number.MAX_SAFE_INTEGER` for it.

            That said, these only get used in cases where your number exceeds around 1 billion, which is fairly rare.

            JS engines use floats only when they cannot prove/speculate that a number can be an i32. They only use 31 of the 32 bits for the number itself with the last bit used for tagging. i32 takes fewer cycles to do calculations with (even with the need to deal with the tag bit) compared to f64. You fit twice as many i32 in a cache line (affects prefetching). i32 uses half the RAM (and using half the cache increases the hit rate). Finally, it takes way more energy to load two numbers into the ALU/FPU than it does to perform the calculation, so cutting the size in half also reduces power consumption. The max allowable size of a JS array is also 2^32.

            JS also has BigInt available for arbitrary-precision integers, and that is probably what someone should be using if they expect to go over the 2^31-1 limit, because hitting a number that big generally means you have something unbounded that might also blow past the 2^53-1 limit.
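
            A sketch of the tag-bit trick (illustrative only; which bit value marks an integer, and the exact widths, vary by engine):

                # 32-bit word: low bit is a tag, the remaining 31 bits hold the integer.
                def tag_int(n):
                    assert -(2**30) <= n < 2**30, "doesn't fit in 31 bits; box as a float"
                    return ((n << 1) | 1) & 0xFFFFFFFF     # tag bit 1 = small integer here

                def untag_int(word):
                    value = word >> 1                      # drop the tag bit
                    return value - 2**31 if value >= 2**30 else value  # sign-extend

                w = tag_int(1_000_000)
                print(w & 1, untag_int(w))                 # 1 1000000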

  • spacedcowboy 2 days ago

    Atari 8-bit BASIC used something pretty similar to this [1], except it did have normalization. It only had 10 BCD digits (5 bytes) and 2 digits (1 byte) for the exponent, so more of a DEC48, but still… That was a loooong time ago…

    It was slightly more logical to use BCD on the 6502 because it had a BCD maths mode [2], so primitive machine opcodes (ADC, SBC) could understand BCD and preserve the carry, zero, etc. flags.

    [1]: https://www.atarimax.com/freenet/freenet_material/5.8-BitCom...

    [2]: http://www.6502.org/tutorials/decimal_mode.html
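
    A sketch of what decimal mode does to ADC, one packed-BCD byte (two digits) at a time (illustrative; the real 6502's flag behavior in decimal mode has extra quirks):

        def bcd_adc(a, b, carry_in=0):
            # Add two packed-BCD bytes with nibble-wise decimal adjust,
            # roughly what the 6502's ADC does with the decimal flag set.
            lo = (a & 0x0F) + (b & 0x0F) + carry_in
            carry = 1 if lo > 9 else 0
            lo = (lo + 6) & 0x0F if carry else lo
            hi = (a >> 4) + (b >> 4) + carry
            carry_out = 1 if hi > 9 else 0
            hi = (hi + 6) & 0x0F if carry_out else hi
            return (hi << 4) | lo, carry_out

        result, carry = bcd_adc(0x38, 0x45)
        print(hex(result), carry)   # 0x83 0  ->  38 + 45 = 83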

  • cozzyd 2 days ago

    The memory savings from 32-bit or even 16-bit floats are definitely not pointless! Not to mention doubling SIMD throughput. Speaking of which, without SIMD support this certainly can't be used in a lot of applications. It definitely makes sense for financial calculations, though.

  • YesThatTom2 2 days ago

    If they reversed the order of the fields, you could sort them by just pretending they are int64’s.

    • Someone 2 days ago

      No you couldn’t, given “Normalization is not required”, so we have

          10³ × 9 < 10¹ × 100000
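
      Working that through with a toy encoding (exponent moved to the high byte as proposed; unsigned fields for simplicity):

          def encode(coefficient, exponent):
              # Exponent in the top 8 bits, coefficient in the low 56.
              return (exponent << 56) | coefficient

          a = encode(9, 3)       # 10**3 * 9      == 9000
          b = encode(100000, 1)  # 10**1 * 100000 == 1000000
          print(a < b)           # False: the raw integer order disagrees with the
                                 # numeric order because b is not normalized
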
  • mildred593 11 days ago

    Can't wait to have it in my language!

  • dmitrygr 2 days ago

    > nan is equal to itself

    > The result of division by zero is nan

    This is all a mistake!! IEEE put a lot of thought into making NaN != NaN and having Inf and NaN be separate things. As it stands, 1e130 / 1e-100 == 5 / 0 == 0 / 0 (the first overflows to nan; the others are division by zero). Should that be the case? No. Why might this come up accidentally without you noticing? Imagine each of the following funcs is in a separate module written by separate people: frob() will be called erroneously with DEC64 and will not be with float or double.

   // Forward declarations; imagine each of these functions living in a
   // separate module written by a separate person.
   void frob(void);
   DEC64 getRateOfThing1(void);
   DEC64 getRateOfThing2(void);
   DEC64 calculateRate(DEC64 done, DEC64 time);

   int main(int argc, char** argv) {

      //[...]

      DEC64 rateOfThisThing = getRateOfThing1();
      DEC64 rateOfThatThing = getRateOfThing2();

      // If both rates happened to come out as nan (division by zero or
      // overflow), DEC64's "nan == nan" rule makes this test true.
      if (rateOfThisThing == rateOfThatThing) {
         frob();
      }

      //[...]
   }

   void frob(void) {
      //[...]
   }

   DEC64 getRateOfThing1(void) {
      return calculateRate(mThingOneProgress, getTime() - mTimeWhenThing1started);
   }

   DEC64 getRateOfThing2(void) {
      return calculateRate(mThingTwoDone, getThingTwoRunningLength());
   }

   DEC64 calculateRate(DEC64 done, DEC64 time) {
      return (done + time / 2) / time;   // nan when time == 0
   }