What would it take to add refinement types to Rust?

(yoric.github.io)

128 points | by Yoric 2 days ago ago

76 comments

  • amenghra 20 hours ago ago

    Units might seem simple but they have a ton of edge cases. Do you want to be able to add inches and feet? Be careful about potential precision/rounding issues. What is the unit for a temperature delta? You can’t simply keep the original unit (eg C or F) because conversion from F to C is a different rule than ΔF to ΔC. Etc.

    Units do prevent bugs in programs, so they have an important role to play. But they also need to be designed very carefully.

    Java adopted units via JSR 385 (https://belief-driven-design.com/java-measurement-jsr-385-21...)

    • dwattttt 19 hours ago ago

      > Do you want to be able to add inches and feet?

      This probably doesn't have to be too complicated; the usual answer for Rust is "no". Rust doesn't even let you "just" add two unsigned integers of different sizes. Following that design, I would imagine units would require an explicit "turn feet into inches" or the other way around.

      • valenterry 17 hours ago ago

        That's where Scala shines. I wrote about this here a bit: https://valentin.willscher.de/posts/contextual-syntax/

        Rust is heavily inspired by Scala, but I guess achieving something like the examples in my post is difficult. I really hope Rust finds one way or another to make it work. Because simply forbidding everything all the time isn't even the safest way - it drives many people to just avoid it altogether and use unsafe code.

        • nine_k 7 hours ago ago

          While at it, could you give a few examples that illustrate how Rust was inspired by Scala proper, and not Haskell / SML / OCaml (which also influenced Scala)?

          • akkad33 5 hours ago ago

            This page from official Rust website of its influences does not mention Scala. https://doc.rust-lang.org/reference/influences.html

            In fact this is the first time I've seen anyone say Scala influenced Rust let alone "heavily". Seems like a stretch

          • amelius 6 hours ago ago

            It says here:

            https://en.wikipedia.org/wiki/ML_(programming_language)

            that ML influenced Rust, Scala, Haskell and OCaml, so that's the common denominator.

            However, although Rust wants to be an ML style language all the idioms break because there's no GC.

          • threeseed 5 hours ago ago

            So I regularly jump between Scala and Rust and it's more a feeling.

            In that overall the way option types, pattern matching, FP operations e.g. map/filter, immutability by default and type parameters have been implemented are very similar to Scala. And since these make up a large percentage of the everyday code you write you see the similarity as being larger than maybe it is.

            • akkad33 5 hours ago ago

              All these are just functional features you find in every functional language though. Rust was highly influenced by OCAML and other ML languages

              • threeseed 5 hours ago ago

                Hence why I said it was a feeling.

                And I think the fact the overall style of the language e.g. braces, semicolon, method signatures, loop handling is so similar to Scala versus OCaml contributes to this.

      • vlovich123 18 hours ago ago

        Might be annoying if you’re working with floating point but given that each conversion introduces error, it’s probably good to be explicit and recommend internally to be consistent

        • stouset 18 hours ago ago

          Thanks to generics you could theoretically work with whatever underlying type makes the most sense for your use-case.

              let v1 : Inches<Ratio> = Ratio::new(5, 8).into();
              let v2 : Inches<Ratio> = Ratio::new(3, 8).into();
          
              let v3 : Feet<_> = (v1 + v2).into();
          
              assert_equal!(v3, Ratio::new(1, 12));
          • throwawaymaths 13 hours ago ago

            no that's not the point. if you let rust automagically decide when and where to apply conversions you could easily wind up in a situation where you have more operations than you need, which increases numerical error, and also be a bitch to uncover or refactor to minimize conversions.

            • stouset 12 hours ago ago

              Rust doesn’t automatically apply conversions, full stop.

              And with a thoughtful approach to the API, you could avoid numerical error entirely by using integral types.

              • amenghra 8 hours ago ago

                That’s a good starting point. The temperature delta issue remains: adding and subtracting temperatures should create a different unit. Thankfully there aren’t a lot of non-linear units.

  • weinzierl 20 hours ago ago

    I find the idea of being able to specify numerical types with arbitrary ranges very appealing.

    In Pascal these are called range types, e.g.

    month: 1..12

    would define an integer where the type system would ensure that it is always between 1 and 12.

    Apart from Ada this seems to be an alien concept to all other languages. The concept of a "range type" also seems to have other meanings.

    What is the Pascal "range type" properly called in type theory and what is its relationship to refinement types?

    • touisteur an hour ago ago

      Ranges are awfully useful in Ada, but a very small help in dimensionality/units analyis.

      If you mean to go all the way on units, dimensions and typing, there's a bestiary there, quited maintained (with several Ada 'takes' too) https://www.gmpreussner.com/research/dimensional-analysis-in...

    • 3eb7988a1663 19 hours ago ago

      Nim calls these subranges https://nim-lang.org/docs/manual.html#types-subrange-types

        Subrange = range[0..5]
        PositiveFloat = range[0.0..Inf]
    • frogulis 19 hours ago ago

      Anybody correct me if I'm wrong -- in Pascal these subranges can specify the size of the variable in memory and check compile-type assignment to a literal value, but don't (can't) do much else.

      There's the concept of dependent types in e.g. Idris which lets us correctly do this sort of range check throughout the program (not just on literal assignment) but it comes with strict requirements on what the compiler can allow, such as no unbounded recursion, because checking dependent types is roughly equivalent to running the program.

    • evertedsphere 10 hours ago ago

      it is a refinement type over the base number type, where the predicate constrains the number to lie in an interval

      e.g. { x:Int8 | 1 ≤ x ≤ 12 }

    • bspammer 19 hours ago ago

      I frequently wish I had a natural number type. So many programs would benefit from the type system guaranteeing that numbers are never negative.

      • archargelod 15 hours ago ago

        Nim has Natural and Positive (signed int) types that checked at compile and runtime. You can also define arbitrary range types for any ordinal type, for example enums:

          type
            Month = enum Jan, Feb, Mar, Apr, May, ...
            SpringMonth = range[Mar..May]
          
          var m: SpringMonth = Mar
        
        It will raise under/overflow exception if value falls out of range.
      • josephg 18 hours ago ago

        Rust has this one - though the ergonomics are a little awkward. Its called NonZero:

        https://doc.rust-lang.org/std/num/type.NonZeroU32.html

      • frogulis 19 hours ago ago

        Easy enough to write in some languages... as long as you only need to support addition and multiplication, or a very limited set of numbers.

    • williamdclt 7 hours ago ago

      I’ve seen “month” being implemented in typescript as a big sum type ‘1 | 2 | 3 | … | 12’ :) probably ranges could just be syntactical sugar on top of that, but I suppose it probably causes big inefficiencies for the compiler

    • riffraff 9 hours ago ago

      raku has "type subsets" which can do this (and more)

      https://docs.raku.org/language/typesystem#subset

    • cpa 5 hours ago ago

      Haskell calls it Ord, for ordinal. https://en.m.wikipedia.org/wiki/Ordinal_data_type

      • DylanSp 3 hours ago ago

        Haskell's Ord typeclass is for ordering; it just represents types that have a total order, it doesn't represent ranges.

        • cpa an hour ago ago

          My bad the ordinal data type class is Enum not Ord!

    • munchler 20 hours ago ago

      > an integer where the type system would ensure that it is always between 1 and 12

      How does Pascal handle overflow/underflow? E.g. Month 10 + Month 11 = Month 21?

  • thesuperbigfrog 20 hours ago ago

    I would love to see refinement types added to Rust.

    They are very handy for preventing errors and expressing intent in Ada:

    https://learn.adacore.com/courses/Ada_For_The_CPP_Java_Devel...

  • nmilo a day ago ago

    It should be easier now with const generics, I know the popular C++ library looks like Value<Unit, T, L, M, ...> where the letters are numbers representing the dimension of time, length, mass, etc. So m/s^2 would always be Value<f64, -2, 1, 0, ...>. By keeping it normalized you don't need Equivalent or whatever

  • JoshTriplett 19 hours ago ago

    As a related concept, we've talked about adding "pattern types" to Rust: `Result<T> is Err(_)` (the type of a Result that you've just confirmed is an `Err`, so you don't have to handle the `Ok` case), or `u32 is 1..` (equivalent to `NonZero<u32>` but more flexible).

    • bonzini 18 hours ago ago

      Is the former different from a hypothetical Result::Err type (I don't know if you'd spell that Result::Err<T> or Result::<T>::Err)?

      • n_plus_1_acc 18 hours ago ago

        Different solutions to the same problem. Pattern types are more general than enum variant types.

      • JoshTriplett 18 hours ago ago

        Same concept, but generalized to any type, not just enums.

  • loeg a day ago ago

    The author didn't mention this, but maybe there could be a type (unit) canonicalization step that always produces the same reduced/simplified type for any equivalent set of Mul/Divs? So that you don't need the later equivalency check.

    E.g. with the article's example, where 'a' is Meters and 'b' is Seconds, 'a/(b*b)' and 'a/b/b' both have type 'Div<Meters, Mul<Seconds, Seconds>>' instead of one having the type 'Div<Div<Meters, Seconds>, Seconds>'.

    For only Muls and Divs you can basically just have a histogram of units to powers (e.g., m/s^2 => m: 1, s: -2) which uniquely represent equivalent types.

    • hinkley a day ago ago

      James Gosling looked at this and wrote about it at the time. It can be a bit mind bending. Especially with units named after people.

      If you’re looking at statistics for current or power the type system might try to convert it to joules even though you wanted to look at average wattage.

      • jillesvangurp 3 hours ago ago

        I used javax.measure a few years ago on a project that was about building a material search engine. It featured search by all sorts of chemical and physical properties. Complete with all the wonderful units that are used around the globe to measure tensile strength, breaking point, conductivity, magnetic properties, etc. To avoid comparing apples to oranges when searching, being able to normalize and convert between different units is key. Additionally, allowing to user to search with their preferred units is also important. We supported hundreds of different properties for a wide range of ceramics, metal alloys and other materials.

        Not the easiest framework to work with but pretty well thought out. A good starting point if you are looking to reinvent that wheel.

      • epcoa 20 hours ago ago

        I do not understand, average wattage/power would be integration of wattage (joules) divided by time (so back to watts), it’s still watts? Under what implementation would you end up with joules that isn’t just oddly broken in general?

        I guess I could see a naive implementation (confusing integration/sampling and discrete summation) going the other way, erroneously ending up with a nonsensical W/s.

        Also don’t understand what it has to do with eponyms, which are just substitutes for base units, either your DA works or not, no? Average wattage is kg⋅m²⋅s⁻³ not kg⋅m²⋅s⁻² (joules) or kg⋅m²⋅s⁻⁴

      • loeg 21 hours ago ago

        Yeah, but equivalent units with different names is sort of only a display/formatting issue, right?

        • weinzierl 20 hours ago ago

          This is a matter of perspective.

          For example the unit Sievert is an official SI unit despite being just J/kg. This is because confusing equivalent dose and absorbed dose, which also has the unit of J/kg, could be very dangerous.

          Note, that this is different from J sometimes being written as Ws. While there are informal conventions, when we use J and when Ws, using the unconventional one would not be technically wrong because 1 J is simply 1 Ws, whereas 1 Sv is not necessarily 1 J/kg when the later is an absorbed dose.

          I think one could reasonably disagree with these decisions but that is how the SI people see it.

          • loeg 2 hours ago ago

            Sure, but I think this is still better than declaring type system bankruptcy because determining whether trees of units are equivalent is hard. You could do newtype wrappers in situations where it is warranted.

          • bonzini 18 hours ago ago

            Sievert is more of a "newtype" in Rust/Haskell terminology than a separate unit. Likewise for radians and moles.

            • namibj 8 hours ago ago

              Torque is actually J/rad.

          • dwattttt 19 hours ago ago

            > This is a matter of perspective.

            I'd make a stronger statement here; this is a specific example of when having units is most powerful. When even though two things are expressed in some common form, they nevertheless represent something different.

            > Yeah, but equivalent units with different names is sort of only a display/formatting issue, right?

            This could be said of two u32s as well.

            • pests 12 hours ago ago

              > When even though two things are expressed in some common form, they nevertheless represent something different.

              This is where people go wrong trying to DRY and other refactors. Slightly forced example but

                function averagePerClassroom(total) { return total / 30; } // 30 kids per class
                function averagePerMonth(total) { return total / 30; }  // assume 30
              
              "Oh, the function body is the same therefore lets refactor this into an "averagePer" function" expect its two completely different concepts and once the code calculates the actual days per month or once classes are no longer 30 people suddenly things need to be un-refactored, or what I see more often, is just branching off inside the new single function based on an argument flag. Horrible.
        • amalcon 15 hours ago ago

          A joule, notably, is a newton-meter. Torque is also measured in newton-meters, but the meters are perpendicular to the newtons.

          Adding energy to torque is rarely going to be intentional.

    • jiggawatts 20 hours ago ago

      A simple way to do this is to store a vector of the powers.

      For example momentum is kilogram * meter / second, which is MASS^1 * LENGTH^1 * TIME^-1

      As a vector, this can be represented as (1,1,-1) where the positions are M, L, T respectively.

      In that format velocity is represented as (0,1,-1), acceleration is (0,1,-2), etc...

      This is automatically canonicalised and much easier to manipulate than a tree of operations.

      Of course, this assumes uniform units such as CGS or MKS in something sane like the metric system. Conversion back and forth is generally straightforward, as long as the types encode the system used. E.g.: CGS<1,1,-1> and MKS<1,1,-1> both represent momentum, but at different scales.

      Imperial also works, and other base units can be added to extend the system. This can include things like current, temperature, moles, etc...

      See: https://en.wikipedia.org/wiki/Dimensional_analysis

      • loeg 19 hours ago ago

        > This is automatically canonicalised and much easier to manipulate than a tree of operations.

        As long as the vector is sorted by unit, yeah. With that caveat, it's the same idea.

        • adgjlsfhk1 3 hours ago ago

          there are 7 si units, and you can represent all the powers you need in 8 bits, so you can pack every reasonable SI unit in 64 bits

        • jiggawatts 19 hours ago ago

          It’s a positional system with the same basic measurement units always at the same index locations.

  • eddsolves 10 hours ago ago

    I think liquid Haskell is a great example of refinement types?

    https://ucsd-progsys.github.io/liquidhaskell/

  • jph a day ago ago

    Type refinements are a great concept and I'd love to see them in Rust. And double-refinement types are great for helping with conversions, such as with Rust From/Into, and potentially a dynamic converter function.

    Examples of double-refinements that I'd like:

    - Common units like Length:Meter and Length:Foot.

    - Color bits like Color:RGB24 and Color:CYMK24.

    - Worldwide currency like Money:USD and Money:GBP with a converter function that knows exchange rates.

    - Human languages like String:English vs String:Cymraeg with a converter function that knows translations.

    • skissane 20 hours ago ago

      > - Worldwide currency like Money:USD and Money:GBP with a converter function that knows exchange rates.

      Exchange rates vary over time, so you'd arguably need a type which includes a timestamp (e.g. "USD 1000 at 2024-12-25").

      And that's ignoring all these other complexities such as the spread, different currency converters offering differing rates, unofficial and multiple official rates in countries with currency controls (e.g. Argentina), hedging, etc

      • magicalhippo 6 hours ago ago

        > multiple official rates

        One such case is customs agencies publishing their own exchange rates for use in custom declarations, for example here[1] for the US or here[2] for Sweden.

        [1]: https://www.cbp.gov/trade/document/report/daily-foreign-curr...

        [2]: https://tulltaxan.tullverket.se/arctictariff-public-web/#!/t...

      • jaza 20 hours ago ago

        Plus, perhaps the biggest complexity of all is that the currency rates are often not free, particularly if you want "live pricing" (updated every few seconds), and particularly if it's for commercial use. And, the fact that they may or may not be free, also illustrates well the fact that there are no definitive rates, there are only "rates according to X".

    • spockz 20 hours ago ago

      The currency converter is the odd one out because it context dependent. (Time, contract, etc)

      • 0xFF0123 20 hours ago ago

        Similar to time with a timezone?

  • ordu 19 hours ago ago

    I'd really like to have them. If I'm writing an algorithm that maps interval (0, 1) of probabilities onto different intervals of u32 integers, and probabilites are also represented as integers (an integer part of p*2^n), I will want to have different units for those integer probabilites and their mapped values. I don't want to mix them accidentally and to add probability and a mapped probability accidentally. Probably I dont want to add cumulative probability and a probability, though I'm not really sure about that, because sometimes I want to add (for example when calculating cumulative probabilities) and I want to compare them. Though maybe it will work with some exotic rules, like CumProb-CumProb -> Prob, while CumProb+CumProb is forbidden?

    And there are more examples of that, I can have probabilities from different distributions, I don't want to add them accidentally, though multiplications of them is all over the place, so let it be. Of course I have no hope that any language could deal with the explosion of types that are needed to represent this, but if compiler just gave me f64 as a result of multiplication of probabilites, I'd be happy.

    It can be done in Rust on case by case bases, but it is a lot of boilerplate. The issue is the typedef IntProb = u32, treat IntProb like an alias to u32 and rustc converts these types into each other silently like C compiler converts int to char. One can do struct IntProb(u32), but then it would be needed to implement traits like Add, Mul, Cmp and so on, which is possible but it is too much work. If it was possible to force rustc to treat typedef MyType = {NumericType} as a distinct numeric type that requires explicit conversion into NumericType, while retaining all the traits of NumericType, just substituting in them NumericType with MyType, it would be great.

    I think, that all the complexities described in the article stem from the attempt to create an universal instrument that can do everything and to keep bees. The real difficulty is to pick a small subset of wants, that will cover 80% of needs, while being really simple. I see no issues with occasional .into::<Length<f64>>(), like I see no issues with (my_struc.index as usize), if I keep index as u8 to spare memory, but use it as usize of course. I see no need in different units for the same quantity, because in any case I'd want to convert everything into the same uniform units before I start adding and multiplying. But I'd like to have restrictions on available operations between different types, and I'd like to have a possibility to add optional dynamic checks for type value, that I could turn on for a debug build and turn off for a release one. For example, I'd like to check that any probability p fits into [0, 1].

  • Waterluvian 21 hours ago ago

    If I understand this properly, I’ve needed this in typescript a lot.

    I can make `type uuid = string` for self documentation, but a lot of plugins will just label it “string” and developers can (and have) mistakenly put some other identifier, like the robot’s hostname.

    Of course we validate at the API but it’d be more skookum if we could prevent accidental wiring together of front-end components that make this error.

    String literals help a ton. Gosh they’re wonderful to care about the shape of a string in the type system. But sometimes I really want to say “strict uuid” as in “I don’t care if it quacks like a duck, it’s not called duck.”

    • feznyng 15 hours ago ago

      As spockz pointed out, you’re looking for the new type pattern. Rust supports this explicitly but you’ve got to do some workarounds to get it in typescript.

      https://kubyshkin.name/posts/newtype-in-typescript/

      Unfortunately doesn’t help much when you’re dealing with functions from packages someone else has typed.

    • mcfedr 21 hours ago ago

      Check out what are often called tagged types.

      You basically define a `type A = string &{a: SomeSymbol}`

      And then have a type assertion function that just returns true, and you have control over the places A can come from

      • demurgos 21 hours ago ago

        Yes, it's the standard way if you want implicit compat with the base type, so you can pass a value of type `A` to a function expecting a `string`. An other name for this pattern is "branded types".

      • steve_adams_86 20 hours ago ago

        I really like the Effect schema library for tagged types. I’m not sure how well it works for primitives, though.

      • Waterluvian 21 hours ago ago

        Ah yeah I remember poking at that. I should review it again. I think at the time it felt like a bit of a hack, but maybe some features of the past years of TS updates have helped.

        I assume the idea is to lie to the type system about the existence of the symbol, and at runtime it is just a string.

        • TOGoS 14 hours ago ago

          To me it feels like less of a hack if I can make the type declaration not a lie. e.g.

             type FooID = string & { typeName? : "FooID" }
          
          Read as 'of course this thing doesn't have a typeName[1] property, since it's a string, but if it did have the property, the value would be "FooID"'. You can then cast between FooID and string, but not between FooID and some other type that declares a typeName property.

          [1] I actually tend to use 'classRef' with an RDFish long name for the type, but that makes examples longer and isn't the point.

    • spockz 20 hours ago ago

      Haskell has had this with `newtype` for ages. Type safety without runtime overhead.

  • marxisttemp a day ago ago

    Could someone clarify to me the different between dependent types and refinement types?

    • dietr1ch 21 hours ago ago

      Refinements embed a predicate that elements of the new type must pass.

          def Nat = Int: (|x| -> x >= 0)
      
      Dependent types allow types to be computed from functions (and depend on arguments, otherwise it seems they become just weird constants),

          def Five(as_type: String) -> NumericType(as_type):
            match as_type:
              "string" => "five"
              "int" => 5
              "float" => 5.0f
              "double" => 5.0d
              _ => panic()  // Unnecessary if you refine `as_type` from a String to an enum or a fixed set of strings.
      
      
      Dependent types seem weird, but they help making types first-class (https://www.youtube.com/watch?v=mOtKD7ml0NU&t=325s) and gaining types like `Array<T, N>` that allow ensuring things are the right length, and define append/extend properly.
      • xigoi 3 hours ago ago

        Just curious, what language is this?

        • dietr1ch 3 hours ago ago

          I made up the syntax to try to keep it easy to read for people using python/rust/c.

          I just realized my lambda syntax on the Nat predicate is redundant because I didn't clean up and that using snake_case for function names would be better in a language that lets you operate on functions like they are values.

    • burakemir 20 hours ago ago

      Dependent types typically refers to type systems where a type can depend on a term. The canonical example is "Vector n" where n is some expression that evaluates to a natural number.

      Refinement types typically(1) refers to a type systems that lets you create a subtype of a type through refining (qualifying) with a predicate or constraint on the shape. Examples {x \in int | is_even x } or { x \in List | len(x) = 1 }

      Refinement types can be very powerful but that may well make type checking undecidable (think of a type of Turing machines, and the refinement that keeps only the ones that halt). By being careful about the logic used in the refinements, one may retain decidability.

      (1) The article seems to have a different idea of what a refinement type is: quote "a type system that does its work after another type system has already done its work".

      I am not going to play orthodox guardian of type theory terminology here, yet to me personally, it does seem unfortunate to use that term. The author seems to really want a form of type-level computation, which could be interesting if it could be rigorously specified and it's relation to the existing type level reduction clarified.

    • demurgos 21 hours ago ago

      Dependent types can make type constructors generic over _values_ (instead of only types). Refinement types keep the separation between types and values, but they let you wrap an existing type to enforce extra constraints or semantics. Another example of refinement are pattern types where you can attach a match pattern that is enforced to pass.

    • akdor1154 19 hours ago ago

      You could just read the first paragraph of TFA?