Comparison Traits – Understanding Equality and Ordering in Rust

mattrighetti 12 minutes ago

I was blown away when I discovered that Rust automatically generates enum ordering. I remember I was coding an AoC solution [0] and the tests that I had set up were passing without me actually doing any work, good times! :)

[0]: https://mattrighetti.com/2023/12/07/aoc-day-7
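
For anyone who hasn't run into this: deriving `PartialOrd` and `Ord` on an enum orders the variants by declaration order, top to bottom. A minimal sketch, assuming a hypothetical `HandKind` enum (the names are made up for illustration and are not taken from the linked solution):

```rust
// Hypothetical enum for illustration; the variant names are assumptions.
// The derived ordering follows declaration order: earlier variants are smaller.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum HandKind {
    HighCard,
    OnePair,
    TwoPair,
    ThreeOfAKind,
}

// Sorts hands from weakest to strongest using only the derived `Ord`.
fn sorted_hands(mut hands: Vec<HandKind>) -> Vec<HandKind> {
    hands.sort();
    hands
}

fn main() {
    let hands = vec![
        HandKind::TwoPair,
        HandKind::HighCard,
        HandKind::ThreeOfAKind,
        HandKind::OnePair,
    ];
    println!("{:?}", sorted_hands(hands));
    assert!(HandKind::HighCard < HandKind::OnePair);
}
```

Reordering the variants in the enum definition changes the ordering, so the declaration order is effectively part of the type's behaviour.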

thomasmg 10 hours ago

I find floating point NaN != NaN quite annoying. But this is not specific to Rust: it affects all programming languages that support floating point. All libraries that want to support ordering for floating point need to handle this special case, that is, all sort algorithms, hash table implementations, etc. Maybe it would cause fewer issues if NaN didn't exist, or if NaN == NaN. At least, it would be much easier to understand and more consistent with other types.
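
To make the special case concrete in Rust terms, here is a small sketch. `f64` implements only `PartialOrd`, so calling `.sort()` on a `Vec<f64>` does not compile and the caller has to choose a comparison. The helper name `sort_total` is an assumption for illustration; `f64::total_cmp` is the standard library method implementing the IEEE 754 total order.

```rust
use std::cmp::Ordering;

// Sorts floats with `f64::total_cmp`. Under this order NaNs end up at one
// of the two ends of the slice, depending on their sign bit.
fn sort_total(mut values: Vec<f64>) -> Vec<f64> {
    values.sort_by(|a, b| a.total_cmp(b));
    values
}

fn main() {
    let nan = f64::NAN;

    // NaN is not equal to itself and is unordered relative to every value.
    assert!(nan != nan);
    assert_eq!(nan.partial_cmp(&1.0), None);

    // The total order does treat a NaN as equal to itself (same bit pattern).
    assert_eq!(nan.total_cmp(&nan), Ordering::Equal);

    let sorted = sort_total(vec![2.0, nan, -1.0, f64::INFINITY]);
    println!("{:?}", sorted);
}
```

The common alternative, `sort_by(|a, b| a.partial_cmp(b).unwrap())`, panics as soon as a NaN is compared.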

newpavlov 6 hours ago

I agree. In my opinion NaNs were a big mistake in the IEEE 754 spec. Not only do they introduce a lot of special casing, but they also consume a relatively big chunk of all values in 32-bit floats (~0.4%).
I am not saying we do not need NaNs (I would even love to see them in integers, see: https://news.ycombinator.com/item?id=45174074), but I would prefer if we had fewer of them in floats, with clear sorting rules.
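
The ~0.4% figure can be checked with a little arithmetic: a 32-bit float is a NaN when all 8 exponent bits are ones and the 23-bit mantissa is non-zero, for either sign. A rough sketch (the function names are assumptions for illustration):

```rust
// Counts the 32-bit patterns that encode NaN:
// exponent all ones (1 choice) * non-zero mantissa (2^23 - 1 choices) * sign (2 choices).
fn f32_nan_patterns() -> u64 {
    let mantissa_bits = 23;
    2 * ((1u64 << mantissa_bits) - 1)
}

// Fraction of all 2^32 bit patterns that are NaN.
fn f32_nan_fraction() -> f64 {
    f32_nan_patterns() as f64 / (1u64 << 32) as f64
}

fn main() {
    println!(
        "{} NaN patterns, {:.2}% of all f32 bit patterns",
        f32_nan_patterns(),
        100.0 * f32_nan_fraction()
    );
}
```

That works out to 16,777,214 patterns, a little under 0.4% of the 2^32 possible values.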

MyOutfitIsVague 7 hours ago

There's a helpful crate that abstracts that away: https://docs.rs/ordered-float/latest/ordered_float/
You have a strongly ordered `NotNan` struct that wraps a float that's guaranteed to not be NaN, and an `OrderedFloat` that considers all NaN equal, and greater than non-NaN values.
These are basically the special cases you'd need to handle yourself anyway, and probably one of the approaches you'd end up taking.
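
For a sense of what handling it yourself looks like, here is a hand-rolled sketch of the second approach (all NaNs equal, and greater than everything else). The `TotalF64` type is hypothetical and is not the crate's implementation; it only illustrates the idea using the standard library.

```rust
use std::cmp::Ordering;

// Hypothetical newtype for illustration; NOT the ordered-float implementation.
#[derive(Debug, Clone, Copy)]
struct TotalF64(f64);

impl Ord for TotalF64 {
    fn cmp(&self, other: &Self) -> Ordering {
        match (self.0.is_nan(), other.0.is_nan()) {
            (true, true) => Ordering::Equal,    // all NaNs are equal to each other
            (true, false) => Ordering::Greater, // NaN is greater than any non-NaN
            (false, true) => Ordering::Less,
            // Neither side is NaN, so `partial_cmp` cannot return `None` here.
            (false, false) => self.0.partial_cmp(&other.0).unwrap(),
        }
    }
}

impl PartialOrd for TotalF64 {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

// Equality is defined through `cmp` so that `Eq` and `Ord` agree.
impl PartialEq for TotalF64 {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}

impl Eq for TotalF64 {}

fn main() {
    let mut v = vec![TotalF64(3.0), TotalF64(f64::NAN), TotalF64(-1.0)];
    v.sort(); // compiles because `TotalF64` implements `Ord`
    println!("{:?}", v);
}
```

A real wrapper would also need a `Hash` implementation consistent with this `Eq` before it could be used as a hash map key, which this sketch leaves out.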

ramon156 8 hours ago

I wonder if "any code that would create a NaN would error" would suffice here. I don't think it makes sense when you actually start to implement it, but I do feel like making NaN an error would be helpful. Why would you want to handle a NaN?
- thomasmg 8 hours ago
  
  Well floating point operations never throw an exception, which I kind of like, personally. I would rather go in the opposite direction and change integer division by zero to return MAX / MIN / 0.
  But NaN could be defined to be smaller or higher than any other value.
  Well, there are multiple NaNs. And NaN isn't actually the only weirdness; there's also -0, and we have -0 == 0. I think equality for floating point is weird anyway, so why not just define -0 < 0?
- MyOutfitIsVague 7 hours ago
  
  I mentioned in a sibling comment, there's a crate that does this in a pretty simple and obvious way: https://docs.rs/ordered-float/latest/ordered_float/
- westurner 8 hours ago
  
  If you don't handle NaN values, and NaNs do show up in real observations (for example, from real sensors that sometimes return NaN alongside outliers), then the sort order is indeterminate regardless of whether NaN == NaN: when multiple records share the key value NaN, the keys collide, and the key alone doesn't carry enough information to give those records a partial or total ordering.
  How should an algorithm specify that it should sort by insertion order instead of memory address order if the sort key is NaN for multiple records?
  That's the default in SQL Relational Algebra IIRC?
  - thomasmg 5 hours ago
    
    > then the sort order there is indeterminate
    Well each programming language has a "sort" method that sorts arrays. Should this method throw an exception in case of NaN? I think the NaN rules were the wrong decision. Because of these rules, everywhere there are floating point numbers, the libraries have to have special code for NaN, even if they don't care about NaN. Otherwise there might be ugly bugs, like sorting running into endless loops, data loss, etc. But well, it can't be changed now.
    The best description of the decision is probably [1], where Stephen Canon (former member of the IEEE-754 committee if I understand correctly) explains the reasoning.
    [1] https://stackoverflow.com/questions/1565164/what-is-the-rati...
  - westurner 8 hours ago
    
    What is a good sort key for Photons and Phonons? What is a good sort key for H2O water molecules?
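
On the question earlier in this subthread about falling back to insertion order when several records share a NaN key: in Rust, one way to get that is a stable sort with a total comparison. A sketch, assuming a hypothetical `Reading` record and NaN keys that share one bit pattern (`sort_by` is documented as stable, and `total_cmp` treats identical bit patterns as equal):

```rust
// Hypothetical sensor record; the field names are assumptions for illustration.
#[derive(Debug, Clone)]
struct Reading {
    id: u32,    // insertion order
    value: f64, // sort key, may be NaN
}

// A positive quiet NaN with an explicit bit pattern, so every NaN key
// in this example is bit-identical.
fn pos_nan() -> f64 {
    f64::from_bits(0x7ff8_0000_0000_0000)
}

// `sort_by` is a stable sort: elements that compare as equal keep their
// original relative order, which here means insertion order.
fn sort_readings(mut readings: Vec<Reading>) -> Vec<Reading> {
    readings.sort_by(|a, b| a.value.total_cmp(&b.value));
    readings
}

fn ids(readings: &[Reading]) -> Vec<u32> {
    readings.iter().map(|r| r.id).collect()
}

fn main() {
    let readings = vec![
        Reading { id: 0, value: pos_nan() },
        Reading { id: 1, value: 0.0 },
        Reading { id: 2, value: pos_nan() },
        Reading { id: 3, value: -0.0 },
        Reading { id: 4, value: pos_nan() },
    ];
    // -0.0 sorts before 0.0, and the NaN records stay in insertion order.
    println!("{:?}", ids(&sort_readings(readings))); // prints [3, 1, 0, 2, 4]
}
```

Note that `total_cmp` also distinguishes -0 from 0, which is the ordering suggested further up. NaNs with different sign bits or payloads are not equal under this order, so the insertion-order guarantee only covers bit-identical keys.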

tialaramex 4 hours ago

One thing that isn't discussed here but seems worth knowing for an HN audience is that these are what Rust calls "safe" traits. This has several related consequences:

1. You don't need to utter the keyword "unsafe" to implement these traits for your type. If you're not allowed by policy to write unsafe Rust (or if you just don't want to risk making any mistakes), you can implement these traits anyway. If you do, you should still implement them correctly, as with writing any software...

2. But, because they're safe traits, nobody else's Rust software is allowed to rely on your correctness for its own safety. If you disobey a rule, say by deciding that all values of your type are always greater than themselves (whether carelessly or because you're a vandal), other Rust software mustn't become unsafe as a result.

3. This has real-world implications. For example, if your type Goose has an Ord implementation which is defective, whether on purpose or by mistake, sorting a Vec<Goose> in Rust won't have Undefined Behaviour like in C++. It might panic (in debug), and it can't necessarily sort your type if your Ord implementation is nonsensical, but the "sorted" Vec<Goose> contains the same geese as your original, just potentially in a sorted state to the extent that meant anything. It's not fewer geese, or more geese, or just different geese altogether - and it certainly isn't, say, an RCE now like it might be in C++.
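
The third point can be tried out with a deliberately broken `Ord`. This is a sketch under stated assumptions: `Goose` is modelled as a tuple struct around an id, and `sort_geese` and `sorted_ids` are hypothetical helpers. The standard library documents that `sort` may panic when `Ord` is not a total order, so the sketch catches a possible panic and then checks which geese are left.

```rust
use std::cmp::Ordering;
use std::panic::{self, AssertUnwindSafe};

// Hypothetical type for illustration: a goose identified by a number.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Goose(u32);

// Deliberately nonsensical: every goose is greater than every goose,
// including itself. No `unsafe` is needed to write this.
impl Ord for Goose {
    fn cmp(&self, _other: &Self) -> Ordering {
        Ordering::Greater
    }
}

impl PartialOrd for Goose {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

// Sorts the geese, swallowing a panic if the sort detects the broken order.
// Either way the vector still holds the original elements in some order.
fn sort_geese(mut geese: Vec<Goose>) -> Vec<Goose> {
    let _ = panic::catch_unwind(AssertUnwindSafe(|| geese.sort()));
    geese
}

// Collects the ids in ascending numeric order so two flocks can be compared.
fn sorted_ids(geese: &[Goose]) -> Vec<u32> {
    let mut ids: Vec<u32> = geese.iter().map(|g| g.0).collect();
    ids.sort();
    ids
}

fn main() {
    let original: Vec<Goose> = (0..100).rev().map(Goose).collect();
    let after = sort_geese(original.clone());
    // Same geese as before: none lost, none added, none replaced.
    assert_eq!(sorted_ids(&after), sorted_ids(&original));
    println!("still {} geese", after.len());
}
```

The final order of the geese is unspecified, and a panic message may appear on stderr, but the contents of the vector are preserved either way.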