Clojure: Transducers

(clojure.org)

152 points | by tosh 4 days ago

82 comments

  • drob518 2 days ago

    Transducers work even better with a Clojure library called Injest. It has macros similar to the standard Clojure threading macros except Injest’s macros will recognize when you’re using transducers and automatically compose them correctly. You can even mix and match transducers and non-transducer functions and Injest will do its best to optimize the sequence of operations. And wait, there’s more! Injest has a parallelizing macro that will use transducers with the Clojure reducers library for simple and easy use of all your cores. Get it here: https://github.com/johnmn3/injest

    Note: I’m not the author of Injest, just a satisfied programmer.
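
    For a feel of it, Injest's `x>>` macro (name per its README; hedged from memory) effectively rewrites a plain threading pipeline into a fused single-pass one — roughly this transformation, shown here in plain Clojure:

    ```clojure
    ;; With the ordinary threading macro: each step allocates a lazy seq.
    (->> (range 100) (map inc) (filter odd?) (reduce +))
    ;; => 2500

    ;; What a transducer-aware macro like Injest's x>> effectively
    ;; compiles that to: one pass, no intermediate sequences.
    (transduce (comp (map inc) (filter odd?)) + 0 (range 100))
    ;; => 2500
    ```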

  • bjoli 2 days ago

    I made srfi-171 [0], transducers for scheme. If you have any questions about them in general I can probably answer them. My version is pretty similar to the clojure version judging by the talks Rich Hickey gave on them.

    I know a lot of people find them confusing.

    0: https://srfi.schemers.org/srfi-171/srfi-171.html

    • matrix12 a day ago

      thanks. this is going in my scheme.

  • jwr a day ago

    Transducers are IMHO one of the most under-appreciated features of Clojure. Once you get to know them, building transducer pipelines becomes second nature. Then you realize that a lot of data processing can be expressed as a pipeline of transformations, and you end up with reusable components that can be applied in any context.

    The fact that transducers are fast (you don't incur the cost of handling intermediate data structures, nor the GC costs afterwards) is icing on the cake at this point.

    Much of the code I write begins with (into ...).

    And in Clojure, as with anything that has been added to the language, transducers are first-class citizens, so you can reasonably expect library functions to have all the additional arities.

    [but don't try to write stateful transducers until you feel really comfortable with the concepts, they are really tricky and hard to get right]
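
    A small illustration of that reusability (data and keys here are made up):

    ```clojure
    ;; A reusable pipeline component; it knows nothing about its source
    ;; or target collection.
    (def active-names
      (comp (filter :active)
            (map :name)))

    (into [] active-names [{:name "a" :active true}
                           {:name "b" :active false}])
    ;; => ["a"]
    ```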

  • adityaathalye 2 days ago

    May I offer a little code riff slicing FizzBuzz using transducers, as one would do in practice, in real code (as in not a screening interview round).

    Demo One: Computation and Output format pulled apart

      ;; basic-buzz is defined in the linked post; a minimal version
      ;; (hypothetical here) so the snippet runs standalone:
      (defn basic-buzz [n]
        (cond (zero? (mod n 15)) "FizzBuzz"
              (zero? (mod n 3))  "Fizz"
              (zero? (mod n 5))  "Buzz"
              :else n))

      (def natural-nums (rest (range)))
    
      (def fizz-buzz-xform
        (comp (map basic-buzz)
              (take 100))) ;; early termination
    
      (transduce fizz-buzz-xform ;; calculate each step
                 conj ;; and use this output method
                 []   ;; to pour output into this data structure
                 natural-nums)
    
      (transduce fizz-buzz-xform ;; calculate each step
                 str ;; and use this output method
                 ""  ;; to catenate output into this string
                 natural-nums) ;; given this input
    
      (defn suffix-comma  [s]  (str s ","))
    
      (transduce (comp fizz-buzz-xform
                       (map suffix-comma)) ;; calculate each step
                 str ;; and use this output method
                 ""  ;; to catenate output into this string
                 natural-nums) ;; given this input
    
    Demos two and three for your further entertainment are here: https://www.evalapply.org/posts/n-ways-to-fizzbuzz-in-clojur...

    (edit: fix formatting, and kill dangling paren)

  • pjmlp 2 days ago

    Nowadays you can make use of some transducer ideas via Gatherers in Java; however, it isn't as straightforward as in plain Clojure.

  • arximboldi a day ago

    I implemented these in C++ a while ago. Arguably, some aspects of the concept work even better in C++ than they do in Clojure.

    https://github.com/arximboldi/zug

    An interesting application is in cursors in Lager:

    https://github.com/arximboldi/lager

  • talkingtab 2 days ago

    When I first read about transducers I was wowed. For example, if I want to walk all the files on my computer and find the duplicate photos in the whole file system, transducers provide a conveyor belt approach. And whether there are savings in terms of memory or anything, maybe. But the big win for me was to think about the problem as pipes instead of loops. And then if you could add conditionals and branches it is even easier to think about. At least I find it so.

    I tried to implement transducers in JavaScript using yield and generators, and that worked. That was before async/await, but now you can just `await readdir("/")`. I'm unclear as to whether transducers offer significant advantages over async/await.

    [[Note: I have a personal grudge against Java and since Clojure requires Java I just find myself unable to go down that road]]

    • jwr a day ago

      I think, like with the rest of Clojure, none of this is "revolutionary" in itself. Clojure doesn't try to be revolutionary, it's a bunch of existing ideas implemented together in a cohesive whole that can be used to build real complex systems (Rich Hickey said so himself).

      Transducers are not new or revolutionary. The ideas have been around for a long time, I still remember using SERIES in Common Lisp to get more performance without creating intermediate data structures. You can probably decompose transducers into several ideas put together, and each one of those can be reproduced in another way in another language. What makes them nice in Clojure is, like the rest of Clojure, the fact that they form a cohesive whole with the rest of the language and the standard library.

    • a day ago
      [deleted]
    • justinhj a day ago

      You could always try ClojureScript

  • vindarel a day ago
    • jwr a day ago

      I'd say SERIES is its older cousin.

  • eduction 2 days ago

    The key insight behind transducers is that a ton of performance is lost not to bad algorithms or slow interpreters but to copying things around needlessly in memory, specifically through intermediate collections.

    While the mechanics of transducers are interesting the bottom line is they allow you to fuse functions and basic conditional logic together in such a way that you transform a collection exactly once instead of n times, meaning new allocation happens only once. Once you start using them you begin to see intermediate collections everywhere.

    Of course, in any language you can theoretically do everything in one hyperoptimized loop; transducers get you this loop without much of a compromise on keeping your program broken into simple, composable parts where intent is very clear. In fact your code ends up looking nearly identical (especially once you learn about eductions… cough).

    • fud101 2 days ago

      These sound wild in terms of promise but I never understood them in a practical way.

      • moomin 2 days ago

        They're not really that interesting. They're "reduce transformers". So, take a reduction operation, turn it into an object, define a way to convert one reduction operation into another and you're basically done. 99% of the time they're basically mapcat.

        The real thing to learn is how to express things in terms of reduce. Once you've understood that, just take a look at e.g. the map and filter transducers and it should be pretty obvious. But it doesn't work until you've grasped the fundamentals.
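
        For reference, simplified versions of the map and filter transducers (the real ones in clojure.core also handle extra arities, but the shape is the same):

        ```clojure
        ;; A transducer takes a reducing function rf and returns a new one.
        (defn map-xf [f]
          (fn [rf]
            (fn
              ([] (rf))               ;; init
              ([result] (rf result))  ;; completion
              ([result x] (rf result (f x))))))

        (defn filter-xf [pred]
          (fn [rf]
            (fn
              ([] (rf))
              ([result] (rf result))
              ([result x] (if (pred x) (rf result x) result)))))

        (transduce (comp (map-xf inc) (filter-xf odd?)) conj [] (range 5))
        ;; => [1 3 5]
        ```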

      • eduction a day ago

        Canonical example is rewriting a non-transducing set of collection transformations like

            (->> posts
                 (map with-user)
                 (filter authorized?)
                 (map with-friends)
                 (into []))
        
        That’s five collections, this is two, using transducers:

            (into []
                  (comp
                    (map with-user)
                    (filter authorized?)
                    (map with-friends))
                  posts)
        
        A transducer is returned by comp, and each item within comp is itself a transducer. You can see how the flow is exactly like the double threading macro.

        map, for example, is called with one arg; this means it returns a transducer, unlike in the first example, where it has a second argument (the threaded-in coll posts) and so immediately runs over that and returns a new coll.

        The composed transducer returned by comp is passed to into as the second of three arguments. In three argument form, into applies the transducer to each item in coll, the third argument. In two argument form, as in the first example, it just puts coll into the first argument (also a coll).
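
        Side by side (both yield the same vector; only the allocation differs):

        ```clojure
        ;; Two-argument into: map runs first, producing an intermediate
        ;; seq, which into then pours into the vector.
        (into [] (map inc [1 2 3]))   ;; => [2 3 4]

        ;; Three-argument into: the transducer is applied item by item
        ;; during the pour; no intermediate collection is built.
        (into [] (map inc) [1 2 3])   ;; => [2 3 4]
        ```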

        • kccqzy a day ago

          That does not sound like a good example. The two-argument form of `map` already returns a lazy sequence. Same for `filter`. I thought lazy sequences are already supposed to get rid of the performance problem of materializing the entire collection. So

          • eduction a day ago

            Lazy sequences reduce the size of intermediate collections but they “chunk” - you get 32 items at a time, multiply that by however many transformations you have and obviously by the size of the items.

            There are some additional inefficiencies in terms of context capturing at each lazy transformation point. The problem gets worse outside of a tidy immediate set of transformations like you’ll see in any example.

            This article gives a good overview of the inefficiencies, search on “thunk” for tldr. https://clojure-goes-fast.com/blog/clojures-deadly-sin/ (I don’t agree with its near condemnation of the whole lazy pattern (laziness is quite useful - we can complain about it because we have it, it would suck if we didn’t).)
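
            The chunking is easy to observe (a sketch that counts realized elements):

            ```clojure
            (def realized (atom 0))

            ;; Asking for just the first element of a mapped chunked seq
            ;; still realizes the whole first 32-element chunk.
            (first (map (fn [x] (swap! realized inc) x) (range 100)))

            @realized
            ;; => 32
            ```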

            • kccqzy a day ago

              So what’s your coding style in Clojure? Do you eschew lazy sequences as much as possible and only use either non-lazy manipulation functions like mapv or transducers?

              I liked using lazy sequences because it’s more amenable to breaking larger functions into smaller ones and decreases coupling. One part of my program uses map, and a distant part of it uses filter on the result of the map. With transducers it seems like the way to do it is eductions, but I avoided it because each time it is used it reevaluates each item, so it’s sacrificing time for less space, which is not usually what I want.

              I should add that I almost always write my code with lazy sequences first because it’s intuitive. Then maybe one time out of five I re-read my code after it’s done and realize I could refactor it to use transduce. I don’t think I’ve ever used eduction at all.
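
              That re-evaluation is easy to demonstrate (a sketch with a call counter):

              ```clojure
              (def calls (atom 0))

              (def ed (eduction (map (fn [x] (swap! calls inc) (inc x)))
                                [1 2 3]))

              (into [] ed) ;; => [2 3 4]
              (into [] ed) ;; => [2 3 4]

              @calls
              ;; => 6 — the transformation ran once per reduction over ed
              ```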

              • eduction a day ago

                It's evolving, and I'm using transducers more over time, but I still regularly am in situations where a simple map or mapv is all I need.

                Lazy sequences can be a good fit for a lot of use cases. For example, I have some scenarios where I'm selecting from a web page DOM and most of the time I only want the first match but sometimes I want them all - laziness is great there. Or walking directories in a certain order, where the number of items they contain varies, so I don't know how many I'll need to walk but I know it's usually a small fraction of the total. Laziness is great there.

                This can still work with transducers - you can either pass a lazy thing in as the coll to an eager transducing context (maybe with a "take n" along the way) or use the "sequence" transducing context which is lazy.
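
                A sketch of the two contexts sharing one transducer:

                ```clojure
                (def xf (comp (map inc) (filter odd?)))

                ;; Eager transducing context:
                (into [] xf (range 10))
                ;; => [1 3 5 7 9]

                ;; Lazy transducing context; safe even on infinite input:
                (take 3 (sequence xf (range)))
                ;; => (1 3 5)
                ```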

                I tend to reach for transducers in places in my code where I'm combining multiple collection transformations, usually with literal map/filter/take/whatever right there in the code. Easy wins.

                Recently I've started building more functions that return either transducers or eductions (depending on whether I want to "set" / couple in the base collection, which is what eduction is good for) so I can compose disparate functions at different points in the code and combine them efficiently. I did this in the context of a web pipeline, where I was chaining a request through different functions to come up with a response. Passing an eduction along, I could just nest it inside other eductions when I wanted to add transducers, then realize the whole thing at the end with an into and render.

                Mentally it took me some time to wrap my head around transducers and when and how to use them, so I'm still figuring it out, but I could see myself ending up using them for most things. Rich Hickey, who created clojure, has said if he had thought of them near the beginning he'd have built the whole language around them. But I don't worry about it too much, I mostly just want to get sh-t done and I use them when I can see the opportunity to do so.

            • eduction a day ago

              This, by the way, is why the lead example in the original linked post on clojure.org is very much like mine.

        • fud101 a day ago

          Thanks. So is this not an optimization the Clojure runtime can do for you automatically? I find the first one simpler to read and understand.

          • jwr a day ago

            Performance is one of the niceties of transducers, but the real benefits are from better code abstractions.

            For example, transducers decouple the collection type from data-processing functions. So you can write (into #{} ...) (a set), (into [] ...) (a vector) or (into {} ...) (a map) — and you don't have to modify the functions that process your data, or convert a collection at the end. The functions don't care about your target data structure, or the source data structure. They only care about what they process.

            The fact that no intermediate structures have to be created is an additional nicety, not really an optimization.

            It is true that for simple examples the (-> ...) is easier to read and understand. But you get used to the (into) syntax quickly, and you can do so much more this way (composable pipelines built on demand!).
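
            For instance (a toy pipeline):

            ```clojure
            ;; One transformation, three target collections; the
            ;; pipeline never mentions any of them.
            (def pairs (map (fn [x] [x (* x x)])))

            (into {}  pairs [1 2 3]) ;; => {1 1, 2 4, 3 9}
            (into []  pairs [1 2 3]) ;; => [[1 1] [2 4] [3 9]]
            (into #{} pairs [1 2 3]) ;; => #{[1 1] [2 4] [3 9]}
            ```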

            • eduction a day ago

              I'd argue for most people performance is the single best reason to use them. Exception is if you regularly use streams/channels and benefit from transforming inside of them.

              To take your example, there isn't much abstraction difference between (into #{} (map inc ids)) vs (into #{} (map inc) ids), nor is there a flexibility difference. The non transducer version has the exact same benefit of allowing specification of an arbitrary destination coll and accepting just as wide range of things as the source (any seqable). Whether in a transducer or not, inc doesn't care about where its argument is coming from or going. The only difference between those two invocations is performance.

              Functions already provide a ton of abstractability and the programmer will rightly ask, "why should I bother with transducers instead of just using functions?" (aka other, arbitrary functions not of the particular transducer shape) The answer is usually going to be performance.

              For a literal core async pipeline, of course, there is no replacing transducers because they are built to be used there, and there is a big abstraction benefit to being able to just hand in a transducer to the pipeline or chan vs building a function that reads from one channel, transforms, and puts on another channel. I never had the impression these pipelines were widely used, but I'd love to be wrong!

  • thih9 2 days ago
    • adityaathalye 2 days ago

      I'd reckon most of Clojure is from ten years ago. Excellent backward compatibility, you see :) cf. https://hopl4.sigplan.org/details/hopl-4-papers/9/A-History-...

    • whalesalad 2 days ago

      It's a blessing and a curse that zero innovation has occurred in the Clojure space since 2016. Pretty sure the only big things have been clojure.spec becoming more mainstream and the introduction of deps.edn to supplant lein, although I am still partial to lein.

      • seancorfield a day ago

        Clojure 1.9: Spec.

        Clojure 1.10: datafy/nav + tap> which has spawned a whole new set of tooling for exploring data.

        Clojure 1.11: portable math (clojure.math, which also works on ClojureScript).

        Clojure 1.12: huge improvements in Java interop.

        And, yes, the new CLI and deps.edn, and tools.build to support "builds as programs".

        • vaylian a day ago

          And we can look forward to Jank https://jank-lang.org/

          • pjmlp a day ago

            Yes, although if one cares about Jank, they can also use a traditional Common Lisp or Scheme compiler, if compatibility with existing Clojure code isn't a requirement.

        • whalesalad a day ago

          Things have surely happened and the language has improved, but would you consider any of this to be innovative?

          • iLemming 14 hours ago

            > would you consider any of this to be innovative?

            JS got optional chaining, nullish coalescing, async/await, decorators, pattern matching proposals - all borrowed from other languages. Python got type hints (borrowed), structural pattern matching (borrowed from ML/Haskell), walrus operator. Rust got async/await (borrowed). Go got generics (very late, borrowed from everywhere).

            Almost every "feature addition" in any mainstream language since roughly 2010 is a synthesis or import from prior art - usually from ML, Haskell, Lisp, or Smalltalk lineages. Comparatively, there's been a good amount of innovation in the Clojure-sphere. Anyone who ever tried Hyperfiddle/electric, generated tests based on Specs or Malli, or even used nbb for scripting - knows that.

            So let's either apply pressure everywhere equally, or nowhere. What's your point of singling out Clojure? Are you asking for a higher standard being applied because of Clojure's stated philosophy (simplicity and careful design, etc.), or this is a proxy complaint about something else?

          • jwr a day ago

            Hmm. I'm not sure what you are looking for — myself, I write software that supports my living, and I'm not looking for thrills. What I get with Clojure is new concepts every couple of years or so, thought through and carefully implemented by people much smarter than me, in a way that doesn't break anything. This lets me concentrate on my work and deliver said software that supports my living. And pay the bills.

          • waffletower a day ago

            Babashka is definitely innovative and useful

            • whalesalad a day ago

              Agreed, that is huge for the ecosystem. I have a side project actually that has a unified codebase: central library and api server in clj, and the cli client is babashka.

      • JoshCole a day ago

        I know others already pointed out a ton of things, but having worked with Clojure in 2016 and doing active Clojure development for my startup now I feel like I have to chime in too.

        In 2016, Clojure was not great for serious data science. That has changed substantially and not just via Java Interop.

        - It now has cross-ecosystem GPU support via blueberry libraries like Neanderthal, which in benchmarking outperform some serious Java libraries in this space.

        - It has columnar, indexed, JIT-optimized data science libraries via cnuernber and TechAscent, part of the Clojure ecosystem. In benchmarking they've outperformed libraries like NumPy.

        - The ecosystem around data science is also better. The projects aren't siloed like they used to be. The ecosystem is making things interoperate.

        - You can now use Python from Clojure via the libpython-clj bindings. In general, CFFI is a lot better, not just for Python.

        - The linters are way better than they used to be. The REPL support too.

        Clojure already had one of the best efficiency scores in terms of code written to what is accomplished, but now you also get REPL integration, and LLMs have been increasingly capable of leveraging that. There are things like yogthos mycelium experiments to take advantage of that with RLLM calls. So its innovating in interesting new ways too, like cutting bugs in LLM generated code.

        It just doesn't feel true to me that innovation isn't occurring. Clojure really has this import antigravity feel to it; things other languages would have to do a new release for, are just libraries that you can grab and try out (or maybe that's the python)

        • uxcolumbo a day ago

          Can you talk more about why you chose CLJ for data science / ML?

          Are there any benefits of using it over Python?

          And how is the interop with Python libs?

          • JoshCole a day ago

            > Can you talk more about why you chose CLJ for datascience / ML.

            I use Python for a lot of machine learning. My vision transformers, for example, are in Python. There is a lot to like about the Python ecosystem. Throwing away libraries like albumentations and pytorch because you move to a different ecosystem is a real loss. You probably ought to be using Python if you're doing machine learning of the sort that one immediately thinks of when they see ML.

            That said, data science and machine learning are words that cover a lot of ground.

            Python often works because it serves as glue code to more optimized libraries. Sometimes, it is annoying to use it as glue code. For example, when you're working on computational game theory problems, the underlying data model tends to be a tree structure and the exploration algorithm explores that tree structure. There is a lot of branching. Vanilla python in such a case is horrifically slow.

            I was looking at progress bars in tqdm reporting 10,000 years until the computation was done. I had already reached for numba and done some optimizations. Computational game theory is quite brutal. You're very often reminded that there are fewer atoms in the universe than objects of interest in correctly calculating what you want to calculate.

            Most people use C, C++, and CUDA kernels for the sort of program I was writing. Some people have tried to do things in Python.

            > Are there any benefits of using it over Python?

            There is an open source implementation of a thing I built. It solves the same problem I solved, but in Python and worse than I solved it and with a lot of missing features. It has a comment in it, discussing that the universe will end before the code would finish, were it to be used at the non-trivial size. The code I wrote worked at the non-trivial size. Clojure, for me, finished. The universe hasn't ended yet. So I can't yet tell you how much faster my code was than the Python code I'm talking about.

            > And how is the interop with Python libs?

            Worked for me without issue, but I eventually got annoyed that I had to wait for two rounds of dependency resolution in some builds. Conda builds can sometimes have issues with dependency resolution taking an unreasonable amount of time. I was hitting that despite using very few libraries.

            • pjmlp a day ago

              Note that enough people have tried to do things in Python that writing CUDA kernels in Python is now also a supported way; still WIP, but NVidia is quite serious about it.

              Basically their GPU JIT builds on top of MLIR, thus in the end is no different from anything else on top of LLVM.

            • uxcolumbo 21 hours ago

              I like Clojure and want to get more into it, but wondered what folks are doing when it comes to building AI-powered apps. So thanks for sharing your experience.

              And nice site btw :)

      • iLemming a day ago

        > zero innovation has occurred in the Clojure space since 2016.

        Oh, really? Zero, eh?

        clojure.spec, deps.edn, Babashka, nbb, tap>, requiring-resolve, add-libs, method values, interop improvements, Malli, Polylith, Portal, Clerk, hyperfiddle/electric, SCI, flowstorm ...

        Maybe you should've started the sentence with "I stopped paying attention in 2016..."?

        • instig007 a day ago

          > clojure.spec

          Tape-patches for self-inflicted language design issues isn't innovation, lol

          • iLemming 14 hours ago

            > lol

            Joke's on you. You seem to be so invested in moving in a single direction that you developed "an expert blind spot". Have you ever thought that it's possible that the knowledge you've so far "accumulated" has become an obstacle to seeing simpler or orthogonal ideas clearly?

            Every type system, schema library, and validation tool in every language is in some sense "patching" the lack of built-in guarantees. Haskell's typeclasses patch the lack of ad-hoc polymorphism. Rust's borrow checker patches the lack of memory safety. Python's type hints patch the lack of static types. You can retroactively frame any additive language feature as patching a prior omission - it's not an argument, it's a framing choice.

            Spec isn't even so much about patching - it's about runtime generative testing, instrumentation, and data specification in a dynamic context where static types would be the wrong tool anyway. That's a genuine design space with genuine ideas in it, regardless of whether you like dynamic typing. You just can't see it, because you already have decided "isn't innovation, lol", etc.

            One more reason to love the language is its community. I appreciate that Clojurians engage with diverse ideas from different tools and languages, freely borrowing the best ones without prejudice, owing to their deep and widespread understanding of language design. And they do it with the focus on pragmatism. Something maybe we can learn from them, even if we don't like the language and tools they make.

            • instig007 12 hours ago

              > Every type system, schema library, and validation tool in every language is in some sense "patching" the lack of built-in guarantees.

              > Spec isn't even so much about patching - it's about runtime generative testing, instrumentation, and data specification in a dynamic context where static types would be the wrong tool anyway.

              It's amazing what people can claim when they don't have to prove it. But I wonder: how exactly are your runtime generative tests different from the statically derived strategies that I get via QuickCheck or Validity?

              > And they do it with the focus on pragmatism

              "pragmatism" is defined in terms of values that one desires to practice. I am in no position to argue that your and their desires don't exist, but please don't claim that their preferences of transducers and schemas are somehow more pragmatic just because they ignored types and effectful/pure evaluation distinction in their language philosophy.

              • iLemming 11 hours ago

                I never claimed that Clojure (or transducers, etc) is "more pragmatic than Haskell", I said "Clojurians engage with diverse ideas pragmatically".

                > how exactly does your runtime generative tests are different from statically derived strategies

                Spec generators are derived from predicates, not types - which inverts the usual QuickCheck problem where Int generates any Int and you have to write newtypes or custom Gen instances to narrow to "ages 1-120." Spec also has :fn specs that assert relationships between args and return values, which base QuickCheck/Validity don't give you natively (you'd reach for Liquid Haskell). And `instrument` validates real calls in dev, not just sampled properties.
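
                For instance (a small sketch; `clamp-age` is a made-up function, and spec's generators need test.check on the classpath):

                ```clojure
                (require '[clojure.spec.alpha :as s]
                         '[clojure.spec.gen.alpha :as gen])

                ;; The spec is a predicate/range; the generator is
                ;; derived from it directly.
                (s/def ::age (s/int-in 1 121))

                (every? #(s/valid? ::age %) (gen/sample (s/gen ::age)))
                ;; => true

                ;; An :fn spec relating arguments to the return value,
                ;; checked by instrument/check:
                (defn clamp-age [n] (-> n (max 1) (min 120)))

                (s/fdef clamp-age
                  :args (s/cat :n int?)
                  :ret ::age
                  :fn #(= (:ret %) (-> % :args :n (max 1) (min 120))))
                ```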

                You seem to be operating on a single axiom: types + purity + laziness are the correct solution to the problems worth solving. Given that axiom, every Clojure design choice in your eyes either (a) a patch for not having them, or (b) an unnecessary abstraction that falls out of having them. There is no version of reality in which Clojure can be credited with solving something for you, because the axiom forecloses it.

                This is an unfalsifiable position, any additional technical arguments would be wasted. You don't even try to evaluate my counterexamples, because the axiom tells you the counterexamples must be wrong in some way you haven't yet articulated to yourself.

                Okay, please, give me Haskell code that takes one composed transformation and applies it, unchanged, to a vector, a lazy seq, a channel, and a pure fold. Not 'here is pipes, here is conduit, here is streaming, here is foldl library' - one piece of code, four consumers. That's the thing you have dodged four times in the other thread.
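
                For concreteness, the Clojure side of that challenge — one transducer, four consumers (core.async is an extra dependency):

                ```clojure
                (require '[clojure.core.async :as a])

                (def xf (comp (map inc) (filter odd?)))

                (into [] xf [1 2 3 4])        ;; strict vector => [3 5]
                (sequence xf (range 10))      ;; lazy seq => (1 3 5 7 9)
                (a/chan 16 xf)                ;; channel applying xf to puts
                (transduce xf + 0 [1 2 3 4])  ;; pure fold => 8
                ```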

                Clojure didn't ignore static types or the pure/effectful distinction - it made a deliberate decision to optimize for different values. Framing deliberate trade-offs as ignorance is often itself a screaming display of ignorance.

                • instig007 9 hours ago

                  > I said "Clojurians engage with diverse ideas pragmatically".

                  then you said nothing and contributed nothing to your points, as everybody else "engage with diverse ideas pragmatically". It also so happened that engaging allows for rejection of inferior ideas, which is what transducers are. I can compose around any Python iterable the same way you claim is important to transducers, but do you know what I lose if I engage with the pragmatic Python and Clojure? I lose precision and further optimizations.

                  > You seem to be operating on a single axiom

                  How about you abstain from drawing wrong conclusions and actually focus on being precise

                  > Spec generators are derived from predicates, not types

                  What do predicates operate on? QuickCheck builds bounded ints within their `minBound` and `maxBound` of the type, as the basis of Int spec deriving. There's no difference and no inversion of intent if your strategy for your newtype actually produces a spec deriving from 1-120 range. If you say there's a thing called Age, and it being a subset of Int or Nat ranges, you do define the Age and its bounds as part of your spec, and there's zero inversion to what clojure spec does. I'm beginning to suspect that I'm conversing with a prompt output.

                  > Okay, please, give me Haskell code that takes one composed transformation and applies it, unchanged, to a vector, a lazy seq, a channel, and a pure fold. Not 'here is pipes, here is conduit, here is streaming, here is foldl library' - one piece of code, four consumers. That's the thing you have dodged four times in the other thread.

                  Certainly, I'll do that as soon as you provide me with the example of a transducer tracking effects separately from pure evaluations. We want to be on the same page, don't we? I want to compose my effects without ambiguity, so hurry up.

                  > Clojure didn't ignore static types or the pure/effectful distinction - it made a deliberate decision to optimize for different values.

                  lol, it actually ignored it, but you're too perky to simply admit that as if your future depends on it.

                  • iLemming 8 hours ago

                    > QuickCheck builds bounded ints within their `minBound` and `maxBound`

                    Yeah, your narrow technical note isn't wrong here (I should've used a less trivial example), but the broader differences still hold - spec operating on map shapes without lifting data into types, arg/return relationships without reaching for Liquid Haskell, etc. This is a much longer discussion that requires its own thread, unrelated to transducers.

                    > I can compose around any Python iterable the same way

                    No, you can't. Python iterables are not uniform across: strict collections, lazy sequences, async channels, arbitrary reducing step functions. itertools composes over iterables. It does not compose over asyncio.Queue, a trio memory channel, or a user-supplied step function. Transducers are specifically about the reducing function as the point of composition, which decouples the transformation from whatever produces or consumes values. Python's iterator protocol is a narrower abstraction. Show me some Python code that applies one composed transformation, unchanged, to an iterable, an asyncio.Queue, and a user-defined reduce function. You can't, because the protocol doesn't support it. Congratulations, now you're complaining about a third language without properly understanding the topic at hand.
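                    To make the shape concrete: the reducing-function pattern itself can be hand-rolled in Python - what lacks it is the itertools/iterator protocol, not the language. A rough sketch (hypothetical names, stdlib only, not any library's API) of one composed transformation feeding both a plain reduce and an asyncio.Queue consumer:

```python
# Hand-rolled sketch of the transducer idea: the composition targets the
# *reducing step function*, not the iterator protocol, so one transformation
# can feed both an eager reduce and an asyncio.Queue drain. Hypothetical
# names, stdlib only.
import asyncio
from functools import reduce

def mapping(f):
    # transducer: wrap a step function with a mapping stage
    def xform(step):
        return lambda acc, x: step(acc, f(x))
    return xform

def filtering(pred):
    # transducer: wrap a step function with a filtering stage
    def xform(step):
        return lambda acc, x: step(acc, x) if pred(x) else acc
    return xform

def comp(*xforms):
    # compose transducers; like Clojure, the first-listed stage runs first
    def composed(step):
        for t in reversed(xforms):
            step = t(step)
        return step
    return composed

xf = comp(filtering(lambda x: x % 2 == 1), mapping(lambda x: x + 1))

def append(acc, x):
    acc.append(x)
    return acc

# Consumer 1: an ordinary eager reduce over a list.
eager = reduce(xf(append), [1, 2, 3, 4, 5], [])

# Consumer 2: draining an asyncio.Queue with the *same* composed xform.
async def drain(queue, step, acc):
    while True:
        x = await queue.get()
        if x is None:          # None as an end-of-stream sentinel
            return acc
        acc = step(acc, x)

async def main():
    q = asyncio.Queue()
    for x in [1, 2, 3, 4, 5, None]:
        q.put_nowait(x)
    return await drain(q, xf(append), [])

async_result = asyncio.run(main())
# eager and async_result are both [2, 4, 6]
```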

                    > provide me with the example of a transducer tracking effects separately from pure evaluations

                    Transducers aren't an effect-tracking system because Clojure's language doesn't track effects in types. Asking for a Clojure abstraction that tracks effects is like asking for a Haskell library that works without the type system. It handles effects through different mechanisms, and transducers are orthogonal to effect tracking by design. Effect separation is not a goal of transducers, just as uniform multi-consumer application is not a goal of lazy evaluation.

                    > lol, it actually ignored it, but you're too perky

                    This is factually wrong. Hickey's talks (Effective Programs, Maybe Not, Clojure core.typed, the explicit refusal to adopt static types) are public, specific, and reasoned. You can absolutely disagree with the reasoning. Calling deliberate, argued design decisions "ignored" is either ignorance or dishonesty.

                    You're defending a hierarchy: typed+pure+lazy at the top, with everything else a degraded attempt at it. Within that hierarchy, Clojure can't contribute anything original, by definition - anything Clojure does either (a) duplicates what types+purity already give you, or (b) is a workaround for not having them.

                    I'm defending something subtler and harder to argue: that different design axes exist, that Clojure's choices are coherent given its axes, and that "this language's solution to X is a workaround for not having Y" is a framing choice, not a technical claim. I am also rhetorically disadvantaged, because "X is just a workaround" is a punchy dismissal and "X is a coherent choice within a different design space" is a paragraph.

                    I'm not perky about any of it, this isn't a Haskell vs. Clojure debate for specific use cases, you're arguing just for the sake of it. You're not learning, not probing ideas, not stress-testing your own position, nor are you giving me an opportunity for any of that on my side. Language-tribal arguments on HN are a genre, and you're writing in that genre. I hope you had fun, and please don't you dare call me "a prompt output" - I spent time and energy arguing in vain, about nothing; at least have some human decency to acknowledge that.

  • solomonb a day ago ago

    I never understood what was so special about Clojure's Transducers. Isn't it essentially just applying a transformation on the lambda applied to a fold?

    • Veedrac a day ago ago

      Fundamentally, there are two ways of representing iteration pipelines: source driven, and drain driven. This almost always maps to the idea of _internal_ iteration and _external_ iteration, because the source is wrapped inside the transforms. Transducers are unusual in being source driven but also external iterators.

      Most imperative languages choose one of two things: internal iteration that doesn't support composable flow control, or external iteration that does. This is why you see pause/resume style iteration in Python, Rust, Java, and even Javascript. If that's your experience, transducers are a pretty novel place in the trade-off space: you keep most of the composability, but you get to drive it from things like event sources.

      But the gap is a bit smaller than it might appear. Rust's iterators are conceptually external iterators, but they actually do support internal iteration through `try_fold`, and even in languages that don't, you can 'just' convert external to internal iterators.

      Then all you have to do to recover what transducers give you is pass the object to the source, let it run `try_fold` whenever it has data, and check for early termination via `size_hint`. There's one more trick for the rare case of iterators with buffering, but you don't have to change the Iterator interface for that, you just need to pass one bit of shared state to the objects on construction.

      Not all Iterators are strictly valid to be source-driven, and while most are, not everything works nicely when iterated this way (eg. Skip could but doesn't handle this case correctly, because it's not required to), but I don't think transducers can actually do anything this setup can't. It's just an API difference after that point.
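      For concreteness, a toy sketch of that internal-iteration shape (illustrative names only, loosely mirroring the try_fold idea, not any real API):

```python
# Sketch of "internal iteration with early termination", roughly what Rust's
# try_fold gives you: the source pushes items into a step function, and the
# step can signal stop. Illustrative names, not from any library.
STOP = object()  # sentinel: step function asks the source to stop driving

def try_fold(source, acc, step):
    # source-driven loop: the fold owns the iteration, the step owns control
    for item in source:
        signal, acc = step(acc, item)
        if signal is STOP:
            break
    return acc

def take_first_evens(n):
    # step that collects the first n even items, then requests termination
    def step(acc, item):
        if item % 2 == 0:
            acc = acc + [item]
        return (STOP if len(acc) >= n else None), acc
    return step

result = try_fold(range(10 ** 9), [], take_first_evens(3))
# result is [0, 2, 4] - terminates early despite the huge source
```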

      • solomonb a day ago ago

        > If that's your experience, transducers are a pretty novel place in the trade-off space

        That is not my experience and TBH I don't know what a lot of your terminology specifically means.

        • Veedrac a day ago ago

          I wasn't saying you would have that experience, I was saying that the reason people act like transducers are unique is that transducers are an unconventional place on well worn ground.

          Ultimately, yes, everything bottoms out, most special tricks seem less special the more you understand about them, because it's programming and Turing Equivalence is the bedrock the whole field rests on. But the average person learning about transducers is not going to spot how closely related it is to other things that already exist.

          I'm happy to elaborate on any part of the terminology if you're curious, but tbh I mostly wrote it for myself because I thought the framing was novel and wanted it noted down somewhere.

    • waffletower a day ago ago

      That is a bit reductive. You can consider these implementations in other languages: https://github.com/hypirion/haskell-transducers -- https://github.com/ruuda/transducers

      • solomonb a day ago ago

        It seems like a messy abstraction whose results could be achieved through a variety of other tools. :/

        • mhitza 12 hours ago ago

          Free Monads are a very nice (though not performant) way of creating an embedded domain specific language interpreter.

          Once I was building a declarative components library in PHP, using the ideas I've learned from free monads. I'm sure you can't imagine what an atrocity I've built. It did the job, but I had to mentally check out and throw in a couple of gotos in my main evaluation loop.

          All that to say that elegance of expressivity is tied to the syntax and semantics of languages.

          • solomonb 10 hours ago ago

            Free Monads are also built on a tower of mathematical structures that come with laws and invariants. I have yet to see such formalization for transducers.

        • waffletower a day ago ago

          It isn't messy in Clojure.

  • mannycalavera42 2 days ago ago

    transducers and async flow are :chefkiss

  • waffletower a day ago ago

    I am a fan of Christophe Grand's xforms library -- https://github.com/cgrand/xforms -- I find the transducer nexus function, by-key, to be particularly useful for eliminating clojure.core destructuring dances when one needs group-by with post-processing.

    • waffletower a day ago ago

      A not too contrived example:

        (require '[net.cgrand.xforms :as x])

        (into {}
              (x/by-key :name :size (comp (x/reduce +) (map str)))
              example-mapseq)

  • faraway9911 2 days ago ago

    [dead]

  • instig007 a day ago ago

    You get this for free in Haskell, and you also save on not having to remember useless terminology for something that has no application of its own outside Foldables anyway.

    • Maxatar a day ago ago

      >...you also save on not having to remember useless terminology...

      It may be true in this particular case, but in my admittedly brief experience using Haskell you absolutely end up having to remember a hell of a lot of useless terminology for incredibly trivial things.

      • tombert a day ago ago

        Terminology doesn't bother me nearly as much as people defining custom operators.

        I used to think it was cute that you could make custom operators in Haskell but as I've worked more with the language, I wish the community would just accept that "words" are actually a pretty useful tool.

    • iLemming a day ago ago

      > You get this for free in Haskell,

      Oh, my favorite part of the orange site, that's why we come here, that's the 'meat of HN' - language tribalism with a technical veneer. Congratulations, not only did you say something as lame as: "French doesn't need the subjunctive mood because German has word order rules that already express uncertainty", but you're also factually incorrect.

      Haskell's laziness gives you fusion-like memory behavior on lists for free. But transducers solve a broader problem - portable, composable, context-independent transformations over arbitrary reducing processes - and that you don't get for free in Haskell either.

      Transducers exist because Clojure is strict, has a rich collection library, and needed a composable abstraction over reducing processes that works uniformly across collections, channels, streams, and anything else that can be expressed as a step function. They're a solution to a specific problem in a specific context.

      Haskell's laziness exists because the language chose non-strict semantics as a foundational design decision, with entirely different consequences - both positive (fusion, elegant expression of infinite structures) and negative (space leaks, reasoning difficulty about resource usage).

      • instig007 a day ago ago

        > Haskell's laziness gives you fusion-like memory behavior on lists for free.

        Haskell laziness & fusion isn't limited to lists; you can fuse any lawful composition of functions applied over data with the required lawful instances used for the said composition. There's no difference from what transducers are designed for.

        > But transducers solve a broader problem - portable, composable, context-independent transformations over arbitrary reducing processes - and that you don't get for free in Haskell either.

        Transducers don't solve a broader problem, it's the same problem of reducing complexities of your algorithms by eliminating transient data representations. If you think otherwise, I invite you to provide a practical example of the broader scope, especially the part about "context-independent transformations" that would be different to what Haskell provides you without that separate notion.

        > and negative (space leaks, reasoning difficulty about resource usage).

        which is mostly FUD spread by an internet crowd who don't know the basics of call-by-need semantics, such as where you shouldn't bind your intermediate evaluations, and which language constructs implicitly force evaluation for you.

        • iLemming a day ago ago

          > you can fuse any lawful composition of functions

          Each of those requires manually written rewrite rules or specific library support. It's not a universal property that falls out of laziness - it's careful engineering per data type. Transducers work over any reducing function by construction, not by optimization rules that may or may not fire.

          > it's the same problem

          It is not. Take a transducer like `(comp (filter odd?) (map inc) (take 5))`. You can apply this to a vector, a lazy seq, a core.async channel, or a custom step function you wrote five minutes ago. The transformation is defined once, independent of source and destination. In Haskell, fusing over a list is one thing. Applying that same composed transformation to a conduit, a streaming pipeline, an io-streams source, and a pure fold requires different code or different typeclass machinery for each. You can absolutely build this abstraction in Haskell (the foldl library gets close), but it's not free - it's a library with design choices, just like transducers are.

          Your third claim is basically the "skill issue" defense. Both Haskell Simons - Marlow and Peyton Jones - as well as Edward Kmett have all written and spoken about the difficulty of reasoning about space behavior in lazy Haskell. If the people who build the compiler and its core libraries acknowledge it as a real trade-off, dismissing it as FUD from people who "don't know the basics" is not an argument. It's gatekeeping.

          Come on, how can you fail to see the difference between: "Haskell can express similar things" with "Haskell gives you this for free"?

          • instig007 a day ago ago

            Why do you eliminate a library-based solution from the equation if it can actually prove the point that there's no difference in intent as long as my runtime is already lazy by default?

            > It is not. Take a transducer like `(comp (filter odd?) (map inc) (take 5))`. You can apply this to a vector, a lazy seq, a core.async channel, or a custom step function you wrote five minutes ago. In Haskell, fusing over a list is one thing. Applying that same composed transformation to a conduit, a streaming pipeline, an io-streams source, and a pure fold requires different code or different typeclass machinery for each.

            You can do that only because Clojure doesn't care whether the underlying iterable is to be processed by a side-effectful evaluation. That doesn't negate the fact that the underlying evaluation has a useless notion of "transducer". I said "fuse" in my previous comment to demonstrate that further compile-time optimisations are possible that eliminate some transient steps altogether. If you don't need that you can just rely on generic lazy composition of functions that you define once over type classes' constraints.

            `IsList` + `OverloadedLists` already exist. Had Haskell had a single type class for all iterable implicitly side-effectful data, you would have got the same singly-written algorithm without a single notion of a transducer. Let that sink in: it's not the transducer that's useful, it's the differentiation between pure and side-effectful evaluations that allows your compiler to perform even better optimisations with out-of-order evaluations of pure stuff, as well as eliminating parts of inner steps within the composed step function, as opposed to focusing just on the reducing step-function during the composition. It's not a useful abstraction to have if you care about better precision and advanced optimisations coming from the ability to distinguish pure stuff from non-pure stuff.

            Haskell aside, if your goal is to just compose reusable algorithms, a call-by-need runtime + currying + pointfree notation get you covered, you don't need a notion of transducers that exist on their own (outside of the notion of foldable interfaces) to be able to claim exactly the same benefits.

            > Two Haskell Simons - Marlow, and Jones, and also Edward Kmett have all written and spoken about the difficulty of reasoning about space behavior in lazy Haskell.

            There's a difference between what the people said in the past, and the things the crowd claims the people meant about laziness and space leaks. We can go over individual statements and see if they hold the same "negative" meaning that you say is there.

            • iLemming 14 hours ago ago

              On IsList + OverloadedLists - this is a fantasy counterargument. A unified typeclass for side-effectful iterables doesn't exist in Haskell, so when you say "had it existed, you'd get the same thing", you're describing a different language.

              Transducers don't exist despite the lack of a purity distinction; they exist because the reducing step function abstraction is useful regardless. You're moving the goalposts from "you get this for free" to "a different design with different trade-offs would make this unnecessary" - which is again just describing a different language.

              Look, there are nice things in Haskell for sure, there are things that may cause frustration as well. Same for Clojure, but comparing them on a single thing is like judging a bicycle and a boat by which one flies better. They're built on fundamentally different assumptions and those assumptions cascade into every design decision. Transducers aren't a workaround for the absence of laziness, they're a natural solution within Clojure's actual constraints and goals - you're complaining without even understanding those constraints (in either language). Haskell's laziness isn't a superior version of transducers, it's a different bet on different trade-offs. Neither language is trying to be the other.

              Stick to Haskell if you must, bring to the table some interesting ideas from it, they'd be appreciated, but please stop spreading confusion and misinformation, thinking that if you talk louder people will prefer Haskell. It's not as if folks are abandoning Python, TypeScript, and Java en masse and agonizing over whether to choose Clojure or Haskell.

              • instig007 12 hours ago ago

                > so you're saying "had it existed, you'd get the same thing", you're describing a different language.

                that's not what I'm saying. I'm saying that Haskell doesn't have it because it's a useless and shallow abstraction to have, that also hampers the ability to apply advanced optimisation laws down the compilation pipeline.

                I will just repost the part that you conveniently ignored in your reply and pretended that it didn't exist:

                Let that sink in: it's not the transducer that's useful, it's the differentiation between pure and side-effectful evaluations that allows your compiler to perform even better optimisations with out-of-order evaluations of pure stuff, as well as eliminating parts of inner steps within the composed step function, as opposed to focusing just on the reducing step-function during the composition. It's not a useful abstraction to have if you care about better precision and advanced optimisations coming from the ability to distinguish pure stuff from non-pure stuff.

                My argument holds: you get the same composability with lazy functions for free, you don't need to apply rewrite rules to be on the same level of reusability. Haskell grants you that for free, but for some reason you chime in and claim that's not the case and the only proof you've provided had to do with missing interfaces that can be solved by a library implementation. There's no restriction in the type system, nor runtime, to have it. But people don't need it because it's a useless abstraction that doesn't improve the baseline of what Haskell has to offer both in terms of composability of your foldings and further optimisations that take iteration purity into account.

                > You've drifted from

                I didn't drift from anything, I told you that you ignored a library-based solution in a sneaky attempt to move the goalpost from "you need rewrite rules in many places" to "there's no interface generic enough to accommodate effectful and non-effectful steps together without a library implementation".

                > Haskell's laziness isn't a superior version of transducers

                It absolutely is a superior solution to the same problem of algorithm optimisation and composability. It's more generic, it applies to anamorphisms and hylomorphisms in the same way as it does to foldings, and it doesn't introduce a special terminology to a single building block that doesn't exist outside foldings anyways.

                > but please stop spreading confusion and misinformation

                that's a bold statement coming from someone who claims that call-by-need semantics in Haskell is a negative aspect of the language according to other people (who probably didn't mean it in the first place, but you wouldn't dare to verify).

                • iLemming 11 hours ago ago

                  > It's more generic

                  Generic over what? Lazy evaluation is a semantic property of expression reduction. Transducers are parameterized over the reducing function. These aren't comparable on a generality axis - they live at different levels of abstraction. The fact that recursion schemes (ana/hylo) exist in Haskell is true and cool but doesn't address the actual transducer claim, which is: one value, applied to fundamentally different consumers (a channel, a fold, a stream, a transient collection) without recompilation or re-specialization. In Haskell, the closest analogs are conduit/pipes/streaming - each a library, each with its own type, each requiring adapters between them.

                  The concrete example - `(comp (filter odd?) (map inc) (take 5))` applied across source types - is the single most load-bearing thing in the thread and you never actually answered it. You gestured at OverloadedLists + a hypothetical unified typeclass, then pivoted to "it's useless anyway" - the tell that you don't understand the topic well enough to even attempt a direct answer.

                  Can we please stop responding to concrete technical points by retreating to broader aesthetic claims - "useless", "shallow", "superior"? This honestly isn't helping anyone. I don't see the point of keeping going here, and not because I'm from the "internet crowd who don't know the basics".

                  You're claiming to know how (a better) language should have been designed. Okay, let's talk about possibilities instead of "just use Haskell" - that really is childish.

                  • instig007 8 hours ago ago

                    > Generic over what?

                    Generic over whatever you decide to compose out of smaller parts into a full algorithm that doesn't produce transient buffered results. Transducers are a dead end of abstractions: they aren't applicable anywhere but folding, and a lazy runtime has you covered for free regardless of your choice of the exact source - a foldable `Stream f e a`, a generator of values on demand, or even a data constructor.

                    > doesn't address the actual transducer claim, which is: one value, applied to fundamentally different consumers (a channel, a fold, a stream, a transient collection) without recompilation or re-specialization.

                    the transducer claim is that there's no way to track effects, period. Hey, you've found a new abstraction that doesn't care about things, my congratulations, you're now on par with Python itertools!

                    > In Haskell, the closest analogs are conduit/pipes/streaming - each a library, each with its own type, each requiring adapters between them.

                    Do you understand why it's the case? It's because transducers are useless and people actually care about further optimisations and experimentation. To be on par with Clojure it would be enough to have a single `Stream m e a` that everyone would silently buy into. But no one opts for it, because people actually care about their local optimisations that go beyond what you think transducers give you. If you don't care about those, pick any generic enough interface and glue it with whatever you want in a single place for the entirety of your ecosystem. Had `Streamly` been part of `base`, you'd get exactly that property that you claim isn't a thing. Then maybe add `streamly` into your dependency list and start using it pervasively everywhere where iteration happens. You'll be on par with Clojure, but without the silly notion of transducers as a thing of its own (but it's not, it's only for foldings that don't care about side-effects).

                    • iLemming 7 hours ago ago

                      > Transducers aren't applicable anywhere but folding

                      Wrong, factually wrong! Transducers apply to anything expressible as a step function: reductions, yes, but also channels (core.async), observable streams (manifold), eduction pipelines, into-transformations, transient-collection builds, stateful transformations like partition-by and dedupe that don't fit a pure fold at all. (dedupe) is a transducer. Try expressing it as a pure lazy-list fusion. You can, but you need explicit state threading, and then you've rebuilt a step function by hand.

                      The definition of "folding" you're using here is so broad it's doing no work. If "folding" means "any left-to-right consumption of values", then yes, transducers are for folding - and so is ~all of streaming, ~all of iteration, ~all of channel consumption. You're using the word to make the scope sound small while the scope is actually most of what programs do with sequences of values.
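                      To make the dedupe point concrete, a hand-rolled Python sketch (hypothetical names, not Clojure's actual implementation): the per-use state lives in a closure created when the transducer wraps a concrete step function, so the same transducer value can be reused with fresh state each time.

```python
# Sketch of a stateful transducer, dedupe-style: the state lives in a closure
# created per use, when the transducer wraps a concrete step function.
# Hypothetical names, stdlib only.
from functools import reduce

def deduping():
    def xform(step):
        prev = [object()]  # fresh sentinel per wrapped step: per-use state
        def new_step(acc, x):
            if x == prev[0]:
                return acc          # drop consecutive duplicate
            prev[0] = x
            return step(acc, x)
        return new_step
    return xform

def append(acc, x):
    acc.append(x)
    return acc

xf = deduping()
out1 = reduce(xf(append), [1, 1, 2, 2, 2, 3, 1], [])
out2 = reduce(xf(append), "aabbcca", [])  # same xform, fresh state per use
# out1 is [1, 2, 3, 1]; out2 is ['a', 'b', 'c', 'a']
```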

                      > the transducer claim is that there's no way to track effects, period.

                      The transducer claim - the actual one, as stated in Hickey's talk and the docs - is that a reducing function is a fundamental substrate that composes, and you can build transformations over reducing functions that are source and sink-agnostic. Effect tracking is orthogonal.

                      You keep trying to move the goalposts to "transducers must track effects or they're useless". That's like saying "typeclasses must handle concurrency or they're useless". It's a category demand imported from your preferred language's feature set.

                      The "on par with Python itertools" jab is wrong. itertools composes over iterables only. Transducers compose over reducing functions. Python itertools does not work against asyncio.Queue or a user-defined reduce.

                      > To be on par with Clojure it would be enough to have a single `Stream m e a` that everyone would silently buy into.

                      Okay, let me read that again slowly:

                      1. The property you're describing (one transformation, many consumers) is real and distinct.

                      2. Haskell does not currently give it to you (on the language level).

                      3. To give it to you, Haskell would need a single blessed streaming abstraction in base.

                      4. Haskell doesn't have one because the community prefers local optimization over a universal substrate.

                      The rationalization is fine - yes, there's a real trade-off between "single blessed abstraction for everyone" and "multiple specialized libs, each optimized for its niche" - but it is a trade-off. You started with "you get this for free in Haskell" and arrived at: "Haskell correctly chose not to give you this, and here's why the thing you want is actually bad..."

                      Streamly is a great Haskell library, does streaming well, has effect tracking, is performant. And it is absolutely not a drop-in transducer analog - Streamly composes over Streamly streams. If you have a conduit source, a pipes producer, and a streaming Stream, Streamly doesn't make one composed transformation apply to all three. It just adds a fourth ecosystem. So your "had Streamly been in base" hypothetical is exactly the Clojure move - pick one substrate, bless it, get uniformity - and now you're simultaneously using Streamly to argue that Haskell doesn't need transducers while pointing at a hypothetical world where Haskell would have done what Clojure actually did. "pick any generic enough interface and glue it with whatever you want in a single place for the entirety of your ecosystem" - this is basically what Clojure did.

                      Can we just find a middle ground in this debate that maybe actually works, something like:

                      "Sure, Clojure blessed a universal reducing substrate at the language level. Haskell didn't, and instead has multiple streaming libraries, each with stronger local guarantees about effects, memory, and back-pressure. Clojure trades for uniformity across effect contexts - one transducer works everywhere - at the cost of the compiler not telling you whether a given pipeline touches the world; Haskell chose effect visibility in types."

                      Neither side is free. Clojure pays in runtime-only knowledge of effects. Haskell pays in fragmentation of streaming abstractions and the attendant ceremony of moving between them. That's the trade. It's not flaws being papered over; it's the shape of the bet. You can argue the bet is wrong, but you can't argue it wasn't made on purpose.

    • eduction a day ago ago

      It goes beyond a foldable; it can be applied to streams. Clojure had foldables, called reducers; this was generalized further when core.async came along - transducers can be attached to core.async channels and also used in places where reducers were used. The terminology is used to document the thing that various contexts accept (chan, into, sequence, eduction, etc). They exist to make the language simpler and more general. They could actually allow a bunch of old constructs to be dispensed with, but came along too late to build the whole language around.

      • instig007 a day ago ago

        > It goes beyond a foldable, can be applied to streams.

        > Clojure had foldables, called reducers, this was generalized further when core.async came along - transducers can be attached to core async channels and also used in places where reducers were used.

        Ok, you mean there's a distinction between foldables and effectful and/or infinite streams, so there's a natural divide between them in terms of interfaces such as (for instance) `Foldable f` and `Stream f e` where `e` is the effect context. It's a fair distinction; however, I guess my overall point is that they all have applicability within the same kind of folding algorithms that don't need a separate notion of "a composing object called a transducer" if you hop your Clojure practice onto the Haskell runtime, where transformations are lazy by default.

  • css_apologist a day ago ago

    Is there a gain of Clojure transducers over JS-style iterators? - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...

    https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...

    Both solve the copying problem without relying on concrete types.

    • joe-user 19 hours ago ago

      I think the misunderstanding is about iterators "not relying on concrete types". Rather, iterators are the concrete type. Consider the example transformation from the transducers page:

        (def xf
          (comp
            (filter odd?)
            (map inc)
            (take 5)))
      
      You'll see there's no notion of a concrete type that the transformation operates on. It can work with vectors, seqs, core.async channels, etc. Now consider how that could be written in JavaScript such that it works on arrays, sets, generators, iterators, etc. without having to first convert to another type (such as an iterator). That is what's meant about transducers not being tied to a concrete type.

    • bjoli a day ago ago

      They compose. And can be passed around and be completely oblivious to how they will be reduced. With conj or sum or whatever they want. And you can extend them at any point at any end.

      They are like map, filter and friends, but they compose. I think of iterators as an iterator protocol and transducers as a streaming protocol. An iterator just describes how to iterate over a collection. Transducers are transformations that can be plugged into any point where data goes in one direction.

      • css_apologist a day ago ago

        js iterators work over lazy streams

        • bjoli 15 hours ago ago

          As I said, it is a protocol for iteration or data access. You can't take an iterator and hand it as a filter to a file reader. If I make a rot13 transducer I can hand it to a transduce function that transforms a collection. I can give it to a file reader as a transformer over each char.

          Transducers are a way to express transformations.
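          A loose Python rendering of that rot13 example (hypothetical names; Python's stdlib `codecs` does ship a rot13 codec): the same step-function wrapper serves a whole-collection reduce and a push-style reader.

```python
# Sketch of the rot13-transducer point: one transformation over the step
# function, reused by a collection reduce and by a push-style "reader" that
# feeds characters as they arrive. Hypothetical names; rot13 via stdlib codecs.
import codecs
from functools import reduce

def rot13ing(step):
    # transducer: rot13 each char before handing it to the downstream step
    return lambda acc, ch: step(acc, codecs.encode(ch, "rot13"))

def string_append(acc, ch):
    return acc + ch

# Consumer 1: transform a whole collection of chars.
collected = reduce(rot13ing(string_append), "hello", "")

# Consumer 2: a reader pushing chars one at a time (e.g. as read from a file).
def push_reader(chunks, step, acc):
    for chunk in chunks:
        for ch in chunk:
            acc = step(acc, ch)
    return acc

streamed = push_reader(["he", "llo"], rot13ing(string_append), "")
# collected and streamed are both "uryyb"
```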