What is the advantage of using Numexpr instead of, say, Polars, Numba or Taichi?
Numexpr seems to sit in an odd niche: doing relatively simple arithmetic on in-core matrix data fast. For anything more complex, Polars seems more powerful and yet easier to understand, while Numba and Taichi are both much more flexible in that they can be used to implement much more complex arithmetic (at the cost of writing lower-level Python code).
Numexpr basically evaluates raw strings, which makes any sort of heavy usage effectively immune to linting, code inspection and refactoring.
Pandas has the eval() method on the DataFrame that uses numexpr as a backend, but we generally never use it because of the maintenance issues mentioned above and the availability of better alternatives.
numexpr is for different use cases than Polars or Taichi, which themselves are quite different from each other. numexpr is more akin to numba - it speeds up numpy usage.
numexpr speeds up different kinds of usage than numba does. numba is best at speeding up non-vectorizable usage of numpy, like repeated operations on arrays inside of for-loops. numexpr speeds up regular numpy expressions, like `5 * x + 7` where x is an array, by avoiding intermediate allocations. It evaluates the entire expression for each element, rather than writing the result of each individual operation into an intermediate array. It uses strings for expressing the calculation so that Python will not break down the expression and hand it off to numexpr one operator at a time, like it does with numpy.
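To make the `5 * x + 7` case concrete, here is a minimal sketch of the two styles side by side. The array size and variable names are just illustrative assumptions, and the fallback branch exists only so the snippet still runs where numexpr isn't installed:

```python
import numpy as np

try:
    import numexpr as ne
except ImportError:  # numexpr is an optional dependency
    ne = None

# Illustrative input; any numeric array works.
x = np.arange(100_000, dtype=np.float64)

# Plain NumPy: Python evaluates one operator at a time, so `5 * x` is
# materialized as a temporary array before `+ 7` produces the result.
y_numpy = 5 * x + 7

if ne is not None:
    # numexpr receives the whole expression as a single string and
    # evaluates it in one pass, without a full-size temporary for `5 * x`.
    y_fused = ne.evaluate("5 * x + 7", local_dict={"x": x})
else:
    # Fallback so the sketch still runs without numexpr (no fusion here).
    y_fused = y_numpy.copy()

assert np.allclose(y_numpy, y_fused)
```

Passing `local_dict` explicitly is optional; without it, numexpr looks the names up in the calling frame. Whether the fused version is actually faster depends on array size and the machine, so it's worth timing on your own data.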
I understand what numexpr does; I don't understand why I'd use it.
Polars is able to lazily evaluate query plans without any unnecessary intermediate allocations. If I want to do algebra on dataframes, I'd use Polars.
The narrow use case seems to be that you have matrices large enough that memory efficiency is a concern, but not so large that they don't fit into memory at all.
My point was that this seems like a very narrow niche to me, where I'd still rather use numba or taichi purely because I don't have to evaluate raw strings and can still rely on linters.
I don't know how you'd define niche, but there are many applications where multidimensional arrays of a uniform data type are needed and data frames are not. Image processing would be a simple example where you might have 2D arrays of floating point brightness values that you want to do operations on. Medical imaging is often 3 or 4 dimensions (spatial + time). Analysis of a spatially arranged set of sensor readings over time can easily be more than that. The speedup from deferred evaluation is nice in applications like this, especially given that very little change to existing code is needed.
You're right that it has trade-offs, like challenges with linting. But many practitioners in these domains are experts in the area of science or engineering involved, not in software development. The ease of adapting an existing script is a big deal for many of them. Many don't even know what a linter is, and numexpr predates (by several years) the high-quality linters like ruff that so many of us rely on today.