The quadratic sandwich

(fedemagnani.github.io)

144 points | by cpp_frog 6 days ago

16 comments

  • laGrenouille 3 days ago

    Great visualizations. Really enjoyed having a well-written example where mathematical proofs directly help with understanding a practical application.

    I wonder what would happen with this analysis if a momentum term was added to the gradient descent. It seems that it would fix the specific failure modes in the examples, but I wonder if there's a corresponding mathematical way of categorizing what kinds of functions can(not) be quickly optimized with GD + momentum.
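To make the momentum question concrete, here is a minimal sketch (not taken from the article) that compares plain gradient descent with Polyak's heavy-ball momentum on a toy ill-conditioned quadratic. The function, the names `MU`, `L` and `run`, the starting point, and the tolerance are all assumptions chosen for illustration, and the momentum tuning used here is only known to be optimal for quadratics.

```python
import math

MU, L = 1.0, 100.0  # assumed smallest / largest curvature of the toy quadratic


def grad(x, y):
    # Gradient of f(x, y) = 0.5 * (MU * x**2 + L * y**2)
    return MU * x, L * y


def run(step, beta, tol=1e-6, max_iter=100_000):
    """Heavy-ball iteration; beta = 0 reduces it to plain gradient descent.

    Returns the number of iterations until the gradient norm drops below tol.
    """
    x, y = 1.0, 1.0      # assumed starting point
    vx, vy = 0.0, 0.0    # velocity, i.e. the previous displacement
    for k in range(max_iter):
        gx, gy = grad(x, y)
        if math.hypot(gx, gy) < tol:
            return k
        vx = beta * vx - step * gx
        vy = beta * vy - step * gy
        x, y = x + vx, y + vy
    return max_iter


# Plain gradient descent with the classic 1/L step size.
gd_steps = run(step=1.0 / L, beta=0.0)

# Polyak's heavy-ball tuning for a quadratic with curvatures in [MU, L].
sqrt_l, sqrt_mu = math.sqrt(L), math.sqrt(MU)
hb_step = 4.0 / (sqrt_l + sqrt_mu) ** 2
hb_beta = ((sqrt_l - sqrt_mu) / (sqrt_l + sqrt_mu)) ** 2
momentum_steps = run(step=hb_step, beta=hb_beta)

print("gradient descent iterations:", gd_steps)
print("heavy-ball iterations:      ", momentum_steps)
```

On this toy problem the momentum run should need noticeably fewer iterations, because its rate depends on the square root of L/MU rather than on L/MU itself. That says nothing about non-quadratic functions, which is the harder part of the question.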

  • drunello 2 days ago

    That's my article! Thank you so much to the user who posted it here <3

    • f5129cac 2 days ago

      Thank you for writing this article! It really helped me clear up my understanding of why you care about the min and max eigenvalues of a Hessian matrix, something I've been confused about for some time. I have https://fedemagnani.github.io/math/2025/07/04/fenchel.html queued up to read next (convex conjugates being another topic that confuses the hell out of me).
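For readers who want the eigenvalue point written out, the standard textbook statement is sketched below. The symbols μ and L (lower and upper bounds on the Hessian eigenvalues) are assumed notation and may not match the article's.

```latex
% Assumption: f is twice differentiable with  \mu I \preceq \nabla^2 f(x) \preceq L I  for all x, where 0 < \mu \le L.
% Then f is sandwiched between two quadratics around any point x:
f(x) + \nabla f(x)^\top (y - x) + \frac{\mu}{2}\,\lVert y - x \rVert^2
  \;\le\; f(y) \;\le\;
f(x) + \nabla f(x)^\top (y - x) + \frac{L}{2}\,\lVert y - x \rVert^2 .

% Gradient descent with step size 1/L then contracts the optimality gap by a fixed factor per step:
f(x_{k+1}) - f^\star \;\le\; \Bigl(1 - \frac{\mu}{L}\Bigr)\bigl(f(x_k) - f^\star\bigr).
```

The ratio L/μ is the condition number: the further apart the extreme eigenvalues are, the closer the contraction factor gets to 1 and the slower plain gradient descent becomes.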

      • drunello 2 days ago

        Haha, that's great, and I'm excited to hear your feedback, thank you so much! I deliberately keep a casual tone in these articles, just enough to grasp the concept, so more rigorous material is probably important as a follow-up.

  • xuzhenpeng 3 days ago

    The animation is very good, making the article easy to understand

  • explainforwhat 2 days ago

    It frustrates me when math explainers, and textbooks, seem to start from the "here's why our methods are insufficient to solve our problem" and fail to provide an example of the problem they are trying to solve.

    What's the question this method is attempting to answer? What does an answer look like? How does this method lead to it?

    > If you have ever tried to minimize a function with gradient descent

    "and if otherwise, go kick sand," I guess.

  • Scene_Cast2 3 days ago

    There is one very clear example that I ran across due to the reasons outlined in the article. If you have a wavelet and you're trying to slide it around to make it fit, that will fail spectacularly. There are lots of problems that boil down to basically the above.

    The neural net answer is being able to spawn a wavelet at any position, as opposed to tweaking the position of an existing one.

  • 20k 2 days ago

    This is a great article and it's super helpful, thanks to whoever wrote it!

  • CarVac 3 days ago

    Simplex methods can handle those tough situations, though.

    • FabHK 2 days ago

      Simplex is not applicable. Simplex only minimises a linear function (f(x)=c'x) under linear inequality constraints (Ax≤b). The minimisation problem here is unconstrained, but (very) non-linear.

      • CarVac a day ago

        I guess I wasn't being precise, I meant Nelder-Mead.
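For context, Nelder-Mead is a derivative-free method that moves a simplex of n+1 points around an unconstrained objective, which is why it is unrelated to the linear-programming simplex algorithm. Below is a simplified sketch (inside contraction only, conventional coefficients 1, 2, 0.5, 0.5), not CarVac's code or any library's implementation. The Rosenbrock test function, the starting point, and all names are assumptions for illustration.

```python
def rosenbrock(p):
    # Classic curved-valley test function (assumed here), minimum at (1, 1).
    x, y = p
    return 100.0 * (y - x * x) ** 2 + (1.0 - x) ** 2


def nelder_mead(f, x0, step=0.5, tol=1e-12, max_iter=5000):
    """Simplified Nelder-Mead: reflect, expand, contract inside, or shrink."""
    n = len(x0)
    # Initial simplex: x0 plus one point offset along each coordinate axis.
    simplex = [list(x0)]
    for i in range(n):
        p = list(x0)
        p[i] += step
        simplex.append(p)

    for _ in range(max_iter):
        simplex.sort(key=f)
        best, worst = simplex[0], simplex[-1]
        if abs(f(worst) - f(best)) < tol:
            break
        # Centroid of every vertex except the worst one.
        centroid = [sum(p[i] for p in simplex[:-1]) / n for i in range(n)]

        def along(t):
            # Point on the line through the worst vertex and the centroid.
            return [c + t * (c - w) for c, w in zip(centroid, worst)]

        reflected = along(1.0)
        if f(reflected) < f(best):
            expanded = along(2.0)
            simplex[-1] = expanded if f(expanded) < f(reflected) else reflected
        elif f(reflected) < f(simplex[-2]):
            simplex[-1] = reflected
        else:
            contracted = along(-0.5)
            if f(contracted) < f(worst):
                simplex[-1] = contracted
            else:
                # Shrink every vertex halfway towards the best one.
                simplex = [best] + [
                    [b + 0.5 * (v - b) for b, v in zip(best, p)]
                    for p in simplex[1:]
                ]
    return min(simplex, key=f)


x0 = [-1.2, 1.0]  # assumed starting point
x_min = nelder_mead(rosenbrock, x0)
print("approximate minimiser:", x_min, "objective:", rosenbrock(x_min))
```

The method only compares function values, so it needs no gradient or Hessian. The trade-off is that it comes with weak convergence guarantees and tends to scale poorly as the dimension grows.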

  • vzaliva 2 days ago

    Kudos for the beautiful formula rendering.
