Show HN: Fast and Exact Algorithm for Image Merging

(github.com)

122 points | by C-Naoki 4 days ago

31 comments

  • scottdupoy 4 days ago

    Interesting to see something like this!

    My computer science master's thesis was based on the same goal. I used a 2D convolution, which meant I could merge images with inexact overlaps. I had to run a high-pass filter first to limit the image details to their edges, or else the convolution incorrectly matched bright areas.
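
    A rough sketch of that pipeline (illustrative numpy only, with a Laplacian standing in for the high-pass filter and a brute-force search standing in for the full convolution; the actual thesis code isn't shown here):

```python
import numpy as np

def high_pass(img):
    # Laplacian high-pass: keeps edges and suppresses flat bright regions
    # that would otherwise dominate the match.
    out = np.zeros_like(img, dtype=float)
    out[1:-1, 1:-1] = (4 * img[1:-1, 1:-1]
                       - img[:-2, 1:-1] - img[2:, 1:-1]
                       - img[1:-1, :-2] - img[1:-1, 2:])
    return out

def best_offset(a, b, max_shift=5):
    # Brute-force 2D cross-correlation over candidate shifts (dy, dx);
    # returns the shift whose overlapping region scores highest.
    fa, fb = high_pass(a), high_pass(b)
    h, w = fa.shape
    best, best_score = (0, 0), -np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            y0, y1 = max(0, dy), min(h, h + dy)
            x0, x1 = max(0, dx), min(w, w + dx)
            score = float((fa[y0:y1, x0:x1]
                           * fb[y0 - dy:y1 - dy, x0 - dx:x1 - dx]).sum())
            if score > best_score:
                best, best_score = (dy, dx), score
    return best
```

    Because the high-passed images are roughly zero-mean, the correlation peaks where edge structure lines up rather than wherever both images happen to be bright.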

    In reality, merging pictures is further complicated because the source images may be slightly rotated relative to each other and slightly warped by lens distortion.

    My supervisor wanted me to do a PhD on the topic!

    • gsliepen 4 days ago

      I used this for several applications. Note that 2D convolution can be done efficiently using FFTs, and filtering can be combined with this very efficiently: if you see your high-pass filter as a convolution of its own, you can pre-calculate its FFT, and just multiply it almost for free in the frequency domain with the two images you want to convolve.
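
      A minimal numpy sketch of that idea (illustrative, not FFTW): the high-pass filter's FFT is computed once up front, so filtering both images costs only one extra multiply per frequency bin:

```python
import numpy as np

def fft_correlate(a, b, hp_kernel):
    # Pad generously so the FFT's circular convolution behaves like a
    # linear one and wrap-around can't alias the correlation peak.
    H = a.shape[0] + b.shape[0] + 2 * hp_kernel.shape[0]
    W = a.shape[1] + b.shape[1] + 2 * hp_kernel.shape[1]
    K = np.fft.rfft2(hp_kernel, (H, W))   # filter FFT: computed once
    Fa = np.fft.rfft2(a, (H, W)) * K      # high-pass is a cheap multiply here
    Fb = np.fft.rfft2(b, (H, W)) * K      # same precomputed spectrum reused
    # Cross-correlation = product with one spectrum conjugated.
    return np.fft.irfft2(Fa * np.conj(Fb), (H, W))

def peak_shift(a, b, hp_kernel):
    cc = fft_correlate(a, b, hp_kernel)
    dy, dx = np.unravel_index(np.argmax(cc), cc.shape)
    # Map wrapped FFT indices back to signed shifts.
    if dy > cc.shape[0] // 2:
        dy -= cc.shape[0]
    if dx > cc.shape[1] // 2:
        dx -= cc.shape[1]
    return int(dy), int(dx)
```

      The position of the correlation peak directly gives the offset between the two images.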

      • scottdupoy 4 days ago

        That's exactly how it worked: a hand-rolled FFT and filtering, following the method in "Numerical Recipes in C"

        • gsliepen 4 days ago

          Oh, Numerical Recipes is nice but their algorithm implementations are not really state-of-the-art. I highly recommend using FFTW (https://fftw.org/) as it will likely give you a substantial performance improvement.

    • C-Naoki 4 days ago

      Thank you for your comments! For sure, a CNN is expressive for learning the characteristics of images. However, in this project I tried not to use deep learning, because I believe it is important to provide fast, consistent results without the need for training data. If you are particularly interested in this app, I would be glad if you could create a pull request to extend the algorithm.

      • jdhwosnhw 4 days ago

        The parent comment said nothing about using deep learning; convolution is not the same as using a CNN. I interpreted their comment as meaning they used a 2D convolution (presumably a 2D cross-correlation, actually) to find regions of overlap.

        • scottdupoy 4 days ago

          Yes, you're right, it was a 2D cross-correlation, which is closely analogous to a convolution.

          • r_hanz 4 days ago

            If memory serves… the only difference is that the kernel is flipped (reversed along both axes) for convolution.
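
            In numpy terms (a toy direct implementation, just to show the flip):

```python
import numpy as np

def correlate2d(img, k):
    # Direct 2D cross-correlation, "valid" region only:
    # slide the kernel over the image without flipping it.
    kh, kw = k.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = (img[y:y + kh, x:x + kw] * k).sum()
    return out

def convolve2d(img, k):
    # Convolution is the same sum with the kernel reversed along both
    # axes, so for symmetric kernels the two operations coincide.
    return correlate2d(img, k[::-1, ::-1])
```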

    • sitkack 4 days ago

      The images might not be coplanar, so the overlapping composition should really be 2D planes in 3D space, or go full Gaussian splat.

  • mightyham 4 days ago

    What are the practical applications for this tool? Typically stitching images for something like panoramas requires significantly more advanced image processing algorithms because the pixels do not perfectly overlap.

    • jdiff 4 days ago

      Even in web browsers that support screenshotting an entire page, websites often unload elements that are off-screen. A solution like this can take a bunch of screen-length images and stitch them into a full view of the document.

      • hackernewds 4 days ago

        There are Chrome extensions that do this well already

    • C-Naoki 4 days ago

      Thank you for your comments! Certainly, this application cannot handle every kind of image. However, I tried to stitch images without using deep learning, so the strength of this app is that, given the same images, it always produces consistent results. In the future, I will try to develop a more effective image-merging method for more general scenarios.

      • jasonjmcghee 4 days ago

        Is deep learning state of the art for something like this?

        Would have expected it to just be kernel based.

        Regardless, you can have fully deterministic deep learning approaches. You can use integers, run on a CPU, and seed everything.

  • tobr 4 days ago

    Interesting! The example shows two images that appear to have a pixel-perfect matching region. Is that a requirement or does it work with images that are only somewhat similar?

    • asadm 4 days ago

      seems to be doing some mean-square error to find best matching region.
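
      The repository's code isn't quoted in the thread, but an MSE-based search for a vertical overlap (e.g. stitching screenshots) might look roughly like this sketch; `find_overlap` and `merge_vertical` are hypothetical names, not the project's API:

```python
import numpy as np

def find_overlap(top, bottom, min_overlap=1):
    # For each candidate overlap height k, compare the last k rows of `top`
    # with the first k rows of `bottom` and keep the k with the lowest MSE.
    best_k, best_err = min_overlap, np.inf
    max_k = min(top.shape[0], bottom.shape[0])
    for k in range(min_overlap, max_k + 1):
        err = float(np.mean((top[-k:] - bottom[:k]) ** 2))
        if err < best_err:
            best_k, best_err = k, err
    return best_k

def merge_vertical(top, bottom):
    # Drop the duplicated rows from `bottom` and stack the rest below `top`.
    k = find_overlap(top, bottom)
    return np.vstack([top, bottom[k:]])
```

      An exact pixel match gives an MSE of zero, which would explain why the example images have a pixel-perfect overlapping region.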

  • therobot24 4 days ago

    look at those for loops! should look into FFT-based correlation; you can even handle scale with a Mellin transform and rotation with a circular harmonic transform

  • mathisd 3 days ago

    Nice project of yours! I am a data science student, but I had never looked into computer vision until a few days ago, when I started watching a series of short courses on a YouTube channel called First Principles of Computer Vision [0]. I found it fascinating, and the math behind it is truly beautiful, concise and efficient.

    [0] https://www.youtube.com/@firstprinciplesofcomputerv3258 (strongly recommend checking out any of the playlists; the best courses I've had in a long time)

  • fullspectrumdev 4 days ago

    I’ve been looking for something like this for creating surveys using drone footage - extract every “n” frames from the video, then stitch ‘em up somehow to make a “layer”.

    There’s existing software for this kind of work, but I’ve been in the mood to reinvent the wheel a bit for some strange reason.

  • martinmaly21 4 days ago

    Nice work!

    What's the latest state of the art in image stitching these days? From what I can tell, there was a bunch of research done on it in the past, but with all the recent advancements in AI, not much has changed on this front. I'd love to be wrong though!

  • sorenjan 4 days ago

    Related to this, is there a name for the effect when you stitch together video frames into a static background while keeping the moving objects moving? The best example I can think of is this Bigfoot video[0, 1], where the shaky footage has been combined into a bigger canvas with "Bigfoot" moving through it. It's a combination of video stabilization and image panorama, but with some smarts to only keep one version of the moving object in each finished frame.

    [0] https://www.youtube.com/watch?v=Q60mSMmhTZU [1] https://x.com/rowancheung/status/1641519493447819268

    • iamjackg 4 days ago

      A long time ago I did some work to do exactly this in an automated fashion using ffmpeg. It wasn't perfect, but it was better than nothing. I tried going back through my bash history, and the last related entry was this command line:

          ffmpeg -i C0119.MP4 -vf vidstabtransform=interpol=no:crop=black:optzoom=0:zoom=0:smoothing=0:debug=1:input="weirdzoom.trf",unsharp=5:5:0.8:3:3:0.4 kittens-stabilized.mp4
      
      I think the trick was to set all the stabilization parameters to 0 and crop=black to force ffmpeg to move the image around as much as necessary and zoom everything out.

      EDIT: nevermind, it was more complicated than that. I actually wrote a Python script that modified the motion tracking information generated by ffmpeg to reduce the zoom amount and fit everything within a 1920x1080 frame. Man, I wish I'd added comments to this.

      The https://www.reddit.com/r/ImageStabilization/ subreddit has a lot of posts in that style, but from the research I did it seems like it's mostly done manually by lining up each frame as a separate layer and then rendering an animation that adds one layer per frame.

    • chompychop 3 days ago

      I believe the term in research literature for this is "Dynamic Video Synopsis". Check this CVPR 2006 paper for instance: https://www.cs.huji.ac.il/~peleg/papers/cvpr06-synopsis.pdf

  • tsumnia 4 days ago

    Nicely done, and keep up the practice. I recall during my master's needing to translate facial landmark points from a Cartesian coordinate system into points that would appear on their respective images. It wasn't for anything major; I just wanted a visual representation of my work. It's these little "neat" projects that help build larger breakthroughs.

  • wmanley 4 days ago

    See also: Hugin, a panorama photo stitcher. I used it a lot back in ~2006 for making panoramas. It automatically finds "control points" in your photos, figures out which ones are shared between photos, and uses that information to determine the relative positions of the photos and your lens parameters.

    Once it does that it can stitch the photos together. It does this by projecting the photos onto a sphere, and then taking a picture of that sphere using whatever lens parameters you want.

    https://hugin.sourceforge.io/

    • kouru225 4 days ago

      This app never works for me and I don’t know why. Photoshop’s auto merge always works but this one doesn’t

  • lugao 4 days ago

    Why did a naive pixel-matching library get so many upvotes here?

    • debo_ 4 days ago

      Because it's nice to see people trying things for themselves, even if they aren't novel?

  • kouru225 4 days ago

    Oh shit thank you so much. I’ve always had to use photoshop for this

    • C-Naoki 4 days ago

      I'm glad it was helpful!

  • a257 4 days ago

    In the biomedical sciences, we typically use a tool called BigStitcher [0], which is bundled with ImageJ [1]

    [0] https://www.nature.com/articles/s41592-019-0501-0 [1] https://imagej.net/plugins/bigstitcher/