Detecting AV1-encoded videos with Python

(alexwlchan.net)

18 points | by surprisetalk 4 days ago

17 comments

  • breve an hour ago

    > I’ve saved some AV1-encoded videos that I can’t play on my iPhone.

    Sure you can. Install VLC on your phone and you'll be able to play the AV1 videos. Even the iPhone 7 released in 2016 can play AV1 video.

    Don't agonise over battery life. The dav1d decoder for AV1 is great:

    https://www.reddit.com/r/AV1/comments/1cf7eti/av1_dav1d_play...

    https://www.reddit.com/r/AV1/comments/1cg2wv4/dav1d_battery_...

    https://www.reddit.com/r/AV1/comments/1cgyace/dav1d_battery_...

    https://www.reddit.com/r/AV1/comments/1chpz2r/dav1d_battery_...

    • monster_truck an hour ago

      It's not just great. It's so good that even on Android phones much older than the ones tested in those links, screen brightness has a larger impact on battery life than the decoding does.

      This is by design, so that even extremely dated smart TVs and the like can also benefit from the bandwidth savings.

      Fun fact: I can't say which, but some of the oldest devices (smart TVs, home security products, etc.) work around their dated hardware decoders by buzzsawing 4K video in half, running each piece through the decoder at a resolution it supports, then stitching them back together.

  • crazygringo 21 minutes ago

    This is a perfectly fine blog post, but it's about something so basic that I don't understand why it was submitted to HN.

    Yes, ffprobe and mediainfo are the two common tools for this. This just feels like something that belongs as the answer to an everyday StackOverflow question. I don't understand what it's doing on the front page of HN.

  • zahlman 2 hours ago

        av1_videos = {
            p
            for p in glob.glob("**/*.mp4", recursive=True)
            if is_av1_video(p)
        }
    
        assert av1_videos == set()
    
    Building a set just to check if it's empty is a bit more complexity than necessary. A more direct way that also bails out early:

        assert not any(is_av1_video(p) for p in glob.glob("**/*.mp4", recursive=True))
    
    Equivalently (de Morgan's law):

        assert all(not is_av1_video(p) for p in glob.glob("**/*.mp4", recursive=True))

    • KwanEsq 2 hours ago

      > A more direct way that also bails out early

      If it bails out early it is of no use to them.

      > This means that if the test fails, I can see all the affected videos at once. If the test failed on the first AV1 video, I’d only know about one video at a time, which would slow me down.
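      The trade-off in this sub-thread can be illustrated with a small sketch. The predicate `fake_is_av1_video`, the `checked` log, and the file names are hypothetical stand-ins, not the article's code; they only exist to show which files each approach inspects.

```python
def first_offender_only(paths, is_av1_video):
    """Short-circuit check: any() stops at the first AV1 file it finds."""
    return any(is_av1_video(p) for p in paths)


def all_offenders(paths, is_av1_video):
    """Exhaustive check: inspects every file and returns all AV1 files."""
    return {p for p in paths if is_av1_video(p)}


# Hypothetical stand-in for the real detector; it records which paths
# were inspected so the difference between the approaches is visible.
checked = []


def fake_is_av1_video(path):
    checked.append(path)
    return path.endswith("_av1.mp4")


paths = ["a.mp4", "b_av1.mp4", "c_av1.mp4"]
```

      With these inputs, `first_offender_only` never looks at `c_av1.mp4`, while `all_offenders` reports both AV1 files in one run, which is the behaviour the article says it wants from a failing test.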

  • Scaevolus an hour ago

    Note that ffprobe can output JSON which is much easier to handle than CSV. I have this snippet in my bashrc:

    ffpj() { for f in "$@"; do ffprobe -v quiet -print_format json -show_format -show_streams "$f"; done }
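    A rough Python counterpart to the snippet above, assuming `ffprobe` is on the PATH. The function names are made up for illustration; the flags are the same ones used in the shell function, and `codec_name` / `codec_type` are the per-stream fields in ffprobe's JSON output.

```python
import json
import subprocess


def probe_json(path):
    """Run ffprobe with the same flags as the shell function; return raw JSON text."""
    return subprocess.check_output(
        ["ffprobe", "-v", "quiet", "-print_format", "json",
         "-show_format", "-show_streams", path],
        text=True,
    )


def video_codecs(ffprobe_output):
    """Return the codec_name of every video stream in ffprobe's JSON output."""
    data = json.loads(ffprobe_output)
    return [
        stream.get("codec_name")
        for stream in data.get("streams", [])
        if stream.get("codec_type") == "video"
    ]


def is_av1(ffprobe_output):
    """True if any video stream is reported with the codec name 'av1'."""
    return "av1" in video_codecs(ffprobe_output)
```

    Usage would look like `is_av1(probe_json("video.mp4"))`. Keeping the parsing separate from the subprocess call makes it possible to test against saved JSON without running ffprobe.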

  • wolttam 2 hours ago

    Somehow I thought this was going to be about detecting AV1 based on the decoded video frames, which would have been interesting!

    • avidiax 2 hours ago

      Yeah, I would think that the simulated grain of AV1 might be characterizable, even though, IIRC, it is pretty sophisticated.

  • avidiax 2 hours ago

    My first question is, where is this guy getting AV1 videos? Never seen these on the high seas.

    Also, given that these videos are going to be reencoded, which is tremendously expensive, I feel that any optimization in this step is basically premature. Naively launching ffprobe 10,000 times is probably still less heavyweight than 1 reencode.

    • monster_truck 36 minutes ago

      I exclusively download AV1 encodes from places like tbp. They have fantastic quality for the file size, and AV1 also benefits the most from the trick of encoding SDR content in 10-bit (more accurate quantization at a smaller size). Crazy that we can fit ~two hours of 1080p video at better-than-Netflix quality (they bias their PSNR/etc. a little low for my eyes) on a single CD.

      I'm not sure it's fair to call reencodes expensive. Sure, it's expensive relative to running ffprobe, but any 4-series NVIDIA GPU with 2 NVENC engines can handle five(?) simultaneous realtime encodes, or will get up to near 180 fps if it isn't being streamed. Our "we have AJA at home" box with four of them churned through something like 20,000 hours of video in just under two weeks.

    • breve an hour ago

      YouTube encodes video to AV1.

      Right click on a YouTube video and select "Stats for Nerds" to see which format it's using in your browser. AV1 will be something like "av01.0.09M.08".

      You've probably watched a lot of AV1 video without realising it.
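      For anyone curious what a string like "av01.0.09M.08" encodes, here is a minimal sketch of a parser. It assumes the layout `av01.<profile>.<level index><tier>.<bit depth>` from the AV1 ISO media binding and the level numbering from the AV1 spec; the function name is made up, and optional trailing fields (chroma, colour info) are ignored.

```python
import re

# Leading fields only: av01.P.LLT.DD (anything after the bit depth is ignored).
_AV1_CODEC_RE = re.compile(r"^av01\.(\d)\.(\d{2})([MH])\.(\d{2})")

_PROFILES = {0: "Main", 1: "High", 2: "Professional"}


def parse_av1_codec_string(codec):
    """Parse the leading fields of an AV1 codec string, or return None if it isn't one."""
    match = _AV1_CODEC_RE.match(codec)
    if match is None:
        return None
    profile, level_idx, tier, depth = match.groups()
    idx = int(level_idx)
    if idx == 31:
        # Index 31 is reserved to mean "maximum parameters", not a numbered level.
        level = "max"
    else:
        # Level index X maps to level (2 + X // 4).(X % 4).
        level = f"{2 + (idx >> 2)}.{idx & 3}"
    return {
        "profile": _PROFILES.get(int(profile), "unknown"),
        "level": level,
        "tier": "Main" if tier == "M" else "High",
        "bit_depth": int(depth),
    }
```

      Under those assumptions, the example string from Stats for Nerds decodes to Main profile, level 4.1, Main tier, 8-bit.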

    • KwanEsq 2 hours ago

      Sounds like you're just sailing the wrong seas. Some have plenty of AV1. Though those tend to be more obviously advertised as such, I believe, so perhaps this is about downloads from YouTube.

    • senand 2 hours ago

      Off-topic, but it’s actually a she

    • 01HNNWZ0MV43FF 2 hours ago

      Maybe he transcoded them. I know some archivers who download in H.264 but then transcode to H.265 to save on disk. (I guess they don't seed?)

  • nick238 2 hours ago

    Is launching an ffmpeg process so heavyweight that there's a reason to avoid it? If anything, it feels like it would trivialize parallelism, which is probably a feature, not a bug, if you have a bunch of videos to go through.

    • zahlman 2 hours ago

      TFA claims:

      > This is shorter than the ffprobe code, and faster too – testing locally, this is about 3.5× faster than spawning an ffprobe process per file.

      And the calls to the MediaInfo wrapper are not really harder to parallelize. `subprocess.check_output` is synchronous, so that code would have to be adapted to spawn in a loop and then collect the results in a queue or something. With the wrapper you basically end up doing the same thing, but with `multiprocessing` instead. And you can then just reuse a few worker processes for the entire job.
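      As a rough sketch of the "spawn and collect" adaptation for the ffprobe path: since each `subprocess.check_output` call just blocks on a child process, a thread pool is one simple way to run several at once (this is a swapped-in variant, not the `multiprocessing` setup described above for the wrapper). The function names and the `max_workers` default are assumptions; the ffprobe flags are standard ones for printing only the first video stream's codec name.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def codec_of_first_video_stream(path):
    """Ask ffprobe for the codec name of the first video stream (blocking call)."""
    out = subprocess.check_output(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_entries", "stream=codec_name",
         "-of", "default=noprint_wrappers=1:nokey=1", path],
        text=True,
    )
    return out.strip()


def find_av1_videos(paths, probe=codec_of_first_video_stream, max_workers=8):
    """Probe all paths concurrently and return the set of AV1-encoded ones."""
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so results line up with paths.
        codecs = pool.map(probe, paths)
        return {p for p, codec in zip(paths, codecs) if codec == "av1"}
```

      The `probe` parameter is there so the same collection logic can be pointed at a different detector (for example, a MediaInfo-based one) or at a stub in tests.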

    • 01HNNWZ0MV43FF 2 hours ago

      Python must have libav bindings somewhere, you could certainly run that check in-process.

      Off the top of my head, it's probably in the container metadata, so you'd just need libavformat and not even libavcodec. Pass it a path, open it, scan the list of streams and check the codec magic number?
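      To illustrate the "it's in the container metadata" idea without any libav dependency, here is a pure-Python sketch that walks the MP4 box tree and reads the sample entry types out of each track's `stsd` box. This is a swapped-in technique, not libavformat, and it makes simplifying assumptions: MP4/ISO BMFF files only, AV1 tracks identified by the `av01` sample entry type, and no handling of fragmented or otherwise unusual layouts. All function names are made up.

```python
import struct

# Boxes that simply contain other boxes on the path down to 'stsd'.
_CONTAINERS = {b"moov", b"trak", b"mdia", b"minf", b"stbl"}


def iter_boxes(f, start, end):
    """Yield (box_type, payload_start, payload_end) for each box in [start, end)."""
    pos = start
    while pos + 8 <= end:
        f.seek(pos)
        header = f.read(8)
        if len(header) < 8:
            return
        size, box_type = struct.unpack(">I4s", header)
        header_len = 8
        if size == 1:
            # A size of 1 means a 64-bit size follows the type field.
            ext = f.read(8)
            if len(ext) < 8:
                return
            size = struct.unpack(">Q", ext)[0]
            header_len = 16
        elif size == 0:
            # A size of 0 means the box runs to the end of the enclosing range.
            size = end - pos
        if size < header_len or pos + size > end:
            return  # malformed or truncated; stop rather than guess
        yield box_type, pos + header_len, pos + size
        pos += size


def sample_entry_types(f, start, end):
    """Yield the four-character code of every sample entry found under 'stsd' boxes."""
    for box_type, payload_start, payload_end in iter_boxes(f, start, end):
        if box_type in _CONTAINERS:
            yield from sample_entry_types(f, payload_start, payload_end)
        elif box_type == b"stsd":
            # Skip version/flags (4 bytes) and entry_count (4 bytes).
            for entry_type, _, _ in iter_boxes(f, payload_start + 8, payload_end):
                yield entry_type


def mp4_has_av1(f):
    """True if any track in the seekable binary file object uses an 'av01' sample entry."""
    f.seek(0, 2)
    end = f.tell()
    return any(code == b"av01" for code in sample_entry_types(f, 0, end))
```

      Usage would be along the lines of `with open(path, "rb") as f: mp4_has_av1(f)`. Because it seeks past box payloads instead of reading them, it never touches the bulk of the `mdat` data.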