I had to compare multiple stream dumps I had collected from different places. Each file contained a timestamp of when it was generated (i.e. downloaded), so even the exact same dump would have different timestamps if downloaded twice. Also, some files had embedded cover art and tags while others had no tags at all.
To compare the audio stream only, I came up with this command:
find . -type f -iname "*.m4a" -exec sh -c 'ffmpeg -i "$1" -map_metadata -1 -c copy -f adts pipe:1 | sha256sum -b' sh {} \; -print 2>/dev/null | sed 'N;s/\*-\n//g'
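This finds all *.m4a files (case-insensitively). For each one, ffmpeg copies the audio without re-encoding (-c copy), drops the metadata (-map_metadata -1), and writes the result to stdout as a raw ADTS (Audio Data Transport Stream) stream. Because ADTS carries only AAC audio frames, embedded cover art is left out as well. The file name is passed to the inner shell as $1 rather than pasted into the command string, so quotes or $ in file names cannot break it. The stream is piped through sha256sum, and -print then outputs the file's path, so each file produces two lines like: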
975e3cb5ae1bea8ffa993eab1d4d9fb484fd6cc1f5d70376428b836111e7595a *-
./folder/file.m4a
In *-, the dash means the data came via stdin and the asterisk marks binary mode (-b). The sed command appends the second line to the first (N) and removes the *- together with the newline between them, resulting in something like:
975e3cb5ae1bea8ffa993eab1d4d9fb484fd6cc1f5d70376428b836111e7595a ./folder/file.m4a
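If the two-line output and the sed step feel fragile, the same checksum-and-path lines can also be produced in one pass. The sketch below is an alternative, not the command above. It wraps the pipeline in a function (hash_m4a_audio is a hypothetical name), hands file names to the inner shell as arguments, and selects only the first audio stream explicitly with -map 0:a:0. It assumes ffmpeg and sha256sum are on PATH and that the files contain AAC audio, since the ADTS muxer can only carry AAC.

```shell
#!/bin/sh
# Hypothetical helper: print "<sha256 of audio stream> <path>" for every
# .m4a file under the given directory (default: current directory).
# Assumes ffmpeg and sha256sum are installed and the audio is AAC.
hash_m4a_audio() {
  find "${1:-.}" -type f -iname '*.m4a' -exec sh -c '
    for f do
      # Copy the first audio stream untouched (no re-encode), drop global
      # metadata, and write raw ADTS frames to stdout for hashing.
      # -nostdin keeps ffmpeg off the terminal; real errors still reach stderr.
      sum=$(ffmpeg -nostdin -loglevel error -i "$f" -map 0:a:0 -map_metadata -1 \
              -c copy -f adts pipe:1 | sha256sum | cut -d " " -f 1)
      printf "%s %s\n" "$sum" "$f"
    done
  ' sh {} +
}
```

For example, hash_m4a_audio ./dumps | sort lists every file with its audio checksum. One caveat applies here and to the original command alike: if ffmpeg fails on a file, sha256sum still hashes the empty input, so every broken file shows e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855.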
And because this is only the checksum of the audio stream, files with identical audio data will have identical checksums – even if they have completely different metadata.
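To turn the list into an actual duplicate report, the lines can be grouped by checksum. The sketch below shows one way to do it (find_duplicate_audio is a hypothetical name). It reads the "<checksum> <path>" lines on stdin, sorts them in the C locale so equal checksums end up adjacent, and uses awk to print only the lines whose checksum occurs more than once. Paths containing spaces survive because awk prints whole lines.

```shell
#!/bin/sh
# Hypothetical helper: read "<checksum> <path>" lines on stdin and print only
# those whose checksum appears more than once, grouped by checksum.
find_duplicate_audio() {
  LC_ALL=C sort | awk '
    NF == 0 { next }                        # ignore blank lines
    ($1 "") == prev {                       # same checksum as the previous line
      if (!shown) print prevline            # first member of the group, once
      print                                 # current member
      shown = 1
      next
    }
    { prev = $1 ""; prevline = $0; shown = 0 }  # start a new group
  '
}
```

Appending | find_duplicate_audio to the command above (after defining the function in the same shell) leaves only the files that share their audio with at least one other file. The checksum is compared as a string so that an all-digit hash is never compared numerically. GNU uniq could do similar grouping with -w and -D, but those options are GNU extensions, while sort plus awk stays portable.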