marc walter

Copy folders that contain files with specific extensions

2015-10-15

Recently I used a tool to re-tag and move my music library. This left me a neatly organized music library and also with a music folder that contained a lot of folders that only contained e.g. the album cover (in various formats). But some songs were not picked up by the auto-tagging mechanism, so some music files still persisted in that folder.

That is why I created a script that traverses the folder structure and copies all folders that (directly) contain at least one file out of a list of desired file types (e.g. '.mp3').

Situation

If I would run the script on this folder:

C:\tagged
├─Der dicke Polizist
│ └─und die Hoffnung stirbt zuletzt!
│    folder.jpg
└─ZSK
  ├─[2004] From Protest to Resistance
  │ ├─ folder.jpg
  │ └─ Kein Mensch ist illegal.mp3
  └─[2015] Herz für die Sache
      folder.jpg

I would expect this outcome:

C:\contains_music\tagged
└─ZSK
  └─[2004] From Protest to Resistance
    ├─ folder.jpg
    └─ Kein Mensch ist illegal.mp3

Solution

Criteria

  1. Each album that still contains music should be copied.
  2. But if an album contains no more music should not be copied.
  3. If an artist's folder does not contain any music (also not in subdirectories), it should not be copied

Reasoning

Write a recursive function that walks the directory tree and copies a folder if it contains at least one file with one of the desired file extensions.
If it does not contain any of those files, don't copy it - this will also suffice for criteria 3.

Code

from os import scandir
from os.path import join as join_path, splitext
from shutil import copytree

path = 'tagged'
dest = 'contains_music'
desired_file_extensions = ('.mp3', '.ogg', '.flac', 'm4a', 'aac', 'wma')
hidden_file_prefixes = ('.', '_')

other_file_types = set()

# recursive function that either returns true, copies a folder or returns null
def keep_folder(path):
    entries = scandir(path)
    keep = False
    for entry in entries:
        if entry.is_dir():
            # recurse into this folder
            next_path = join_path(path, entry.name)
            if (keep_folder(next_path)):
                print("Will keep folder: %s" % next_path)
                copytree(next_path, join_path(dest, next_path))

        elif entry.is_file():
            if keep_file(entry):
                # if one file is found that I want to keep, the whole folder will be copied
                keep = True
                # parsing the other files or folders in this folder will not be necessary
                break
            else:
                # add this file to the list of file types that were not selected for copying
                other_file_types.add(splitext(entry.name)[1])

    return keep

# returns true if the file ends with one of the desired file extensions
def keep_file(entry):
    return not entry.name.startswith(hidden_file_prefixes) \
        and entry.name.lower().endswith(desired_file_extensions)


if __name__ == '__main__':
    # recursively walk through the folder structure
    keep_folder(path)

    print('---\nFile types that were not picked for copying:\n', other_file_types)

This will output for instance:

Will keep folder: tagged\Beatsteaks
[...]
Will keep folder: tagged\ZSK\From Protest to Resistance
---
File types that were not picked for copying:
 {'.url', '.ini', '.doc', '.txt', '.gif', '.zip', '.jpg'}

The last output lists all file the types that were not picked for copying, which is a good way to adapt this script for future uses. This is also the reason to keep it here - maybe I'll remember it if I need it again (or I'll write another one in an hour).

Download