Copy folders that contain files with specific extensions
2015-10-15
Recently I used a tool to re-tag and move my music library. This left me with a neatly organized music library, but also with the old music folder, in which many folders now contained nothing but the album cover (in various formats). Some songs were not picked up by the auto-tagging mechanism, however, so a few music files still remained in that folder.
That is why I created a script that traverses the folder structure and copies all folders that (directly) contain at least one file out of a list of desired file types (e.g. '.mp3').
Situation
If I ran the script on this folder:
C:\tagged
├─Der dicke Polizist
│ └─und die Hoffnung stirbt zuletzt!
│   └─ folder.jpg
└─ZSK
  ├─[2004] From Protest to Resistance
  │ ├─ folder.jpg
  │ └─ Kein Mensch ist illegal.mp3
  └─[2015] Herz für die Sache
    └─ folder.jpg
I would expect this outcome:
C:\contains_music\tagged
└─ZSK
  └─[2004] From Protest to Resistance
    ├─ folder.jpg
    └─ Kein Mensch ist illegal.mp3
Solution
Criteria
- Each album that still contains music should be copied.
- But an album that contains no more music should not be copied.
- If an artist's folder does not contain any music (also not in subdirectories), it should not be copied.
Reasoning
Write a recursive function that walks the directory tree and copies a folder if it contains at least one file with one of the desired file extensions.
If it does not contain any of those files, don't copy it. This also covers criterion 3: an artist's folder only appears in the destination if it, or one of the folders below it, is copied.
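Before copying anything, the reasoning above can be tried out as a dry run. The following is only a sketch and not the script itself: it uses `os.walk` instead of a hand-written recursion, and `folders_with_music`, `build_demo_tree` and `MUSIC_EXTENSIONS` are hypothetical names made up for this illustration. It rebuilds the example layout in a temporary directory and lists the folders that directly contain music.

```python
import os
import tempfile

# Assumed extension list for this illustration; adjust as needed.
MUSIC_EXTENSIONS = ('.mp3', '.ogg', '.flac')


def folders_with_music(root, extensions=MUSIC_EXTENSIONS):
    """Return the folders below root (relative paths) that directly
    contain at least one file with one of the given extensions."""
    matches = []
    for folder, _subdirs, files in os.walk(root):
        if any(name.lower().endswith(extensions) for name in files):
            matches.append(os.path.relpath(folder, root))
    return sorted(matches)


def build_demo_tree(root):
    """Create empty files that mirror the example layout (hypothetical helper)."""
    layout = {
        os.path.join('Der dicke Polizist', 'und die Hoffnung stirbt zuletzt!'):
            ['folder.jpg'],
        os.path.join('ZSK', '[2004] From Protest to Resistance'):
            ['folder.jpg', 'Kein Mensch ist illegal.mp3'],
        os.path.join('ZSK', '[2015] Herz für die Sache'):
            ['folder.jpg'],
    }
    for folder, files in layout.items():
        os.makedirs(os.path.join(root, folder))
        for name in files:
            open(os.path.join(root, folder, name), 'w').close()


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        build_demo_tree(tmp)
        # Only the album that still holds an .mp3 should be listed.
        print(folders_with_music(tmp))
```

Nothing is copied here, so the list can be checked against the criteria before running the real thing.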
Code
from os import scandir
from os.path import join as join_path, splitext
from shutil import copytree

path = 'tagged'
dest = 'contains_music'
# every extension needs its leading dot, otherwise a name like 'demo_aac' would match too
desired_file_extensions = ('.mp3', '.ogg', '.flac', '.m4a', '.aac', '.wma')
hidden_file_prefixes = ('.', '_')
other_file_types = set()


# recursive function: copies every subfolder worth keeping and returns True
# if the folder itself directly contains a desired file, otherwise False
def keep_folder(path):
    keep = False
    for entry in scandir(path):
        if entry.is_dir():
            # recurse into this folder
            next_path = join_path(path, entry.name)
            if keep_folder(next_path):
                print("Will keep folder: %s" % next_path)
                # dirs_exist_ok (Python 3.8+) avoids an error if a subfolder was copied before
                copytree(next_path, join_path(dest, next_path), dirs_exist_ok=True)
        elif entry.is_file():
            if keep_file(entry):
                # if one file is found that I want to keep, the whole folder will be copied
                keep = True
                # parsing the other files or folders in this folder will not be necessary
                break
            else:
                # add this file to the set of file types that were not selected for copying
                other_file_types.add(splitext(entry.name)[1])
    return keep


# returns True if the file is not hidden and ends with one of the desired file extensions
def keep_file(entry):
    return not entry.name.startswith(hidden_file_prefixes) \
        and entry.name.lower().endswith(desired_file_extensions)


if __name__ == '__main__':
    # recursively walk through the folder structure
    keep_folder(path)
    print('---\nFile types that were not picked for copying:\n', other_file_types)
This will output for instance:
Will keep folder: tagged\Beatsteaks
[...]
Will keep folder: tagged\ZSK\From Protest to Resistance
---
File types that were not picked for copying:
{'.url', '.ini', '.doc', '.txt', '.gif', '.zip', '.jpg'}
The last output lists all the file types that were not picked for copying, which makes it easy to adapt the list of desired extensions for future uses. This is also the reason to keep the script here: maybe I'll remember it if I need it again (or I'll write another one in an hour).
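If the plain set of skipped file types is not enough to decide which extensions to add, counting them might help. The following is a small sketch of that idea, not part of the script above: `tally_extensions` is a hypothetical helper, and it works on a plain list of file names so that it can be tried without touching any folders.

```python
from collections import Counter
from os.path import splitext


def tally_extensions(file_names, desired=('.mp3', '.ogg', '.flac')):
    """Count the (lower-cased) extensions of all names that do not end
    with one of the desired extensions."""
    skipped = Counter()
    for name in file_names:
        if not name.lower().endswith(desired):
            skipped[splitext(name)[1].lower()] += 1
    return skipped


if __name__ == '__main__':
    names = ['folder.jpg', 'cover.JPG', 'song.mp3', 'notes.txt', 'README']
    # most_common() lists the most frequent skipped extension first
    for ext, count in tally_extensions(names).most_common():
        print(repr(ext), count)
```

An extension that shows up with a high count and looks like audio is then a candidate for the list of desired file extensions.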