
If you’ve ever had to write scripts that process large human-maintained filesystems, you’ll know what a pain “special characters” can be. It only takes one lousy single quote in a filename somewhere deep in a directory structure for your nightly jobs to start failing.

Fortunately, in one particular environment I was able to brute-force this problem by renaming every file and directory that contained undesirable characters. The following script is admittedly hacky, but it works.

When it finds a directory entry that does not comply, it renames that entry to its compliant form. If an entry with the compliant name already exists, it prefixes the new name with a timestamp (YYYYMMDDHHMMSS) to make it unique.
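The timestamp only has one-second resolution, so two entries that convert to the same name within the same second could still clash. Here is a minimal sketch of the collision rule with a counter fallback added for that case. The helper name unique_target and the counter suffix are my own assumptions, not part of the rule as described above.

```python
import os
import time


def unique_target(dirpath, converted):
    """Return a path for `converted` inside `dirpath` that does not exist yet.

    Follows the collision rule described above (prefix a YYYYMMDDHHMMSS
    timestamp) and, as an added assumption, appends a counter when even the
    timestamped name is taken, e.g. two clashes within the same second.
    """
    dst = os.path.join(dirpath, converted)
    if not os.path.exists(dst):
        return dst  # No collision: use the compliant name as-is.
    stamp = time.strftime("%Y%m%d%H%M%S")
    dst = os.path.join(dirpath, f"{stamp}-{converted}")
    counter = 1
    while os.path.exists(dst):
        # Hypothetical extension: <stamp>-1-<name>, <stamp>-2-<name>, ...
        dst = os.path.join(dirpath, f"{stamp}-{counter}-{converted}")
        counter += 1
    return dst
```

The returned path is only a target. The caller still does the actual move, for example with shutil.move().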

conv() contains the logic for converting a messy filename into a neat and tidy one. If you want to be less tolerant than I am of the characters you permit in your filenames, that’s the place to make your change.
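If you want to be stricter, one option is to flip the logic from a blacklist to a whitelist. This is a suggestion, not what the script here does. The minimal sketch below assumes you only want ASCII letters, digits, spaces, dots, underscores and hyphens to survive. The name strict_conv and the "_" replacement character are hypothetical choices.

```python
import re

# Hypothetical whitelist: any character NOT in this set gets replaced.
_UNSAFE = re.compile(r"[^A-Za-z0-9 ._-]")


def strict_conv(name):
    """Whitelist variant of conv(): keep letters, digits, space, '.', '_', '-'.

    Every other character (quotes, '&', '#', non-ASCII, ...) becomes '_'.
    """
    return _UNSAFE.sub("_", name)
```

To use it, you could have conv() simply return strict_conv() of its argument. The script's existing collision handling still applies if two names end up identical.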

This is not my prettiest bit of code, but it did get me a good result… and that’s what we’re all about!

#!/usr/bin/env python3

# Copyright (c) 2013, James Downie <jdownie@gmail.com>
# 
# Permission to use, copy, modify, and/or distribute this
# software for any purpose with or without fee is hereby
# granted, provided that the above copyright notice and this
# permission notice appear in all copies.
# 
# THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL
# WARRANTIES WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED
# WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL
# THE AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT, INDIRECT, OR
# CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING
# FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF
# CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
# OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.

import os
import time
import shutil

# Characters that trip up shell scripts, mapped to their replacements
# (None deletes the character).
REPLACEMENTS = str.maketrans({
  "'": "?",
  "`": "?",
  "!": "?",
  ":": "?",
  '"': "?",
  "%": "?",
  "\t": " ",
  "&": "and",
  "#": "and",
  "\\": None,
})

def conv(name):
  """Return a neat and tidy version of a single file or directory name."""
  # Replace every non-ASCII character with "?"...
  name = "".join(c if ord(c) < 128 else "?" for c in name)
  # ...then swap out the troublesome ASCII characters.
  return name.translate(REPLACEMENTS)

def renameEntry(dirpath, name):
  """Rename dirpath/name to its converted form; return the name it ends up with."""
  converted = conv(name)
  if converted == name:
    return name
  dst = os.path.join(dirpath, converted)
  if os.path.exists(dst):
    # Never overwrite an existing entry (or move a directory into one):
    # prefix the new name with a timestamp instead.
    print("Collision:", dst)
    converted = "-".join([time.strftime("%Y%m%d%H%M%S"), converted])
    dst = os.path.join(dirpath, converted)
    if os.path.exists(dst):
      # Two clashes within the same second: skip this entry rather than
      # overwrite anything; a later run will pick it up again.
      print("Skipping:", os.path.join(dirpath, name))
      return name
  shutil.move(os.path.join(dirpath, name), dst)
  return converted

def walkDir(path):
  if not os.path.isdir(path):
    # A single file was given: just rename it.
    renameEntry(os.path.dirname(path), os.path.basename(path))
    return
  for dirpath, dirnames, filenames in os.walk(path):
    for filename in filenames:
      renameEntry(dirpath, filename)
    # Rename subdirectories and update dirnames in place, so that os.walk
    # descends into them under their new names.
    dirnames[:] = [renameEntry(dirpath, d) for d in dirnames]

if __name__ == "__main__":
  walkDir("/Volumes/bigMess")
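Before letting a script like this loose on a shared volume, it can be worth seeing what it would do. Below is a minimal dry-run sketch that assumes you pass in the script's conv() as the convert argument. preview_renames is a hypothetical helper that only reports old and new paths and touches nothing on disk. Because nothing is actually renamed, entries inside a directory that would be renamed are reported under that directory's old name. Collisions are not checked either.

```python
import os


def preview_renames(root, convert):
    """Yield (old_path, new_path) for every entry `convert` would change.

    Walks `root` top-down with os.walk and never modifies the filesystem.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames + dirnames:
            new = convert(name)
            if new != name:
                yield os.path.join(dirpath, name), os.path.join(dirpath, new)
```

For example, looping over preview_renames("/Volumes/bigMess", conv) and printing each pair gives you a list to eyeball before the real run.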