Folder Backup Utility

From IronPython Cookbook

This is an example of a folder backup utility; a simple command-line option parser is included. Although originally designed for backing up Visual Studio project folders, it walks the subfolders of a set of directories and creates a zip file of any folder that contains one or more modified files. It uses the open source SharpZipLib to create zip archives.

Features

  • Archives entire folders into .zip files.
  • Does not back up folders which have not changed.
  • Easy to set up, use, and configure.
  • Ability to skip certain folders.
  • Accepts command-line parameters.
  • Ability to keep one backup per minute, day, week, or year.
  • Great way to back up all those Microsoft Visual Studio projects without any hassle.
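
The change detection behind "Does not back up folders which have not changed" can be sketched as follows. This is a minimal illustration that assumes the CPython standard library (os.walk and file modification times) in place of the .NET Directory and File classes the script relies on; most_recent_change and needs_backup are hypothetical names, not part of the utility.

```python
import os
from datetime import datetime


def most_recent_change(directory):
    """Return the newest modification time of any file under directory."""
    newest = datetime(1990, 1, 1)  # same sentinel date the script starts from
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            modified = datetime.fromtimestamp(os.path.getmtime(path))
            if modified > newest:
                newest = modified
    return newest


def needs_backup(directory, last_backup):
    """True when no earlier backup exists or a file changed after it."""
    if last_backup is None:
        return True
    return most_recent_change(directory) > last_backup
```

A folder with no files keeps the 1990 sentinel, so it only counts as changed when it has never been backed up.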

Code

The following code files (BackupFolders.py, CmdLine.py, ICSharpCode.SharpZipLib.dll) must be placed within the same directory.

BackupFolders.py

"""Folder-to-Archive Backup Tool

This utility will back up the directories that you indicate into archives
based on the directory names.

Version	Changes
0.5		Initial version
0.6		Encapsulated all functionality into a class declaration while
		maintaining command-line execution functionality.
0.9		Changed the CompilePreviousBackupTimes method to look at the
		creation time for the backup archive instead of parsing the
		file name with a regular expression.  The __CreateBackup method
		signature changed as part of a fix for when the archive file name
		already existed and the creation date was not modified.  Added
		more validation to pyFolderBackup.__init__.
1.0		Fixed bugs and changed to an IronPython version of CmdLine.
"""
# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
__author__  = 'Dag R. Calafell, III'
__date__    = '2007-05-31'  # yyyy-mm-dd
__module_name__ = "Folder-to-Archive Backup Tool"
__short_cright__= " Creative Commons License" # http://creativecommons.org/licenses/by/3.0/
__version__ = '1.0'     # Human-Readable Version number
version_info = (1,0,0)  # Easier format: if version_info > (1,2,5)

import re
import sys
import clr
clr.AddReference("ICSharpCode.SharpZipLib")
from ICSharpCode.SharpZipLib.Zip import FastZip
from ICSharpCode.SharpZipLib import SharpZipBaseException
from CmdLine import CmdArgParser
from System import Array, DateTime, Environment
from System.IO import Path, File, Directory, SearchOption


class pyFolderBackup:
	"""Class encapsulates all the functionality needed to create and manage backups of directories.
	It creates new backups only when a file within the directory has changed."""
	def __init__(self, source_directories, destination_directory, skip_directories=None, verbose=False, target_format=None):
		# Validation
		if destination_directory == None:
			raise ValueError, 'Please specify an archive directory.  %s' % destination_directory
		elif type(destination_directory) != str and hasattr(destination_directory, '__getitem__'):
			destination_directory = destination_directory[0]
			print 'Warning: Only one archive directory is supported.  Script will use only the first directory in the list.'
		elif not Directory.Exists(str(destination_directory)):
			raise ValueError, 'The destination directory does not exist.  %s' % destination_directory
		if not hasattr(source_directories, '__getitem__'):
			raise TypeError, 'The source_directories parameter must be enumerable.  A \'%s\' was passed.' % type(source_directories)
		if skip_directories != None and not hasattr(skip_directories, '__getitem__'):
			raise TypeError, 'The skip_directories parameter must be enumerable.  A \'%s\' was passed.' % type(skip_directories)
		if type(destination_directory) != str:
			raise TypeError, 'The destination_directory parameter must be a string.  A \'%s\' was passed.' % type(destination_directory)
		if verbose == None:
			verbose = True
		if target_format != None:
			if type(target_format) != str:
				raise TypeError, 'The target_format parameter must be a string.  A \'%s\' was passed.' % type(target_format)
			elif len(target_format.split('%s')) != 2:
				raise ValueError, 'The target_format parameter must be a string with exactly one occurrence of \'%s\', where the project name is substituted.  \'' + str(target_format) + '\' was passed'
		if source_directories == None or len(source_directories) == 0:
			raise ValueError, 'No source directories to process.'

		self.VerboseMode = bool(verbose)
		self.SourceDirs = source_directories
		self.DestDir = destination_directory
		self.__fz = FastZip()
		if skip_directories == None:
			self.SkipDirs = []
		else:
			self.SkipDirs = skip_directories

		if target_format == None or len(target_format) == 0:
			slash = Path.DirectorySeparatorChar
			if destination_directory[-len(slash)] == slash:
				slash = ''
			self.TargetFormat = destination_directory + slash + '%s ' + DateTime.Now.ToString("yyyy.MM.dd") + '.zip'
		else:
			self.TargetFormat = target_format

	# -------------------- Functions --------------------

	def GetMostRecent(self, directory):
		"""Determines the most recent modified date within all files of a directory.
		Input:
			directory: The directory to walk.
		Output:
			The most recent date & time at which a file within the directory structure has changed.
		"""
		mostRecent = DateTime(1990, 1, 1)
		for f in Directory.GetFiles(directory, '*.*', SearchOption.AllDirectories):
			last = File.GetLastWriteTime(f)
			if mostRecent < last:
				mostRecent = last
		return mostRecent

	def __CreateBackup(self, directory, creation):
		"""Creates a zip backup of a directory.
		Input:
			directory: The directory to back up.
			creation:  The date and time to set as the archive's creation time (the newest file modification time in the directory).
		Output:
			None
		"""
		if type(directory) != str:
			raise TypeError, 'The directory parameter must be a string.  A \'%s\' was passed.' % type(directory)
		if type(creation) != DateTime:
			raise TypeError, 'The creation parameter must be of type System.DateTime.  A \'%s\' was passed.' % type(creation)
		
		# Create the backup
		proj = directory.split('\\')[-1]
		target = self.TargetFormat % proj
		self.__fz.CreateZip(target, directory, True, '.*')
		File.SetCreationTime(target, creation)
		print Path.GetFileName(target) + ' created.'

	def CompilePreviousBackupTimes(self):
		"""Create a dictionary of backup times for each project.
		Input:
			None
		Output:
			A dictionary of the most recent backup date by directory name.
		"""
		reBackupInfo = re.compile(r"([^\\]+)\s(\d{4}.\d{2}.\d{2})\.zip")
		previousBackups = {}
		try:
			files = Directory.GetFiles(self.DestDir, '*.zip')
		except Exception, e:
			print '%s' % e
			print ''
			print 'Archive Directory: %s' % self.DestDir
			return previousBackups  # empty dictionary, so RunBackup can still test membership

		for f in files:
			m = reBackupInfo.search(f)
			if m:
				projName = m.group(1)
				dte = m.group(2)
				filenameDate = DateTime(int(dte[0:4]), int(dte[5:7]), int(dte[8:10]))
				creationDate = File.GetCreationTime(f)
				# Warn when any part of the date in the file name differs from the creation date.
				if self.VerboseMode and (filenameDate.Day != creationDate.Day or filenameDate.Month != creationDate.Month or filenameDate.Year != creationDate.Year):
					print 'Warning: The filename \'%s\' does not agree with the date that the file was created. (%s != %s)' % (Path.GetFileName(f), filenameDate, creationDate)
				
				if projName not in previousBackups or previousBackups[projName] < creationDate:
					previousBackups[projName] = creationDate
		return previousBackups

	def RunBackup(self):
		"""Runs a backup.
		Input:
			None
		Output:
			None
		"""
		# Dictionary of backup times for each project.
		prevBackups = self.CompilePreviousBackupTimes()
		
		if self.VerboseMode:
			print ''
			print 'Most Recent Backups'
			if prevBackups:
				for k in prevBackups:
					print '   %-25s %s' % (k,prevBackups[k].ToString('MM/dd/yyyy'))
				print ''
			else:
				print '   None'
		
		# Loop through the projects to backup
		for x in self.SourceDirs:
			if Directory.Exists(x):
				for proj in Directory.GetDirectories(x):
					if proj in self.SkipDirs:
						continue
					mostRecent = self.GetMostRecent(proj)
					projName = proj.split('\\')[-1]
					
					# Test if this project has already been backed up
					if not projName in prevBackups or mostRecent > prevBackups[projName]:
						# print '\nmostRecent > prevBackups[projName]\n%s > %s\n' % (mostRecent, prevBackups[projName])
						self.__CreateBackup(proj, mostRecent)
					elif self.VerboseMode:
						print '%s backup created on %s is up to date.' % (projName, prevBackups[projName].ToString('MM/dd/yyyy'))

if __name__ == '__main__':
	if len(sys.argv) == 1:
		print 'Backup Visual Studio Projects'
		
		user = Environment.GetEnvironmentVariable('username')
		
		# Directories to be backed up.
		source = [r'C:\Documents and Settings\%s\My Documents\Visual Studio 2005\Projects' % user,
				  r'C:\Documents and Settings\%s\My Documents\Visual Studio 2005\Websites' % user]
		
		# Directories to skip (do not back up).
		skip = [r'C:\Documents and Settings\%s\My Documents\Visual Studio 2005\Projects\VSMacros80' % user]
		
		# All backups are stored in this backup directory.
		target_directory = r'C:\Documents and Settings\%s\My Documents\Visual Studio 2005\pyBackup' % user
		
		# The zip file name format.
		targetFmt = target_directory + '\\%s ' + DateTime.Now.ToString("yyyy.MM.dd") + '.zip'
		
		# Run the backup
		pyFolderBackup(source, target_directory, skip, True, targetFmt).RunBackup()
	else:
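		parser = CmdArgParser(sys.argv)
		# CmdArgParser returns an option given once as a bare value, so wrap it in a list.
		source = parser['source']
		if isinstance(source, str):
			source = [source]
		skip = parser['skip']
		if isinstance(skip, str):
			skip = [skip]
		pyFolderBackup(source, parser['target'], skip, parser['verbose'], parser['format']).RunBackup()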
	
	# This allows the command window in Windows to stay open until the user hits enter.
	print ''
	raw_input('Hit enter to exit.')

CmdLine.py

This is a simple class to allow for easier parsing of command-line options without installing CPython.

"""Simple Command-Line Option Parsing Tool

This utility class will parse command-line options without validation.
It does not require installation of the standard CPython library.

Version	Changes
0.5		Initial version in C#
0.6		Converted to IronPython
1.0		Added some test cases.
"""
# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
__author__  = 'Dag R. Calafell, III'
__date__    = '2007-06-18'  # yyyy-mm-dd
__module_name__ = "CmdLine"
__short_cright__= " Creative Commons License" # http://creativecommons.org/licenses/by/3.0/
__version__ = '1.0'     # Human-Readable Version number
version_info = (1,0,0)  # Easier format: if version_info > (1,2,5)

import System
from System import Array
from System.Text.RegularExpressions import Regex
from System.Collections.Generic import Dictionary, List

class CmdArgParser(object):
	'''Very simple command-line argument/option parser. Usage:
	
	import CmdLine
	import sys
	
	cmds = CmdArgParser(sys.argv)
	# cmds = CmdArgParser(System.Environment.CommandLine)
	# cmds = CmdArgParser(('/tEst','/teSt:"fancy"', '/tester:""','/go+'))
	if not cmds:
		# No command-line arguments
		sys.exit(1)
	
	# /target="C:\"
	if cmds['target'] is None:
		print 'Please specify the target variable.'
		sys.exit(2)
	
	# cmds.Keys = ['test', 'tester', 'go']
	# cmds['test'] = [True, 'fancy']
	# cmds['tester'] = ''
	# cmds['go'] = True

	
	Notes:
	*	The initializer can accept a list, tuple, array, string, or any .NET class which implements
		the GetEnumerator method.
	*	All keys are converted to lower case.
	*	All options must be preceded by a switch.  For example, "myprogram.exe dag.py" would
		return no command-line options whereas "myprogram.exe -f:dag.py" would contain the 'f' key.
	*	Command-line options may contain any character except new line \n, carriage return \r,
		pipe |, colon :, and double quote ".
	*	When the value of an option is specified more than once, then the value for that option
		is a list of the values.
	'''
	def __init__(self, args):
		self.__params = Dictionary[str, list]()
		
		# The regex needs to parse a single string, not a list
		if args == None:
			self.CommandLine = None
			return
		elif type(args) in (list, tuple, Array) or hasattr(args, 'GetEnumerator'):
			args = ' '.join(str(i) for i in args)
		else:
			args = str(args)

		# Remove any arguments at the beginning which do not start with a switch character
		while args.count(' ') != 0 and args[0] not in ('/', '-'):
			args = args[args.index(' ') + 1:]

		if len(args) == 0:
			return

		self.CommandLine = args
		args += ' '

		# This same regex failed when using the builtin 're' module under IronPython 1.1.
		for m in Regex.Matches(args, "[/-]-?([^\"\\r\\n|:]+?)(?:[:=](\"?)([^\"\\r\\n|]*?)\\2)?\\s+"):
			if m.Success:
				g = m.Groups
				key = g[1].Value.lower()

				if g[3].Success:
					val = g[3].Value
				elif g[2].Success and g[2].Value == '"':
					val = ''
				else:
					val = None

				# Handle pluses or minuses after a specifier
				if val == None:
					if key[-1] == '-':
						val = False
						key = key[:-1]
					elif key[-1] == '+':
						val = True
						key = key[:-1]
					else:
						val = True

				if self.__params.ContainsKey(key):
					self.__params[key].append(val)
				else:
					if type(val) != list:
						val = [val]
					self.__params.Add(key, val)

	@property
	def Keys(self):
		return list(self.__params.Keys)

	def __getitem__(self, name):
		name = str(name)
		if not self.__params.ContainsKey(name):
			return None
		obj = self.__params[name]
		if len(obj) == 0:
			return '' # [''] converted to [], so convert back
		if len(obj) == 1:
			return obj[0]
		return obj

	def __nonzero__(self):
		'''Allows boolean conversion for use such as:
		
		cmds = CmdLine.CmdArgParser(sys.argv)
		if not cmds:
			# No command-line arguments
			sys.exit(1)
		'''
		return self.__params.Count != 0

	def __len__(self):
		'''Returns the number of parsed parameters.'''
		return self.__params.Count

	def __hash__(self):
		'''Allows this instance to be placed into a dictionary.'''
		return self.__params.GetHashCode()

if __name__ == '__main__':
	# Tests the above class
	assert len(CmdArgParser(('/verbose',))) == 1, 'Test 1'  # trailing comma makes this a one-item tuple
	a = CmdArgParser(('/tEst','/teSt:"fancy"', '/tester:""','/go+','/stop-','/safemode'))
	assert a != None, 'Test 2'
	assert len(a) == 5, 'Test 3'
	assert a['test'] != None, 'Test 4'
	assert a['go'] == True, 'Test 5'
	assert a['safemode'] == True, 'Test 6'
	assert a['stop'] == False, 'Test 7'
	assert len(CmdArgParser(('',))) == 0, 'Test 8'  # one-item tuple holding an empty string
	assert len(CmdArgParser('')) == 0, 'Test 9'
	assert len(CmdArgParser('/verbose')) == 1, 'Test 10'
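
The parsing rules listed in the CmdArgParser docstring can be tried outside IronPython with the sketch below. It assumes CPython's built-in re module in place of System.Text.RegularExpressions, reuses the same pattern, and skips the step that strips leading arguments; parse_switches is a hypothetical helper name, not part of CmdLine.py.

```python
import re

# Same pattern as CmdArgParser, written as a raw string for CPython's re module.
SWITCH = re.compile(r'[/-]-?([^"\r\n|:]+?)(?:[:=]("?)([^"\r\n|]*?)\2)?\s+')


def parse_switches(command_line):
    """Return {lower-cased key: [values]} for a single command-line string."""
    params = {}
    # A trailing space is appended so the last switch is terminated by whitespace.
    for m in SWITCH.finditer(command_line + ' '):
        key = m.group(1).lower()
        val = m.group(3)  # None when no ':value' or '=value' part was given
        if val is None:
            # A bare switch is a flag; a trailing '+' or '-' sets it explicitly.
            if key.endswith('-'):
                key, val = key[:-1], False
            elif key.endswith('+'):
                key, val = key[:-1], True
            else:
                val = True
        params.setdefault(key, []).append(val)
    return params
```

For example, parse_switches('/source:"C:\\My Projects" /verbose') maps 'source' to ['C:\\My Projects'] and 'verbose' to [True]; the quotes are what allow the value to contain a space.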

ICSharpCode.SharpZipLib.dll

The #ziplib open source C# library was used to handle zip file creation. "ICSharpCode.SharpZipLib.dll" must be included in the same directory as the above code files. It can be downloaded from icsharpcode.net. More information about how to use the library can be found on the #ziplib wiki.

Output

The example output below shows which actions were taken during the backup process.

Most Recent Backups
   IconExplorer              05/31/2007
   pyInterfaceHelper         06/11/2007
   Utilities                 06/08/2007

IconExplorer backup created on 05/31/2007 is up to date.
pyInterfaceHelper backup created on 06/11/2007 is up to date.
Utilities 2007.06.18.zip created.

Hit enter to exit.
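
Archive names such as "Utilities 2007.06.18.zip" in the output above are what CompilePreviousBackupTimes matches to find the project name for each earlier backup. The sketch below reuses the script's pattern with CPython's re module and datetime instead of System.DateTime; parse_backup_name is a hypothetical helper name used only for illustration.

```python
import re
from datetime import datetime

# Pattern copied from CompilePreviousBackupTimes: '<project> yyyy.MM.dd.zip'
BACKUP_NAME = re.compile(r"([^\\]+)\s(\d{4}.\d{2}.\d{2})\.zip")


def parse_backup_name(path):
    """Return (project, date) for a backup archive path, or None if it does not match."""
    m = BACKUP_NAME.search(path)
    if m is None:
        return None
    stamp = m.group(2)
    # Slice by position, as the script does, so the separator character is not inspected.
    date = datetime(int(stamp[0:4]), int(stamp[5:7]), int(stamp[8:10]))
    return m.group(1), date
```

Because the first group excludes backslashes, a full Windows path yields only the final path component as the project name.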

Usage

The easiest way to use this utility class is by creating a Windows batch file or shortcut.

Shortcut

A sample shortcut may have the following properties set, assuming that the script is placed in the "C:\Program Files\IronPython-1.1\Lib" directory.

Target:
"C:\Program Files\IronPython-1.1\ipy.exe" "C:\Program Files\IronPython-1.1\Lib\BackupFolders.py"

Start In:
"C:\Program Files\IronPython-1.1"

Batch File

The following example batch file, "RunBackup.bat", runs the backup utility.

@echo off

"C:\Program Files\IronPython-1.1\ipy.exe" "C:\Program Files\IronPython-1.1\Lib\BackupFolders.py" /source:"C:\Documents and Settings\dcalafell\My Documents\Visual Studio 2005\Projects" /target:"C:\Documents and Settings\dcalafell\My Documents\Visual Studio 2005" /verbose

@echo.
pause

Configuration

The default configuration backs up the Visual Studio 2005 projects and websites folders for the current user and ignores the 'VSMacros80' project.

The source_directories are the directories which the script walks to create backups.

The destination_directory is the directory where all backups are stored. This could be a network share or removable flash drive.

The skip_directories are the directories to not archive when walking the source directories. For example, these projects may already be part of a source control repository so there is no point in making another backup.

When the verbose option is True the script will print more information.

By default the utility keeps at most one backup per project per day, because the default archive name contains only the date. By passing a different target_format to pyFolderBackup.__init__, you can make the backups as granular as a millisecond. More frequent backups give finer-grained recovery but need more storage, so choose a granularity that balances the two.
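
As a rough illustration of how the archive name controls backup granularity, the sketch below builds a target_format template with a day or minute timestamp. It assumes CPython's datetime.strftime in place of DateTime.Now.ToString; make_target_format and STAMP_PATTERNS are hypothetical names. Note that CompilePreviousBackupTimes, as listed, only recognizes names that end in a yyyy.MM.dd date followed by ".zip", so archives named with another pattern would not be picked up as previous backups.

```python
import os
from datetime import datetime

# Hypothetical mapping from a granularity name to a timestamp pattern.
STAMP_PATTERNS = {
    'day': '%Y.%m.%d',
    'minute': '%Y.%m.%d %H.%M',
}


def make_target_format(destination, granularity='day', now=None):
    """Build an archive name template with exactly one '%s' for the project name."""
    now = now or datetime.now()
    stamp = now.strftime(STAMP_PATTERNS[granularity])
    # The '%s' placeholder is filled in later with the project (folder) name.
    return os.path.join(destination, '%s ' + stamp + '.zip')
```

The result passes the same check that pyFolderBackup.__init__ applies, namely that splitting the template on '%s' yields exactly two parts.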

Notes / Limitations

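The utility decides whether a folder has changed by comparing the newest file modification time in that folder with the creation time stored on its latest backup archive, so it assumes that this timestamp has not been altered since the backup was made. It has not been tested on Mono.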

Version History

1.0	First publicly-available version -
	Fixed bugs and changed to an IronPython version of CmdLine.

0.9	Changed the CompilePreviousBackupTimes method to look at the
	creation time for the backup archive instead of parsing the
	file name with a regular expression.  The __CreateBackup method
	signature changed as part of a fix for when the archive file name
	already existed and the creation date was not modified.  Added
	more validation to pyFolderBackup.__init__.

0.6	Encapsulated all functionality into a class declaration while
	maintaining command-line execution functionality.

0.5	Initial version

