In an earlier article, “Create checkpoints using Ruby and MD5,” I wrote about using the MD5 function in Ruby to create a checkpoint for all files in a directory.

The next step is to use that initial checkpoint file as a baseline for finding every change that has happened since.

Here is a simple Ruby script to compare current checksums against an initial checksum hash file.

Side Note: This article was written in 2008 so some of the references may be out-of-date.

This particular example shows the basics of processing all the files in a directory and all the subdirectories underneath. In this case, my subdirectory is “out” and I am only processing files with a “.html” extension.
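
As a small illustration of the recursive pattern, the sketch below builds a throwaway directory tree and lists only its “.html” files. The helper name html_files and the sample file names are made up for this example; only the glob pattern mirrors the one used in the script.

```ruby
require 'tmpdir'
require 'fileutils'

# Return every .html file under root, at any depth, in sorted order.
# html_files is a hypothetical helper name, not part of the original script.
def html_files(root)
  Dir.glob(File.join(root, '**', '*.html')).sort
end

Dir.mktmpdir do |tmp|
  # Build a tiny sample tree: two .html files and one file to be ignored.
  FileUtils.mkdir_p(File.join(tmp, 'out', 'docs'))
  File.write(File.join(tmp, 'out', 'index.html'), '<p>home</p>')
  File.write(File.join(tmp, 'out', 'docs', 'page.html'), '<p>page</p>')
  File.write(File.join(tmp, 'out', 'notes.txt'), 'skipped')

  # Run the glob relative to the temp directory, as the script does with "out".
  Dir.chdir(tmp) { puts html_files('out') }
end
```

The `**` segment matches zero or more directory levels, so files directly inside “out” are picked up along with those in nested subdirectories, while notes.txt is skipped.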

Note that the MD5 checksums are loaded into hashes keyed by file name. The keys are then sorted and compared to find what has been added, what has changed, and what has been removed.
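
The comparison itself can be sketched in a few lines using array operations on the hash keys. This is only an illustration of the idea, not the script's exact code: the helper name diff_checksums and the sample paths and checksum strings are invented for the example.

```ruby
# Classify paths by comparing two { path => md5 } hashes.
# diff_checksums is a hypothetical helper name used only for this sketch.
def diff_checksums(old_sums, new_sums)
  # Present now but not in the checkpoint.
  added = (new_sums.keys - old_sums.keys).sort
  # In the checkpoint but no longer present.
  removed = (old_sums.keys - new_sums.keys).sort
  # Present in both, with a different checksum.
  changed = (new_sums.keys & old_sums.keys)
            .select { |path| new_sums[path] != old_sums[path] }
            .sort
  { added: added, changed: changed, removed: removed }
end

# Made-up sample data; real values would be 32-character MD5 hex digests.
old_sums = { 'out/a.html' => 'aaa', 'out/b.html' => 'bbb', 'out/c.html' => 'ccc' }
new_sums = { 'out/a.html' => 'aaa', 'out/b.html' => 'b2b', 'out/d.html' => 'ddd' }

p diff_checksums(old_sums, new_sums)
```

With this sample data, out/d.html is reported as added, out/b.html as changed, and out/c.html as removed, while the unchanged out/a.html appears in none of the lists.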

checkpoint_changes.rb

#!/usr/bin/ruby -w
require 'digest/md5'

md5FileHash = {}
oldFileHash = {}

#----------------------------------------------------------------------
# Pass 0 - Get file list
#----------------------------------------------------------------------

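theFileList = []
Dir['out/**/*.html'].each do |fnn|
  # Names are lowercased so they match the checkpoint keys; this assumes a
  # case-insensitive file system, since the lowercased name is read later.
  theFileList << fnn.downcase
end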

#----------------------------------------------------------------------
# Pass 1 - calc md5
#----------------------------------------------------------------------

theFileList.each do |f|
  digest = Digest::MD5.hexdigest(File.read(f))
  md5FileHash[f] = digest
end

#----------------------------------------------------------------------
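# Pass 2 - read old hash from file
#----------------------------------------------------------------------

File.foreach("sync.dat") do |line|
  line = line.chomp      # strip only the newline; chop could eat a real character
  md5x = line[0, 32]     # first 32 characters: the MD5 hex digest
  filx = line[32..-1]    # remainder of the line: the file name
  next if filx.nil? || filx.empty?   # skip blank or malformed lines
  oldFileHash[filx] = md5x
end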


#----------------------------------------------------------------------
# Pass 3a - look for additions 
#         ( preferred upload order is additions, changes, then deletes)
#----------------------------------------------------------------------

md5FileHash.keys.sort.each do |ddd|
  # A file with no entry in the old checkpoint is new.
  puts "++ #{ddd}" unless oldFileHash.key?(ddd)
end

#----------------------------------------------------------------------
# Pass 3b - look for changes ( preferred upload order)
#----------------------------------------------------------------------

md5FileHash.keys.sort.each do |ddd|
  md5old = oldFileHash[ddd]
  md5new = md5FileHash[ddd]
  # Report only files that existed before and whose checksum differs now.
  puts "<> #{ddd}" if !md5old.nil? && md5new != md5old
end

#----------------------------------------------------------------------
# Pass 3c - look for deletes
#----------------------------------------------------------------------

oldFileHash.keys.sort.each do |ddd|
  # A checkpoint entry with no matching current file has been deleted.
  puts "-- #{ddd}" unless md5FileHash.key?(ddd)
end

puts "DIFF complete."
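
The script's parsing step implies a particular line layout for sync.dat: a 32-character MD5 hex digest immediately followed by the file name, with no separator. That layout is inferred from the `line[0,32]` and `line[32,255]` slices above, not from a published spec, so treat the sketch below as an assumption. The helper names format_checkpoint_line and parse_checkpoint_line are hypothetical.

```ruby
require 'digest/md5'

MD5_HEX_LENGTH = 32 # an MD5 hex digest is always 32 characters

# Build one checkpoint line: digest immediately followed by the path.
# (Assumed layout, inferred from how the script slices each line.)
def format_checkpoint_line(path, digest)
  "#{digest}#{path}"
end

# Split a checkpoint line into [path, digest]; return nil if it is too short.
def parse_checkpoint_line(line)
  line = line.chomp
  return nil if line.length <= MD5_HEX_LENGTH
  [line[MD5_HEX_LENGTH..-1], line[0, MD5_HEX_LENGTH]]
end

digest = Digest::MD5.hexdigest('<p>hello</p>')
line = format_checkpoint_line('out/index.html', digest)
path, parsed = parse_checkpoint_line(line + "\n")
puts path             # the file name survives the round trip
puts parsed == digest # so does the digest
```

Because the digest has a fixed width, no delimiter is needed and file names containing spaces parse without any special handling.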

by Brad Trupp © 2008