TL;DR This isn’t a post about The Kaiser Chiefs or even music in general – it’s actually a post about using Ruby as a scripting language, specifically for processing CSV files in bulk.

I wouldn’t call myself a Ruby programmer. I’d like to, but the reality is that Ruby is a language I use for side projects – mainly small Rails applications. But I’ve long been conscious that it’s also a pretty handy scripting tool to hang from your belt and recently I found myself using it in that manner.

I’ve posted before about the kids sign-in application that I wrote for church. It’s a WPF (Windows desktop) application that runs off a CSV data file that is refreshed each week. It’s never been enhanced to post the sign-in data back to a central service (that’s a feature I’m supposed to be working on at the moment as it happens) so each week we end up with multiple CSV files that have very rudimentary data about which kids signed in to which rooms at what times. Occasionally we might take a look at those files, but generally we don’t.

But recently the pastor who oversees our ministry wanted some statistics about the spread of sign-in times across our two services and I found that I needed to collate a large number of those CSV files into a single spreadsheet to make it easy for him to analyse the data himself. And that led me to write the following simple Ruby script one Sunday afternoon.

require 'csv'

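CSV.open('combined.csv', 'w') do |csv|
  # Header row for the combined file
  csv << ['Id', 'First', 'Last', 'Room', 'SignedInAt', 'IsNewcomer']
  # Every CSV in this folder and its subfolders, except the output file itself
  Dir['**/*.csv'].reject { |f| f['combined.csv'] }.each do |file|
    CSV.foreach(file, headers: true) do |row|
      # Keep only the rows that have a sign-in timestamp
      csv.puts row if row['SignedInAt']
    end
  end
end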

I know it's not particularly robust code.

And I know the code makes a lot of assumptions about the data.

But neither of those things is an issue for me ... because this is just a script that I use to work on data that I understand and control.

Here's what it does:

  • It "requires" the csv library - an extremely useful piece of code that ships with Ruby's standard library (it isn't part of the core, which is why the require is needed)
  • It opens an output file called "combined.csv"
  • It writes a header row to that output file
  • It iterates through all the csv files in the current folder and its subfolders, ignoring the output file it has just created
  • For each of those input files, it outputs all the rows where the "SignedInAt" column contains a value (the timestamp of an individual sign-in record)
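
If you wanted to reuse those steps rather than run them once, they could be wrapped in a small method along the lines below. This is only a sketch, not the script I actually ran: the method name combine_csvs and its output_name and required_column parameters are made up for illustration, it copies the header from the first input file instead of hard-coding it, and it treats a blank "SignedInAt" value the same as a missing one. It still assumes every input file has the same columns in the same order.

```ruby
require 'csv'

# Hypothetical helper (not part of the original script): combines every CSV
# under `root` into `root/output_name`, keeping only the rows that have a
# value in `required_column`. Returns the number of data rows written.
# Assumption: all input files share the same column layout.
def combine_csvs(root, output_name: 'combined.csv', required_column: 'SignedInAt')
  # List the inputs up front and skip any file with the output's name,
  # so a combined file left over from an earlier run is never read back in.
  inputs = Dir.glob('**/*.csv', base: root)
              .sort
              .reject { |f| File.basename(f) == output_name }
              .map { |f| File.join(root, f) }

  written = 0
  CSV.open(File.join(root, output_name), 'w') do |out|
    header_written = false
    inputs.each do |file|
      CSV.foreach(file, headers: true) do |row|
        # Copy the header once, from the first data row we see
        unless header_written
          out << row.headers
          header_written = true
        end
        value = row[required_column]
        # Skip rows with no sign-in timestamp (missing or blank)
        next if value.nil? || value.strip.empty?
        out << row
        written += 1
      end
    end
  end
  written
end
```

Calling combine_csvs('.') from the folder that holds the weekly files would then do roughly what the script above does, and the return value gives a quick sanity check on how many sign-in rows ended up in the combined file.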

I had almost a hundred individual CSV files to pull together, and this script did the job beautifully.

So if you thought Ruby just happened to be the first word in the name of a well-known web application framework - think again. It's a really handy scripting language as well!