How to speed up CSV imports in Rails

I’ve been trying to find a decent method to import a big CSV for an import Rake task. How big is a big CSV? In this case it’s 2,550,321 lines, weighing in at a reasonably hefty 796MB.
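If you want to experiment with these approaches without the real data, a throwaway CSV of any size can be generated with Ruby's stdlib. This is just a sketch: the column names, file name and row count are all made up:

```ruby
require "csv"

# Write a sample CSV with a header row; bump the count to taste.
CSV.open("sample.csv", "w") do |csv|
  csv << %w[id name email]
  1_000.times do |i|
    csv << [i, "user#{i}", "user#{i}@example.com"]
  end
end
```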

The first approach was doomed as it was typed:

Approach 1

# Zzzzzz
CSV.foreach(file_path, headers: true) do |row|
  # Model.create() or whatever you need to do
end

This was really slow. Every record is being created in the database with an individual transaction. After a short period observing the snail I’d synthesised, it was time to quit it.

Approach 2

# Memory hog
rows = CSV.read(file_path, headers: true)

rows.each_slice(1000) do |slice|
  ActiveRecord::Base.transaction do
    slice.each do |row|
      # Model.create()
    end
  end
end

My second approach was to read the whole CSV into memory and split the rows into chunks of 1,000. Then I’d wrap the creation of each batch of 1,000 database records in an ActiveRecord transaction.

This didn’t work because it was a complete memory hog. The process swelled to about 3.10GB just attempting to load the file and I ended up killing it.
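A middle ground I didn’t try at the time: CSV.foreach without a block returns an Enumerator, so rows stream from disk one at a time but can still be batched with each_slice. A sketch (the demo file is generated inline so it runs standalone; the transaction part is left as a comment because it needs a real model):

```ruby
require "csv"

# Create a tiny demo file so the sketch is self-contained.
File.write("demo.csv", "id,name\n" + (1..2500).map { |i| "#{i},user#{i}" }.join("\n") + "\n")

batches = 0
# foreach without a block returns an Enumerator: rows stream from disk,
# so memory stays flat even for huge files, and each_slice batches them.
CSV.foreach("demo.csv", headers: true).each_slice(1000) do |slice|
  batches += 1
  # In the real task, wrap each batch in one transaction:
  # ActiveRecord::Base.transaction do
  #   slice.each { |row| Model.create!(row.to_h) }
  # end
end
```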

Approach 3

# Faster
SmarterCSV.process(file_path, {:chunk_size => 1000}) do |chunk|
  ActiveRecord::Base.transaction do
    chunk.each do |row|
      # Model.create()
    end
  end
end

After some quick google-fu I discovered SmarterCSV, a gem geared towards processing large CSV files. The handy feature I needed from SmarterCSV was the ability to process the file in chunks of rows at a time, without all the pain of loading the entire file into memory first (like approach #2).

This approach ran fine but still took about an hour to process (still dramatically faster than approach #1).

Approach 4

# Faster again
SmarterCSV.process(file_path, {:chunk_size => 1000}) do |chunk|
  values = []

  chunk.each do |row|
    # Do your processing
    values.push("(#{y_value}, #{z_value})")
  end

  sql = "INSERT INTO x (y, z) VALUES #{values.join(", ")}"
  ActiveRecord::Base.connection.execute(sql)
end


It seems that using an ActiveRecord transaction doesn’t actually produce a single mass INSERT statement; each row still gets its own INSERT. After google-fu round 2 I decided to try mass SQL inserts as detailed in the blog post ‘Mass inserting data in Rails without killing your performance’.

This raw SQL approach has caveats:

  1. You need to trust the source of your data.
  2. You have to handle validations yourself because there’s no ActiveRecord goodness here.

This approach was quicker still, completing the import in about 25 minutes.
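On caveat 1: to give a flavour of the escaping involved, here’s a minimal sketch. The sql_quote helper and the users table are made up for illustration; in a real app you’d use ActiveRecord::Base.connection.quote rather than rolling your own:

```ruby
# Hypothetical helper: double up single quotes for a SQL string literal.
# Prefer ActiveRecord::Base.connection.quote in real code.
def sql_quote(value)
  "'#{value.to_s.gsub("'", "''")}'"
end

rows = [["Ada", "ada@example.com"], ["O'Brien", "obrien@example.com"]]
values = rows.map { |name, email| "(#{sql_quote(name)}, #{sql_quote(email)})" }
sql = "INSERT INTO users (name, email) VALUES #{values.join(', ')}"
```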

Ruby IDE time

Today I’m getting some practice in with RubyMine - an IDE for Ruby & Rails development.

My initial reaction is that it’s a bit sluggish on a new-ish machine, especially in comparison to simpler, lightweight text editors like Sublime Text. Plus there’s the inevitable pain of not knowing the keyboard shortcuts (well, CMD+S still works).

On the plus side I like what I’ve seen so far with test running, version control & terminal access all baked into the app.

The demo screenshots also promise some fancy file difference comparison views and model dependency diagrams. Swish.

Checking out Backbone.js

For my final WDI project I made a cheat sheet editor, using Ember.js for the editor itself. The original intention was to use Ember across the entire site. I was using the Ember data DS.RESTAdapter to sync with my Rails app but ran into trouble trying to adjust the API endpoint for particular queries… cue the pain.

Reading through the RESTAdapter documentation was fruitless - it appears this use case isn’t covered yet (to be fair, the RESTAdapter is currently described as alpha-quality on GitHub). I was faced with writing a custom adapter, but felt too short on time to embark on a rewrite of the rest of the project.

<grumble data-about='ember'>

Here lies my grumble with Ember. There’s too much magic and there are lots of features still under development. If you’re still learning the ropes of Ember, behind the magic lies a world of pain as soon as you need to alter something.

Ember’s magic lets you skip learning to crawl and walk in the world of JavaScript MV*, graduating straight to running instead. But as a beginner, still on the steepest part of Ember’s lengthy learning curve, you can stray from the well trodden tutorial-ed path and become very lost.

I have learnt more about Ember during the project, but I want to investigate Backbone.js because it looks like you need to build more of your app from the ground up. The ground-up approach should make it less likely that you outcode your current understanding of the framework. 

At least you get to build the forest that you end up getting lost in and might recognise a few trees.

On to Backbone.js…

I’m working through Addy Osmani’s book on Developing Backbone.js Applications (it’s free to read online) so I’m looking forward to understanding how the frameworks differ.


CNAME record for an Amazon S3 bucket

To serve static assets for a project I set up a CNAME record pointing to an S3 bucket. After the changes propagated I kept on running into this error:

NoSuchBucket The specified bucket does not exist

NoSuchBucket error? Odd. The bucket was accessible directly at but didn’t work via the subdomain I had set up.

After Googling around, it turns out the bucket name itself needs to match the URL that the content will be requested on:

So to serve assets from you’d need an S3 bucket with that name and then point the CNAME record to You’ll need to vary the endpoint depending on the region of your S3 bucket.
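As a concrete (and entirely hypothetical) example with made-up names: to serve assets from assets.example.com, you’d create a bucket named assets.example.com and add a DNS record along these lines, swapping the endpoint for your bucket’s region:

```text
; CNAME pointing the subdomain at the matching-named bucket's S3 endpoint
assets.example.com.  3600  IN  CNAME  assets.example.com.s3.amazonaws.com.
```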