Friday, February 10, 2012

[Ruby][Rake] Importing task is taking too long?

When working with a big data set, you may need to import a very large number of objects from CSV into your SQL database. And if your CSV holds hundreds of thousands of rows for a particular model, the import can take too long and use too much memory...
It means that during your coding you were too optimistic about the size of the data to import.
So, what is the solution?
I actually found two solutions to this problem: use the Crewait gem, or make better use of transactions.



The Crewait gem, as stated on its home page, is very easy to use:
Crewait.start_waiting

1.upto 1_000_000 do |index|
  Product.crewait(:name => "Product ##{index}")
end

Crewait.go!

OK, but where is the trick?
Looking at the (only) documentation, I found this:
...Crewait.go! sends the data in the waiting area off to their different tables in the minimal number of INSERT statements, without creating any ActiveRecord objects at all.
And looking at the import_to_sql method in the crewait.rb file, you will find out why: it stores a plain SQL INSERT statement for each object you "crewait".
But when you crewait an object you don't get any validation of the Ruby object, just a plain string built from your object's data... and sometimes that is not enough.
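To see the batching idea behind that quote, here is a simplified pure-Ruby sketch. The bulk_insert_sql helper is hypothetical, not Crewait's actual internals: it just shows how collecting attribute hashes lets you emit one multi-row INSERT instead of a million single-row ones.

```ruby
# Hypothetical helper sketching the idea behind Crewait's import_to_sql:
# collect attribute hashes, then build a single multi-row INSERT.
# NOTE: no escaping is done here; real code must quote/escape values.
def bulk_insert_sql(table, rows)
  return nil if rows.empty?
  columns = rows.first.keys
  values = rows.map do |row|
    "(" + columns.map { |c| "'#{row[c]}'" }.join(", ") + ")"
  end
  "INSERT INTO #{table} (#{columns.join(', ')}) VALUES #{values.join(', ')}"
end

rows = (1..3).map { |i| { :name => "Product ##{i}" } }
puts bulk_insert_sql("products", rows)
# => INSERT INTO products (name) VALUES ('Product #1'), ('Product #2'), ('Product #3')
```

One statement per table is exactly why the gem is fast: the database parses and plans the INSERT once instead of once per row.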

That's why I moved to a simpler "transaction" approach.
The idea: you don't know the size of your data, so start a transaction and save all the data at once.
So, let's write a few lines of code:
Product.transaction do
  1.upto 1_000_000 do |index|
    Product.create(:name => "Product ##{index}")
  end
end
A cleaner, simpler and "lighter" solution.
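Since the original problem was importing from CSV, the same pattern can be combined with Ruby's standard CSV library. This is a sketch under assumptions: the Product model, the file layout (a "name" header), and the data below are all illustrative. The ActiveRecord part is shown as comments so the CSV streaming stays in focus.

```ruby
require 'csv'

# Illustrative CSV data; in practice you would use
# CSV.foreach("products.csv", :headers => true) to stream the file
# row by row instead of loading it all into memory.
csv_data = "name\nProduct #1\nProduct #2\nProduct #3\n"

names = CSV.parse(csv_data, :headers => true).map { |row| row["name"] }

# With ActiveRecord you would wrap the creates in a single
# transaction, exactly as above (Product is an assumption):
#
# Product.transaction do
#   names.each { |name| Product.create(:name => name) }
# end

puts names.inspect
# => ["Product #1", "Product #2", "Product #3"]
```

Streaming the file with CSV.foreach keeps memory flat even for huge files, while the surrounding transaction keeps the database round-trips cheap.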
