Example

Core of data-migrator is the unix pipe and filter paradigm to build data transformers. Source data is read from a database or some other source. It is piped to a filter written with data-migrator, which emits for example SQL INSERT statements. These can be piped to a target client and thereby loaded into the target database.

Consider the most basic datapump in mysql, which is direct pipe between source and target database:

$ mysqldump -u [uname] -p[pass] source_db table  | mysql target_db

In this example mysqldump will export the table as SQL statements and the new database will process them. Now if you want to do something extra and repeatable with respect to the data, you could use all kinds of unix filtering with sed, awk, or your favorite scripting. It is not very hard to imagine what Pythonista’s would do especially if extra transformations or simplifications are needed. The basic packages are quite expressive and one would easily setup something like:

$ mysql source_db -E 'select * from table' -B  | python my_filter.py | mysql target_db

With my_filter.py a basic implementation of csv:

import sys, csv

reader = csv.DictReader(sys.stdin)

for row in reader:
  print 'INSERT INTO `table` (a,b) VALUES ("%(a)s", %(b)s)' % row

To see the options for manipulation is left as an exercise to the reader, but do accept that as soon things become just a little more complex (think: splitting in two tables, column reverses, renaming of columns, mixing, joining, filtering, transforming), a more declarative support is helpful. That is why we came up with data-migrator. One could simply replace the transformation with:

from data_migrator import models, transform
from data_migrator.emitters import MySQLEmitter

def parse_b(v):
  if v == 'B':
    return 'transformed_B'
  else:
    return v.lower()

class Result(models.Model):
  id   = models.IntField(pos=0) # keep id
  uuid = models.UUIDField()     # generate new uuid4 field
  # replace NULLs and trim
  a    = models.StringField(pos=1, default='NO_NULL', max_length=5, nullable='NULL', replacement=lambda x:x.upper())
  # parse this field
  b    = models.StringField(pos=2, parse=parse_b, name='my_b')

  class Meta:
    table_name = 'new_table_name'

# django-esc like creating and saving of additional records (to a manager)
Result(a='my a', b='my b').object.save()

if __name__ == "__main__":
  transform.Transformer(models=[Result], emitter=MySQLEmitter).process()

  assert(len(Result.objects) > 1)

Hereby write a nice self explaining transformer, which will generate something like:

-- transformation for Result to table new_table_name
-- input headers: id,a,b
-- stats: in=10,dropped=0,out=10

SET SQL_SAFE_UPDATES = 0; -- you need this to delete without WHERE clause
DELETE FROM `new_table_name`;
ALTER TABLE `new_table_name` AUTO_INCREMENT = 1;

INSERT INTO `new_table_name` (`id`, `uuid`, `a`, `my_b`) VALUES (0, "ac7100b9-c9ad-4069-8ca5-8db1ebd36fa3", "MY A", "my b");
INSERT INTO `new_table_name` (`id`, `uuid`, `a`, `my_b`) VALUES (1, "38211712-0eb2-4433-b28f-e3fe33492e7a", "NO_NULL", "some value");
INSERT INTO `new_table_name` (`id`, `uuid`, `a`, `my_b`) VALUES (2, "a3478903-aed9-462c-8f47-7a89013bc6ea", "CHOPP", "transformed_B");
...

If you can see the value of this package, continue with the installation of data-migrator. After which there are some more pages in the Tutorial for you to get a grasp of this library.