The Scan Emit Loop

The core of data-migrator is the declarative definition of the target model, in a Django-esque way. Columns of the target table are defined as fields, and each field has many settings. A Field defines what to do when scanning, transforming and emitting a record. Output is abstracted into an extensible set of output writers, called emitters. The whole process is controlled by a standard transformer engine.

The scan-emit loop is the basis of data-migrator. Once everything is set up, the transformer will by default read stdin and send every CSV row to the model for scanning. Out of the box, the fields define a scan loop:

  1. select the specified column from the row.
  2. run the nullable test (is a null allowed here?) and replace a null value by None.
  3. validate the input (if validator is provided).
  4. parse the input (if parser is provided).
  5. store as a native Python value (e.g. NULL => None).
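
The five scan steps above can be sketched in plain Python. This is a minimal illustration and not the data-migrator implementation: `SketchField`, `scan_rows`, and the parameters `null` and `allow_null` are hypothetical names, and treating a disallowed null as an error is an assumption about how the nullable test behaves.

```python
import csv
import io


class SketchField:
    """Hypothetical stand-in for a field; not the data-migrator class."""

    def __init__(self, pos, null="NULL", allow_null=True,
                 validate=None, parse=None):
        self.pos = pos                # column index in the CSV row
        self.null = null              # input string that denotes a null
        self.allow_null = allow_null  # assumption: disallowed nulls raise
        self.validate = validate      # optional callable returning a bool
        self.parse = parse            # optional callable converting the value

    def scan(self, row):
        value = row[self.pos]                    # 1. select the column
        if value == self.null:                   # 2. nullable test
            if not self.allow_null:
                raise ValueError("null not allowed in column %d" % self.pos)
            return None                          # 5. NULL => None
        if self.validate and not self.validate(value):
            raise ValueError("invalid input: %r" % value)   # 3. validate
        if self.parse:
            value = self.parse(value)            # 4. parse
        return value                             # 5. native Python value


def scan_rows(text, fields):
    """Scan every CSV row in `text` into a dict of native values."""
    reader = csv.reader(io.StringIO(text))
    return [{name: field.scan(row) for name, field in fields.items()}
            for row in reader]


fields = {
    "id": SketchField(pos=0, validate=str.isdigit, parse=int,
                      allow_null=False),
    "name": SketchField(pos=1),
}
records = scan_rows("1,alice\n2,NULL\n", fields)
```

In this sketch the second row keeps its `id` as the integer 2 while the `NULL` in the name column becomes `None`; a `NULL` in the `id` column would raise instead.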

Once all fields are parsed, the resulting object can be checked for None values or uniqueness. On a violation, the record can be dropped or the filter can fail. These are all declarative settings on the Model, made through its Meta settings. Otherwise the record is saved (and accessible through Model.objects.all()) and then emitted. Emitting is done by a dedicated emitter, such as the MySQL INSERT statement generator, and provides some of the following features:

  1. trim if string and max_length is set (note the full string is stored in the intermediate object!).
  2. validate the output (if output_validate is provided).
  3. replace the value with some output string (if provided).
  4. anonymize the value (added to the output as of version 0.6.0).
  5. write in a dedicated format as dictated by the emitter.
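
The emit steps above can be sketched in the same way. Again this is only an illustration under stated assumptions: `emit_value`, `sql_literal`, and `emit_insert` are hypothetical helpers, the replacement is modelled as a callable, and the SQL quoting is deliberately simplistic. It does not reproduce the MySQL emitter shipped with data-migrator.

```python
import hashlib


def emit_value(value, max_length=None, output_validate=None,
               replacement=None, anonymize=None):
    """Prepare one stored value for output, following the listed order."""
    if isinstance(value, str) and max_length is not None:
        value = value[:max_length]               # 1. trim strings only
    if output_validate and not output_validate(value):
        raise ValueError("invalid output: %r" % (value,))   # 2. validate
    if replacement is not None:
        value = replacement(value)               # 3. replace (assumed callable)
    if anonymize is not None and value is not None:
        value = anonymize(value)                 # 4. anonymize
    return value


def sql_literal(value):
    """Render a Python value as a simple SQL literal (illustrative only)."""
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def emit_insert(table, record, settings):
    """5. write the record in the emitter's format, here an INSERT."""
    columns = list(record)
    values = [sql_literal(emit_value(record[c], **settings.get(c, {})))
              for c in columns]
    return "INSERT INTO %s (%s) VALUES (%s);" % (
        table, ", ".join(columns), ", ".join(values))


def hash_email(value):
    """Hypothetical anonymizer: replace an address by a short digest."""
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()[:8]
    return digest + "@example.invalid"


record = {"id": 1, "name": "alice cooper", "email": "alice@example.com"}
settings = {
    "name": {"max_length": 5},
    "email": {"anonymize": hash_email},
}
statement = emit_insert("people", record, settings)
```

Trimming and anonymizing happen only on the way out: `record` itself still holds the full name and the original address after `emit_insert` has run, which matches the note in step 1 that the intermediate object keeps the full string.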