DataShifter

Rake-backed data migrations ("shifts") for Rails apps, with dry run by default, progress output, and a consistent summary. Define shift classes in lib/data_shifts/*.rb; run them as rake data:shift:<task_name>.

Installation

# Gemfile
gem "data_shifter"

bundle install

No extra setup in a Rails app: the railtie registers the generator and defines rake tasks by scanning lib/data_shifts/*.rb.

Quickstart

Generate a shift (optionally scoped to a model):

bin/rails generate data_shift backfill_foo
bin/rails generate data_shift backfill_users --model User

Add your logic to the generated file in lib/data_shifts/.

Run it:

rake data:shift:backfill_foo
COMMIT=1 rake data:shift:backfill_foo

Defining a shift

Collection-based shifts (typical)

For systemic migrations across many records, implement:

collection: an ActiveRecord::Relation (uses find_each) or an Array/Enumerable
process_record(record): applies the change for one record

module DataShifts
  class BackfillCanceledById < DataShifter::Shift
    description "Backfill canceled_by_id"

    def collection
      Bar.where(canceled_by_id: nil).where.not(canceled_at: nil)
    end

    def process_record(bar)
      bar.update!(canceled_by_id: bar.company.primary_contact_id)
    end
  end
end

Task-based shifts (targeted, one-off changes)

For targeted changes to specific records (e.g. fixing a bug for particular IDs), use task blocks instead:

module DataShifts
  class FixOrderDiscrepancies < DataShifter::Shift
    description "Fix order #1234 shipping and billing issues"

    task "Correct shipping address" do
      order.update!(shipping_address: "123 Main St")
    end

    task "Apply missing discount" do
      order.update!(discount_cents: 500)
    end

    private

    def order
      @order ||= Order.find(1234)
    end
  end
end

Task blocks run in the context of the shift instance, so they have access to private helper methods, dry_run?, log, skip!, find_exactly!, and any other instance methods you define. Use private methods to DRY up shared lookups across tasks.

Task blocks:

Run in sequence within the same lifecycle (transaction, dry run protection, summary)
Default to a single transaction (all tasks commit or roll back together); use transaction :per_record to give each task its own transaction
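
To make the "run in the context of the shift instance" point concrete, here is a minimal plain-Ruby sketch of how such a task DSL could work. MiniShift, FixOrder, and snapshot are hypothetical stand-ins, not DataShifter's actual implementation; the sketch only assumes that blocks are stored at class level and later evaluated with instance_exec.

```ruby
# Hypothetical stand-in for a shift base class; not DataShifter's real code.
class MiniShift
  # Class-level registry of [label, block] pairs, in definition order.
  def self.tasks
    @tasks ||= []
  end

  def self.task(label, &block)
    tasks << [label, block]
  end

  # Runs each block with `self` set to the shift instance, so private
  # helpers defined on the subclass are callable from inside the block.
  def run
    self.class.tasks.map do |label, block|
      instance_exec(&block)
      label
    end
  end
end

class FixOrder < MiniShift
  task("Correct shipping address") { order[:shipping_address] = "123 Main St" }
  task("Apply missing discount")   { order[:discount_cents] = 500 }

  # Public view of the state, used here only to inspect the result.
  def snapshot
    order.dup
  end

  private

  # Memoized lookup shared by both tasks (a Hash stands in for a model).
  def order
    @order ||= { id: 1234 }
  end
end
```

Because both blocks share one instance, the memoized private helper is looked up once and reused by every task.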

Generate a task-based shift with:

bin/rails generate data_shift fix_order_1234 --task

Dry run vs commit

Shifts run in dry run mode by default. DB changes are always rolled back in dry run mode, regardless of transaction setting.

Dry run (default): rake data:shift:backfill_foo
Commit: COMMIT=1 rake data:shift:backfill_foo
- (COMMIT=true or DRY_RUN=false also commit)
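
The mode selection above can be pictured as a small predicate. This is a sketch under assumptions: the README only states that COMMIT=1, COMMIT=true, and DRY_RUN=false commit. How DataShifter treats other values or conflicting variables is not stated, so the hypothetical commit_mode? below treats everything else as a dry run.

```ruby
# Values treated as "commit" come from this README: COMMIT=1, COMMIT=true,
# DRY_RUN=false. Anything else is ASSUMED to stay a dry run.
def commit_mode?(env)
  %w[1 true].include?(env["COMMIT"]) || env["DRY_RUN"] == "false"
end

# Mirrors the DRY RUN vs LIVE wording used in the run header.
def mode_label(env)
  commit_mode?(env) ? "LIVE" : "DRY RUN"
end

puts mode_label({ "COMMIT" => "1" }) # prints LIVE
puts mode_label({})                  # prints DRY RUN
```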

Automatic side-effect guards (dry run)

In dry run mode, DataShifter automatically blocks or fakes these side effects so unguarded code is less likely to hit the network or send mail/jobs:

| Service | Behavior in dry run |
| --- | --- |
| HTTP | Blocked via WebMock (`disable_net_connect!`). Allow specific hosts with `allow_external_requests [...]` or `DataShifter.config.allow_external_requests`. |
| ActionMailer | `perform_deliveries = false` (restored after run). |
| ActiveJob | Queue adapter set to `:test` (restored after run). |
| Sidekiq | `Sidekiq::Testing.fake!` (restored with `disable!` after run). Only applied if `Sidekiq::Testing` is already loaded. |
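
The "restored after run" behavior can be illustrated with the usual save, override, and restore-in-ensure pattern. This is a generic Ruby sketch, not DataShifter's code: FakeMailerConfig and with_deliveries_disabled are hypothetical names standing in for whatever setting is being guarded.

```ruby
# Hypothetical settings object standing in for e.g. a mailer config.
FakeMailerConfig = Struct.new(:perform_deliveries)

# Overrides the setting for the duration of the block and restores the
# previous value in `ensure`, even when the block raises.
def with_deliveries_disabled(config)
  previous = config.perform_deliveries
  config.perform_deliveries = false
  yield
ensure
  config.perform_deliveries = previous
end
```

Restoring in ensure is what keeps a failed shift from leaving the process in dry-run configuration.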

Guarding other side effects: For anything DataShifter does not cover (e.g. another service, or allowed HTTP requests that mutate data), guard it explicitly in your shift, for example with return if dry_run?. DB changes are always rolled back in dry run, so only non-DB side effects need this guard.

To allow HTTP to specific hosts during dry run (e.g. a migration that must call an API to compute values), use the per-shift DSL or the global config. Note that it is your responsibility to make only read-only requests in dry run mode:

# Per shift
module DataShifts
  class BackfillFromApi < DataShifter::Shift
    allow_external_requests ["api.readonly.example.com", %r{\.internal\.company\z}]
    # ...
  end
end

# Global (e.g. in config/initializers/data_shifter.rb)
DataShifter.configure do |config|
  config.allow_external_requests = ["api.readonly.example.com"]
end

Allowed hosts are combined (per-shift + global). Restoration of the WebMock, mail, and job settings happens in an ensure block, so later code and other specs are unaffected.
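
As a rough illustration of combining the two allowlists, the sketch below merges per-shift and global rules and checks a host against them. The exact matching semantics are delegated to WebMock in practice, so host_allowed? and the matching rule used here (exact match for strings, pattern match for regexes) are assumptions made for illustration, and status.example.com is a made-up host.

```ruby
# Combines per-shift and global rules, then checks a host against them.
# `===` is String equality for String rules and a pattern match for
# Regexp rules, so one expression covers both kinds of entry.
def host_allowed?(host, per_shift:, global:)
  (per_shift + global).any? { |rule| rule === host }
end

PER_SHIFT = ["api.readonly.example.com", %r{\.internal\.company\z}].freeze
GLOBAL    = ["status.example.com"].freeze # hypothetical extra host

puts host_allowed?("billing.internal.company", per_shift: PER_SHIFT, global: GLOBAL) # prints true
puts host_allowed?("evil.example.org", per_shift: PER_SHIFT, global: GLOBAL)         # prints false
```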

Transaction modes

Set the transaction mode at the class level:

transaction :single / transaction true (default): one DB transaction for the entire run; dry run rolls back at the end; a record error aborts the run.
transaction :per_record: in commit mode, each record runs in its own transaction (errors are collected and the run continues); in dry run, the run is wrapped in a single rollback transaction.
transaction false / transaction :none: no automatic transaction in commit mode. In dry run, the run is still wrapped in a single rollback transaction, so DB changes are never committed. Use this when you have external side effects or your own transaction strategy in commit mode.

module DataShifts
  class BackfillLegacyId < DataShifter::Shift
    description "Per-record so one failure doesn't roll back all"
    transaction :per_record

    def collection = Item.where(legacy_id: nil)
    def process_record(item)
      item.update!(legacy_id: LegacyIdService.fetch(item))
    end
  end
end

module DataShifts
  class SyncToExternal < DataShifter::Shift
    description "Side effects outside DB"
    transaction false

    def process_record(record)
      return if dry_run?

      record.update!(synced_at: Time.current)
      ExternalAPI.notify(record)
    end
  end
end

Progress, status, and output

Progress bar: enabled by default (requires ruby-progressbar), and only shown for collections with at least 5 records.
Header: prints mode (DRY RUN vs LIVE), record count, transaction mode, and available status triggers.
Live status (without aborting):
- STATUS_INTERVAL=60 prints a status block periodically (checked between records)
- macOS/BSD: Ctrl+T (SIGINFO)
- Any OS: kill -USR1 <pid> (SIGUSR1)
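
"Checked between records" implies the interval is polled rather than driven by a timer thread. A minimal sketch of that polling logic follows; StatusTicker and its injectable clock are hypothetical and only illustrate the idea, not how DataShifter implements it.

```ruby
# Decides between records whether a periodic status block is due.
# `clock` is injectable so the logic can be exercised without sleeping.
class StatusTicker
  def initialize(interval_seconds, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @interval = interval_seconds
    @clock = clock
    @last = clock.call
  end

  # Returns true once the interval has elapsed since the last status
  # print; a nil or non-positive interval disables periodic output.
  def due?
    return false if @interval.nil? || @interval <= 0

    now = @clock.call
    return false if now - @last < @interval

    @last = now
    true
  end
end
```

A processing loop would call due? after each record and print the status block when it returns true, which is why a single slow record can delay the next status line.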

Resuming a partial run (`CONTINUE_FROM`)

If your collection is an ActiveRecord::Relation, you can resume by filtering the primary key:

CONTINUE_FROM=123 COMMIT=1 rake data:shift:backfill_foo

Notes:

Only supported for ActiveRecord::Relation collections. Array-based collections, such as those returned by find_exactly!, cannot be resumed.
The filter is primary_key > CONTINUE_FROM, so it is only useful when records are processed in ascending primary-key order (find_each's default) and primary keys increase monotonically.
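
The effect of the filter can be shown with plain Ruby objects standing in for rows. Row and remaining_rows are hypothetical names; the real filter is applied to the relation in SQL. Note that the comparison is strict, which suggests passing the last ID that finished processing rather than the first one that failed.

```ruby
# Plain-Ruby stand-in for "WHERE id > CONTINUE_FROM ORDER BY id".
Row = Struct.new(:id)

def remaining_rows(rows, continue_from)
  sorted = rows.sort_by(&:id)
  return sorted if continue_from.nil?

  # Environment variables arrive as strings, hence the Integer() conversion.
  sorted.select { |row| row.id > Integer(continue_from) }
end

rows = [Row.new(125), Row.new(122), Row.new(123), Row.new(124)]
puts remaining_rows(rows, "123").map(&:id).inspect # prints [124, 125]
```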

How shift files map to rake tasks

DataShifter defines one rake task per file in lib/data_shifts/*.rb.

Task name: derived from the filename with any leading digits removed.
- 20260201120000_backfill_foo.rb → data:shift:backfill_foo (leading <digits>_ prefix is stripped)
- backfill_foo.rb → data:shift:backfill_foo
Class name: task name camelized, inside the DataShifts module.
- backfill_foo → DataShifts::BackfillFoo

Shift files are required only when the task runs (tasks are defined up front; classes load lazily). The description "..." line is extracted from the file and used for rake -T output without loading the shift class.
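
The naming rules and the description lookup can be sketched in plain Ruby. The helper names below are hypothetical, the camelization is a simplified stand-in for ActiveSupport's camelize, and the regular expression is only one plausible way to read the description line without loading the class.

```ruby
# Strips the ".rb" extension and an optional leading "<digits>_" prefix.
def task_name_for(path)
  File.basename(path, ".rb").sub(/\A\d+_/, "")
end

# Camelizes without ActiveSupport so the sketch runs in plain Ruby.
def class_name_for(task_name)
  "DataShifts::" + task_name.split("_").map(&:capitalize).join
end

# Pulls the first `description "..."` literal out of the source text,
# without requiring or evaluating the file.
def description_from(source)
  source[/^\s*description\s+["'](.+?)["']\s*$/, 1]
end

puts task_name_for("lib/data_shifts/20260201120000_backfill_foo.rb") # prints backfill_foo
puts class_name_for("backfill_foo")                                  # prints DataShifts::BackfillFoo
```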

Configuration

Configure DataShifter globally in an initializer:

# config/initializers/data_shifter.rb
DataShifter.configure do |config|
  # Hosts allowed for HTTP during dry run only (no effect in commit mode)
  config.allow_external_requests = ["api.readonly.example.com"]

  # Suppress repeated log messages during a shift run (default: true)
  config.suppress_repeated_logs = true

  # Max unique messages to track for deduplication (default: 1000)
  config.repeated_log_cap = 1000

  # Global default for progress bar visibility (default: true)
  config.progress_enabled = true

  # Default status print interval in seconds when ENV STATUS_INTERVAL is not set (default: nil)
  config.status_interval_seconds = nil
end

Per-shift overrides:

class MyShift < DataShifter::Shift
  progress false                # Disable progress bar for this shift
  suppress_repeated_logs false  # Disable log deduplication for this shift
end

Operational tips

Safety checklist (recommended)

Start with a dry run: run the task once with no environment variables set, confirm logs and summary look right, then re-run with COMMIT=1.
Make shifts idempotent: structure process_record so re-running is safe (for example, update only when the target column is NULL, or compute the same derived value deterministically).
Guard side effects we don't auto-block: use return if dry_run? for any side effect not covered by Automatic side-effect guards (see above).
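
The idempotency advice can be illustrated without a database. In this sketch a Struct stands in for a model, and derive_legacy_id is a made-up deterministic derivation; the point is only that a second run changes nothing.

```ruby
# A Struct stands in for an ActiveRecord model here.
Record = Struct.new(:id, :legacy_id)

# Hypothetical derivation: deterministic, so repeated runs agree.
def derive_legacy_id(record)
  format("LEG-%05d", record.id)
end

# Writes only when the target column is still empty; returns whether it wrote.
def backfill!(record)
  return false unless record.legacy_id.nil?

  record.legacy_id = derive_legacy_id(record)
  true
end

record = Record.new(7, nil)
puts backfill!(record) # prints true
puts backfill!(record) # prints false (second run is a no-op)
```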

Choosing a transaction mode (behavior + guidance)

transaction :single (default):
- Behavior: the first raised error aborts the run (all-or-nothing).
- Use when: partial success is worse than failure, or you want a clean rollback on any unexpected error.
transaction :per_record:
- Behavior: in commit mode, records are committed one-by-one; errors are collected and the run continues; the overall run fails at the end if any record failed.
- Use when: you want maximum progress and are OK investigating/fixing a subset of failures.
transaction false / :none:
- Behavior: in commit mode, no automatic transaction; in dry run, the run is still wrapped in a rollback transaction so DB changes are not committed.
- Use when: you have intentional external side effects or your own transaction/locking strategy in commit mode.
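
The difference in error handling between :single and :per_record can be simulated without a database. simulate_run is a hypothetical helper that models only the commit-mode behavior described above (abort and discard everything versus keep going and collect errors); it does not use real transactions.

```ruby
# Simulates the error-handling difference only; no database is involved.
# :single stops at the first error and discards all work (all-or-nothing).
# :per_record keeps successful records, collects errors, and reports failure.
def simulate_run(records, mode:)
  committed = []
  errors = []

  records.each do |record|
    raise ArgumentError, "bad record #{record}" if record.negative?

    committed << record
  rescue ArgumentError => e
    errors << e.message
    return { ok: false, committed: [], errors: errors } if mode == :single
  end

  { ok: errors.empty?, committed: committed, errors: errors }
end

p simulate_run([1, -2, 3], mode: :single)[:committed]     # prints []
p simulate_run([1, -2, 3], mode: :per_record)[:committed] # prints [1, 3]
```

In both modes the run is reported as failed; what differs is how much work survives the failure.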

Performance and operability (recommended)

Prefer returning an ActiveRecord::Relation from collection for large datasets (DataShifter iterates relations with find_each).
Be aware that count runs up front for relations, to print the header and size the progress bar. On very large or expensive relations, that extra query may be non-trivial.
Use status output for long runs: set STATUS_INTERVAL in environments where signals are awkward (for example, some process managers).

Utilities for building shifts

`find_exactly!` (fail fast for ID lists)

Use find_exactly!(Model, ids) to fetch a fixed list and raise if any are missing:

def collection
  ids = ENV.fetch("BUYBACK_IDS").split(",").map(&:strip)
  find_exactly!(Buyback, ids)
end

def process_record(buyback)
  buyback.recompute!
end
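
The fail-fast contract can be pictured with an in-memory lookup. fetch_exactly! and MissingRecords are hypothetical names used for illustration; the actual error class and message raised by find_exactly! are not documented here.

```ruby
# Hypothetical in-memory lookup that mirrors the fail-fast contract:
# return every requested record, or raise naming the missing IDs.
class MissingRecords < StandardError; end

def fetch_exactly!(store, ids)
  wanted = ids.map(&:to_s).uniq
  missing = wanted.reject { |id| store.key?(id) }
  raise MissingRecords, "missing IDs: #{missing.join(', ')}" unless missing.empty?

  wanted.map { |id| store[id] }
end

store = { "1" => :first, "2" => :second }
p fetch_exactly!(store, %w[1 2]) # prints [:first, :second]
```

Raising before any record is processed avoids a run that silently covers only part of the intended ID list.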

`skip!` (count but don't update)

Mark a record as skipped. Calling skip! terminates the current process_record immediately (no return needed). The record is counted as "Skipped" in the summary.

def process_record(record)
  skip!("already done") if record.foo.present?
  record.update!(foo: value)  # not executed if skipped
end

Skip reasons are grouped: the summary shows the top 10 reasons by count (e.g. "already done" (42), "not eligible" (3)) instead of logging each skip inline. This keeps the progress bar clean.
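
One way an early-exit skip! and a grouped summary could fit together is shown below. MiniRunner is a hypothetical class, and using throw/catch is an assumption chosen for illustration; DataShifter's real mechanism may differ.

```ruby
# Hypothetical mini-runner; `throw`/`catch` is one way to make skip! exit
# process_record early without an explicit return.
class MiniRunner
  attr_reader :succeeded

  def initialize
    @succeeded = 0
    @skip_reasons = Hash.new(0)
  end

  def skip!(reason)
    throw :skipped, reason
  end

  def run(records)
    records.each do |record|
      reason = catch(:skipped) do
        process_record(record)
        nil # reached only when the record was not skipped
      end

      if reason
        @skip_reasons[reason] += 1
      else
        @succeeded += 1
      end
    end
    self
  end

  # Reasons grouped by count, most frequent first, capped at `limit`.
  def top_skip_reasons(limit = 10)
    @skip_reasons.sort_by { |_reason, count| -count }.first(limit)
  end

  def process_record(record)
    skip!("already done") if record[:foo]
    skip!("not eligible") unless record[:eligible]
    record[:foo] = "value" # not executed if skipped
  end
end
```

Tallying reasons in a hash instead of logging each skip is what keeps per-record output out of the progress bar.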

Throttling and disabling the progress bar

class SomeShift < DataShifter::Shift
  throttle 0.1       # sleep seconds between records
  progress false    # disable progress bar rendering
end

Generator

| Command | Generates |
| --- | --- |
| `bin/rails generate data_shift backfill_foo` | `lib/data_shifts/<timestamp>_backfill_foo.rb` with a `DataShifts::BackfillFoo` class |
| `bin/rails generate data_shift backfill_users --model User` | Same, with `User.all` in `collection` and `process_record(user)` |
| `bin/rails generate data_shift backfill_users --spec` | Also generates `spec/lib/data_shifts/backfill_users_spec.rb` when RSpec is enabled |
| `bin/rails generate data_shift fix_order_1234 --task` | Generates a shift with a `task` block instead of `collection`/`process_record` |

The generator refuses to create a second shift if it would produce a duplicate rake task name.

Testing shifts (RSpec)

This gem ships a small helper module for running shifts in tests. Require it and include DataShifter::SpecHelper in specs or in RSpec.configure for type: :data_shift.

Helpers:

run_data_shift(shift_class, dry_run: true, commit: false) — Runs the shift; returns an Axn::Result. Use commit: true to run in commit mode.
silence_data_shift_output — Suppresses STDOUT for the block (e.g. progress bar).
capture_data_shift_output — Runs the block and returns [result, output_string] for asserting on printed output.
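
For readers unfamiliar with output-capturing helpers, the general technique behind a [result, output_string] return value looks like the sketch below. capture_output is a hypothetical plain-Ruby helper, not the gem's capture_data_shift_output, whose internals are not shown in this README.

```ruby
require "stringio"

# Temporarily swaps $stdout for a StringIO and returns [result, output].
# The original stream is restored in `ensure`, even if the block raises.
def capture_output
  original = $stdout
  buffer = StringIO.new
  $stdout = buffer
  result = yield
  [result, buffer.string]
ensure
  $stdout = original
end

result, output = capture_output do
  puts "DRY RUN"
  42
end
```

After this runs, result holds the block's return value and output holds everything the block printed.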

Use expect { ... }.not_to change(...) and expect { ... }.to change(...) to assert that data stays unchanged in dry run and changes when committed:

require "data_shifter/spec_helper"

RSpec.describe DataShifts::BackfillFoo do
  include DataShifter::SpecHelper

  before { allow($stdout).to receive(:puts) }

  it "does not persist changes in dry run" do
    expect do
      result = run_data_shift(described_class, dry_run: true)
      expect(result).to be_ok
    end.not_to change(Foo, :count)
  end

  it "persists changes when committed" do
    expect do
      result = run_data_shift(described_class, commit: true)
      expect(result).to be_ok
    end.to change(Foo, :count).by(1)
    # Or for in-place updates: .to change { record.reload.bar }.from(nil).to("baz")
  end
end

Requirements

Ruby ≥ 3.2.1
Rails (ActiveRecord, ActiveSupport, Railties) ≥ 7.0
axn (Shift classes include Axn)
ruby-progressbar (for progress bars)
webmock (for dry-run HTTP blocking; optional allowlist via allow_external_requests [...] / DataShifter.config.allow_external_requests)
