Migra – A schema diff tool for PostgreSQL

muxator · on March 25, 2018

This fills a gap for me.

I tend to think of a database schema in declarative terms ("table T has these fields"), whereas the SQL DDL statements are more akin to commands (ALTER table T, adding a column). This always felt bad to me, because in order to define the final state of your schema, you need to define the initial state plus the commands necessary to get to the new one.

A diff tool frees you from this pain.

evanelias · on March 25, 2018

Indeed, although you can take it a step further than just diff'ing. Following the philosophy of infrastructure-as-code, a fully declarative approach would be a repository of schema files, and tooling that can convert changes to the repo to actual DDL, which it then knows how to execute safely (even taking into account multiple dev/stage/prod environments, sharding, etc).

I built a tool called Skeema that does this, although it only supports MySQL / MariaDB / Percona Server for now. https://github.com/skeema/skeema/

It's currently a stateless CLI tool, using git-like semantics: you can pull from a db to the filesystem, or push from the filesystem to a db / environment, or diff to preview what DDL would be generated. But the next iteration may include integration with CI/CD workflows, Github API, etc for a more seamless workflow. e.g. if you want to add a column to a table, create a pull request modifying the file, and the tooling automatically applies the DDL to a staging environment. Then get your PR reviewed/approved, and once you merge to master the tooling automatically applies the DDL to production.

cellis · on March 26, 2018

There is also a Javascript version of this at https://github.com/gimenete/dbdiff and a Go version at https://github.com/joncrlsn/pgdiff, each with varying degrees of completeness. As well as apgdiff (Java) https://github.com/fordfrog/apgdiff ( these tools have been tried many times, and I've been keeping track of them :D ), I'm really tired of writing db migrations. Will try this out!

discussedbefore · on April 5, 2018

Would appreciate more from what you've been tracking.

Someone put a lot of love into this list for MSSQL circa 2011: https://en.wikipedia.org/w/index.php?title=Microsoft_SQL_Ser...

ssahoo · on March 26, 2018

We used apgdiff for automated schema diff generation. 2.4 was released many years ago and lacks compatibility with PG 10. Looking for alternatives. Thanks for the mentions.

Edit - Migra does not compare pgdump outputs.

djrobstep · on March 26, 2018

That's right, migra does not compare dump outputs.

I've found that the best parser of PostgreSQL dump files is PostgreSQL, and don't want to reimplement that.

Hence, migra works directly on database connections. If you have a dump file you want to compare, you can just create a temporary database and load the dump into that first. There's an example in the docs of how to do this easily.

ssahoo · on March 26, 2018

this complicates the workflow where schema compare is necessary in source controls.

blaisio · on March 26, 2018

This is awesome!

A lot of people point to migrations as the best way to track changes to a database schema. But there are a lot of problems with them. For example, they involve adding version control on top of another version control system, which can cause a ton of problems. They also don't maintain themselves well. If you leave it alone, running migrations will just take longer and longer over time, even though you don't really get more utility.

I think we need more support from databases themselves to solve this problem.

In the meantime, this is a really good stopgap, because it can theoretically allow you to just have a file with your "ideal schema" for each commit. No need to maintain a separate database migration history too. You can even generate a series of migrations by looking at your git history!

djrobstep · on March 26, 2018

This is a great summary of the advantages, as I see them. Would you mind if I add this comment as a testimonial somewhere in the docs?

blaisio · on March 27, 2018

Sure, go for it!

yangyang · on March 25, 2018

This looks interesting.

If you're using SQLAlchemy, Alembic is a good migration management / generation / application tool: http://alembic.zzzcomputing.com/en/latest/, and works with most (all?) the databases supported by SQLAlchemy. It doesn't handle every kind of database object / attribute thereof, but it's reasonably easily extensible.

djrobstep · on March 25, 2018

Hi, migra author here. I wrote migra partly out of frustration with Alembic and similar migration tools.

Some of the problems:

- Annoying version number management

- Management of long "chain" of migration files

- Unnecessary coupling between ORM and migration tool

- Heavyweight process to generate and run each migration, which slows down local development

- Bad testability

I talked in more depth about this topic at last year's PostgresOpen, if you're interested: https://www.youtube.com/watch?v=xr498W8oMRo

zzzeek · on March 26, 2018

Alembic / SQLAlchemy author here.

> Annoying version number management

the GUIDs? OK.

> Management of long "chain" of migration files

Alembic does not require migration files to be linked together in any particular way, except that individual migrations have dependencies on others. Each file represents an individual series of changes to a database, which is a reasonable system for developers where a particular new feature comes with a particular new set of schema artifacts. It integrates very well with version control and accommodates for workflows like branching and merging of source trees.

That said, I would assume the Migra way is that you have the entire database schema in a versioned file so you can diff directly, and the "generate the changes" aspect is on the fly (otherwise, if you were persisting the DDL for changes, I don't see how you'd avoid having a "chain" of migration files dependent on each other). That is more lightweight if you can get by with it. When you have your tables written out in terms of SQLAlchemy metadata, that's your "schema". So it's there but just in terms of SQLAlchemy structures. Alembic could generate the migrations and just run them without you ever having to see them or commit, however I can't provide a diff tool that I can guarantee is 100% accurate and would not need its results to be checked before running. By having the individual changeset go into source control, that allows the developer who creates that migration to be the one to make sure it generates correctly, as well as being able to add other elements of the change that aren't necessarily part of a plain schema diff, where it then can be committed permanently without the chance of it mis-generating at a later time.

> Unnecessary coupling between ORM and migration tool

100% false. There is no usage of SQLAlchemy ORM within Alembic anywhere. The "sqlalchemy.orm" library is not imported. There's a single unused artifact in the test suite that was copied over from SQLAlchemy:

    $ grep -e "sqlalchemy.orm" `find alembic -name "*.py"`
    alembic/testing/plugin/plugin_base.py:  from sqlalchemy.orm.util import randomize_unitofwork

You're likely referring to the fact that Alembic builds on schema structures defined in terms of SQLAlchemy Core. It's hard to say this is "necessary" or "unnecessary", the schema has to be defined somewhere, with Migra it seems like it's typically hand-written DDL or some other generation tool, with Alembic it's SQLAlchemy Core, since it's for people using SQLAlchemy already.

> Heavyweight process to generate and run each migration, which slows down local development

Alembic has an autogenerate feature that is in some ways similar to migra's schema diff but it does have areas where it does not accommodate for certain structures, the most prominent one is regarding Postgresql enumerated types. But this is not a design flaw, it's a missing feature I'd love someone to work on.

There is also of course the need to express your schema in terms of Python structures, which is probably what you're referring towards when you say "coupled with the ORM" as well as "heavyweight", but this assumes an application is already based on SQLAlchemy and already has this information.

If not, then Migra is a good choice, assuming database agnosticism is also not required, and if you have some totally other means of generating your schema, which I'd imagine is hand-written DDL, if I'm understanding the documentation correctly.

> Bad testability

This is an area where Alembic is lacking in guidance and pre-built features, but not in general capability. We have Alembic migration test fixtures in Openstack that might serve as a basis for something more generically part of Alembic but I don't have the resources to work on this solo. But Alembic does include schema comparison tools and the examples for testing in Migra don't look much different from what one would do with Alembic (indeed this is what we do in openstack, compare schemas using Alembic).

nathancahill · on March 26, 2018

Just wanted to say thanks for the awesome work you've done with those projects. I've been using the Flask/SQLAlchemy/Alembic stack to run many, many projects over the last 4 or 5 years.

djrobstep · on March 26, 2018

Hey, thanks for the response!

I've written a lot of code on top of SQLAlchemy over the years, big fan of your work.

These points are mostly philosophical differences, I appreciate that not everybody has the same perspective I do.

But to me, thinking of database changes in terms of current state/goal state and the differences between them, is more intuitive than as a sequence of versions.

By heavyweight, I mean the need to generate or modify a migration file and version each time a change is required. When I'm working on an app and playing around with table structures I find it much easier to be able to just type "sync" and have everything automatically match what's defined in my models or setup scripts or whatever, without generating a new numbered version each time I make a change.

You don't have to build your schemas through hand-written DDL. You can still build your schemas through Python structures or however you want, as long as you can generate a target with it to compare against.

zzzeek · on March 26, 2018

yes I can definitely see Migra is more productive when switching around between schemas in development. But I'm not sure how I'd go about using this approach if i need to deploy a fixed series of schema migrations to a production server farm.

njharman · on March 26, 2018

> Alembic is a good migration ... tool

Having used it for 2 years that is not the adjective I would apply. I infrequently go looking for a better solution. But, it's a hard problem to solve, so...

yangyang · on March 26, 2018

I've used it for about that long, with no real problems at all. We have probably a hundred or so migrations, and have added support for various things not supported out of the box (triggers, functions, exclusion constraints and a few more), and it's been very solid.

What kind of situation are you using it in where it falls short?

xmatos · on March 26, 2018

I like the idea of a tool like this, but I'm not sure it's a better process than having migration scripts. And I dislike migration scripts.

To me, the perfect process would be, using an MVC web framework as an example, to generate a temp schema from models and run a schema diff tool like this, to update the destination database. That would eliminate migration scripts, but would probably slow down the db sync process, once your db start getting bigger.

Now, if you're not using an ORM, I think it's a bad idea to use a tool like this to update a production db from development. It will work for a single dev, but not for a team project, since you'd need a common dev schema that would be used as a source for the diff, and that will get clogged quickly.

It's still a nice tool to find and update small inconsistencies between different environments. Nice work!

stephenr · on March 26, 2018

I’ve been using a couple of python tools wrapped in two shell scripts to achieve what I think works well:

migration-capture.sh (using https://github.com/mmatuson/SchemaSync) creates a temporary db using a base schema SQL file, applies previously created migrations, and then creates up/down migration scripts by comparing that to a db that’s been modified manually or by an mvc model tool, etc.

migration-apply.sh (using https://github.com/gabfl/dbschema) applies the “up” scripts that haven’t been applied previously OR runs the “down” scripts corresponding to “up” scripts that have been run previously but are not found in the deployed scripts (ie rolling back a deployment automatically runs the down script for migrations that were in the rolled back versions)

Right now my only concern/goal for improvement with the setup is to replace the python tool the apply script relies on, with a pure shell solution so it’s one less dependency to need to install on production machines.

Edit: spelling typo

Edit2: Wtf is a bass schema.

GordonS · on March 26, 2018

I'm working with Postgres using Marten, and it let's you do exactly this. I have a console app that builds a model in memory, compares it to a database, and generates a migration script - it works really well.

I recently tried to replicate this with SQL Server and Entity Framework Core, but trying to reverse engineer a database to a model seems impossible.. I can get close, but there are always some things it gets wrong that need manually fixing... at which point I may as well write the whole damn script by hand anyway. sigh I wish the EF team would just add real support for reverse engineering databases.

Twisell · on March 26, 2018

I don’t really get the part about extensions. It say it leave extension out of the diff.

However is it possible to include extension using a flag? The first use case I was thinking of being precisely to generate upgrade script in the context of an extension.

I’ve experimented around pure SQL extension for some project and the real pain point was generating upgrade script. So far I only used a "nuke and reinstall" approach but being able to prepare upgrade script with migra would be much smoother.

conceptme · on March 26, 2018

I have used pgModeler(https://pgmodeler.io) some time ago for this and mysql workbench for mysql. It was still a bit unstable the last time I tried it but it's pretty nice when it works.

Ericson2314 · on March 25, 2018

This is a very nice idea! Even if it just did schema equality that would be nice.

wereHamster · on March 26, 2018

I've used this one in the past: https://github.com/eulerto/pgquarrel

ergo14 · on March 26, 2018

What are the benefits of using this over Alembic? Both are python tools and alembic supports other vendors too.

_gjrn · on March 26, 2018

How does it compare to sqitch ?

djrobstep · on March 26, 2018

Migra is strictly a diff tool. It doesn't impose a particular migration workflow on you (but is suited to some more than others).

Sqitch is a migration workflow tool, for its particular preferred workflow.

girishso · on March 26, 2018

Looks great. Yesterday only I thought of creating something like this!