Using git to manage a website

veb · on Feb 7, 2011

Is it bad, that I simply SSH into the server and do a "git pull"?

gvb · on Feb 7, 2011

To elaborate on dolinsky and the article, if your web server is doing a "git pull", it means it has ssh access into your workstation. If someone breaks into your web server, this means that they have ssh access into your workstation as well by simply using the keys on your web server. This is bad, very bad.

If you push to your web server, only your public key is exposed if your web server is compromised.

russell_h · on Feb 7, 2011

Not necessarily. If you run an ssh-agent locally and configure ForwardAgent to 'yes' for connections to your web server you can ssh to your server and use ssh from it without actually putting your private key on it.

I'd still recommend pushing to a server though.

mcmatterson · on Feb 7, 2011

An excellent discussion about this exact topic came through HN yesterday. See http://news.ycombinator.com/item?id=2183415, and in particular the comments.

alexgandy · on Feb 7, 2011

Thanks for pointing this out.

cookiecaper · on Feb 7, 2011

I don't know if he meant that he git pulls from his workstation. I git push out to a bare repository on my server, and then I ssh in and git pull from the local bare repository into the project's working directory on my server. This doesn't leave the keys for my workstation on the server, but I still have to log in and git pull in the wd.

dolinsky · on Feb 7, 2011

Good / bad are relative terms. Neither situation would be recommended for a site with a code base that pulls from multiple resources / pushes to multiple servers on every release, but for a single server environment this could suffice.

As for the difference b/w your method and the OPs, he defines it here :

> This is more convenient than defining your workstation as a remote on the server, and running "git pull" by hand or from a cron job, and it doesn't require your workstation to be accessible by ssh.

rmc · on Feb 7, 2011

One advantage of the article's method is that it works if your workstation doesn't have a public IP address or is behind NAT / a firewall. It also works if you move around. With the article's method, you could have your laptop at home update your website, then go down to the local coffee shop and update your website from there.

steveklabnik · on Feb 7, 2011

Only once you have more than one server.

kevinburke · on Feb 7, 2011

I use Fabric with mercurial to achieve a similar effect typing "fab prod deploy" - see Steve Losh's blog posts or bitbucket.org/kevinburke/goodmorningcmc

rapind · on Feb 7, 2011

You can do the same with Capistrano. However for a very basic static site it does sound simpler to just use a post-receive hook rather than having a script ssh in and do a pull.

stavros · on Feb 7, 2011

Yep, same here. I wonder why more people don't do it, my fabric scripts basically push to production and then fabric logs in and updates, restarts services, etc.

trusko · on Feb 7, 2011

That's my preferred way of deployment as well. fabric + git

glenngillen · on Feb 7, 2011

It's probably worth checking out NestaCSM (http://effectif.com/nesta) if you're interested in doing this.

Sinatra in front, but all your posts are managed by Git and can be Markdown/Textile/HAML (or anything supported by Tilt iirc). Push it to Heroku if you want easy/free hosting.

Takes care of publishing an RSS feed, tags/categorisation, and a bunch of other nifty things beyond just generating a static site.

Example: http://blog.peepcode.com/

(disclaimer: I used to work with Graham who created it, but I genuinely think it's awesome and use it for almost every site I build now)

Pyrodogg · on Feb 7, 2011

I've been using Joe Maller's write-up[1] as a guide for a while.

I see this strategy removes the need for a second repo on the server. Other than removing a layer which would save space and lower the likelihood of errors in general are their significant pros/cons to either method.

[1] http://joemaller.com/990/a-web-focused-git-workflow/

georgecmu · on Feb 7, 2011

Can I use git to manage a Drupal-managed website?

fungi · on Feb 7, 2011

that's what we do in conjunction with http://drupal.org/project/features, bugs aside it works well

but what you want in code and what you want in the DB will vary from page to page and feature to feature so often just manually redo stuff (in dev env and then again in production) because deploying changes by code is more effort then its worth.

georgecmu · on Feb 7, 2011

Managing the codebase is straightforward -- that's what git's designed for. My question was more about the actual site contents, which will reside in the database. Can you use git for tracking changes to the contents?

fungi · on Feb 7, 2011

> Managing the codebase is straightforward

not always, often stuff is stored in the db that we want to work on in dev then deploy to production.

but it would be nice to be able to export a selection of nodes to code and preserve there id's, content and metadata. maybe you'll find a features contrib mod http://drupal.org/taxonomy/term/11478 but i've never heard of one.

davegan · on Feb 7, 2011

No, git will not track database changes. Features however can handle much of the configuration (that is typically contained in the database) in code, which can then be managed with git.

If you want to create content on your dev or staging site and push it to production, check out the deploy module: http://drupal.org/project/deploy

Tracking changes to nodes in production is probably best left to Drupal's built in node versioning system. Diff (http://drupal.org/project/diff) is a handy module to track changes between revisions and if you want a little more advanced workflow check out revisioning (http://drupal.org/project/revisioning) and workflow (http://drupal.org/project/workflow). Good luck!

georgecmu · on Feb 7, 2011

Great info, thanks!

bricestacey · on Feb 7, 2011

Probably not. You're better off logging your MySQL queries and filtering out the UPDATE, INSERT, and DELETE queries for the tables you're interested in.

RobGR · on Feb 7, 2011

I would advise against that. There are too many places in he mess of tables of data configuration that might refer to specific auto-increment id columns were things will get out of sync. Basically you are trying to do MySQL replication on specific tables only, but there are too many relationships among all the tables.

If it is a small amount of content, such as a handful of pages for a brochure site, putting them in a feature might be best.

If it is a site where customers or visitors generate content on the site, then periodically re-initialize your dev and staging environments with copies of the live DB, running it through a script to anonymize all the user info and change passwords and etc.

If you want to create content on your dev and move it live, then doing a full database copy and moving it live, with settings that are particular to live overridden in settings.php, is probably the best way. I advise against this, basically you are re-launching the site for every change.

If you are truely desiring to be able to create content in multiple different places and move it to live, then probably the best thing to do explicitly program and configure for that, by specifying feeds of the content and setting up the different sites to ingest each other's feeds.

georgecmu · on Feb 7, 2011

FWIW, I found two takes on database versioning: http://www.gsdesign.ro/blog/mysql-database-versioning-strate... https://launchpad.net/dbvcs

georgecmu · on Feb 7, 2011

Interesting suggestion. Do you know if anyone is doing it this way?

ElbertF · on Feb 7, 2011

Why not? Were I work we use Drupal almost exclusively and use Git for everything.

georgecmu · on Feb 7, 2011

Isn't the content stored in a database rather than static pages? Do you do regular commits of your database files?

jdbeast00 · on Feb 7, 2011

if this was possible and useful drupal wouldn't have Features. the whole point is whats in the database can't be versioned easily.

soult · on Feb 7, 2011

That's similar to my private website (soultcer.com). I use git to create and store the content, and a wiki as content management system. It's nice to work on your website, and all you need is a git push for deploy. If I make a mistake or someone vandalizes the wiki, reverting is easy.

clickable: http://www.soultcer.com/

Edit: In case you are interested, the wiki software was written by a friend and is open source: https://github.com/patrikf/ewiki

chalst · on Feb 7, 2011

The general approach of putting together your website on one machine, the coding machine, and publishing it to your server is one I use, although I prefer a complex build on my design machine and then rsync it with the server.

Checkout Chronicle (http://www.steve.org.uk/Software/chronicle/). I've put up a HN thread (http://news.ycombinator.com/item?id=2186798).

jbrennan · on Feb 7, 2011

This is basically how I run my publishing engine (you can read more about it at the "Colophon" part of this article: http://nearthespeedoflight.com/article/about_the_redesign ).

I wanted to teach myself Ruby and I figured this was a great way to maintain a site, as I'm terrible with both SQL and security. Git solves security and DataMapper solves the SQL, and my Ruby lubricates the rest.

soult · on Feb 7, 2011

You could use git directly for metadata instead of relying on some JSON file. Git will tell you when an article was created and by whom. It will also tell you when and by whom each edit was made.

jbrennan · on Feb 7, 2011

True. But my metadata needs exceed simply who wrote the article, I also have things like tags, pubdate, update, etc.

My system actually allows for the metadata to be embedded in the main article file, but it ended up being simpler for me to split the files most of the time (the editing app I wrote does this for me, so I mostly forget about it now).

dedward · on Feb 7, 2011

I've played with this - in the end I ended up using custom scripts and/or Capistrano scripts (along with git of course) to handle actual deployments. It provided more control and more features, while still letting me leverage git.

bigiain · on Feb 7, 2011

Yeah, sounds familiar. I've been cooking up a way to manage a locally mastered static html website hosted in Amazon CloudFront. I may well use it as an opportunity to learn git (over SNV)...

trusko · on Feb 7, 2011

Have a question. This is nice for simple HTML. How about deploying site were you have to migrate the database etc. This approach wouldn't work. I use fabric with git right now, works well.

jefe78 · on Feb 7, 2011

One solution that occurred to me, was to setup a(assuming MySQL)mirrored database. Another idea is to setup a MySQL dump script or something similar.

Just a couple of ideas :)

trusko · on Feb 7, 2011

I think it is good idea for static sites, for everything else you would be rediscovering fabric and similar tools.

buckwild · on Feb 7, 2011

What about using git to backup an entire hard drive? Maybe using bitbucket (or github if you don't mind sharing your data with the world).

Living on the cloud has never been so easy :-D

beoba · on Feb 7, 2011

You should look at something more along the lines of rdiff-backup (or whatever's fashionable these days) for that.

Using git would be pretty pointless, unless you're going to be frequently branching and merging against your mp3s or something.

kenneth · on Feb 7, 2011

Performance would be nightmarish. Git is super elegant and great, but not very performant. Imagine having to deltify a terabyte of data? Good luck with that!

riledhel · on Feb 7, 2011

wasn't it because he needed better performance one of the reasons Torvalds created git? http://en.wikipedia.org/wiki/Git_(software)

rmc · on Feb 7, 2011

Although git usually has faster performance than most other VCS, that doesn't mean it can easily handle terabyte disk image files. :P

beoba · on Feb 7, 2011

http://en.wikipedia.org/wiki/Bitkeeper#Pricing_change

js2 · on Feb 7, 2011

See https://github.com/apenwarr/bup

beagle3 · on Feb 7, 2011

bup! bup! bup!

https://github.com/apenwarr/bup

boyter · on Feb 7, 2011

I do something similar, but I keep the "central" repository in another location on the server and pull from that.

andrewcamel · on Feb 7, 2011

That's what I do, except I use svn rather than git.

boyter · on Feb 7, 2011

Nice to hear I am not alone on this. The advantage I find is that it makes my deployments more explicit, and my pushes too and from the repository are as often as I feel.

With GIT thats a moot point in the linked article so long as you are using branches for everything and remembering to push them as well, but sometimes I just want to fix something quickly and do so without a branch. I dont care what people say when you start storing a lot of stuff in GIT a branch can take some time to process.

steveklabnik · on Feb 7, 2011

You're not forced to use branches with git. They're just so easy that there's no reason not to.

JamieEi · on Feb 7, 2011

So in other words Heroku minus all that pesky code?