Several people suggested rsnapshot, BackupPC, areca-backup, and rsync. Thank you for all your suggestions, you have been a tremendous help. I have decided to give rsnapshot a try since it was suggested to me that it would actually do what is supposed to do for Windows clients, too (which was initially my perceived show stopper for rsnapshot).
Still, when getting to the implementation, I was a little disappointed by the very permissive access that needs to be provided on the client machines, since the backup is initiated from the backup server. Even the so called more secure suggested solutions seem way too permissive for my taste, since losing the control over the backup system means basically giving total access to the data from all client machines, which is quite a big problem in my opinion.
The data-transfer mechanism employed by rsnapshot is simply
- S ==(connects and reads all data)==> C
- S stores data in the final storage area
What I would consider a better alternative would be a server-initiated dialogue which goes a little like this (S is server, C is client, '=' represent connections via ssh):
- S ---(requests backup initiation procedure)---> C
- S waits for a defined period of time that C connects back to send (already encrypted) data; if it doesn't arrive, it aborts
- S <===(sends encrypted data to be backed up)=== C
- S <-(signals the completion of data transfer)-- C
- S stores the data in the final storage area
On the other hand, the client information is secure since it can be encrypted directly on the client machine and sent only after encryption, the client machine can decide and control what it sends, while the backup server can only store what the client provides. Also, if the server is compromised, the clients' data and system aren't compromised at all, since the data is on the backup machine, but is encrypted with a key only known on the client (and a backup copy of it can be stored somewhere safe).
I am aware this approach can be problematic for permission (user/group preservation), but it doesn't happen if there is a local <-> remote user mapping or simply the numeric IDs are kept.->
I am also aware this means smarter clients and might mean the Windows machine might not be able to implement this completely, but a little more security than "here is all my data" can still be achieved, can't it?
What do other people think? Am I insane or paranoid?
I think I can implement this type of protocol in some scripts (at least one for server and one for clients) and use the backup_script feature of rsnapshot to keep this clean and nice within rsnapshot.
What might prove problematic with this approach is that rsync spedup is lost (might be?) because the copy is done to a temporary directory which, I assume, is empty, so tough luck. Another problem seems to be that every time the backup is done, the client has to encrypt each of the files to backup, which seems to be a real performance penalty, especially if the data to be backed up is quite large.
Is there an encryption layer that does this automatically at file level in the same/similar manner that LUKS does for entire block devices? Having the right file names, but with scrambled/encrypted contents seems to be the ideal solution from this PoV.
Thanks for reading and possible suggestions you might point me to.
P.S.: I just thought of this, if there was an encryption layer implemented with fuse which is mounted in some directory on the client machine, the default rsnapshot mechanism could actually work, and this would mitigate the data accessibility issue and the performance issue since that file system could be contained within a chroot and the encryption/scrambling would be done transparently on the client, so no data is plainly accessible. Does anybody know such a FUSE implementation that does on-the-fly file encryption?
P.P.S.: EncFS does exactly what I want with its --reverse option which is exactly designed for this purpose:
Normally EncFS provides a plaintext view of data on demand. Normally it stores enciphered data and displays plaintext data. With --reverse it takes as source plaintext data and produces enciphered data on-demand. This can be useful for creating remote encrypted backups, where you do not wish to keep the local files unencrypted.Great!