A simple enough job: you do some backups to a CIFS share on a local FreeNAS box, and some of this data needs to be replicated someplace else.
As our NAS boxes tend to get very big these days, it is probably a good idea to check Amazon storage prices before creating your backup plan, and bear in mind what kind of internet access you have available - what you need, and what you are willing to pay for. In my case I move around 60 GB of data to Amazon - out of 1.2 TB of backup data in all. Test thoroughly how much data you are actually moving, and start with something small and simple.
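Before committing to anything, you can measure the size of the source folder from the FreeNAS shell (the path is from my setup; adjust to yours):
du -sh /mnt/ada0/backup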
What's going to happen:
- Create an Amazon AWS account
- Create a new Jail
- Install S3 tools in this jail
- Automate sync
- Testing and more testing
There is nothing original in the following. It’s all available someplace else; I just tried to put it all together in one place for my own sake. I’m also not especially Linux or FreeBSD minded, so perhaps I have made some unnecessary detours along the way. The FreeNAS user manual covers all the FreeNAS stuff, and you can find more options for s3cmd at http://s3tools.org/s3cmd
You need: an Amazon AWS account and a 64-bit FreeNAS setup running. For this setup I’m using version 9.1.0.
1. Amazon account
Sign up for an Amazon AWS account at https://aws.amazon.com/ There should be a free 12-month trial offer, including 5 GB on S3 (you will need a credit card for the registration). After some mucking about with identification, you will be able to get the Access Key ID and the Secret Access Key for your account. You will need those.
2. Local FreeNAS setup
I assume you already have your CIFS or NFS service running, with your shares and access rights worked out. You may need to adjust your share for offloading to Amazon, but you will figure that out as your setup comes together - e.g. a normal backup share with six months of accumulated backup data is not a good idea to start with.
2.1 Sync source folder
As your jail is basically a separate machine, you need a connection between the jail and your FreeNAS source backup directories or shares, as the sync process will be initiated from the jail - not the actual FreeNAS box.
Take a note of the path for the source share. For this run we will sync the entire share, but you can pick out separate files, make a list or whatever fits your needs. I use an entire folder, as I believe this is the most convenient way to keep track of your backups. The source folder for the sync job is therefore: /mnt/ada0/backup
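You can confirm the path and have a quick look at what is actually in there from the FreeNAS shell:
ls -l /mnt/ada0/backup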
2.2 Jail
If this is your first jail, you will need to run the global jail configuration first. It is covered in the FreeNAS 9.1.0 User Manual, 10.2 Jails Configuration (p. 215). Basically, all you need to do is pick a folder to hold your different jails. The FreeNAS user manual mentions that jails can be created on UFS volumes, but ZFS volumes are recommended (p. 216).
Set an appropriate name for this jail. Check the IP address (and make a note of it). Make sure your jail type is set to portjail (type: pluginjail should also work with the PBI module we are going to use, but for some reason I haven’t been able to make this work; stay with the ‘port’ type for your test). Use network settings appropriate for your LAN. Click ‘OK’ and wait a few seconds for the jail to be created.
You now have a separate (very basic) computer running. Its name is ‘amazon’ (192.168.1.190). We will install the tools needed for the S3 sync on this machine - thereby leaving your FreeNAS box intact.
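A quick way to check that the new jail is up and reachable on your LAN (the IP is from my setup):
ping -c 3 192.168.1.190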
2.2.1 Adding user and SSH
As the jail is a separate computer you will need to add a new user and enable SSH.
2.2.1.1 Add user
You will need to access your newly created jail from its command line, using Shell (this is also covered very nicely in the FreeNAS user manual, 10.3.2 Accessing the Command Line of a Jail). Using Shell on the FreeNAS box (my FreeNAS box is ‘dallas’):
[root@dallas ~]# jls
JID IP Address Hostname Path
1 - amazon /mnt/ada0/amazon
2 - port /mnt/ada0/port
[root@dallas ~]#
Our 'amazon' jail has a path of '/mnt/ada0/amazon' and the JID is '1' (the ‘ada0’ bit is just the name of the volume; your path will be something like /mnt/[Volume name used in global jail configuration]/amazon). To actually access the ‘amazon’ jail:
[root@dallas ~]# jexec 1 /bin/tcsh
root@amazon:/ #
That is: jexec JID /bin/tcsh. The JID is the number you found previously. Now you have (command line) access to the amazon jail.
Add a user to the ‘amazon’ jail using ‘adduser’. If needed, use ‘rmuser’ to remove the user and try something different. Important: remember to add the new user to the ‘wheel’ group, as the new user will need admin privileges. That is: Login group is jail. Invite jail into other groups? []: wheel.
root@amazon:/ # adduser
Username: jail
Full name: Jail User
Uid (Leave empty for default):
Login group [jail]:
Login group is jail. Invite jail into other groups? []: wheel
Login class [default]:
Shell (sh csh tcsh nologin) [sh]:
Home directory [/home/jail]:
Home directory permissions (Leave empty for default):
Use password-based authentication? [yes]:
Use an empty password? (yes/no) [no]:
Use a random password? (yes/no) [no]:
Enter password:
Enter password again:
Lock out the account after creation? [no]:
Username : jail
Password : *****
Full Name : Jail User
Uid : 1001
Class :
Groups : jail wheel
Home : /home/jail
Home Mode :
Shell : /bin/sh
Locked : no
OK? (yes/no): yes
adduser: INFO: Successfully added (jail) to the user database.
Add another user? (yes/no): no
Goodbye!
root@amazon:/ #
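You can verify that the new user really ended up in the ‘wheel’ group; the output should look something like this:
root@amazon:/ # id jail
uid=1001(jail) gid=1001(jail) groups=1001(jail),0(wheel)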
2.2.1.2 Enable SSH
By default SSH is not enabled. From the FreeNAS shell:
root@amazon:/ # service sshd start
sshd already running? (pid=3142).
root@amazon:/ #
For my port jail, /etc/rc.conf already included sshd_enable="YES", but you should verify this yourself.
root@amazon:/ # vi /etc/rc.conf
rc.conf should include sshd_enable="YES"; otherwise add it (for basic survival instructions for the vi editor, I recommend http://www.cs.colostate.edu/helpdocs/vi.html).
:x<Return> quit vi, writing out modified file to file named in original invocation.
:q!<Return> quit vi even though latest changes have not been saved for this vi call
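If you would rather skip vi entirely, you can append the line from the jail's shell instead (only do this if the line is not already there, or you will end up with duplicates) and start the service afterwards:
root@amazon:/ # echo 'sshd_enable="YES"' >> /etc/rc.conf
root@amazon:/ # service sshd start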
Exit the FreeNAS shell; from here on we can access the jail via PuTTY.
3.0 Installing S3 tools
Get or start up PuTTY (http://www.chiark.greenend.org.uk/~sgtatham/putty/) or any other SSH client you like. Use the IP address for the jail and accept the security warning. Log in with the user credentials created previously in 2.2.1.1 Add user. Welcome to the wonderful world of FreeBSD :) Users defined on the actual FreeNAS box will not work. If something goes wrong, go back to the FreeNAS shell and add another user.
3.0.1 Freshports.org
We will get a FreeBSD port for the S3 tools from freshports.org. Which means: go to http://www.freshports.org and search for ‘s3cmd’. You should find something like the following:
‘py-s3cmd’ is what we are looking for. As we will install the S3 tools (and dependencies) with the remote fetching option, there is no reason to actually download anything from freshports.org; you just need the name. There is a lot of very interesting stuff at freshports for other projects - so take a few minutes and search for … some interesting stuff.
3.1.0 Install S3 tools
As we are installing some software, we need to be root. From the PuTTY session do the following:
$ su
Password:
root@amazon:/usr/home/jail #
As root add the s3 packet:
pkg_add -r py27-s3cmd
The result should be something like the following:
bsddb databases/py-bsddb
gdbm databases/py-gdbm
sqlite3 databases/py-sqlite3
tkinter x11-toolkits/py-tkinter
The number one reason for this install to fail is missing admin rights, so if your install is failing, make sure you are actually running as root (the ‘su’ command). We are (probably) missing 4 extra dependencies, and they must also be installed. Again from the PuTTY session as root, run the following 4 separate commands, one at a time.
pkg_add -r py27-bsddb
pkg_add -r py27-gdbm
pkg_add -r py27-sqlite3
pkg_add -r py27-tkinter
Possibly you should do this the other way around: first install the dependencies, and then the s3cmd package. All went well? You are ready to test s3cmd now.
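If you want to take that advice, pkg_add accepts several package names at once, so the dependencies and s3cmd can be fetched in one go:
pkg_add -r py27-bsddb py27-gdbm py27-sqlite3 py27-tkinter py27-s3cmd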
3.2 Configure and testing s3tools
Everything should be in order to actually start doing something useful now - like moving some data to Amazon S3. First you need to run a simple configuration tool for s3cmd; you need the access keys from your Amazon account to complete this. To start the configuration of s3cmd, do the following from your PuTTY session (as root). (Again, the installation process, sync option tweaking and all sorts of things are better covered at http://s3tools.org/s3cmd)
s3cmd --configure
You must decide whether or not to use encryption, and whether to use HTTP or HTTPS. If your Amazon access keys are okay, the configuration tool will end up telling you all is okay and that the connection to S3 has been tested. If this fails, recheck your Amazon keys and run the configuration tool one more time.
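The answers are stored in a plain text file, ~/.s3cfg, which you can inspect and edit by hand later; the interesting entries look something like this (keys shortened, obviously):
access_key = AKIAXXXXXXXXXXXXXXXX
secret_key = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
use_https = True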
3.3 Simple testing
Sign in to your Amazon account and find the S3 service. There are a lot of options in there, but basically you should get an overview of your S3 buckets. As an (Amazon) bucket can be made public, your bucket name must be globally unique. You can do a little test run if you like, but all this stuff is much better described at the s3tools homepage (http://s3tools.org/s3cmd). For this test you will list your buckets, create a new bucket - and remove it again. From the PuTTY session (as root), list your existing Amazon S3 buckets:
s3cmd ls
Nothing - no buckets have been created yet. Okay - then try to create a new bucket:
s3cmd mb s3://inetpub.somethingsilly
The use of my domain name is to make sure the bucket name is unique. List your buckets again (ls); now your silly bucket should be available. As this is a silly bucket, remove it again (the bucket must be empty):
s3cmd rb s3://inetpub.somethingsilly
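If everything works, s3cmd confirms each step with messages along these lines (bucket name from my test):
Bucket 's3://inetpub.somethingsilly/' created
Bucket 's3://inetpub.somethingsilly/' removed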
4.0 Sync
Your jail is a separate machine; it has no access to your FreeNAS volumes as such. The way to make backup data (from the FreeNAS shares) available to the jail is to add storage to the jail. This is done from the FreeNAS web interface. We will need a source and a destination folder to complete this.
The source folder is where your actual backup data is located. If you are using CIFS shares for your backups, it is the path to those shares. In section “2.1 Sync source folder” we found the path to my backup folder: /mnt/ada0/backup, which in my setup is the path for the backup CIFS share.
The destination folder is a folder within the jail which will be linked to the storage. Create an empty directory within the jail and use this. For this I created /storage in the jail; /storage within the jail is my destination folder. Therefore s3cmd will use the /storage folder in the jail as … source. I know this is a bit tricky.
If you copy something into the backup share (that is: /mnt/ada0/backup) it will also be available in the destination folder in the jail. It is important for your backup administration that this is understood, tested and working. s3tools now has access to the backup files.
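A simple way to test the mapping (paths from my setup): drop a file into the share from the FreeNAS side, and check that it shows up inside the jail:
[root@dallas ~]# touch /mnt/ada0/backup/testfile
root@amazon:/ # ls /storage
testfile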
4.1 Sync (again)
Now we create a new bucket and sync this bucket with the local storage folder. Create a new bucket called inetpub.storage: open a PuTTY session to the jail, su to root, and run:
s3cmd mb s3://inetpub.storage
If the bucket was created successfully, sync the s3://inetpub.storage bucket with the local /storage folder (check beforehand what is actually in this folder):
s3cmd sync --recursive /storage s3://inetpub.storage
You have just synced a local FreeNAS share with Amazon S3. All the extra options and trimmings for s3cmd are available from the s3tools home page; especially have a look at “s3cmd sync HowTo” (http://s3tools.org/s3cmd-sync). You should be able to configure s3cmd exactly to your needs and budget.
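Two sync options worth knowing before you let this loose on real data (both documented at the s3tools pages above): --dry-run shows what would be transferred without moving anything, and --delete-removed deletes remote files that are gone locally, so the bucket mirrors the folder. Note also that a trailing slash on the source changes whether the folder name itself becomes part of the remote paths.
s3cmd sync --dry-run --delete-removed /storage/ s3://inetpub.storage/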
4.2 What’s missing
Most likely you will need to run your final s3cmd script as a scheduled/cron job. It is not especially complicated to configure the cron jobs themselves, as Ben Tasker shows in this text (http://www.bentasker.co.uk/documentation/3-documentation/122-syncing-your-files-with-an-s3-account-on-linux), and if you are running a single job this should be fine. But I tend to get really emotional about saving bandwidth and cost, and therefore tend to create all too many minor jobs (often conflicting). This is not good practice, as it gets really tricky when administering cron jobs.
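As a starting point, a single nightly job inside the jail could look something like this (a sketch; the path to s3cmd and the log file location are assumptions from my setup). Edit root's crontab with 'crontab -e' and add:
# sync /storage to S3 every night at 02:00 and log the result
0 2 * * * /usr/local/bin/s3cmd sync --recursive /storage s3://inetpub.storage >> /var/log/s3sync.log 2>&1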
Any suggestions, opinions or comments are very welcome. I would especially like to hear from you if you make something useful out of this, and if you know something smart about administering cron jobs from the command line.