
TINC – Simple P2P VPN

The world is full of good surprises.

If you’ve joined the NoSQL gang like me, chosen Cassandra to store your data and distributed your system among different datacenters, wouldn’t it be great to interconnect all your nodes on a virtual private network with no single point of failure? Well, TINC does just that. In fact, it does a little bit more, because it’s able to establish a meshed network even if some hosts can’t directly contact each other (in case of a routing issue, a NAT firewall, etc.).

One of the amazing things about this software is that it’s really simple to set up. I followed some setup instructions and it just worked. I didn’t have to increase the verbosity or check any log, it just worked everywhere.
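To give an idea of the scale of the setup, here is roughly what a two-node configuration looks like (the network name “myvpn”, the host names and the addresses are made up for the example):

# /etc/tinc/myvpn/tinc.conf on node1
Name = node1
ConnectTo = node2

# /etc/tinc/myvpn/hosts/node1 (tincd appends the public key to this file;
# each node needs a copy of every other node's host file)
Address = node1.example.com
Subnet = 10.0.1.1/32

# /etc/tinc/myvpn/tinc-up
#!/bin/sh
ifconfig $INTERFACE 10.0.1.1 netmask 255.255.255.0

# Generate the RSA key pair, then start the daemon:
tincd -n myvpn -K
tincd -n myvpn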

Cassandra

I’m a huge fan of all the cloud technologies. I’ve been working on an M2M project on top of Cassandra and I can really say I love this distributed database. I’d like to share my feedback on it.

Easy management

Cassandra doesn’t require any kind of manual management for complex operations like sharding data across nodes, restoring a crashed server, or putting a new or previously disconnected node back into the cluster. You just have to tell the node to join the cluster and watch it do all the work.
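As an illustration, and keeping in mind that the exact configuration keys depend on the Cassandra version, joining a new node is roughly this:

# In cassandra.yaml on the new node, point it at an existing node
# (the "seeds" list) and enable bootstrapping:
#     seeds: "10.0.0.1"
#     auto_bootstrap: true

/etc/init.d/cassandra start

# Then watch the node stream its share of the data and join the ring:
nodetool -h localhost ring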

It’s obviously a little bit more difficult to start with Cassandra than it is to start with MySQL, but it’s conceptually easier to understand. Management tools are clearly lacking though.

Data paradigm change

You have extraordinary flexibility with Cassandra: you can add columns to column families (the “Table” equivalent) at any time. But you can’t use indexes the same way you do in relational databases. For large indexed data, like time series, you need to build your own indexes.

Because everything is retrieved on a per-row basis and each row can live on a different server, you need to retrieve as much data as possible per row. This means you sometimes need to forget about storing a link to the data and store the data itself instead. In my case, data coming from equipment is stored twice: once keyed by the equipmentId and then the time, and once keyed by the equipmentId and the dataType, then the time.
There are some very interesting articles about this.
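Here is a minimal sketch of this double-write layout using the pycassa client (the keyspace, column family names and key format are made up for the example, and the column families are assumed to already exist):

import time

import pycassa

pool = pycassa.ConnectionPool('M2M', server_list=['127.0.0.1:9160'])

# One column family keyed by equipment, one keyed by equipment + data type.
by_equipment = pycassa.ColumnFamily(pool, 'DataByEquipment')
by_data_type = pycassa.ColumnFamily(pool, 'DataByEquipmentAndType')

def save_measure(equipment_id, data_type, value):
    # ISO timestamps sort correctly as strings, so they work
    # as column names with a plain UTF8/Bytes comparator.
    ts = time.strftime('%Y-%m-%dT%H:%M:%S')
    # First copy: one row per equipment, one column per (time, type).
    by_equipment.insert(equipment_id, {'%s:%s' % (ts, data_type): value})
    # Second copy: one row per (equipment, type), one column per time.
    by_data_type.insert('%s:%s' % (equipment_id, data_type), {ts: value})

save_measure('eq-42', 'temperature', '21.5')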

I found that in many cases, saving objects directly in JSON form made my life a lot easier. And as all data is compressed internally, it doesn’t take up too much additional space.
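Still with pycassa, and a hypothetical “Sessions” column family, it boils down to this:

import json

import pycassa

pool = pycassa.ConnectionPool('M2M', server_list=['127.0.0.1:9160'])
sessions = pycassa.ColumnFamily(pool, 'Sessions')

# Store the whole object as a single JSON blob instead of one column per field.
session = {'user': 'jdoe', 'lang': 'fr', 'last_seen': '2011-02-10'}
sessions.insert('session-1234', {'json': json.dumps(session)})

# Reading it back is a single column fetch plus a decode.
restored = json.loads(sessions.get('session-1234')['json'])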

Particularities

As said earlier, it’s best to store rows with a lot of columns in Cassandra. Columns are often used in a completely different way than they are in relational databases; they can be time values, for instance. But you also have to take care not to create too many columns. I use 100,000 columns without any problem. With 1M or more columns, data retrieval can take a lot of time (it can be a matter of seconds). I discovered this while doing some profiling, and it came as a surprise to me because Cassandra is “advertised” as being able to handle billions of columns. So, sure, it can handle billions of columns, but you shouldn’t do it.
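When a row does grow large, slicing it page by page instead of fetching everything at once keeps retrieval time bounded. A sketch of such an iterator with pycassa (the column family name is an assumption, as before):

import pycassa

pool = pycassa.ConnectionPool('M2M', server_list=['127.0.0.1:9160'])
cf = pycassa.ColumnFamily(pool, 'DataByEquipment')

def iter_columns(row_key, page_size=1000):
    # Walk a wide row in slices of page_size columns.
    start = ''
    while True:
        try:
            page = cf.get(row_key, column_start=start, column_count=page_size)
        except pycassa.NotFoundException:
            return
        for name, value in page.items():
            if name != start:  # the first column of a page repeats the last one seen
                yield name, value
        if len(page) < page_size:
            return
        start = list(page.keys())[-1]  # resume from the last column of this page

for name, value in iter_columns('eq-42'):
    pass  # process each column here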

Cassandra supports TTLs (Time To Live), which are very useful for temporary data like sessions or cached values. Expired data is garbage-collected automatically.
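With pycassa, it’s just an extra parameter on insert (the “Cache” column family is, again, an assumption):

import pycassa

pool = pycassa.ConnectionPool('M2M', server_list=['127.0.0.1:9160'])
cache = pycassa.ColumnFamily(pool, 'Cache')

# This column disappears by itself one hour after being written.
cache.insert('user-42', {'session-token': 'abcd1234'}, ttl=3600)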

Because it’s a distributed database, Cassandra distributes deletions as if they were values: a deleted column is in fact a column whose value has a deleted state (a “tombstone”). The data is actually deleted one week after it was marked as deleted. This mechanism allows a failing node to be plugged back into the cluster at most one week after it disconnected.
Deleted columns count as regular columns internally, so you might end up with serious performance issues if you delete and create a huge number of columns at the same time.

It eats all your memory

Cassandra with its default settings eats a lot of memory. With 2GB, it will throw some OutOfMemoryErrors; with 4GB, it will flush data very frequently. It runs OK with 8GB, and in production I like to give it 12GB of memory. It’s not really a problem, you just have to buy a bigger server. But if you sell your software so that it can be installed on a client’s architecture, this can be a little bit more problematic.
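If you want to pin the heap size down instead of letting the startup script compute it from the machine’s RAM, it can be set in conf/cassandra-env.sh (the values below are just what I would try on a large server; both variables have to be set together):

# conf/cassandra-env.sh, restart Cassandra afterwards
MAX_HEAP_SIZE="8G"
HEAP_NEWSIZE="800M"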

cron-apt and the perfect update system

In my spare time, I manage a handful of servers. And even if it’s not really my job, I try to do it well and efficiently. All of them run Debian because it’s simple to manage. I started using cron-apt a few years ago. At first I upgraded everything automatically; this was a big mistake. So I switched to only sending mails on available upgrades and doing the upgrades manually. But this is also quite painful because 95% of the time it consists in typing “apt-get dist-upgrade -y” and waiting, and I have lots of more interesting things to do.

So here is my cron-apt configuration, I like it a lot:

In /etc/apt:
– I removed the sources.list file
– I put the content of my sources.list into sources.list.d/main.list, it should look something like this:

deb http://http.us.debian.org/debian stable main contrib non-free
deb-src http://http.us.debian.org/debian stable main contrib non-free

– I created a directory sources.security.list.d
– I put the following content in it:

deb http://security.debian.org/ stable/updates main contrib non-free
deb-src http://security.debian.org/ stable/updates main contrib non-free

Then I added the repositories containing packages I want to upgrade manually to /etc/apt/sources.list.d/, and the ones I want to upgrade automatically (which means they can’t require any user interaction) to /etc/apt/sources.security.list.d/.

The interesting part is here, in /etc/cron-apt/action.d; this is what I have:

0-update

update -o quiet=2
update -o quiet=2 -o Dir::Etc::sourceparts=/etc/apt/sources.security.list.d -o Dir::State::lists="security-lists"

We launch an update of the two kinds of repositories. For the sources.security.list.d one, we also use a different Dir::State::lists parameter (the directory where the index cache files are stored) so that we don’t have to re-download the content of the index files every time.

2-install-security

dist-upgrade -y -o quiet=1 -o Dir::Etc::sourceparts=/etc/apt/sources.security.list.d -o Dir::State::lists="security-lists" -o Dpkg::Options::="--force-confdef" -o Dpkg::Options::="--force-confold"

For the --force-conf* options, I found the solution on Raphaël Hertzog’s blog.

We launch the upgrade (dist-upgrade actually) only on the repositories defined in /etc/apt/sources.security.list.d.

3-download

dist-upgrade -d -y -o APT::Get::Show-Upgraded=true

Then we only download the files for the upgrade of the non-security packages.

6-clean

autoclean -y

And we finally delete all the old packages (the ones that will never be used).

If you want to play with the apt settings yourself, you should use apt-config to see what you can change to fit your needs.
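For instance, to check where the list cache files actually live:

apt-config dump | grep Dir::State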

This saves me a lot of time, because Debian produces quite a lot of security updates. Here is the update frequency for one of my servers:

btrfs for a simple and powerful backup system

I’ve been testing btrfs for some months now. One of the most interesting features of this filesystem is its snapshotting capability. Before that, I was using rsnapshot. The issue with rsnapshot is that its lowest atomic level for snapshotting is the file itself, using hard links. So any database table where one row is changed gets copied completely. Btrfs, as you might guess, will only copy the modified chunks (I don’t know their exact granularity [but who cares?]).
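The underlying primitive is a one-liner: a snapshot is just another subvolume that shares its chunks with the original until they diverge. Using the same paths as the script below:

# Create a subvolume and snapshot it; only the blocks modified afterwards
# will take additional space.
btrfs subvolume create /mnt/externe3/servers-backup
btrfs subvolume snapshot /mnt/externe3/servers-backup /mnt/externe3/snap/servers-backup/daily-2011-02-10
btrfs subvolume list /mnt/externe3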

Here is a simple script I’ve been using during these last months to back up my laptop and my (remotely hosted) servers.

#!/bin/sh
 
# === ARGUMENTS PARSING ===
 
# We don't want to define a default period
PERIOD=
 
while echo $1 | grep ^- > /dev/null; do
 
    if [ "$1" = "--daily" ]; then
        PERIOD=daily
    fi
 
    if [ "$1" = "--monthly" ]; then
        PERIOD=monthly
    fi
 
    if [ "$1" = "--period" ]; then
        PERIOD=$2
        shift
    fi
 
    shift
done
 
if [ "${PERIOD}" = "" ]; then
        echo "You have to define a period  with the --period arg !" >&2
        exit 1
fi
 
# === END OF ARGUMENTS PARSING ===
 
# === PARAMETERS ===
 
# * Device we will use
DISK=/mnt/externe3
 
# * Subvolume used for the backup
SUBVOLUME=${DISK}/servers-backup
 
# * Current date (you could limit the date to +%Y-%m-%d)
DATE=`/bin/date +%Y-%m-%d_%H-%M-%S`
 
# * snapshot directory that will be used
SNAPDIR=${DISK}/snap/servers-backup
 
# * snapshot volume that will be used
SNAPVOL=${SNAPDIR}/${PERIOD}-${DATE}
 
# * max days to keep daily backups
MAX_DAILY=60
 
# * max days to keep monthly backups
MAX_MONTHLY=365
 
# * Alert limit
LIMIT_ALERT=95
 
# * High limit
LIMIT_HIGH=90
 
# * Low limit
LIMIT_LOW=85
 
# === END OF PARAMETERS ===
 
# We get the space used over the total allocated space and the total percentage use.
# This is NOT the device total size but it's a lot more reliable than "df -h"
DISK_USED=`/sbin/btrfs filesystem df ${DISK}|grep Data|grep -Po "used=([0-9]*)"|cut -d= -f2`
DISK_TOTAL=`/sbin/btrfs filesystem df ${DISK}|grep Data|grep -Po "total=([0-9]*)"|cut -d= -f2`
DISK_PERC=`echo 100*${DISK_USED}/${DISK_TOTAL}|bc`
 
# We create the snapshot dir if it doesn't exist
if [ ! -d ${SNAPDIR} ]; then
        mkdir -p ${SNAPDIR}
fi
 
cd ${SNAPDIR}
 
# If we are over the low free space limit,
# we delete two days of daily backup.
if [ $DISK_PERC -gt $LIMIT_LOW ]; then
        echo "LOW LIMIT reached: $DISK_PERC > $LIMIT_LOW : Deleting 2 days" >&2
 
        OLDEST_FILES=`ls --sort=time -r|grep "daily-.*"|head -2`
        for file in $OLDEST_FILES; do
                /sbin/btrfs subvolume delete $file;
        done
 
fi
 
# If we are over the high free space limit,
# we delete a month of monthly backup
if [ $DISK_PERC -gt $LIMIT_HIGH ]; then
        echo "HIGH LIMIT reached: $DISK_PERC > $LIMIT_HIGH : Deleting 1 month" >&2
 
        OLDEST_FILES=`ls --sort=time -r|grep "monthly-.*"|head -1`
        for file in $OLDEST_FILES; do
                /sbin/btrfs subvolume delete $file;
        done
 
fi
 
# If we are over the alert free space limit,
# we delete the first two oldest files we can find
if [ $DISK_PERC -gt $LIMIT_ALERT ]; then
        echo "ALERT LIMIT reached: $DISK_PERC > $LIMIT_ALERT : Deleting the 2 oldest" >&2
 
        OLDEST_FILES=`ls --sort=time -r|head -2`
        for file in $OLDEST_FILES; do
                /sbin/btrfs subvolume delete $file;
        done
fi
 
 
# We touch the subvolume to change the modification date
touch ${SUBVOLUME}
 
# We do a snapshot of the subvolume
if [ ! -d "${SNAPVOL}" ]; then
        /sbin/btrfs subvolume snapshot ${SUBVOLUME} ${SNAPVOL}
fi
 
# We delete the backups older than MAX_DAILY
find ${SNAPDIR} -mindepth 1 -maxdepth 1 -mtime +${MAX_DAILY} -name "daily-*" -exec /sbin/btrfs subvolume delete {} \;
 
# We delete the backups older than MAX_MONTHLY
find ${SNAPDIR} -mindepth 1 -maxdepth 1 -mtime +${MAX_MONTHLY} -name "monthly-*" -exec /sbin/btrfs subvolume delete {} \;
 
 
# This is the actual backup code
# You need to save your data into the ${SUBVOLUME} directory
 
# We will only do the actual backup for the daily task
if [ "${PERIOD}" = "daily" ]; then
 
rsync -auv --inplace /usr/local/bin ${SUBVOLUME}/localhost/usr/local
 
fi

Then this is how you can use it, by adding these cron tasks:

0 12 * * *  user /usr/local/bin/backup-servers --period daily   >/var/log/backup-servers-daily.log
55 10 1 * * user /usr/local/bin/backup-servers --period monthly >/var/log/backup-servers-monthly.log

Debian 6.0

Debian released a new version of their system. I upgraded the server that powers this blog; it took me something like one hour to do the whole system upgrade. There was only a little glitch with MySQL’s my.cnf file, which contained an unsupported “skip-bdb” line. Everything else went fine…
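The fix was simply to comment the obsolete option out (BDB support was dropped from MySQL 5.1):

# In /etc/mysql/my.cnf:
#skip-bdb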

The very good thing in this new release is the new kFreeBSD version (available in i386 and x86_64). It brings the power of the FreeBSD kernel to the great Debian OS. If you don’t see the point, read this. To put it in a nutshell: a more stable kernel with fewer legal issues, better vendor support and the same software.