
Cassandra

I'm a huge fan of all the cloud technologies. I've been working on an M2M project on top of Cassandra, and I can really say I love this distributed database. I'd like to give my feedback on it here.

Easy management

Cassandra doesn't require any manual management for complex operations like sharding data across nodes, restoring a crashed server, or putting a new or previously disconnected node back into the cluster. You just tell a node to join the cluster and watch it do all the work.

It's obviously a little harder to get started with Cassandra than with MySQL, but it's conceptually easier to understand. Management tools are clearly lacking, though.

Data paradigm change

You have extraordinary flexibility with Cassandra: you can add columns to column families (the equivalent of tables) at any time. But you can't use indexes the same way you do in relational databases. For large indexed data, like time series, you need to build your own indexes.

Because everything is retrieved on a per-row basis and each row can live on a different server, you need to retrieve as much data as possible per row. This means you sometimes need to forget about storing a link to the data and store the data itself. In my case, data coming from equipment is stored twice: once keyed by the equipmentId then the time, and once keyed by the equipmentId and the dataType, then the time.
There are some very interesting articles on this subject.
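To make this more concrete, here is a rough sketch of the double write. It uses the modern DataStax Python driver and CQL tables, while the project itself used plain column families, and the keyspace and table names are invented for the example:

from datetime import datetime
from cassandra.cluster import Cluster

# Hypothetical keyspace with one table per query pattern.
session = Cluster(['127.0.0.1']).connect('telemetry')

def save_reading(equipment_id, data_type, ts, value):
    # 1. "everything this equipment sent, ordered by time"
    session.execute(
        "INSERT INTO data_by_equipment (equipment_id, ts, data_type, value) "
        "VALUES (%s, %s, %s, %s)",
        (equipment_id, ts, data_type, value))
    # 2. "one data type of this equipment, ordered by time"
    session.execute(
        "INSERT INTO data_by_type (equipment_id, data_type, ts, value) "
        "VALUES (%s, %s, %s, %s)",
        (equipment_id, data_type, ts, value))

save_reading("eq-42", "temperature", datetime.utcnow(), "21.4")

Each copy is laid out for exactly one read path, so a query never has to hop between rows living on different nodes.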

I found that in many cases, saving objects directly in JSON form made my life a lot easier. And since all data is compressed internally, it doesn't take too much additional space.
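As a trivial illustration, the whole object becomes a single text value:

import json

# The whole reading is written as one column value...
reading = {"temperature": 21.4, "battery": 87, "status": "ok"}
blob = json.dumps(reading)

# ...and deserialized when the row is read back.
restored = json.loads(blob)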

Particularities

As said earlier, it's best to store rows with a lot of columns in Cassandra. Columns are often used in a completely different way than in relational databases; they can be time values, for instance. But you also have to take care not to create too many columns. I use 100,000 columns without any problem, but with 1M columns or more, data retrieval can take a lot of time (it can be a matter of seconds). I discovered this while doing some profiling, and it came as a surprise because Cassandra is advertised as being able to handle billions of columns. So, sure, it can handle billions of columns, but you shouldn't do it.
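One common workaround, which is my own illustration and not something from the project, is to put a time bucket in the row key so that no single row ever grows anywhere near those limits:

from datetime import datetime

def row_key(equipment_id, ts):
    # One row per equipment and per day, e.g. "eq-42:2011-06-03"; a day of
    # data stays far below the 100,000-column comfort zone mentioned above.
    return "%s:%s" % (equipment_id, ts.strftime("%Y-%m-%d"))

print(row_key("eq-42", datetime(2011, 6, 3, 14, 30)))   # eq-42:2011-06-03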

Cassandra supports TTL (Time To Live), which is very useful for temporary data like sessions or cached values. Expired data is garbage collected automatically.
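For instance, with the Python driver and a made-up sessions table, a row can be written so that it disappears by itself after one hour:

from cassandra.cluster import Cluster

session = Cluster(['127.0.0.1']).connect('telemetry')   # hypothetical keyspace

# The row expires on its own after 3600 seconds; no cleanup job needed.
session.execute(
    "INSERT INTO sessions (session_id, user_id) VALUES (%s, %s) USING TTL 3600",
    ("some-token", "user-123"))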

Because it's a distributed database, Cassandra distributes deletions as if they were values. A deleted column is in fact a column whose value is in a deleted state (a tombstone). The data is actually removed one week after it was marked as deleted. This mechanism allows failed nodes to be plugged back into the cluster up to one week after they disconnected.
Deleted columns count as regular columns internally, so you might end up with serious performance issues if you delete and create a huge number of columns at the same time.
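That grace period is the gc_grace_seconds table option. A sketch of tuning it with the Python driver, on the same made-up table as above, would look like this:

from cassandra.cluster import Cluster

session = Cluster(['127.0.0.1']).connect('telemetry')   # hypothetical keyspace

# Tombstones for this table are only purged once they are older than
# gc_grace_seconds, so a dead node must come back within that window.
session.execute("ALTER TABLE data_by_equipment WITH gc_grace_seconds = 604800")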

It eats all your memory

With its default settings, Cassandra eats a lot of memory. With 2 GB it will throw some OutOfMemoryErrors, with 4 GB it will flush data very frequently, and it runs OK with 8 GB. In production, I like to give it 12 GB of memory. It's not really a problem, you just have to buy a bigger server. But if you sell your software to be installed on a client's architecture, this can be a little more problematic.

MySQL Master-Master desynchronization

Setting up master-master replication is pretty easy. You can find a quick guide to do this on Google, just try.

What is a little more problematic is when you lose your beloved sync. And that can happen. It happened to me yesterday when I upgraded my two servers from Debian 4.0 to Debian 5.0. The reason is that the old version (something like 5.0.32) used the Relay_Log_File ${hostname}-relay-bin.XXXXXX, while the new version (5.0.51a-24-log) decided to use mysqld-relay-bin.XXXXXX.

I tried to solve it the easy way with just:

STOP SLAVE;
CHANGE MASTER TO RELAY_LOG_FILE="mysqld-relay-bin.000001";

but the server wasn't happy with that. So I took the time to look at the current slave settings:

mysql> show slave status\G
*************************** 1. row ***************************
             Slave_IO_State:
                Master_Host: **HOST**
                Master_User: slave1
                Master_Port: 3306
              Connect_Retry: 60
            Master_Log_File: mysql-bin.000521
        Read_Master_Log_Pos: 306043
             Relay_Log_File: mysqld-relay-bin.000050
              Relay_Log_Pos: 306137
      Relay_Master_Log_File: mysql-bin.000521
           Slave_IO_Running: Yes
          Slave_SQL_Running: Yes
            Replicate_Do_DB:
        Replicate_Ignore_DB:
         Replicate_Do_Table:
     Replicate_Ignore_Table:
    Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
                 Last_Errno: 0
                 Last_Error:
               Skip_Counter: 0
        Exec_Master_Log_Pos: 306043
            Relay_Log_Space: 306137
            Until_Condition: None
             Until_Log_File:
              Until_Log_Pos: 0
         Master_SSL_Allowed: No
         Master_SSL_CA_File:
         Master_SSL_CA_Path:
            Master_SSL_Cert:
          Master_SSL_Cipher:
             Master_SSL_Key:
      Seconds_Behind_Master: 0
1 row in set (0.00 sec)

On both servers, I did something like this:

STOP SLAVE;
RESET SLAVE;
-- MASTER_LOG_FILE / MASTER_LOG_POS come from Relay_Master_Log_File and
-- Exec_Master_Log_Pos in the slave status above
CHANGE MASTER TO RELAY_LOG_FILE="mysqld-relay-bin.000001",
    MASTER_LOG_FILE="mysql-bin.000521", MASTER_LOG_POS=306043;
START SLAVE;

The position of the log is pretty important.

It can sometimes happen that the Exec_Master_Log_Pos on the slave is higher than the Position the master reports for the same file. The reason is that the master sometimes sends events to its slave, then crashes before having saved them to its own logs. In that case you may have a little more data on the slave than on the master. That means you should lower the MASTER_LOG_POS of your slave so that it isn't higher than the Position shown by the "show master status\G" command.

I really recommend using this precise method instead of restarting the whole synchronization process, which occasionally requires you to stop your servers to copy the files. On my servers, the databases take 15 GB. That isn't a big database, but it's already enough to make you think twice before doing a copy (or even an rsync).

By the way, I haven't really talked about it, but InnoDB/MySQL is a very simple, efficient, and safe choice for a database. The only big drawback of InnoDB is that there is no simple online backup solution. I like to save the database files and its logs regularly (a bad raw copy of an InnoDB database can be corrected by its logs) and to dump the whole database to SQL every night.

By the way, if you don't understand everything, and particularly why each slave needs two threads (one that copies the master's binary log into the relay log, and one that executes it), you should read this.