Cassandra

Cassandra is a highly scalable distributed database. It has the following properties:

  • Distributed / Decentralized;
  • Column Orientated;
  • Key-Value Store;
  • Fault Tolerant; and
  • Horizontally Scalable.

It is written in Java. It was originally developed at Facebook and is now an Apache Software Foundation project.

In Cassandra version, the following features were added:

  • Lightweight transactions;
  • Triggers;
  • UDF (User-Defined Functions); and
  • CQL Enhancements (User-Defined Types).

Queries in Cassandra are done with CQL (Cassandra Query Language).

Data model: columns

As in an RDBMS, data is stored as rows of named columns. An example might look something like this:

  • Authors
    • Andy
      • Name: Andy
      • Tel: 01234 567890
      • Building: QMB
    • Helen
      • Name: Helen
      • Tel: 03456 561753
      • Email: [email protected]

Notice that the set of columns can differ from row to row: Andy has a Building column but no Email, while Helen has an Email column but no Building.
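To make the idea of rows with different columns concrete, here is a minimal Python sketch that models the Authors example as a dictionary of rows, where each row is its own dictionary of columns. It is only an in-memory analogy, not Cassandra's storage format, and the helper names `columns` and `get` are hypothetical.

```python
from typing import Dict, Optional, Set

# In-memory analogy of the Authors example: row key -> {column name: value}.
# Rows are independent dicts, so each row can hold a different set of columns.
Row = Dict[str, str]

authors: Dict[str, Row] = {
    "Andy": {"Name": "Andy", "Tel": "01234 567890", "Building": "QMB"},
    "Helen": {"Name": "Helen", "Tel": "03456 561753", "Email": "[email protected]"},
}


def columns(table: Dict[str, Row], row_key: str) -> Set[str]:
    """Return the names of the columns present in one row."""
    return set(table[row_key])


def get(table: Dict[str, Row], row_key: str, column: str) -> Optional[str]:
    """Look up one cell; a column the row does not have simply yields None."""
    return table.get(row_key, {}).get(column)


if __name__ == "__main__":
    print(sorted(columns(authors, "Andy")))   # ['Building', 'Name', 'Tel']
    print(sorted(columns(authors, "Helen")))  # ['Email', 'Name', 'Tel']
    print(get(authors, "Andy", "Email"))      # None
```

An absent column costs nothing here: it is simply not present in that row's dictionary, rather than being stored as a NULL placeholder.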

More features

Lightweight transactions

From the official documentation:

INSERT and UPDATE statements using the IF clause support lightweight transactions [...]

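Previously, if two users tried to create an account with the same key, the second write would silently overwrite the first, because a plain INSERT behaves as an upsert. Now, with lightweight transactions, we can make the insert conditional, so that it is applied only if no row with that primary key already exists: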

INSERT INTO customer_account (customerID, customer_email)
VALUES ('LauraS', '[email protected]') IF NOT EXISTS;
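The difference between a plain INSERT and INSERT ... IF NOT EXISTS can be sketched as a single-process Python model. A plain insert overwrites whatever is stored under the same primary key, while the conditional insert writes only when the key is absent and reports whether it was applied (CQL returns this as an `[applied]` column). Cassandra actually coordinates the check across replicas using Paxos; this sketch ignores distribution entirely, and the function names and e-mail values are hypothetical.

```python
from typing import Dict

# Single-process model: primary key -> row (column name -> value).
Table = Dict[str, Dict[str, str]]


def insert(table: Table, key: str, row: Dict[str, str]) -> None:
    """Plain INSERT: an upsert that silently replaces any row with this key."""
    table[key] = dict(row)


def insert_if_not_exists(table: Table, key: str, row: Dict[str, str]) -> bool:
    """Conditional INSERT: write only if the key is absent; return [applied]."""
    if key in table:
        return False
    table[key] = dict(row)
    return True


if __name__ == "__main__":
    plain: Table = {}
    insert(plain, "LauraS", {"customer_email": "first@example.com"})
    insert(plain, "LauraS", {"customer_email": "second@example.com"})
    print(plain["LauraS"]["customer_email"])  # second@example.com

    guarded: Table = {}
    print(insert_if_not_exists(guarded, "LauraS", {"customer_email": "first@example.com"}))   # True
    print(insert_if_not_exists(guarded, "LauraS", {"customer_email": "second@example.com"}))  # False
    print(guarded["LauraS"]["customer_email"])  # first@example.com
```

With the conditional version the second user learns that the account already exists instead of silently replacing it.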

User-Defined Types

Cassandra now lets us define our own types. Here the user-defined type (UDT) address is used inside the table user_profiles:

CREATE TYPE address (
  street text,
  city text,
  zip int
);

CREATE TABLE user_profiles (
  login text PRIMARY KEY,
  first_name text,
  last_name text,
  email text,
  addresses map<text, frozen<address>>
);
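A rough Python analogy may help with the schema above. The class names mirror the CQL, but the mapping itself is an assumption for illustration, not a driver API. `frozen<address>` means Cassandra stores each address as a single serialized value that must be replaced as a whole rather than updated field by field; a frozen dataclass captures that "replace, don't mutate" behaviour. The helper `change_city` and the sample data are hypothetical.

```python
from dataclasses import FrozenInstanceError, dataclass, field, replace
from typing import Dict


@dataclass(frozen=True)
class Address:
    # Analogue of: CREATE TYPE address (street text, city text, zip int)
    street: str
    city: str
    zip: int


@dataclass
class UserProfile:
    login: str  # PRIMARY KEY in the CQL table
    first_name: str
    last_name: str
    email: str
    # Analogue of map<text, frozen<address>>, e.g. {"home": Address(...)}
    addresses: Dict[str, Address] = field(default_factory=dict)


def change_city(profile: UserProfile, label: str, new_city: str) -> None:
    """Swap in a whole new Address value, as a frozen UDT requires."""
    profile.addresses[label] = replace(profile.addresses[label], city=new_city)


if __name__ == "__main__":
    p = UserProfile("jdoe", "Jane", "Doe", "jdoe@example.com",
                    {"home": Address("1 Main St", "Springfield", 12345)})
    try:
        p.addresses["home"].city = "Shelbyville"  # rejected: the value is frozen
    except FrozenInstanceError:
        print("cannot edit one field; replace the whole address instead")
    change_city(p, "home", "Shelbyville")
    print(p.addresses["home"])
```

The map entry can still be replaced; only the individual fields of an existing address value cannot be edited in place.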

Counters

Previously, a counter update required a read followed by a write (read-before-write), and counters were fragile under replication and sharding.
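Why read-then-write is fragile can be shown with a deterministic Python sketch. Two clients each read the current value, add one, and write the result back; if both reads happen before either write, one increment is lost. Sending the delta (+1) instead lets the store apply both increments. This is a generic illustration of the read-then-write hazard, not a model of Cassandra's internal counter implementation, and the function names and the `views` column are hypothetical.

```python
def read_then_write_interleaved(start: int) -> int:
    """Two clients increment via read-modify-write with interleaved reads."""
    store = {"views": start}
    a = store["views"]       # client A reads
    b = store["views"]       # client B reads the same, now stale, value
    store["views"] = a + 1   # client A writes its result
    store["views"] = b + 1   # client B overwrites A's write: one increment lost
    return store["views"]


def delta_increments(start: int) -> int:
    """Two clients each send a +1 delta; the store applies every delta."""
    store = {"views": start}
    for delta in (1, 1):     # one delta from each client
        store["views"] += delta
    return store["views"]


if __name__ == "__main__":
    print(read_then_write_interleaved(0))  # 1
    print(delta_increments(0))             # 2
```

In CQL a counter column is likewise updated by sending a delta rather than a new value, for example `UPDATE page_views SET views = views + 1 WHERE page = 'home';` (table and column names hypothetical).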

The new implementation is discussed in a separate blog post.

Let's move on to its topology and structure.

Don't forget to look at the Design Anti-patterns at the end of this section.