In this post we are going to discuss more about different keys available in Cassandra . Primary key concept in Cassandra is different from Relational databases. Therefore it is worth spending time to understand this concept.

Lets take an example and create a student table which had a student_id as a primary key column.

1) primary key

create table person (student_id int primary key, fname text, lname text, dateofbirth timestamp, email text, phone text );

In Cassandra a table can have number of rows. Each row is referenced by a primary key also called row key. There are number of columns in a row but the number of columns can vary in different rows.

For example one row in a table can have three columns where as the other row in the same table can have ten columns. It is also important to note that in Cassandra both column names and values have binary types. Which means column names can have binary values such as string, timestamp or an integer etc. This is different from SQL databases where each row in an SQL table has fixed number of columns and column names can only be text.

We saw that the student_id was used as a row key to refer to person data.

2)Compound primary key:

As the name suggests, compound primary key is comprised of one or more columns which are referenced in the primary key. One component of compound primary key is called partition key where as the other component is called clustering key. Following are different variations of primary keys. Please note that C1, C2, C3,… and so on represent columns in the table.

C1: primary key has only one partition key and no cluster key.

(C1, C2): column C1 is a partition key and column C2 is a cluster key.

(C1,C2,C3,…): column C1 is a partition key and columns C2, C3 and so on make cluster key.

(C1, (C2, C3,…)): It is same as 3 i.e column C1 is a partition key and columns C2,C3,… make cluster key.

((C1, C2,…), (C3,C4,…)): columns C1, C2 make partition key and columns C3,C4,… make cluster key.

It is important to note that when compound key is C1,C2,C3 then the first key C1 becomes partition key and rest of the keys become part of the cluster key. In order to make composite partition keys we have to specify keys in parenthesis such as: ( ( C1,C2) , C3, C4). In this case C1 & C2 are part of partition keys and C3, C4 are part of cluster key.

3) Partition key

The purpose of partition key is to identify the partition or node in the cluster which stores that row. When data is read or write from the cluster a function called Partitioner is used to compute the hash value of the partition key. This hash value is used to determine the node/partition which contains that row. For example rows whose partition key values range from 1000 to 1234 may reside in node A and rows with partition key values range from 1235 to 2000 may reside in node B as shown in figure 1. If a row contains partition key whose hash value is 1233 then it will be stored in node A.


Cassandra Data Modeling   Primary , Clustering , Partition , Compound Keys

4) Clustering key

The purpose of clustering key is to store row data in sorted order. The sorting of data is based on columns which are included in the clustering key. This arrangement makes it efficient to retrieve data using clustering key.

5) Example :

To make these concepts clear we will consider example of school system.

>Create a keyspace with replication strategy ‘SimpleStrategy’ and replication_factor 1.

create keyspace Students_Details with replication = {‘class’ : ‘SimpleStrategy’, ‘replication_factor’:1};

>Now switch to students_details keyspace:

cqlsh> use students_details ;

>command to check the no of tables present in keyspace

cqlsh:students_details> desc TABLES;

>We will create a table student which contains general information about any student. Type the following create statement into cqlsh.

create table student (stuid int, avg_marks float, description text, primary key (stuid));

> Type the following insert statements to enter some data into this table.

insert into student (stuid, avg_marks, description) values (1,25.5,’student 1′);

insert into student (stuid, avg_marks, description) values (2,35.5,’student 2′);

>to view the details just inserted

cqlsh:students_details> select * from student;

stuid | avg_marks | description

――-+――― +――――-

1 | 25.5 | student 1

2 | 35.5 | student 2

> We can see how Cassandra has stored this data under the hood by using cassandra-cli tool. Run cassandra-cli tool in a separate terminal window and type the following command on that terminal.( NOTE * Cassandra-CLI utility (deprecated)

Important: The CLI utility is deprecated and will be removed in Cassandra 3.0. For ease of use and performance, switch from Thrift and CLI to CQL and cqlsh.)

So if you using cassandra verison above 3.0 then use the below commands .

using the EXPAND Command in cqlsh , we can view the details info for the queries .

>EXPAND with no arguments shows the current value of expand setting.

cqlsh:students_details> EXPAND

Expanded output is currently disabled. Use EXPAND ON to enable.

>Enabling the expand command

cqlsh:students_details> EXPAND ON

Now Expanded output is enabled

>Now view the details inserted above- ( the studid will be present in red color in cqlsh, representing the primary key/row key)

>cqlsh:students_details> select * from student;

@ Row 1

――――-+―――

stuid | 1

avg_marks | 25.5

description | student 1

@ Row 2

――――-+―――

stuid | 2

avg_marks | 35.5

description | student 2

(2 rows)

We can see from the above output that the stuid has become the row key and it identifies individual rows.

cqlsh:students_details> select token(stuid) from student;

@ Row 1

―――――――+―――――――-

system.token(stuid) | -4069959284402364209

@ Row 2

―――――――+―――――――-

system.token(stuid) | -3248873570005575792

Also you can see that their are two tokens .

We can use columns in the primary key to filter data in the select statement. Type the following command in the cqlsh window:

select * from student where stuid = 1;

> Now we will create another table called marks which records marks of each student for every day(say every day new exams and marks are recorded). Type the following command on cqlsh:

create table marks(stuid int,exam_date timestamp,marks float,exam_name text , primary key (stuid,exam_date));

> This statement creates marks table with primary key ( stuid , exam_date ). As primary key has two components therefore the first component is considered as partition key and the second component becomes the cluster key. Add some data into the table:

1.

insert into marks(stuid ,exam_date ,marks ,exam_name) values (1,’2016-11-10′,76 ,’examA’);

2.

insert into marks(stuid ,exam_date ,marks ,exam_name) values (1,’2016-11-11′,90 ,’examB’);

3.

insert into marks(stuid ,exam_date ,marks ,exam_name) values (1,’2016-11-12′,68 ,’examC’);

cqlsh:students_details> select * from marks;

@ Row 1

――― +――――――――

stuid | 1

exam_date | 2016-11-09 18:30:00+0000

exam_name | examA

marks | 76

@ Row 2

――― +――――――――

stuid | 1

exam_date | 2016-11-10 18:30:00+0000

exam_na

本文数据库(综合)相关术语:系统安全软件

主题: CassandraSQL
分页:12
转载请注明
本文标题:Cassandra Data Modeling Primary , Clustering , Partition , Compound Keys
本站链接:http://www.codesec.net/view/485577.html
分享请点击:


1.凡CodeSecTeam转载的文章,均出自其它媒体或其他官网介绍,目的在于传递更多的信息,并不代表本站赞同其观点和其真实性负责;
2.转载的文章仅代表原创作者观点,与本站无关。其原创性以及文中陈述文字和内容未经本站证实,本站对该文以及其中全部或者部分内容、文字的真实性、完整性、及时性,不作出任何保证或承若;
3.如本站转载稿涉及版权等问题,请作者及时联系本站,我们会及时处理。
登录后可拥有收藏文章、关注作者等权限...
技术大类 技术大类 | 数据库(综合) | 评论(0) | 阅读(31)