
Getting Started With Cassandra: Using CQL API and CQLSH

Category: Databases (General) | Posted by: 店小二04 | Date: 2017 | Author: 红领巾

Apache Cassandra is one of the most popular open-source distributed database systems available. It was designed with the goal of handling large amounts of data stored in many servers distributed across geographies while providing high scalability and availability with no single point of failure. Cassandra systems can span multiple data centres, allowing low latency for all connected clients.

This is a three-part tutorial series where I will start with the basics of Cassandra, using CQLSH to create tables and records. Then I'll explain the various data types supported by Cassandra, and then we'll use a Go client library to handle Cassandra operations programmatically.

In this first part, I will briefly cover how the Cassandra data model is laid out and perform basic operations using CQLSH.

For this tutorial series, I assume that readers can install Cassandra on their own machines, following the instructions for their operating system.

The Cassandra Data Model

The Cassandra data model follows the column family approach, which can easily be understood as being analogous to a relational table structure but in a NoSQL way. The description below should make it clearer:

Keyspace

A keyspace is the outermost container for data in Cassandra, and all data must live inside one. It is comparable to a database in an RDBMS: where a database is a collection of tables, a keyspace is a collection of column families.

Column Family

A column family is a collection of rows, and each row is a collection of columns. It is analogous to a table in an RDBMS, with some differences: the column family is defined up front, but a row does not need to have every column, and columns can be added to or removed from a row as required.

Column

The column is the basic unit of data in Cassandra. It has three parts: a key (the column name), a value, and a timestamp.

Super Column

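A super column is a special type of column that stores a map of sub-columns. It makes storing complex data easier and can make fetching related data faster, because each column family is stored in its own set of files (SSTables) on the file system. Super columns belong to Cassandra's older Thrift-based data model; in CQL, collection types and user-defined types cover the same need.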
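The hierarchy described above can be pictured as nested maps. The following is a minimal Python sketch of that picture, not how Cassandra stores data internally. The class names, the put and get helpers, the sample values, and the microsecond timestamp unit are all assumptions made for illustration; Python is used only because it keeps the sketch short.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Column:
    """Basic unit of data: a name, a value, and a write timestamp."""
    name: str
    value: str
    timestamp: int  # microseconds since the epoch (assumed unit for this sketch)


@dataclass
class ColumnFamily:
    """A collection of rows; each row maps column names to Column objects."""
    name: str
    rows: dict = field(default_factory=dict)

    def put(self, row_key, column_name, value, timestamp=None):
        # Default to the current time when the caller gives no timestamp.
        if timestamp is None:
            timestamp = time.time_ns() // 1000
        row = self.rows.setdefault(row_key, {})
        row[column_name] = Column(column_name, value, timestamp)

    def get(self, row_key, column_name):
        column = self.rows.get(row_key, {}).get(column_name)
        return column.value if column else None


@dataclass
class Keyspace:
    """Outermost container: a collection of column families."""
    name: str
    column_families: dict = field(default_factory=dict)

    def create_column_family(self, name):
        return self.column_families.setdefault(name, ColumnFamily(name))


k1 = Keyspace("k1")
person = k1.create_column_family("person")
person.put("001", "name", "Shalabh")
person.put("001", "surname", "Aggarwal")
person.put("002", "name", "John")  # this row has no surname column

print(sorted(person.rows["001"]))  # column names present in row 001
print(sorted(person.rows["002"]))  # column names present in row 002
```

The two rows end up with different sets of columns, which is the main difference from a fixed relational row.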

Using Cassandra Console

CQLSH is the standard shell for interacting with Cassandra through CQL (Cassandra Query Language). CQL closely resembles SQL, the language used by most RDBMS, which makes it easy for developers new to Cassandra to get working with it quickly. CQLSH ships with every Cassandra package, so it should already be on your machine if Cassandra is installed.

Create a Keyspace

As we saw in the data model described above, a keyspace is the outermost container and should be created before anything else. To create it, run:

$ cqlsh localhost -e "CREATE KEYSPACE IF NOT EXISTS k1 WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'} AND durable_writes = true;"

In the above command, I have assumed that Cassandra is running on localhost without any user authentication. The command creates a keyspace called k1 with the given replication and durable_writes settings.
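To get a feel for what replication_factor means under SimpleStrategy, here is a small Python sketch; it is not Cassandra's actual code. It assumes a toy ring in which each node's position is the MD5 hash of a made-up node name, whereas a real cluster assigns tokens to nodes itself and uses the Murmur3 partitioner by default. The idea it illustrates is that the first replica goes to the node that owns the key's token, and further replicas go to the next nodes clockwise on the ring.

```python
import hashlib
from bisect import bisect_left


def token(key: str) -> int:
    """Hash a key to a position on the ring (MD5 here, purely for illustration)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


def replicas(partition_key, nodes, replication_factor):
    """Pick the node that owns the key's token, then the next nodes clockwise."""
    ring = sorted(nodes, key=token)              # nodes ordered by ring position
    positions = [token(node) for node in ring]
    # First node whose position is >= the key's token, wrapping past the end.
    start = bisect_left(positions, token(partition_key)) % len(ring)
    count = min(replication_factor, len(ring))   # cannot exceed the node count
    return [ring[(start + i) % len(ring)] for i in range(count)]


nodes = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical node names
print(replicas("001", nodes, 1))
print(replicas("001", nodes, 3))
```

With a replication_factor of 1, as in the command above, each row lives on exactly one node, which is fine for a single-node test setup but gives no redundancy.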

If you have user authentication defined, you can run:

$ cqlsh -u <username> -p <password> localhost -e "CREATE KEYSPACE IF NOT EXISTS k1 WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'} AND durable_writes = true;"

In the above command, replace <username> and <password> with your authentication credentials.
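If you script one-off commands like these, building the argument list in code avoids most of the shell quoting trouble caused by the nested quotes. Below is a rough Python sketch; the cqlsh_command helper is a hypothetical name of my own, and it uses only the -u, -p and -e options shown above.

```python
import shlex


def cqlsh_command(statement, host="localhost", username=None, password=None):
    """Build the cqlsh argument list for running one CQL statement with -e."""
    command = ["cqlsh"]
    if username is not None:
        command += ["-u", username]
    if password is not None:
        command += ["-p", password]
    command += [host, "-e", statement]
    return command


statement = (
    "CREATE KEYSPACE IF NOT EXISTS k1 WITH replication = "
    "{'class': 'SimpleStrategy', 'replication_factor': '1'} "
    "AND durable_writes = true;"
)
command = cqlsh_command(statement)
print(shlex.join(command))  # the equivalent shell command line

# To actually run it (requires cqlsh on PATH and a reachable Cassandra node):
# import subprocess
# subprocess.run(command, check=True)
```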

Running a command like this can be a bit cumbersome. Another way is to launch the CQLSH prompt and then run queries directly inside it.

$ cqlsh -u <username> -p <password> localhost
Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.9 | CQL spec 3.4.2 | Native protocol v4]
Use HELP for help.
<username>@cqlsh> CREATE KEYSPACE IF NOT EXISTS k1 WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'} AND durable_writes = true;

Moving ahead, I will be using the above method of running queries. Before running any other query, we need to tell CQLSH which keyspace should be used.

<username>@cqlsh> USE k1;
<username>@cqlsh:k1>

The replication_factor of a keyspace can be altered to match how much replication is needed for the chosen replication class.

<username>@cqlsh:k1> ALTER KEYSPACE "k1" WITH REPLICATION =
    { 'class' : 'SimpleStrategy', 'replication_factor' : 3 };

Create and Alter a Table

A table is equivalent to a column family in Cassandra. Cassandra supports many different datatypes for storing data, which I will be covering in detail in the next part of this tutorial series. To create a table, simply run the CREATE TABLE command.

<username>@cqlsh:k1> CREATE TABLE person (
    id text,
    name text,
    surname text,
    PRIMARY KEY (id));

To check how the structure of the table looks once created:

<username>@cqlsh:k1> DESCRIBE person;
CREATE TABLE k1.person (
id text PRIMARY KEY,
name text,
surname text
) WITH bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';

Now let's say we want to alter the table to store the email of the person as well.

<username>@cqlsh:k1> ALTER TABLE person ADD email text;
<username>@cqlsh:k1> DESCRIBE person;
CREATE TABLE k1.person (
id text PRIMARY KEY,
email text,
name text,
surname text
) WITH bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';

Insert and Update Data

Inserting data into a Cassandra table using CQL is pretty straightforward.

<username>@cqlsh:k1> SELECT * FROM person;
id | email | name | surname
----+-------+------+---------
(0 rows)
<username>@cqlsh:k1> INSERT INTO person (id, name, surname, email) VALUES ('001', 'Shalabh', 'Aggarwal', '[email protected]');
<username>@cqlsh:k1> SELECT * FROM person;
id | email | name | surname
-----+-----------------------------+---------+----------
001 | [email protected] | Shalabh | Aggarwal

In this table, every column has the same data type, text. Things become a bit more complex when we use different or composite datatypes, which I will discuss in the next part of this series.
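One point worth knowing about writes: Cassandra treats both INSERT and UPDATE as upserts. An INSERT for an existing primary key overwrites the given columns instead of failing, and an UPDATE for a key that does not exist creates the row. When two writes touch the same column, the one with the newer timestamp is the one you read back. The Python sketch below is a toy model of that behaviour under simplifying assumptions: the write and read helpers are hypothetical, timestamps are plain integers, and on equal timestamps the later call simply wins, whereas Cassandra has its own tie-breaking rules.

```python
def write(table, key, columns, timestamp):
    """Apply an INSERT or UPDATE: store each column unless a newer value exists."""
    row = table.setdefault(key, {})
    for name, value in columns.items():
        current = row.get(name)
        # Keep the existing cell only when it carries a strictly newer timestamp.
        if current is None or timestamp >= current[1]:
            row[name] = (value, timestamp)


def read(table, key):
    """Return the visible values of a row, without the timestamps."""
    return {name: cell[0] for name, cell in table.get(key, {}).items()}


person = {}
write(person, "001", {"name": "Shalabh", "surname": "Aggarwal"}, timestamp=1)
write(person, "001", {"surname": "A."}, timestamp=2)  # existing key: overwrites one column
write(person, "004", {"name": "Jane"}, timestamp=3)   # missing key: creates the row
write(person, "001", {"name": "stale"}, timestamp=0)  # older write loses

print(read(person, "001"))
print(read(person, "004"))
```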

Let's say we want to update the value in the column email to something else.

<username>@cqlsh:k1> UPDATE person SET email='[email protected]' WHERE id='001';
<username>@cqlsh:k1> SELECT * FROM person;
id | email | name | surname
-----+--------------------------+---------+----------
001 | [email protected] | Shalabh | Aggarwal

Querying Data

Data in a table can be queried simply by using SELECT statements.

Let's insert some more records and query them.

<username>@cqlsh:k1> INSERT INTO person (id, name, surname, email) VALUES ('002', 'John', 'Doe', '[email protected]');
<username>@cqlsh:k1> INSERT INTO person (id, name, surname, email) VALUES ('003', 'Harry', 'Potter', '[email protected]');
<username>@cqlsh:k1> SELECT * FROM person;
id | email | name | surname
-----+--------------------------+---------+----------
002 | [email protected] | John | Doe
001 | [email protected] | Shalabh | Aggarwal
003 | [email protected] | Harry | Potter
(3 rows)
<username>@cqlsh:k1> SELECT name FROM person WHERE id='001';
name
---------
Shalabh
(1 rows)
<username>@cqlsh:k1> SELECT name FROM person WHERE id IN ('001', '002');
name
---------
Shalabh
John
(2 rows)
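Notice that the three-row SELECT * result above lists the ids as 002, 001, 003 rather than in sorted order. Cassandra returns partitions in the order of the token computed from the partition key, not in the order of the key itself or of insertion. The Python sketch below illustrates the idea only: it assumes MD5 as a stand-in hash, while Cassandra's default partitioner is Murmur3, so the order it prints will most likely differ from the cqlsh output.

```python
import hashlib


def token(partition_key: str) -> int:
    """Map a partition key to a token (MD5 stands in for the real partitioner)."""
    return int.from_bytes(hashlib.md5(partition_key.encode("utf-8")).digest(), "big")


ids = ["001", "002", "003"]
scan_order = sorted(ids, key=token)  # order of a full-table scan in this toy model

print(scan_order)
```

The practical consequence is that you should not rely on the row order of an unrestricted SELECT.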

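More complex operators, such as inequality comparisons, can also be used, and several WHERE conditions can be combined using AND. Note that CQL in this version of Cassandra has no OR operator; IN, as used above, covers the common case of matching several values. Which columns may appear in a WHERE clause also depends on the table's primary key and indexes.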

Conclusion

Cassandra is one of the most popular NoSQL database systems available and is built for distributed environments. Getting started with Cassandra is fairly easy for beginners who have some knowledge of RDBMS and SQL.

CQL resembles SQL in many respects, and CQLSH makes testing and debugging much easier. In the next part of this series, I will cover the various datatypes provided by Cassandra and how to deal with them.
