Learning the Cassandra Database

2016-05-18 10:10

http://wayneshawn.github.io/2015/04/07/Cassandra-get-started/




Online resources


Cassandra Getting Started

2010-07-15
Distributed key-value storage system: getting started with Cassandra
2015-03-25
Apache Cassandra Wiki

-DATASTAX Documentation

-Cassandra 2.x Chinese tutorial blog series


Python Cassandra-driver

cassandra-driver 2.5.0

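Single-node Cassandra walkthrough


1. Start Cassandra

If the environment variables are not set, change to Cassandra's bin directory first:
[root@server1 bin]# ./cassandra -f


Without the -f option, Cassandra runs as a daemon process.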


2. Connect to the local Cassandra with cqlsh

[root@server1 bin]# ./cqlsh
Connected to Test Cluster at localhost:9160.
[cqlsh 4.1.1 | Cassandra 2.0.13 | CQL spec 3.1.1 | Thrift protocol 19.39.0]
Use HELP for help.

cqlsh> CREATE KEYSPACE mykeyspace WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 1};
cqlsh> use mykeyspace ;

cqlsh:mykeyspace> create table users( name text primary key, age int, email text );

cqlsh:mykeyspace> insert into users(name, age, email) values('wayne', 21, 'leon_sin@126.com');
cqlsh:mykeyspace> insert into users(name, age, email) values('kerr', 22, 'singleon@126.com');
cqlsh:mykeyspace>

cqlsh:mykeyspace> select * from users;
 name   | age | email
--------+-----+-------------------
   kerr |  22 |  singleon@126.com
 lambda |  20 | 2270891001@qq.com
  wayne |  21 |  leon_sin@126.com

CQL stands for Cassandra Query Language.


3. Cassandra-driver example: cassandraDriverTest.py

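from cassandra.cluster import Cluster

# With no arguments, Cluster() contacts a node on the local machine
cluster = Cluster()
session = cluster.connect('mykeyspace')

# 1. Use %s as the placeholder for every argument type
# 2. The second argument must be a sequence; a one-element tuple is written ('blah',)
session.execute('INSERT INTO users(name, age, email) VALUES(%s, %s, %s)', ('shawn', 21, 'shawn@163.com'))

rows = session.execute('SELECT name, age, email FROM users')
for (name, age, email) in rows:
    # This print form gives the same output on Python 2 and Python 3
    print('%s %s %s' % (name, age, email))

# Close the connections to the cluster when done
cluster.shutdown()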
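
The second comment in the script is easy to get wrong, so here is a small sketch of the pitfall. It needs no Cassandra connection; is_valid_params is a hypothetical helper written for this illustration and is not part of cassandra-driver.

```python
# Illustration only: no Cassandra connection is needed to see the pitfall.
not_a_tuple = ('shawn')   # parentheses alone do not make a tuple; this is a str
one_tuple = ('shawn',)    # the trailing comma makes a one-element tuple


def is_valid_params(params):
    """Hypothetical check: accept only tuples and lists as positional parameters."""
    return isinstance(params, (tuple, list))


print(type(not_a_tuple).__name__, is_valid_params(not_a_tuple))
print(type(one_tuple).__name__, is_valid_params(one_tuple))
```

The first line reports a str that fails the check, the second a tuple that passes it, which is why a single bound value is written as ('blah',).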


4. Stop the Cassandra process

Use
ps -ef|grep cassandra
to find its process id, then kill it.
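
As a rough sketch of the same lookup in Python, the function below extracts process ids from ps -ef style text. The SAMPLE listing is made up for illustration (it is not output from the machines in this post), and the function assumes the usual eight-column ps -ef layout.

```python
def cassandra_pids(ps_output, needle="cassandra"):
    """Return the PIDs of lines in `ps -ef` output whose command mentions needle."""
    pids = []
    for line in ps_output.splitlines():
        # Assumed column layout: UID PID PPID C STIME TTY TIME CMD
        fields = line.split(None, 7)
        if len(fields) < 8 or not fields[1].isdigit():
            continue  # header line or something unexpected
        command = fields[7]
        # Skip the grep process itself, which also matches the needle
        if needle in command and "grep" not in command:
            pids.append(int(fields[1]))
    return pids


# Hypothetical listing, for illustration only
SAMPLE = """\
UID        PID  PPID  C STIME TTY          TIME CMD
root      2345     1  5 10:01 pts/0    00:00:12 java -ea org.apache.cassandra.service.CassandraDaemon
root      2410  2300  0 10:05 pts/0    00:00:00 grep cassandra
"""

print(cassandra_pids(SAMPLE))
```

On a real machine the text could come from subprocess.check_output(["ps", "-ef"]), and each returned id could then be passed to kill.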


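A simple two-node Cassandra cluster configuration

References

-Initializing a multiple node cluster (single data center)

-Simple Cassandra cluster configuration


0. Test environment

VMware 9.0.2, CentOS 6.5 64-bit, Cassandra 2.0.13


1. Assume Cassandra is installed on both of the following machines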

node0 192.168.56.100 (seed)
node1 192.168.56.201


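2. Adjust the firewall settings or simply disable the firewall

On CentOS, run
$setup
to open the settings menu, where the firewall can be disabled.

3. Stop the Cassandra process and clear its data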

$ps -ef|grep cassandra
$kill pid
$rm -rf /var/lib/cassandra/data/system/*


4. Edit /conf/cassandra.yaml

node0:

seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "192.168.56.100"

listen_address: 192.168.56.100
rpc_address: 0.0.0.0
endpoint_snitch: GossipingPropertyFileSnitch

node1:

seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "192.168.56.100"

listen_address: 192.168.56.201
rpc_address: 0.0.0.0
endpoint_snitch: GossipingPropertyFileSnitch
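
The two configurations above differ in only one value. The sketch below makes that explicit by generating the settings for each node from its IP address; node_settings and differing_keys are hypothetical helpers for illustration, and the dictionaries are flattened summaries rather than the real cassandra.yaml structure.

```python
SEED = "192.168.56.100"
NODES = {"node0": "192.168.56.100", "node1": "192.168.56.201"}


def node_settings(listen_address, seed=SEED):
    """Flattened view of the cassandra.yaml values changed in this step."""
    return {
        "seeds": seed,                      # same seed list on every node
        "listen_address": listen_address,   # the node's own IP
        "rpc_address": "0.0.0.0",
        "endpoint_snitch": "GossipingPropertyFileSnitch",
    }


def differing_keys(a, b):
    """Names of the settings whose values differ between two nodes."""
    return sorted(key for key in a if a[key] != b[key])


settings = {name: node_settings(ip) for name, ip in NODES.items()}
print(differing_keys(settings["node0"], settings["node1"]))
```

Only listen_address is reported, so adding another node to this layout should only require changing that one value.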


5. Edit /conf/cassandra-rackdc.properties

For example:
# indicate the rack and dc for this node
dc=DC1
rack=RAC1
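
To double-check what a node will report, the file can be read back with a few lines of Python. This is a minimal sketch that handles only comments and simple key=value lines; parse_properties is a hypothetical helper, not a Cassandra tool.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


RACKDC = """\
# indicate the rack and dc for this node
dc=DC1
rack=RAC1
"""

print(parse_properties(RACKDC))
```

The dc and rack values are the ones that should later appear as the Datacenter and Rack in the nodetool status output.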


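6. Start Cassandra

In my setup, node0's hostname is master and node1's hostname is slave1. They are named this way because I originally set the machines up by following a Hadoop cluster configuration tutorial. For a Cassandra cluster built on VMware, what matters is having two virtual machines that can ping each other.

Start Cassandra on node0 first
[root@master bin]# ./cassandra


Then start Cassandra on node1
[root@slave1 bin]# ./cassandra


7. Check that the ring is running

Every listed node should have status UN (Up/Normal).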

[root@master bin]# ./nodetool status
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens  Owns (effective)  Host ID                               Rack
UN  192.168.56.201  74.89 KB   256     100.0%            e6121751-682e-4833-8de7-718eac08e718  RAC1
UN  192.168.56.100  105.21 KB  256     100.0%            a153a679-5add-4995-adbf-
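
The UN check can also be scripted. The sketch below parses text in the nodetool status layout shown above and reports whether every node is Up/Normal; it assumes node lines start with a two-letter status code followed by the address, and the function names are hypothetical.

```python
import re

# First letter: Up/Down. Second letter: Normal/Leaving/Joining/Moving.
NODE_LINE = re.compile(r"^([UD][NLJM])\s+(\S+)\s")


def node_states(status_output):
    """Map each node address to its two-letter status code."""
    states = {}
    for line in status_output.splitlines():
        match = NODE_LINE.match(line)
        if match:
            states[match.group(2)] = match.group(1)
    return states


def all_up_normal(status_output):
    """True only if at least one node is listed and all of them are UN."""
    states = node_states(status_output)
    return bool(states) and all(code == "UN" for code in states.values())


# The output from this step, including the truncated last Host ID
SAMPLE = """\
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens  Owns (effective)  Host ID                               Rack
UN  192.168.56.201  74.89 KB   256     100.0%            e6121751-682e-4833-8de7-718eac08e718  RAC1
UN  192.168.56.100  105.21 KB  256     100.0%            a153a679-5add-4995-adbf-
"""

print(node_states(SAMPLE))
print(all_up_normal(SAMPLE))
```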


8. Test

In the earlier single-node test, I had already inserted 4 records into the users table in mykeyspace.

Now we insert a fifth record on node0.

[root@master bin]# ./cqlsh
Connected to Test Cluster at localhost:9160.
[cqlsh 4.1.1 | Cassandra 2.0.13 | CQL spec 3.1.1 | Thrift protocol 19.39.0]
Use HELP for help.
cqlsh> use mykeyspace ;
cqlsh:mykeyspace> select * from users;

 name   | age | email
--------+-----+-------------------
   kerr |  22 |  singleon@126.com
 lambda |  20 | 2270891001@qq.com
  wayne |  21 |  leon_sin@126.com
  shawn |  21 |     shawn@163.com

(4 rows)

cqlsh:mykeyspace> insert into users(name, age, email) values('slave', 40, 'zwxx@126.com');
cqlsh:mykeyspace> select * from users;

 name   | age | email
--------+-----+-------------------
  slave |  40 |      zwxx@126.com
   kerr |  22 |  singleon@126.com
 lambda |  20 | 2270891001@qq.com
  wayne |  21 |  leon_sin@126.com
  shawn |  21 |     shawn@163.com

(5 rows)

cqlsh:mykeyspace>

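Next, we query from node1. Because node1 was created from master with VMware's clone feature and then modified accordingly, node1 initially also had 4 records in the users table. Now we verify that one record has been added.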

[root@slave1 bin]# ./cassandra-cli -h 192.168.56.201
Connected to: "Test Cluster" on 192.168.56.201/9160
Welcome to Cassandra CLI version 2.0.13

The CLI is deprecated and will be removed in Cassandra 3.0.  Consider migrating to cqlsh.
CQL is fully backwards compatible with Thrift data; see http://www.datastax.com/dev/blog/thrift-to-cql3 
Type 'help;' or '?' for help.
Type 'quit;' or 'exit;' to quit.
[default@mykeyspace]
[default@mykeyspace] list users;
Using default limit of 100
Using default cell limit of 100
-------------------
RowKey: slave
=> (name=, value=, timestamp=1428733896613000)
=> (name=age, value=00000028, timestamp=1428733896613000)
=> (name=email, value=7a777878403132362e636f6d, timestamp=1428733896613000)
-------------------
RowKey: kerr
=> (name=, value=, timestamp=1428733672723000)
=> (name=age, value=00000016, timestamp=1428733672723000)
=> (name=email, value=73696e676c656f6e403132362e636f6d, timestamp=1428733672723000)
-------------------
RowKey: lambda
=> (name=, value=, timestamp=1428414359621000)
=> (name=age, value=00000014, timestamp=1428414359621000)
=> (name=email, value=323237303839313030314071712e636f6d, timestamp=1428414359621000)
-------------------
RowKey: wayne
=> (name=, value=, timestamp=1428733660801000)
=> (name=age, value=00000015, timestamp=1428733660801000)
=> (name=email, value=6c656f6e5f73696e403132362e636f6d, timestamp=1428733660801000)
-------------------
RowKey: shawn
=> (name=, value=, timestamp=1428417278072000)
=> (name=age, value=00000015, timestamp=1428417278072000)
=> (name=email, value=736861776e403136332e636f6d, timestamp=1428417278072000)

5 Rows Returned.
Elapsed time: 572 msec(s).
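
cassandra-cli prints the column values as hexadecimal, so they are hard to compare with the cqlsh output by eye. The sketch below decodes them, assuming that a CQL int is stored as a 4-byte big-endian signed integer, that text is UTF-8, and that the timestamp is in microseconds since the Unix epoch; the function names are hypothetical.

```python
from datetime import datetime, timezone


def decode_int(hex_value):
    """Decode a CQL int printed as hex (4 bytes, big-endian, signed)."""
    return int.from_bytes(bytes.fromhex(hex_value), "big", signed=True)


def decode_text(hex_value):
    """Decode a CQL text value printed as hex (UTF-8 bytes)."""
    return bytes.fromhex(hex_value).decode("utf-8")


def decode_timestamp(micros):
    """Convert a write timestamp in microseconds to a UTC datetime (whole seconds)."""
    return datetime.fromtimestamp(micros // 1_000_000, tz=timezone.utc)


# Values taken from the 'slave' row above
print(decode_int("00000028"))                      # 40
print(decode_text("7a777878403132362e636f6d"))     # zwxx@126.com
print(decode_timestamp(1428733896613000).date())   # 2015-04-11
```

The decoded age and email match the row inserted through cqlsh on node0.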

Running cassandraDriverTest.py also shows the newly added record 'slave':

[Kerr@slave1 ~]$ python cassandraDriverTest.py
slave 40 zwxx@126.com
kerr 22 singleon@126.com
lambda 20 2270891001@qq.com
wayne 21 leon_sin@126.com
shawn 21 shawn@163.com


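Address issues in a multi-node Cassandra configuration

Scenario: a 3-node Cassandra cluster with IPs 172.16.37.17, 172.16.37.18, and 172.16.37.19 (seed: 172.16.37.18). Cassandra is started only on .18 and .19. Can the .17 node connect to the database with Cassandra-driver and run queries? (The nodes can all ping each other.)


Configuration 1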

IP 172.16.37.18
seeds: "172.16.37.18"
listen_address: c37b18
rpc_address: localhost
endpoint_snitch: GossipingPropertyFileSnitch

IP 172.16.37.17
seeds: "172.16.37.18"
listen_address: c37b17
rpc_address: localhost
endpoint_snitch: GossipingPropertyFileSnitch

IP 172.16.37.19
seeds: "172.16.37.18"
listen_address: c37b19
rpc_address: localhost
endpoint_snitch: GossipingPropertyFileSnitch

cassandra-driver test program on the .17 node

from cassandra.cluster import Cluster

# Contact points: the two nodes on which Cassandra is actually running
cluster = Cluster(['c37b18', 'c37b19'])
session = cluster.connect('lsflog')
res = session.execute('SELECT * FROM jcleanlog')
print(res)

Result:
    session = cluster.connect('lsflog')
  File "/usr/lib/python2.6/site-packages/cassandra_driver-2.5.0-py2.6.egg/cassandra/cluster.py", line 756, in connect
    self.control_connection.connect()
  File "/usr/lib/python2.6/site-packages/cassandra_driver-2.5.0-py2.6.egg/cassandra/cluster.py", line 1867, in connect
    self._set_new_connection(self._reconnect_internal())
  File "/usr/lib/python2.6/site-packages/cassandra_driver-2.5.0-py2.6.egg/cassandra/cluster.py", line 1902, in _reconnect_internal
    raise NoHostAvailable("Unable to connect to any servers", errors)
cassandra.cluster.NoHostAvailable: ('Unable to connect to any servers', {'c37b18': error(111, "Tried connecting to [('172.16.37.18', 9042)]. Last error: Connection refused"), 'c37b19': error(111, "Tried connecting to [('172.16.37.19', 9042)]. Last error: Connection refused")})

Both hostnames resolve to the right IPs, but the connections to port 9042 are refused. That is what one would expect with rpc_address: localhost, since each node then accepts client connections only on its loopback interface.
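
Before involving the driver, it can help to test whether a TCP connection to the port named in the error can be opened at all. The sketch below does this with the standard socket module; can_connect is a hypothetical helper, and the demonstration uses a throwaway listener on the local machine rather than a real Cassandra node.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "Connection refused", timeouts, and unresolvable hosts
        return False


# Local demonstration: open a listener on a free port, probe it, close it, probe again
listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]
print(can_connect("127.0.0.1", port))   # something is listening
listener.close()
print(can_connect("127.0.0.1", port))   # nothing is listening any more
```

From the .17 node, a call such as can_connect('172.16.37.18', 9042) would show whether the node accepts CQL client connections on that address.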


Background (added 2015-05-13)

broadcast_rpc_address

The broadcast_rpc_address should be an IP address that drivers/clients can connect to. (link)
RPC address to broadcast to drivers and other Cassandra nodes. This cannot be set to 0.0.0.0. If left blank, this will be set to the value of rpc_address. If rpc_address is set to 0.0.0.0, broadcast_rpc_address must be set. (/conf/cassandra.yaml)
If broadcast_rpc_address is not set, it defaults to the configured rpc_address.

rpc_address

unset:
    Resolves the address using the hostname configuration of the node. If left unset, the hostname must resolve to the IP address of this node using /etc/hostname, /etc/hosts, or DNS.
0.0.0.0:
    Listens on all configured interfaces, but you must set the broadcast_rpc_address to a value other than 0.0.0.0.
IP address
hostname
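
The rules quoted above can be written down as a small check. This is only a sketch of the documented rules, not Cassandra's own validation code; the function name is hypothetical, and the unset rpc_address case (hostname resolution) is not modelled.

```python
def effective_broadcast_rpc_address(rpc_address, broadcast_rpc_address=None):
    """Return the address a node would advertise to clients, per the quoted rules."""
    if broadcast_rpc_address == "0.0.0.0":
        raise ValueError("broadcast_rpc_address cannot be set to 0.0.0.0")
    if broadcast_rpc_address:
        return broadcast_rpc_address
    if rpc_address == "0.0.0.0":
        raise ValueError("rpc_address is 0.0.0.0, so broadcast_rpc_address must be set")
    # Left blank: falls back to the value of rpc_address
    return rpc_address


print(effective_broadcast_rpc_address("localhost"))
print(effective_broadcast_rpc_address("0.0.0.0", "172.16.37.18"))
```

With rpc_address: localhost and nothing else set, the advertised address is localhost, which a client on another machine cannot use.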

Cassandra port usage (link)
7199 - JMX (was 8080 pre Cassandra 0.8.xx)
7000 - Internode communication (not used if TLS enabled)
7001 - TLS Internode communication (used if TLS enabled)
9160 - Thrift client API
9042 - CQL native transport port

Using nodetool

Running ./nodetool <-h node2-ip> from node1 fails with Connection refused.

So far I can only run ./nodetool status locally, on a node where Cassandra is running.

For example, specifying -h 172.16.37.18 from the .17 node gives:
Failed to connect to '172.16.37.18:7199' - ConnectException: 'Connection refused'.


Notably, on the .18 node itself:
./nodetool status
 works
./nodetool -h 172.16.37.18 status
 Connection refused
./nodetool -h localhost status
 works

This seems to be related to the JMX settings (see stackoverflow, problem1).
/conf/cassandra-env.sh contains the following:
# jmx: metrics and administration interface
#
# add this if you're having trouble connecting:
# JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname=<public name>"
#
# see
# https://blogs.oracle.com/jmxetc/entry/troubleshooting_connection_problems_in_jconsole
# for more on configuring JMX through firewalls, etc. (Short version:
# get it working with no firewall first.)
#
# Cassandra ships with JMX accessible *only* from localhost.
# To enable remote JMX connections, uncomment lines below
# with authentication and/or ssl enabled. See https://wiki.apache.org/cassandra/JmxSecurity
#
LOCAL_JMX=yes
if [ "$LOCAL_JMX" = "yes" ]; then
  JVM_OPTS="$JVM_OPTS -Dcassandra.jmx.local.port=$JMX_PORT -XX:+DisableExplicitGC"
else
  JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.port=$JMX_PORT"
  JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.rmi.port=$JMX_PORT"
  JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.ssl=false"
  JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.authenticate=true"
  JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.password.file=/etc/cassandra/jmxremote.password"
fi

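Note this statement in the file above:
JMX accessible *only* from localhost
I tried commenting out LOCAL_JMX=yes and also commenting out the lines that require authentication, but it still fails with: Error: Password file not found: /etc/cassandra/jmxremote.password

I still need to read more of the JMX documentation.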


Configuration 2

IP 172.16.37.18
seeds: "172.16.37.18"
listen_address: c37b18
rpc_address: 0.0.0.0
broadcast_rpc_address: 172.16.37.18
endpoint_snitch: GossipingPropertyFileSnitch

IP 172.16.37.17
seeds: "172.16.37.18"
listen_address: c37b17
rpc_address: 0.0.0.0
broadcast_rpc_address: 172.16.37.17
endpoint_snitch: GossipingPropertyFileSnitch

IP 172.16.37.19
seeds: "172.16.37.18"
listen_address: c37b19
rpc_address: 0.0.0.0
broadcast_rpc_address: 172.16.37.19
endpoint_snitch: GossipingPropertyFileSnitch

Output of the cassandra-driver test program on the .17 node:
[Row(job_id=1, event_time=2, idx=0)]