Cassandra & HBase

7/31/2016

HBase:

Wide-column store based on Apache Hadoop and on concepts of BigTable.
Apache HBase is a NoSQL key/value store which runs on top of HDFS
Unlike Hive, HBase operations run in real time on its database rather than as MapReduce jobs.
HBase data is partitioned into tables, and tables are further split into column families.
A column family keeps all of its columns together.
Each key/value pair is stored as a cell.
Each key consists of row key, column family, column, and timestamp.
A row in HBase is a grouping of key/value mappings identified by the row-key.
It scales horizontally.
Versioning is available: each cell can keep multiple timestamped versions (3 in these notes; the limit is configurable per column family).
Supports four operations: put to add or update rows, scan to retrieve a range of cells, get to return cells for a specified row, and delete to remove rows, columns or column versions from the table.
A schema consists of tables and column families.
Queries use HBase's own API rather than SQL; Apache Phoenix adds SQL-style operations on top.
ZooKeeper coordinates the cluster components: Master server, Region Servers, etc.
The Master server monitors all Region Servers and handles all metadata changes and maintenance.
In CAP terms it favors CP (Consistency and Partition tolerance).
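Optimized for reads; each row has a single writer, since a region is served by only one Region Server at a time.
Supports range-based scans: rows are kept in sorted row-key order, so ordered scans are possible and continue to work as the table scales horizontally.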
Does not support secondary indexes natively, but one can be maintained with a trigger on "put" that keeps the secondary index up to date.
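The data model, the four operations, range scans, and the put-triggered secondary index described above can be sketched with a small in-memory model. This is not the HBase client API: the class ToyTable, its method signatures, and the sample row keys are hypothetical names chosen for illustration, and the version limit of 3 is taken from the note above.

```python
import bisect
from collections import defaultdict

MAX_VERSIONS = 3  # assumption taken from the "Versioning available: 3" note


class ToyTable:
    """In-memory stand-in for an HBase table (illustrative, not the real API)."""

    def __init__(self, indexed_column=None):
        # (row, family, qualifier) -> [(timestamp, value), ...], newest first
        self.cells = {}
        # Row keys kept sorted, which is what makes range scans cheap
        self.row_keys = []
        # (family, qualifier) tuple to index, or None for no secondary index
        self.indexed_column = indexed_column
        self.index = defaultdict(set)  # value -> set of row keys

    def put(self, row, family, qualifier, value, timestamp):
        versions = self.cells.setdefault((row, family, qualifier), [])
        old_latest = versions[0][1] if versions else None
        versions.append((timestamp, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)
        del versions[MAX_VERSIONS:]  # drop versions beyond the limit
        i = bisect.bisect_left(self.row_keys, row)
        if i == len(self.row_keys) or self.row_keys[i] != row:
            self.row_keys.insert(i, row)
        # "Trigger on put": keep the secondary index in step with the newest value
        if (family, qualifier) == self.indexed_column:
            if old_latest is not None:
                self.index[old_latest].discard(row)
            self.index[versions[0][1]].add(row)

    def get(self, row):
        """Return the newest value of every cell in one row."""
        return {(f, q): v[0][1] for (r, f, q), v in self.cells.items() if r == row}

    def scan(self, start, stop):
        """Return rows with start <= row key < stop, in row-key order."""
        lo = bisect.bisect_left(self.row_keys, start)
        hi = bisect.bisect_left(self.row_keys, stop)
        return [(r, self.get(r)) for r in self.row_keys[lo:hi]]

    def delete(self, row):
        """Remove a whole row, including its secondary index entries."""
        for key in [k for k in self.cells if k[0] == row]:
            versions = self.cells.pop(key)
            if key[1:] == self.indexed_column:
                self.index[versions[0][1]].discard(row)
        i = bisect.bisect_left(self.row_keys, row)
        if i < len(self.row_keys) and self.row_keys[i] == row:
            self.row_keys.pop(i)


table = ToyTable(indexed_column=("info", "city"))
table.put("user#001", "info", "city", "Pune", timestamp=1)
table.put("user#002", "info", "city", "Oslo", timestamp=1)
table.put("user#003", "info", "city", "Pune", timestamp=1)
table.put("user#001", "info", "city", "Oslo", timestamp=2)  # new version of a cell

rows_in_range = [r for r, _ in table.scan("user#001", "user#003")]
oslo_rows = sorted(table.index["Oslo"])
```

Here the scan only touches the slice of sorted row keys it needs, and the lookup by city goes through the index instead of reading every row. In a real deployment the index would more likely live in its own table and be maintained by a coprocessor or by Phoenix.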

HBase coprocessors support simple aggregations out of the box: SUM, MIN, MAX, AVG, STD. Other aggregations can be built by defining Java classes to perform the aggregation.
Good for real time analytics and massive data processing
User: Facebook
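The coprocessor aggregations mentioned above work by computing partial results next to the data and merging them on the client. A rough sketch of that idea in plain Python follows; the function names region_partial and merge_partials and the sample values are hypothetical, and this is not the HBase coprocessor API. It assumes every region holds at least one value.

```python
import math


def region_partial(values):
    """What a server-side endpoint might compute over one region's values."""
    return {
        "count": len(values),
        "sum": sum(values),
        "sum_sq": sum(v * v for v in values),
        "min": min(values),
        "max": max(values),
    }


def merge_partials(partials):
    """Client-side merge of per-region partial results."""
    count = sum(p["count"] for p in partials)
    total = sum(p["sum"] for p in partials)
    total_sq = sum(p["sum_sq"] for p in partials)
    mean = total / count
    variance = total_sq / count - mean * mean  # population variance
    return {
        "sum": total,
        "min": min(p["min"] for p in partials),
        "max": max(p["max"] for p in partials),
        "avg": mean,
        "std": math.sqrt(max(variance, 0.0)),  # guard against tiny negative rounding
    }


# Three hypothetical regions, each holding part of one column's values
regions = [[2, 4], [4, 4, 5], [5, 7, 9]]
result = merge_partials([region_partial(r) for r in regions])
```

Only five numbers per region cross the network, regardless of how many cells each region holds, which is the main reason to push aggregation to the servers.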

Cassandra:

Wide-column store based on ideas of BigTable and Amazon's Dynamo
Cassandra has a decentralized architecture. Any node can perform any operation. It provides AP (Availability, Partition tolerance) from the CAP theorem.
Cassandra has excellent single-row read performance
Cassandra does not support range-based row scans.
Cassandra is well suited to single-row queries, or to selecting multiple rows based on a column-value index.
If data is stored in columns within a row to support range scans, the practical limit on row size in Cassandra is tens of megabytes.
Rows larger than that cause problems with compaction overhead and time.
Cassandra supports secondary indexes on column families where the column name is known in advance, not on dynamic columns.
Aggregations are not performed by the Cassandra nodes; the client must compute them.
When an aggregation spans multiple rows, random partitioning makes it very difficult. In this case, use Storm or Hadoop for the aggregation.
User: Twitter
Good for log file processing
Symmetric architecture makes it relatively easy to create and scale large clusters
SQL-like Cassandra Query Language eases developers' transition from RDBMS
Allows you to tune for performance or consistency or a balance of both
Community edition of management GUI available
Good documentation (provided by Datastax)
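Two of the Cassandra points above, random partitioning and tunable consistency, can be illustrated with a short sketch. The node names and both functions are hypothetical; real Cassandra places keys on a token ring rather than using a plain modulo, so this only shows the effect of hashing, not the actual placement algorithm.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical node names


def owner(row_key, nodes=NODES):
    """Pick a node from an MD5 hash of the key (simplified random partitioning)."""
    token = int(hashlib.md5(row_key.encode("utf-8")).hexdigest(), 16)
    return nodes[token % len(nodes)]


def reads_see_latest_write(replicas, write_acks, read_acks):
    """True when every read set must overlap every write set (R + W > N)."""
    return read_acks + write_acks > replicas


# Neighbouring row keys are placed by hash, not by key order, so a scan over
# a key range cannot be served by reading one contiguous region.
placement = {key: owner(key) for key in ["user#001", "user#002", "user#003"]}

# With 3 replicas: quorum writes plus quorum reads overlap; ONE plus ONE do not.
strong = reads_see_latest_write(replicas=3, write_acks=2, read_acks=2)
fast = reads_see_latest_write(replicas=3, write_acks=1, read_acks=1)
```

The second function is the usual rule of thumb behind tuning for performance or consistency: fewer acknowledgements per operation means lower latency, while overlapping read and write sets mean a read reaches at least one replica that has the latest write.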
