
In-memory Computing with SAP HANA Reading Notes - Chapter 5: Lenovo System x solutions for SAP HANA

2016-05-15 11:51
This post contains reading notes for Chapter 5, "Lenovo System x solutions for SAP HANA", of In-memory Computing with SAP HANA on Lenovo X6 Systems.



This chapter covers Lenovo's HANA offerings, including servers, appliances, and storage. The most important piece is the GPFS storage, so these notes keep only the GPFS section and skip the rest.

GPFS is now known by its new name, IBM Spectrum Scale.

IBM General Parallel File System

A high-performance, shared-disk file management solution that can provide faster, more reliable access to a common set of file data. It enables a view of distributed data with a single global namespace.

GPFS is a cluster file system, and it was originally a shared-disk file system, so all servers can access the same file at the same time.

GPFS later added support for a shared-nothing architecture as well, which gives it good scale-out and parallel-processing capabilities.

Common GPFS features

Because it is a cluster file system, GPFS is designed for high-performance parallel workloads.

GPFS configurations include direct-attached storage, network block I/O (or a combination of the two), and multi-site operations with synchronous data mirroring.

It supports SAN, TCP/IP, and InfiniBand interconnects.

It supports storage pools, providing storage virtualization.

It supports snapshots, synchronous and asynchronous replication, multipath access, and journal logging.

GPFS extensions for shared-nothing architectures

The drawbacks of shared storage are that it is a single point of failure and that it tends to become an I/O bottleneck under highly concurrent parallel processing.

GPFS File Placement Optimizer (GPFS FPO) supports a shared-nothing architecture, essentially a Hadoop-like GPFS.

Each server then uses its own local storage, which keeps workloads local and scales well, while replication provides redundancy.
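
To make the FPO idea concrete, here is a minimal, purely illustrative Python sketch (not GPFS code; the node names and the choice of second-replica target are invented) of the placement policy just described: the first replica of a chunk stays on the node that writes it, and a second replica goes to another node for redundancy.

```python
# Illustrative sketch of an FPO-style placement policy (not actual GPFS code).
# Assumption: a fixed node list and a simple choice of the second-replica
# target; real GPFS FPO placement policies are more sophisticated.

NODES = ["node01", "node02", "node03", "node04"]

def place_chunk(writing_node, chunk_id):
    """Return the replica placement for one data chunk."""
    # The first replica stays on the node that writes the chunk, so that
    # node's reads are served from local disks (workload locality).
    first = writing_node
    # The second replica goes to another node, providing redundancy.
    others = [n for n in NODES if n != writing_node]
    second = others[chunk_id % len(others)]
    return {"chunk": chunk_id, "replica1": first, "replica2": second}

if __name__ == "__main__":
    for c in range(4):
        print(place_chunk("node01", c))
```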

Scaling-out SAP HANA that uses GPFS

Lenovo System x Solution for SAP HANA supports a scale-out approach (that is, combining several systems into a clustered solution, which represents a single SAP HANA instance). An SAP HANA system can span multiple servers, partitioning the data to hold and process larger amounts of data than a single server can accommodate.

The scale-out solution is a cluster of servers, which are interconnected with two separate 10 Gb Ethernet networks, one for the SAP HANA application and one for the shared GPFS file system communication. Both networks are redundant.

The scale-out described here is in fact GPFS FPO using local storage.

Lenovo's HANA scale-out solution has the following characteristics:

The SAP HANA database is split into partitions on each cluster node, which forms a single instance of the SAP HANA database.

This shows that the HANA database can be partitioned, yet it appears externally as a single instance.

Each node of the cluster holds its own savepoints and database logs on the local storage devices of the server.

Each node keeps its own data and logs and processes them locally; each node can be thought of as a database of its own.

The GPFS file system is a shared file system. Because GPFS spans all nodes of the cluster, it makes the data of each node available to all other nodes in the cluster despite the use of local storage devices only.

Although only local storage is used, the result is nevertheless a globally shared file system. And because local storage is used, processing power and capacity grow by adding servers, so the solution scales out well.

In the scale-out solution, nodes are divided into worker nodes and standby nodes, described as follows:

The node can be a worker node or a standby node. Worker nodes actively process workload. Standby nodes are part of the cluster only and do not process workload while the cluster remains in a healthy state. Standby nodes take over the role of a worker node when it fails. Standby nodes are required for scale-out clusters with high availability.
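
To make the worker/standby roles concrete, here is a small, purely hypothetical Python sketch (the class and node names are invented; this is not SAP HANA or GPFS code) of a standby node taking over a failed worker's partition:

```python
# Hypothetical sketch of worker/standby takeover in a scale-out cluster.
# A standby node is part of the cluster but idle while the cluster is
# healthy; it adopts the partition of a worker node that fails.

class Node:
    def __init__(self, name, role, partition=None):
        self.name = name            # for example "node01"
        self.role = role            # "worker" or "standby"
        self.partition = partition  # HANA partition served by a worker

def fail_over(nodes, failed_name):
    failed = next(n for n in nodes if n.name == failed_name)
    standby = next(n for n in nodes if n.role == "standby")
    # The standby becomes a worker and takes over the failed partition.
    standby.role, standby.partition = "worker", failed.partition
    failed.role, failed.partition = "failed", None

if __name__ == "__main__":
    cluster = [Node("node01", "worker", "part01"),
               Node("node02", "worker", "part02"),
               Node("node03", "worker", "part03"),
               Node("node04", "standby")]
    fail_over(cluster, "node03")
    for n in cluster:
        print(n.name, n.role, n.partition)
```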

Scale-out solution without high-availability capabilities

Having no high-availability capabilities means that this configuration has no standby nodes; all nodes are worker nodes.

It does not protect against data loss, but it is scalable.

Externally, the whole cluster is one instance and one file system. Internally, each node has its own data and logs, so the workload is processed in a distributed fashion.



Scale-out solution with high-availability capabilities

High availability is provided by two measures:

1. Adding standby nodes that can take over

2. Adding data replication, because a takeover needs the data (savepoints and logs)



Replication is done in a striping fashion. Every node has a piece of data of all other nodes. In Figure 5-41, the contents of the data storage (that is, the savepoints, here data01) and the log storage (that is, the database logs, here log01) of node01 are replicated to node02, node03, and node04.

With replication enabled, the data exists in two copies. The complete copy is stored locally, and the other copy is striped across the remaining nodes to keep replication performance high. During recovery, the data is reassembled from the pieces held on those nodes.
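
A hedged Python sketch (invented chunk and node names, not real GPFS behavior or commands) of the striped layout described above: node01 keeps the complete first copy of its data and log, while the second copy of each chunk is spread round-robin across the other nodes.

```python
# Illustrative sketch of the striped second replica (not actual GPFS code).
# node01 holds a complete local copy of its data/log chunks; the second
# copy of each chunk is distributed round-robin over the remaining nodes.

NODES = ["node01", "node02", "node03", "node04"]

def second_replica_layout(owner, chunks):
    """Map each of the owner's chunks to the node holding its second copy."""
    targets = [n for n in NODES if n != owner]
    return {chunk: targets[i % len(targets)] for i, chunk in enumerate(chunks)}

if __name__ == "__main__":
    chunks = [f"data01-{i}" for i in range(3)] + [f"log01-{i}" for i in range(3)]
    for chunk, node in second_replica_layout("node01", chunks).items():
        print(f"{chunk}: first copy on node01, second copy on {node}")
```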

Synchronous replication:

Replication occurs synchronously. The write operation finishes only when the data is written locally and on a remote node.

FPO makes sure reads go to local storage first, which preserves performance.
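
A minimal sketch of that synchronous write path in hypothetical Python (the function names are placeholders; GPFS does this inside the file system, not through an API like this): the write only completes after the chunk is durable both locally and on the node holding the second replica.

```python
# Hypothetical sketch of a synchronous replicated write (not GPFS internals).
# The write is acknowledged only after the chunk is persisted locally AND
# on the remote node that holds the second replica.

def write_local(chunk, node):
    # Placeholder: write the chunk to this node's local storage devices.
    return True

def write_remote(chunk, node):
    # Placeholder: ship the chunk over the GPFS network to the peer node.
    return True

def synchronous_write(chunk, local_node, remote_node):
    ok_local = write_local(chunk, local_node)
    ok_remote = write_remote(chunk, remote_node)  # done before acknowledging
    if not (ok_local and ok_remote):
        raise IOError("write not acknowledged: a replica failed")
    # Only now does the write operation finish from the caller's view.

if __name__ == "__main__":
    synchronous_write(b"savepoint-block", "node01", "node02")
    print("write acknowledged after local and remote copies were persisted")
```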

The following is a failure recovery example in which node04 takes over for the failed node03.



The data that node04 must load into memory is the data of node03 (which failed), including its local storage devices. For that reason, GPFS had to deliver the data to node04 from the second replica, which is spread across the cluster. GPFS handles this process transparently so that the application does not recognize from which node the data was read. If data is available locally, GPFS prefers to read from node04 and avoid going over the network.

On the one hand, the data is reconstructed in memory to serve I/O requests; on the other hand, a local copy is rebuilt on storage in the background to restore I/O locality. After the rebuild, the second copy is redistributed across the remaining nodes to guard against a second failure.
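
A hedged Python sketch of this takeover (the chunk map is invented; this is not GPFS code): the standby node04 collects node03's chunks from the second replicas spread over the surviving nodes, serving any chunk that already sits on its own disks locally.

```python
# Illustrative sketch of recovery after node03 fails (not actual GPFS code).
# node04 (the standby) must load node03's data; each chunk is read from the
# node that holds its second replica, preferring chunks already on node04.

SECOND_REPLICA = {          # where node03's chunks had been replicated to
    "data03-0": "node01",
    "data03-1": "node02",
    "data03-2": "node04",
    "log03-0":  "node04",
    "log03-1":  "node01",
}

def recover(standby, layout):
    local = [c for c, n in layout.items() if n == standby]
    remote = [c for c, n in layout.items() if n != standby]
    for chunk in local:
        print(f"{standby}: read {chunk} from local disk")        # no network hop
    for chunk in remote:
        print(f"{standby}: fetch {chunk} from {layout[chunk]} over the GPFS network")
    # In the background, a full local copy is rebuilt on the standby and a
    # new second replica is re-striped to guard against a second failure.

if __name__ == "__main__":
    recover("node04", SECOND_REPLICA)
```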

After the recovery completes, as shown in the following figure, node03 becomes the new standby node.



Summary

Scaling out HANA requires a cluster file system. Lenovo uses GPFS (from IBM), HP's solution uses NFS, and Huawei's solution uses XFS (formerly from SGI) or OCFS (from Oracle).

One more point: GPFS is not only a cluster file system, its storage architecture is also distributed and horizontally scalable, whereas the storage in the HP and Huawei solutions is centralized. In terms of storage-layer scalability, GPFS should therefore be better than OCFS, XFS, and NFS.

At the same time, HANA itself should also be able to partition data. The post at https://blogs.saphana.com/2014/12/10/sap-hana-scale-scale-hardware/ contains the following passage:

Note that in a scale-out environment, data has to be distributed amongst the nodes. SAP BW does a great job of this – striping big fact tables across multiple nodes, and residing dimension tables together in a single node. It uses one “master” node for configuration tables. All in all, this does an excellent job of dealing with the major disadvantage of scale-out: the cost of intra-node network traffic for temporary datasets.

For custom data-marts, you will have to partition your own data, which isn’t a big deal, but does require a HANA expert. A good HANA consultant can define a suitable partitioning strategy in a very short period of time.

Does this suggest that HANA is better suited to analytical workloads such as BW than to OLTP?

The partitioning here is in fact table partitioning.
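
As a purely illustrative Python sketch of this table-level partitioning (the table names, node names, and placement rules are invented, not taken from SAP BW): fact-table rows are hash-partitioned across the worker nodes, a dimension table stays on a single node, and configuration tables live on the master node.

```python
# Hypothetical sketch of a BW-style placement plan (all names invented).
# Fact tables are hash-partitioned across the worker nodes; a dimension
# table resides on one node; configuration tables sit on the master node.

import hashlib

WORKERS = ["node01", "node02", "node03"]
MASTER = "node01"

def fact_partition(row_key):
    """Hash-partition a fact-table row key across the worker nodes."""
    h = int(hashlib.md5(row_key.encode()).hexdigest(), 16)
    return WORKERS[h % len(WORKERS)]

def place_table(table, kind):
    if kind == "config":
        return MASTER                              # single "master" node
    if kind == "dimension":
        return WORKERS[0]                          # kept together on one node
    return "hash-partitioned over " + ", ".join(WORKERS)

if __name__ == "__main__":
    print("SALES_FACT:", place_table("SALES_FACT", "fact"))
    for key in ("order-1001", "order-1002", "order-1003"):
        print(f"  row {key} -> {fact_partition(key)}")
    print("PRODUCT_DIM:", place_table("PRODUCT_DIM", "dimension"))
    print("CONFIG_TAB:", place_table("CONFIG_TAB", "config"))
```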

In principle, GPFS + HANA is very similar to ASM + Oracle RAC, except that the latter is general purpose while the former is purpose-built.

References

http://www.ibm.com/support/knowledgecenter/SSFKCN/gpfs_welcome.html

http://www.lenovo.com/images/products/system-x/pdfs/solution-briefs/sap_hana_tdi_config_guide_wp.pdf

https://blogs.saphana.com/2014/12/10/sap-hana-scale-scale-hardware/

http://www.slideshare.net/SAP_Nederland/sap-innovation-forum-28febr-ibmsap-hana-scalability

http://scn.sap.com/docs/DOC-59822

http://www8.hp.com/h20195/V2/getpdf.aspx/4AA5-1437ENW.pdf?ver=2.0