
Binary Search in Clojure

2016-07-08 18:36

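Scenario

An endpoint in a Clojure project needs to find one particular record in a large data set, for example 100,000 records.

For example, @price-data has the following structure:

[{:a 1 :b 2 :c 3} … (100,000 entries in total)]

The task is to find the entry whose keys match (:a = x, :b = y, :c = z).

I originally did this with the select function from clojure.set. The filter function from clojure.core would of course work as well.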

set/select

Usage: (select pred xset)
Returns a set of the elements for which pred is true.
Added in Clojure version 1.0.

Source: https://clojure.github.io/clojure/clojure.set-api.html#clojure.set/select

It looks like SQL's SELECT, but this select searches data held in memory.

(require '[clojure.set :as set])

;; x, y and z are the values being looked up
(set/select #(and (= (:a %) x)
                  (= (:b %) y)
                  (= (:c %) z))
            (set @price-data))
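For comparison, the same lookup can be sketched with filter from clojure.core. This is only an illustration: the three-element price-data atom and the helper name find-by-filter are assumptions made for the example, not part of the original project.

```clojure
;; Assumption: price-data is an atom holding a vector of maps,
;; since the original code dereferences it with @.
(def price-data
  (atom [{:a 1 :b 2 :c 3}
         {:a 4 :b 5 :c 6}
         {:a 7 :b 8 :c 9}]))

(defn find-by-filter
  "Hypothetical helper: returns a lazy seq of every map whose :a, :b and :c
  equal x, y and z. Scans the data linearly, like set/select."
  [x y z]
  (filter #(and (= (:a %) x)
                (= (:b %) y)
                (= (:c %) z))
          @price-data))

(find-by-filter 4 5 6)         ;; => ({:a 4, :b 5, :c 6})
(first (find-by-filter 0 0 0)) ;; => nil
```

Unlike set/select, filter accepts any sequential collection, so the data does not have to be converted to a set first.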

The source code of select:

(defn select
  "Returns a set of the elements for which pred is true"
  {:added "1.0"}
  [pred xset]
  (reduce (fn [s k] (if (pred k) s (disj s k)))
          xset xset))

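select is implemented with a single reduce. The initial accumulator s is xset itself, and reduce then visits every element k of xset with (fn [s k] (if (pred k) s (disj s k))): if k satisfies pred, s is returned unchanged as the new result; otherwise s with k removed is returned. That result becomes s for the next round, the next element of xset becomes the next k, and so on.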
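The reduce described above can be tried on a small set. This is a sketch on made-up data (the set nums and the predicate even? are chosen only for illustration); it applies the same reducing function that select uses and checks that the result agrees with set/select.

```clojure
(require '[clojure.set :as set])

;; Illustrative data, not from the original project.
(def nums #{1 2 3 4 5 6})

;; Start from the full set and drop every element that fails the predicate.
(def by-reduce
  (reduce (fn [s k] (if (even? k) s (disj s k)))
          nums
          nums))

by-reduce                             ;; => #{2 4 6}
(= by-reduce (set/select even? nums)) ;; => true
```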

(reduce f coll) • (reduce f val coll)
f should be a function of 2 arguments. If val is not supplied, returns the result of applying f to the first 2 items in coll, then applying f to that result and the 3rd item, etc. If coll contains no items, f must accept no arguments as well, and reduce returns the result of calling f with no arguments. If coll has only 1 item, it is returned and f is not called. If val is supplied, returns the result of applying f to val and the first item in coll, then applying f to that result and the 2nd item, etc. If coll contains no items, returns val and f is not called. (usage of reduce)
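The cases listed in the reduce docstring can be checked with a few small calls. The inputs below are arbitrary illustrative values.

```clojure
;; Without an initial value: f is applied to the first two items, and so on.
(def sum-no-init (reduce + [1 2 3 4]))       ;; => 10

;; With an initial value: f is first applied to val and the first item.
(def sum-with-init (reduce + 100 [1 2 3 4])) ;; => 110

;; Empty collection, no val: f is called with no arguments, and (+) is 0.
(def sum-empty (reduce + []))                ;; => 0

;; Empty collection with val: val is returned and f is never called.
(def sum-empty-init (reduce + 5 []))         ;; => 5

;; A single item and no val: the item is returned and f is not called.
(def sum-single (reduce + [7]))              ;; => 7
```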

(disj set) • (disj set key) • (disj set key & ks)
Returns a new set of the same (hashed/sorted) type, that does not contain key(s). (usage of disj)
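A short illustration of the three disj arities, on made-up sets:

```clojure
(def without-one  (disj #{1 2 3} 2))   ;; => #{1 3}
(def without-two  (disj #{1 2 3} 2 3)) ;; => #{1}
(def without-none (disj #{1 2 3}))     ;; => #{1 2 3}

;; Removing a key that is not present leaves the set unchanged.
(def without-missing (disj #{1 2 3} 42)) ;; => #{1 2 3}

;; The result keeps the type of the input: a sorted set stays sorted.
(def still-sorted (disj (sorted-set 3 1 2) 2)) ;; => #{1 3}
(sorted? still-sorted)                         ;; => true
```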

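select is implemented much like filter: both traverse xset once, so the time complexity is O(n). In practice this turned out to be slow. A single query takes roughly 15-20 ms, and because there are many queries, for example 3,000, the total time comes to around 5 s. That is too slow to tolerate.

Implementing binary search

Because select is not fast enough for lookups in a large data set, I decided to improve on it.

The main idea: on an array that is already sorted, compare the search key with the element at the middle index, then continue in the half that can still contain the key.

Binary search only works on sorted data, but its complexity is low: the time complexity is O(log N).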
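To make the O(log N) bound concrete, the sketch below counts how many times a range of n elements can be halved before it is empty, which is the worst-case number of probes a binary search needs. The function name max-probes is made up for this illustration.

```clojure
(defn max-probes
  "Hypothetical helper: worst-case number of elements a binary search
  inspects in a sorted collection of n elements."
  [n]
  (loop [remaining n
         steps 0]
    (if (zero? remaining)
      steps
      ;; after each probe at most half of the range (rounded down) remains
      (recur (quot remaining 2) (inc steps)))))

(max-probes 100000) ;; => 17
```

So for the 100,000 records in the scenario, a lookup needs at most 17 comparisons instead of up to 100,000 with a linear scan.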

;; Assumption: log is an alias for clojure.tools.logging
(require '[clojure.tools.logging :as log])

(defn bsearch
  "the-keyword is a string (it could be converted to a keyword up front);
  the-map-list is an already sorted list. Returns the map whose value under
  the-keyword equals the-key, or nil if there is no such map."
  [the-keyword the-key the-map-list]
  (let [k (keyword the-keyword)
        ;; a vector gives fast indexed access; pass one in to avoid copying
        the-map-list (vec the-map-list)
        _ (log/info "Looking for:" the-key)]
    (loop [l 0
           r (dec (count the-map-list))]
      ;; an empty range means the key is absent, so when returns nil
      (when (<= l r)
        (let [idx (quot (+ l r) 2)
              item (get the-map-list idx)
              c (compare the-key (k item))]
          (log/info "Current state: l[" l "], r[" r "], middle idx[" idx "], value" item)
          (cond
            ;; found: stop and return the element
            (zero? c) (do (log/info "Found at index" idx) item)
            ;; the key is larger: continue in the right half
            (pos? c) (recur (inc idx) r)
            ;; the key is smaller: continue in the left half
            :else (recur l (dec idx))))))))

Clojure provides the compare function for ordering strings; it is the equivalent of Java's compareTo method.
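A few calls show how compare behaves on strings. Only the sign of the result matters: negative, zero or positive, just as with compareTo. The sample strings are arbitrary.

```clojure
(def lt (compare "apple" "banana")) ;; negative: "apple" sorts first
(def eq (compare "apple" "apple"))  ;; => 0
(def gt (compare "pear" "banana"))  ;; positive: "pear" sorts later

;; Strings are compared lexicographically, not numerically,
;; so "10" sorts before "9".
(def lexicographic (compare "10" "9")) ;; negative

;; The sign agrees with Java's String.compareTo.
(def java-lt (.compareTo "apple" "banana")) ;; negative
```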

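The input to this function must be a list of data that is already sorted.

In the end I did not use this function for the lookup and obtained the data through data synchronization instead, but the function still performs much better.

With the algorithm finished, the test results were about 13 s with select versus about 5 ms with the binary search implementation. That is a substantial speedup.

Conclusion

This function handles maps that are compared on a single key. What about several keys, as in the premise at the start of this article? A fairly simple approach is to concatenate the contents of the three keys into one key, then sort and search by comparing that combined key.

One more point to improve: a lookup may match a group of entries rather than a single one. That also needs only a little extra handling.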
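The sorted-input requirement can be met with sort-by before searching. The following is a minimal, logging-free sketch of that workflow. The helper bsearch-by and the sample data are assumptions made for this example, not the article's exact function.

```clojure
(defn bsearch-by
  "Hypothetical helper: binary search in sorted-vec, a vector sorted by keyfn.
  Returns the element whose (keyfn element) equals target, or nil."
  [keyfn target sorted-vec]
  (loop [lo 0
         hi (dec (count sorted-vec))]
    (when (<= lo hi)
      (let [mid (quot (+ lo hi) 2)
            item (nth sorted-vec mid)
            c (compare target (keyfn item))]
        (cond
          (zero? c) item                ; found
          (pos? c) (recur (inc mid) hi) ; target is larger: go right
          :else (recur lo (dec mid))))))) ; target is smaller: go left

;; Illustrative, unsorted input.
(def unsorted [{:a "m" :b 1} {:a "c" :b 2} {:a "x" :b 3}])

;; Sort once by the search key, and keep a vector for indexed access.
(def sorted-data (vec (sort-by :a unsorted)))

(bsearch-by :a "x" sorted-data) ;; => {:a "x", :b 3}
(bsearch-by :a "q" sorted-data) ;; => nil
```

Sorting costs O(n log n), so it pays off when the data is sorted once and then queried many times.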
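The concatenated-key idea might look like the sketch below. Several things here are assumptions made for illustration: the "|" separator, the helper names composite-key and find-by-abc, and the sample data. For brevity the search itself uses the JDK's java.util.Collections/binarySearch on a vector of precomputed keys rather than a hand-written loop.

```clojure
(defn composite-key
  "Hypothetical helper: joins :a, :b and :c into one comparable string.
  The separator keeps e.g. 1,23 and 12,3 from producing the same key."
  [m]
  (str (:a m) "|" (:b m) "|" (:c m)))

;; Illustrative data, deliberately unsorted.
(def price-data [{:a 4 :b 5 :c 6} {:a 1 :b 2 :c 3} {:a 7 :b 8 :c 9}])

;; Sort once by the combined key and precompute the keys in the same order.
(def sorted-data (vec (sort-by composite-key price-data)))
(def sorted-keys (mapv composite-key sorted-data))

(defn find-by-abc
  "Returns the map matching x, y and z, or nil if there is none."
  [x y z]
  (let [i (java.util.Collections/binarySearch
           sorted-keys
           (composite-key {:a x :b y :c z}))]
    ;; binarySearch returns a negative number when the key is absent
    (when-not (neg? i)
      (nth sorted-data i))))

(find-by-abc 4 5 6) ;; => {:a 4, :b 5, :c 6}
(find-by-abc 4 5 7) ;; => nil
```

The combined keys are ordered as strings, which is fine as long as sorting and searching use the same ordering.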
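One possible way to return every matching entry is to binary-search for both ends of the run of equal keys and take the slice between them. This is a sketch under that assumption; lower-bound, upper-bound and find-all are hypothetical names, and the data is made up.

```clojure
(defn lower-bound
  "Index of the first element whose key is not less than target."
  [v keyfn target]
  (loop [lo 0
         hi (count v)]
    (if (< lo hi)
      (let [mid (quot (+ lo hi) 2)]
        (if (neg? (compare (keyfn (nth v mid)) target))
          (recur (inc mid) hi)
          (recur lo mid)))
      lo)))

(defn upper-bound
  "Index of the first element whose key is greater than target."
  [v keyfn target]
  (loop [lo 0
         hi (count v)]
    (if (< lo hi)
      (let [mid (quot (+ lo hi) 2)]
        (if (pos? (compare (keyfn (nth v mid)) target))
          (recur lo mid)
          (recur (inc mid) hi)))
      lo)))

(defn find-all
  "All elements of the sorted vector v whose key equals target."
  [v keyfn target]
  (subvec v (lower-bound v keyfn target) (upper-bound v keyfn target)))

;; Illustrative data, already sorted by :a, with a duplicated key.
(def data [{:a 1 :n 0} {:a 2 :n 1} {:a 2 :n 2} {:a 3 :n 3}])

(find-all data :a 2) ;; => [{:a 2, :n 1} {:a 2, :n 2}]
(find-all data :a 5) ;; => []
```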


Tags: set operations, Clojure programming, binary search, large-scale data processing
