
Thinking in Algorithm, Part 12: Eleven Sorting Algorithms Explained

2014-04-14 23:13
Category: Thinking in Algorithm, 2014-04-10 01:32

Tags: sorting algorithms, heapsort, bubble sort, quicksort, Shell sort

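Sorting algorithms hold an important place among algorithms: many others, such as search and merge algorithms, are built on top of them. That is why sorting is a staple of written tests and interviews. However the questions are phrased, they draw on the same handful of algorithms and rarely go beyond the 11 covered here, so mastering these 11 is enough.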

Which 11? The list below follows the classification on Wikipedia.

1 Simple sorts

1.1 Insertion sort
1.2 Selection sort

2 Efficient sorts

2.1 Merge sort
2.2 Heapsort
2.3 Quicksort

3 Bubble sort and variants

3.1 Bubble sort
3.2 Shell sort
3.3 Comb sort

4 Distribution sort

4.1 Counting sort
4.2 Bucket sort
4.3 Radix sort

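Below I go through these 11 algorithms one by one.

Before that, a few concepts.

Stability of sorting algorithms

Sorting algorithms fall into two groups by stability: stable and unstable. How do you tell them apart? If the list contains two equal elements, a stable sort keeps them in their original relative order after sorting, while an unstable sort makes no such guarantee. See the figure below.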








An example of stable sorting on playing cards. When the cards are sorted by rank with a stable sort, the two 5s must remain in the same order in the sorted output that they were originally in. When they are sorted with a non-stable sort, the 5s may end up in
the opposite order in the sorted output.
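Why stability matters: with a stable sort you can sort on one key and then on another, and the second sort preserves the result of the first among elements that tie on the second key. Radix sort relies on exactly this: it sorts by the lowest digit first and then by each higher digit in turn, and elements whose higher digits are equal keep the order established by the lower digits.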
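As a small illustration of the two-key idea, here is a sketch in Python, whose built-in `sorted()` is documented to be stable. The card data is made up for the example.

```python
from operator import itemgetter

# (rank, suit) pairs; the two 5s differ only by suit.
cards = [(7, "spades"), (5, "hearts"), (2, "hearts"), (5, "spades")]

# A stable sort by rank keeps the two 5s in their input order:
# hearts first, then spades.
by_rank = sorted(cards, key=itemgetter(0))

# Two-pass sort: secondary key (suit) first, then primary key (rank).
# Stability of the second pass preserves the suit order among equal ranks.
by_suit_then_rank = sorted(sorted(cards, key=itemgetter(1)), key=itemgetter(0))

print(by_rank)
print(by_suit_then_rank)
```

The two-pass result equals a single sort on the pair (rank, suit), which is the same trick LSD radix sort applies digit by digit.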

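Comparing the algorithms

The table below lists the comparison sorts. Comparison sorts share a fundamental performance limit: in the worst case, any comparison sort needs Ω(n log n) comparisons.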

Comparison sorts
(The Best, Average, Worst and Memory columns were formula images and are not reproduced; the text columns follow.)

Name | Stable | Method | Other notes
Quicksort | Typical in-place sort is not stable; stable versions exist | Partitioning | Quicksort is usually done in place with O(log n) stack space.[citation needed] Most implementations are unstable, as stable in-place partitioning is more complex. Naïve variants use an O(n) space array to store the partition.[citation needed] Quicksort variant using three-way (fat) partitioning takes O(n) comparisons when sorting an array of equal keys.
Merge sort | Yes | Merging | Highly parallelizable (up to O(log n) using the Three Hungarian's Algorithm[clarification needed] or, more practically, Cole's parallel merge sort) for processing large amounts of data.
In-place merge sort | Yes | Merging | Can be implemented as a stable sort based on stable in-place merging.[2]
Heapsort | No | Selection |
Insertion sort | Yes | Insertion | O(n + d),[clarification needed] where d is the number of inversions.
Introsort | No | Partitioning & Selection | Used in several STL implementations.
Selection sort | No | Selection | Stable with O(n) extra space, for example using lists.[3]
Timsort | Yes | Insertion & Merging | Makes n comparisons when the data is already sorted or reverse sorted.
Shell sort | No | Insertion | Complexity depends on gap sequence. Small code size, no use of call stack, reasonably fast, useful where memory is at a premium such as embedded and older mainframe applications.
Bubble sort | Yes | Exchanging | Tiny code size.
Binary tree sort | Yes | Insertion | When using a self-balancing binary search tree.
Cycle sort | No | Insertion | In-place with theoretically optimal number of writes.
Library sort | Yes | Insertion |
Patience sorting | No | Insertion & Selection | Finds all the longest increasing subsequences in O(n log n).
Smoothsort | No | Selection | An adaptive sort: [formula] comparisons when the data is already sorted, and 0 swaps.
Strand sort | Yes | Selection |
Tournament sort | ? | Selection | [4]
Cocktail sort | Yes | Exchanging |
Comb sort | No | Exchanging | Small code size.
Gnome sort | Yes | Exchanging | Tiny code size.
UnShuffle Sort[5] | Can be made stable by appending the input order to the key. | Distribution and Merge | Memory: in place for linked lists, N*sizeof(link) for array. No exchanges are performed. Performance is independent of data size. The constant 'k' is proportional to the entropy in the input. K = 1 for ordered or ordered by reversed input so runtime is equivalent to checking the order O(N).
Franceschini's method[6] | Yes | ? |
Block sort[7] | Yes | Insertion & Merging | Combine a block-based O(n) in-place merge algorithm[8] with a bottom-up merge sort. Turns into a full-speed merge sort if additional memory is optionally provided to it.
The next table lists integer sorts and other non-comparison sorts. They are not limited by the Ω(n log n) lower bound.

Non-comparison sorts
(The Best, Average, Worst and Memory columns were formula images and are not reproduced; the text columns follow.)

Name | Stable | n << 2^k | Notes
Pigeonhole sort | Yes | Yes |
Bucket sort (uniform keys) | Yes | No | Assumes uniform distribution of elements from the domain in the array.[9]
Bucket sort (integer keys) | Yes | Yes | If r is O(n), then Average is O(n).[10]
Counting sort | Yes | Yes | If r is O(n), then Average is O(n).[9]
LSD Radix Sort | Yes | No | [9][10]
MSD Radix Sort | Yes | No | Stable version uses an external array of size n to hold all of the bins.
MSD Radix Sort (in-place) | No | No | [formula] recursion levels, 2^d for count array.
Spreadsort | No | No | Asymptotics are based on the assumption that n << 2^k, but the algorithm does not require this.
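Comparing all of these shows how many sorting algorithms exist, but only a few see real use. For small inputs insertion sort is the usual choice; for large inputs we turn to heapsort, merge sort, or quicksort. For more restricted data, such as integers in a fixed range, distribution sorts (counting sort, radix sort) are widely used. Bubble sort is rarely used in practice but is common in teaching.

Wikipedia groups the 11 algorithms into 4 categories: 1. simple sorts, 2. efficient sorts, 3. bubble sort and variants, 4. distribution sorts.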

1. Simple sorts

----------------------------------------------

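The two simple sorts are insertion sort and selection sort; both are efficient on small inputs. In practice insertion sort is usually faster than selection sort because it makes fewer comparisons and performs better on nearly sorted input. Selection sort performs fewer writes, however, so it is used when writes are the limiting factor.

1.1 Insertion sort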

-------------------------------------------------





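Graphical illustration of insertion sort
Class: Sorting algorithm
Data structure: Array
Worst case performance: O(n^2) comparisons, swaps
Best case performance: O(n) comparisons, O(1) swaps
Average case performance: O(n^2) comparisons, swaps
Worst case space complexity: O(n) total, O(1) auxiliary

Insertion sort is often used as a building block of more sophisticated algorithms; Shell sort is a variant of insertion sort that is more efficient on larger inputs.

Advantages:

Simple to implement
Efficient for small inputs
Efficient for inputs that are already nearly sorted: the running time is O(n + d), where d is the number of inversions
Compared with other simple quadratic (O(n^2)) algorithms (selection sort, bubble sort), its best case is O(n), reached when the input is already sorted
Stable: it does not change the relative order of equal elements
In-place: it needs only a constant O(1) amount of extra memory

The figure below shows how the algorithm proceeds.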







A graphical example of insertion sort.
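As the figure shows, the algorithm moves from left to right and inserts each element into its place in the sorted sequence before it.
The array below changes as follows, one line per inserted element: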

3 7 4 9 5 2 6 1

3 7 4 9 5 2 6 1

3 7 4 9 5 2 6 1

3 4 7 9 5 2 6 1

3 4 7 9 5 2 6 1

3 4 5 7 9 2 6 1

2 3 4 5 7 9 6 1

2 3 4 5 6 7 9 1

1 2 3 4 5 6 7 9

Pseudocode (zero-based indices)

for i ← 1 to length(A) - 1
    j ← i
    while j > 0 and A[j-1] > A[j]
        swap A[j] and A[j-1]
        j ← j - 1

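Best, worst, and average cases
Best: the input is already sorted. One linear pass suffices, since each element is compared only with its predecessor, so the running time is O(n).
Worst: the input is in exactly the reverse of the desired order; the running time is O(n^2).
Average: the running time is also O(n^2).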
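To make the O(n + d) claim concrete, here is a minimal Python sketch of the pseudocode above that also counts swaps. The function name and the returned pair are choices made for this example; each swap removes exactly one inversion, so the swap count equals d.

```python
def insertion_sort(items):
    """Sort a copy of items; return (sorted_list, number_of_swaps)."""
    a = list(items)
    swaps = 0
    for i in range(1, len(a)):
        j = i
        # Walk a[i] left until its predecessor is not larger.
        while j > 0 and a[j - 1] > a[j]:
            a[j - 1], a[j] = a[j], a[j - 1]
            swaps += 1
            j -= 1
    return a, swaps


print(insertion_sort([1, 2, 3, 4, 5]))           # already sorted: no swaps
print(insertion_sort([5, 4, 3, 2, 1]))           # reversed: n(n-1)/2 swaps
print(insertion_sort([3, 7, 4, 9, 5, 2, 6, 1]))  # the example array above
```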

1.2 Selection sort

------------------------------------------------




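Class: Sorting algorithm
Data structure: Array
Worst case performance: O(n^2)
Best case performance: O(n^2)
Average case performance: O(n^2)
Worst case space complexity: O(n) total, O(1) auxiliary
Selection sort is O(n^2) in the worst, best, and average cases.

Algorithm: split the array into two parts, one already sorted and one unsorted. Each step finds the smallest element of the unsorted part and moves it to the end of the sorted part. For example: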

64 25 12 22 11

11 25 12 22 64

11 12 25 22 64

11 12 22 25 64

11 12 22 25 64


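The code is as follows:

int i, j;
int iMin;
// a[0..j-1] is sorted; each pass moves the smallest remaining element to a[j]
for (j = 0; j < n - 1; j++) {
    iMin = j;
    // find the index of the smallest element in a[j..n-1]
    for (i = j + 1; i < n; i++) {
        if (a[i] < a[iMin]) {
            iMin = i;
        }
    }
    if (iMin != j) {
        // swap a[j] and a[iMin]
        int tmp = a[j];
        a[j] = a[iMin];
        a[iMin] = tmp;
    }
}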

It follows that the number of comparisons is (n − 1) + (n − 2) + ... + 2 + 1 = n(n − 1) / 2 ∈ Θ(n^2), regardless of the input.
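The count can be checked directly. The following Python sketch mirrors the loop structure above and tallies comparisons; the counter and the return shape are additions for illustration only.

```python
def selection_sort(items):
    """Sort a copy of items; return (sorted_list, number_of_comparisons)."""
    a = list(items)
    n = len(a)
    comparisons = 0
    for j in range(n - 1):
        i_min = j
        # Scan the unsorted part a[j+1..n-1] for a smaller element.
        for i in range(j + 1, n):
            comparisons += 1
            if a[i] < a[i_min]:
                i_min = i
        if i_min != j:
            a[j], a[i_min] = a[i_min], a[j]
    return a, comparisons


# 5 elements always cost 5 * 4 / 2 = 10 comparisons, sorted or not.
print(selection_sort([64, 25, 12, 22, 11]))
print(selection_sort([11, 12, 22, 25, 64]))
```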

2. Efficient sorts

----------------------------------------------

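2.1 Merge sort

Algorithm: 1. Divide the list into n sublists, each containing one element. 2. Repeatedly merge sublists into new sorted sublists until only one sublist remains.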





An example of merge sort. First divide the list into the smallest unit (1 element), then compare each element with the adjacent list to sort and merge the two adjacent lists. Finally all the elements are sorted and merged.
Class: Sorting algorithm
Data structure: Array
Worst case performance: O(n log n)
Best case performance: O(n log n) typical, O(n) natural variant
Average case performance: O(n log n)
Worst case space complexity: O(n) auxiliary







Merge sort animation. The sorted elements are represented by dots.

Top-down implementation:








A recursive merge sort algorithm used to sort an array of 7 integer values. These are the steps a human would take to emulate merge sort (top-down).






TopDownMergeSort(A[], B[], n)

{

TopDownSplitMerge(A, 0, n, B);

}

CopyArray(B[], iBegin, iEnd, A[])

{

for(k = iBegin; k < iEnd; k++)

A[k] = B[k];

}

// iBegin is inclusive; iEnd is exclusive (A[iEnd] is not in the set)

TopDownSplitMerge(A[], iBegin, iEnd, B[])

{

if(iEnd - iBegin < 2) // if run size == 1

return; // consider it sorted

// recursively split runs into two halves until run size == 1,

// then merge them and return back up the call chain

iMiddle = (iEnd + iBegin) / 2; // iMiddle = mid point

TopDownSplitMerge(A, iBegin, iMiddle, B); // split / merge left half

TopDownSplitMerge(A, iMiddle, iEnd, B); // split / merge right half

TopDownMerge(A, iBegin, iMiddle, iEnd, B); // merge the two half runs

CopyArray(B, iBegin, iEnd, A); // copy the merged runs back to A

}

// left half is A[iBegin :iMiddle-1]

// right half is A[iMiddle:iEnd-1 ]

TopDownMerge(A[], iBegin, iMiddle, iEnd, B[])

{

i0 = iBegin, i1 = iMiddle;

// While there are elements in the left or right runs

for (j = iBegin; j < iEnd; j++) {

// If left run head exists and is <= existing right run head.

if (i0 < iMiddle && (i1 >= iEnd || A[i0] <= A[i1])) {

B[j] = A[i0];

i0 = i0 + 1;

} else {

B[j] = A[i1];

i1 = i1 + 1;

}

}

}

Bottom-up implementation:






/* array A[] has the items to sort; array B[] is a work array */

BottomUpSort(int n, int A[], int B[])

{

int width;

/* Each 1-element run in A is already "sorted". */

/* Make successively longer sorted runs of length 2, 4, 8, 16... until whole array is sorted. */

for (width = 1; width < n; width = 2 * width)

{

int i;

/* Array A is full of runs of length width. */

for (i = 0; i < n; i = i + 2 * width)

{

/* Merge two runs: A[i:i+width-1] and A[i+width:i+2*width-1] to B[] */

/* or copy A[i:n-1] to B[] ( if(i+width >= n) ) */

BottomUpMerge(A, i, min(i+width, n), min(i+2*width, n), B);

}

CopyArray(B, 0, n, A); /* copy the merged runs in B back to A for the next pass */

}

}

BottomUpMerge(int A[], int iLeft, int iRight, int iEnd, int B[])

{

int i0 = iLeft;

int i1 = iRight;

int j;

for (j = iLeft; j < iEnd; j++)

{

if (i0 < iRight && (i1 >= iEnd || A[i0] <= A[i1]))

{

B[j] = A[i0];

i0 = i0 + 1;

}

else

{

B[j] = A[i1];

i1 = i1 + 1;

}

}

}

2.2 Heapsort

----------------------------------------

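Heapsort is built on the heap data structure, so you should be familiar with heaps first; see: Data Structures: Heap.

Heapsort belongs to the selection sort family. Its improvement over basic selection sort is that it uses a logarithmic-time priority queue (the heap) instead of a linear-time search. In practice it is slower than a well-implemented quicksort, but it has the advantage of an O(n log n) worst-case running time. Heapsort is an in-place algorithm, but it is not a stable sort.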




A run of the heapsort algorithm sorting an array of randomly permuted values. In the first stage of the algorithm the array elements are reordered to satisfy the heap
property. Before the actual sorting takes place, the heap tree structure is shown briefly for illustration.
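Class: Sorting algorithm
Data structure: Array
Worst case performance: O(n log n)
Best case performance: O(n log n) [1]
Average case performance: O(n log n)
Worst case space complexity: O(1) auxiliary
Algorithm:

1. Build a max-heap or min-heap.

2. Swap the root with the last element and remove it from the heap, so the heap size shrinks by 1.

3. Repair the heap and go back to step 2, until no elements remain in the heap.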

HEAPSORT(A)
1 BUILD-MAX-HEAP(A)                       // turn array A into a max-heap
2 for i ← length[A] downto 2
3     do exchange A[1] ↔ A[i]             // swap the root with the last element of the heap
4        heap-size[A] ← heap-size[A] - 1  // shrink the heap by 1
5        MAX-HEAPIFY(A, 1)                // repair the heap below the new root A[1]

The BUILD-MAX-HEAP(A) and MAX-HEAPIFY(A, 1) routines used above are described in detail in my earlier post, Data Structures: Heap.
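For readers without that post at hand, here is a minimal self-contained Python sketch of the two helper routines and of HEAPSORT itself. It uses zero-based indices (children of node i at 2i+1 and 2i+2), and it builds the heap bottom-up by sifting down, which is one common way to do it and not necessarily how the earlier post does it.

```python
def max_heapify(a, root, heap_size):
    """Sift a[root] down until the subtree rooted there is a max-heap."""
    while True:
        left, right = 2 * root + 1, 2 * root + 2
        largest = root
        if left < heap_size and a[left] > a[largest]:
            largest = left
        if right < heap_size and a[right] > a[largest]:
            largest = right
        if largest == root:
            return
        a[root], a[largest] = a[largest], a[root]
        root = largest


def build_max_heap(a):
    """Turn the whole list into a max-heap, starting from the last parent."""
    for root in range(len(a) // 2 - 1, -1, -1):
        max_heapify(a, root, len(a))


def heapsort(a):
    """Sort the list in place in ascending order and return it."""
    build_max_heap(a)
    for end in range(len(a) - 1, 0, -1):
        a[0], a[end] = a[end], a[0]   # move the current maximum to its final slot
        max_heapify(a, 0, end)        # repair the heap on the remaining prefix
    return a


print(heapsort([6, 5, 3, 1, 8, 7, 2, 4]))
```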

Suppose array A initially holds { 6, 5, 3, 1, 8, 7, 2, 4 } and we want to sort it in ascending order.

First run BUILD-MAX-HEAP(A) to turn A into a max-heap, as shown below:

1. Build the heap

Heap | newly added element | swap elements
nil | 6 |
6 | 5 |
6, 5 | 3 |
6, 5, 3 | 1 |
6, 5, 3, 1 | 8 |
6, 5, 3, 1, 8 | | 5, 8
6, 8, 3, 1, 5 | | 6, 8
8, 6, 3, 1, 5 | 7 |
8, 6, 3, 1, 5, 7 | | 3, 7
8, 6, 7, 1, 5, 3 | 2 |
8, 6, 7, 1, 5, 3, 2 | 4 |
8, 6, 7, 1, 5, 3, 2, 4 | | 1, 4
8, 6, 7, 4, 5, 3, 2, 1 | |
We then sort on top of the max-heap by running steps 2-5 of the pseudocode.

2. Sorting.

Heap | swap elements | delete element | sorted array | details
8, 6, 7, 4, 5, 3, 2, 1 | 8, 1 | | | swap 8 and 1 in order to delete 8 from heap
1, 6, 7, 4, 5, 3, 2, 8 | | 8 | | delete 8 from heap and add to sorted array
1, 6, 7, 4, 5, 3, 2 | 1, 7 | | 8 | swap 1 and 7 as they are not in order in the heap
7, 6, 1, 4, 5, 3, 2 | 1, 3 | | 8 | swap 1 and 3 as they are not in order in the heap
7, 6, 3, 4, 5, 1, 2 | 7, 2 | | 8 | swap 7 and 2 in order to delete 7 from heap
2, 6, 3, 4, 5, 1, 7 | | 7 | 8 | delete 7 from heap and add to sorted array
2, 6, 3, 4, 5, 1 | 2, 6 | | 7, 8 | swap 2 and 6 as they are not in order in the heap
6, 2, 3, 4, 5, 1 | 2, 5 | | 7, 8 | swap 2 and 5 as they are not in order in the heap
6, 5, 3, 4, 2, 1 | 6, 1 | | 7, 8 | swap 6 and 1 in order to delete 6 from heap
1, 5, 3, 4, 2, 6 | | 6 | 7, 8 | delete 6 from heap and add to sorted array
1, 5, 3, 4, 2 | 1, 5 | | 6, 7, 8 | swap 1 and 5 as they are not in order in the heap
5, 1, 3, 4, 2 | 1, 4 | | 6, 7, 8 | swap 1 and 4 as they are not in order in the heap
5, 4, 3, 1, 2 | 5, 2 | | 6, 7, 8 | swap 5 and 2 in order to delete 5 from heap
2, 4, 3, 1, 5 | | 5 | 6, 7, 8 | delete 5 from heap and add to sorted array
2, 4, 3, 1 | 2, 4 | | 5, 6, 7, 8 | swap 2 and 4 as they are not in order in the heap
4, 2, 3, 1 | 4, 1 | | 5, 6, 7, 8 | swap 4 and 1 in order to delete 4 from heap
1, 2, 3, 4 | | 4 | 5, 6, 7, 8 | delete 4 from heap and add to sorted array
1, 2, 3 | 1, 3 | | 4, 5, 6, 7, 8 | swap 1 and 3 as they are not in order in the heap
3, 2, 1 | 3, 1 | | 4, 5, 6, 7, 8 | swap 3 and 1 in order to delete 3 from heap
1, 2, 3 | | 3 | 4, 5, 6, 7, 8 | delete 3 from heap and add to sorted array
1, 2 | 1, 2 | | 3, 4, 5, 6, 7, 8 | swap 1 and 2 as they are not in order in the heap
2, 1 | 2, 1 | | 3, 4, 5, 6, 7, 8 | swap 2 and 1 in order to delete 2 from heap
1, 2 | | 2 | 3, 4, 5, 6, 7, 8 | delete 2 from heap and add to sorted array
1 | | 1 | 2, 3, 4, 5, 6, 7, 8 | delete 1 from heap and add to sorted array
 | | | 1, 2, 3, 4, 5, 6, 7, 8 | completed
If this is still not clear enough, the two figures below may help.








An example on heapsort.

The figure below shows heapsort operating on a max-heap.



2.3 Quicksort

--------------------------------------------------





Visualization of the quicksort algorithm. The horizontal lines are pivot values.
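Class: Sorting algorithm
Worst case performance: O(n^2)
Best case performance: O(n log n) (simple partition) or O(n) (three-way partition and equal keys)
Average case performance: O(n log n)
Worst case space complexity: O(n) auxiliary (naive), O(log n) auxiliary
Quicksort, like merge sort, is a divide-and-conquer algorithm. It first partitions the list into two smaller sublists, one holding the smaller elements and one the larger, and then sorts the sublists recursively. The divide-and-conquer steps for sorting a subarray A[p..r] are as follows.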

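Steps:

Divide: pick an element of the list as the pivot and partition the array into A[p..q-1] and A[q+1..r], such that every element of A[p..q-1] is less than or equal to A[q] and every element of A[q+1..r] is greater than A[q]. Computing q, the pivot's final position in the array, is the job of the PARTITION operation.

Conquer: sort the two subarrays recursively.

Combine: the subarrays are sorted in place, so no work is needed to combine them; A[p..r] is now sorted.

Pseudocode for quicksort: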

QUICKSORT(A, p, r)
1 if p < r
2     then q ← PARTITION(A, p, r)
3          QUICKSORT(A, p, q - 1)
4          QUICKSORT(A, q + 1, r)

To sort an entire array A, call QUICKSORT(A, 1, length[A]).

The code above relies on PARTITION(A, p, r), which is the core of quicksort, so let us look at it in detail.

First, the pseudocode:

PARTITION(A, p, r)
1 x ← A[r]
2 i ← p - 1
3 for j ← p to r - 1
4     do if A[j] ≤ x
5         then i ← i + 1
6              exchange A[i] ↔ A[j]
7 exchange A[i + 1] ↔ A[r]
8 return i + 1

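The figure below traces the PARTITION(A, p, r) operation.



Briefly, i marks the boundary between the two regions and j is the scan index. Whenever an element less than or equal to A[r] is found, i is incremented and that element is swapped into position i.

PARTITION returns i + 1, the final position of the pivot A[r].

So PARTITION, in essence, finds where the pivot belongs in the array.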
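The two routines translate almost line for line into Python. The sketch below uses zero-based indices and a default `r` argument for convenience; both are choices made for this example. The sample array is the one used for PARTITION in Introduction to Algorithms.

```python
def partition(a, p, r):
    """Partition a[p..r] around the pivot a[r]; return the pivot's final index."""
    x = a[r]
    i = p - 1                      # a[p..i] holds elements <= x
    for j in range(p, r):
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]
    return i + 1


def quicksort(a, p=0, r=None):
    """Sort a[p..r] in place and return the list."""
    if r is None:
        r = len(a) - 1
    if p < r:
        q = partition(a, p, r)
        quicksort(a, p, q - 1)
        quicksort(a, q + 1, r)
    return a


data = [2, 8, 7, 1, 3, 5, 6, 4]
q = partition(data, 0, len(data) - 1)
print(q, data)          # pivot 4 ends up at index q
print(quicksort(data))
```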

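Performance of quicksort:

Quicksort's performance depends on whether PARTITION is balanced, that is, whether it splits the array into two parts of roughly equal size. If the splits are badly unbalanced, the running time degrades to quadratic, no better than insertion sort.

Worst case:

The worst case is a maximally unbalanced split every time: one side gets n-1 elements and the other gets 0. PARTITION takes Θ(n) time, so the recurrence for the unbalanced split is

T(n) = T(n - 1) + T(0) + Θ(n)

= T(n - 1) + Θ(n)

which gives a running time of O(n^2), the same as insertion sort. With the last element as pivot, this happens on input that is already sorted, a case in which insertion sort runs in O(n).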

Best case:

Every partition produces one part of size ⌊n/2⌋ and one of size ⌈n/2⌉ - 1. The recurrence becomes

T(n) ≤ 2T(n/2) + Θ(n)

which gives a running time of O(n lg n).

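Average case:

The running time is also O(n lg n). Introduction to Algorithms gives a detailed proof; here I only illustrate with an example.

Suppose every partition produces a 9-to-1 split. The recurrence is then

T(n) ≤ T(9n/10) + T(n/10) + O(n)

The figure below shows that the running time is still O(n lg n).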
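The recursion-tree argument for the 9-to-1 split can be sketched as follows, writing c for the constant hidden in the O(n) term (c is notation introduced here, not from the text above). Every level of the tree costs at most cn in total, and the longest root-to-leaf path shrinks the problem by a factor of 10/9 per level.

```latex
\begin{aligned}
T(n) &\le T\!\left(\tfrac{9n}{10}\right) + T\!\left(\tfrac{n}{10}\right) + cn \\
\text{cost of each level} &\le cn \\
\text{depth of the tree} &= \log_{10/9} n = \Theta(\lg n) \\
T(n) &\le cn\left(\log_{10/9} n + 1\right) = O(n \lg n)
\end{aligned}
```

The same reasoning works for any split of constant proportionality; only the base of the logarithm changes.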



3. Bubble sort and variants

--------------------------------------------------------

Bubble sort itself is rarely used in practice because it is inefficient, but it comes up often in teaching.

3.1 Bubble sort

----------------------------------------------------------



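Class: Sorting algorithm
Data structure: Array
Worst case performance: O(n^2)
Best case performance: O(n)
Average case performance: O(n^2)
Worst case space complexity: O(1) auxiliary
Bubble sort is very inefficient, even less efficient than insertion sort. It is slow on large inputs and slowest on input in reverse order.

Algorithm: in short, each pass moves the largest (or smallest) remaining element to the end of the sequence, and passes repeat until every element is in its correct position. See the figure below: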








An example of bubble sort. Starting from the beginning of the list, compare every adjacent pair, swap their position if they are not in the right order (the latter one is smaller than the former one). After each iteration, one less element (the last one) is
needed to be compared until there are no more elements left to be compared.
The code is as follows:

for (int x = 0; x < n - 1; x++)
{
    // after pass x, the last x + 1 elements are in their final positions
    for (int y = 0; y < n - 1 - x; y++)
    {
        if (array[y] > array[y + 1])
        {
            int temp = array[y + 1];
            array[y + 1] = array[y];
            array[y] = temp;
        }
    }
}
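The O(n) best case only holds for a variant that stops as soon as a pass makes no swap. A minimal Python sketch of that variant follows; the pass counter is added purely to show the early exit.

```python
def bubble_sort(items):
    """Sort a copy of items; return (sorted_list, number_of_passes)."""
    a = list(items)
    n = len(a)
    passes = 0
    while True:
        swapped = False
        # The last `passes` elements are already in their final positions.
        for y in range(n - 1 - passes):
            if a[y] > a[y + 1]:
                a[y], a[y + 1] = a[y + 1], a[y]
                swapped = True
        passes += 1
        if not swapped:
            return a, passes


print(bubble_sort([1, 2, 3, 4]))     # sorted input: one pass, no swaps
print(bubble_sort([5, 1, 4, 2, 8]))
```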








A bubble sort, a sorting algorithm that continuously steps through a list, swapping items
until they appear in the correct order. The list was plotted in a Cartesian coordinate system, with each point (x,y) indicating that the value y is stored at index x. Then the list would be sorted by Bubble sort according to every pixel's value. Note that
the largest end gets sorted first, with smaller elements taking longer to move to their correct positions.

3.2 Shell sort

-------------------------------------------------------------------

Shell sort is an in-place algorithm, but it is not stable.





Shellsort with gaps 23, 10, 4, 1 in action.
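Class: Sorting algorithm
Data structure: Array
Worst case performance: O(n^2)
Best case performance: O(n log n)
Average case performance: depends on gap sequence
Worst case space complexity: O(n) total, O(1) auxiliary
Shell sort steps:

Take an integer d1 smaller than n as the first increment and split all records into d1 groups, where records whose positions differ by a multiple of d1 fall into the same group. Run straight insertion sort within each group. Then take a second increment d2 < d1 and repeat the grouping and sorting, continuing until the increment reaches dt = 1 (dt < dt-1 < ... < d2 < d1), at which point all records form a single group sorted by straight insertion sort.

In essence, the method is a grouped insertion sort.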



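In the figure above we take d1 = 5, d2 = 3, d3 = 1.

With d = 5 the groups are (a1, a6, a11), (a2, a7, a12), (a3, a8), (a4, a9), (a5, a10); insertion sorting each group gives the second row of the figure.

With d = 3 the groups are (a1, a4, a7, a10), (a2, a5, a8, a11), (a3, a6, a9, a12); insertion sorting each group gives the third row.

With d = 1 the single group is (a1, ..., a12); insertion sorting it gives the final result.

Choosing d

You may wonder how to choose the values of d when writing a program.

The usual choice is n/2 for the first pass, n/4 for the second, and so on, which has a worst-case running time of O(n^2). Wikipedia lists better gap sequences that improve the worst case, as in the table below: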

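General term (k ≥ 1) | Concrete gaps | Worst-case time complexity | Author and year of publication
⌊N/2^k⌋ | ⌊N/2⌋, ⌊N/4⌋, ..., 1 | Θ(N^2) [when N = 2^p] | Shell, 1959[2]
2⌊N/2^(k+1)⌋ + 1 | 2⌊N/4⌋ + 1, ..., 3, 1 | Θ(N^(3/2)) | Frank & Lazarus, 1960[6]
2^k − 1 | 1, 3, 7, 15, 31, 63, ... | Θ(N^(3/2)) | Hibbard, 1963[7]
2^k + 1, prefixed with 1 | 1, 3, 5, 9, 17, 33, 65, ... | Θ(N^(3/2)) | Papernov & Stasevich, 1965[8]
successive numbers of the form 2^p 3^q | 1, 2, 3, 4, 6, 8, 9, 12, ... | Θ(N log^2 N) | Pratt, 1971[9]
(3^k − 1)/2, not greater than ⌈N/3⌉ | 1, 4, 13, 40, 121, ... | Θ(N^(3/2)) | Knuth, 1973[1]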
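There are more elaborate sequences that I will not list here. Most programs I have seen use the first one, proposed by the inventor of the algorithm.

Why Shell sort outperforms straight insertion sort

Straight insertion sort needs few comparisons and moves when the input is already nearly sorted.
When n is small, n and n^2 are close, so the best case O(n) and worst case O(n^2) of straight insertion sort differ little.
Shell sort starts with a large increment, which means many groups with few records each, so insertion sort within each group is fast. As the increment di shrinks, there are fewer groups with more records each, but the file has already been sorted with distance di-1 and is close to sorted, so each new pass is fast as well.

The code below uses d = N/2^k: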

void shellsort2(int a[], int n)
{
    int j, gap;
    for (gap = n / 2; gap > 0; gap /= 2)
        for (j = gap; j < n; j++)   // start from element number gap of the array
            if (a[j] < a[j - gap])  // insertion sort each element within its own group
            {
                int temp = a[j];
                int k = j - gap;
                while (k >= 0 && a[k] > temp)
                {
                    a[k + gap] = a[k];
                    k -= gap;
                }
                a[k + gap] = temp;
            }
}

As the code shows, Shell sort is an insertion sort nested inside an outer loop over the gaps.
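Switching to another gap sequence only changes the outer loop. As a sketch, the Python version below uses the sequence attributed to Knuth, 1973 in the table above, taken here to be 1, 4, 13, 40, ... (each gap is three times the previous one plus one), capped at ⌈n/3⌉. The helper name `knuth_gaps` is made up for this example.

```python
def knuth_gaps(n):
    """Gaps 1, 4, 13, 40, ... not greater than ceil(n / 3), largest first."""
    gaps = []
    h = 1
    while h <= (n + 2) // 3:      # (n + 2) // 3 is ceil(n / 3) for integers
        gaps.append(h)
        h = 3 * h + 1
    return gaps[::-1]


def shell_sort(items):
    """Return a sorted copy of items using gapped insertion sort."""
    a = list(items)
    for gap in knuth_gaps(len(a)):
        for j in range(gap, len(a)):
            temp = a[j]
            k = j - gap
            # Shift larger elements of the same group one gap to the right.
            while k >= 0 and a[k] > temp:
                a[k + gap] = a[k]
                k -= gap
            a[k + gap] = temp
    return a


print(knuth_gaps(100))
print(shell_sort([9, 8, 3, 7, 5, 6, 4, 1]))
```

Because the sequence always ends with gap 1, the final pass is a plain insertion sort and the result is sorted regardless of the earlier gaps.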

3.3 Comb sort

-----------------------------------------------------------



Class: Sorting algorithm
Data structure: Array
Worst case performance: O(n^2) [1]
Best case performance: Θ(n log n)
Average case performance: Ω(n^2/2^p), where p is the number of increments[1]
Worst case space complexity: O(1)
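Comb sort is a variant of bubble sort. Like Shell sort, it uses a gap value to group the elements; the difference is that Shell sort nests an insertion sort while comb sort nests a bubble sort.

Why can both comb sort and Shell sort gain efficiency from grouping?
Because insertion sort and bubble sort have two things in common: 1. On nearly sorted input the running time is O(n). 2. They are efficient on small inputs, where the best case n and the worst case n^2 differ little.
These two properties are why grouping helps: it reduces the amount of data handled at a time, and sorting the groups leaves the file nearly sorted.

The gap is chosen much as in Shell sort, except that comb sort starts with a gap of n/1.3 and divides it by 1.3 again on each pass until the gap is 1. Consider the following example.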

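Suppose the array to sort is [8 4 3 7 6 5 2 1].

The array has length 8 and ⌊8 ÷ 1.3⌋ = 6, so with gap 6 we compare 8 with 2 and 4 with 1, swapping both pairs.

The result after the swaps is

[2 1 3 7 6 5 8 4]

In the second pass the gap becomes ⌊6 ÷ 1.3⌋ = 4, and we compare 2 with 6, 1 with 5, 3 with 8, and 7 with 4.

Only 7 and 4 need to be swapped, which gives

[2 1 3 4 6 5 8 7]

In the third pass the gap is 3; nothing is swapped.

In the fourth pass the gap is 2; nothing is swapped.

In the fifth pass the gap is 1 and three pairs are swapped: 2 and 1, 6 and 5, 8 and 7.

After these three swaps the array is [1 2 3 4 5 6 7 8] and the sort is complete.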
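The walk-through can be replayed with a short Python sketch that records the gap used on every pass. Returning the list of gaps is an addition for illustration; a real implementation would return only the sorted data.

```python
def comb_sort(items, shrink=1.3):
    """Sort a copy of items; return (sorted_list, gaps_used_per_pass)."""
    a = list(items)
    gap = len(a)
    gaps_used = []
    swapped = False
    while gap > 1 or swapped:
        if gap > 1:
            gap = int(gap / shrink)     # truncate, as in the walk-through
        gaps_used.append(gap)
        swapped = False
        for i in range(len(a) - gap):
            if a[i] > a[i + gap]:
                a[i], a[i + gap] = a[i + gap], a[i]
                swapped = True
    return a, gaps_used


result, gaps = comb_sort([8, 4, 3, 7, 6, 5, 2, 1])
print(result)
print(gaps)
```

The recorded gaps are 6, 4, 3, 2, 1, followed by one more pass at gap 1: the pass that performs the last three swaps cannot know it is the last, so a final swap-free pass confirms the array is sorted.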

Implementation:

#include <stdbool.h>
#include <stddef.h>

void comb_sort(int *input, size_t size) {
    const float shrink = 1.3f;
    int swap;
    size_t i, gap = size;
    bool swapped = false;
    while ((gap > 1) || swapped) {
        if (gap > 1) {
            gap = (size_t)((float)gap / shrink);
        }
        swapped = false;
        for (i = 0; gap + i < size; ++i) {
            // compare directly; subtracting the two values could overflow
            if (input[i] > input[i + gap]) {
                swap = input[i];
                input[i] = input[i + gap];
                input[i + gap] = swap;
                swapped = true;
            }
        }
    }
}

4. Linear-time sorting (distribution sorts)

--------------------------------------------------

4.1 Counting sort

---------------------------------------------

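Counting sort is not a comparison-based sort. Its strength is sorting integers within a known, small range: its running time is O(n + k), which beats any comparison sort when k is O(n).

Steps:

Suppose the input is an array A[1..n], so length[A] = n. Sorting it requires two additional arrays:

Array B[1..n], which holds the sorted result

Array C[0..k], where k means that every element of A lies in the range 0 to k

Use the figure below to follow along:



(a): C records how many times each of the values 0, 2, 3, 5 occurs in A.

(b): Applying C[i] = C[i] + C[i-1] to the C of (a) gives the result in (b).

(c): Take A[8] = 3 from A. C shows that 7 elements are less than or equal to 3 (including this one), so 3 is stored at index 7 of the sorted array B.

(d): Likewise, take A[7] = 0.

(e): Take A[6] = 3.

...........

(f): Finally take A[1] = 2 and store it in its place in B, which gives the final result.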
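Steps (a) to (f) can be followed in code. The Python sketch below returns the count array and the prefix-sum array alongside the result so that stages (a) and (b) are visible. The input array is an assumption: it is the example from Introduction to Algorithms, which matches the values quoted in the steps (A[8] = 3, A[7] = 0, A[6] = 3, A[1] = 2), but the figure itself is not reproduced here.

```python
def counting_sort(a, k):
    """Sort non-negative integers in 0..k; return (sorted, counts, prefix_sums)."""
    c = [0] * (k + 1)
    for v in a:
        c[v] += 1
    counts = list(c)                # stage (a): occurrences of each value
    for i in range(1, k + 1):
        c[i] += c[i - 1]
    prefix = list(c)                # stage (b): number of elements <= i
    b = [None] * len(a)
    for v in reversed(a):           # scan from the end to keep the sort stable
        c[v] -= 1                   # zero-based slot for this occurrence of v
        b[c[v]] = v
    return b, counts, prefix


A = [2, 5, 3, 0, 2, 3, 0, 3]        # assumed example input, values in 0..5
b, counts, prefix = counting_sort(A, 5)
print(counts)
print(prefix)
print(b)
```

Here prefix[3] is 7, in line with step (c): seven elements are less than or equal to 3.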

Pseudocode:

COUNTING-SORT(A, B, k)
1 for i ← 0 to k
2     do C[i] ← 0
3 for j ← 1 to length[A]
4     do C[A[j]] ← C[A[j]] + 1
5 ▹ C[i] now contains the number of elements equal to i.
6 for i ← 1 to k
7     do C[i] ← C[i] + C[i - 1]
8 ▹ C[i] now contains the number of elements less than or equal to i.
9 for j ← length[A] downto 1
10     do B[C[A[j]]] ← A[j]
11        C[A[j]] ← C[A[j]] - 1

The Java implementation below is an optimized version of the pseudocode: it shrinks array C. The original needs max + 1 counters, while the code below needs only max - min + 1.

public class CountSort {
    public static void main(String[] args) {
        // the array to sort
        int a[] = {100, 93, 97, 92, 96, 99, 92, 89, 93, 97, 90, 94, 92, 95};
        int b[] = countSort(a);
        for (int i : b) {
            System.out.print(i + " ");
        }
        System.out.println();
    }

    public static int[] countSort(int[] a) {
        int b[] = new int[a.length];
        int max = a[0], min = a[0];
        for (int i : a) {
            if (i > max) {
                max = i;
            }
            if (i < min) {
                min = i;
            }
        }
        // k is the spread of the values (max - min) plus 1
        int k = max - min + 1;
        int c[] = new int[k];
        for (int i = 0; i < a.length; ++i) {
            c[a[i] - min] += 1; // the optimization: offset by min to shrink c
        }
        for (int i = 1; i < c.length; ++i) {
            c[i] = c[i] + c[i - 1];
        }
        for (int i = a.length - 1; i >= 0; --i) {
            b[--c[a[i] - min]] = a[i]; // place each element in its final slot, scanning from the end for stability
        }
        return b;
    }
}

4.2 Bucket sort

-----------------------------------------------------

Class: Sorting algorithm
Data structure: Array
Worst case performance: O(n^2)
Average case performance: O(n + k)
Worst case space complexity: O(n · k)
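Steps:

Bucket sort assumes the values to sort are uniformly distributed over a range, and divides that range into several subranges, the buckets.
Distribute the values into these buckets and sort the data within each bucket.
Concatenate the buckets in order.

Think about it: isn't this a divide-and-conquer strategy? And isn't counting sort a special case of bucket sort?

Assuming the values lie in [0, 1), bucket sort proceeds as in the figure below.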



Pseudocode:

BUCKET-SORT(A)
1 n ← length[A]
2 for i ← 1 to n
3     do insert A[i] into list B[⌊n A[i]⌋]
4 for i ← 0 to n - 1
5     do sort list B[i] with insertion sort
6 concatenate the lists B[0], B[1], . . ., B[n - 1] together in order
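A direct Python rendering of BUCKET-SORT might look like the sketch below. It assumes every value lies in [0, 1), and it sorts each bucket with the built-in `list.sort()` instead of a hand-written insertion sort, purely to keep the example short. The sample values are illustrative.

```python
def bucket_sort(values):
    """Return a sorted copy of values, each assumed to lie in [0, 1)."""
    n = len(values)
    if n == 0:
        return []
    buckets = [[] for _ in range(n)]
    for v in values:
        # floor(n * v) picks the bucket; min() guards against rounding up to n
        buckets[min(n - 1, int(n * v))].append(v)
    result = []
    for bucket in buckets:
        bucket.sort()               # stand-in for the insertion sort on line 5
        result.extend(bucket)
    return result


data = [0.78, 0.17, 0.39, 0.26, 0.72, 0.94, 0.21, 0.12, 0.23, 0.68]
print(bucket_sort(data))
```

With uniformly distributed input each bucket holds about one element on average, which is where the linear expected running time comes from.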

4.3 Radix sort

---------------------------------------------------------

Class: Sorting algorithm
Data structure: Array
Worst case performance: O(d · n), where d is the number of digits
Worst case space complexity: O(n + b), where b is the base
1. Least significant digit first (LSD)

Algorithm:

1. Take the least significant digit (or group of bits, both being examples of radices) of each key.
2. Group the keys based on that digit, but otherwise keep the original order of keys. (This is what makes the LSD radix sort a stable sort).
3. Repeat the grouping process with each more significant digit.

The sort in step 2 is usually done using bucket sort or counting sort, which are efficient in this case since there are usually only a small number of digits.

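I quote Wikipedia directly here because many informal definitions I have seen contain errors; Wikipedia really is a good resource. In short:

1. Extract the lowest digit of each key, that is, the ones digit.

2. Sort the keys by that digit (counting sort or bucket sort both work).

3. Repeat for each higher digit until every digit position has been processed.

Example: take the ones digit, which always lies in [0, 10). Group the keys by it, smaller digits first and larger digits last. Once the ones digit is done, do the same for the tens digit, and so on up to the highest digit, after which the data is sorted. This takes d passes, where d is the number of digits, so the running time is O(d * n). Since d is a constant (for 5-digit numbers d is 5), this is effectively O(n). Here is an example of my own: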

Original data | Sorted by ones digit | Sorted by tens digit | Sorted by hundreds digit
981 | 981 | 725 | 129
387 | 753 | 129 | 387
753 | 955 | 753 | 456
129 | 725 | 955 | 725
955 | 456 | 456 | 753
725 | 387 | 981 | 955
456 | 129 | 387 | 981
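The three passes in the table can be reproduced with a few lines of Python. The sketch below groups keys into per-digit lists, which keeps equal digits in their current order, and returns the array after every pass. It assumes non-negative integers, and the function name is made up for this example.

```python
def lsd_radix_passes(values, base=10):
    """Return the list of intermediate arrays, one per digit pass."""
    a = list(values)
    passes = []
    exp = 1                                   # place value of the current digit
    while a and max(a) // exp > 0:
        buckets = [[] for _ in range(base)]
        for v in a:
            buckets[(v // exp) % base].append(v)   # appending keeps the pass stable
        a = [v for bucket in buckets for v in bucket]
        passes.append(a)
        exp *= base
    return passes


for step in lsd_radix_passes([981, 387, 753, 129, 955, 725, 456]):
    print(step)
```

Each printed line corresponds to one column of the table, from the ones digit to the hundreds digit.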
See also the figure below:



C code:





#include <stdio.h>

#define MAX 20

#define SHOWPASS

#define BASE 10

void print(int *a, int n)

{

int i;

for (i = 0; i < n; i++)

printf("%d\t", a[i]);

}

void radixsort(int *a, int n)

{

int i, b[MAX], m = a[0], exp = 1;

//Get the greatest value in the array a and assign it to m

for (i = 1; i < n; i++)

{

if (a[i] > m)

m = a[i];

}

//Loop until exp is bigger than the largest number

while (m / exp > 0)

{

int bucket[BASE] = { 0 };

//Count the number of keys that will go into each bucket

for (i = 0; i < n; i++)

bucket[(a[i] / exp) % BASE]++;

//Add the count of the previous buckets to acquire the indexes after the end of each bucket location in the array

for (i = 1; i < BASE; i++)

bucket[i] += bucket[i - 1];

//Starting at the end of the list, get the index corresponding to the a[i]'s key, decrement it, and use it to place a[i] into array b.

for (i = n - 1; i >= 0; i--)

b[--bucket[(a[i] / exp) % BASE]] = a[i];

//Copy array b to array a

for (i = 0; i < n; i++)

a[i] = b[i];

//Multiply exp by the BASE to get the next group of keys

exp *= BASE;

#ifdef SHOWPASS

printf("\nPASS : ");

print(a, n);

#endif

}

}

int main()

{

int arr[MAX];

int i, n;

printf("Enter total elements (n <= %d) : ", MAX);

scanf("%d", &n);

n = n < MAX ? n : MAX;

printf("Enter %d Elements : ", n);

for (i = 0; i < n; i++)

scanf("%d", &arr[i]);

printf("\nARRAY : ");

print(&arr[0], n);

radixsort(&arr[0], n);

printf("\nSORTED : ");

print(&arr[0], n);

printf("\n");

return 0;

}

2. Most significant digit first (MSD)

Algorithm:

A recursively subdividing MSD radix sort algorithm works as follows:

Take the most significant digit of each key.
Sort the list of elements based on that digit, grouping elements with the same digit into one bucket.
Recursively sort each bucket, starting with the next digit to the right.
Concatenate the buckets together in order.

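In LSD the third step simply moves on to the next digit; here it becomes processing each bucket. What does that mean? Sorting on the most significant digit already puts the data roughly in ascending order, but elements that share that digit are not necessarily in order. In other words, the buckets are in order relative to each other, but the contents of each bucket are not.

This is where recursion comes in. Since the buckets are already in order relative to each other, we can focus on the data inside each bucket: starting from the next highest digit, create 10 new buckets, distribute the data into them, and handle them the same way, down to the lowest digit.

The MSD code is a bit more involved because it uses recursion.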

#include <stdio.h>
#include <string.h>
#include <algorithm>
using namespace std;

struct Node {
    int key;
    struct Node *next;
    Node(int _key) {
        key = _key; next = NULL;
    }
};

// Sort the n elements a[s .. s+n-1]; high is 10 times the place value of the digit to sort on
void sort(int *a, int s, int n, int high) {
    Node *ibuck[10], *itail[10], *p;
    int i, kth, low, num;
    if (high == 1) return;
    low = high / 10;
    memset(ibuck, 0, sizeof(ibuck));
    for (i = s; i < s + n; i++) {  // distribute the elements into buckets
        kth = (a[i] % high) / low;  // the digit that selects the bucket
        p = new Node(a[i]);         // create a new node
        // append at the tail, never at the head, so that the input order is kept
        if (ibuck[kth] != NULL) {
            itail[kth]->next = p; itail[kth] = p;
        } else {
            ibuck[kth] = p; itail[kth] = p;
        }
    }
    for (i = 0; i < 10; i++) {  // copy the buckets back into the array (the original condition s<n broke recursive calls with s > 0)
        num = 0;
        while (ibuck[i] != NULL) {
            a[s++] = ibuck[i]->key;
            num++;
            p = ibuck[i], ibuck[i] = ibuck[i]->next, delete p;  // free the node
        }
        if (num > 1)
            sort(a, s - num, num, high / 10);  // recursively sort this bucket on the next digit
    }
}

void base_sort_MSD(int *a, int n) {
    int Max, high, i;
    for (Max = a[0], i = 1; i < n; i++) Max = max(Max, a[i]);
    for (high = 1; Max / high > 0; high *= 10);
    sort(a, 0, n, high);
}

int main() {
    int n = 10;
    int data[] = {1000, 50, 80000, 81000, 3, 26, 467, 6987, 10953, 2354};
    base_sort_MSD(data, n);
    for (int i = 0; i < n; i++)
        printf("%d ", data[i]);
}
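The same recursion is easier to see without the linked-list bookkeeping. The Python sketch below is a simplified illustration and not a translation of the code above: it uses plain lists as buckets, takes `place` to be the place value of the current digit, and assumes non-negative integers.

```python
def msd_radix_sort(values, place=None, base=10):
    """Return a sorted copy of non-negative integers, most significant digit first."""
    if len(values) <= 1:
        return list(values)
    if place is None:
        # Place value of the top digit of the largest key, e.g. 10000 for 81000.
        place = 1
        while max(values) // (place * base) > 0:
            place *= base
    buckets = [[] for _ in range(base)]
    for v in values:
        buckets[(v // place) % base].append(v)
    result = []
    for bucket in buckets:
        # Buckets are ordered relative to each other; only their contents
        # still need sorting, on the next lower digit.
        if place > 1 and len(bucket) > 1:
            bucket = msd_radix_sort(bucket, place // base, base)
        result.extend(bucket)
    return result


data = [1000, 50, 80000, 81000, 3, 26, 467, 6987, 10953, 2354]
print(msd_radix_sort(data))
```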

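I think this author's summary is good, and I borrowed a little from it later on: /article/4520680.html

One question left for the reader: a file holds a large number of values between 0 and 1, each given to 10 decimal places. How would you sort them?

That finally wraps up the 11 sorting algorithms. Many points remain to be summarized, such as the differences between them, and I will add those later. I do not yet feel I understand everything thoroughly, since I have only just started studying this in depth. If you find mistakes in the post, please point them out.

In this post I tried to stick to pseudocode, except where the pseudocode is hard to follow. This is a theory series, and you have to implement the algorithms yourself after reading or the reading is wasted. If you cannot implement something from the pseudocode, you can find code in various languages online.

Finally, I recommend another author's algorithm column: 白话经典算法. It does not cover much ground, but it has many original insights and is easier to follow than my writing. I will work on improving.

References:

1. Introduction to Algorithms

2. Sorting Algorithm (with links to each of the individual sorting algorithms)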