
Improving Merge Sort Performance by Changing the Memory Allocation Strategy

2015-03-30 18:48

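Merge sort is a very robust sorting algorithm: regardless of the input sequence, both its expected and its worst-case time complexity are Θ(n log n), which matches the asymptotic lower bound for comparison-based sorting.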

For that reason, merge sort is often used on sequences that might cause quicksort to degrade toward its Θ(n²) worst case.
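As a brief reminder of where the Θ(n log n) bound comes from (this is the standard textbook argument, not something specific to this post): each call splits the input into two halves, sorts them recursively, and merges them in linear time. The recursion tree has about log₂ n levels, and each level does Θ(n) work in total.

```latex
T(n) = 2\,T\!\left(\frac{n}{2}\right) + \Theta(n), \qquad T(1) = \Theta(1)
\quad\Longrightarrow\quad
T(n) = \Theta(n \log n)
```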

Merge sort is a textbook divide-and-conquer algorithm. A very common implementation looks like this:

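[code]// Baseline version: Merge allocates and frees a scratch buffer on every call.
void Merge(int a[], const int lowFirst, const int highFirst, const int highLast);

void MergeSort(int a[], const int low, const int high) {
    if (low < high) {
        // midpoint computed without risking overflow of low + high
        const int midIndex = low + (high - low) / 2;
        // split into two halves and sort each
        MergeSort(a, low, midIndex);
        MergeSort(a, midIndex + 1, high);
        // merge the two sorted halves
        Merge(a, low, midIndex + 1, high);
    }
}

// Merges the sorted runs a[lowFirst, highFirst) and a[highFirst, highLast].
void Merge(int a[], const int lowFirst, const int highFirst, const int highLast) {
    int l = lowFirst;
    int r = highFirst;
    const int len = highLast - lowFirst + 1;

    // new[] throws std::bad_alloc on failure, so no null check is needed
    int* pBuffer = new int[len];
    int* p = pBuffer;

    // taking from the left run on ties keeps the sort stable
    while (l < highFirst && r <= highLast) {
        *p++ = (a[l] <= a[r]) ? a[l++] : a[r++];
    }

    // copy whatever remains of either run
    while (l < highFirst) {
        *p++ = a[l++];
    }
    while (r <= highLast) {
        *p++ = a[r++];
    }

    // copy the merged result back over the original range
    p = pBuffer;
    for (int i = lowFirst; i <= highLast; ++i) {
        a[i] = *p++;
    }

    delete [] pBuffer;
}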

In practice, however, merge sort often takes longer than expected; on ordinary sequences it can be far slower than quicksort.

The cause lies in this implementation's memory strategy: it keeps allocating memory with new and releasing it with delete.

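Merge sort is not an in-place sort and needs extra storage. In the version above, every merge dynamically allocates a block of memory to hold the merged result of the two sorted runs, and once the merge finishes, the data in that block has to be copied back over the original sequence.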

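That final copy-back step follows from how this merge sort works and is treated here as unavoidable, so the only thing left to target is the allocation inside Merge.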

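A short analysis shows that sorting a sequence of length n takes n − 1 merge operations, spread over about log₂ n levels of recursion. That means n − 1 memory allocations and n − 1 deallocations, and heap operations are comparatively expensive.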
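To make the number of merges concrete, here is a minimal sketch that mirrors the recursion of a top-down merge sort and simply counts the merge steps. `countMerges` is a hypothetical helper introduced only for this illustration; it is not part of the original code.

```cpp
#include <cstddef>

// Hypothetical helper: counts how many merge steps (and therefore how many
// new[]/delete[] pairs in the allocate-per-merge version) a top-down merge
// sort performs on the inclusive index range [low, high].
std::size_t countMerges(int low, int high) {
    if (low >= high) {
        return 0;  // ranges of length 0 or 1 are never merged
    }
    const int mid = low + (high - low) / 2;
    // one merge for this range, plus the merges done inside each half
    return countMerges(low, mid) + countMerges(mid + 1, high) + 1;
}
```

For any n ≥ 1, `countMerges(0, n - 1)` returns n − 1, so an input of 100000 elements triggers 99999 allocations in the allocate-per-merge version.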

If a buffer of length n is allocated once up front, nearly all of those allocations disappear, which can improve performance considerably.

Because merge sort is recursive, this requires wrapping the original function in a driver function:

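[code]// Note: identifiers with a leading underscore are reserved for the
// implementation in C++ (always when followed by an uppercase letter, and at
// global scope otherwise); rename them in production code.
void _MSort(int a[], int tmpBuffer[], const int left, const int right);
void _Merge(int a[], int tmpBuffer[], const int lBegin, const int rBegin, const int rEnd);

// driver function
void _mergeSort(int a[], const int count) {
    // allocate the scratch buffer only once; new[] throws std::bad_alloc on failure
    int* pTmpBuf = new int[count];
    _MSort(a, pTmpBuf, 0, count - 1);
    delete [] pTmpBuf;
}

// divide the sequence recursively
void _MSort(int a[], int tmpBuffer[]/*extra space*/, const int left, const int right) {
    if (left < right) {
        // midpoint computed without risking overflow of left + right
        const int midIdx = left + (right - left) / 2;
        _MSort(a, tmpBuffer, left, midIdx);
        _MSort(a, tmpBuffer, midIdx + 1, right);

        _Merge(a, tmpBuffer, left, midIdx + 1, right);
    }
}

// merge the sorted runs a[lBegin, rBegin) and a[rBegin, rEnd]
void _Merge(int a[], int tmpBuffer[], const int lBegin, const int rBegin, const int rEnd) {
    int l = lBegin;
    int r = rBegin;
    int bufPos = lBegin;

    // <= takes from the left run on ties, which keeps the sort stable
    while (l < rBegin && r <= rEnd) {
        tmpBuffer[bufPos++] = (a[l] <= a[r]) ? a[l++] : a[r++];
    }

    // copy whatever remains of either run
    while (l < rBegin) {
        tmpBuffer[bufPos++] = a[l++];
    }

    while (r <= rEnd) {
        tmpBuffer[bufPos++] = a[r++];
    }

    // copy the merged range back into the original array
    for (bufPos = lBegin; bufPos <= rEnd; ++bufPos) {
        a[bufPos] = tmpBuffer[bufPos];
    }
}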

To check the improvement, I tested input sizes of 100, 1,000, 10,000, and 100,000, running the sort 100 times per test, which produced the following table and chart.

P.S. To reduce interference, all of the tests above were run in a Release build.

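In these tests the improved algorithm was roughly 30 to 50 times faster than the original. As for the row for size 1,000 in the table, my guess is that it is related to cache hits; after repeated runs it, too, came out about 30 to 40 times faster.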

The takeaway: for algorithms that allocate memory frequently, allocating once up front, or using lazy deletion to increase reuse, can improve performance substantially.
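One way to read the reuse idea is to defer releasing the scratch buffer so that later sorts can use it again. The sketch below is an illustration under that assumption, not code from the original post: `ReusableMergeSorter` and `scratchCapacity` are hypothetical names, and the buffer grows on demand and is only released when the sorter object is destroyed.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sorter that keeps its scratch buffer alive across calls.
class ReusableMergeSorter {
public:
    void sort(std::vector<int>& data) {
        if (data.size() < 2) return;  // nothing to do
        if (scratch_.size() < data.size()) {
            scratch_.resize(data.size());  // grows only; never shrinks
        }
        sortRange(data.data(), 0, data.size() - 1);
    }

    // Current size of the retained scratch buffer, in elements.
    std::size_t scratchCapacity() const { return scratch_.size(); }

private:
    void sortRange(int* a, std::size_t lo, std::size_t hi) {
        if (lo >= hi) return;
        const std::size_t mid = lo + (hi - lo) / 2;
        sortRange(a, lo, mid);
        sortRange(a, mid + 1, hi);
        merge(a, lo, mid, hi);
    }

    // Merges a[lo..mid] and a[mid+1..hi] through the retained buffer.
    void merge(int* a, std::size_t lo, std::size_t mid, std::size_t hi) {
        std::size_t l = lo, r = mid + 1, k = lo;
        while (l <= mid && r <= hi) scratch_[k++] = (a[l] <= a[r]) ? a[l++] : a[r++];
        while (l <= mid) scratch_[k++] = a[l++];
        while (r <= hi) scratch_[k++] = a[r++];
        for (k = lo; k <= hi; ++k) a[k] = scratch_[k];
    }

    std::vector<int> scratch_;  // released only when the sorter is destroyed
};
```

With this design, sorting many arrays through the same sorter object costs at most one allocation each time a larger input than any previous one arrives. The trade-off is that the memory stays reserved between calls.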
