
Stanford - Algorithms: Design and Analysis, Part 1 - Week 6 Assignment: hash table and heap

2015-04-16 06:54
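I haven't updated this blog in a long time. I never wrote up the assignments I did earlier, and my memory of them is already a bit fuzzy, but I still want to post them here for future reference.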

This assignment is one of the easier ones.

There are two problems in total. The first one reads as follows:


Question 1

Download the text file here. (Right click and save link as).

The goal of this problem is to implement a variant of the 2-SUM algorithm (covered in the Week 6 lecture on hash table applications).
The file contains 1 million integers, both positive and negative (there might be some repetitions!). This is your array of integers, with the $i$th row of the file specifying the $i$th entry of the array.

Your task is to compute the number of target values $t$ in the interval $[-10000, 10000]$ (inclusive) such that there are distinct numbers $x, y$ in the input file that satisfy $x + y = t$.
(NOTE: ensuring distinctness requires a one-line addition to the algorithm from lecture.)

Write your numeric answer (an integer between 0 and 20001) in the space provided.

OPTIONAL CHALLENGE: If this problem is too easy for you, try implementing your own hash table for it. For example, you could compare performance under the chaining and open addressing approaches to resolving collisions.

This problem is not hard, and I skipped the optional challenge: I simply used the C++ STL unordered_set as the hash table.

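First, a quick recap of the problem, which was covered in lecture. We want the "amazing solution" from the lecture slide: insert every element of the array A into a hash table H, then, for each x in A, look up t - x in H.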

The concrete steps are:

1. Insert every element of the given file into the hash table. Since it is a set, repeated values are stored only once:

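void store_file(string filename) {
	ifstream infile;
	infile.open(filename, ios::in);
	long long tmp;   // long is only 32 bits on some platforms; the values may not fit
	while (infile >> tmp) {
		hash_table.insert(tmp);   // a repeated value is simply ignored by the set
	}
	cout << "FINISH!" << endl;
	infile.close();
}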
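2. Loop over every value from -10000 to 10000 and call the two-sum function on it, to check whether two distinct elements sum to that value.

The two-sum function (two_sum) does exactly what the amazing solution says: for each x in A, look up t - x. The implementation: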

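bool two_sum(int target) {
	for (auto it = hash_table.begin(); it != hash_table.end(); ++it) {
		long long tmp = target - *it;   // the partner value t - x
		auto res = hash_table.find(tmp);
		// res != it is the one-line addition that enforces distinctness:
		// x and t - x must be different elements of the set
		if (res != hash_table.end() && res != it)
			return true;
	}
	return false;
}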
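To see why the res != it check matters, here is a small sketch (not part of the original solution) of the same loop with the set passed in as a parameter, so it can be tried on a hand-made input; the name has_distinct_pair is hypothetical. Because an unordered_set stores each value once, two iterators differ exactly when the values differ, so the check rejects using one element twice, such as 5 + 5 = 10.

```cpp
#include <cassert>
#include <unordered_set>

// Hypothetical helper: same logic as two_sum above, but the set is a
// parameter instead of a global, so it can be exercised on tiny inputs.
bool has_distinct_pair(const std::unordered_set<long long>& values, long long t) {
    for (auto it = values.begin(); it != values.end(); ++it) {
        auto res = values.find(t - *it);  // look up the partner value t - x
        // Distinctness: the partner must be a different element than x itself.
        // Set elements are unique, so a different iterator means a different value.
        if (res != values.end() && res != it)
            return true;
    }
    return false;
}

// Example checks on the set {2, 5}.
void demo_has_distinct_pair() {
    std::unordered_set<long long> s{2, 5};
    assert(has_distinct_pair(s, 7));    // 2 + 5
    assert(!has_distinct_pair(s, 10));  // would need 5 twice
    assert(!has_distinct_pair(s, 4));   // would need 2 twice
}
```

Without the res != it condition, targets 10 and 4 would both be counted for this set, which is exactly the mistake the problem's note warns about.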


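As I recall, my program was fairly slow, and I wasn't sure whether that was because the file is large or because of something in my approach. In hindsight the approach is the main cost: each of the 20001 targets may scan the whole set of up to one million values, which is up to about 2 × 10^10 hash lookups (a scan stops early only when a pair is found). Still, it does produce the correct answer. Here is the complete code: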

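#include <fstream>
#include <iostream>
#include <string>
#include <unordered_set>

using namespace std;

void store_file(const string&);
bool two_sum(int);
long get_count(void);

// long long rather than long: long is only 32 bits on some platforms,
// and the input values may not fit in 32 bits.
unordered_set<long long> hash_table;
const int LEFT = -10000;
const int RIGHT = 10000;

int main(int argc, char** argv) {
	store_file("1_final_test.txt");

	long cnt = get_count();

	cout << cnt << endl;

	return 0;
}

// Step 1: load every integer from the file into the hash set
// (repeated values are stored only once).
void store_file(const string& filename) {
	ifstream infile(filename);
	if (!infile) {
		cerr << "cannot open " << filename << endl;
		return;
	}
	long long tmp;
	while (infile >> tmp) {
		hash_table.insert(tmp);
	}
	cout << "FINISH!" << endl;
}

// Step 2: is there a pair of distinct values x, y in the set with x + y == target?
bool two_sum(int target) {
	for (auto it = hash_table.begin(); it != hash_table.end(); ++it) {
		long long tmp = target - *it;
		auto res = hash_table.find(tmp);
		// res != it rules out using the same element twice (x == y)
		if (res != hash_table.end() && res != it)
			return true;
	}
	return false;
}

// Count the targets in [LEFT, RIGHT] for which two_sum succeeds.
long get_count(void) {
	long cnt = 0;
	for (int i = LEFT; i <= RIGHT; ++i) {
		if (two_sum(i))
			++cnt;
		if (!(i % 1000))
			cout << "check: " << i << endl;   // progress output
	}
	return cnt;
}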
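A different technique can avoid rescanning the whole set for every target: sorting plus binary search instead of hashing (so this is not the lecture's hash-table method). After sorting the distinct values, each x only needs the partners y with -10000 ≤ x + y ≤ 10000, and std::lower_bound finds the first of them directly. A minimal sketch, assuming the values fit in long long; count_targets_sorted is a hypothetical name, and lo/hi play the role of LEFT/RIGHT:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Sorting-based alternative to the hash-table scan (a swapped-in technique):
// count targets t in [lo, hi] such that t = x + y for distinct values x, y.
long long count_targets_sorted(std::vector<long long> values, long long lo, long long hi) {
    assert(lo <= hi);
    std::sort(values.begin(), values.end());
    // Drop repeated values; "distinct numbers" means distinct values.
    values.erase(std::unique(values.begin(), values.end()), values.end());

    // hit[t - lo] becomes true once target t is found.
    std::vector<bool> hit(static_cast<std::size_t>(hi - lo + 1), false);
    for (std::size_t i = 0; i < values.size(); ++i) {
        const long long x = values[i];
        // Only partners y after x in sorted order (y > x), so each pair is
        // visited once and x != y holds automatically.
        auto it = std::lower_bound(values.begin() + i + 1, values.end(), lo - x);
        for (; it != values.end() && x + *it <= hi; ++it)
            hit[static_cast<std::size_t>(x + *it - lo)] = true;
    }
    return std::count(hit.begin(), hit.end(), true);
}
```

For example, count_targets_sorted({2, 5, 5, -3}, -10, 10) returns 3 (targets -1, 2 and 7; 10 would need 5 twice). If the numbers are spread over a range much wider than the 20001 targets, most windows are empty or nearly so and the O(n log n) sort dominates; in the worst case (many values packed into a narrow range) the windows can still be large.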


Next comes the second problem, which reads as follows:


Question 2

Download the text file here.

The goal of this problem is to implement the "Median Maintenance" algorithm (covered in the Week 5 lecture on heap applications). The text file contains a list of the integers from 1 to 10000 in unsorted order; you should treat this as a stream of numbers, arriving one by one. Letting $x_i$ denote the $i$th number of the file, the $k$th median $m_k$ is defined as the median of the numbers $x_1, \ldots, x_k$. (So, if $k$ is odd, then $m_k$ is the $((k+1)/2)$th smallest number among $x_1, \ldots, x_k$; if $k$ is even, then $m_k$ is the $(k/2)$th smallest number among $x_1, \ldots, x_k$.)

In the box below you should type the sum of these 10000 medians, modulo 10000 (i.e., only the last 4 digits). That is, you should compute $(m_1 + m_2 + m_3 + \cdots + m_{10000}) \bmod 10000$.

OPTIONAL EXERCISE: Compare the performance achieved by heap-based and search-tree-based implementations of the algorithm.
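Before writing the heap version, it can help to have a direct translation of this definition to test against. Both cases of the definition say the same thing: m_k is the ⌈k/2⌉-th smallest of x_1, …, x_k, i.e. the element at zero-based index (k - 1)/2 (integer division) after sorting the first k numbers. A brute-force sketch; median_sum_brute_force is a hypothetical name, and since it re-sorts every prefix it is only practical for small inputs:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Brute-force reference for the definition: m_k is the ceil(k/2)-th smallest
// of x_1..x_k, i.e. index (k - 1) / 2 of the sorted prefix. Returns the sum of
// all m_k modulo mod. Roughly O(n^2 log n): only for small test inputs.
int median_sum_brute_force(const std::vector<int>& xs, int mod) {
    assert(mod > 0);
    std::vector<int> prefix;
    long long sum = 0;
    for (int x : xs) {
        prefix.push_back(x);
        std::vector<int> sorted = prefix;
        std::sort(sorted.begin(), sorted.end());
        const std::size_t k = sorted.size();
        sum += sorted[(k - 1) / 2];  // m_k
    }
    return static_cast<int>(sum % mod);
}
```

For the stream 3, 1, 2, 4 the medians are 3, 1, 2, 2, so median_sum_brute_force({3, 1, 2, 4}, 10000) returns 8; running a heap-based version on small random permutations and comparing against this is a cheap sanity check.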

Again, I skipped the optional exercise and used only the heap-based approach.

The lecture explains this approach on a slide, but the slide may not be entirely clear on its own, so here is my understanding of it:

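0. Build two heaps, a max heap and a min heap. The max heap stores the elements less than or equal to the median, and the min heap stores the elements greater than or equal to the median, so the median can only be in one of two places: the root of the min heap or the root of the max heap. The code: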

	priority_queue<int, vector<int>, greater<int>> min_heap;
	priority_queue<int> max_heap;
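1.1. The first two elements are handled specially: the larger one goes into the min heap and the smaller one into the max heap. Their medians are added to the running sum right away: m_1 is the first element, and m_2 is the smaller of the two, which is now the max heap's root: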

	/* for the first two elements, add the smaller one to max_heap on the left
	 * and the bigger one to min_heap on the right */
	max_heap.push(min(input_stream[0], input_stream[1]));
	cnt += input_stream[0];    // m_1 = x_1
	min_heap.push(max(input_stream[0], input_stream[1]));
	cnt += max_heap.top();     // m_2 = the smaller of x_1 and x_2
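1.2. Each later element goes into the max heap if it is smaller than the max heap's root, and into the min heap otherwise.

After that, the heaps must be rebalanced so that their sizes never differ by more than 1; otherwise the median would no longer be at the root of either heap: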

		/* step 1: add the next item to one of the heaps: if it is smaller than
		 * the max_heap root, add it to max_heap, else add it to min_heap */
		if (input_stream[i] < max_heap.top())
			max_heap.push(input_stream[i]);
		else
			min_heap.push(input_stream[i]);

		/* step 2: balance the heaps (afterwards they are either the same size
		 * or one of them holds exactly one more item): if one heap has more
		 * than one element more than the other, move its root to the other */
		if (min_heap.size() > max_heap.size() + 1) {
			int tmp = min_heap.top();
			min_heap.pop();
			max_heap.push(tmp);
		}
		else if (max_heap.size() > min_heap.size() + 1) {
			int tmp = max_heap.top();
			max_heap.pop();
			min_heap.push(tmp);
		}
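2. Finally, choose which root, the min heap's or the max heap's, is the median. In the loop, i is the zero-based index of the newest element, so k = i + 1 numbers have been seen. When i is odd, k is even, the heaps have the same size, and the median (the (k/2)th smallest) is the max heap's root; when k is odd, it is the root of whichever heap has one more element: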

		/* k = i + 1 numbers seen so far. If k is even (i odd), the median is
		 * max_heap's root; otherwise it is the root of the heap with more elements */
		if (i % 2)
			cnt += max_heap.top();
		else {
			if (max_heap.size() > min_heap.size())
				cnt += max_heap.top();
			else
				cnt += min_heap.top();
		}
		cnt %= MOD;    // only the last 4 digits are needed
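One way to simplify this final step (a variant, not the code used in this post): always keep ⌈k/2⌉ elements in the max heap, i.e. let it be either the same size as the min heap or exactly one larger. Then the median as the problem defines it is always the max heap's root, and no parity check is needed. A minimal sketch wrapped in a hypothetical MedianMaintainer class:

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <vector>

// Variant of the two-heap scheme: the max heap (lower half) always holds
// ceil(k/2) of the k numbers seen, so m_k, the ceil(k/2)-th smallest,
// is always its root.
class MedianMaintainer {
public:
    void add(int x) {
        // Step 1: route x to the correct half.
        if (lower_.empty() || x <= lower_.top())
            lower_.push(x);
        else
            upper_.push(x);
        // Step 2: restore size(lower_) == size(upper_) or size(upper_) + 1.
        if (lower_.size() > upper_.size() + 1) {
            upper_.push(lower_.top());
            lower_.pop();
        } else if (upper_.size() > lower_.size()) {
            lower_.push(upper_.top());
            upper_.pop();
        }
    }

    int median() const {
        assert(!lower_.empty() && "median() needs at least one number");
        return lower_.top();
    }

private:
    std::priority_queue<int> lower_;  // max heap: smaller half
    std::priority_queue<int, std::vector<int>, std::greater<int>> upper_;  // min heap: larger half
};
```

Feeding 3, 1, 2, 4, 5 one at a time gives medians 3, 1, 2, 2, 3. The trade-off is one extra rule in the rebalancing step (the min heap may never be the larger one) in exchange for a median lookup that is always a single top() call.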


The complete code:
#include <algorithm>
#include <fstream>
#include <functional>
#include <iostream>
#include <queue>
#include <string>
#include <vector>

using namespace std;

void store_file(const string&); /* prototypes */
int median_maintain(void);

const int MOD = 10000;
vector<int> input_stream;

int main(int argc, char** argv) {
	store_file("Median.txt");
	if (input_stream.size() < 2) {   // median_maintain reads the first two items unconditionally
		cerr << "need at least two numbers in Median.txt" << endl;
		return 1;
	}
	int sum = median_maintain();
	cout << "result: " << sum << endl;
	return 0;
}

void store_file(const string& filename) {
	ifstream infile(filename);
	int tmp;
	while (infile >> tmp)
		input_stream.push_back(tmp);
}

int median_maintain(void) {
	priority_queue<int, vector<int>, greater<int>> min_heap;  // larger half
	priority_queue<int> max_heap;                             // smaller half
	int cnt = 0;

	/* for the first two elements, add the smaller one to max_heap on the left
	 * and the bigger one to min_heap on the right */
	max_heap.push(min(input_stream[0], input_stream[1]));
	cnt += input_stream[0];    // m_1 = x_1
	min_heap.push(max(input_stream[0], input_stream[1]));
	cnt += max_heap.top();     // m_2 = the smaller of x_1 and x_2

	for (size_t i = 2; i < input_stream.size(); ++i) {
		/* step 1: if the next item is smaller than the max_heap root,
		 * add it to max_heap, else add it to min_heap */
		if (input_stream[i] < max_heap.top())
			max_heap.push(input_stream[i]);
		else
			min_heap.push(input_stream[i]);

		/* step 2: rebalance so the sizes differ by at most one */
		if (min_heap.size() > max_heap.size() + 1) {
			max_heap.push(min_heap.top());
			min_heap.pop();
		}
		else if (max_heap.size() > min_heap.size() + 1) {
			min_heap.push(max_heap.top());
			max_heap.pop();
		}

		/* k = i + 1 numbers seen. If k is even (i odd), the median is
		 * max_heap's root; otherwise it is the root of the larger heap */
		if (i % 2)
			cnt += max_heap.top();
		else if (max_heap.size() > min_heap.size())
			cnt += max_heap.top();
		else
			cnt += min_heap.top();
		cnt %= MOD;
	}
	return cnt;
}