Cracking the Coding Interview: Chapter 10 (Scalability and Memory Limits), Problem 6

2014-04-24 22:04
2014-04-24 22:01

Problem: You have 10 billion URLs. How would you detect whether any of them are duplicates?

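Solution: hash each URL to compute a signature, then store the signatures in a K-V (key-value) database and use it to check for duplicates.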
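A back-of-envelope sizing of this approach, assuming n = 10^{10} (10 billion) URLs, 64-bit (8-byte) signatures, and an average URL length of about 100 bytes (the last figure is an assumption, not from the post). The second line is the birthday-bound estimate of how many pairs of distinct URLs share a signature:

$$
10^{10} \times 8\,\text{B} = 8 \times 10^{10}\,\text{B} \approx 80\,\text{GB}
\qquad\text{vs.}\qquad
10^{10} \times 100\,\text{B} = 10^{12}\,\text{B} \approx 1\,\text{TB}
$$

$$
\mathbb{E}[\text{colliding pairs}] \approx \binom{n}{2}\,2^{-64} \approx \frac{n^2}{2^{65}} = \frac{10^{20}}{3.69 \times 10^{19}} \approx 2.7
$$

So signatures shrink the data roughly 12-fold, but at this scale a few distinct URLs are expected to share a 64-bit signature. When false duplicates matter, a signature match should be confirmed by comparing the full URLs, or a 128-bit signature used instead.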

Code:

// 10.6 You have 10 billion URLs. How would you detect the duplicates among them?
// Answer:
//    1. Use a hash function to convert each URL string into a fixed-size checksum (its "signature").
//    2. Use this signature as the hash key. If memory allows, use an in-memory hash table to detect duplicates.
//    3. If the signatures won't fit in memory, use a K-V database instead. At 8 bytes per signature,
//       10 billion URLs need about 80 GB, which one machine can store, so there is no need for a second machine.
//    Different URLs can share a signature, so confirm a match by comparing the full URLs.
int main()
{
    return 0;
}