
Designing a Data Structure to Store Browser History

2013-05-19 18:08

Design a data structure for storing browsing history.

This is a complex question that needs a detailed analysis of the requirements and typical use cases. Only then can we properly weigh the different possible solutions. Below are just some simple thoughts off the top of my head:

A simple idea is to keep two data structures. The first is a queue built on a doubly-linked list, ordered by the time of each page access. Each node holds the timestamp, the URL, and a pointer to the node in the queue representing the next older access of that same URL. The queue is capped to both a maximum size and a maximum time range, so entries can be deleted from the back of the queue as needed.
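The queue described above might be sketched roughly as follows. This is only an illustration under some assumptions: the names `Visit`, `HistoryQueue`, `max_entries` and `max_age_seconds` are made up here, timestamps are assumed to arrive in non-decreasing order, a plain dict stands in for the URL index, and the `next_same_url` back-link is an addition of mine so that evicting the oldest node does not leave a dangling pointer.

```python
class Visit:
    """One page access; a node in the doubly-linked history queue."""

    def __init__(self, timestamp, url):
        self.timestamp = timestamp
        self.url = url
        self.newer = None           # neighbour toward the front of the queue
        self.older = None           # neighbour toward the back of the queue
        self.prev_same_url = None   # next older visit to the same URL
        self.next_same_url = None   # assumed back-link, used only on eviction


class HistoryQueue:
    def __init__(self, max_entries, max_age_seconds):
        self.max_entries = max_entries
        self.max_age_seconds = max_age_seconds
        self.head = None    # most recent visit
        self.tail = None    # oldest visit
        self.size = 0
        self.latest = {}    # url -> most recent Visit (stand-in for the index)

    def record(self, url, timestamp):
        """Push a new visit onto the front, then trim the back."""
        node = Visit(timestamp, url)
        previous = self.latest.get(url)
        if previous is not None:
            node.prev_same_url = previous
            previous.next_same_url = node
        node.older = self.head
        if self.head is not None:
            self.head.newer = node
        else:
            self.tail = node
        self.head = node
        self.latest[url] = node
        self.size += 1
        self._evict(now=timestamp)
        return node

    def _evict(self, now):
        """Drop entries from the back while the queue is too big or too old."""
        while self.tail is not None and (
            self.size > self.max_entries
            or now - self.tail.timestamp > self.max_age_seconds
        ):
            old = self.tail
            self.tail = old.newer
            if self.tail is not None:
                self.tail.older = None
            else:
                self.head = None
            if old.next_same_url is not None:
                # A newer visit to this URL survives; cut its link to us.
                old.next_same_url.prev_same_url = None
            else:
                # This was the only remaining visit to the URL.
                del self.latest[old.url]
            self.size -= 1
```

Because eviction always happens at the back, the evicted node is necessarily the oldest visit in its per-URL chain, which is why clearing a single pointer is enough.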

Second, I would have a TreeSet (strictly speaking a sorted map, such as Java's TreeMap, since it maps keys to values) sorted alphabetically by URL domain name first and then by page URL within that domain name. It maps each URL to a pointer to the node in the doubly-linked list that represents the most recent access to that page.
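As a rough illustration of that sorted index: Python's standard library has no tree-based sorted map, so the sketch below swaps in a sorted list of keys maintained with `bisect` plus a dict. That keeps lookups and range scans cheap but makes inserts O(n), where a balanced tree would give O(log n). The names `UrlIndex`, `sort_key` and `urls_in_domain` are hypothetical, and the "node" values can be any object.

```python
import bisect
from urllib.parse import urlsplit


def sort_key(url):
    """Order by domain name first, then by the full URL within that domain."""
    return (urlsplit(url).netloc.lower(), url)


class UrlIndex:
    def __init__(self):
        self._keys = []     # sorted list of (domain, url) keys
        self._latest = {}   # (domain, url) -> node of the most recent visit

    def set_latest(self, url, node):
        key = sort_key(url)
        if key not in self._latest:
            bisect.insort(self._keys, key)
        self._latest[key] = node

    def latest(self, url):
        return self._latest.get(sort_key(url))

    def remove(self, url):
        key = sort_key(url)
        if key in self._latest:
            del self._latest[key]
            del self._keys[bisect.bisect_left(self._keys, key)]

    def urls(self):
        """All known URLs in (domain, url) order."""
        return [url for _, url in self._keys]

    def urls_in_domain(self, domain):
        """Range scan over the keys belonging to one domain."""
        domain = domain.lower()
        lo = bisect.bisect_left(self._keys, (domain, ""))
        hi = bisect.bisect_left(self._keys, (domain + "\0", ""))
        return [url for _, url in self._keys[lo:hi]]
```

Sorting on a `(domain, url)` tuple is what puts all pages of one site next to each other, so listing a single domain becomes a contiguous slice.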

That way, if someone wanted to see every time they accessed a specific page, it could be done very quickly. They'd be presented with URLs in alphabetical order and select the right one; a lookup would return the record of the most recent access, that record has a pointer to the next older access of the same page, and so on down the chain. If instead they wanted to see their *entire* history of visited sites, they could view the whole queue, which is a log of everything they've visited in chronological order.
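The two lookups just described come down to two pointer walks. A minimal sketch, assuming a hypothetical `Visit` node with `older` (next entry toward the back of the queue) and `prev_same_url` (next older access of the same page) fields:

```python
class Visit:
    def __init__(self, timestamp, url, older=None, prev_same_url=None):
        self.timestamp = timestamp
        self.url = url
        self.older = older                   # next entry toward the back of the queue
        self.prev_same_url = prev_same_url   # next older visit to the same URL


def visits_to(latest_node):
    """Yield timestamps of every access to one page, newest first."""
    node = latest_node
    while node is not None:
        yield node.timestamp
        node = node.prev_same_url


def full_history(head):
    """Yield (timestamp, url) for the whole queue, newest first."""
    node = head
    while node is not None:
        yield (node.timestamp, node.url)
        node = node.older
```

The per-page walk touches only the visits to that page, so its cost depends on how often the page was visited rather than on the length of the whole history.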

Additional data structures may be desirable. For instance, we might store a pointer to the node that's a day old, the node that's two days old, and so on, so that a given time slice of the history can be presented with the fewest node traversals. Of course these pointers would have to be updated too, but doing that as background work is much better than making the user wait while a query walks the list.
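One possible shape for those day markers, purely as a sketch: `refresh_markers` is the background job that walks the list once and remembers, for each whole number of days, the newest node at least that old, and `visits_in_day` then starts a time-slice query at the marker instead of at the head. The function names, the `DAY` constant, and the use of timestamps in seconds are all assumptions of this example.

```python
DAY = 24 * 60 * 60  # seconds


class Visit:
    def __init__(self, timestamp, url, older=None):
        self.timestamp = timestamp
        self.url = url
        self.older = older  # next entry toward the back of the queue


def refresh_markers(head, now, max_days):
    """Map k -> newest node that is at least k days old (single pass)."""
    markers = {}
    node = head
    for days in range(1, max_days + 1):
        cutoff = now - days * DAY
        while node is not None and node.timestamp > cutoff:
            node = node.older
        if node is None:
            break  # nothing in the history is this old
        markers[days] = node
    return markers


def visits_in_day(markers, now, days):
    """URLs visited between `days` and `days + 1` days ago, newest first."""
    node = markers.get(days)
    cutoff = now - (days + 1) * DAY
    urls = []
    while node is not None and node.timestamp > cutoff:
        urls.append(node.url)
        node = node.older
    return urls
```

Since the list is already ordered newest to oldest, one pass is enough to refresh every marker, which is what makes this cheap to run in the background.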

That's just if it's an in-memory structure. Another thought is that this data structure needs to be persisted. So maybe I'd rather use something more like a database (but local, without the remote-server aspect of technologies like SQL Server). If you used an indexed database table, you'd want at least an index on the date and an index on the URL.
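As one way to picture the local-database variant, here is a small sketch using SQLite through Python's standard `sqlite3` module. SQLite is my choice for the example, not something the text above prescribes, and the table name, column names and helper functions are hypothetical.

```python
import sqlite3


def open_history(path=":memory:"):
    """Open (or create) a local history database with the two indexes."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS visits ("
        " id INTEGER PRIMARY KEY,"
        " url TEXT NOT NULL,"
        " visited_at REAL NOT NULL)"
    )
    conn.execute("CREATE INDEX IF NOT EXISTS idx_visits_date ON visits (visited_at)")
    conn.execute("CREATE INDEX IF NOT EXISTS idx_visits_url ON visits (url)")
    return conn


def record(conn, url, timestamp):
    conn.execute("INSERT INTO visits (url, visited_at) VALUES (?, ?)", (url, timestamp))
    conn.commit()


def visits_to(conn, url):
    """Timestamps of every access to one page, newest first."""
    rows = conn.execute(
        "SELECT visited_at FROM visits WHERE url = ? ORDER BY visited_at DESC", (url,)
    )
    return [row[0] for row in rows]


def recent(conn, limit):
    """The most recent visits across all pages."""
    return conn.execute(
        "SELECT url, visited_at FROM visits ORDER BY visited_at DESC LIMIT ?", (limit,)
    ).fetchall()


def prune(conn, older_than):
    """Drop entries older than a cutoff, like trimming the back of the queue."""
    conn.execute("DELETE FROM visits WHERE visited_at < ?", (older_than,))
    conn.commit()
```

The date index serves the "whole history" and pruning queries, while the URL index serves the per-page lookup, mirroring the roles of the queue and the sorted index in the in-memory design.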