
A Simple Implementation of a Query Suggestion System

2015-11-18 08:30
Source of the problem:

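While browsing 牛客网 I came across this question. It is the last programming question in Baidu's 2016 R&D engineer written test (set 5). The test has 12 questions to be finished within one hour; I got only 5 right, for 25 points. The code below was written offline, on and off over the better part of a day (in between, the China national football team drew 0:0 away against Hong Kong, which finally killed the suspense).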

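The original problem:

Design a query suggestion system

Query suggestion is a technique widely used in modern search engines: when the user types the prefix of a query, the engine offers a list of related queries. For example, typing "中国" in the search box suggests "中国好声音", "中国银行", "中国联通" and so on. Try to design a query suggestion system and answer the following questions:

1. Given a set of queries, what data structures and algorithms would you use to build the most basic suggestion system? Input in both Chinese and pinyin must work.

2. There may be many candidate queries under the prefix the user typed. How do you rank them so that the queries the user is more likely to choose come first?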

My approach:

Use a header item for each search prefix; it points to a doubly linked list of the search keys that share that prefix.

The class SearchKeySystem provides interfaces for adding a prefix, getting the tips for a prefix, simulating a search operation, and dumping the important data.

The basic functions work, and more remains to be done (marked as "TBD" in the comments).

The most difficult part is designing and implementing the maintenance that keeps the doubly linked list sorted at all times.
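
To make the sorting step concrete, here is a minimal sketch of the same idea on top of std::list, which already supplies the next/previous links. It is not the code from this post: the names KeyCount, recordSearch and tips are made up for illustration, and it moves the updated node with splice instead of swapping two nodes by hand.

```cpp
#include <iterator>
#include <list>
#include <string>

// Hypothetical stand-in for SearchKey; std::list provides the Next/Previous links.
struct KeyCount {
    std::string key;
    unsigned long frequency;
};

// Record one search for `key`, keeping `keys` sorted by frequency, highest first.
void recordSearch(std::list<KeyCount>& keys, const std::string& key) {
    std::list<KeyCount>::iterator it = keys.begin();
    while (it != keys.end() && it->key != key) ++it;
    if (it == keys.end()) {
        // New key: frequency 1 is the minimum, so appending keeps the order.
        keys.push_back(KeyCount{key, 1});
        return;
    }
    ++it->frequency;
    // Walk left past every node whose frequency is now smaller.
    std::list<KeyCount>::iterator pos = it;
    while (pos != keys.begin() && std::prev(pos)->frequency < it->frequency) --pos;
    // Relink the node in front of `pos`; no element is copied.
    if (pos != it) keys.splice(pos, keys, it);
}

// Build the ';'-separated tip string in list order.
std::string tips(const std::list<KeyCount>& keys) {
    std::string out;
    for (const KeyCount& k : keys) {
        if (!out.empty()) out += ";";
        out += k.key;
    }
    return out;
}
```

Because a frequency only ever grows by one, the updated node can only move towards the front, so a single leftward walk is enough to restore the order.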

Code for the Search Key system:

#pragma once
//Thomas Tang 2015-11-18 @cd

/*
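Query suggestion is a technique widely used in modern search engines: when the user types the prefix of a query,
the engine offers a list of related queries. For example, typing "中国" in the search box suggests "中国好声音",
"中国银行", "中国联通" and so on. Try to design a query suggestion system and answer the following questions:
1. Given a set of queries, what data structures and algorithms would you use to build the most basic suggestion system? Input in both Chinese and pinyin must work.
2. There may be many candidate queries under the prefix the user typed. How do you rank them so that the queries the user is more likely to choose come first?

Summary: use a header item for each search prefix; it points to a doubly linked list of the search keys that share that prefix
class SearchKeySystem provides interfaces for adding a prefix, getting the tips for a prefix, simulating a search operation, and dumping the important data
The basic functions work, and more remains to be done (marked as "TBD" in comments)
The most difficult part is designing and implementing the maintenance that keeps the doubly linked list sorted at all times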
*/

#include <iostream>
#include <string>
#include <vector>

//TBD: placeholder that returns the key unchanged, so PinYin is not supported yet
//a real version may be case sensitive or case insensitive, or may capitalize only the first letter
std::string GetPinYin(const std::string &key)
{
return key;
}

//Data for the search prefex(work as the Header item)
struct SearchPrefex
{
std::string Prefex; //prefex for search, by which you get the tips
std::string PrefexPinYin; //case insensitive
std::string SearchKeys; //a string of all the search keys separated by ; and it's always sorted by frequency and it's always updated
struct SearchKey* FirstSearchKey; //points to its first <SearchKey> item
SearchPrefex(const std::string &prefex)
{
Prefex = prefex;
PrefexPinYin = GetPinYin(prefex);
SearchKeys="";
FirstSearchKey = NULL;
}
};

//Data for the search keys
struct SearchKey
{
std::string Key;  //keys used for actual search
std::string KeyPinYin; //case insensitive
unsigned long Frenquncy; //how many times it is used for actual search
struct SearchKey* Next; //next SearchKey with the same prefex
struct SearchKey* Previous; //previous SearchKey with the same prefex, NULL if it's the first item for the prefex
SearchKey(const std::string &key)
{
Key = key;
KeyPinYin = GetPinYin(key);
Frenquncy = 1;
Next = NULL;
Previous = NULL;
}
/*bool operator>(const SearchKey &anotherkey)
{
if(Frenquncy >anotherkey.Frenquncy)
return true;
else
return false;
}*/

};

class SearchKeySystem
{

//maintain the search prefex and keys in multiple linked lists,
//each linked list has a header item (of type SearchPrefex and it has the tip string <SearchKeys>) and multiple SearchKey items sorted by the Frequency
//whenever the SearchKey items are updated (as Frequency changed or new items added), the <SearchKeys> in the header item get changed
//TBD: This vector should always be sorted by the length of the search prefex from the longest to the shortest
std::vector<SearchPrefex*> AllSearchKeys;

void UpdateSearchKeyHelper(SearchPrefex *prefexItem, const std::string &key)
{
std::cout << "Helper: before update the prefex is:  "<< prefexItem->Prefex <<". The keys are" <<prefexItem->SearchKeys << std::endl;
SearchKey *firstkeyitem = prefexItem->FirstSearchKey;
//Header only
if(firstkeyitem==NULL)
{
prefexItem->FirstSearchKey = new SearchKey(key);
prefexItem->SearchKeys = key;
std::cout << "Helper: after update the prefex is:  "<< prefexItem->Prefex <<". The keys are" <<prefexItem->SearchKeys << std::endl;
}
else //has other SearchKey items
{
SearchKey *item = firstkeyitem;
while(item->Key != key && item->Next!= NULL)
item = item->Next;

//search for SearchKey item that equals to <key>
if(item->Key == key) //found the SearchKey item
{
item->Frenquncy++;
//sort it by checking with previous items
SearchKey *updatedItem = item;

//if it is the first key item, no more changes needed, else need to do more checking
//in fact, this check is not necessary, as the following while cover it
/*if(updatedItem != firstkeyitem)
{*/
while(item->Previous!=NULL && item->Previous->Frenquncy < updatedItem->Frenquncy)
item = item->Previous;
if(item!=updatedItem)//found a smaller Frequency one, need to switch the two,
{
SearchKey *Previous_of_updateditem = updatedItem->Previous;
SearchKey *Next_of_updateditem = updatedItem->Next;

SearchKey *Previous_of_item = item->Previous;
SearchKey *Next_of_item = item->Next;
if(Next_of_item == updatedItem) //two adjacent items to be switched (compare with ==, not assign with =)
{
updatedItem->Previous = Previous_of_item; //could be null
if(Previous_of_item==NULL)
prefexItem->FirstSearchKey = updatedItem;
else
Previous_of_item->Next = updatedItem;
updatedItem->Next = item;

item->Previous = updatedItem;
item->Next = Next_of_updateditem;
if(Next_of_updateditem!=NULL) Next_of_updateditem->Previous = item;
}
else
{

updatedItem->Previous = Previous_of_item;
if(Previous_of_item==NULL)
prefexItem->FirstSearchKey = updatedItem;
else
Previous_of_item->Next = updatedItem;
updatedItem->Next = Next_of_item;
if(Next_of_item!=NULL) Next_of_item->Previous = updatedItem;

Previous_of_updateditem->Next = item;
item->Previous = Previous_of_updateditem;
item->Next = Next_of_updateditem;
if(Next_of_updateditem!=NULL) Next_of_updateditem->Previous = item;
}

//update the Tip string for the new sort

SearchKey *keyItem = prefexItem->FirstSearchKey;
std::string tempstr = "";
while(keyItem != NULL)
{
tempstr = (tempstr==""? keyItem->Key: tempstr+";"+keyItem->Key);
keyItem = keyItem->Next;
}
prefexItem->SearchKeys = tempstr;
std::cout << "Helper: after update the prefex is:  "<< prefexItem->Prefex <<". The keys are" <<prefexItem->SearchKeys << std::endl;
}

}
else if(item->Next == NULL) //not found the search key, append it to the end
{
item->Next = new SearchKey(key);
item->Next->Previous = item;
//no need to sort it, also need to update the SearchKeys string
prefexItem->SearchKeys = prefexItem->SearchKeys + ";" +key;
std::cout << "Helper: after update the prefex is:  "<< prefexItem->Prefex <<". The keys are" <<prefexItem->SearchKeys << std::endl;
}
}
}

public:
SearchKeySystem()
{}

~SearchKeySystem()
{
std::vector<SearchPrefex*>::iterator it= AllSearchKeys.begin();
for(; it!=AllSearchKeys.end(); it++)
{
SearchKey *keyItem = (*it)->FirstSearchKey;
SearchKey * tempkeyItem = NULL;
delete *it;
while(keyItem != NULL)
{
tempkeyItem = keyItem->Next;
delete keyItem;
keyItem = tempkeyItem;
}
}
}

//The function dumps all the important data in the search system
void DumpAllTheData()
{
std::vector<SearchPrefex*>::iterator it= AllSearchKeys.begin();
for(; it!=AllSearchKeys.end(); it++)
{
std::cout << (*it)->Prefex << "---" << (*it)->SearchKeys << std::endl;
}
}

//This function simulates one actual search operation of <key>
//It may trigger changes in the search system (to update its frequency, or to update the data in SearchPrefex)
void UpdateSearchKey(const std::string &key)
{
std::cout << "did a search operation for "<< key << std::endl;
std::vector<SearchPrefex*>::iterator it= AllSearchKeys.begin();
for(; it!=AllSearchKeys.end(); it++)
{
const std::string &prefex = (*it)->Prefex;
//a match means the whole prefex is a leading substring of the key,
//so a prefex longer than the key can never match
if(prefex.size() <= key.size() && key.compare(0, prefex.size(), prefex) == 0)
break;
}
//match one prefex or not match to any prefex(use it as a prefex and also a key)
if(it!=AllSearchKeys.end())
{
UpdateSearchKeyHelper(*it, key);
}
else
{
AllSearchKeys.push_back(new SearchPrefex(key));
UpdateSearchKeyHelper(AllSearchKeys[AllSearchKeys.size()-1], key);
std::cout << "added a new prefex for "<< key << std::endl;
}
}

//The Function adds a new search <prefex> into the system
//ignore it if already in the system, add it if it's a completely new one(without appending any SearchKey Items to it)
void AddSearchPrefex(const std::string &prefex)
{
std::vector<SearchPrefex*>::iterator it= AllSearchKeys.begin();
for(; it!=AllSearchKeys.end(); it++)
{
if((*it)->Prefex == prefex)
break;
}
//found it or not found it in the vector of prefex list
if(it==AllSearchKeys.end())//not found
{
AllSearchKeys.push_back(new SearchPrefex(prefex));
std::cout << "added a new prefex: "<< prefex << std::endl;
}
}

//The function accepts a search <prefex> (not actually hit ENTER to do the search),
//Returns the tips (an empty string if no tips are available, otherwise a ;-separated string of the available tips)
//<prefex> could be Chinese characters or PinYin (PinYin is not supported yet, but minor changes would add it)
//The tips are sorted by the frequency used in real search.
//Note: a <prefex> that is not yet in the system is added as a side effect
std::string GetTips(const std::string &prefex)
{
std::cout << "get tips for "<< prefex << std::endl;
std::vector<SearchPrefex*>::iterator it= AllSearchKeys.begin();
for(; it!=AllSearchKeys.end(); it++)
{
if((*it)->Prefex == prefex)
break;
}
//found it or not found it in the vector of prefex list
if(it!=AllSearchKeys.end())
{
return ((*it)->SearchKeys);
}
else
{
AllSearchKeys.push_back(new SearchPrefex(prefex));
std::cout << "added a new prefex: "<< prefex << std::endl;
return "";
}
}

};


Code for using the search key system:

#include <stdio.h>
#include <iostream>
#include "SearchKeys.h"

int main(int argc, char* argv[])
{
SearchKeySystem *pSearchKeySystem = new SearchKeySystem();

char ch = '0';

while(ch != '5')
{
std::cout << "1. Add new search prefex\n"
"2. Display tips for input search prefex\n"
"3. To simulate the search operation\n"
"4. Dump the collected data\n"
"5. Exit\n";
std::cin >> ch;
std::cin.get();

switch(ch)
{
case '1':
{
std::cout << "Input prefex you want to add (one per a line), use RET to return to upper menu\n";
std::string prefex="";
do
{
std::getline(std::cin, prefex);
if(prefex == "RET")
break;
else
{
pSearchKeySystem->AddSearchPrefex(prefex);
}
}
while(true);
}
break;
case '2':
{
std::cout << "Input prefex and its tips will be displayed, use RET to return to upper menu\n";
std::string prefex="";
do
{
std::getline(std::cin, prefex);
if(prefex == "RET")
break;
else
{
std::cout << pSearchKeySystem->GetTips(prefex) << std::endl;
}
}
while(true);

}
break;
case '3':
{
std::cout << "Input the search key, use RET to return to upper menu\n";
std::string key="";
do
{
std::getline(std::cin, key);
if(key == "RET")
break;
else
{
pSearchKeySystem->UpdateSearchKey(key);
}
}
while(true);

}
break;

case '4':
{
std::cout << "Here is all the data\n";
pSearchKeySystem->DumpAllTheData();
}
break;
case '5':
{
std::cout << "ByeBye!\n";
}
break;
default:
std::cout << "Wrong choice! Please try again!\n";
break;
}
}

delete pSearchKeySystem;
return 0;
}


More to be done:

The search prefix list is not sorted in the code

//This vector should always be sorted by the length of the search prefex from the longest to the shortest
std::vector<SearchPrefex*> AllSearchKeys;
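
One possible way to close this TBD is sketched below, under the assumption that the prefixes are plain std::string values rather than SearchPrefex pointers. The helper names addPrefixSorted and longestMatchingPrefix are hypothetical. Inserting at the position found by std::upper_bound keeps the vector ordered from longest to shortest, so the first prefix that matches a key is also the longest one.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Insert `prefix` so the vector stays ordered from longest to shortest.
// A prefix that is already present is ignored.
void addPrefixSorted(std::vector<std::string>& prefixes, const std::string& prefix) {
    if (std::find(prefixes.begin(), prefixes.end(), prefix) != prefixes.end()) return;
    // First position whose element is strictly shorter than the new prefix.
    std::vector<std::string>::iterator pos = std::upper_bound(
        prefixes.begin(), prefixes.end(), prefix,
        [](const std::string& a, const std::string& b) { return a.size() > b.size(); });
    prefixes.insert(pos, prefix);
}

// Return the longest stored prefix that `key` starts with, or "" if none does.
// The vector is ordered longest first, so the first hit is the longest match.
std::string longestMatchingPrefix(const std::vector<std::string>& prefixes,
                                  const std::string& key) {
    for (const std::string& p : prefixes) {
        if (p.size() <= key.size() && key.compare(0, p.size(), p) == 0) return p;
    }
    return "";
}
```

Lengths here are byte counts. For UTF-8 text that is still good enough for this purpose, because a longer matching byte prefix of the same key always covers at least as many characters.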


PinYin (拼音) is not supported right now

Is it possible to use an API to get the pinyin for each Chinese character, from 金山词霸 or some other source? Interesting! I will look into that later.
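
Until such a source is found, a table lookup can stand in for GetPinYin. The sketch below assumes the input is UTF-8 and uses a hypothetical two-entry table (中 and 国 only); the names utf8Length, pinyinTable and toPinyin are made up for illustration. It ignores tones and characters with more than one reading.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Byte length of the UTF-8 sequence that starts with `lead`.
// Invalid lead bytes count as length 1 so the caller always advances.
std::size_t utf8Length(unsigned char lead) {
    if ((lead & 0x80) == 0x00) return 1;  // 0xxxxxxx: ASCII
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx: most Chinese characters
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 1;
}

// Hypothetical two-entry table; a real one would be loaded from a dictionary.
const std::map<std::string, std::string>& pinyinTable() {
    static const std::map<std::string, std::string> table = {
        {"\xE4\xB8\xAD", "zhong"},  // U+4E2D
        {"\xE5\x9B\xBD", "guo"},    // U+56FD
    };
    return table;
}

// Replace every character found in the table with its pinyin and
// copy everything else (ASCII, unknown characters) through unchanged.
std::string toPinyin(const std::string& utf8) {
    std::string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        std::size_t len = utf8Length(static_cast<unsigned char>(utf8[i]));
        if (len > utf8.size() - i) len = utf8.size() - i;  // truncated tail
        const std::string ch = utf8.substr(i, len);
        std::map<std::string, std::string>::const_iterator hit = pinyinTable().find(ch);
        out += (hit != pinyinTable().end()) ? hit->second : ch;
        i += len;
    }
    return out;
}
```

With something like this in place, PrefexPinYin and KeyPinYin would hold real pinyin, and a pinyin prefix could be matched against them the same way a Chinese prefix is matched against Prefex and Key.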