Research on Architectures and Key Techniques of Gansu Science & Technology Documentation Sharing Pla

2014-01-19 18:25, 519 views

Abstract. The Gansu Science & Technology Documentation Sharing Platform consists of five systems: a full-text retrieval and web-publishing system, a unitive search system for heterogeneous digital resources, an original text delivery system, a user management and accounting system, and a statistical analysis system. This paper describes the application and technical architectures of the platform and discusses its key technologies: unitive search, Web 2.0, web services and data security. In operation, the platform has integrated 173 resource databases and delivers "one-stop" services. It has improved the degree of document resource integration, raised the service quality, management level and market competitiveness of documentary information organizations, and reduced both repeated investment in document resources and the duplicate development of databases with the same content.

Keywords: document sharing, architecture, unitive search, original text delivery, web service

1   Introduction

In July 2004, the General Office of the State Council forwarded the "2004-2010 National Science and Technology Infrastructure Construction Program", which set the goal of building a science and technology infrastructure that is rich in resources, rationally laid out, technologically advanced, fully functional and efficient in operation. Science & technology documentation sharing platforms have since been built at the national, provincial and municipal levels to better serve scientific innovation. Since 2005, Gansu province has relied on the Institute of Science & Technology Information of Gansu as the main undertaker for building the Gansu Science & Technology Documentation Sharing Platform (http://www.gsstd.cn). To date, nearly 2,000 individual users and 104 group users have registered. The construction models and key technologies used in building the platform are worth studying and summarizing, since they can inform future development and serve as a reference for it.

Reference [1] proposed and analyzed in detail four modes for current science & technology documentation sharing platforms: the resource-oriented mode, the integrating service mode, the technological application mode and the comprehensive mode. The Gansu Science & Technology Documentation Sharing Platform follows the comprehensive mode: its union catalog, reference service, original text delivery and related work reflect the integrating service mode, while its resource integration reflects the technological application mode. At present, the platform integrates the literature resources of seven major literature collection units, 173 resource databases in total. Document types include journal papers, standards, patents, dissertations, union catalogs, conference reports, agency products, local characteristic resources, and the literature library of the network of the Development Research Center of the State Council. Through a unified portal website, the platform offers free secondary-document search services to the public and provides paid primary-document services according to users' needs.

2   Platform’s Architecture

Software architecture directly determines the success or failure of software development [2]. Designing and implementing a sound and robust software architecture is therefore important to the construction of a science & technology documentation sharing platform. Building on the results of previous research and development, and according to the requirements of construction and maintenance, the platform's software architecture was designed from two aspects: application and technology.

2.1   Application architecture

The platform is composed of a WWW service and five systems, as shown in Fig. 1.

Fig. 1. Application Architecture

1) WWW service

The WWW service includes the resource directory and notice information. It provides entry points to system functions such as online registration, personal information query, accounting query and document delivery query, and it serves both registered and non-registered users. Non-registered users cannot download documents, but can search titles and abstracts.

2) full-text retrieval and web-publishing system

A documentation construction unit collects massive amounts of information that is used repeatedly. A certain proportion of this data, such as text, images, audio, video and compound documents, cannot be broken down into fields. Such data cannot be handled efficiently by a traditional RDBMS, which substantially reduces its real value. Meanwhile, even for structured information that a traditional RDBMS can handle efficiently, there are shortcomings such as slow processing, lack of uniformity and insufficient indexing, so the RDBMS alone cannot keep up with the rapid growth of information.

Full-text retrieval takes text data as its main processing object and provides advanced query methods based on the content of the material rather than on its external characteristics. The platform uses TRIP (http://trip.istic.ac.cn/html/tripchn/docs/trip.htm), a full-text retrieval system widely applied in domestic library and information systems.

3) heterogeneous digital resources unitive search system

The unitive search system lets a user submit a retrieval request through a single, user-friendly interface that accesses many web databases and search engines at once and returns more accurate and better ordered results. It aims at high precision and retrieval efficiency while maintaining high recall.

Users may search the various resource databases with a primary search or a secondary search. Searchable fields include title, author, full text, keyword, category number and so on. Query conditions are saved so that they can be reused directly in later queries. Retrieval results are displayed page by page, and users can bookmark results and browse abstract information. If a result has online full text, a registered user may download it directly and the system charges for it automatically. If the full text is only available offline, the user can obtain it through original text delivery.
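The field search, secondary search and paging described above can be sketched in a few lines of Python. This is only an illustration, not the platform's code: the `Record` type, the `search` and `page` helpers and the sample data are hypothetical, and "secondary search" is assumed here to mean a search within the results of an earlier search.

```python
from dataclasses import dataclass


@dataclass
class Record:
    database: str           # source database the record came from
    title: str
    author: str
    online_full_text: bool  # True if the full text can be downloaded directly


def search(records, field, term):
    """Return the records whose `field` contains `term`, ignoring case."""
    needle = term.lower()
    return [r for r in records if needle in getattr(r, field).lower()]


def page(results, page_number, page_size=10):
    """Return one page of results; pages are numbered from 1."""
    start = (page_number - 1) * page_size
    return results[start:start + page_size]


# Hypothetical sample data standing in for merged results from several databases.
records = [
    Record("journals", "Water resources in Gansu", "Wang", True),
    Record("patents", "Water pump design", "Li", False),
    Record("journals", "Soil erosion survey", "Wang", True),
]
primary = search(records, "title", "water")     # primary search on the title field
secondary = search(primary, "author", "wang")   # secondary search within those results
print([r.title for r in page(secondary, 1)])
```

Running the secondary search over the output of the primary one keeps each step simple and lets a saved query condition be replayed as the same sequence of calls.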

4) original text delivery system

This system processes the original text delivery requests that users submit from search results. Before processing a request, it authenticates the user's identity and checks the account balance. If both checks pass, the request is transmitted to the corresponding collection unit. The system adopts an advanced, end-user-oriented service pattern: once registered, readers submit delivery requests online by themselves and receive the full text by email. The entire process requires no third-party involvement. This way of obtaining literature is convenient and quick, and it suits readers' information acquisition habits, especially in an environment of rapid informatization.
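The request flow described above (verify identity, check the balance, then forward to the collection unit) can be sketched as follows. The `Account` type, the `submit_delivery_request` function, the status strings and the in-memory `queues` dictionary are all assumptions made for illustration; the paper does not say when the fee is actually deducted, so this sketch only checks the balance.

```python
from dataclasses import dataclass


@dataclass
class Account:
    user: str
    authenticated: bool  # result of the identity check
    balance: float       # prepaid balance on the account


def submit_delivery_request(account, fee, collection_unit, queues):
    """Check identity and balance, then forward the request to the unit's queue."""
    if not account.authenticated:
        return "rejected: identity not verified"
    if account.balance < fee:
        return "rejected: insufficient balance"
    # Forwarding is modeled as appending to a per-unit list (hypothetical).
    queues.setdefault(collection_unit, []).append(account.user)
    return "forwarded"


queues = {}
reader = Account(user="reader01", authenticated=True, balance=20.0)
print(submit_delivery_request(reader, fee=5.0, collection_unit="unit-A", queues=queues))
```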

5) user management and accounting system

User management includes services such as user registration, user administration, sending messages, sending email, batch processing of user validity periods, batch processing of users' prepaid fees, and online query of user information. Among these, sending messages or email to registered users helps with publicity, and the two batch-processing functions greatly raise the efficiency of the literature service. User management also provides user authentication [3] and permission checking, and it supports user authentication for the literature WWW service, the unitive search system and the original text delivery system.

The accounting system handles charges for full-text downloads and original text requests, and it can recharge and refund users' accounts.

6) statistical analysis system

Based on records of document retrieval, full-text downloads and original text orders, the system counts the usage of the various types of document resources and the geographical and age distribution of users, providing reliable data for decision analysis.

The statistical analysis of the original text delivery system provides modules such as "Costs Statistics", "User Statistics", "Sent Request Statistics" and "Received Request Statistics"; the last three are original in their design. "User Statistics" supports analysis and filtering along multiple dimensions: user education, job title, user type, total number of accounts, and number of accounts in use or not in use. The "Sent Request Statistics" and "Received Request Statistics" modules count the requests sent or received by each document collection unit and also give details of the handling results: satisfied, not satisfied and the specific reasons for not being satisfied, service time and fees. This makes it easy to see how each document service unit handles requests against its collections and how well it serves external users, to identify problems, and to give each unit a basis for work such as developing its collection resources, training readers and communicating with external libraries.
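As a rough illustration of the request statistics, the sketch below counts handling results per collection unit. The function name, the `(unit, outcome)` input format and the outcome labels are hypothetical; the real modules also track reasons, service time and fees, which are omitted here.

```python
from collections import Counter, defaultdict


def summarize_requests(requests):
    """Count handling outcomes for each collection unit.

    `requests` is an iterable of (unit, outcome) pairs, for example
    ("unit-A", "satisfied").
    """
    summary = defaultdict(Counter)
    for unit, outcome in requests:
        summary[unit][outcome] += 1
    # Convert to plain dictionaries so the result is easy to print or serialize.
    return {unit: dict(counts) for unit, counts in summary.items()}


log = [
    ("unit-A", "satisfied"),
    ("unit-A", "not satisfied"),
    ("unit-B", "satisfied"),
    ("unit-A", "satisfied"),
]
print(summarize_requests(log))
```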

2.2 Technical architecture

The platform is based on PHP and Web Services. A reasonable division into view, business logic and data layers makes the overall framework of the system more optimized. The technical architecture is divided into the portal layer (service channel layer), the application support layer, the application layer and the information resources layer, as shown in Fig. 2.

Fig. 2. Technical Architecture

The system architecture has four layers, with the client (which requests information), the business logic (which processes requests) and the data (which is operated on) physically isolated from one another. The client layer consists of the browser and a rich client: the browser displays the system interface, and the rich client handles Ajax requests and XML data. The service layer consists of web servers and the application servers of the literature gateway: the web server handles display logic, and the gateway application servers handle the business logic of document retrieval. Because the business logic sits in the middle layer, it does not need to care what type of client requested the data, and it remains relatively independent of the back-end systems, which makes the system easier to extend. The four-layer structure is also more portable, can work across different platforms, and allows user requests to be load-balanced across multiple servers. Security is easier to implement because the business logic is isolated from the client.

The four layers communicate as follows: the browser sends user requests to the web server through asynchronous Ajax calls; the web server and the literature gateway communicate using the SOAP protocol; and the gateway application server connects to literature resources such as databases through HTTP, Z39.50, ODBC or JDBC.

3   Key Techniques

3.1 Unitive search

Unitive search [4], also known as cross-database searching, must deal with two salient characteristics of its environment: heterogeneity and distribution. Once the heterogeneity of the database resources is shielded from the user, a practical solution for unified querying of document resources can be built through the rational use of distributed processing technology.

From a technical point of view, there are two unitive search models: joint search and integration search (http://www.chnlib.com/zylwj/shuzitsg/200605/221.html). Joint search is commonly implemented by simulating web access [5]: the search criteria entered in the unified search interface are automatically saved and sent to multiple digital resource systems, each of which runs its own search, and the results are shown in the same interface.

In implementations based on web simulation, the core technology is web information extraction. This paper proposes a new method that automatically extracts the useful information from different document sites: web information extraction based on sub-tree breadth [6]. The method first determines the number of titles shown per page on each scientific and technical literature web site and stores that number in the database. It then uses HTML Tidy (http://tidy.sourceforge.net) to clean the pages up into XML documents, generates a DOM tree, and computes the breadth of each sub-tree in the DOM tree. A sub-tree whose breadth equals the pre-stored number of titles per page is identified as the key information block [7-8], from which the information is then extracted. Experiments show that the method achieves high recall and precision [9].
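A minimal sketch of the sub-tree breadth idea is given below, using Python's standard `xml.etree.ElementTree` on a page that is assumed to have already been cleaned into well-formed XML (the HTML Tidy step is not shown). It assumes that the "breadth" of a sub-tree means the number of direct children of its root node; the definition in [6] may differ. The function names and the sample page are hypothetical.

```python
import xml.etree.ElementTree as ET


def find_key_blocks(xml_text, titles_per_page):
    """Return every element whose number of direct children equals titles_per_page.

    Assumption: sub-tree breadth = number of direct child elements.
    """
    root = ET.fromstring(xml_text)
    return [node for node in root.iter() if len(node) == titles_per_page]


def extract_titles(block):
    """Collect the text of each child of the key information block."""
    return ["".join(child.itertext()).strip() for child in block]


# Hypothetical result page, already cleaned into well-formed XML.
PAGE = """<html><body>
<div id="nav"><a>Home</a><a>Help</a></div>
<ul id="results">
  <li><a>Paper A</a></li>
  <li><a>Paper B</a></li>
  <li><a>Paper C</a></li>
</ul>
</body></html>"""

blocks = find_key_blocks(PAGE, titles_per_page=3)
print(extract_titles(blocks[0]))
```

With a small per-page count, several sub-trees may share the same breadth (in the sample page both `body` and the navigation `div` have two children), so a practical implementation would need an extra rule to choose among candidates.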

3.2 Web2.0

To improve the user experience, the system uses Ajax, tags and other Web 2.0 technologies to implement personalized services. Personalized service means that users can obtain information or services according to their own purposes and needs. For example, users can use tags to bookmark the document types and literature databases they search most often for later use. Users classify their documents with free-form labels that they define themselves. Ajax is used in the resource database display: when the user clicks a label, the corresponding list of resource databases is output without refreshing the entire page. In addition, "my database", "my favorites" and "my search history" are further expressions of personalized service.

PHP's main advantages are fast character processing and reliability, and in combination with Apache 2.0 it gives the unitive search system good stability and performance [10]. However, PHP does not support multi-threading, and the unitive search system needs to search multiple databases at the same time; as an ordinary single-threaded program, it would be intolerably slow. Ajax was therefore used to achieve the effect of multi-task programming, which improves the program's efficiency and keeps the program interface from appearing frozen.
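The platform obtains concurrency by issuing asynchronous Ajax requests from the browser. The same fan-out idea, sending one query to several databases at once and collecting whatever comes back, can be sketched in Python with a thread pool. This is not the platform's implementation; `search_all` and the searcher callables are hypothetical stand-ins for per-database search requests.

```python
from concurrent.futures import ThreadPoolExecutor


def search_all(searchers, query, max_workers=8):
    """Run every searcher concurrently and collect results and failures.

    `searchers` maps a database name to a callable that takes the query
    and returns a list of hits.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in searchers.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result()
            except Exception as exc:
                # One failing database must not block the others.
                errors[name] = str(exc)
    return results, errors


def journals(query):
    return [f"journal paper about {query}"]


def patents(query):
    raise RuntimeError("timeout")


results, errors = search_all({"journals": journals, "patents": patents}, "irrigation")
print(results, errors)
```

Keeping failures separate from results lets the interface show partial results immediately instead of waiting on, or failing with, the slowest database.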

3.3 Web service

Distributed storage is one of the main features of digital resources. Traditional unitive search systems were designed for a specific portal and lacked resource sharing and accounting between sharing units. The system therefore needs to establish Web service programs for title search, abstract retrieval and full-text download, invoked through asynchronous calls, with results transferred as XML files between users and the group of document nodes. This achieves grid-style resource sharing, mutual access to resources and billing. From the user's point of view, a document search service receives the user's request, starts the search engine and generates an XML file of search results. The main services are the resource list service, the title browsing service, the abstract browsing service and the full-text download service.
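The exchange of search results as XML can be illustrated with the sketch below, which builds a result document on the service side and reads it back on the consumer side. The element and attribute names (`searchResult`, `record`, `title`) are invented for this example; the paper does not specify the platform's actual XML schema.

```python
import xml.etree.ElementTree as ET


def build_result_xml(database, query, titles):
    """Serialize a list of titles into an XML search-result document."""
    root = ET.Element("searchResult", database=database, query=query)
    for number, title in enumerate(titles, start=1):
        record = ET.SubElement(root, "record", id=str(number))
        ET.SubElement(record, "title").text = title
    return ET.tostring(root, encoding="unicode")


def parse_titles(xml_text):
    """Read the titles back out of a search-result document."""
    return [node.text for node in ET.fromstring(xml_text).iter("title")]


document = build_result_xml("journals", "irrigation", ["Paper A", "Paper B"])
print(document)
print(parse_titles(document))
```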

3.4 Data security

The platform provides its services over the network. Protecting data security in this process depends mainly on the following techniques: (1) strengthening permission settings, so that legitimate users gain access by password, or through IP address settings, so that users within an IP segment gain access; (2) restricting the data traffic of database access to prevent malicious downloads and database crashes; (3) using encryption and digital signature technology during network transmission to prevent theft and destruction.
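Measures (1) and (2) can be sketched with Python's standard library. The IP check uses the `ipaddress` module; the `DownloadLimiter` class is a hypothetical, deliberately simple per-user counter, and the segment and limit values are examples rather than the platform's settings. Encryption and digital signatures are not shown.

```python
import ipaddress


def ip_allowed(address, allowed_segments):
    """Return True if the address falls inside any allowed IP segment."""
    ip = ipaddress.ip_address(address)
    return any(ip in ipaddress.ip_network(segment) for segment in allowed_segments)


class DownloadLimiter:
    """Cap the number of downloads per user within one accounting window."""

    def __init__(self, max_downloads):
        self.max_downloads = max_downloads
        self.counts = {}

    def allow(self, user):
        """Record one download if the user is under the cap; otherwise refuse."""
        used = self.counts.get(user, 0)
        if used >= self.max_downloads:
            return False
        self.counts[user] = used + 1
        return True

    def reset(self):
        """Start a new window, for example once per day."""
        self.counts.clear()


segments = ["192.0.2.0/24"]           # example segment from the documentation range
print(ip_allowed("192.0.2.17", segments))
limiter = DownloadLimiter(max_downloads=2)
print([limiter.allow("reader01") for _ in range(3)])
```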

4   Conclusion

The key technologies and the model of this platform have been applied successfully not only in Gansu province but have also been extended to China Petroleum and to other provincial-level scientific and technical document sharing platforms, such as those of Qinghai (http://www.textqh.com) and Ningxia (http://www.nxkjwx.com.cn). Practice shows that the platform improves the degree of resource integration, and this research offers a feasible approach to realizing "one-stop" services. The application of the platform has greatly promoted information sharing among documentary information organizations, improved their service quality, management level and market competitiveness, and reduced both repeated investment in document resources and the duplicate development of databases with the same content.

As science & technology documentation sharing platforms develop, research and development on unitive search for heterogeneous digital resources will go deeper and its functions will become richer. However, unitive search based on web services was created to work around vendors' non-standard data interfaces. Access and retrieval interfaces based on standards and norms for resource databases are the effective way to resolve the current "information island" phenomenon. China should strengthen such standards and norms early, and promote them on a mandatory basis, so as to complete the construction of its digital libraries as soon as possible.

References

1. Hu Manggu. An Inquiry into Mode of Constructing Sharing Platforms of Scientific and Technical Documents in China and Thinking of Sustainable Development. Digital Library Forum, 2008, (7): 67-70 (in Chinese)

2. Bass L, Clements P, Kazman R. Software Architecture in Practice, Second Edition. Addison-Wesley, 2003

3. Xiao Wanrong, Yang Shengju. Design and Implementation of Unified Identity Authentication System Based on LDAP. Computer Science, 2008, 35(5): 298-301 (in Chinese)

4. Huang Di. A Review of Cross Searching Technique for Heterogeneous Databases. Library and Information Service, 2003, (6): 94-97 (in Chinese)

5. Cao Fang, Shi Shaoting. Design and Implementation of Heterogeneous Digital Documentation Unification Retrieval System Based on Web Simulation Process. Journal of the China Society for Scientific and Technical Information, 2006, 25(5): 575-579 (in Chinese)

6. Wang Quan, Shi Shaoting. Web Information Extraction Based on Sub-tree Breadth. Computer Engineering, 2009, 35(3): 89-90, 93 (in Chinese)

7. Deng Cai, Yu Shipeng, Wen Jirong, et al. Block-based Web Search. Proc. of the 27th Annual International ACM SIGIR Conference. Sheffield, South Yorkshire, ACM Press, 2004: 456-463

8. Deng Cai, He Xiaofei, Wen Jirong, et al. Block-level Link Analysis. Proc. of the 27th Annual International ACM SIGIR Conference. Sheffield, South Yorkshire, ACM Press, 2004: 440-447

9. Gaizauskas R, Wilks Y. Information Extraction: Beyond Document Retrieval. Computational Linguistics and Chinese Language Processing, 1998, 3(2): 17-60

10. Wang Quan, Shi Shaoting. Design and implementation of unitive search system based on PHP. Journal of Lanzhou University of Technology, 2008, 34(1): 91-94 (in Chinese)
