
SGI STL (4) :: String Implementation Issue

2016-03-08 12:50

Issue in String Draft Standard

The problem is that if two strings share a common representation, they are vulnerable to modification through a pre-existing reference or iterator.

#include <cstdio>
#include <string>

int main()
{
    std::string s("abc");
    std::string t;
    char& c(s[1]);  // Reference into s, taken before the assignment.

    // Data typically shared between s and t.
    t = s;
    // How many strings does this modify?
    c = 'z';
    if (t[1] == 'z')
    {
        std::printf("wrong\n");
    }
    else
    {
        std::printf("right\n");
    }
    return 0;
}


Updating a reference to one of s's elements should modify only s, not t as well. Given the design of basic_string, however, it is very difficult for a reference-counted implementation to satisfy that requirement.

The only known way for a reference-counted implementation to avoid this problem is: whenever a program obtains a reference or an iterator to a string (e.g. by using operator[] or begin()), that particular string will no longer use reference counting; assignment and copy construction will copy the string’s elements instead of just copying a pointer.

The alternative is to abandon the reference-counted implementation entirely.
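The first workaround (stop sharing once a reference has escaped) can be sketched roughly as follows. SharedString, Rep, and every member name here are hypothetical and chosen only for illustration; this is a single-threaded toy built on std::shared_ptr, not the SGI code or any real library's implementation.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical toy class for illustration; not the SGI implementation.
class SharedString {
    struct Rep {
        std::string chars;
        bool shareable;  // cleared once a reference into chars escapes
        explicit Rep(std::string c) : chars(std::move(c)), shareable(true) {}
    };
    std::shared_ptr<Rep> rep_;

public:
    explicit SharedString(const char* s = "") : rep_(std::make_shared<Rep>(s)) {}

    // Share the buffer only if no reference into it has been handed out.
    SharedString(const SharedString& other)
        : rep_(other.rep_->shareable ? other.rep_
                                     : std::make_shared<Rep>(other.rep_->chars)) {}

    SharedString& operator=(const SharedString& other) {
        if (this != &other) {
            SharedString copy(other);
            rep_.swap(copy.rep_);
        }
        return *this;
    }

    // Non-const access: take a private buffer, then mark it unshareable.
    char& operator[](std::size_t i) {
        if (rep_.use_count() > 1) {
            rep_ = std::make_shared<Rep>(rep_->chars);
        }
        rep_->shareable = false;
        return rep_->chars[i];
    }

    char at(std::size_t i) const { return rep_->chars[i]; }
    bool shares_buffer_with(const SharedString& o) const { return rep_ == o.rep_; }
};

// The scenario from the program above: true if only s was modified.
bool reference_write_stays_private() {
    SharedString s("abc");
    SharedString t;
    char& c = s[1];  // from here on, s is unshareable
    t = s;           // so this copies the characters instead of the pointer
    c = 'z';
    return s.at(1) == 'z' && t.at(1) == 'b';
}
```

The cost of this design is that any string that has ever been indexed through the non-const operator[] loses the benefit of reference counting for the rest of its lifetime, which is part of why giving up reference counting altogether is a serious option.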

So what should I use to represent strings?

Use SGI Ropes

Ropes perform reasonably well for all applications that do not require very frequent small updates to strings.

They are the only alternative that scales well to very long strings, i.e. one that could easily be used to represent a mail message or a text file as a single string.

The disadvantages are:

Single character replacements are slow.

Portability and compilation time may be an issue in the short term.
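To see why ropes trade cheap concatenation for slow single-character replacement, here is a minimal sketch of the general idea: an immutable tree whose leaves hold text. RopeNode, make_leaf, concat, char_at, replace_char, and to_string are hypothetical names invented for this example. The real SGI rope is considerably more elaborate, so treat this only as an illustration of the trade-off.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical toy rope: an immutable binary tree whose leaves hold text.
struct RopeNode {
    std::shared_ptr<const RopeNode> left, right;  // both null for a leaf
    std::string leaf;                             // used only by leaves
    std::size_t length = 0;                       // characters in this subtree
};
using Rope = std::shared_ptr<const RopeNode>;

Rope make_leaf(std::string text) {
    auto n = std::make_shared<RopeNode>();
    n->length = text.size();
    n->leaf = std::move(text);
    return n;
}

// Concatenation allocates one node and shares both operands;
// no characters are copied, however long the operands are.
Rope concat(const Rope& a, const Rope& b) {
    auto n = std::make_shared<RopeNode>();
    n->left = a;
    n->right = b;
    n->length = a->length + b->length;
    return n;
}

// Indexing walks from the root down to the leaf containing position i.
char char_at(const Rope& r, std::size_t i) {
    const RopeNode* n = r.get();
    while (n->left) {
        if (i < n->left->length) {
            n = n->left.get();
        } else {
            i -= n->left->length;
            n = n->right.get();
        }
    }
    return n->leaf[i];
}

// Replacing one character copies a whole leaf and rebuilds every node on
// the path to it; untouched subtrees are shared with the original rope.
Rope replace_char(const Rope& r, std::size_t i, char c) {
    if (!r->left) {
        std::string text = r->leaf;
        text[i] = c;
        return make_leaf(std::move(text));
    }
    const std::size_t left_len = r->left->length;
    return i < left_len ? concat(replace_char(r->left, i, c), r->right)
                        : concat(r->left, replace_char(r->right, i - left_len, c));
}

// Flatten to a std::string (linear time); handy for checking results.
std::string to_string(const Rope& r) {
    return r->left ? to_string(r->left) + to_string(r->right) : r->leaf;
}
```

In this sketch, concat(make_leaf("hello, "), make_leaf("world")) costs one allocation, while replace_char on the result allocates a new leaf plus a new interior node and leaves the original rope unchanged.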

C strings

This is likely to be the most efficient way to represent a large collection of very short strings. The primary disadvantages are:

Operations such as concatenation and substring are much more expensive than for ropes if the strings are long. A C string is not a good representation for a text file in an editor.

The user needs to be aware of sharing between string representations. If strings are assigned by copying pointers, an update to one string may affect another.

C strings provide no help in storage management. This may be a major issue, although a garbage collector can help alleviate it.

Most operations on entire strings (e.g. assignment, concatenation) do not scale well to long strings.
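The sharing hazard with C strings is easy to reproduce. The two helper functions below are hypothetical names written for this example: the first "assigns" a string by copying the pointer, the second copies the characters with std::strcpy.

```cpp
#include <cstring>
#include <string>

// "Assignment" by copying the pointer: both names refer to one buffer,
// so a write through one is visible through the other.
std::string read_alias_after_write() {
    char buffer[] = "abc";
    char* original = buffer;
    char* alias = original;   // copies the pointer, not the characters
    original[1] = 'z';
    return alias;             // the alias sees the change
}

// Assignment by copying the characters: independent, but the copy costs
// time proportional to the length and the caller must provide storage.
std::string read_copy_after_write() {
    char buffer[] = "abc";
    char copy[sizeof buffer];
    std::strcpy(copy, buffer);
    buffer[1] = 'z';
    return copy;              // the copy is unaffected
}
```

The first function returns "azc" and the second returns "abc". Nothing in the type char* tells the reader which of the two behaviours a given assignment has, which is the awareness burden described above.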

vector<char>

If a string is treated primarily as an array of characters, with frequent in-place updates, it is reasonable to represent it as vector<char> or vector<wchar_t>. The same is true if it will be modified by STL container algorithms.

Unlike C strings, vectors handle internal storage management automatically, and operations that modify the length of a string are generally more convenient.

Disadvantages are:

Vector assignments are much more expensive than C string pointer assignments; the only way to share string representations is to pass pointers or references to vectors.

Most operations on entire strings (e.g. assignment, concatenation) do not scale well to long strings.

A number of standard string operations (e.g. concatenation and substring) are not provided with the usual syntax, and must be expressed using generic STL algorithms. This is usually not hard.

Conversion to C strings is currently slow, even for short strings. That may change in future implementations.
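As a rough illustration of the last two points, concatenation, substring, and conversion on a vector<char> might be written with generic STL facilities along these lines. CharVec and the four helper functions are hypothetical names introduced for this sketch, and substring assumes the caller has already checked that the range is in bounds.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

using CharVec = std::vector<char>;

// Build a vector<char> from a C string (the terminator is not stored).
CharVec from_c_string(const char* s) {
    return CharVec(s, s + std::char_traits<char>::length(s));
}

// Concatenation: copy a, then append b's elements with std::copy.
CharVec concat(const CharVec& a, const CharVec& b) {
    CharVec out(a);
    out.reserve(a.size() + b.size());
    std::copy(b.begin(), b.end(), std::back_inserter(out));
    return out;
}

// Substring via the iterator-range constructor.
// Assumes pos + len <= v.size(); no bounds checking is done here.
CharVec substring(const CharVec& v, std::size_t pos, std::size_t len) {
    return CharVec(v.begin() + pos, v.begin() + pos + len);
}

// Conversion for C APIs: the characters are copied into a std::string,
// whose c_str() supplies the null terminator the vector does not store.
std::string to_terminated(const CharVec& v) {
    return std::string(v.begin(), v.end());
}
```

For example, to_terminated(concat(from_c_string("foo"), from_c_string("bar"))) yields "foobar". Every one of these operations copies characters, so each is linear in the length of the strings involved.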