
Handling Invalid Characters in an XML String

2012-11-15 17:41


There are 5 predefined entity references in XML:



&lt;      <    less than
&gt;      >    greater than
&amp;     &    ampersand
&apos;    '    apostrophe
&quot;    "    quotation mark

Note:
Strictly speaking, only the characters "<" and "&" are illegal in XML character data. Apostrophes, quotation marks, and greater-than signs are legal in element content (the one exception is ">" in the sequence "]]>"), but it is a good habit to replace all five characters.

Recipe 15.7. Handling Invalid Characters in an XML String

Problem


You are creating an XML string. Before adding a tag containing a text element, you want to check it to determine whether the string contains any of the following invalid characters:
<
>
"
'
&


If any of these characters are encountered, you want them to be replaced with their escaped form:
&lt;
&gt;
&quot;
&apos;
&amp;
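For cases where the XML is assembled by hand rather than through XmlWriter or XmlDocument, the replacement described above can be sketched as follows. This is a minimal illustration, not part of the original recipe: the EscapeSketch class and the EscapeXml helper are hypothetical names, and the sketch relies on System.Security.SecurityElement.Escape, which replaces the five characters listed above with their entity references.

```csharp
using System;
using System.Security;

public static class EscapeSketch
{
    // Hypothetical helper: returns the text with <, >, ", ' and &
    // replaced by their predefined entity references.
    public static string EscapeXml(string text)
    {
        // SecurityElement.Escape returns null for null input,
        // so fall back to an empty string in that case.
        return SecurityElement.Escape(text) ?? string.Empty;
    }

    public static void Main()
    {
        // Escape a string that contains all five characters.
        Console.WriteLine(EscapeXml("<a href=\"x\">Tom & 'Jerry'</a>"));
    }
}
```

A manual helper like this is only needed when you concatenate XML text yourself; the writer and DOM approaches in the Solution do the escaping for you.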


Solution


There are different ways to accomplish this, depending on which XML-creation approach you are using. If you are using XmlWriter, the WriteString, WriteAttributeString, WriteValue, and WriteElementString methods escape these characters for you, and the WriteCData method wraps the text in a CDATA section so that no escaping is needed. If you are using XmlDocument and XmlElement, the XmlElement.InnerText property will handle these characters.
The two ways to handle this using an XmlWriter work like this. The WriteCData method will wrap the invalid character text in a CDATA section, as shown in the creation of the InvalidChars1 element in the example that follows. The other way is to use the WriteElementString method, which will automatically escape the text for you, as shown while creating the InvalidChars2 element.
// Set up a string with our invalid chars.
string invalidChars = @"<>\&'";
XmlWriterSettings settings = new XmlWriterSettings();
settings.Indent = true;
using (XmlWriter writer = XmlWriter.Create(Console.Out, settings))
{
    writer.WriteStartElement("Root");
    writer.WriteStartElement("InvalidChars1");
    writer.WriteCData(invalidChars);
    writer.WriteEndElement();
    writer.WriteElementString("InvalidChars2", invalidChars);
    writer.WriteEndElement();
}


The output from this is:
<?xml version="1.0" encoding="IBM437"?>
<Root>
  <InvalidChars1><![CDATA[<>\&']]></InvalidChars1>
  <InvalidChars2>&lt;&gt;\&amp;'</InvalidChars2>
</Root>
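WriteAttributeString is listed in the Solution but not demonstrated. The sketch below, which is an illustration and not part of the original recipe, writes the same text as both an attribute value and element content into a StringWriter, so the result can be inspected as a string. The AttributeEscapeSketch class, the BuildXml method, and the attr attribute name are hypothetical.

```csharp
using System;
using System.IO;
using System.Xml;

public static class AttributeEscapeSketch
{
    // Hypothetical helper: writes the value once as an attribute and
    // once as element text, letting XmlWriter do the escaping.
    public static string BuildXml(string value)
    {
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        var buffer = new StringWriter();
        using (XmlWriter writer = XmlWriter.Create(buffer, settings))
        {
            writer.WriteStartElement("Root");
            writer.WriteAttributeString("attr", value); // escaped for attribute context
            writer.WriteString(value);                  // escaped for text context
            writer.WriteEndElement();
        }
        return buffer.ToString();
    }

    public static void Main()
    {
        // The sample adds a double quote to the recipe's test string.
        Console.WriteLine(BuildXml("<>\\&'\""));
    }
}
```

Loading the result back into an XmlDocument and reading the attribute and the element text should return the original string in both places, which is a convenient way to check the escaping.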


There are two ways you can handle this problem with XmlDocument and XmlElement. The first way is to surround the text you are adding to the XML element with a CDATA section, either by appending a node created with XmlDocument.CreateCDataSection, as shown here, or by assigning the CDATA markup to the InnerXml property of the XmlElement:
// Set up a string with our invalid chars.
string invalidChars = @"<>\&'";
XmlElement invalidElement1 = xmlDoc.CreateElement("InvalidChars1");
invalidElement1.AppendChild(xmlDoc.CreateCDataSection(invalidChars));


The second way is to let the XmlElement class escape the data for you by assigning the text directly to the InnerText property like this:
// Set up a string with our invalid chars.
string invalidChars = @"<>\&'";
XmlElement invalidElement2 = xmlDoc.CreateElement("InvalidChars2");
invalidElement2.InnerText = invalidChars;


The whole XmlDocument is created with these XmlElements in this code:
public static void HandlingInvalidChars( )
{
    // Set up a string with our invalid chars.
    string invalidChars = @"<>\&'";

    XmlDocument xmlDoc = new XmlDocument( );
    // Create a root node for the document.
    XmlElement root = xmlDoc.CreateElement("Root");
    xmlDoc.AppendChild(root);

    // Create the first invalid character node.
    XmlElement invalidElement1 = xmlDoc.CreateElement("InvalidChars1");
    // Wrap the invalid chars in a CDATA section and use the
    // InnerXml property to assign the value as it doesn't
    // escape the values, just passes in the text provided.
    invalidElement1.InnerXml = "<![CDATA[" + invalidChars + "]]>";
    // Append the element to the root node.
    root.AppendChild(invalidElement1);

    // Create the second invalid character node.
    XmlElement invalidElement2 = xmlDoc.CreateElement("InvalidChars2");
    // Add the invalid chars directly using the InnerText
    // property to assign the value as it will automatically
    // escape the values.
    invalidElement2.InnerText = invalidChars;
    // Append the element to the root node.
    root.AppendChild(invalidElement2);

    Console.WriteLine("Generated XML with Invalid Chars:\r\n{0}", xmlDoc.OuterXml);
    Console.WriteLine( );
}


The XML created by this procedure (and output to the console) looks like this:
Generated XML with Invalid Chars:
<Root><InvalidChars1><![CDATA[<>\&']]></InvalidChars1><InvalidChars2>&lt;&gt;\&amp;'</InvalidChars2></Root>
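One way to convince yourself that both elements carry the same text is to parse the generated XML again and compare what comes back. The following is a minimal sketch under that assumption; the RoundTripSketch class and the RoundTrip method are hypothetical names, and the CDATA node is built with CreateCDataSection rather than InnerXml.

```csharp
using System;
using System.Xml;

public static class RoundTripSketch
{
    // Hypothetical helper: stores the text once in a CDATA section and
    // once through InnerText, serializes the document, parses it again,
    // and returns the text read back from each element.
    public static string[] RoundTrip(string text)
    {
        var source = new XmlDocument();
        XmlElement root = source.CreateElement("Root");
        source.AppendChild(root);

        XmlElement viaCData = source.CreateElement("InvalidChars1");
        viaCData.AppendChild(source.CreateCDataSection(text));
        root.AppendChild(viaCData);

        XmlElement viaInnerText = source.CreateElement("InvalidChars2");
        viaInnerText.InnerText = text;
        root.AppendChild(viaInnerText);

        // Parse the serialized form to see what a consumer would read.
        var parsed = new XmlDocument();
        parsed.LoadXml(source.OuterXml);
        return new[]
        {
            parsed.SelectSingleNode("/Root/InvalidChars1").InnerText,
            parsed.SelectSingleNode("/Root/InvalidChars2").InnerText
        };
    }

    public static void Main()
    {
        foreach (string value in RoundTrip(@"<>\&'"))
        {
            Console.WriteLine(value);
        }
    }
}
```

Both entries should equal the original string: the CDATA section and the escaped text are two serializations of the same character data.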


Discussion

The CDATA node allows you to represent the items in the text section as character data, not as escaped XML, for ease of entry. Normally these characters would need to be in their escaped format (&lt; for < and so on), but the CDATA section allows you to enter them as regular text.
When the CDATA tag is used in conjunction with the InnerXml property of the XmlElement class, you can submit characters that would normally need to be escaped first. The XmlElement class also has an InnerText property that will automatically escape any markup found in the string assigned. This allows you to add these characters without having to worry about them.
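One caveat with building the CDATA markup by string concatenation is that the text itself must not contain the sequence "]]>", because that sequence ends the section early. A common workaround, sketched here as an assumption and not taken from the original recipe, is to split the text across two CDATA sections at that point. The CDataSketch class and the WrapInCData helper are hypothetical names.

```csharp
using System;
using System.Xml;

public static class CDataSketch
{
    // Hypothetical helper: wraps the text in CDATA markup, splitting any
    // "]]>" so that the "]]" and the ">" land in separate sections.
    public static string WrapInCData(string text)
    {
        return "<![CDATA[" + text.Replace("]]>", "]]]]><![CDATA[>") + "]]>";
    }

    public static void Main()
    {
        var xmlDoc = new XmlDocument();
        XmlElement element = xmlDoc.CreateElement("InvalidChars1");
        xmlDoc.AppendChild(element);

        // InnerXml parses the markup, so it must be well formed.
        element.InnerXml = WrapInCData("a ]]> b");

        Console.WriteLine(element.OuterXml);  // the serialized markup
        Console.WriteLine(element.InnerText); // the text a consumer reads
    }
}
```

If the text comes from an untrusted or unknown source, assigning it to InnerText avoids the issue entirely, since no markup is parsed.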

See Also

See the "XmlDocument Class," "XmlWriter Class," "XmlElement Class," and "CDATA Sections" topics in the MSDN documentation.