Paul Selles
Computers and cats
XAML Formatting: Programmatically Tabify XAML and XML Files in C#
April 2, 2014
With many developers working together, formatting styles can sometimes be an issue. This is most noticeable when working with XAML files, where the biggest culprits are poor tabbing and inconsistent tab characters. Luckily, there is an easy way to standardize XAML formatting: implement a custom check-in policy that programmatically fixes all XAML files before they are checked in.
Using XmlWriter in conjunction with the XmlWriterSettings class we can easily customize our output [1][2]. The XmlWriterSettings properties give us a lot of control over how we want the XML or XAML to look; the settings are well documented and worth looking over. I am also using XmlReader and XmlReaderSettings to convert the text into an XmlDocument, where the XmlReaderSettings class is used to ensure that we ignore any potentially invalid XML characters in our XAML [3][4][5][6]. I encountered one tricky bit where the XmlReader would break on decimal and hex character references. This was solved by doing a text replace of every "&" with "&amp;" pre-XmlReader and converting back post-XmlWriter.
    using System;
    using System.IO;
    using System.Text;
    using System.Xml;

    public static class Tabify
    {
        // Tabify XML document
        public static void Xml(string filename)
        {
            DoTabify(filename, false);
        }

        // Tabify XAML document
        public static void Xaml(string filename)
        {
            DoTabify(filename, true);
        }

        // Tabify
        private static void DoTabify(string filename, bool xaml = false)
        {
            // XmlDocument container
            XmlDocument xmlDocument = new XmlDocument();

            // We want to make sure that decimal and hex character references are not lost,
            // so every ampersand is escaped before the reader sees it
            string xmlString = File.ReadAllText(filename);
            xmlString = xmlString.Replace("&", "&amp;");

            // Xml Reader settings
            XmlReaderSettings xmlReadSettings = new XmlReaderSettings()
            {
                CheckCharacters = false, // We have some invalid characters we want to ignore
            };

            // Use XML reader to load content to XmlDocument container
            using (XmlReader xmReader = XmlReader.Create(new StringReader(xmlString), xmlReadSettings))
            {
                xmReader.MoveToContent();
                xmlDocument.Load(xmReader);
            }

            // Customize how our XML will look: we want tabs and UTF8 encoding
            // (NewLineOnAttributes = true would also put each attribute on its own line)
            XmlWriterSettings xmlWriterSettings = new XmlWriterSettings()
            {
                Indent = true,                           // Indent elements
                IndentChars = "\t",                      // Indent with tabs
                CheckCharacters = false,                 // Ignore invalid characters
                NewLineChars = Environment.NewLine,      // Set newline character
                NewLineHandling = NewLineHandling.None,  // Leave line breaks in content unchanged
                Encoding = new UTF8Encoding()            // UTF8 encoding
            };

            // We do not want the xml declaration for xaml files
            if (xaml)
                xmlWriterSettings.OmitXmlDeclaration = true; // Must be true for XAML

            StringBuilder xmlStringBuilder = new StringBuilder();

            // Write xml to the string builder using saved settings
            using (XmlWriter xmlWriter = XmlWriter.Create(xmlStringBuilder, xmlWriterSettings))
            {
                xmlDocument.WriteContentTo(xmlWriter);
                xmlWriter.Flush();
            }

            // Restore decimal and hex character references
            xmlString = xmlStringBuilder.ToString().Replace("&amp;", "&");
            File.WriteAllText(filename, xmlString);
        }
    }
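To make the ampersand round trip easier to see in isolation, here is a minimal in-memory sketch. The class name CharRefRoundTrip, the method Reindent and the sample input are hypothetical and exist only for illustration; the sketch assumes the input contains no invalid raw characters, so it uses default reader settings.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class CharRefRoundTrip
{
    // Hypothetical helper: re-indents an XML string with tabs while keeping
    // character references such as &#x13; or &#169; exactly as written.
    public static string Reindent(string xml)
    {
        // Escape every ampersand so the reader treats references as plain text.
        string escaped = xml.Replace("&", "&amp;");

        var document = new XmlDocument();
        using (XmlReader reader = XmlReader.Create(new StringReader(escaped)))
        {
            document.Load(reader);
        }

        var settings = new XmlWriterSettings
        {
            Indent = true,
            IndentChars = "\t",
            OmitXmlDeclaration = true
        };

        var builder = new StringBuilder();
        using (XmlWriter writer = XmlWriter.Create(builder, settings))
        {
            document.WriteContentTo(writer);
        }

        // Undo the escaping so the references read as they did in the input.
        return builder.ToString().Replace("&amp;", "&");
    }

    public static void Main()
    {
        // Sample input invented for this sketch.
        string input = "<Cats><Cat Name=\"Tom &#38; Co\"><Info>&#x13; &#169;</Info></Cat></Cats>";
        Console.WriteLine(Reindent(input));
    }
}
```

Because the reader only ever sees "&amp;", it never tries to resolve the references, and the final replace puts them back untouched. A literal "&amp;" in the input survives the same way.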
Paul
References
[1] XmlWriter Class. MSDN Library.
[2] XmlWriterSettings Class. MSDN Library.
[3] XmlReader Class. MSDN Library.
[4] XmlReaderSettings Class. MSDN Library.
[5] XmlDocument Class. MSDN Library.
[6] Parsing Xml with Invalid Characters in C#. Paul Selles.
Parsing Xml with Invalid Characters in C#
July 3, 2013
The Problem
I’ve stumbled upon an interesting predicament. I need to parse some SQL relationships from an automatically generated XML file that contains invalid characters. Here is an example XML file that I will use to highlight the problem that I saw:
    <?xml version="1.0" encoding="utf-8"?>
    <Cats>
      <Cat Id="1" Type="Tabby">
        <Property Name="Fur" Value="Coarse"/>
        <Property Name="Color" Value="Orange" />
        <Part Name="Paws">
          <Property Name="Claws" Value="Very sharp" />
        </Part>
        <Part Name="Nose">
          <Property Name="Cute" Value="true" />
        </Part>
        <Info>
          I have an invalid character.&#x13;
        </Info>
      </Cat>
      <Cat Id="2" Type="Short hair">
        <Property Name="Fur" Value="Soft"/>
        <Property Name="Color" Value="Black" />
        <Part Name="Paws">
          <Property Name="Polydactyl" Value="true" />
          <Property Name="Claws" Value="Sharp" />
        </Part>
        <Part Name="Nose">
          <Property Name="Cute" Value="true" />
        </Part>
        <Info>
          I don't have an invalid character.
        </Info>
      </Cat>
    </Cats>
So above we have a small XML file cataloging my two cats. Within the Info tags you may notice that the first Cat entry has a superfluous character, 0x13, a control character that falls outside of the valid XML character set [1]. The W3C recommendation, however, is no guarantee that every XML file you encounter will follow it to a tee.
In C# we can try using the two most common XML parsing libraries, System.Xml and System.Xml.Linq, to import the XML file into XmlDocument and XDocument objects using their respective Load functions [2][3]. If we try to do this we can expect to see the following exception:
‘ ‘, hexadecimal value 0x13, is an invalid character. Line 13, position 35.
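The failure can be reproduced without a file. The following is a minimal sketch, assuming the invalid character arrives as the character reference &#x13;; the class name InvalidCharacterDemo and the one-line sample document are hypothetical and only stand in for the cat file above.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

public static class InvalidCharacterDemo
{
    // Sample input invented for this sketch: a character reference to 0x13,
    // which is outside the valid XML 1.0 character set.
    public const string Sample =
        "<Cats><Info>I have an invalid character.&#x13;</Info></Cats>";

    // Returns the XmlException message, or null if loading succeeded.
    public static string TryLoadXmlDocument(string xml)
    {
        try
        {
            new XmlDocument().LoadXml(xml);
            return null;
        }
        catch (XmlException e)
        {
            return e.Message;
        }
    }

    // Same check for System.Xml.Linq.
    public static string TryParseXDocument(string xml)
    {
        try
        {
            XDocument.Parse(xml);
            return null;
        }
        catch (XmlException e)
        {
            return e.Message;
        }
    }

    public static void Main()
    {
        Console.WriteLine("XmlDocument: " + TryLoadXmlDocument(Sample));
        Console.WriteLine("XDocument:   " + TryParseXDocument(Sample));
    }
}
```

Both libraries sit on top of the same XmlReader, which is why they reject the document in the same way.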
The Solution
There is a workaround that is made possible with the lightweight disposable XmlReader class and the XmlReaderSettings support class, which allows us to customize the behavior of XmlReader [4][5]. The XmlReaderSettings property that interests us the most is the Boolean CheckCharacters. Setting the CheckCharacters property to false lets us read the XML document without verifying that the processed text data is within the valid XML character set [6]. The XmlDocument and XDocument objects can now be loaded from the XmlReader incident free:
    static XmlDocument ReadXmlDocumentWithInvalidCharacters(string filename)
    {
        XmlDocument xmlDocument = new XmlDocument();

        XmlReaderSettings xmlReaderSettings = new XmlReaderSettings { CheckCharacters = false };

        using (XmlReader xmlReader = XmlReader.Create(filename, xmlReaderSettings))
        {
            // Load our XmlDocument
            xmlReader.MoveToContent();
            xmlDocument.Load(xmlReader);
        }

        return xmlDocument;
    }

    static XDocument ReadXDocumentWithInvalidCharacters(string filename)
    {
        XDocument xDocument = null;

        XmlReaderSettings xmlReaderSettings = new XmlReaderSettings { CheckCharacters = false };

        using (XmlReader xmlReader = XmlReader.Create(filename, xmlReaderSettings))
        {
            // Load our XDocument
            xmlReader.MoveToContent();
            xDocument = XDocument.Load(xmlReader);
        }

        return xDocument;
    }
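The same setting also works on XML held in a string. Below is a minimal sketch using a StringReader; the class name LenientXml and the sample text are hypothetical. As far as I understand the documentation, CheckCharacters = false relaxes checking for character entity references such as &#x13;, while a raw invalid character in the text may still be rejected, so treat this as a sketch for the reference case only.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class LenientXml
{
    // Hypothetical helper: loads an XDocument from a string without
    // rejecting character references that fall outside the XML character set.
    public static XDocument Load(string xml)
    {
        var settings = new XmlReaderSettings { CheckCharacters = false };

        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            return XDocument.Load(reader);
        }
    }

    public static void Main()
    {
        // Sample input invented for this sketch.
        XDocument document = Load("<Info>I have an invalid character.&#x13;</Info>");

        // The reference has been resolved to the actual character 0x13.
        string value = document.Root.Value;
        Console.WriteLine((int)value[value.Length - 1]);
    }
}
```

Note that the loaded document now holds the real control character rather than the reference, which matters once the document is written out again.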
Once we load our XML then we are free to parse it, and since I prefer working with the System.Xml.Linq library, that is the one I will use:
    static void PrintXDocument(XDocument xDocument)
    {
        foreach (XElement xElement in xDocument.Elements(xDocument.Root.Name).DescendantsAndSelf())
        {
            Console.Write(("".PadRight(xElement.Ancestors().Count() * 4) +
                (xElement.HasElements == true || string.IsNullOrEmpty(xElement.Value)
                    ? xElement.Name.LocalName
                    : (xElement.Name.LocalName + " \"" + xElement.Value.Trim() + "\""))));

            foreach (XAttribute xAttribute in xElement.Attributes())
                Console.Write(" " + xAttribute.Name.LocalName + "=\"" + xAttribute.Value + "\"");

            Console.WriteLine();
        }

        Console.ReadLine();
    }
And the results:
    Cats
        Cat Id="1" Type="Tabby"
            Property Name="Fur" Value="Coarse"
            Property Name="Color" Value="Orange"
            Part Name="Paws"
                Property Name="Claws" Value="Very sharp"
            Part Name="Nose"
                Property Name="Cute" Value="true"
            Info "I have an invalid character.‼"
        Cat Id="2" Type="Short hair"
            Property Name="Fur" Value="Soft"
            Property Name="Color" Value="Black"
            Part Name="Paws"
                Property Name="Polydactyl" Value="true"
                Property Name="Claws" Value="Sharp"
            Part Name="Nose"
                Property Name="Cute" Value="true"
            Info "I don't have an invalid character."
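Printing the tree is only one way to use the loaded document. As a further illustration of parsing with System.Xml.Linq, here is a small query sketch. The class name CatQueries, the method TypesWithPartProperty and the trimmed sample document are hypothetical; the element and attribute names follow the cat file above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class CatQueries
{
    // Returns the Type attribute of every Cat that has a Part called partName
    // containing a Property with the given name and value.
    public static List<string> TypesWithPartProperty(
        XDocument document, string partName, string propertyName, string value)
    {
        return document.Root
            .Elements("Cat")
            .Where(cat => cat.Elements("Part")
                .Where(part => (string)part.Attribute("Name") == partName)
                .Elements("Property")
                .Any(property => (string)property.Attribute("Name") == propertyName
                              && (string)property.Attribute("Value") == value))
            .Select(cat => (string)cat.Attribute("Type"))
            .ToList();
    }

    public static void Main()
    {
        // Trimmed-down version of the cat file, invented for this sketch.
        XDocument document = XDocument.Parse(
            "<Cats>" +
            "<Cat Id=\"1\" Type=\"Tabby\">" +
            "<Part Name=\"Paws\"><Property Name=\"Claws\" Value=\"Very sharp\" /></Part>" +
            "</Cat>" +
            "<Cat Id=\"2\" Type=\"Short hair\">" +
            "<Part Name=\"Paws\"><Property Name=\"Polydactyl\" Value=\"true\" /></Part>" +
            "</Cat>" +
            "</Cats>");

        foreach (string type in TypesWithPartProperty(document, "Paws", "Polydactyl", "true"))
            Console.WriteLine(type); // prints "Short hair"
    }
}
```

Casting an XAttribute to string returns null when the attribute is missing, so the comparisons stay safe on elements that lack Name or Value.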
We are not out of the woods yet
We are dealing with damaged goods here: that invalid character is still present, so we have to be careful. Notice the ‼ in the output above; that is 0x13.
An example of what can go wrong is evident if we try to print out the contents of our XDocument object:
Console.WriteLine(XDocument.Load(filename).ToString());
Normally we would get a printout of the XML it contains. In this case we see the exception we saw above.
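Two possible ways around this are sketched below; neither comes from the original workflow, so treat both as assumptions. The first writes the document through an XmlWriter whose CheckCharacters setting is false, and the second removes the offending characters with XmlConvert.IsXmlChar. The class name DamagedXml and its method names are hypothetical, and surrogate halves are deliberately kept so that characters outside the Basic Multilingual Plane are not stripped.

```csharp
using System;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;

public static class DamagedXml
{
    // Option 1: serialize with character checking switched off on the writer.
    public static string ToStringUnchecked(XDocument document)
    {
        var settings = new XmlWriterSettings
        {
            CheckCharacters = false,
            Indent = true,
            OmitXmlDeclaration = true
        };

        var builder = new StringBuilder();
        using (XmlWriter writer = XmlWriter.Create(builder, settings))
        {
            document.Save(writer);
        }
        return builder.ToString();
    }

    // Option 2: strip invalid characters from text nodes and attribute values.
    public static void StripInvalidCharacters(XDocument document)
    {
        foreach (XText text in document.DescendantNodes().OfType<XText>().ToList())
            text.Value = Clean(text.Value);

        foreach (XAttribute attribute in document.Descendants().Attributes().ToList())
            attribute.Value = Clean(attribute.Value);
    }

    private static string Clean(string value)
    {
        // Keep valid XML characters and surrogate halves; drop everything else.
        return new string(value
            .Where(c => XmlConvert.IsXmlChar(c) || char.IsSurrogate(c))
            .ToArray());
    }

    public static void Main()
    {
        // Build a damaged document in memory; LINQ to XML does not validate here.
        var document = new XDocument(
            new XElement("Cats",
                new XElement("Info", "I have an invalid character.\u0013")));

        Console.WriteLine(ToStringUnchecked(document));

        StripInvalidCharacters(document);
        Console.WriteLine(document.ToString());
    }
}
```

The first option keeps the data intact but produces output that strict parsers will reject again; the second loses the character but yields a document that any XML tool can read.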
Paul
References
[1] Extensible Markup Language (XML) 1.0 (Fifth Edition). 26 Nov 2008. W3C Recommendation
[2] XmlDocument Class. MSDN Library
[3] XDocument Class. MSDN Library
[4] XmlReader Class. MSDN Library
[5] XmlReaderSettings Class. MSDN Library
[6] XmlReaderSettings.CheckCharacters Property. MSDN Library