EldoS | Feel safer!

Software components for data protection, secure storage and transfer

XML character entity references in run time

#22104
Posted: 10/22/2012 08:57:27
by Leonardo Herrera (Standard support level)
Joined: 02/14/2011
Posts: 66

Hello,
I've been doing some XML work lately and I observed the following:

  • Creation of a new node with text containing a special character (one that is a predefined XML character entity reference, for example, 'B&W') and then accessing it using the OuterXML property will return the XML text without the character being escaped. However, reloading the document (saving it to disk and reading it again) will show the proper escaped entity.
  • Not all XML entity references are escaped; so far, I've only seen amp, lt and gt being automatically escaped, but apos and quot are not.

The first item affected me because I was creating a document in memory, calculating a digest then saving; having the character escaped during the save rendered the signature invalid. I resorted to "reloading" the document (saving to disk and reading it again) before calculating the hash for the node.

I guess the second item is not actually a bug since the produced documents are still valid, but is there a way to turn on escaping of all the predefined XML entities?

Thanks for your comments,
Leonardo Herrera
#22117
Posted: 10/22/2012 12:49:54
by Dmytro Bogatskyy (EldoS Corp.)

Quote
Creation of a new node with text containing a special character (one that is a predefined XML character entity reference, for example, 'B&W') and then accessing it using the OuterXML property will return the XML text without the character being escaped. However, reloading the document (saving it to disk and reading it again) will show the proper escaped entity.

Are you using the TElXMLDOMDocument.CreateTextNode method? This should be fixed in the next build.
The standard approach is to use the TElXMLDOMElement.TextContent property to get or set the text content of the element.
Quote
Not all XML entity references are escaped; so far, I've only seen amp, lt and gt being automatically escaped, but apos and quot are not.

The ' (apostrophe) and " (quote) characters do not need to be replaced with the apos and quot entity references when they are used in text. When they are used in an attribute value, they can be replaced with the corresponding XML entity reference (if the character matches the quote character used to delimit the attribute value).
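The rule above can be illustrated with a minimal, library-independent sketch. EscapeText and EscapeAttr are hypothetical helpers written for this example only; they are not part of SecureBlackbox and do not show how the component escapes internally. The sketch assumes Delphi or Free Pascal in Delphi mode.

```pascal
program EscapeSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils;

// Hypothetical helper (not a SecureBlackbox API): escapes character data.
// Quotes and apostrophes are left alone because they are legal in text.
function EscapeText(const S: string): string;
begin
  // '&' is replaced first so later references are not escaped twice.
  Result := StringReplace(S, '&', '&amp;', [rfReplaceAll]);
  Result := StringReplace(Result, '<', '&lt;', [rfReplaceAll]);
  Result := StringReplace(Result, '>', '&gt;', [rfReplaceAll]);
end;

// Hypothetical helper: escapes an attribute value that will be
// delimited by QuoteChar. Only the delimiter itself needs a reference.
function EscapeAttr(const S: string; QuoteChar: Char): string;
begin
  Result := StringReplace(S, '&', '&amp;', [rfReplaceAll]);
  Result := StringReplace(Result, '<', '&lt;', [rfReplaceAll]);
  if QuoteChar = '"' then
    Result := StringReplace(Result, '"', '&quot;', [rfReplaceAll])
  else
    Result := StringReplace(Result, '''', '&apos;', [rfReplaceAll]);
end;

begin
  WriteLn(EscapeText('B&W "it''s"'));      // B&amp;W "it's"
  WriteLn(EscapeAttr('B&W "it''s"', '"')); // B&amp;W &quot;it's&quot;
end.
```

In text content the quote characters pass through unchanged, while in a double-quoted attribute value only the double quote is replaced.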
#22142
Posted: 10/23/2012 21:15:46
by Leonardo Herrera (Standard support level)
Joined: 02/14/2011
Posts: 66

Quote
Dmytro Bogatskyy wrote:
Are you using the TElXMLDOMDocument.CreateTextNode method? This should be fixed in the next build.
The standard approach is to use the TElXMLDOMElement.TextContent property to get or set the text content of the element.

Yes, I use CreateTextNode. Thanks for looking at it.
Quote
Dmytro Bogatskyy wrote:
The ' (apostrophe) and " (quote) characters do not need to be replaced with the apos and quot entity references when they are used in text. When they are used in an attribute value, they can be replaced with the corresponding XML entity reference (if the character matches the quote character used to delimit the attribute value).

Thanks, I figured as much. Is there a non-hacky way to have them escaped for a certain document or node?
#22152
Posted: 10/24/2012 04:26:09
by Dmytro Bogatskyy (EldoS Corp.)

Quote
Is there a non-hacky way to make them quoted for a certain document or node?

You can split the text and create Text nodes and Entity Reference nodes (using the CreateEntityReference method) for the corresponding characters.
For example, you can copy and slightly modify AddTextToXMLElement from SBXMLUtils.
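The splitting step can be sketched without the library as follows. EntityName and SplitText are hypothetical names made up for this illustration, and the program only prints the segments; in real code each "text" line would become a CreateTextNode call and each "entity" line a CreateEntityReference call, as suggested above. Including apos and quot in the mapping is what makes those two characters come out as entity references.

```pascal
program SplitSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils;

// Hypothetical helper: returns the predefined entity name for C,
// or an empty string if C needs no entity reference.
function EntityName(C: Char): string;
begin
  case C of
    '&':  Result := 'amp';
    '<':  Result := 'lt';
    '>':  Result := 'gt';
    '''': Result := 'apos';
    '"':  Result := 'quot';
  else
    Result := '';
  end;
end;

// Walks S once and reports runs of plain text and entity references
// in document order.
procedure SplitText(const S: string);
var
  i, RunStart: Integer;
  EntName: string;
begin
  RunStart := 1;
  for i := 1 to Length(S) do
  begin
    EntName := EntityName(S[i]);
    if EntName <> '' then
    begin
      // Flush the plain-text run that precedes the special character.
      if i > RunStart then
        WriteLn('text:   ', Copy(S, RunStart, i - RunStart));
      WriteLn('entity: ', EntName);
      RunStart := i + 1;
    end;
  end;
  // Flush whatever plain text remains after the last special character.
  if RunStart <= Length(S) then
    WriteLn('text:   ', Copy(S, RunStart, Length(S) - RunStart + 1));
end;

begin
  SplitText('B&W "ok"');
end.
```

A single pass with a run-start index avoids repeatedly copying the remainder of the string, which matters little for short values but keeps the loop easy to follow.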
#33009
Posted: 04/09/2015 20:29:51
by Leonardo Herrera (Standard support level)
Joined: 02/14/2011
Posts: 66

Hello,

I got bitten by this old bug again today and wonder whether you have a fix or something else I can use with my old version (SBB9).

I'm trying to set the text of a node (not an Element) but special characters are not escaped.

In the meantime, I'm using the following function (based on the AddTextToXMLElement procedure in SBXMLUtils, as advised):

Code
procedure SetTextChild(node: TElXMLDOMNode; Text: String);
var
  Document: TElXMLDOMDocument;
  N: TElXMLDOMNode;
  i: integer;
  c: XMLChar;
begin
  // Validate the argument before it is dereferenced
  if not Assigned(node) then
    raise EElXMLError.Create(sNoElementAvailable);

  // Remove all existing children
  while node.ChildNodes.Length > 0 do
    node.RemoveChild(node.LastChild);

  Document := node.OwnerDocument;
  while Length(Text) > 0 do
  begin
    // Find the first character that needs an entity reference
    i := 0;
    while i < Length(Text) do
    begin
      c := Text[i + AnsiStrOffset];
      if (c = #0) or (c = '<') or (c = '&') then // isXMLref
        Break;

      Inc(i);
    end;

    if i >= Length(Text) then
    begin
      // No special characters left: append the rest as one text node
      N := Document.CreateTextNode(Text);
      Text := '';
      node.AppendChild(N);
    end
    else
    begin
      // Append the plain text that precedes the special character
      if i > 0 then
      begin
        N := Document.CreateTextNode(Copy(Text, AnsiStrOffset, i));
        node.AppendChild(N);
      end;
      c := Text[i + AnsiStrOffset];
      Text := Copy(Text, i + AnsiStrOffset + 1, MaxInt);
      // Append an entity reference for the special character
      N := Document.CreateEntityReference(' ');
      if c = '<' then
        N.NodeValue := 'lt'
      else if c = '&' then
        N.NodeValue := 'amp'
      else
        N.NodeValue := IntToStr(Word(c));

      node.AppendChild(N);
    end;
  end;
end;




Regards!
#33013
Posted: 04/10/2015 05:18:50
by Dmytro Bogatskyy (EldoS Corp.)

Hello,

Quote
I'm trying to set the text of a node (not an Element) but special characters are not escaped.

To what node are you trying to add a text node?
