Unwanted Zero-Width Characters

Posted: 03/19/2013 11:58:13
by Peter Cattlin (Standard support level)
Joined: 03/19/2013
Posts: 1

I am using SSHBlackbox.VCL to develop a client in Delphi XE2 for connecting to a Linux shell via SSH. The code is pretty much copied from the ClientDemo sample, with data handling as follows :-

procedure TfrmSSH.HandleOnData(Sender: TObject; Buffer: pointer; Size: integer);
var
  S: AnsiString;
begin
  SetLength(S, Size);
  Move(Buffer^, S[1], Size);  // copy the raw received bytes into S
  // ... (rest of the handler is not shown)
end;

My own proc ProcessSSHEscape performs a variety of functions related to the processing of ANSI escape sequences. Much use is made of regular expressions as a means of capturing these sequences. Occasionally, the regular expressions fail because the sequence is interrupted by what seem to be Unicode zero-width characters.

The particular flavour of PCRE regular expression engine does not recognise \u patterns, so I cannot even try removing the most common zero-width characters, \u200B and \uFEFF, with TRegEx.Replace. Nor can I replace non-ASCII characters with '' using the pattern [^\x00-\xFF], not that I would favour this as a permanent solution. The regular expression engine simply does not recognise the problematic characters.

Could you please suggest some means of cleaning the data string of these zero-width nuisance characters, preferably by configuring one or more of the socket components. The full list of components is as follows :-


I have tried changing TerminalType in TElTerminalInfo and UseUTF8 in TElSSHClient, to no avail.
Posted: 03/19/2013 12:05:24
by Eugene Mayevski (EldoS Corp.)

SecureBlackbox treats transported data as opaque binary data and doesn't modify it in any way. It's up to the application to perform any conversion or cleanup if necessary (filenames and ASCII mode in FTP and SFTP are a different case). If you receive anything unwanted in an escape sequence over SSH, you do the cleanup.

Personally, I recommend that you keep away from regular expressions if you have a fixed set of rules. It's almost always more efficient to write a special procedure for each rule than to rely on a regexp engine, which then fails or limits you (as in your case) and almost always works much slower than a routine you write yourself.
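
As a rough illustration of such a special procedure, here is a minimal sketch that works on the raw bytes instead of going through a regexp engine. It assumes that the server sends UTF-8, in which case the zero-width characters arrive in the AnsiString as three-byte sequences: E2 80 8B, E2 80 8C and E2 80 8D for U+200B, U+200C and U+200D, and EF BB BF for U+FEFF. The function name StripZeroWidthUtf8 and the choice of code points are assumptions made for this example; nothing here is part of SecureBlackbox, and the actual offending bytes should be confirmed with a hex dump of the received buffer first.

```pascal
// Hypothetical helper (not a SecureBlackbox routine): returns S with the
// UTF-8 encodings of U+200B..U+200D and U+FEFF removed, all other bytes kept.
function StripZeroWidthUtf8(const S: AnsiString): AnsiString;
var
  I, N, Len: Integer;
begin
  Len := Length(S);
  SetLength(Result, Len);   // the output can never be longer than the input
  N := 0;
  I := 1;
  while I <= Len do
  begin
    if (I + 2 <= Len) and
       (((Ord(S[I]) = $E2) and (Ord(S[I + 1]) = $80) and
         (Ord(S[I + 2]) in [$8B..$8D])) or
        ((Ord(S[I]) = $EF) and (Ord(S[I + 1]) = $BB) and
         (Ord(S[I + 2]) = $BF))) then
      Inc(I, 3)             // skip the whole three-byte sequence
    else
    begin
      Inc(N);
      Result[N] := S[I];    // copy every other byte unchanged
      Inc(I);
    end;
  end;
  SetLength(Result, N);     // trim to the number of bytes actually kept
end;
```

In the OnData handler this could be applied as S := StripZeroWidthUtf8(S); right after the Move call. Note that a three-byte sequence split across two OnData calls would not be caught by this sketch; handling that case would mean holding back the last one or two bytes of a buffer when they look like the start of such a sequence.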

Sincerely yours
Eugene Mayevski


