ZipReader for LARGE file - decompress in chunks?

#36538
Posted: 04/20/2016 23:18:35
by Brett Beattie (Standard support level)
Joined: 03/08/2011
Posts: 25

Hi,
I have a large file (16GB) that is compressed to a 1GB zip file, which I have downloaded to a memory stream using the SB SFTP toolset.

The compression method is Deflate64, so the standard .NET ZipArchive fails to process the file.

What I want to do is use a StreamReader on the decompression stream and read line by line to process the contents, rather than decompress the whole 16GB to a memory stream at once.

e.g. similar code using ZipArchive
Code

      using (System.IO.Compression.ZipArchive archive = new ZipArchive(inputMemStr))
      {
        foreach (ZipArchiveEntry entry in archive.Entries)
        {
          // entry.Open() returns a stream that decompresses as it is read
          using (var decompressionStream = entry.Open())
          using (StreamReader sr = new StreamReader(decompressionStream))
          {
            while (!sr.EndOfStream)
            {
              outline = sr.ReadLine();
              //process the data line
              ...
            }
          }
        }
      }


equivalent TElZipReader code would be:
Code
        TElZipReader zr = new TElZipReader();
        zr.Open(fs1);
        using (MemoryStream outStrm = new MemoryStream())
        {
          zr.Directory.get_Entries(0).ExtractionStream = outStrm;
          // Extract writes the whole decompressed entry into outStrm
          zr.Extract(zr.Directory.get_Entries(0), null);
          // rewind so the reader starts at the beginning of the extracted data
          outStrm.Position = 0;
          using (StreamReader sr = new StreamReader(outStrm))
          {
            while (!sr.EndOfStream)
            {
              outline = sr.ReadLine();
              //process the data line
              ...
            }
          }
        }


HOWEVER the ZipReader extract method
Code
zr.Extract(zr.Directory.get_Entries(0), null);

seems to decompress the whole file at once, and without it there is nothing in outStrm.

At the moment I see my best option is to decompress to a temporary file, which will need to be encrypted due to the nature of the data, and then process the encrypted file line by line.

I'd really like to be able to do this in memory without the use of a file. Any way this can be achieved? Or can it be added in future?
#36542
Posted: 04/21/2016 07:35:22
by Eugene Mayevski (EldoS Corp.)

Our components take a different approach from the one in .NET. In the .NET Framework, many components return a stream object that does the actual work on demand (i.e. when you read from it, it pulls the data from some external source). By contrast, SecureBlackbox components do the work directly when you ask them to. Thus a temporary file would be needed. You can extract the data to a memory stream (use the OnExtractionStreamNeeded event for this). Alternatively, you can implement your own class that passes the data written to it on for processing. This way you would emulate the .NET approach, but in a more "event-driven" way.
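
A minimal sketch of such a class is shown here, under stated assumptions. LineSinkStream and its callback are hypothetical names, not part of SecureBlackbox. The sketch also assumes that the extracting component only ever calls Write on the stream it is given; if it seeks or queries Length or Position, this would need to be adapted. UTF-8 text is assumed by default.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper (not a SecureBlackbox class): a write-only stream that
// splits whatever bytes are written to it into lines and passes each line
// to a callback, so the decompressed data never has to be held in full.
public sealed class LineSinkStream : Stream
{
    private readonly Action<string> _onLine;   // invoked once per complete line
    private readonly Decoder _decoder;         // keeps state for characters split across writes
    private readonly StringBuilder _pending = new StringBuilder(); // text after the last '\n'
    private char[] _chars = new char[4096];
    private bool _closed;

    public LineSinkStream(Action<string> onLine, Encoding encoding = null)
    {
        _onLine = onLine ?? throw new ArgumentNullException(nameof(onLine));
        _decoder = (encoding ?? Encoding.UTF8).GetDecoder();
    }

    public override bool CanRead => false;
    public override bool CanSeek => false;
    public override bool CanWrite => true;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }

    public override void Write(byte[] buffer, int offset, int count)
    {
        int needed = _decoder.GetCharCount(buffer, offset, count, false);
        if (needed > _chars.Length) _chars = new char[needed];
        int n = _decoder.GetChars(buffer, offset, count, _chars, 0, false);

        for (int i = 0; i < n; i++)
        {
            char c = _chars[i];
            if (c == '\n')
            {
                // Drop the '\r' of a "\r\n" pair, then hand the line over.
                if (_pending.Length > 0 && _pending[_pending.Length - 1] == '\r')
                    _pending.Length--;
                _onLine(_pending.ToString());
                _pending.Clear();
            }
            else
            {
                _pending.Append(c);
            }
        }
    }

    public override void Flush() { }

    protected override void Dispose(bool disposing)
    {
        if (disposing && !_closed)
        {
            _closed = true;
            // Emit a final line that was not terminated by a newline.
            if (_pending.Length > 0) _onLine(_pending.ToString());
        }
        base.Dispose(disposing);
    }

    public override int Read(byte[] buffer, int offset, int count) => throw new NotSupportedException();
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
}
```

An instance of this class would be assigned to ExtractionStream in place of the MemoryStream from the first post (or supplied from the OnExtractionStreamNeeded handler), and disposed after Extract returns so that a trailing unterminated line is delivered.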


Sincerely yours
Eugene Mayevski
#36547
Posted: 04/21/2016 20:26:36
by Brett Beattie (Standard support level)
Joined: 03/08/2011
Posts: 25

Ok,
I have just realised this isn't working.

I previously (mistakenly) was using an older version of SB (8.2.197) which worked perfectly with these files. It happily unzipped the file.

When I switched to my latest licensed version (10.0.233) I get the following error - 'Extensions parsing error'

I set the value of IgnoreArchiveErrors to true, but that seems to make no difference in this case. I can see from another topic that you say this is because SB is strict in its processing but that doesn't really help.
This is a large file simply compressed using Windows (Send To -> Compressed (Zipped) Folder).

(unfortunately 10.0.233 is in my production environment for the other packages used, so I cannot revert)

I also just downloaded the latest demo version of SB (14) and tried that, and I get the same error :-(

Is there any way I can work around that error?
#36550
Posted: 04/22/2016 05:32:51
by Eugene Mayevski (EldoS Corp.)

The architecture of the components has never changed, so if something worked in version 8 and stopped working in version 10, this is not due to the way SecureBlackbox implements streaming.

If version 14 exhibits the problem as well, then the issue lies in certain extensions in the file that are not handled (either because they are malformed or because of an error; I can't say which). To reproduce the issue and see what's wrong, we would need one of those files (it doesn't need to be huge; even an empty file would work if it triggers the error).

I must warn that while we can give a verdict on the cause of the issue, we will not be able to patch the old version. We will only be able to fix the issue in the upcoming SecureBlackbox 15.


Sincerely yours
Eugene Mayevski
#36572
Posted: 04/25/2016 18:57:49
by Brett Beattie (Standard support level)
Joined: 03/08/2011
Posts: 25

Thanks. I was hoping there was some flag or parameter that I might have missed to make this work :-(

I believe you need a file that is over 2GB for Windows to use Deflate64 to compress it.

All I can suggest is that you find a large multi-GB file, compress it in Windows (right click -> Send To -> Compressed (Zipped) Folder) and see if SB can extract it.
I can't see a way to force Windows to use Deflate64 on smaller files.

Regards
#36578
Posted: 04/26/2016 04:13:01
by Mykola Olshevsky (Basic support level)
Joined: 07/07/2005
Posts: 450

Hi,
It seems you are mixing up Deflate64, a compression algorithm (a slightly improved version of the original Deflate that compresses better in certain situations), and Zip64, the 64-bit extensions for handling files larger than 2 GB.
As far as I remember, Windows does not support Deflate64 but does support the Zip64 extensions. SecureBlackbox supports them as well, so something must be wrong in your particular case.

Regarding reading extracted data line by line - you can simply make a descendant of Stream, which handles .Write in the way you need.
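
To check which of the two (the Deflate64 compression method or the Zip64 extensions) a given archive actually uses, the first local file header can be inspected directly. The sketch below is only an illustration: ZipHeaderProbe is a hypothetical helper name, it looks at the first local header only (the central directory may carry additional Zip64 records), and it assumes the archive starts with a regular entry. In the ZIP format, compression method 8 is Deflate and 9 is Deflate64, and the Zip64 extended information extra field has header ID 0x0001.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: reads the first local file header of a ZIP stream.
public static class ZipHeaderProbe
{
    public static (ushort Method, bool HasZip64Extra) Inspect(Stream zip)
    {
        using (var r = new BinaryReader(zip, Encoding.ASCII, leaveOpen: true))
        {
            // Local file header signature "PK\x03\x04" (little-endian 0x04034b50).
            if (r.ReadUInt32() != 0x04034b50)
                throw new InvalidDataException("Not a ZIP local file header.");

            r.ReadUInt16();                  // version needed to extract
            r.ReadUInt16();                  // general purpose bit flag
            ushort method = r.ReadUInt16();  // 8 = Deflate, 9 = Deflate64
            r.ReadBytes(16);                 // mod time/date, CRC-32, compressed and uncompressed sizes
            ushort nameLen = r.ReadUInt16();
            ushort extraLen = r.ReadUInt16();
            r.ReadBytes(nameLen);            // file name
            byte[] extra = r.ReadBytes(extraLen);

            // The extra field is a sequence of (id, size, data) records.
            bool zip64 = false;
            int i = 0;
            while (i + 4 <= extra.Length)
            {
                int id = extra[i] | (extra[i + 1] << 8);
                int size = extra[i + 2] | (extra[i + 3] << 8);
                if (id == 0x0001) { zip64 = true; break; }
                i += 4 + size;
            }
            return (method, zip64);
        }
    }
}
```

Calling ZipHeaderProbe.Inspect on the downloaded stream (with its Position at 0) returns the method number and a flag; a result of method 9 with the flag set would mean the file uses both Deflate64 and Zip64.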
#36580
Posted: 04/26/2016 15:52:23
by Vsevolod Ievgiienko (EldoS Corp.)

Hello.

We've reproduced the problem and improved the code to handle the Zip64 extension correctly in the case of "big" files. The improvement will be included in the next SecureBlackbox build.
As of July 15, 2016 EldoS Corporation will operate as a division of /n software inc. For more information, please read the announcement.
