EldoS | Feel safer!

Software components for data protection, secure storage and transfer

CBFS FileMapper performance compared with raw access

Also by EldoS: CallbackDisk
Create virtual disks backed by memory or custom location, expose disk images as disks and more.
#31164
Posted: 10/28/2014 15:03:12
by Todd Gleason (Standard support level)
Joined: 09/11/2014
Posts: 21

Before I get started, let me say that I've already read these performance-related articles, so I know there's some background on performance concerns already:

https://www.eldos.com/forum/read.php?FID=13&TID=5271
https://www.eldos.com/cbfs/articles/7979.php
https://eldos.com/forum/read.php?PAGEN_1=1&FID=13&TID=2922#nav_start

In our specific case, we've taken the FileMapper, and extended it with some additional security and NLog-based logging. Then we've run side-by-side integration tests via MSTest comparing the performance with our FileMapper derivative, and raw Windows performance. The test platform is running Windows 7 64-bit. For maximum compatibility we are running with the 32-bit version of the CBFS5Net.dll.

In running these tests, we're seeing not only a lot of performance variation (with both FileMapper and raw access), but also we're seeing significantly larger average times for operations to complete.

One strange thing with this is that when I run these side-by-side tests against a local USB drive, I get very small differences between FileMapper and raw access. But coworkers running from our CI environment calling to a network share drive (which is more like the "real" use case) see larger variations and overall much larger times with FileMapper, despite the presence of filesystem caching.

So some example timings: 3 reads of a 41 MB file without FileMapper lasted 2.25 sec, 1.19 sec, and 2.05 sec, while with FileMapper it was 2.47 sec, 2.60 sec, 2.57 sec. So, average of 1.83 sec for raw and 2.55 sec for FileMapper (a 39% increase in time). We used separate files for FileMapper vs. the raw access to equalize the affect of any caching, and first we wrote the files using the respective technologies, so in all likelihood the files are cached at some level in the OS by the time we actually begin timing. When I look at that 2.60 sec figure I see:

0 time: Security checks, file info (CbFsGetFileInfo()), opening file (CbFsOpenFile())
0.063 sec: Delay until CbFsReadFileNative()
0.54 sec: Reading the 41 MB file (CbFsReadFileNative())
0.037 sec: Delay before CbFsSetFileAttributes()
0.02 sec: CbFsCloseFile()
1.94 sec: Mysterious delay afterward before test finishes, when (presumably) CBFS is shuttling the data to the test, and the test is receiving it and ending its timing checks

So here the question is: How can it take the rest of the so much system longer to shuttle the data in memory to the test, than it took to read it in the first place? Understandably, Windows has probably cached the data read inside CbFsReadFileNative(), but regardless, does this amount of time--even considering the context switching--seem correct, or not?

Here are some of the settings we're using at the moment:

ShortFileNameSupport = true;
UseFileCreationFlags = true;
// https://www.eldos.com/forum/read.php?FID=13&TID=843
// mentions "page size" and suggests that there is less overhead with
// larger ones, so we'll try 64K pages
ClusterSize = 512*128;
SerializeCallbacks = false;
ParallelProcessingAllowed = true;
MinWorkerThreadCount = 2;
MaxWorkerThreadCount = 16;
StorageCharacteristics = 0; // default is removable, which it isn't
// in order to log effectively per-user, this seems necessary
CallAllOpenCloseCallbacks = true;
// apparently a smaller cache size increases performance
CacheSize = 65536*4;
ProcessRestrictionsEnabled = true;
NonexistentFilesCacheEnabled = true;
FileCacheEnabled = false;
MetaDataCacheEnabled = true;
FileSystemName = "NTFS";

Some notes:
1. We need to log file accesses and associate them back to the relevant process and user, so I think we need to make sure CallAllOpenCloseCallbacks = true.
2. Several machines must run this service and may access the underlying same network files, so I think we need FileCacheEnabled = false to avoid stale data.
3. I wasn't sure about some of the parallelization flags, but I haven't noticed them making a difference here. These tests aren't running parallel anyway.
4. I don't see that changing the cluster size from 16K (what we originally defined) to 64K sped anything up. Not sure if it should be even larger.
5. I'm not sure about the CacheSize value. One would think a larger cache would be faster but I was told this smaller value (256K) performed better than the default (300 MB?)

So, bearing all that in mind, are we missing any ways we might boost performance? Does it seem like CBFS is performing adequately or like there are unexpected performance issues with it? Or is there a potential issue in how the test was conducted?
#31168
Posted: 10/29/2014 05:01:01
by Volodymyr Zinin (EldoS Corp.)

Quote
1.94 sec: Mysterious delay afterward before test finishes, when (presumably) CBFS is shuttling the data to the test, and the test is receiving it and ending its timing checks

CBFS just passes the obtained from the OnRead callback data to the system. Could you clarify what request or what action lasts 1.94 sec?

Quote
1. We need to log file accesses and associate them back to the relevant process and user, so I think we need to make sure CallAllOpenCloseCallbacks = true.

Yes.

Quote
2. Several machines must run this service and may access the underlying same network files, so I think we need FileCacheEnabled = false to avoid stale data.

Yes. Perhaps in a future version we will add an API to have a possibility to flush cached data.

Quote
3. I wasn't sure about some of the parallelization flags, but I haven't noticed them making a difference here. These tests aren't running parallel anyway.

It can increase throughput generally in the case if a file is being read in parallel from several processes/threads.

Quote
4. I don't see that changing the cluster size from 16K (what we originally defined) to 64K sped anything up. Not sure if it should be even larger.

This property as well as SectorSize can influence on behavior of some programs. So during read/write they will use data chunks that is multiple or equal to sector or cluster size.

Quote
5. I'm not sure about the CacheSize value. One would think a larger cache would be faster but I was told this smaller value (256K) performed better than the default (300 MB?)

It makes sense only when FileCacheEnabled is true.

Quote
So, bearing all that in mind, are we missing any ways we might boost performance? Does it seem like CBFS is performing adequately or like there are unexpected performance issues with it? Or is there a potential issue in how the test was conducted?

It seems your approach is right. There is nothing to add there.
#31178
Posted: 10/29/2014 08:59:02
by Todd Gleason (Standard support level)
Joined: 09/11/2014
Posts: 21

Quote
CBFS just passes the obtained from the OnRead callback data to the system. Could you clarify what request or what action lasts 1.94 sec?


It's a file read request from .NET. The only timing available to me is the difference between when CbFsCloseFile() ends (that's the final callback) and when the test's timed method completes. So I don't know what is happening in CBFS itself to marshal the data between the FileMapper's service and the .NET request.

On the test end, we were timing a call to this method:

Code
        public static byte[] ReadFile(string filename, byte[] expectedContents)
        {
            byte[] bytes = File.ReadAllBytes(filename);

            if (expectedContents != null)
                Assert.IsTrue(expectedContents.SequenceEqual(bytes), "Expected same data");
            WriteMsg("Read file:  " + filename);
            return bytes;
        }

I was actually passing in the "expectedContents" value so .SequenceEqual() was getting run. I'll take it out and get the test re-run without that, but that should be equal "overhead" on the client end compared with the non-FileMapper version. And the WriteMsg() method just does Debug.WriteLine() (which was inactive due to it being a Release build) and Console.WriteLine() (which was redirected by mstest).

Quote
It makes sense only when FileCacheEnabled is true.

I was wondering if that was the case.

Quote
This property as well as SectorSize can influence on behavior of some programs. So during read/write they will use data chunks that is multiple or equal to sector or cluster size.

Since we're already at 64K clusters, could we expect a meaningful speedup by increasing it further? SectorSize BTW is at the default.
#31206
Posted: 10/30/2014 08:36:39
by Volodymyr Zinin (EldoS Corp.)

Quote
toddgleason wrote:
Since we're already at 64K clusters, could we expect a meaningful speedup by increasing it further?

As I wrote above it depends on an application which performs file reading/writing. I.e. whether it takes into account the "cluster size" value. In the case it isn't some predefined chunk size is used. You can check it by using Process Monitor from sysinternals.com. Run it and set there a filter to show only operations for your virtual disk. Then start copying a file and see in the Process Monitor logs for the read/write requests what a chunk size they have. After it recreate the virtual disk with another cluster size and run the test again. Check if the read/write chunk size is changed or not.

Quote
toddgleason wrote:
On the test end, we were timing a call to this method: ....

As I understand the following call lasts about 2 sec
Code
byte[] bytes = File.ReadAllBytes(filename);

Is it so?
Based on the log you specified in the first post this function performs the following:
1. Opens the file (the CbFsGetFileInfo() and CbFsOpenFile() callbacks are called).
2. Performs reading (the OnRead() callback is called).
3. After the data is already in .NET code the file is closed (the OnClose callback is called).
So it seems the .NET code which does such 1.94 sec delay.
#31482
Posted: 11/21/2014 18:16:19
by Todd Gleason (Standard support level)
Joined: 09/11/2014
Posts: 21

Looking at this further with Process Monitor, I see something running ONLY when accessing a file via CBFS, but not when File.ReadAllBytes() is directly called on a physical file.

Process Monitor showed:

Code
[time]   [service].exe   3024   SetBasicInformationFile   [physical file]   SUCCESS   CreationTime: 0, LastAccessTime: 0, LastWriteTime: 0, ChangeTime: 0, FileAttributes: AN

Actually, this is wrapped by CreateFile/CloseFile in ProcMon.

We logged a call to CbFsSetFileAttributes() at that same time.

I didn't see this in ProcMon when we did the same thing directly (no CBFS), which leaves me wondering if CBFS called OnSetFileAttributes unnecessarily. I can attempt to optimize our handler (probably putting the activity into a background thread and offering up cached data should a request be made for it in the meantime), but I'd rather know if there's something extra going on, and whether it's anything we have control over.
#31495
Posted: 11/24/2014 02:28:05
by Volodymyr Zinin (EldoS Corp.)

The differences can be because the CBFS disk has different characteristics than the physical drive you compare with. For example different file system name, sector size, whether the disk is removable or not. Also CBFS in some cases can handle requests without calling the user callbacks or it calls them later.
#31522
Posted: 11/26/2014 11:31:46
by Todd Gleason (Standard support level)
Joined: 09/11/2014
Posts: 21

Here is another case of an unexplained delay in the callbacks:

Code
2014-11-26 08:06:43.0543|Trace|18|CbFsReadFileNative():  \path\file.ext:  Bytes to read: 84934656:  start
2014-11-26 08:06:46.1517|Trace|18|CbFsReadFileNative():  \path\file.ext:  Bytes read:  84934656:  end
2014-11-26 08:06:46.1837|Trace|33|CbFsSetFileAttributes():  \path\file.ext;  CT:  00010101 00:00:00;  LAT:  20141126 08:06:46; LWT:  20141126 07:52:15;  Attr:  32:  start
2014-11-26 08:06:46.1837|Trace|33|CbFsSetFileAttributes():  \path\file.ext:  end
2014-11-26 08:06:46.1837|Trace|18|CbFsCloseFile():  Y:\folder\sharedsubfolder\path\file.ext:  start
2014-11-26 08:06:46.1837|Trace|18|CbFsCloseFile():  Y:\folder\sharedsubfolder\path\file.ext:  end
2014-11-26 08:07:16.5909|Trace|33|CbFsGetFileInfo():  \path\file.ext:  start

Note that after the file is read and successfully closed, there is a full 30 second delay until calling GetFileInfo(). I've never seen this delay with native file access, but occasionally I have seen it with CBFS. 30 seconds is a typical timeout value, so I wonder what could possibly be timing out in CBFS at this time?
#31523
Posted: 11/26/2014 11:50:52
by Eugene Mayevski (EldoS Corp.)

As I re-read the complete thread, I see that you have made certain changes and now the code behaves strangely.

The first thing to do is take an unmodified Mapper sample and measure performance with it. Next start adding modifications one by one and see which change causes the problem.


Sincerely yours
Eugene Mayevski
#31524
Posted: 11/26/2014 12:00:34
by Todd Gleason (Standard support level)
Joined: 09/11/2014
Posts: 21

I've actually seen this delay from the beginning, but it's rare enough that I haven't readily caught it in a log.
#31525
Posted: 11/26/2014 12:02:35
by Eugene Mayevski (EldoS Corp.)

Try using FileStream and its Read() method instead of File.ReadAllBytes. You are the first person in 7.5 years to report this kind of issue so the most likely reason is the way you use CBFS. Which means that we need to narrow down the problem to certain factor or combination of factors that lead to the delay.


Sincerely yours
Eugene Mayevski
Also by EldoS: MsgConnect
Cross-platform protocol-independent communication framework for building peer-to-peer and client-server applications and middleware components.

Reply

Statistics

Topic viewed 8539 times

Number of guests: 2, registered members: 0, in total hidden: 0




|

Back to top

As of July 15, 2016 EldoS Corporation will operate as a division of /n software inc. For more information, please read the announcement.

Got it!