use of NotifyDirectoryChange()

#33899
Posted: 07/06/2015 13:06:20
by Todd Gleason (Standard support level)
Joined: 09/11/2014
Posts: 21

I've seen several threads on the forum, mostly from 2008-2012, that discuss NotifyDirectoryChange(). The API appears to have evolved over that time, both in terms of blocking vs. non-blocking behavior and in whether it may be called inside or outside callbacks.

In our system we use a FileMapper derivative to access network-shared files, and we must work properly with concurrent accesses. One thing we have done to this end is to set both FileCacheEnabled = false and MetaDataCacheEnabled = false.

Here is a valid scenario that we must then contend with:

1. Machine A (running CBFS) creates a file
2. Machine B (possibly also running CBFS) immediately opens and rewrites the file to be larger
3. Machine A immediately reads the file back in

According to what I've read about NotifyDirectoryChange(), we need to have a way to make sure that Machine A knows about the change made in step 2, before reading the file in during step 3. Unfortunately, using something like a FileSystemWatcher is not reliable, and not always prompt. Using something like timer-based detection is definitely not prompt. These sorts of techniques appear necessary to support the use of a FileSystemWatcher in the client's code, but they are at odds with a smooth experience simply opening files in a concurrent environment.

What this leaves appears to be executing the notifications during CBFS operations. For instance:

We can execute NotifyDirectoryChange() during file open, but this means also failing the file open, which means the client sees an error that seems like it ought to be avoidable.

We can execute NotifyDirectoryChange() during the CbFsGetFileInfo() that occurs at the start of step 3, but when we do this, we get an IOException in the client, with the message "The volume for a file has been externally altered so that the opened file is no longer valid." Oddly, this occurs even though we sent the notification in CbFsGetFileInfo(), and the file was ostensibly closed at that time. (At least, the File Mapper didn't have a handle.)

There are other problems as well:

1. If we close from one location and then immediately open from another we can get into failures.
2. We have seen cases where we both called NotifyDirectoryChange() from CbFsGetFileInfo() and returned the new size. The open did not fail, yet the subsequent open and read calls only read out according to the old size of the file.

I can reproduce #1 in using the mapper sample, as well as the IOException mentioned earlier, but not #2 (so far at least).

What is the current guidance for handling these scenarios? Should I open a ticket and send the code?
#33900
Posted: 07/06/2015 13:23:31
by Eugene Mayevski (EldoS Corp.)

I think the role of NotifyDirectoryChange() is misunderstood in your case.

This method is used to notify the OS about a change in the backend data that was made not via the virtual disk but by other means.

When you bring shared network disks into the equation, you make things much more complicated, and NotifyDirectoryChange() is not a solution at all here.

Can you please describe, or maybe even draw, the architecture of your system, i.e. which of the systems shares the disk and where CBFS works? Also, please describe where the changes originate and where they must be tracked.

It is vital that you present as detailed a scheme as possible; then I'll be able to suggest the proper solution. One thing I can say is that most likely one function call won't be enough.


Sincerely yours
Eugene Mayevski
#33902
Posted: 07/06/2015 15:30:28
by Todd Gleason (Standard support level)
Joined: 09/11/2014
Posts: 21

The attached diagram shows what the machine boundaries look like in the system. A given machine may be either a desktop machine or a server that supports multiple desktop clients. Each of them individually runs an instance of our File Mapper derivative. They each map to the same network share.

If you take out the File Mappers, you can see how the filesystem is used without it present.

The fundamental requirement is: make the system work the same as it did before introducing the File Mapper into the system.


#33903
Posted: 07/06/2015 15:35:32
by Todd Gleason (Standard support level)
Joined: 09/11/2014
Posts: 21

This diagram shows the concurrency behavior a bit. The curved red line shows the logical path of data through the system, back and forth between two machines, and the activities occurring on those machines.

The challenge is to make it so that when the file is re-read, it is fully intact and correct based on the changes made on the second machine, and it is extremely undesirable to fail the open request the first time after the second machine wrote the file.

In addition, doing this should not break the functionality that is provided by installing a file system watcher: The client should be able to install one and when files are added/changed/deleted, it should be able to see these, the same as when pointed at a physical path.


#33904
Posted: 07/06/2015 15:51:27
by Eugene Mayevski (EldoS Corp.)

Thank you for the diagram. As I suspected ...

There's no universal way for Machine 2 or Machine 3 to reliably track the changes made by Machine 1 on the network share without active participation of the network server itself or (only in case when the changes are made exclusively by your application) of your application.

If you can run a monitor on the network share server, then you can keep a queue of changes there (this is essentially what FileSystemWatcher does, but it tends to lose events and has other shortcomings), and your clients can reliably ask the server about file change events.
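To make the "queue of changes" idea a bit more concrete, here is a minimal sketch in Python. Everything in it (the ChangeEvent and ChangeJournal names, the sequence numbers, the capacity limit) is a hypothetical illustration and not part of CBFS or of any EldoS API. It only shows the bookkeeping: the server records each change under an increasing sequence number, and a client asks for everything newer than the last number it has seen.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeEvent:
    seq: int    # monotonically increasing sequence number
    path: str   # path relative to the share root
    kind: str   # "created", "changed" or "deleted"


class ChangeJournal:
    """Hypothetical server-side change queue (single-threaded sketch)."""

    def __init__(self, capacity=1000):
        # A bounded deque silently drops the oldest events when full.
        self._events = deque(maxlen=capacity)
        self._next_seq = 1

    def record(self, path, kind):
        """Append one change and return the stored event."""
        event = ChangeEvent(self._next_seq, path, kind)
        self._events.append(event)
        self._next_seq += 1
        return event

    def since(self, last_seq):
        """Return events newer than last_seq.

        Raises LookupError when events the client has not seen were
        already dropped, so the client knows it must do a full rescan.
        """
        if self._events and self._events[0].seq > last_seq + 1:
            raise LookupError("journal overflow: rescan required")
        return [e for e in self._events if e.seq > last_seq]
```

The point of the sequence number is that a client can tell the difference between "nothing changed" and "I missed something", which is exactly what a lossy watcher cannot tell it. A real service would also need locking and a transport, both of which are left out here.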

If you can't run such a monitor but the changes are made only by your application, then you can broadcast the notifications somehow.

Finally, if you can't do any of these, you can create semaphore files (just files with special and/or unique names) which will signal from ClientX to ClientY that the change has been made.
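As a rough illustration of the semaphore-file approach, the sketch below uses Python and plain file operations only. The ".changed" suffix, the function names and the single-reader assumption are all hypothetical choices made for the example; nothing here is a CBFS call.

```python
import os
import uuid

MARKER_SUFFIX = ".changed"  # hypothetical naming convention


def signal_change(data_path):
    """Writer side: drop a uniquely named marker file next to data_path."""
    marker = f"{data_path}.{uuid.uuid4().hex}{MARKER_SUFFIX}"
    # O_EXCL makes creation fail instead of silently reusing a marker.
    fd = os.open(marker, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    os.close(fd)
    return marker


def consume_changes(data_path):
    """Reader side: delete the markers left for data_path, return the count."""
    directory = os.path.dirname(data_path) or "."
    prefix = os.path.basename(data_path) + "."
    count = 0
    for name in os.listdir(directory):
        if name.startswith(prefix) and name.endswith(MARKER_SUFFIX):
            try:
                os.remove(os.path.join(directory, name))
                count += 1
            except FileNotFoundError:
                pass  # someone else removed this marker first
    return count
```

A non-zero result from consume_changes() would be the point at which the reader treats the file as changed. Because the reader deletes the markers, this version assumes one reader per file; with several readers, each would need its own marker or its own record of which marker names it has already seen.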

NotifyDirectoryChange() can be used as the last step in notifying the OS about changes in the remote filesystem, but before you employ NotifyDirectoryChange(), you need to properly detect the change. And this is, as described, quite a complicated task by itself.


Sincerely yours
Eugene Mayevski
#33929
Posted: 07/07/2015 13:12:29
by Todd Gleason (Standard support level)
Joined: 09/11/2014
Posts: 21

I agree, there's no easy, reliable, immediate way to detect changes in this environment. I have a solution in place that does both FileSystemWatcher and backup polling, but it's tricky. Still, something like this is needed to handle changes where there could be an external FileSystemWatcher watching a directory, so it seems necessary.
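For readers who have not built the polling half of such a solution, here is a minimal sketch of the idea in Python. It is not our production code. The function names are made up for the example, and it assumes that comparing file size and modification time between two directory listings is good enough to spot a change.

```python
import os


def snapshot(directory):
    """Map each file name in directory to its (size, mtime_ns)."""
    result = {}
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file():
                st = entry.stat()
                result[entry.name] = (st.st_size, st.st_mtime_ns)
    return result


def diff(old, new):
    """Compare two snapshots; return (added, removed, changed) name lists."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(n for n in old.keys() & new.keys() if old[n] != new[n])
    return added, removed, changed
```

Calling snapshot() on a timer and feeding consecutive results to diff() gives the backup path. It is only as prompt as the timer interval, and on a network share the size and timestamp returned by the server may themselves lag behind, which is part of what makes this tricky.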

For the basic concurrency needs, though, it would be better to set this approach aside and design around the change detection snafu, and the only way I can think of is to do it "just in time". To me this means that calling NotifyDirectoryChange() from CbFsGetFileInfo() should NOT result in an IOException on the client, but rather should make certain that the new data (such as file length) provided through CbFsGetFileInfo() is honored and that a subsequent open will open and read the entire file (assuming that's how the client coded it).
#33930
Posted: 07/07/2015 14:08:07
by Eugene Mayevski (EldoS Corp.)

Quote
toddgleason wrote:
To me this means that calling NotifyDirectoryChange() from CbFsGetFileInfo() should NOT result in an IOException on the client, but rather should make certain that the new data (such as file length) provided through CbFsGetFileInfo() is honored and that a subsequent open will open and read the entire file (assuming that's how the client coded it).


To answer this particular part -- NotifyDirectoryChange(), when called from a callback / event handler, works asynchronously, postponing execution until after the callback returns. This is probably not what you expect, and here we have a GIGO problem: you are using it in a way it was not designed to work, so you get an error with an unidentified cause.

In general there are usually one or two ways to do things right, and then there are various hacks with different levels of reliability. Your case seems to be hacking, and it's hard to answer the question "how do I correctly use X in code based on hacks and shortcomings". Maybe it is possible to get everything working in a hacky way, but the task is much more complicated than when everything is done "right".

So let's get back to understanding what NotifyDirectoryChange() does and what it does not. It does not (and most likely never will) refresh anything when you call it from a callback. The reason is that doing this would lead straight to deadlocks. Your next actions and changes should be based on accepting this fact and reviewing your options with regard to CBFS, and NotifyDirectoryChange() in particular.

I don't know if the above helps you or not (hope it does, even if just a bit) but right now the things look like they need to be re-thought and changed.


Sincerely yours
Eugene Mayevski
#33952
Posted: 07/08/2015 11:27:49
by Todd Gleason (Standard support level)
Joined: 09/11/2014
Posts: 21

It sounds like what you are saying is that we shouldn't call NotifyDirectoryChange() from CbFsGetFileInfo(). CbFsGetFileInfo() however is the first relatively reliable opportunity for us to know that the file has changed.

So if we can't call NotifyDirectoryChange() from CbFsGetFileInfo(), then what you would hope is that CBFS would respect the new file size reported by CbFsGetFileInfo(). In our testing we have seen this to not always be the case. Is there a way of telling why we sometimes don't get CbFsReadFileNative() calls to the new end of file when the client code is requesting the entire file? Is there some condition under which CBFS would ignore the newly reported file size?
#33953
Posted: 07/08/2015 12:01:58
by Eugene Mayevski (EldoS Corp.)

Quote
toddgleason wrote:
It sounds like what you are saying is that we shouldn't call NotifyDirectoryChange() from CbFsGetFileInfo(). CbFsGetFileInfo() however is the first relatively reliable opportunity for us to know that the file has changed.


CbFsGetFileInfo() just won't be called if the OS doesn't know beforehand that the file has been changed. You need to find out somehow that the file has been changed; then you call NotifyDirectoryChange(), and this call causes the OS to request file information.

Quote
toddgleason wrote:
Is there some condition under which CBFS would ignore the newly reported file size?


The file size is simply not being reported correctly, or not at the right time.

Suppose the OS knows the file is 1000 bytes. Some process X has opened the file knowing the file is 1000 bytes (the OS knows this and the process is supposedly aware of this fact too). Now some other process on a different computer has appended to the file and made it grow to 1500 bytes. There is *no* way your process X would be able to pick up the extra 500 bytes without reopening the file.
The reason is that if you somehow learn that the file size has grown and call NotifyDirectoryChange() to report the file change, the file handle that X holds will be *invalidated* by the OS. And if you don't call NotifyDirectoryChange(), the OS won't know that the file has grown.

In theory, if process X knows about the possibility of an external change, it can request the file size from the OS using SeekFile(0, FROM_END), and the filesystem would then report the new file size. However, such a call must be explicitly coded into process X.
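To show what "explicitly coded" means, here is a minimal sketch in Python, where the counterpart of the SeekFile(0, FROM_END) shorthand is seek(0, os.SEEK_END). The helper name current_size() is made up for the example. The sketch only demonstrates the client-side call on an ordinary file; whether the refreshed size actually comes back through a particular virtual or network filesystem is not something it can prove.

```python
import os


def current_size(handle):
    """Return the file size as seen through an already open handle.

    Seeks to the end to ask the filesystem for the current length,
    then restores the original position so the caller is undisturbed.
    """
    position = handle.tell()
    size = handle.seek(0, os.SEEK_END)  # returns the new absolute offset
    handle.seek(position)
    return size
```

A process that expects external appends would call current_size() before deciding how much to read, instead of trusting the length it saw when it opened the file.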


Sincerely yours
Eugene Mayevski
#33958
Posted: 07/08/2015 15:04:40
by Todd Gleason (Standard support level)
Joined: 09/11/2014
Posts: 21

The scenario described is not what I'm doing. Let me repeat the original scenario, with some more details, and adjusted for how I'm simulating a multi-machine test using a single machine:

1. Client (Machine A) creates a file using a CBFS path. It writes 16K to the file, then closes the file.
2. Same client (simulating Machine B) immediately opens the same file but using the physical path to the file. Client appends 16K more to the file, for a total of 32K, and closes the file.
3. Same client (acting again in the role of Machine A) immediately reads the file back in using the CBFS path. It sees only 16KB.

In step #3, I look in our logs and I see a call to CbFsGetFileInfo(), which returns that the file is 32 KB. Right after this, CbFsOpenFile() is called, and then there are several calls to CbFsReadFileNative() which read out 16K only.

This behavior is not reproducible directly using the mapper sample, but the mapper sample has other problems; since CbFsCloseFile is implemented rather than CbFsCleanupFile(), you must wait as much as 2 seconds in order for the file to actually close in step 1 so that step 2 will succeed. This delay is enough that the reverse round trip succeeds (CBFS appears to honor the size given when this delay is present).

I experimented with delays in our system when switching between the CBFS path and the physical path of the file. With running 200 trials, it appears that sleeping for 100 ms - 900 ms will fail some of the time. Sleeping for 1-2 seconds always succeeded.
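For anyone who wants to repeat this kind of measurement, a trial loop along the following lines is enough. This is a simplified Python sketch rather than our actual test code: the function names are made up, write_path stands for the CBFS path and append_path for the physical path to the same file, and the same delay is used at both switches between the two paths.

```python
import os
import time

CHUNK = 16 * 1024  # 16K per write, as in the scenario above


def run_trial(write_path, append_path, delay_s):
    """One round trip; returns True when the final read sees both chunks."""
    with open(write_path, "wb") as f:      # step 1: create via first path
        f.write(b"a" * CHUNK)
    time.sleep(delay_s)
    with open(append_path, "ab") as f:     # step 2: append via second path
        f.write(b"b" * CHUNK)
    time.sleep(delay_s)
    with open(write_path, "rb") as f:      # step 3: read back via first path
        return len(f.read()) == 2 * CHUNK


def failure_rate(write_path, append_path, delay_s, trials=200):
    """Fraction of trials in which the read-back came up short."""
    failures = sum(
        not run_trial(write_path, append_path, delay_s) for _ in range(trials)
    )
    return failures / trials
```

Running failure_rate() over a range of delay_s values gives a picture of where the round trip starts to succeed reliably.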

Then I tried moving these sleeps into the File Mapper. Unfortunately it wasn't entirely successful. Regardless of whether it sleeps for 1-4 seconds inside the File Mapper, the client code still needs to sleep for at least 500 ms in order to reliably succeed.

This behavior suggests to me that there is some timing-dependent code inside CBFS itself, some of which occurs before CbFsGetFileInfo() is called.