SolFS Storage overhead

#1764
Posted: 12/01/2006 02:19:28
by Jure Bogataj (Priority Standard support level)
Joined: 05/23/2006
Posts: 35

Hello!

I have a question about SolFS overhead when storing files. I stored one folder inside a SolFS storage (it contains 3729 directories with one file in each, so the storage also holds 3729 files). The files totalled 93MB on NTFS, their size on disk was 100MB, and the SolFS storage now takes 145MB of disk space. I used PageSize=4096, and the average size of a file stored in SolFS is 25KB.
Why does the SolFS storage take 45% more space than the files took on disk before?
How can I minimize the overhead in SolFS storage files?

Thank you!

Best regards,
Jure
#1766
Posted: 12/01/2006 05:40:41
by Alexander Plas (EldoS Corp.)

Hello

Some calculations for your storage: every directory takes at least 2 pages, and every file takes 1 additional page for file information such as modification/access times. For your storage the overhead should be 3729*3*4K = 44748K, or about 44M, which is close to your figures.

Ways to decrease overhead:
1) changing structure of files in storage to avoid directories with one file inside
2) decreasing page size
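
The rule of thumb above can be sketched in a few lines of Python. This is only a rough illustration: the function name and its parameters are made up for this example, and the per-directory and per-file page counts are the approximate figures quoted in the reply, not exact values from the SolFS format.

```python
def rough_overhead_bytes(num_dirs, num_files, page_size,
                         pages_per_dir=2, pages_per_file=1):
    """Rule-of-thumb overhead: a fixed number of pages per directory and per file."""
    return (num_dirs * pages_per_dir + num_files * pages_per_file) * page_size


# 3729 directories with one file each, 4096-byte pages (the case discussed above).
overhead_4k = rough_overhead_bytes(3729, 3729, 4096)
print(overhead_4k // 1024)  # 44748 (KB), i.e. roughly 44M

# The same structure with 512-byte pages, to see the effect of a smaller page size.
overhead_512 = rough_overhead_bytes(3729, 3729, 512)
print(overhead_512 / 1024)  # 5593.5 (KB)
```

Under this rule of thumb the overhead scales linearly with the page size, which is why both suggestions (fewer near-empty directories, smaller pages) reduce it.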
#1770
Posted: 12/01/2006 07:26:01
by Jure Bogataj (Priority Standard support level)
Joined: 05/23/2006
Posts: 35

One more question:

How does decreasing the page size affect the overall performance of SolFS storage? Does it show when writing files to or reading files from the storage? Is it slower with e.g. 512 bytes/page than with 4096 bytes/page?

Thank you!

Best regards,
Jure
#1771
Posted: 12/01/2006 08:36:29
by Alexander Plas (EldoS Corp.)

For your storage structure I would expect about a 20% slowdown after decreasing the page size from 4K to 512 bytes. You may want to prepare a small test application to measure the speed for different page sizes with your programming language, compiler, data access strategy etc.
#1780
Posted: 12/05/2006 06:57:30
by Jure Bogataj (Priority Standard support level)
Joined: 05/23/2006
Posts: 35

Hello!

I've done a little testing now and here are the facts:
- I've created 133 SolFS storage files
- Total size (properties on FileSystem) is 43,20GB
- Total size of files inside the SolFS storage files is 38,11GB
- the calculated overhead is 5,09GB
- PageSize for all SolFS storages is 512 bytes
- number of folders inside all SolFS storages is 90.155
- number of all files inside is 1.269.499
- length of each directory name is 36 characters
- length of each filename is 21 characters (extension included)

By the above calculation (1 page per folder and 2 pages per file; how can I determine the number of pages used by folders and files, and which parameters does it depend on?) the overhead should be approx. 1,25GB, but it is 5,09GB. How come? Does the SolFS storage file itself need a number of pages? How do transactions affect the size of a SolFS storage file (if at all)? The overhead in this case is 13,4%, which is a lot (we will have 1TB of storage, so we can fill it with only approx. 886GB of data; the rest goes to overhead).

Thank you!

Best regards,
Jure
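
For reference, the figures in this post can be reproduced with a short Python check. It is a sketch under two assumptions that the post does not state explicitly: "GB" is read as 2^30 bytes, and the 13,4% is the overhead relative to the stored data. The variable names are chosen for this example only.

```python
PAGE_SIZE = 512
FOLDERS = 90_155
FILES = 1_269_499
GIB = 1024 ** 3  # assumption: "GB" in the post means 2**30 bytes

# Estimate used in the post: 1 page per folder and 2 pages per file.
estimated_pages = FOLDERS * 1 + FILES * 2
estimated_overhead = estimated_pages * PAGE_SIZE / GIB

# Measured values quoted in the post.
total_size = 43.20
data_size = 38.11
measured_overhead = total_size - data_size
overhead_ratio = measured_overhead / data_size

print(f"{estimated_overhead:.2f}")  # 1.25
print(f"{measured_overhead:.2f}")   # 5.09
print(f"{overhead_ratio:.1%}")      # 13.4%
```

The gap between the estimated 1,25GB and the measured 5,09GB is what the question is about: a fixed per-folder and per-file page count does not account for it.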
#1807
Posted: 12/08/2006 04:06:21
by Alexander Plas (EldoS Corp.)

Here is a short description of the SolFS on-disk format (a full description will be available on the site soon).

1) Every page in a SolFS storage has a 32-byte header with internal page attributes and information.
2) Every (PageSize / 2)-th page in the storage is used for internal purposes. For example, in a storage with 512-byte pages, every 256th page stores internal information.
3) Every file or directory takes at least one page for attributes, compression/encryption, user-defined tags and so on. This page is called the BasePage.
4) A non-empty directory takes additional pages to store file information. File information is a 64-byte structure that holds internal information and the first 22 characters of the file name. If the file name is longer than 22 characters, the remaining characters are stored on additional pages. For example, a 512-byte page can store (512 - 32) / 64 = 7 file or directory entries (7.5, rounded down to whole entries).
5) If the file data is smaller than PageSize - 96 bytes, it is stored on the BasePage. Otherwise additional pages are allocated for the file data. If the file data takes more than 16 pages, additional pages are allocated to store the numbers of the data pages (4 bytes per number). For example, in a storage with 512-byte pages, a file with 83640 bytes of data takes 83640 / (512 - 32) = 175 pages for data and 175 * 4 / (512 - 32) = 2 index pages. Note: all page counts in these calculations are rounded up, because a partially filled page still occupies a whole page.

I hope this description helps you understand the storage overhead for your data and calculate the best PageSize value.
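
The five points above can be turned into a small page-count estimator. The following Python sketch is only an interpretation of this description, not SolFS code: all function and constant names are invented here, the spill-over of file names longer than 22 characters is not modelled, and the internal pages are assumed to sit at every (PageSize / 2)-th position in the storage.

```python
PAGE_HEADER = 32       # bytes of header in every page (point 1)
ENTRY_SIZE = 64        # bytes per file information structure (point 4)
INLINE_RESERVE = 96    # data below PageSize - 96 bytes fits on the BasePage (point 5)
DIRECT_PAGES = 16      # more data pages than this require index pages (point 5)
PAGE_NUMBER_SIZE = 4   # bytes per stored data page number (point 5)


def ceil_div(a, b):
    """Integer division rounded up (partially filled pages count as whole pages)."""
    return -(-a // b)


def file_pages(data_size, page_size):
    """Pages used by one file: BasePage + data pages + index pages."""
    if data_size < page_size - INLINE_RESERVE:
        return 1  # small files are stored on the BasePage itself
    payload = page_size - PAGE_HEADER
    data_pages = ceil_div(data_size, payload)
    index_pages = 0
    if data_pages > DIRECT_PAGES:
        index_pages = ceil_div(data_pages * PAGE_NUMBER_SIZE, payload)
    return 1 + data_pages + index_pages


def directory_pages(entry_count, page_size):
    """Pages used by one directory: BasePage + pages holding 64-byte entries."""
    entries_per_page = (page_size - PAGE_HEADER) // ENTRY_SIZE
    return 1 + ceil_div(entry_count, entries_per_page)


def with_internal_pages(pages, page_size):
    """Add internal pages, assuming one at every (page_size // 2)-th position."""
    usable_per_group = page_size // 2 - 1
    return pages + pages // usable_per_group


# The example from point 5: 175 data pages + 2 index pages + 1 BasePage.
print(file_pages(83640, 512))    # 178
# A directory holding a single file with 4096-byte pages.
print(directory_pages(1, 4096))  # 2
```

Summing `file_pages` and `directory_pages` over a real directory tree for several candidate page sizes, and passing the total through `with_internal_pages`, gives a rough way to compare PageSize values before creating the storage.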
#1808
Posted: 12/08/2006 04:21:56
by Eugene Mayevski (EldoS Corp.)

BTW, you should check the efficiency of SolFS with such large sizes. While SolFS is suitable for holding the requested amounts of data, certain usage scenarios can affect that suitability. For example, if you write 2 files of 500 GB each, working with these 2 files will be *extremely* slow.


Sincerely yours
Eugene Mayevski
#1811
Posted: 12/08/2006 04:37:02
by Jure Bogataj (Priority Standard support level)
Joined: 05/23/2006
Posts: 35

First of all, thank you for this in-depth explanation. Now it will be easier to choose the right page size. I just want to ask one more question based on your last remark (storing 2x500GB files inside a storage). Our average file size inside SolFS storage is somewhere between 15K and 100K (sometimes 1MB-10MB, but rarely). We are storing primarily TIFF files, and the reason for using SolFS is that the number of files is so high that we cannot back up all the files in a reasonable time. So does this SolFS performance issue also apply to a lot of small files (as opposed to a few large files)?

Thank you!

Best regards,
Jure
#1816
Posted: 12/08/2006 05:16:19
by Eugene Mayevski (EldoS Corp.)

It makes sense to arrange the files into subfolders (for example, create subfolders named after the first two or three letters of the file names). Such structuring will speed up file search and open operations to a certain extent. The problem with huge files is that a file is stored sequentially, i.e. when you seek to the end of the file, the whole page chain is read. For small files this is not a big problem. For files over 1 GB this becomes a bit slow. But SolFS 3 speeds up this operation too.


Sincerely yours
Eugene Mayevski
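
The subfolder suggestion could look roughly like this in Python. It is a minimal sketch: the function name, the lower-casing of the prefix, the "_" fallback and the "/" path separator are all assumptions made for this example, not part of any SolFS API.

```python
from pathlib import PurePosixPath


def bucketed_path(filename, prefix_len=2):
    """Place a file in a subfolder named after the first letters of its name."""
    # Lower-case the prefix so "Scan..." and "scan..." share one subfolder;
    # fall back to "_" if the name is empty.
    prefix = filename[:prefix_len].lower() or "_"
    return PurePosixPath(prefix) / filename


print(bucketed_path("invoice_0001.tif"))          # in/invoice_0001.tif
print(bucketed_path("Scan42.tif", prefix_len=3))  # sca/Scan42.tif
```

Since every directory takes at least one page of its own according to the format description earlier in this thread, a short prefix keeps the number of subfolders small while still spreading the files out.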
#1817
Posted: 12/08/2006 07:32:59
by Jure Bogataj (Priority Standard support level)
Joined: 05/23/2006
Posts: 35

As I understand it, SolFS 3 is already available as a release version, right?
Or are there any issues to resolve with SolFS 3 before you actually deploy it as a release version?

Thank you!

Best regards,
Jure

As of July 15, 2016 EldoS Corporation will operate as a division of /n software inc. For more information, please read the announcement.
