
Auto Organize Moves Files Mid-Transfer in Linux


ryan.brownell


ryan.brownell

I have Emby running on a Synology NAS.

I have a folder that Auto Organize watches which is shared on the network via SMB. Often (but not always), when I am moving a file into that network share from another computer, Auto Organize grabs the file and moves it before it is done transferring. This results in partial/corrupt video files making it into my library.

I believe this is occurring because Linux generally does not lock files while they are in use. I never had this issue when I was doing this on my Windows desktop computer.

I tried looking for a setting like "how old a file must be before organizing" or "only organize during certain times of the day" in hopes that I could use something of that nature to reduce the occurrence, but I can't seem to find anything that would help with this.

Any assistance would be greatly appreciated!

Thanks!

-Ryan


Hi, instead of using SMB paths, you could try mounting those network paths to a local path on the Synology, and then use the local path within Emby. That may give you better results.


ryan.brownell

Hello Luke;

Thanks for your reply.

Sorry I wasn't clear enough in my original explanation. I think that is how my current system is already set up. The folder that Auto Organize runs on is local to the Synology NAS. This folder is shared with other systems via SMB. The system that is transferring the data into the Auto Organize folder is also Linux.

Or is it the reverse that you're suggesting? I don't think Synology offers the option to mount network shares.

Thanks again!

-Ryan


Hmm, I don't think auto-organize has this option. This may not be a perfect solution, but you could look at changing the schedule of the organization task to a time when you know there won't be active transfers.

@chef may have other ideas.


chef

In the expanded version of auto organize, it checks to see if the file is in use prior to trying to move or copy.

It will check the file size, then wait and check again. Any increase in the file size, and auto organize leaves the file alone until the next time the task runs. 
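In rough outline, that check works something like the sketch below. This is a simplified illustration, not the plugin's actual code; the method name and the delay value are made up for the example.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class TransferCheck
{
    // Reads the file size twice with a short delay in between.
    // Any growth (or a locked file) means the transfer is still
    // in progress, so auto organize leaves the file alone until
    // the next time the task runs.
    public static async Task<bool> IsStillTransferring(string path, int delayMs = 2000)
    {
        long firstSize;
        try
        {
            firstSize = new FileInfo(path).Length;
        }
        catch (IOException)
        {
            // Locked by the copying process: treat as still in use.
            return true;
        }

        await Task.Delay(delayMs);

        return new FileInfo(path).Length != firstSize;
    }
}
```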

I had noticed some files were in a state of transition when being accessed by auto organize.

This thread here.

Forgive me if I'm absent for a couple of days. I'm recovering from that famous nasty virus currently.

 

 


ryan.brownell

Thank you both for your help!

@chef What is the delay between file size checks? I tested this further by triggering the plugin to organize manually while I was copying a large series into the folder; it didn't seem to wait, and it did indeed cut off a video or two during this process. Is it possible to configure the delay between file size checks? When it retries later, does it perform this same check again or does it go straight to moving files?

My Synology NAS is using the BTRFS file system and, in using it, I think I've encountered delays in the file system's ability to report the size of a file. That is, I start a copy, check the size of the file mid-transfer, check again a few seconds later, and find the reported size has not changed.

I did a little research into this and found a whitepaper that states "Btrfs, by default, delays writes for 30 seconds to increase disk efficiency, and metadata and data are in the same delay queue." (https://www.usenix.org/system/files/conference/atc13/atc13-ren.pdf). If the time between the check and re-check of the file size is less than 30 seconds, this may explain this problem.

@Luke Thanks for the suggestion of adjusting the scheduled tasks. I reduced my auto organize to run hourly instead of every few...seconds(?). While I don't think this resolves the core problem, it will greatly reduce the probability of it happening.


chef

 

I have noticed a couple of things happening with the size checks as well, when the file is being unzipped into the monitored folder as opposed to being copied into it.

The file size is reported to the database initially, but then doesn't update the next time the file is read in the monitored folder (prior to sorting, while it is still being decompressed).

If that makes sense...

It is the same code handling the size checks that needs to be looked at.

I think the best thing to do is have the first scan add the file data to the database, and then the next time the task runs, have it look at the file size again to decide whether it has grown or stayed the same... and only then sort the file.

That way the scan can run more often... more like a monitor...

I think this is a good idea.
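Sketched out, the two-pass idea would look something like this. The dictionary stands in for the plugin's organization database, and all names here are illustrative, not the plugin's actual code:

```csharp
using System.Collections.Generic;

public static class TwoPassCheck
{
    // First run: record the observed size and wait. Later runs:
    // sort the file only if the size has not changed since the
    // previous run, i.e. the transfer appears complete.
    public static bool ReadyToSort(string path, long currentSize,
        IDictionary<string, long> recordedSizes)
    {
        if (!recordedSizes.TryGetValue(path, out var previous))
        {
            recordedSizes[path] = currentSize; // first sighting: record and wait
            return false;
        }

        if (currentSize != previous)
        {
            recordedSizes[path] = currentSize; // still growing: update and wait
            return false;
        }

        return true; // stable across two task runs: safe to sort
    }
}
```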


  • 2 weeks later...
Dusan78

Hello,

Was this ever resolved? I have exactly the same issue with a NAS and btrfs. When copying files from my PC, I first have to copy the file to another directory on the NAS and then move it to the monitored folder so that it won't be grabbed mid-copy.


ryan.brownell

Hello;

Yes, this is still an issue. On my setup, I've mitigated the problem by reducing the frequency at which Auto Organize runs, which lowers the likelihood of it happening.

To do this:

  1. Go to Manage Emby Server ➡️ Scheduled Tasks ➡️ Organize new media files
  2. Remove the only Task Trigger in there
  3. Create a new Task Trigger with the following options
    • Trigger Type: On an interval
    • Every: 1 Hour

To be clear, this is not a real solution. It will not always work and will not prevent the problem from happening. It merely reduces the probability the problem will occur by reducing the number of times the Auto Organize plugin does its work. I definitely believe that a more proper solution should be developed to resolve this issue.


chef

I'll take a look at this tonight, and try to stop the process from acting on the source file if it is being copied into the monitored folder. 


Dusan78

Thank you, that would be great. I am sure there is a way to just check if the file is currently in use?

Currently I use an app I wrote that, after moving (or converting) files from the PC to the NAS, executes a move into the monitored Auto Organize folder.


chef

It is very strange. I have this check in the code:

        public static bool IsCopying(string source, FileOrganizationResult dbResult, IFileSystem fileSystem)
        {
            try
            {
                var sourceFile = fileSystem.GetFileInfo(source);

                // The size on disk matches the size recorded in the
                // database, so the copy has finished.
                if (dbResult.FileSize == sourceFile.Length) return false;
            }
            catch (Exception) // The file is in use. Probably IOException is best, but for now, catch it all.
            {
                return true;
            }

            // The size has changed since it was recorded in the
            // database, so the file is still being copied.
            return true;
        }

 

This just checks that the current dbResult file size is the same as the size of the file that is moving.

We do this check prior to attempting to sort the file.

Just want to make sure, is everyone using the Extended Functionality Version found here?

https://emby.media/community/index.php?/topic/88730-auto-organize-expanding-its-functionality/page/13/

 

If not, please delete fileorganization.db before side-loading the plugin.

 

Let me know if the issue was actually resolved. I'm going to see if there is a better place to do a file size check.

 

 

EDIT:

Yes, upon further investigation.

I see this when an item is in the process of being moved into the monitored folder.

[Screenshot: Auto Organize showing the file as in use while it is being moved into the monitored folder]

 

Edit 2:

But, I do see that the file size reported in the plugin is incorrect...

[Screenshot: the plugin reporting an incorrect file size]

 

The question is... did it move too early, or is the database perhaps not being updated with the proper size value?

 

BRB...


chef

I was able to correct the file size update, so the UI will show the proper file size if the item was read prior to its completed move into the source folder.

However, I still want to make sure that accessing the file is done properly, and not too early.

 

My server is in use, so testing will have to wait until tonight.


chef

Ha! Yeah! Got it!

[Screenshot: Auto Organize marking the file as in use instead of processing it]

Best thing is, we don't even try to process the file while it is in use. 

 

And the moment it becomes available in the file system:

[Screenshot: the file being processed as soon as it becomes available]

 

I'll update the Expanding Functionality thread. :)


ryan.brownell

@chef Thank you so much for this!

I do have a few concerns though...

  • Linux behaves differently than Windows when it comes to file locking. Is it possible that this fix will only work in Windows?
  • BTRFS, by default, delays writes for 30 seconds to increase disk efficiency, and metadata and data are in the same delay queue. (See comments and reference above) Using Task.Delay() for only 2 seconds likely won't be enough time to determine if a file is available on the BTRFS file system. Is it possible to implement some sort of configuration where users could set the value provided to Task.Delay()?
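For illustration, the kind of option I have in mind might look something like this. This is a hypothetical sketch; the class and property names are not from the actual plugin:

```csharp
using System;

public class AutoOrganizeSizeCheckOptions
{
    // Seconds to wait between the two file size readings.
    // The current behaviour corresponds to 2; BTRFS users could
    // raise this past the ~30-second metadata flush window.
    public int SizeCheckDelaySeconds { get; set; } = 2;

    // Convenience value to hand to Task.Delay().
    public TimeSpan SizeCheckDelay => TimeSpan.FromSeconds(SizeCheckDelaySeconds);
}
```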

chef

A config option could be added.

I'm just wondering if there is another way to handle this then.

Waiting 30 seconds to access a file that might already be accessible seems a bit problematic.

 

I'm going back to the drawing board.

I've got a couple tricks up my sleeve still. 😉😆

Library monitor checks for a lock.

I'm going to try to use that.


chef

Ryan, does btrfs show any kind of file size increment at all during those 30 seconds?

Or does the file size remain 0? Because we can check if it is 0, and if it is still 0, consider it to be in an "InUse" state as well.
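As a sketch, that extra guard might look like this (illustrative names, not the plugin's actual code):

```csharp
public static class SizeGuard
{
    // A file that still reports 0 bytes (metadata not flushed yet),
    // or whose size differs from the previous reading, is treated
    // as in use and skipped until the next run.
    public static bool LooksInUse(long reportedSize, long previousSize)
    {
        if (reportedSize == 0)
        {
            return true;
        }

        return reportedSize != previousSize;
    }
}
```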


ryan.brownell

I don't think so... Speaking from personal experience, I remember doing several large file transfers and checking the progress by looking at the file size, then looking again in rapid succession, only to discover that the size appeared the same as the first time I checked. (The measurements were in bytes, so this was not related to rounding into larger units.)

I agree that measuring the file, waiting 30 seconds, and then comparing that value against a fresh reading is problematic, as it would really slow down the process; organizing one or several large series would become very, very slow. If it were configurable, or specific to certain file systems, only the users to whom it is relevant would experience the performance impact. The only other option I can think of would be to thread the check for each file. Granted, I am a lowly PHP web developer and don't often have to thread things, so my experience there is a bit rusty.

Honestly, I would take the performance impact over the problems that a partial video file creates. These interrupted transfers result in a playable file that can't be fast-forwarded or rewound, and that usually ends prematurely (often just before the good part! 😅).

It is my understanding that Linux generally does not lock files... This is why the command rm -rf / is so dangerous, particularly if you are the root user. With this command the file system will try to delete everything without prejudice, including things that should never be deleted because they are required for the system to operate.


Dusan78

Oh my! This is absolutely terrible news. I just confirmed that btrfs labels the file at the full size of the original right away while it's copying. I specifically remember numerous times when the file was moved prematurely and then finished copying, so that there were actually two files. I sometimes imported the newer (2nd) file over the first, but I checked the first one and it seemed to play OK, so sometimes I just deleted the 2nd file from the monitored folder. I always thought btrfs was a very smart and robust file system that keeps files intact.

Now from your message I guess there could be tons of files which start to play OK but are in reality corrupt? Is there a way to check this? I chose btrfs specifically to avoid this type of situation, as it was said to be the best for file integrity.

However, for a while now I have always first copied or converted files to a parent directory and then just moved the folder. I've incorporated this into the app I made for automatically running CLI HandBrake.

So yeah, sadly the size isn't 0; it shows as the full size from the start. I guess it pre-allocates the space, which again makes me believe it might hold that whole area in memory and correctly move the whole file when finished. I'm a little freaked out now about having tons of corrupted TV show episodes (thankfully I never used it for movies), so I'm going to do some tests and watch a whole mid-moved episode.

Cheers!


Dusan78

Ok, it seems that my initial impressions about the file system were correct.

I just tested moving a file: as soon as I initiated the move and Auto Organize detected the file, I ran "organize", and the file was moved instantly and the monitored folder was empty, even though the file had not finished moving. Also, Windows wouldn't delete the original file unless it had confirmation that the file was fully copied.

So it seems that btrfs is pretty great, as I originally thought. I remember when I had this situation before, I thought it had screwed up and grabbed the file too early, but it knew what it was doing this whole time! lol

So there isn't really anything to fix, at least not for me. But thank you for the hard work! :)

 


ryan.brownell

@Dusan78

In your particular scenario, do the files already exist on the drive in their entirety? (I.e. no data is being written: the file doesn't need to be unzipped, is not being copied from the same drive to a different folder, and is not being copied/moved from a different drive or network share.)

If the files already exist on the drive, moving them to a different folder just updates the allocation table, which takes a fraction of a second.

The problem described here is specific to when the file's data is being actively written onto the drive. Testing this by moving the same file around the same drive would not be an accurate test.

What's the setup that brought you to this thread originally?


chef

Would you guys like to try this:

Emby.AutoOrganize.zip

 

If you are using the version from the catalog, you'll have to delete "programdata/data/fileorganization.db" before you restart the server.

Also, clear browsing data so it will load the new javascript.

 

Let me know if this is able to correct the sorting issue.  :)

This covers both the IOException that gets thrown when trying to access a file that is in use, as well as checking the size of the file within a moderate time frame.

 


Dusan78

Hi Ryan. Yes, of course: I am moving the files over Wi-Fi from the PC to the NAS.

If the file is whole and just being copied/moved, btrfs reports the full size right away. However, if I run HandBrake on the PC and output to the NAS, with the file open and the size undetermined, then btrfs definitely shows the file size growing as the file is created. What's interesting is that while this was happening, I moved and renamed the file on the NAS, and HandBrake continued to work and complete the file; the resulting MKV or MP4 played with no problem and was whole. However, doing the same with the "Web Optimized" MP4 option, which is supposed to change and optimise the file after completion, seems to corrupt the file, as there appears to be a closing (completing the created file) and then a reopening to optimise it.

So for the purposes of Auto Organize you can definitely do a file-increment check and it will work... As for just moving or copying the file, you don't have to worry; the file will be fine.

This is why I came here: I thought it was taking my file before it finished copying, but that's not an issue at all with btrfs.

 

Summary:

After starting a copy/move in Windows Explorer, you can move or rename the file on the Linux btrfs NAS without worry.

You can even run a HandBrake conversion from the PC outputting to the btrfs NAS and then move/rename the file without worry, as long as you don't have the "Web Optimized" feature selected.

These were my findings. :)


ryan.brownell

@Dusan78 This is not my experience.

There are a large number of factors that could be in play here. BTRFS's default is to update file metadata every 30 seconds, though this is just a default and could differ on different NASs and OSes.

I find it tremendously difficult to comprehend how moving a file onto the NAS via a network share would result in BTRFS immediately reporting the full file size before the transfer is complete. If that were the case, this would be a problem without a solution.

Have you tested it on larger files that take several minutes to copy?

When transferring the file, how are you checking its size? Through Windows or in the NAS UI itself?

I have a Linux Mint machine that downloads files and then moves them via an SMB network share to the NAS when the download is complete. For me, BTRFS does not report the full file size until the transfer is complete. This is particularly problematic if the file is larger and takes longer than a minute to copy over.

