problems with library scanning, request more info in debug logs please

August 26, 2020

I've got loads of movies with nfo files which I'm trying to move into Emby. it mostly works but I have an issue with getting libraries to scan, which isn't a fault with Emby as such, but I'm struggling with it and wondering what can be done, and I have some suggestions that would probably really help me

all this stuff lives in cloud storage which mounts as a mapped network drive, I have a fast internet connection and it basically works okay - I have libraries that have scanned correctly, and the content of those libraries all plays correctly, etc. emby is happy to stream this content from the cloud and it's fast enough to work, and it saves me running a power hungry RAID array for all my movies

all this stuff has also previously correctly scanned and is working in Kodi, but I want the server elements of Emby for it (so I can watch from any device etc) which is why I'm moving it, and this is where I'm having a problem

the problem is that this cloud storage can be a bit flakey sometimes (transient "access denied" errors when accessing a file or folder being the main issue, which happen once but usually don't happen again if you retry, also sometimes opening a directory can take a long time which can lead to timeouts etc). none of this has caused a problem in Kodi, and I've managed to get the first couple of libraries to scan correctly into Emby, but now it's hitting a wall. if I add a library, it appears and it says it's scanning, but nothing appears in that library. if I manually refresh that library, it says it's scanning, and I see that it's started in the log, but then no further log entries relating to the scan appear at all, and the progress circle that normally appears on a library scan does not appear. I've got debug logging turned on, and I'll go back to check on it say an hour later, and there's literally nothing in the logs relating to that library scan beyond the couple of entries where I first started it. the thing is, Emby can browse to the location just fine (so it's not a permanent access error), and what usually happens is that after a few days or a week or so of nothing appearing, the overnight library scan will suddenly pick it up and all the contents suddenly appear (so there's no issues with file naming or anything either). I've managed to go from 3 libraries to 6 libraries that way - the last 3 all failed to scan for several days, and then suddenly they just filled up on the overnight scan a few days later.

I have a feeling that it's hitting a transient error when it tries to access that library, and it's just a matter of chance when that works or does not. but because no entries appear in the log to say what the scan is doing, it's impossible to know what's actually happened. generally these errors are more frequent if there's a lot of rapid file access requests, which would explain why the first few libraries worked okay and then as I've added more libraries, it's become more broken - if it's doing a full scan, then it's already done thousands of file requests by the time it reaches the new library. but this doesn't explain why individually refreshing the library doesn't do anything.

it would be great if the debug logs could contain a lot more information for library scanning, and detail every step of what the scanner is actually doing - so log entries for every folder or file located, and how that folder or file is being treated (i.e is a file being opened or is it being ignored because it hasn't changed since the last scan, is this a folder, are we scanning inside that folder, etc). as this would all be debug information it would normally be turned off anyway. it seems a bit odd that we can have "debug logging" turned on, but then get absolutely no information as to what the scanner is doing.

it would also be really helpful if the scanner could be more robust to network problems - other software I'm using with this cloud storage is configured to retry network errors 3 times with a 3 second delay for example, which robustly deals with any transient problems - but it's also possible that it's timing out on something that just needs a bit more time. but then Kodi isn't configured (as far as I know) to do any of this, and it all works first time in Kodi. of course without log entries to say what it's doing, this is all speculation!

I realise this is a really niche issue that isn't even a fault in Emby, and I'm not really expecting much to come from this request, but it would be super great if I could just scan this stuff into Emby and it just worked first time same as in Kodi!

August 27, 2020

Hi, we don't actually swallow errors without logging them, so i might actually need to see a specific example along with a log file so that I can see where the logging is falling short.

What might be happening is that something is timing out in the. Net core file system methods and short of writing our own file system layer there might not be much we can do about that. Thanks.

September 29, 2020

I tried adding some more libraries, in order to show a complete log of this, and of course they all worked first time

the problems I'm getting are in home photo/video collections, so it's just nested subfolders with various content, that all look a bit like this :

Holidays 2020
Holidays 2020\England
Holidays 2020\England\London
Holidays 2020\England\London\Photos
Holidays 2020\England\London\Photos\DCIM0001.jpg
Holidays 2020\England\London\Videos
Holidays 2020\England\London\Videos\video1234.mp4
Holidays 2020\England\Cornwall
Holidays 2020\Scotland

interestingly, I don't see problems on movie, tv, music libraries

the problem is that sometimes, adding a library will result in a completely empty library, and it will take many refreshes before the content is found and added, despite the fact that I can browse these folders from Emby (when adding a library) without any problems

what makes it more difficult is :

- the scanner does not report what folders it is looking in, or what it finds, even when on "debug" level. when I refresh a library, it doesn't say "scanning Holidays 2020" and then later "0 folders and 0 files found" and finally "scan completed" for example. it just doesn't report anything at all, unless it makes changes or encounters an exception. this isn't what I would call "debug" level of information. a debug log is supposed to basically tell you everything. but the only info I get in the log that shows anything beyond making a change, or encountering an exception, is the one line "deserialising" the .xml file for the library. that's it. nothing else appears that tells me what the scanner is actually doing, what it's seeing, if it's choosing to ignore any folders, etc, which in my mind is something a debug-level log should tell you. instead, I can refresh the log during a lengthy scan and see consecutive entries an hour or more apart, entries that appear where something has changed and needs updating, but with absolutely nothing in between. obviously the scanner is busy doing something in between those times - busy going through and checking each item - but it doesn't report this. if I'm looking at a debug log, I shouldn't see such long periods of time with busy activity that isn't being logged. if it looks in a folder, it should report what it finds. if it then chooses to ignore something, it should report that. for example, if the scanner encounters a ".nomedia" file, or a folder named with a "." at the start - the scanner makes a decision to ignore these, yet this decision isn't reported in a debug log. is it choosing to ignore or bypass any other folders or files, based on naming? we have no way of knowing. one thing I do know is that I had a library that did not work (showed no contents despite many refreshes) for over a month, until I renamed the root folder and removed a 2-digit number from the start of the name - after that, the library filled with content. maybe a coincidence, or maybe the scanner didn't like the name. I have no way of knowing, because the log doesn't tell me anything, and it was only by pure guesswork that I tried renaming it and found that to work.

- when scanning individual libraries (by using the menu on one library), the only feedback we get is a pop up "scanning library files", and then maybe, some time later, we might get the green circle indicator at some point. but it takes a long time for this positive feedback, the green circle that shows the scanner is doing something, to appear, if it appears at all. sometimes, the green circle can appear then disappear then come back again. there is no reliable way to actually know if the library scan is in progress or not - we don't get any indicator that it has stopped or failed, and nothing appears in the log either. I sit there looking at it wondering "is this still scanning, or do I need to try again?". obviously I don't want to keep restarting the scanner if it is working, but I have no way of knowing. it would help to have the green circle appear at 0% as soon as you click "scan", and to remain visible for the entire time the scanner is operational, in the same way that the progress bar that appears on the dashboard for a full library scan does. in fact - it might be a good idea to put individual library scans on the dashboard as well, with progress bars just like the full library scan. that way you can always see if the scanner is busy. it would also be helpful to have log entries that actually indicate the scan process is starting, and when it is ending.

- a full library scan takes me about 18-20 hours. it would be really, really good, if we could add a new library without forcing a whole scan on everything. I currently have the full scan scheduled to happen once a week anyway, and I have watchers on the stuff that updates frequently. what I would really like is an option, when adding a new library, to only scan that library and not everything, as it's not much fun waiting a whole day to see the content appear

- finding scan-relevant log entries on a busy server can be difficult - log entries for *everything* are all going into the same log, and it seems that page refreshes and the like are very verbose in comparison to library scans. it might be useful to consider moving the scan log to a different log file entirely, like is already the case with the hardware detect log, transcoding log, that way library scans don't get in the way of debugging other problems, and page refreshes and the like don't get in the way of debugging scans

Edited September 29, 2020 by Statick

September 29, 2020

Hi,

Can you do another scan and provide the embyserver.log for us to take a look at. Also let us know the name of the problem library.

Thanks

September 29, 2020

the one that was a problem was called "19 holidays" I renamed it to "holidays 2019" and that seemed to fix it, although it could also have simply been a complete coincidence

I don't currently have a library that isn't working, they are all working now - but this has been an ongoing issue and I'm sure it won't be long before I add another and see the problem again. when I do I'll return with a log, although having watched this happen to multiple libraries in the past I can promise you that the only entry that appears is the "deserialising options.xml" line, and then that's it, I just have to wait until I think a suitable amount of time has passed and then refresh the library again.

I've also had it happen where one or more of the subfolders inside the root folder will show up as empty, which seems to be the same problem (the scanner simply ignoring or not seeing the contents of a folder, but with no indicator as to why), and again after multiple refreshes the content will suddenly all appear

(edit: I know we get a log entry when the full library scan is completed - what we don't get is a similar entry when an individual library scan is completed)

Edited September 29, 2020 by Statick

September 30, 2020

okay so as a demonstration of what I'm talking about, I've clicked "scan library files" on an existing library, one that is already populated and working

this library has hundreds of sub folders and photos inside. nothing has changed, so I'm not expecting much to happen, but :

I started scanning it over an hour ago, and the only thing in the log that relates to this is the "deserialised options.xml" line. I've left it for about 80 minutes and come back to it.

there's still no green circle on the library page to indicate progress on the library scan, and there's no further entries in the log. literally nothing. I refreshed the log, and got 2 more lines generated by refreshing the log. I have absolutely no way of knowing what is happening here. is it scanning? has it completed? what is it doing?

I decided to instead monitor network file access activity, and sure enough - I can see access taking place to every file and folder in this library, one by one. the scanner is doing its job, recursing through everything, but there is literally nothing in Emby to indicate this, there is no feedback on the dashboard, there is nothing in the log, there is not even anything in the debug log. nothing appears under "activity" or "alerts". like there are so many places this information could be getting shown, yet there is absolutely no indication anywhere that the scanner is scanning.

please, I beg you, do something about this. at least give us a working progress indicator on the dashboard or library page, that appears when you start scanning, and remains there until the scan is completed, just like with the full scan. please. the green circle would be enough, if it actually appeared when you clicked scan, and stayed there until completion. entries appearing under "activity" or "alerts" would be nice as well - "scan of XYZ library started", "scan of XYZ library completed". just, something, *anything*, please!

Edited September 30, 2020 by Statick

October 3, 2020

We'll look at improving it. Thanks for the feedback.

October 7, 2020

okay I've got more info

I added another library and it didn't fill with content, and after repeatedly re-scanning it over several days, nothing changed, and as before, nothing in the logs. after some time I realised that there was a corrupt video file in the target, it was a file I knew about (and had marked "CORRUPT - TRY TO FIX", it contains data but cannot be opened by any video software) but had forgotten was there, I removed this file and upon the next scan the library completely filled with content, several thousand images and videos in multiple subfolders

it would appear that your earlier point "we don't swallow any errors" isn't true for the entire scanning process - as apparently this file caused an error that caused the entire scan process to fail without logging anything at all. ideally this error shouldn't cause the process to fail, and it should be logged as well

as far as I can see, there are mulitple stages to the scan process, the very first stage iterates through the folders and files on the target, looking for new files. during this stage, there is no progress indicator or any indication at all that a scan is taking place (no green circle on the library), and it seems that certain errors encountered during this stage can cause the thread to crash, and nothing will be logged, and no data will be added to the database. the combination of no activity indicator and no logging means no way of knowing if it's still actively scanning or if it has failed.

only once this stage is completed does anything get added to the database, then the next stage iterates through the database to check if any items need refreshing, during this stage is when the green circle appears, and it's during this stage that more complete information becomes available, online scrapers are used, etc. I have seen errors in the log that take place during this second stage of the scan, and I've seen entries relating to new items being scraped etc, tbh the log during this stage is pretty verbose. but I have never seen any logging at all during the first part of the scan where it simply iterates through files and folders

what would be ideal as a solution to all of this would be :

- some visual indication of activity while files/folders are being searched, even if it's just the green circle sitting at 0% (or without a % at all, obviously there's no way of knowing how many files/folders there will be, which is probably why there's no indicator at the moment). it might seem trivial if you're scanning on a local hard drive and it takes ~10 seconds, but if you're scanning 1000s of folders over a slow network and it takes an hour or more, it's important to know that it's actually doing something, or if it's stopped for any reason

- better error handling during this first stage, so errors with opening files or folders, in particular with inspecting a media file (as this appears to be the cause of the problem I was having) are caught and logged and do not cause the scan thread to crash

- more comprehensive debug logging of activity during this first stage generally, so for example when a subfolder is being scanned, tell us, and when that subfolder has been scanned then tell us how many files/folders were found inside for example. this doesn't need to appear in general logs but the debug log should tell us what it's doing.

- ensure that data found during the first stage of the scan makes it to the database no matter what, even if it's incomplete data because the scan crashed half way through, as there's always possibilities for new code in the future to result in unhandled exceptions. so for example, a separate database insert for each item as it's found might well take a little longer, but will be a lot more resilient than saving everything until the end and doing one big database insert at the end, as even if the thread crashed the data found so far will have been added to the database

October 7, 2020

@Statick Tell me you still have the damage file so we can use it for testing?

October 7, 2020

ahh sorry, I finally figured out what was wrong with the file, it was 500mb of zeros with a .mp4 ending, so I realised there was no fixing it and deleted it. so I don't have the file but you can probably recreate one.

Edited October 7, 2020 by Statick

Sign In

problems with library scanning, request more info in debug logs please

Recommended Posts

Statick 15

Luke 42208

Statick 15

Carlo 4561

Statick 15

Statick 15

Luke 42208

Statick 15

Carlo 4561

Statick 15

Create an account or sign in to comment

Create an account

Sign in

Activity