Loke Posted August 31, 2024 (edited) Version 4.8.8.0. Anime is a PITA due to the various naming conventions. One issue I have recently seen is that Emby can get confused by the CRC32 in the filename and parse it as S[x]E[y]. As you can see in the lower-left corner, the file is stored with a path matching <Series>/<Season X>. I assume that renaming the file to [SubsPlease] Karasu wa Aruji wo Erabanai - S01E09 (480p) [4E28D06D].mkv might solve the issue, but I can't do that. Is there something else I could do (excluding manually modifying the metadata per episode)? Would it be possible to ignore the CRC32 field when trying to figure out the episode, seeing as it is always 8 alphanumeric characters? In addition, how does the episode naming compare to the season directory naming? Is the directory name, in this case Season 1, ignored because of the filename? I would expect the directory name to take priority. Thanks Edited August 31, 2024 by Loke
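One plausible reading of the symptom, offered as a guess rather than a description of Emby's actual parser: a rule that treats "digits, the letter E, digits" as season/episode will find "4E28" inside the CRC32 tag [4E28D06D] and read it as season 4, episode 28. The sketch below uses a hypothetical `NAIVE_SE` pattern to show this. It also shows why the S01E09 rename would likely help: that token appears earlier in the name, so a leftmost-match search finds it first.

```python
import re

# Hypothetical season/episode rule: digits, then E or e, then digits.
# This is NOT Emby's parser; it only illustrates how such a rule can misfire.
NAIVE_SE = re.compile(r"(\d+)[Ee](\d+)")

def naive_season_episode(filename: str):
    """Return (season, episode) from the first digits-E-digits run, or None."""
    m = NAIVE_SE.search(filename)
    return (int(m.group(1)), int(m.group(2))) if m else None

original = "[SubsPlease] Karasu wa Aruji wo Erabanai - 09 (480p) [4E28D06D].mkv"
renamed = "[SubsPlease] Karasu wa Aruji wo Erabanai - S01E09 (480p) [4E28D06D].mkv"

print(naive_season_episode(original))  # (4, 28): "4E28" comes from the CRC32
print(naive_season_episode(renamed))   # (1, 9): "01E09" is found before the CRC32
```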
Happy2Play Posted August 31, 2024 Yes, bare digit numbers can throw the parser off, since up to a point they are treated as absolute numbering. But with the parser changes, bracketed info is no longer ignored. A dev will have to comment, as this is listed as a supported naming scheme, but I think additional conditions have to be applied to it: 02 Episode Name.ext There are a couple of other topics on this.
Loke (Author) Posted August 31, 2024 Since the ordering is set to "Aired", shouldn't it avoid absolute numbering? It is indeed listed as a supported scheme; another case where anime is just annoying for media managers. Could you share keywords to find some of the other topics? My short search did not turn up much, so I must be using the wrong keywords.
Happy2Play Posted August 31, 2024 Here you go, though these are not necessarily specific to digit naming, as the parser looks at the entire naming scheme. Personally, I think all bracketed content should be ignored completely, as more and more info keeps being added to filenames.
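To make the trade-off of "ignore all brackets" concrete, here is a minimal sketch assuming a simple regex that deletes every [...] group from the name before parsing (the function name `strip_all_brackets` is hypothetical). It cleans up the SubsPlease name nicely, but an all-bracket name such as the Billion Meta Lab example quoted later in the thread is reduced to nothing but its extension.

```python
import re
from pathlib import PurePath

# Any square-bracket group, whatever it contains.
ANY_BRACKET_GROUP = re.compile(r"\[[^\]]*\]")

def strip_all_brackets(filename: str) -> str:
    """Drop every [...] group from the stem and collapse the leftover whitespace."""
    p = PurePath(filename)
    stem = ANY_BRACKET_GROUP.sub("", p.stem)
    return " ".join(stem.split()) + p.suffix

print(strip_all_brackets(
    "[SubsPlease] Karasu wa Aruji wo Erabanai - 09 (480p) [4E28D06D].mkv"
))  # Karasu wa Aruji wo Erabanai - 09 (480p).mkv
print(strip_all_brackets(
    "[Billion Meta Lab][风灵玉秀S2 特别篇][Soulmate Adventure Special Lower]"
    "[14][1080p][HEVC 10bit][CN][END].mkv"
))  # .mkv  (title, season and episode are all gone)
```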
pwhodges Posted August 31, 2024 For anime, the CRC (when present) is universally 8 hex digits in square brackets at the very end of the name; it is not hard to recognise and separate from any other information in brackets you may want to handle. I went through this a fair while ago with Luke in another thread and had the impression he got it, but apparently not... Mind you, my personal solution is to sanitise all file names, so I only have recognition problems in particularly pathological cases, like the show named "3x3 eyes" (easily resolved by using the Japanese name instead), or films with the same name and year (easy again, just add the TMDB id to the name). Paul
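A minimal sketch of the rule described above, assuming exactly what the post says: 8 hex digits in square brackets at the very end of the name, just before the extension. The function name `strip_trailing_crc32` is hypothetical; anchoring the pattern to the end of the stem means other bracketed groups such as [SubsPlease] or [480p] are never touched.

```python
import re
from pathlib import PurePath

# Anchored with $ so only a CRC32 tag at the very end of the stem is removed.
TRAILING_CRC32 = re.compile(r"\s*\[[0-9A-Fa-f]{8}\]$")

def strip_trailing_crc32(filename: str) -> str:
    """Remove a trailing [XXXXXXXX] CRC32 tag, keeping the extension."""
    p = PurePath(filename)
    return TRAILING_CRC32.sub("", p.stem) + p.suffix

print(strip_trailing_crc32(
    "[SubsPlease] Karasu wa Aruji wo Erabanai - 09 (480p) [4E28D06D].mkv"
))  # [SubsPlease] Karasu wa Aruji wo Erabanai - 09 (480p).mkv
```

Because the character class is hex-only, a trailing group like [S04E0608] is left in place: "S" is not a hex digit.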
Loke (Author) Posted September 2, 2024 Parsing can be tricky; there is always going to be some exception to the rule. But for anime, the CRC32 is indeed almost always surrounded by square brackets, and less often by parentheses. In both cases it's fairly safe to exclude it from the filename when parsing. Let's see what Luke has to say.
ebr Posted September 2, 2024 5 hours ago, Loke said: In both cases it's fairly safe to exclude it from the filename when parsing. How do we know that? How do we know some people don't have their S/E numbers inside these same "containers"?
Loke (Author) Posted September 2, 2024 29 minutes ago, ebr said: How do we know that? How do we know some people don't have their S/E numbers inside these same "containers"? Inside brackets with 8 alphanumeric characters, like "My series Woah Title [S04E0608].mkv"? We don't; that's why I opened with "anime is a PITA". But do you reckon the above is a likely scenario? If so, how do you propose to correct the parsing of anime episodes, which mostly come with a CRC32 in their filenames? To be honest, I wouldn't mind an option to mark a library as anime, where we could enforce some assumptions...
pwhodges Posted September 2, 2024 1 hour ago, ebr said: How do we know some people don't have their S/E numbers inside these same "containers"? Probability. A considerable proportion of anime has the field; I have never seen S/E numbers presented like that anywhere, and in any case S/E numbers are not typically eight hex digits. Paul
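The "eight hex digits" argument can be checked directly. This sketch assumes a CRC tag is a balanced [..] or (..) pair (covering both delimiters mentioned earlier in the thread) around exactly 8 hex digits; `looks_like_crc32_group` is a hypothetical helper, not part of Emby.

```python
import re

# Exactly 8 hex digits inside a matching [] or () pair.
# Mixed delimiters such as "[4E28D06D)" are deliberately rejected.
CRC32_GROUP = re.compile(r"\[[0-9A-Fa-f]{8}\]|\([0-9A-Fa-f]{8}\)")

def looks_like_crc32_group(text: str) -> bool:
    """True if the whole of `text` is one bracketed 8-hex-digit group."""
    return CRC32_GROUP.fullmatch(text) is not None

for candidate in ["[4E28D06D]", "(4E28D06D)", "[S04E0608]", "[480p]", "[20240831]"]:
    print(candidate, looks_like_crc32_group(candidate))
```

This prints True, True, False, False, True. Loke's [S04E0608] example is rejected because "S" is not a hex digit, but a bracketed eight-digit date such as [20240831] is still accepted, so the pattern narrows ebr's concern rather than eliminating it.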
ebr Posted September 2, 2024 1 hour ago, pwhodges said: in any case S/E numbers are not typically eight hex digits. Right, but determining that will require examining every single file name in the system for that exact pattern, which adds time (very small per file, but there are a lot of files and it all adds up). When one of the biggest complaints is the length of the initial scan, we have to weigh things like this very carefully. Something like excluding everything in brackets would be quicker (although still another test), but then we are much more likely to run into someone who has put their S/E numbers in brackets for some reason. And then there is the question of why someone needs a CRC check in their personal rips of episodes...?
Loke (Author) Posted September 2, 2024 18 minutes ago, ebr said: Right, but determining that will require examining every single file name in the system for that exact pattern... And then there is the question of why someone needs a CRC check in their personal rips of episodes...? Because that is how the main groups release them, and modifying the names affects, for example, seeding back and requires some post-download modification. As pwhodges said, the probability of someone using an S/E format that spans exactly 8 characters inside [] or () is low. I would add that the probability of Emby parsing the CRC32 as the episode is higher if you are downloading from, for example, SubsPlease, which AFAIK is the main release group for anime nowadays and adds a CRC32 to its releases. As for the performance hit, I guess it depends on the language and whether there is a function to check for brackets with any content and discard them without using a regex. Anyway, not something I can test. Another issue with dropping things in brackets is what happens if there are multiple square-bracket groups: will you drop all of them?
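Both questions in this post (a check without regex, and what happens with multiple bracket groups) can be illustrated with a single-pass scanner. This is a sketch under the same assumptions as above (balanced [] or () around exactly 8 hex digits), and the name `drop_crc32_groups` is hypothetical. It exits early when a name has no opening bracket at all, and it removes only the CRC-shaped group while keeping every other bracket group. Whether this is actually cheaper than a compiled regex depends on the language and runtime; it only shows the idea.

```python
HEX_DIGITS = frozenset("0123456789abcdefABCDEF")
CLOSER_FOR = {"[": "]", "(": ")"}

def drop_crc32_groups(filename: str) -> str:
    """Remove every [XXXXXXXX] or (XXXXXXXX) group of 8 hex digits, without regex.

    Other bracketed groups such as [SubsPlease] or [480p] are kept.
    """
    if "[" not in filename and "(" not in filename:
        return filename  # early exit: nothing that could hold a CRC32
    out = []
    i, n = 0, len(filename)
    while i < n:
        closer = CLOSER_FOR.get(filename[i])
        # A CRC group is opener + 8 hex digits + matching closer: 10 characters.
        if (
            closer is not None
            and i + 9 < n
            and filename[i + 9] == closer
            and all(c in HEX_DIGITS for c in filename[i + 1 : i + 9])
        ):
            if out and out[-1] == " ":
                out.pop()  # also drop one space emitted just before the group
            i += 10
            continue
        out.append(filename[i])
        i += 1
    return "".join(out)

print(drop_crc32_groups(
    "[Erai-raws] Murai no Koi - 01 [480p][Multiple Subtitle][6F878216].mkv"
))  # [Erai-raws] Murai no Koi - 01 [480p][Multiple Subtitle].mkv
```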
Happy2Play Posted September 4, 2024 10 minutes ago, Luke said: Hi, what was the original file name? The image shows Karasu wa Aruji wo Erabanai/Season 1/[SubsPlease] Karasu wa Aruji wo Erabanai - 09 (480p) [4E28D06D].mkv
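For this exact path, a parser that follows the behaviour Loke expected (season from the "Season 1" folder, episode from the " - 09" part of the name, CRC32 discarded first) would get the right answer. The rules below are hypothetical and only sketch that expectation; they are not how Emby's resolver is known to work.

```python
import re
from pathlib import PurePosixPath

# Hypothetical rules for illustration only.
SEASON_DIR = re.compile(r"Season\s+(\d+)", re.IGNORECASE)  # folder such as "Season 1"
DASH_EPISODE = re.compile(r"\s-\s(\d{1,4})\s")               # " - 09 " in the file name
CRC32_TAG = re.compile(r"\s*\[[0-9A-Fa-f]{8}\]")             # "[4E28D06D]"

def parse_episode_path(path: str):
    """Return (series, season, episode) for a <Series>/<Season N>/<file> path."""
    p = PurePosixPath(path)
    season_match = SEASON_DIR.fullmatch(p.parent.name)
    # Drop the CRC32 first so its characters can never be read as S/E numbers.
    stem = CRC32_TAG.sub("", p.stem)
    # Pad with a space so a name ending in " - 09" still matches.
    episode_match = DASH_EPISODE.search(stem + " ")
    season = int(season_match.group(1)) if season_match else None
    episode = int(episode_match.group(1)) if episode_match else None
    return p.parent.parent.name, season, episode

print(parse_episode_path(
    "Karasu wa Aruji wo Erabanai/Season 1/"
    "[SubsPlease] Karasu wa Aruji wo Erabanai - 09 (480p) [4E28D06D].mkv"
))  # ('Karasu wa Aruji wo Erabanai', 1, 9)
```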
pwhodges Posted September 4, 2024 Just to be clear, common anime naming may also use square brackets for other file properties, as in: [Erai-raws] Murai no Koi - 01 [480p][Multiple Subtitle][6F878216].mkv Or it may not use brackets at all; there is no consistency! Murai.In.Love.S01E01.Episode.1.1080p.HULU.WEB-DL.JPN.AAC2.0.H.265.10bit.ESub-ToonsHub.mkv In the worst cases, everything, even the name and the season and episode numbers, is in square brackets. This example has the name in two languages, and the season and episode numbers separated (it's s02e14, not yet listed in TVDB in this instance). I do not expect Emby to handle junk like this, which is why I have a policy of sanitising all file names: [Billion Meta Lab][风灵玉秀S2 特别篇][Soulmate Adventure Special Lower][14][1080p][HEVC 10bit][CN][END].mkv Paul
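Splitting the three example names into their bracket groups shows the inconsistency concretely, and also that a CRC-shaped group is easy to single out among the others. This is a sketch with hypothetical helper names, assuming square brackets only.

```python
from __future__ import annotations

import re

BRACKET_GROUP = re.compile(r"\[([^\]]*)\]")   # contents of each [...] group
HEX8 = re.compile(r"[0-9A-Fa-f]{8}")

def bracket_groups(filename: str) -> list[str]:
    """Return the contents of every [...] group, in order."""
    return BRACKET_GROUP.findall(filename)

def crc32_candidates(filename: str) -> list[str]:
    """Bracket groups that consist of exactly 8 hex digits."""
    return [g for g in bracket_groups(filename) if HEX8.fullmatch(g)]

examples = [
    "[Erai-raws] Murai no Koi - 01 [480p][Multiple Subtitle][6F878216].mkv",
    "Murai.In.Love.S01E01.Episode.1.1080p.HULU.WEB-DL.JPN.AAC2.0.H.265.10bit.ESub-ToonsHub.mkv",
    "[Billion Meta Lab][风灵玉秀S2 特别篇][Soulmate Adventure Special Lower]"
    "[14][1080p][HEVC 10bit][CN][END].mkv",
]
for name in examples:
    print(bracket_groups(name), crc32_candidates(name))
```

Only the Erai-raws name yields a CRC candidate (6F878216). The ToonsHub name has no bracket groups, and the Billion Meta Lab name has eight groups, with the season buried inside the title group and the episode as a bare [14], which no simple rule would untangle.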
ebr Posted September 4, 2024 On 9/2/2024 at 2:57 PM, Loke said: Because that is how the main groups release them, and modifying the names affects, for example, seeding back and requires some post-download modification. Like I said... On 9/2/2024 at 2:22 PM, ebr said: And then there is the question of why someone needs a CRC check in their personal rips of episodes...?
pwhodges Posted September 4, 2024 On 02/09/2024 at 19:22, ebr said: And then there is the question of why someone needs a CRC check in their personal rips of episodes...? They don't, but that's not the issue. The files are distributed like that whether we want a CRC or not, so we need to handle them one way or another (either by recognising the layout of the extended name, or by manually modifying the name to remove extraneous elements). On 02/09/2024 at 19:57, Loke said: Because that is how the main groups release them, and modifying the names affects, for example, seeding back... Seeding from the same storage as Emby streams from is often going to involve severe compromises in naming and especially in folder structure; I would keep them strictly separate. Paul
ebr Posted September 4, 2024 Folks, we do not seek to support the illegal redistribution of content via torrents or other means. Please keep the discussion on topic for usage with personal media. Thanks.
Loke (Author) Posted October 10, 2024 (edited) And back at this. So for cases like mine, where my archive does contain CRCs and modifying them is quite hard, is it not possible to implement some measure? Since you will argue it affects a small percentage of users, couldn't it be something "small" that the user enables per series? That way it doesn't affect initial scanning. I don't know what the proper solution would be; I wonder if it could be something as "simple" as manually selecting the parsing method from the list of supported naming formats, if any of those would work for the example name I provided. Otherwise, the format/mode would have to be a new one that ignores 8 alphanumeric characters surrounded by () or [] anywhere in the filename. Since that action is configured by the user per series, it also solves the potential problem @ebr mentioned: filenames that use, as part of the name, something that would match the CRC check (however unlikely). This option could be enabled per library as an advanced setting, disabled by default. Thoughts on that? @ebr @Luke Edited October 10, 2024 by Loke
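A sketch of the opt-in idea, assuming a hypothetical `LibraryOptions` settings object with an advanced flag that is off by default (none of these names exist in Emby). When the flag is off, the filename is passed through untouched, so libraries that never enable it pay nothing extra during the scan. A per-series override would be the same flag stored on the series instead of the library.

```python
import re
from dataclasses import dataclass

# 8 hex digits inside a matching [] or () pair, plus any whitespace before it.
CRC32_TAG = re.compile(r"\s*(?:\[[0-9A-Fa-f]{8}\]|\([0-9A-Fa-f]{8}\))")

@dataclass
class LibraryOptions:
    # Hypothetical advanced setting, disabled by default as proposed above.
    ignore_crc32_tags: bool = False

def name_for_parsing(filename: str, options: LibraryOptions) -> str:
    """Return the string the episode parser should see for this library."""
    if not options.ignore_crc32_tags:
        return filename  # default: behaviour unchanged, no extra work per file
    return CRC32_TAG.sub("", filename)

name = "[SubsPlease] Karasu wa Aruji wo Erabanai - 09 (480p) [4E28D06D].mkv"
print(name_for_parsing(name, LibraryOptions()))
print(name_for_parsing(name, LibraryOptions(ignore_crc32_tags=True)))
```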
Luke Posted October 11, 2024 12 hours ago, Loke said: And back at this. So for cases like mine, where my archive does contain CRCs and modifying them is quite hard, is it not possible to implement some measure? ... Thoughts on that? @ebr @Luke Hi, yes we do plan to look at this. Thanks for reporting.