pwhodges 2012 Posted January 2, 2021

6 hours ago, crusher11 said:
I gave it a go with a couple of discs. With one, it was still asking me to manually identify about half the characters in each subtitle well over half an hour into the film. With the other, it assumed all commas were apostrophes and left weird spaces in the middle of a bunch of words so I had to go back over it in Subtitle Edit anyway.

But once it's learnt, it's learnt. And it's very easy to find and remove incorrect recognition of individual characters. There's also spacing adjustment available. My experience was that after the initial learning it required far less intervention than the OCR in Subtitle Edit - but that's largely because my subs often contain Japanese names which it doesn't know. It also put in far fewer unwanted spaces for me than SE. I guess it's very source dependent, and I've been lucky, maybe?

6 hours ago, Ronstang said:
Yeah, I found I actually had this program on my system and once I opened it I remembered I abandoned it because I don't have the time to make my own SRT subs unless the program is smart enough to make them without me manually editing them.....who has the time or patience for that?

Well, as I said, I did 55 (25 minute) episodes of a series word-perfect in two hours, which I felt was good use of time. Again, I suppose that as they were all the same series, the learning was that much more effective, as the font remained consistent, for instance.

Paul
crusher11 1101 Posted January 2, 2021 Author

2 hours ago, Ronstang said:
And I just checked and if you have an SRT file like the one generated by MCEBuddy when it extracts it with CCExtractor and you put it into the MKV or other container then that becomes the default....so essentially subs in the container come first and the default in the container becomes the default in Emby.

This doesn't seem ideal. I leave all the PGSSUBS in the file, because they're better than SRT subs and some devices can play them so I want them available. But I also have SRT subs for devices that can't. Therefore, in a situation where two subs match the criteria set, it would make more sense for it to pick the subtitle track that can direct play.
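The tie-break suggested above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Emby actually chooses a track: the `SubtitleTrack` fields, the `pick_track` function and the codec names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class SubtitleTrack:
    # Hypothetical description of one subtitle track in a container.
    index: int
    language: str
    codec: str            # e.g. "pgssub" or "subrip" (illustrative names)
    is_default: bool = False


def pick_track(tracks: Iterable[SubtitleTrack],
               language: str,
               playable_codecs: Set[str]) -> Optional[SubtitleTrack]:
    """Pick a track in the wanted language, preferring one the device can direct play."""
    matches = [t for t in tracks if t.language == language]
    if not matches:
        return None
    # Sort key: direct-playable first, then the container default, then track order.
    return min(matches,
               key=lambda t: (t.codec not in playable_codecs, not t.is_default, t.index))


tracks = [
    SubtitleTrack(index=0, language="eng", codec="pgssub", is_default=True),
    SubtitleTrack(index=1, language="eng", codec="subrip"),
]

# A device that only handles text subtitles gets the SRT track.
text_only_choice = pick_track(tracks, "eng", {"subrip"})
# A device that handles both keeps the container default (PGS).
full_choice = pick_track(tracks, "eng", {"pgssub", "subrip"})
```

With this ordering the container default still wins whenever the device can play it, so behaviour only changes in the case described in the post.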
Ronstang 293 Posted January 2, 2021

7 hours ago, pwhodges said:
Well, as I said, I did 55 (25 minute) episodes of a series word-perfect in two hours, which I felt was good use of time. Again, I suppose that as they were all the same series, the learning was that much more effective, as the font remained consistent, for instance. Paul

I never tried it with a series and it was probably PGS subs so I will try it again with one of my TV series on DVD where I have 50-100 episodes as your point is well taken. This time I will load up the beta as it is the last available build.
Q-Droid 989 Posted January 2, 2021

Sorry to jump in. My question, maybe for the devs: since the Emby built-in convert feature already extracts subtitles from media, would it be possible to add a separate option to only check for and extract subtitles to create the SRT/VTT files?
Ronstang 293 Posted January 2, 2021

3 minutes ago, Q-Droid said:
Sorry to jump in. My question, maybe for the devs: since the Emby built-in convert feature already extracts subtitles from media, would it be possible to add a separate option to only check for and extract subtitles to create the SRT/VTT files?

That would be quite awesome since the people developing this project have a much better understanding of what people like us want and need.
pwhodges 2012 Posted January 2, 2021

But the extraction doesn't deal with the need for OCR, which I doubt would be reliable enough as a server background process.

Paul
crusher11 1101 Posted January 3, 2021 Author (edited)

Tried that program with something else, and it again put every single comma on its own line because it thinks it's an apostrophe. So I get things like:

Quote
Well we've come to the halfway point ' of our senior seminars.

Manually editing all those back is no doubt going to take longer than if I'd just gone through the Subtitle Edit OCR.

Edited January 3, 2021 by crusher11
pwhodges 2012 Posted January 3, 2021

Have you deleted its memory of apostrophe and comma and retaught it? A single mistake in initial training can lead to that result, but it is easy to correct by deleting the previous learning and rerunning the (very fast) program.

Paul
crusher11 1101 Posted January 3, 2021 Author

Can't have decimal points in SRT filenames either. I have the 4k77 version of Star Wars, which has different soundtracks with varying dialogue, as some tweaking was made between the mono and six-track releases. So I had "5.1 SDH" and "Mono SDH" SUBRIP tracks. Can't do that in SRT.
pwhodges 2012 Posted January 3, 2021 (edited)

2 hours ago, crusher11 said:
I have no idea how to do that?

At the end of the initial scan, click the "Review" button at the bottom right. You can then see each character match and remove those which are wrong. In this example I had clicked C instead of c for one match, so I deleted it ("Remove a Training").

When you close the review window after a correction the scan is automatically repeated, and obviously the previously mismatched characters are queried again, and you can give the correct match.

Paul

Edited January 3, 2021 by pwhodges
crusher11 1101 Posted January 3, 2021 Author

Yep, that's fixed it. Although trying to navigate through that list is a pain and it's impossible a lot of the time to even tell which character is which when it comes to various apostrophes, commas, dashes, underscores, etc.

It also didn't even bother to query the Unicode music symbol character, and just listed it as "?" with no way to correct it that I could find.
pwhodges 2012 Posted January 3, 2021

It doesn't do Unicode - that's a big blot, and shows its age.

It clearly doesn't suit everybody or every situation, but I found it immensely effective with a block of OCR which I wanted to do on 55 episodes (it was a situation I've alluded to in another thread, where cropping the black bars of a Blu-ray rip of a 4:3 video causes havoc with the PGS subtitles which used some of that space). I've just got the next 28 episodes of the same anime to process, so I'll be exercising it again this week.

Paul
crusher11 1101 Posted January 3, 2021 Author

But it doesn't even go "uh, no idea what this is" or anything, just throws in a "?" and carries on. It was only because it was the third line of the subtitles that I spotted it at all.
pwhodges 2012 Posted January 3, 2021

As it doesn't know about Unicode, it doesn't know to do that. Anyway, if you don't like it, don't use it. I don't mind, because it's not my program, or anything!

Paul
crusher11 1101 Posted January 3, 2021 Author

Surely it still sees a shape it doesn't recognise, though?
pwhodges 2012 Posted January 3, 2021

<shrug> I didn't write it, so I don't know the logic. Maybe it recognises it, but knows it is unable to encode it, so uses "?". Is it in the Review window?

Paul
crusher11 1101 Posted January 3, 2021 Author

Nope. There's no sign at all that there's a problem.
pwhodges 2012 Posted January 3, 2021

Well, now you know at least as much about this program as I do...

Paul
crusher11 1101 Posted January 6, 2021 Author

It just successfully identified the music symbol. So, clearly it supports it. But it only recognises it in italics, everything else gets a question mark. Bizarre. Easy enough to drag it into Subtitle Edit and do a find-and-replace for "[?" and ">?", but still. Weird.

Slashes are another thing that I can't have in file names. The ASCII workaround works for spaces, so I can get my English SDH subs working, but my Producer/Director commentary subs and my "5.1 Track" subs are having to use different phrasing.

It would be even better if Emby would actually support a filename like "Tenet (2020) - 4K HDR.en.sdh.srt", and be aware both that it's in English and that it's an SDH subtitle and not just dialogue. Right now, Emby knows neither thing. I believe Plex supports this.
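A rough Python sketch of how a name like that could be split into a language code and an SDH flag. Nothing here reflects how Emby or Plex actually parse names: `parse_subtitle_name`, `KNOWN_FLAGS`, and the rule that a two- or three-letter token counts as a language code are all assumptions made for illustration.

```python
from pathlib import Path
from typing import NamedTuple, Optional, Tuple

# Hypothetical set of flag tokens; a real server may recognise a different list.
KNOWN_FLAGS = {"sdh", "forced", "default", "cc"}


class SubtitleName(NamedTuple):
    title: str
    language: Optional[str]
    flags: Tuple[str, ...]
    extension: str


def parse_subtitle_name(filename: str) -> SubtitleName:
    """Split 'Title.lang.flag.ext' into its parts, reading tokens from the right."""
    path = Path(filename)
    extension = path.suffix.lstrip(".").lower()
    parts = path.stem.split(".")
    flags = []
    language = None
    # Keep at least one part as the title; stop at the first unrecognised token.
    while len(parts) > 1:
        token = parts[-1].lower()
        if token in KNOWN_FLAGS:
            flags.append(token)
        elif language is None and len(token) in (2, 3) and token.isalpha():
            language = token  # assumed to be an ISO 639 style code
        else:
            break
        parts.pop()
    return SubtitleName(".".join(parts), language, tuple(reversed(flags)), extension)


parsed = parse_subtitle_name("Tenet (2020) - 4K HDR.en.sdh.srt")
print(parsed)
```

Reading from the right means dots inside the title itself are left alone once a token that is neither a flag nor a short alphabetic code is reached.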
pwhodges 2012 Posted January 6, 2021

Well, you wouldn't expect a slash to be allowed in file names, given its near-universal use as a file name separator.

My approach to such things (if it really, really matters to me) is to look for another Unicode character that is similar but distinct in coding, and use that. In this case, try this little table from Wikipedia as a source for ideas - you could even copy and paste the character you choose from the linked article rather than typing the Unicode encoding.

Paul
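As a small Python illustration of the look-alike approach, the sketch below swaps the ASCII slash for U+2215 DIVISION SLASH before the label is used in a file name. The choice of that particular character and the `safe_label` helper are assumptions for the example; whether a given filesystem, player or font handles it well is untested here.

```python
import unicodedata
from typing import Dict, Optional

# U+2215 looks like "/" but is a different code point, so it is not a path separator.
DIVISION_SLASH = "\u2215"


def safe_label(label: str, replacements: Optional[Dict[str, str]] = None) -> str:
    """Replace characters that are not allowed in file names with look-alikes."""
    table = str.maketrans(replacements or {"/": DIVISION_SLASH})
    return label.translate(table)


label = safe_label("Producer/Director")
print(label, unicodedata.name(DIVISION_SLASH))
```

Passing a different `replacements` mapping lets the same helper cover other awkward characters, if a suitable look-alike exists for them.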
crusher11 1101 Posted January 7, 2021 Author

I'm not surprised I can't use a slash, I just came across a file that needed one and had to work around it.

The program you mentioned cannot handle closing single quotes. It just assumes all commas are misaligned closing single quotes.
crusher11 1101 Posted January 9, 2021 Author

I've figured out the music symbol thing. The program can identify Unicode, and put it into the subtitle, it just saves it in a non-Unicode format so the Unicode symbols get converted to question marks at that stage. Which seems like a dumb way of doing things, and unfortunately there's no option I can see to save it in a different format or even access the output directly to copy and paste it or something.
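The effect described here is easy to reproduce in Python: encoding text to a legacy code page with replacement turns any character the code page lacks into "?". The sketch below assumes Windows-1252 as the non-Unicode format, which is a guess and not something confirmed about the program. The `restore_music_notes` helper mirrors the "[?" and ">?" find-and-replace mentioned earlier in the thread and is likewise only illustrative.

```python
MUSIC_NOTE = "\u266a"  # the eighth-note symbol used in many SDH subtitles


def save_as_legacy(text: str, encoding: str = "cp1252") -> str:
    """Round-trip text through a non-Unicode encoding, as a lossy save would."""
    # errors="replace" substitutes "?" for every character the encoding lacks.
    return text.encode(encoding, errors="replace").decode(encoding)


def restore_music_notes(text: str) -> str:
    """Put the note back where '?' directly follows '[' or '>'."""
    return text.replace("[?", "[" + MUSIC_NOTE).replace(">?", ">" + MUSIC_NOTE)


original = "[" + MUSIC_NOTE + " theme playing] <i>" + MUSIC_NOTE + " la la</i> What?"
damaged = save_as_legacy(original)
repaired = restore_music_notes(damaged)
print(damaged)
print(repaired)
```

Only question marks that follow "[" or ">" are touched, so genuine questions in the dialogue stay as they are. A note lost anywhere else in a line would still need fixing by hand.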
pwhodges 2012 Posted January 9, 2021

Just to complete information on this (old and no longer developed) program, I remembered another specific reason I use it which I didn't pass on.

Whereas Subtitle Edit just OCRs words and creates an SRT file, DVD Sub Extractor can capture the positions of text on the screen and write an ASS file which preserves that information with the text. When converting PGS subs from something like anime which often has a lot of translations of signs and other screen text, this is invaluable.

Paul
crusher11 1101 Posted January 9, 2021 Author

There isn't as much support for ASS as for SRT though, is there?