Why is OCR so Bad?

The plans we received from the EOR are a mess when it comes to hidden text, so things like creating page labels, bookmarks, or Set labels from Page Regions don't work, and we basically have to edit everything manually, which is a pain on a 770+ page plan set. Support suggested that I Print to Bluebeam PDF to clean all that up and then run OCR. It did improve things, but I still need to correct a lot of sheet numbers. It reads 1 as 7 or I, B as 8, W as V \ /, and S as 5. That's just the sheet numbers I'm working with. I can't imagine how it's messing up numbers on the sheets in general. Here are just a few examples; I fixed most of them before thinking to post.

Comments

Scott Cavendish

Luke Shiras

That is frustrating. I've had issues with this before, but luckily for me it's only been a couple of sheets per set. Honestly, I just assumed that OCR was a third-party tool, kind of like auto-correct on phones, and that Bluebeam just licensed it.