Part of my day job involves reviewing and searching through PDFs of old documents, most commonly WA government gazettes. The supplied PDFs were OCRed back in 2013, but I have never been sure how accurate the original OCR was, or whether modern technology could do a better job. I ran a brief project to see what OCR options are presently available, and whether any of them produce a sufficiently better OCR than what is already there.
To be clear, I did not do a thorough analysis of how the different OCR options handled various images. I compared how accurately the different OCR options OCRed a particular random page of the government gazette. Which page?

The page has two columns, and in each column, there are several consecutive blocks of text.
Executive summary
The best OCR came from the Google Docs conversion. More or less equal second are the existing OCR (apparently done in 2013 by Paperport 11) and the free online Onlineocr.net. The free Tesseract OCR project did a cromulent job.
On the other end of the scale, Adobe Acrobat Pro did surprisingly badly. Also, while the (presumably unintelligent) Google Docs OCR did wonderfully, Google’s (free) AI OCR was a mess. It hallucinated large parts of the page, making it catastrophic. OpenAI’s Chat GPT 4o did much better, but it also hallucinated and skipped parts of the page. I am frankly worried about people using AI products for OCR.
Methodology
I stripped the original OCR from the page, but otherwise provided each OCR tool with the same image. Some required the image to be in PDF format, while others were happy with a PNG. Once I got the OCRed text, I put it into a text editor (BBEdit) to strip any formatting and make the line breaks line up consistently. I then put it through a random online text compare website to compare it with a version of the text I had manually prepared.
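The clean-up and comparison steps above could also be scripted rather than done in an editor and a website. The following is only a minimal sketch using Python's standard difflib module, not the process actually used for this test. The function names and the sample strings are made up for illustration, and it assumes the line breaks in the two texts already roughly correspond.

```python
import difflib
import re


def normalise(text: str) -> list[str]:
    """Collapse runs of whitespace and drop blank lines, returning clean lines."""
    lines = (re.sub(r"\s+", " ", line).strip() for line in text.splitlines())
    return [line for line in lines if line]


def diff_against_reference(reference: str, ocr_output: str) -> list[str]:
    """Return unified-diff lines between the hand-typed text and the OCR text."""
    return list(
        difflib.unified_diff(
            normalise(reference),
            normalise(ocr_output),
            fromfile="reference",
            tofile="ocr",
            lineterm="",
        )
    )


if __name__ == "__main__":
    # Illustrative sample only: one line read badly, one line read correctly.
    reference = "Municipality of Broad Arrow-Paddington\nThe Colonial Secretary."
    ocr_output = "Jfunicipalit11 of B1'oacl Arrow-Pac1clington\n\nThe  Colonial Secretary."
    print("\n".join(diff_against_reference(reference, ocr_output)))
```

An empty result means the two texts are identical once whitespace is normalised.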
Some software mixed up the order of the blocks of text, going across the page, instead of down by column. What I care about is that the text is sufficiently accurately recognised that the search function is likely to return an accurate result. If the software made this mistake, I manually re-ordered the paragraphs so the compare function was effective.
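The re-ordering was done by hand for this test. An automated alternative, sketched below under the assumption that both texts have already been split into blocks (paragraphs), is to pair each block of the reference text with the most similar OCR block that has not been used yet. The function name is hypothetical, and the greedy matching is a simplification that could mis-pair blocks that look very alike.

```python
import difflib


def reorder_blocks(reference_blocks: list[str], ocr_blocks: list[str]) -> list[str]:
    """Greedily pair each reference block with its most similar unused OCR block."""
    remaining = list(ocr_blocks)
    ordered = []
    for ref in reference_blocks:
        if not remaining:
            break
        # Similarity ratio is 1.0 for identical strings and 0.0 for nothing in common.
        best = max(
            remaining,
            key=lambda blk: difflib.SequenceMatcher(None, ref, blk).ratio(),
        )
        ordered.append(best)
        remaining.remove(best)
    # OCR blocks with no counterpart are kept at the end so nothing is lost.
    return ordered + remaining
```

The result keeps every OCR block exactly once, just in reference order, so a later line-by-line compare is not thrown off by column order.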
Where the OCR was not too bad, I counted the minor errors, where a character was mis-read, and the major errors, where whole words or lines were omitted. I did not record mistakes with punctuation, as it has no effect for my purposes.
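The counts in this post were made by eye. A rough automated approximation of the same idea is sketched below: punctuation is stripped, the two texts are aligned word by word, a word that was replaced by a different word counts as a minor error, and a reference word with no counterpart in the OCR counts as a major error. The function names and the word-level scoring rule are assumptions made for illustration, so the numbers it gives would not necessarily match a manual count.

```python
import difflib
import string

# Delete ASCII punctuation plus curly quotes; punctuation is ignored on purpose.
_PUNCT = str.maketrans("", "", string.punctuation + "“”‘’")


def words(text: str) -> list[str]:
    """Strip punctuation and split the text into words."""
    return text.translate(_PUNCT).split()


def count_errors(reference: str, ocr_output: str) -> dict[str, int]:
    """Count mis-read words (minor) and omitted words (major) in the OCR text."""
    ref, ocr = words(reference), words(ocr_output)
    counts = {"minor": 0, "major": 0}
    matcher = difflib.SequenceMatcher(None, ref, ocr, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        ref_len, ocr_len = i2 - i1, j2 - j1
        if tag == "replace":
            # Words present in both texts but read differently.
            counts["minor"] += min(ref_len, ocr_len)
            # Reference words left over with nothing in the OCR to pair with.
            counts["major"] += max(ref_len - ocr_len, 0)
        elif tag == "delete":
            # Reference words that are missing from the OCR altogether.
            counts["major"] += ref_len
    return counts
```

Words that appear only in the OCR output (the "insert" case) are deliberately not counted here, although for checking hallucinated text they would be worth tracking too.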
The original OCR by Paperport
The original PDFs were created by the software Paperport 11 back in 2013. The OCR text embedded in the PDFs was presumably created by that software.
The Paperport OCR made the common error of alternating blocks of text. Other than that, it made 15 minor errors, and two major errors.
Adobe Acrobat Pro 24.05
Adobe invented the PDF back in the day. I was hopeful that the OCR from the king would produce a better result. I was disappointed.
Acrobat Pro also alternated blocks of text, but it also produced a much worse text output. Somehow it read “Municipality of Broad Arrow-Paddington” as “Jfunicipalit11 of B1’oacl Arrow-Pac1clington”. The rest was not much better. I did not try to count errors as there were simply too many.
Apple OCR
All recent major Apple OSs automatically OCR text in images displayed on the system.
It may be fine to use to select text in images, but it is useless as a replacement for a proper OCR. When tested for this purpose, it missed many lines of text, and also made many errors with individual characters.
Tesseract
Tesseract is an open source OCR software. I used version 5.5 on the command line.
Tesseract was the first OCR software to correctly place all of the text in the first column, followed by the text in the second column. It was far from perfect though. I counted about 30 minor errors in the OCR output, and no major errors.
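For anyone wanting to repeat the Tesseract step from a script, the command line tool can be called from Python. This is a minimal sketch only: the options actually used for this test are not recorded above, so the language ("eng") and page segmentation mode (3, Tesseract's automatic default) are assumptions, as are the function names. It relies on the tesseract executable being installed and on the path.

```python
import subprocess


def tesseract_command(image_path: str, lang: str = "eng", psm: int = 3) -> list[str]:
    """Build a tesseract CLI call that writes recognised text to standard output."""
    # "stdout" as the output base tells tesseract to print instead of writing a file.
    return ["tesseract", image_path, "stdout", "-l", lang, "--psm", str(psm)]


def ocr_page(image_path: str) -> str:
    """Run tesseract on one image and return the recognised text."""
    result = subprocess.run(
        tesseract_command(image_path),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if tesseract exits with an error
    )
    return result.stdout
```

Calling ocr_page on a PNG of the page returns the text as a single string, ready to be compared against the manually prepared version.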
Google Docs
If you upload an image PDF to Google Docs, it will OCR the text and allow you to edit it.
The resulting Google doc was a complete mess of formatting, but the text was remarkably accurate. Other than swapping around a couple of lines (which in the context was not that bad) and mixing up a few punctuation marks, it made no errors.
Google Gemini 2.5
AI is all the rage apparently. Google exposes two versions of Google Gemini to people too cheap to pay for more.
I first started with Gemini “2.5 Flash”. It correctly output the text by column, but its OCR output was truly bizarre. It OCRed this passage:

as:
Present:
His Excellency the Governor,
The Honourable--the Colonial Treasurer,
The Colonial Secretary.
1820/10.
WHEREAS by Section 18 (2) of " The Municipal
Corporations Act, 1906," the Governor may, by Order
in Council declare any new locality and include the dis-
trict thereof in the Municipal. Now, therefore, His
melcomes a petition has been prolonged to the Excellency
the said Act and Municipal and General Act, and this
mis A.L. signed, with the coming seal, at the steadily
purse, and the area of Municipality of Bunbury, so far
as it affects the present. The Governor by and with of
Its published advertised in the Government Gazette on
and the previous year, 1908. Now, therefore, by the
rollogy the Governor, with the advice of the Executive
Council, doth hereby proclaim and set apart, under the
Bame Act, do hereby dissolve the Municipality of
Bunbury and Proclamation previously made thereof passed
in the Broad Arrow Road District;
For those uninterested in reading the text, somehow it demoted “The Premier” to “the Colonial Treasurer”, and changed “Municipality of Broad Arrow-Paddington” to “Municipality of Bunbury and Proclamation”. Bunbury is another town in WA, so it is not implausible that a WA gazette may refer to Bunbury, but that is not what the image showed. Its hallucination engine was running overtime, and changed things throughout the document to the point that I went back to confirm I had given it the correct image to OCR… (I had.)
Gemini “2.5 Pro” was next. It output nice, clean, well wrapped text, but it also seemed intent on hallucinating. The above passage was OCRed as:
Present:
His Excellency the Governor.
The Honourable the Premier.
The Colonial Secretary.
1820/10.
WHEREAS by Section 10 (1) of "The Municipal
Corporations Act, 1906,'' the Governor may,
by Order in Council, declare any municipality
and define the boundaries thereof, and may,
from time to time, alter such boundaries:
And whereas it is deemed expedient to alter
the boundaries of the Municipality of North
Fremantle: Now, therefore, His Excellency the
Governor, by and with the advice and consent
of the Executive Council, doth hereby declare
that the boundaries of the said Municipality
of North Fremantle shall be as described in
the Schedule hereto, and that the boundaries
of the said Municipality as defined by Order
in Council dated the ninth day of October,
one thousand nine hundred and one, and
published in the Government Gazette of the
eleventh day of October, one thousand nine
hundred and one, are hereby amended accordingly.
This time the Premier avoided demotion, but now the Order in Council is changing the boundaries of North Fremantle, not dissolving the Municipality of Broad Arrow.
Chat GPT 4o
With Google’s AI just making stuff up, can Chat GPT do any better? The free version of Chat GPT presently available is 4o.
It did much better. The only thing it hallucinated was replacing “Mount Magnet” with “Mukinbudin” (a different small town in WA). It also arguably OCRed pretty well, with only 7 minor errors in the OCR. However, it made some unusual “choices”. It completely omitted the incomplete block on the top right of the page, and the incomplete block on the bottom left of the page. It might do better with a complete document, but I am not inclined to trust it…
Free Online OCR websites
I also tested some free online OCR websites. There seem to be endless options. With my enthusiasm and energy waning, I picked three pretty much at random.
Onlineocr.net did a remarkably good job. I counted 16 minor errors, and 1 major error, which is better than most options on this list.
free-online-ocr gives the option of creating a Word document or just text. The Word document it produced is useless. Large parts of it are just images, and the text parts are a mess. The text output is not very useful either. It skipped whole sections, and it made many errors in the parts it did OCR.
ocr.space has two “engines”. Neither does a great job. Both are better than free-online-ocr, but both also fall behind Onlineocr.net.