Best way to digitize screen caps of BASIC programs

falter · Oct 4, 2023

Challenge: I want to 'capture' and digitize the listings of Tiny BASIC programs on my Digital Group cassettes. I don't presently have a way to interface a printer with that machine, so I'm looking for some "high tech" way to do it. My original thought was to take video footage I had of listing the programs, make still images of it, and then use Acrobat to OCR them and copy that into a Word file. I still don't understand how anyone gets anything done with Acrobat.. it's one of the most complicated and obtuse programs I've ever tried to work with. I did manage to export a screen-full of program, but only after 15 minutes of fiddling, and it still required major corrections.

I just want to put these listings in a Word format so people can easily copy them if they want to into other Tiny BASIC interpreters, or just examine how they were done for posterity.

Many thanks!

furball1985 · Oct 4, 2023

can you interface the printer serial or parallel port itself?

If you have a serial port then you can redirect output to putty or TERATERM on a pc or mac.
if you have a parallel port you can get a parallel to serial converter

then you can just LLIST or LPRINT output and capture it through the printer port.

with my Altair running cp/m i can change the Console to redirect to my printer port. i have a serial 4 way switch. one of my connections on the switch is a Serial to USB converter.
i run TeraTerm under windows 10 and then i can capture output or i can use it as a VT-100 terminal from my pc.

allows me to quickly copy and paste text or capture output for Excel processing.

another position on my serial switch is for my dot matrix printer using a Serial to Parallel converter.(B&B with external 12V power)

then i also have my WIFI modem on another position where i can again dump via teraterm

with acrobat to make it useful you need aftermarket plugins which can cost a bit of money.

1944GPW · Oct 4, 2023

No need for Acrobat; there are a few free OCR-to-text converters on the web. I have used this one from Aspose (an Australian company) via their cloud API, but they also have this page where you can drop your image and it will give you text.
https://products.aspose.app/ocr/scan-image

I tried a screengrab of OP's text (this is a png image below):

and it generated this (reasonable) text:

Challenge: I want to 'capture' and digitize the listings of Tiny BASIC programs on my digita
group cassettes.I don't presently have a way to interface a printer with that machine, so l'm
looking for some ""high tech" way to do it. My original thought was to take video footagel
had of listing the programs, make still images of it, and then use Acrobat to OCR them and
copy that into a Word file. I still don't understand how anyone gets anything done with
Acrobat..it's one of the most complicated and obtuse programs I've ever tried to work with.
did manage to export a screen-full of program but only after 15 minutes of fiddling and stl
requiring major corrections.
l just want to put these listings in a Word format so people can easily copy them if they want

Not sure if there's any limit on this drag-and-drop page, but the API is quite generous when used programmatically.

falter · Oct 4, 2023

Thank you. I will try that tomorrow. This is a sample screen grab from the actual machine.

falter · Oct 4, 2023

furball1985 said:
can you interface the printer serial or parallel port itself?

If you have a serial port then you can redirect output to putty or TERATERM on a pc or mac.
if you have a parallel port you can get a parallel to serial converter

then you can just LLIST or LPRINT output and capture it through the printer port.

with my Altair running cp/m i can change the Console to redirect to my printer port. i have a serial 4 way switch. one of my connections on the switch is a Serial to USB converter.
i run TeraTerm under windows 10 and then i can capture output or i can use it as a VT-100 terminal from my pc.

allows me to quickly copy and paste text or capture output for Excel processing.

another position on my serial switch is for my dot matrix printer using a Serial to Parallel converter.(B&B with external 12V power)

then i also have my WIFI modem on another position where i can again dump via teraterm

with acrobat to make it useful you need aftermarket plugins which can cost a bit of money.

I believe the IO board has open sockets you can wire up parallel devices to. I should really study how that works and learn how to interface things with it. I have that PR40 SWTPC printer I could use here..

1944GPW · Oct 4, 2023

Using your screen grab I got this using "Convert image to text online":

@20@12@021@ LET R=
@20122@01@2 DIM X(2)
@2112@2112 LET T=@
@20212220122 LET R=RN/12
Ø2132 FOR I=l TO 16 $ PR $ #XT I22132
Ø2142 PR ""OK I HRUE R NLJMBER""Ø2140
Ø21582 PR22150
ØØ1S2 PR""ETER FIRST HUMBER "";Ů2162
Ø2172 IN X(1)Ø2172
ØØ182 PR''ENTER SECOND NUMBER "";Ø2180
@@192 IN X(2)@t192
@@2@2 LET T=T+1@@2@2
@@212 GOSLB 80202@20212
@0220 IF Z=1 GOT0 252@0222
@8232 PR'"GLESS RGRTN""@8232
@@242 G0TO 152@2224@
ØØ2S2 PR"YO GLESGș IT Iiį ";T; ""
|RIES""
ØØ282 PR'"RGRIH? Y=i,' N=2 "";
22272 IN R
ØØ282 IF R=2 G0TO 33933
002232 G0T0 110
Ø3ØØØ LET Z=0
08102 FOR Y=1 T0 2
Ø82ØØ PR X(Y); "" "";
Ø8ƏØØ IF X(Yj{R PR"(NUMBER"" $ G0
TO 8822
Ø84ØØ IF XťY)yR PR ">NLMBER"" $ G
OTO 8820
3502 LET Z=l
286Ø2 PR ""= "";
38702 PR R
388Ø2 NT Y
383Ø2 RET

But I had to invert the image, boost contrast and flatten to black and white (using Paint.NET) before Aspose could recognise anything at all. ImageMagick could most likely do these operations on a batch of screenshots.
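The invert, contrast-boost and flatten steps described above could be scripted along these lines. This is only a sketch in plain Python, working on a grayscale image held as a list of rows of 0-255 values. Loading real screenshots into that form would need an imaging library, which is left out here. The function names and the cutoff of 128 are assumptions for illustration, not what Paint.NET does internally.

```python
def invert(gray):
    """Flip light and dark so glowing CRT text becomes dark text on light."""
    return [[255 - p for p in row] for row in gray]


def stretch_contrast(gray):
    """Linearly rescale so the darkest pixel maps to 0 and the brightest to 255."""
    lo = min(min(row) for row in gray)
    hi = max(max(row) for row in gray)
    if hi == lo:  # flat image, nothing to stretch
        return [[0 for _ in row] for row in gray]
    return [[(p - lo) * 255 // (hi - lo) for p in row] for row in gray]


def threshold(gray, cutoff=128):
    """Flatten to pure black and white; the cutoff is an assumed starting point."""
    return [[255 if p >= cutoff else 0 for p in row] for row in gray]


def prepare_for_ocr(gray, cutoff=128):
    """Apply the three steps in the order described in the post."""
    return threshold(stretch_contrast(invert(gray)), cutoff)


# Tiny stand-in for a screen photo: dim background with one bright text column.
sample = [
    [30, 180, 30],
    [35, 175, 28],
]
result = prepare_for_ocr(sample)
```

On the sample, the bright middle column ends up black and the background white, which is the form most OCR tools expect. The cutoff would likely need tuning per screenshot.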

Chuck(G) · Oct 4, 2023

That's the problem with simple OCR. It might be correct 95% of the time, but it will take a lot of hard work to find and correct the 5% gotten wrong.
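One way to narrow down where that 5% is hiding is to run some cheap structural checks over the OCR output before proofreading. The sketch below assumes each program line starts with a decimal line number followed by a space, that line numbers increase down the listing, and that the listing is plain ASCII with balanced double quotes. Those rules are assumptions about the listing, not a documented Tiny BASIC grammar, and the function name is made up for illustration.

```python
import re

# Assumed shape of a listing line: decimal line number, one space, statement.
LINE_RE = re.compile(r"^([0-9]+) (.*)$")


def check_listing(lines):
    """Return (index, reason) pairs for lines that look structurally wrong."""
    problems = []
    previous = -1
    for i, line in enumerate(lines):
        m = LINE_RE.match(line)
        if not m:
            problems.append((i, "no leading line number"))
            continue
        number, body = int(m.group(1)), m.group(2)
        if number <= previous:
            problems.append((i, "line number not increasing"))
        previous = number
        if any(not (32 <= ord(c) < 127) for c in body):
            problems.append((i, "non-ASCII character"))
        if body.count('"') % 2:
            problems.append((i, "unbalanced quotes"))
    return problems


# A few lines in the style of the OCR output, plus one made-up line
# with a missing closing quote.
sample = [
    "22272 IN R",
    "ØØ282 IF R=2 G0TO 33933",
    "002232 G0T0 110",
    '286Ø2 PR ""= "";',
    '38702 PR "HI',
]
report = check_listing(sample)
```

This only catches structural damage. A line like "G0T0" with zeros in place of the letter O passes every check, so the output still needs reading by eye.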

djg · Oct 4, 2023

Not familiar with your particular machine, so this is a generic possible alternative to the screen digitization method. If you can capture the cassette audio, it can be decoded in software. The audio files can also be used directly to let others load the programs.

This page indicates two different audio formats were used.
https://www.retrotechnology.com/restore/dig_grp.html

If you capture the audio you can view the spectrum in an audio tool to see which encoding your machine uses and see if any software is available.

First discussion I saw on decoding Tarbell.
https://forum.vcfed.org/index.php?threads/audio-cassette-formats.80877/
Think this is the final code.
https://github.com/nippur72/tarbell-decoder

If the computer stores the program tokenized then additional conversion may be needed to get readable text.
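To give a rough idea of what decoding in software means for a two-tone recording, here is a minimal sketch that classifies fixed-size windows of audio as the low or high tone by counting zero crossings. The 1200/2400 Hz tone pair and 300 baud rate are the Kansas City Standard values and are used purely as placeholders; nothing here establishes which encoding these tapes use. The signal is synthesized rather than read from a file, and all names are hypothetical. Real tape audio adds speed drift, noise and byte framing, none of which this handles.

```python
import math

SAMPLE_RATE = 44100                    # assumed capture rate
BAUD = 300                             # placeholder bit rate (Kansas City Standard)
SAMPLES_PER_BIT = SAMPLE_RATE // BAUD  # 147 samples per bit at these settings


def tone(freq_hz, n_samples):
    """Synthesize a sine burst, standing in for captured cassette audio."""
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(n_samples)]


def estimate_frequency(samples):
    """Rough frequency estimate: two sign changes per cycle."""
    crossings = sum(1 for a, b in zip(samples, samples[1:])
                    if (a < 0) != (b < 0))
    duration = len(samples) / SAMPLE_RATE
    return crossings / (2 * duration)


def classify_windows(samples, window, low_hz, high_hz):
    """Label each window 0 (nearer low tone) or 1 (nearer high tone)."""
    midpoint = (low_hz + high_hz) / 2
    bits = []
    for start in range(0, len(samples) - window + 1, window):
        f = estimate_frequency(samples[start:start + window])
        bits.append(1 if f > midpoint else 0)
    return bits


# Four bit periods: low, high, high, low.
signal = (tone(1200, SAMPLES_PER_BIT) + tone(2400, SAMPLES_PER_BIT)
          + tone(2400, SAMPLES_PER_BIT) + tone(1200, SAMPLES_PER_BIT))
bits = classify_windows(signal, SAMPLES_PER_BIT, 1200, 2400)
```

Zero-crossing counts undershoot the true frequency slightly over such short windows, which is why the code compares against the midpoint between the two tones rather than expecting exact values.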

whartung · Oct 4, 2023

That OCR is arguably not particularly useful. Better off just typing the stuff back in by hand from the screen grabs, easier than correcting the code. You're going to have to eyeball every single character anyway.

IF you can find a suitable OCR tool, however, and IF your scroll rate is slow enough, you MAY be able to capture a video of the LIST command, then go back through it and do select frame grabs. If everything is synced up well enough, you should be able to get sharp images from it. I'm guessing this might be faster at capture than listing, pausing, snapshotting.

Regarding the decoding of the tapes, someone here may be able to digitize your cassettes for you if you can't do it yourself. That would likely be the most accurate technique if you can't print the text out to, well, anything.

1944GPW · Oct 5, 2023

I agree, *that* particular OCR wasn't all that successful. If the camera was set up squarely to the screen and a few takes were done whilst experimenting with contrast, brightness etc and submitted to the OCR page it might produce a slightly improved result. Maybe.
As also mentioned above, listing the cassettes on a Digital Group machine with a serial port would be much better, as would dumping the tapes (are they KCS format?). But that might involve the mundane step of sending the tapes to someone else with a suitable machine, which perhaps wouldn't make for as much of a YouTube episode as OP doing it himself.

Plasma · Oct 5, 2023

Seems like it should be not too difficult to roll your own OCR for this particular case, because so much is controlled. You know the background and text colors, character font, size and spacing, and distortion from the monitor. Should be possible to use OpenCV to rectify the image, find the upper left character, then do pattern matching for each character on the screen. I did something similar in the early days of captchas when they were just monospaced text in an image.
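The pattern-matching step could look something like the sketch below, assuming the image has already been rectified and reduced to black and white, and that characters sit on a fixed grid. It is plain Python rather than OpenCV, and the 3x5 glyphs are made up for illustration; a real set would be traced from the machine's own character generator. Each cell is compared against every known glyph and the one with the fewest differing pixels wins.

```python
# Hypothetical 3x5 glyph templates ('#' = lit pixel, '.' = background).
GLYPHS = {
    "0": ["###", "#.#", "#.#", "#.#", "###"],
    "1": [".#.", "##.", ".#.", ".#.", "###"],
    "7": ["###", "..#", ".#.", ".#.", ".#."],
}
CELL_W, CELL_H = 3, 5  # assumed fixed character cell size


def distance(cell, glyph):
    """Count pixels that differ between a cell and a template."""
    return sum(a != b
               for cell_row, glyph_row in zip(cell, glyph)
               for a, b in zip(cell_row, glyph_row))


def read_row(image, top=0):
    """Read one text row by matching each cell to its nearest template."""
    text = []
    for left in range(0, len(image[top]) - CELL_W + 1, CELL_W):
        cell = [row[left:left + CELL_W] for row in image[top:top + CELL_H]]
        best = min(GLYPHS, key=lambda ch: distance(cell, GLYPHS[ch]))
        text.append(best)
    return "".join(text)


# "017" drawn on the grid, with one pixel of the "0" knocked out as noise.
image = [
    "###.#.###",
    "#.###...#",
    "..#.#..#.",
    "#.#.#..#.",
    "######.#.",
]
decoded = read_row(image)
```

Because the match is nearest-template rather than exact, a stray or missing pixel does not change the answer as long as the glyphs differ from each other by more than the noise.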

whartung · Oct 6, 2023

Yea I think the problem with most modern "off the shelf" OCRs is that they're really designed for a) printed text (i.e. books, etc.) and b) English (or, at least, a natural language). The stark pixel characters don't look much like "real" letters and numbers, and the words don't look like anything. To the OCR it's line noise in more ways than one.

The OpenCV idea is probably sound, however; the characters are quite regular. The question is whether getting that set up is faster than just grabbing a drink, firing up some music, hunkering down, and typing the stuff back in by hand.
