There have been a lot of complaints about both the competency and the logic behind the latest Epstein archive release by the DoJ: from censoring the names of co-conspirators to censoring pictures o…
I tried to leave a comment, but it doesn’t seem to be showing up there.
I’ll just leave it here:
Too tired to look into this, but one suggestion: since the hang-up seems to be comparing an L and a 1, maybe you need to get into per-pixel measurements. That may be necessary if the accuracy of the ML or OCR models isn't at least 99.5% on a document containing thousands of ambiguous L's; any inaccuracy from an ML or OCR model leaves you guessing among 2^N candidates (N being the number of ambiguous characters), which becomes infeasible quickly. Maybe reverse-engineer the font rendering by creating an exact replica of the source image? I trust some talented hacker will nail this in no time.
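Roughly what I mean by per-pixel measurements, as a minimal sketch: render both candidate glyphs ("l" and "1") with the same font the document uses and score each against the crop from the released image. Everything concrete here is an assumption: the font file, point size, canvas size, and crop path are placeholders, and real use would still need alignment and anti-aliasing handled.

```python
# Sketch of the per-pixel comparison idea. Font, size, and file names are
# placeholders, not taken from the actual release.
from PIL import Image, ImageChops, ImageDraw, ImageFont

FONT_PATH = "times.ttf"   # hypothetical: whatever font the document was typeset in
POINT_SIZE = 42           # hypothetical: estimated from the scan's resolution
CANVAS = (48, 64)         # hypothetical: size of one glyph cell in the crop

def render_glyph(ch: str) -> Image.Image:
    """Render a single character as a black-on-white grayscale image."""
    font = ImageFont.truetype(FONT_PATH, POINT_SIZE)
    img = Image.new("L", CANVAS, color=255)                   # white background
    ImageDraw.Draw(img).text((0, 0), ch, font=font, fill=0)   # black glyph
    return img

def score(candidate: Image.Image, crop: Image.Image) -> int:
    """Sum of absolute per-pixel differences; lower means a closer match."""
    diff = ImageChops.difference(candidate, crop.resize(candidate.size))
    return sum(diff.getdata())

crop = Image.open("glyph_crop.png").convert("L")   # hypothetical crop from the scan
scores = {ch: score(render_glyph(ch), crop) for ch in "l1"}
print(min(scores, key=scores.get), scores)         # best-matching candidate
```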
I also support the idea of checking for PDF errors using a stream decoder.
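A minimal sketch of that check, using only the standard library and assuming most content streams are FlateDecode: pull out stream…endstream blocks with a naive regex and see whether zlib can inflate them. Streams using other filters (DCTDecode images, for instance) will also show up as failures, so treat the output as a rough signal; a real PDF library would be cleaner. The file name is a placeholder.

```python
# Naive scan for content streams that fail to decode. Not a full PDF parser.
import re
import zlib

with open("doj_release.pdf", "rb") as f:   # hypothetical file name
    data = f.read()

stream_re = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)

for i, m in enumerate(stream_re.finditer(data)):
    body = m.group(1).rstrip(b"\r\n")
    try:
        zlib.decompress(body)
    except zlib.error as exc:
        print(f"stream #{i} at offset {m.start()}: {exc}")
```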
How big is N though?
64
Asking the real questions