• CCRhode@lemmy.ml
    2 days ago

    Boy howdy, do I have just the script for you!

    https://pypi.org/project/clanker_score/

    Full disclosure: it doesn’t work. But the idea is nice: that you could, perhaps in real life, identify AI-generated content. So I wrote a framework that purports to do that.

    Keyword density is not the only measure of gloss. Others have been developed that measure ratios between parts of speech. Unfortunately, none of them distinguishes sharply between pages that naturally convey genuine information and pages that have been designed to convey fluff for ulterior purposes. Combining measures of gloss is unlikely to produce a tool that discriminates much better than keyword density by itself.

    • Piskorski, Jakub, Marcin Sydow, and Dawid Weiss. “Exploring Linguistic Features for Web Spam Detection: A Preliminary Study.” AIRWeb '08: Proceedings of the 4th International Workshop on Adversarial Information Retrieval on the Web. Ed. Carlos Castillo, Kumar Chellapilla, and Dennis Fetterly. New York: ACM, Apr. 2008. 25-28. ISBN:9781605581590. DOI:10.1145/1451983. 09 Nov. 2025 https://users.pja.edu.pl/~msyd/lingFeat08draft.pdf.
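
For anyone unsure what "keyword density" means in the quote above, here is a rough sketch of one common way to compute it: the fraction of a page's words that match a given set of keywords. This is not the clanker_score API and not the measure from the paper; the function names (`tokenize`, `keyword_density`, `top_term_density`) and the crude regex tokenizer are my own assumptions, for illustration only.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into crude word tokens (an assumption,
    not a real linguistic tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def keyword_density(text, keywords):
    """Fraction of tokens in `text` that are one of `keywords`."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    wanted = {k.lower() for k in keywords}
    hits = sum(1 for t in tokens if t in wanted)
    return hits / len(tokens)


def top_term_density(text, n=3):
    """Fraction of tokens taken up by the `n` most frequent terms.
    Useful when no keyword list is known in advance."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    top = sum(count for _, count in counts.most_common(n))
    return top / len(tokens)


if __name__ == "__main__":
    page = "Buy cheap pills. Buy now!"
    print(keyword_density(page, ["buy"]))
    print(top_term_density(page, n=1))
```

A high score only says a page repeats itself a lot, which is exactly the limitation the quote describes: it cannot tell an honest page that is naturally repetitive from one stuffed with fluff.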