"- I'm AI. (Normal text) − І’m󠅘󠅟󠅜󠅑 ΑІ.󠅓󠅙󠅑󠅟 (AI-tainted text)
’ (U+2019) is a right single quotation mark instead of a regular quote"
I think AI just uses the correct apostrophe, doesn’t it?
https://dictionary.cambridge.org/ja/grammar/british-grammar/...
"− (U+2212) is a minus sign instead of a dash"
This is technically true, but the character in the other version is a hyphen, not a dash (though since ASCII has no dashes, one, two, or three ASCII hyphens are often used in their place in environments constrained to ASCII).
And while AI watermarking and fingerprinting are real, using typographically correct Unicode instead of base ASCII isn't really it (though I guess anything that transforms text in a way that reduces variety, as this does, will make some of those techniques less effective).
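If you want to check for yourself which characters in a piece of text fall outside ASCII, something like the following Python sketch should do it. The function name `describe_non_ascii` is made up for illustration, and this only names the code points; it says nothing about whether the text was fingerprinted.

```python
import unicodedata


def describe_non_ascii(text):
    """Return (char, 'U+XXXX', name) for every non-ASCII code point in text."""
    report = []
    for ch in text:
        if ord(ch) > 0x7F:
            # Fall back to a placeholder for code points without a Unicode name.
            name = unicodedata.name(ch, "<unnamed>")
            report.append((ch, f"U+{ord(ch):04X}", name))
    return report


# Sample with a minus sign and a right single quotation mark (hypothetical input).
sample = "\u2212 I\u2019m AI."
for _, code, name in describe_non_ascii(sample):
    print(code, name)
```

For that sample it reports U+2212 (MINUS SIGN) and U+2019 (RIGHT SINGLE QUOTATION MARK), while the plain ASCII version of the same sentence reports nothing.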
> This isn’t a claim that major LLMs do all (or any) of these tricks. That said, I started working on this because I accidentally discovered an instance of text fingerprinting while debugging a byte-sensitive bug. That’s when I realized: it’s time to say goodbye to (at least these kinds of) fingerprints for good.
Are there any examples of this being used?
I don't think this has any legitimate use, does it?
It seems this just supports cheating and misinformation, and generally makes the web worse.