I am using this text [1] to test UTF-8 character counting. Does anybody know how to get an authoritative count of how many characters it should actually contain? That is, the count minus any invalid sequences, should there be any in that text.
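
For cross-checking, here is a rough sketch of how I would get a reference number in Python. It assumes "character" means Unicode code point (not byte, and not grapheme cluster), and that invalid sequences are tallied separately as a byte count. The function name `count_utf8` is my own, not from any library.

```python
def count_utf8(data: bytes) -> tuple[int, int]:
    """Return (code_points, invalid_bytes) for a byte string.

    Assumption: "characters" are Unicode code points; bytes that a strict
    UTF-8 decoder rejects are counted separately.
    """
    code_points = 0
    invalid_bytes = 0
    pos = 0
    while pos < len(data):
        chunk = data[pos:]
        try:
            # Strict decoding raises on the first invalid sequence.
            code_points += len(chunk.decode("utf-8"))
            break
        except UnicodeDecodeError as err:
            # err.start and err.end are offsets into `chunk`.
            # Everything before err.start is valid UTF-8.
            code_points += len(chunk[:err.start].decode("utf-8"))
            invalid_bytes += err.end - err.start
            pos += err.end
    return code_points, invalid_bytes


if __name__ == "__main__":
    sample = "héllo €".encode("utf-8") + b"\xff"
    print(count_utf8(sample))
```

To run it against the test text, read the file in binary mode (`open(path, "rb").read()`) and pass the bytes in. Other tools may count differently, for example by treating a truncated multi-byte sequence as one error rather than several bytes, so the invalid number is only comparable if the counting rule is the same.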