Whoa. UTF-8 is older now than ASCII was when UTF-8 was invented.
-
@tek I have complaints about recoverability on a mildly corrupted bitstream, but it's much too late in the evening to articulate this well.
-
@vathpela IMHO, redundancy and/or checksums should be implemented on a different layer, not in the text encoding
Like, there are many, many ways to keep bits from corrupting, which are applicable in different cases
And forcing one particular one inside the text encoding itself is... meh. Same for compression, btw. For some texts (CJK in particular) UTF-8 is sub-optimal, but even basic deflate makes it compact enough
TL;DR: UTF-8 is not perfect, but having one encoding for every text outweighs that
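The deflate point is easy to check; a minimal Python sketch (the sample string and repeat count are arbitrary, chosen only to illustrate):

```python
import zlib

# CJK code points cost 3 bytes each in UTF-8, but deflate claws most of
# that overhead back on realistic (repetitive) text.
text = "統一碼は世界中の文字を扱う" * 100  # arbitrary CJK sample, repeated
raw = text.encode("utf-8")          # 13 chars x 3 bytes x 100 = 3900 bytes
packed = zlib.compress(raw)

print(f"utf-8: {len(raw)} bytes, deflated: {len(packed)} bytes")
assert len(packed) < len(raw)
```

For genuinely non-repetitive CJK text the ratio is less dramatic, but deflate still removes much of the 3-bytes-per-character overhead.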
-
@mo @vathpela Also, UTF-8 is trivially easy to synchronize. If you delete a byte out of the middle of a file, at most you’ll lose the one affected character (well, code point). The ones before and after it will be fine. That’s not true of some other Unicode encodings, like double-width ones where everything after would be out of sync.
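This self-synchronization falls out of the bit patterns: continuation bytes always look like 10xxxxxx, so a lead byte can never be mistaken for the middle of a character. A small Python sketch (the sample string is arbitrary):

```python
# Drop one byte out of the middle of a UTF-8 stream and decode what's left:
# only the single damaged code point is lost, everything around it survives.
original = "naïve café".encode("utf-8")

# Delete a continuation byte of 'ï' (byte index 3).
corrupted = original[:3] + original[4:]

repaired = corrupted.decode("utf-8", errors="replace")
print(repaired)  # -> na�ve café
assert repaired.endswith("ve café")
```

A fixed-width encoding like UCS-2 would instead shift every subsequent byte pair, garbling the entire remainder of the file.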
-
@tek Still, I am regularly confronted with IT systems that do not (properly) support it and display the umlaut in my name wrong.
-
@tek and it is still being handled wrongly in many places
-
@tek This! UTF-8 is a great encoding. Unicode can be a mess at times though.

-
@tek But UTF-EBCDIC is still younger than EBCDIC was when UTF-EBCDIC was invented.
-
@tek and it still sucks
-
@tek Every now and then the Cambridge CST exam papers include a question like "explain why even experienced programmers sometimes have problems with character codes".
You could write pretty well anything you liked.
Originally what was expected was an essay about things like escape sequences on Flexowriter tapes; in my day it was about conversion between EBCDIC and ASCII; these days it might be about obscure characters in URLs.
-
@tek And yet, my bank still won't let me add a contact (for e-transfers) with an accent in their name.
-
@tek MySQL will still happily mangle it.
-