Whoa. UTF-8 is older now than ASCII was when UTF-8 was invented.

27 Posts 18 Posters 0 Views
    tek@freeradical.zone wrote (#1):

    Whoa. UTF-8 is older now than ASCII was when UTF-8 was invented.

      avatastic@avatastic.uk wrote (#2):

      @tek And yet Slashdot still can't display it.

        vathpela@infosec.exchange wrote (#3):

        @tek and it still sucks

          vathpela@infosec.exchange wrote (#4):

          @tek I have complaints about recoverability on a mildly corrupted bitstream, but it's much too late in the evening to articulate this well.

            vathpela@infosec.exchange wrote (#5):

            @tek (don't get me wrong, I have to use UCS-2 often enough to know real pain...)

              loke@functional.cafe wrote (#6):

              @vathpela @tek Given how much worse the alternatives are, and how impossible it would otherwise have been to get people to move off their legacy encodings, I'm glad UTF-8 exists.

              Don't get me wrong, I'm quite aware of the issues with UTF-8, but I (choose to) believe that if it weren't for UTF-8 we'd still be drowning in ASCII, and it would be impossible to convince the English-only-speaking minority that supporting letters other than those used for inscriptions in ancient Rome might actually be useful.

                sikorski@mstdn.science wrote (#7):

                @tek

                  mxk@hachyderm.io wrote (#8):

                  @vathpela @tek I would argue that in modern times this really shouldn't be an issue to be concerned about. It's not like telnet and plain serial connections are still the most central communication protocols. And if your storage is causing bit flips, you have bigger problems than unreadable plain text.

                    mo@mastodon.ml wrote (#9):

                    @vathpela IMHO, redundancy and/or checksums should be implemented on a different layer, not in the text encoding.

                    There are many, many ways to keep bits from getting corrupted, each applicable in different cases.
                    Forcing one particular method into the text encoding itself is... meh.

                    Same for compression, btw. For some texts (CJK in particular) UTF-8 is sub-optimal, but even basic deflate makes it compact enough.

                    TL;DR: UTF-8 is not perfect, but having one encoding for every text outweighs its flaws.

                    @tek
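
A rough illustration of the compression point above, as a Python sketch. The sample text (the opening lines of the Thousand Character Classic, repeated 50 times) and all variable names are assumptions chosen for the demo. The repetition flatters deflate considerably, so treat the printed sizes as illustrative rather than representative of real prose.

```python
import zlib

# Assumed sample: Chinese text from the Basic Multilingual Plane, repeated so
# that deflate has patterns to exploit. Real documents compress less well.
sample = "天地玄黄，宇宙洪荒。日月盈昃，辰宿列张。" * 50

utf8 = sample.encode("utf-8")       # 3 bytes per character for this text
utf16 = sample.encode("utf-16-le")  # 2 bytes per character for this text

utf8_deflated = zlib.compress(utf8)
utf16_deflated = zlib.compress(utf16)

for label, blob in [
    ("UTF-8", utf8),
    ("UTF-16", utf16),
    ("UTF-8 + deflate", utf8_deflated),
    ("UTF-16 + deflate", utf16_deflated),
]:
    print(f"{label:>16}: {len(blob)} bytes")
```

Uncompressed, UTF-8 is 1.5 times the size of UTF-16 for this sample; after deflate, both land well below the raw UTF-16 size.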

                      tek@freeradical.zone wrote (#10):

                      @mo @vathpela Also, UTF-8 is trivially easy to resynchronize. If you delete a byte out of the middle of a file, at most you’ll lose the one affected character (well, code point). The ones before and after it will be fine. That’s not true of some other Unicode encodings, like the double-byte ones (UTF-16, for example), where everything after the deleted byte would be out of sync.
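
The resynchronization claim above can be tried out with a small Python sketch. The helper `drop_byte`, the sample string, and the choice of which byte to delete are all assumptions made for illustration; decoding with `errors="replace"` is one possible recovery policy, not the only one.

```python
def drop_byte(data: bytes, index: int) -> bytes:
    """Simulate corruption by deleting a single byte (hypothetical helper)."""
    return data[:index] + data[index + 1:]


text = "naïve café"

# UTF-8: 'ï' occupies bytes 2 and 3. Delete its lead byte.
damaged_utf8 = drop_byte(text.encode("utf-8"), 2)
recovered_utf8 = damaged_utf8.decode("utf-8", errors="replace")

# UTF-16 (little endian, no BOM): 'ï' occupies bytes 4 and 5. Delete byte 4.
damaged_utf16 = drop_byte(text.encode("utf-16-le"), 4)
recovered_utf16 = damaged_utf16.decode("utf-16-le", errors="replace")

print(repr(recovered_utf8))   # only the damaged character is replaced
print(repr(recovered_utf16))  # everything after the deletion is misaligned
```

In the UTF-8 case the orphaned continuation byte decodes to a single U+FFFD and the rest of the string survives. In the UTF-16 case every following code unit straddles two original characters, so the tail of the string is garbage.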

                        fabian@mainz.social wrote (#11):

                        @tek Still, I am regularly confronted with IT systems that do not (properly) support it and display the umlaut in my name incorrectly.

                          madduci@mastodon.social wrote (#12):

                          @tek and it is still being handled wrongly in many places

                            root42@chaos.social wrote (#13):

                            @tek This! UTF-8 is a great encoding. Unicode can be a mess at times though. 🙂

                              debaer@23.social wrote (#14):

                              @tek But UTF-EBCDIC is still younger than EBCDIC was when UTF-EBCDIC was invented.

                                djl@mastodon.mit.edu wrote (#15):

                                @vathpela @tek

                                Nah. It stopped sucking when Unicode became variable-width even in a 32-bit encoding. Or at least it stopped being fair to single it out for sucking, since now there isn't anything that doesn't.
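
One reading of "variable-width even in a 32-bit encoding" is that a single user-perceived character can take more than one code point, for instance a base letter plus a combining mark. The Python sketch below illustrates that reading only; the variable names and the choice of example are assumptions, and the original post may have had other cases in mind.

```python
import unicodedata

decomposed = "e\u0301"  # 'e' followed by COMBINING ACUTE ACCENT
composed = unicodedata.normalize("NFC", decomposed)  # precomposed U+00E9

# Both render as the same accented letter, yet they differ in length,
# even when each code point is stored in a fixed 32-bit unit.
print(len(decomposed), len(decomposed.encode("utf-32-le")))  # 2 code points, 8 bytes
print(len(composed), len(composed.encode("utf-32-le")))      # 1 code point, 4 bytes
print(decomposed == composed)                                # False without normalization
```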

                                  ahltorp@mastodon.nu wrote (#16):

                                  @mxk @vathpela @tek I don’t know any way to run telnet over a non-checksummed connection.

                                    timwardcam@c.im wrote (#17):

                                    @tek Every now and then the Cambridge CST exam papers include a question like "explain why even experienced programmers sometimes have problems with character codes".

                                    You could write pretty well anything you liked.

                                    Originally what was expected was an essay about things like escape sequences on Flexowriter tapes; in my day it was about conversion between EBCDIC and ASCII; these days it might be about obscure characters in URLs.

                                      mansr@society.oftrolls.com wrote (#18):

                                      @mo @vathpela @tek Variable length encoding adds a little complexity at the input and output stages, but I think the benefits outweigh that, especially the 8-bit compatibility that allows a lot of software to work (at least to some extent) unmodified.
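
A small Python sketch of the 8-bit compatibility mentioned above. It assumes a byte-oriented program that only knows about ASCII delimiters; the record contents and variable names are made up for the example.

```python
# A comma-separated record containing non-ASCII text, encoded as UTF-8.
record = "Zoë,Ærøskøbing,東京".encode("utf-8")

# Byte-level splitting on an ASCII delimiter, with no Unicode awareness.
fields = record.split(b",")
decoded = [field.decode("utf-8") for field in fields]
print(decoded)

# This works because every byte of a multi-byte UTF-8 sequence has its high
# bit set, so it can never be mistaken for an ASCII byte such as b",".
high_bit_only = all(b >= 0x80 for ch in "ë東" for b in ch.encode("utf-8"))

# Pure ASCII text is byte-for-byte identical in ASCII and UTF-8.
same_bytes = "plain ASCII".encode("utf-8") == "plain ASCII".encode("ascii")
```

Software that treats text as opaque bytes between ASCII delimiters therefore tends to keep working, although anything that counts or truncates "characters" by byte still needs care.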

                                        jaddle@toot.community wrote (#19):

                                        @tek
                                        And yet, my bank still won't let me add a contact (for etransfers) with an accent in their name.

                                          enno@mastodon.gamedev.place wrote (#20):

                                          @tek @loke @vathpela There is a BOM (byte order mark) defined for UTF-8, as pointless as that may seem, and it screws up that whole beautiful ASCII compatibility whenever someone uses it.
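
To make the BOM complaint concrete, here is a Python sketch. The shell-script payload is an invented example; `codecs.BOM_UTF8` and the `utf-8-sig` codec are part of Python's standard library.

```python
import codecs

# A file that some editor saved as "UTF-8 with BOM" (assumed example content).
script = codecs.BOM_UTF8 + b"#!/bin/sh\necho hi\n"

# A byte-oriented check for the shebang now fails, because the file
# starts with EF BB BF instead of "#!".
starts_with_shebang = script.startswith(b"#!")

# Plain "utf-8" decoding keeps the BOM as a leading U+FEFF character.
as_utf8 = script.decode("utf-8")

# The "utf-8-sig" codec strips a leading BOM if one is present.
as_utf8_sig = script.decode("utf-8-sig")

print(starts_with_shebang)
print(repr(as_utf8[:3]))
print(repr(as_utf8_sig[:2]))
```

Reading with `utf-8-sig` is a tolerant choice for input, since it also accepts files without a BOM.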
