Yeah, let's have a go with the ACI (anti-coercion instrument) and see if we can't make their patents fair game. Playing to Trump's tune is unlikely to work out well.
Yes, I'm being sarcastic, but I also think UTF-8 is plaintext these days. I really can't spell my name in US-ASCII. As the other commenter here explained in more detail, ASCII has its history, but it isn't suited for today's international computer users.
There's also the occasional surprise internal representation in UTF-16; that's at least still in the realm of Unicode. I'd also expect there's UTF-32 still floating around somewhere, but I couldn't tell you where.
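A quick sketch of the encodings in play, using "æ" as a stand-in for a name US-ASCII can't represent:

```python
# One character (U+00E6), three Unicode encodings, three widths.
s = "æ"
print(s.encode("utf-8"))      # b'\xc3\xa6'          (2 bytes)
print(s.encode("utf-16-le"))  # b'\xe6\x00'          (2 bytes)
print(s.encode("utf-32-le"))  # b'\xe6\x00\x00\x00'  (4 bytes)

# And the US-ASCII problem: there's simply no code point for it.
try:
    s.encode("ascii")
except UnicodeEncodeError as e:
    print("can't spell that in ASCII:", e.reason)
```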
And is MySQL still doing that thing where utf8 is a noob trap and the real UTF-8 is utf8_for_real_we_mean_it_this_time_honest, or whatever they called it?
Yes, I am joking. We probably could do something like the old ISO 646 (or whatever it was) that swapped letters depending on locale, but it's not something we want to return to.
It's also not something we're entirely free of: even though it's mostly gone, Bulgarian locales apparently still do something interesting with Cyrillic characters; cf. https://tonsky.me/blog/unicode/
To unjerk, as it were: it was a thing. On old systems they'd do stuff like represent æøå with the same code points as {|}. Curly-brace languages must have looked pretty weird back then :)
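For the curious, the swap can be sketched directly. In the Danish/Norwegian ISO 646 variants, the code points for {|}[\] were reused for æøåÆØÅ, so the same bytes rendered differently depending on the terminal:

```python
# How a C snippet would render on a Danish/Norwegian ISO 646 terminal,
# where the byte values for { | } [ \ ] displayed as æ ø å Æ Ø Å.
DANISH_ISO646 = str.maketrans("{|}[\\]", "æøåÆØÅ")

c_source = "int main(void) { if (x[0] | y) { return 1; } return 0; }"
print(c_source.translate(DANISH_ISO646))
# int main(void) æ if (xÆ0Å ø y) æ return 1; å return 0; å
```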
Q. P is a common character across languages. But Q is mostly unused, at least outside the Romance languages, which appear to spell K that way. That can be solved by letting the characters share a code point and rendering it as K in most regions and Q in France. I can't imagine any problems arising from that. :)
It's a joke because it includes useless letters nobody needs, like that weird o with the leg, and a rich set of field- and record-separator characters that are almost completely forgotten, etc., but not normal letters used in everyday language >:(
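Those separator characters really do exist in ASCII (FS, GS, RS, US at 0x1C through 0x1F). A minimal sketch of using them the way they were apparently intended, as a CSV alternative with no quoting problems:

```python
# ASCII's mostly forgotten separator control characters.
FS, GS, RS, US = "\x1c", "\x1d", "\x1e", "\x1f"

# Encode a tiny "table": records separated by RS, fields by US.
rows = [["alice", "42"], ["bob", "7"]]
blob = RS.join(US.join(fields) for fields in rows)

# Decoding is just the reverse; no escaping needed, since the
# separators never appear in normal text.
decoded = [record.split(US) for record in blob.split(RS)]
print(decoded)  # [['alice', '42'], ['bob', '7']]
```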
Isn't that sort of just the cost of doing business in C? It's a sparse language, so it falls to the programmer to cobble together more.
I do also think the concrete example of emails should be taken as a stand-in. Errors like swapping a parameter in an email application are likely not very harmful, and they're detected early given the volume of email that exists. But in other, less fault-tolerant applications the technique becomes a lot more valuable.
It is pretty funny that C's type system can be described pretty differently based on the speaker's experience. The parable of the Blub language comes to mind.
Parsing is a way of "validating early". You either get a successful parse and the program continues working on known-good data with that knowledge encoded in the type system, or you handle incorrect data as soon as it's encountered.
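The idea can be sketched in a few lines. The `Email` type, `parse_email`, and the bare "@" check below are illustrative assumptions, not a real address validator; the point is only that downstream code accepts the parsed type, not a raw string:

```python
# "Parse, don't validate" sketch: parse raw input once into a distinct
# type, so the rest of the program only ever sees known-good data.
from dataclasses import dataclass

@dataclass(frozen=True)
class Email:
    user: str
    host: str

def parse_email(raw: str) -> Email:
    """Return an Email, or raise as soon as the data is found to be bad."""
    user, sep, host = raw.partition("@")
    if not sep or not user or not host:
        raise ValueError(f"not an email address: {raw!r}")
    return Email(user=user, host=host)

def send(to: Email) -> None:
    # Takes Email, not str: an unparsed string can't sneak in here,
    # because the type carries the "already checked" fact.
    print(f"sending to {to.user} at {to.host}")

send(parse_email("alice@example.com"))
```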
I used Ratpoison for well over a decade, and only replaced it with sway once I had a new machine and figured it was time to try Wayland. Apparently that's some 4-5 years ago already.
I feel I gotta point out it's a pretty funny example: email comes up so frequently as the thing you're recommended to neither parse nor validate. Just try to send an email to the address and see if it works; if you need to know that it was received successfully, a link to click is the general method.
But "parse, don't validate" is still a generally good idea, no matter the example used. :)