17 events
when what by license comment
Oct 9, 2017 at 11:57 history edited user13628 CC BY-SA 3.0
added 362 characters in body
Apr 30, 2017 at 15:02 comment added tchrist @guifa If you’re doing it right, then NFC vs NFD won’t matter. The easiest solution is echo blahblahblah | perl -CS -nle 'print scalar(() = /\p{alnum}/g)' which won’t care. That way Jason's cited answer yields 195 whether in NFC or NFD. Another approach is to parse out extended grapheme clusters each time (a base character plus optional combining characters), perhaps using the \X notation in a regex. And if we're code golfing, you can always shorten that to perl -CS -nE'say$x=()=/\p{alnum}/g' if it pleases you to do so. I suppose I should add these methods since they're easier.
Mar 27, 2017 at 19:32 history edited Jason C CC BY-SA 3.0
seems more polite to stick to first-come-first-serve order instead of putting mine first
Mar 27, 2017 at 18:54 history edited user0721090601 CC BY-SA 3.0
fixed bookmarklet
Mar 27, 2017 at 18:50 comment added Jason C @guifa AFAIK: It serves all text encoded as UTF-8, unmodified, exactly as entered, sometimes served as HTML entities (but without modification), and does not do any normalization, and normalization, if any, would be done by your browser on submit or presentation. (Sorry about the double-ping; I replaced my previous comment with one that has more of a disclaimer vibe... 😄)
Mar 27, 2017 at 18:46 history edited Jason C CC BY-SA 3.0
test case
Mar 27, 2017 at 18:36 history edited Jason C CC BY-SA 3.0
added 78 characters in body
Mar 27, 2017 at 18:27 comment added user0721090601 @JasonC Does SE use NFC or NFD? (no one should be using NFC, but alas, it is maddeningly common)
Mar 27, 2017 at 18:26 comment added Jason C @guifa It does not. I just tested it with spanish.stackexchange.com/a/20246 and it reports 190. The correct count is 195. It could be related to localization options on the user's system, or ES6 support.
Mar 27, 2017 at 18:25 comment added user0721090601 @JasonC I'm not sure why mine wouldn't. By using the /u option, the regex should view a word like "qué" as q-u-e-´, and then would remove the ´.
Mar 27, 2017 at 18:24 history edited Jason C CC BY-SA 3.0
added 245 characters in body
Mar 27, 2017 at 18:17 history edited Jason C CC BY-SA 3.0
made fiddle
Mar 27, 2017 at 18:03 history edited user0721090601 CC BY-SA 3.0
added 631 characters in body
Mar 27, 2017 at 17:53 history edited user0721090601 CC BY-SA 3.0
added 631 characters in body
Mar 27, 2017 at 11:15 history edited fedorqui (Mod) CC BY-SA 3.0
added 331 characters in body
Mar 27, 2017 at 11:08 history answered Charlie CC BY-SA 3.0
Mar 27, 2017 at 11:08 history made wiki Post Made Community Wiki by Charlie
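The comments above discuss counting alphanumeric characters so that the result is the same in NFC and NFD. As a rough illustration only, here is a minimal JavaScript sketch of that idea. It is not the code from any of the answers: the function name `countAlnum` and the sample string are made up for this example, and `[\p{Alphabetic}\p{Nd}]` is assumed to be a reasonable stand-in for the `\p{alnum}` class used in the Perl one-liner. Unicode property escapes need the `u` flag and an ES2018-capable engine.

```javascript
// Count alphabetic characters and decimal digits, ignoring combining marks.
// Assumption: [\p{Alphabetic}\p{Nd}] approximates Perl's \p{alnum}.
function countAlnum(text) {
  const matches = text.match(/[\p{Alphabetic}\p{Nd}]/gu);
  return matches === null ? 0 : matches.length;
}

// Hypothetical sample text, normalized both ways for comparison.
const sample = "¿Qué pasó en 2017?";
const nfc = sample.normalize("NFC"); // "é" is one code point
const nfd = sample.normalize("NFD"); // "é" is "e" followed by U+0301

console.log(nfc.length, nfd.length);                 // 18 20
console.log(countAlnum(nfc), countAlnum(nfd));       // 13 13
```

The string lengths differ because NFD splits each accented letter into a base letter plus a combining mark, but the combining acute accent (U+0301) is neither Alphabetic nor a decimal digit, so the count is the same in either form.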