Intl.Collator sorting in Node, Chrome, Webkit, Firefox and Postgres Collation differences

Asked 1 year, 7 months ago

Modified 1 year, 7 months ago

Viewed 65 times

I am comparing the collation of Intl.Collator in three browsers (Chrome, WebKit and Firefox) and Node with a Postgres collation, and I realized that the sort orders of most implementations differ considerably. The only two implementations that match each other for the locale 'en' are Chrome and Firefox. For example, the letters from MATHEMATICAL SANS-SERIF ITALIC SMALL A to Z are sorted together with the letters a - z by Intl.Collator with locales such as 'en' in Chrome and Firefox, but not in Postgres. I assumed that all implementations were based on the same CLDR data. For each pair of implementations, the number of positions at which the sorted arrays differ is as follows (tested with Playwright):

Chrome - Node 1474
Chrome - Webkit 34727
Chrome - Firefox 0
Chrome - Postgres 25781
Webkit - Node 34727
Webkit - Postgres 34727
Node - Postgres 34892
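The observation about the MATHEMATICAL SANS-SERIF ITALIC letters can be checked in a single runtime without the full test harness. The following is a minimal sketch, not part of the original test code: the helper name `sortLatinWithMathItalic` is made up for illustration, and the resulting order depends on the runtime and its ICU data, so no particular output is assumed here.

```typescript
// MATHEMATICAL SANS-SERIF ITALIC SMALL A..Z occupy U+1D622..U+1D63B.
const MATH_SANS_ITALIC_SMALL_A = 0x1d622;

// Hypothetical helper: returns a..z together with their mathematical
// sans-serif italic counterparts, sorted by the given locale's collator
// in whatever runtime executes this code.
function sortLatinWithMathItalic(locale: string): string[] {
  const letters: string[] = [];
  for (let i = 0; i < 26; i++) {
    letters.push(String.fromCodePoint(0x61 + i)); // a..z
    letters.push(String.fromCodePoint(MATH_SANS_ITALIC_SMALL_A + i));
  }
  return letters.sort(new Intl.Collator(locale).compare);
}

// Print each character with its code point so any interleaving is easy to see.
for (const ch of sortLatinWithMathItalic('en')) {
  console.log(ch, 'U+' + ch.codePointAt(0)!.toString(16).toUpperCase());
}
```

Running the same snippet in Node and in each browser console gives a quick view of whether the two alphabets are interleaved or kept apart in that implementation.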

A more predictable sort order would make it possible to sort consistently in the frontend and the backend.

The code is as follows:

The Unicode data is parsed from 'UnicodeData.txt':

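import * as fs from 'fs';
import * as path from 'path';

export interface UnicodeData {
  codeValue: string;
}

// Reads UCD/UnicodeData.txt and keeps the first field (the hex code value) of each line.
export async function parseUnicodeData(unicodePath: string): Promise<UnicodeData[]> {
  return (await fs.promises.readFile(path.join(unicodePath, 'UCD/UnicodeData.txt'), 'utf-8'))
    .split('\n')
    .filter((line) => line !== '')
    .map((line) => line.split(';'))
    .map(([codeValue]) => ({codeValue}));
}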

In Postgres:

CREATE TABLE unicode.character (
  character text PRIMARY KEY
);

-- Filtered codepoints: '0000', 'D800', 'DB7F', 'DB80', 'DBFF', 'DC00', 'DFFF'

INSERT INTO unicode.character VALUES
  (chr(1)),
  (chr(2)),
  ...
  (chr(1114109));
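The elided value list does not have to be written by hand. As a rough sketch, assuming the hex code values come from the first field of 'UnicodeData.txt' as in the parser shown earlier, a statement of the same shape could be generated as follows. The helper name `buildInsertStatement` is hypothetical and not part of the original code; the filtered list is the one from the SQL comment.

```typescript
// Code values that the question filters out before inserting into Postgres.
const FILTERED = new Set(['0000', 'D800', 'DB7F', 'DB80', 'DBFF', 'DC00', 'DFFF']);

// Hypothetical helper: turns hex code values into a single INSERT statement
// that calls chr() with the decimal code point, mirroring the statement above.
function buildInsertStatement(codeValues: string[]): string {
  const rows = codeValues
    .filter((codeValue) => !FILTERED.has(codeValue.toUpperCase()))
    .map((codeValue) => `  (chr(${parseInt(codeValue, 16)}))`);
  return `INSERT INTO unicode.character VALUES\n${rows.join(',\n')};`;
}

// '0000' is dropped; '10FFFD' becomes chr(1114109).
console.log(buildInsertStatement(['0000', '0001', '0002', '10FFFD']));
```

The sketch only builds the SQL text; it assumes at least one code value survives the filter, since an empty value list would not be valid SQL.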

Then in Node:

import {Pool} from 'pg';
import {parseUnicodeData} from 'util-unicode-parser';
import {chromium, webkit, firefox} from 'playwright';

const pool = new Pool();

export async function compareCollations(
  unicodeDirectory: string,
  postgresCollationName: string,
  intlLocale: string,
  intlSettings = '{}',
) {
  // setting up playwright
  const pages = await Promise.all([chromium, webkit, firefox]
    .map((browserType) => browserType.launch({headless: true}))
    .map(async (browser) => (await browser).newContext())
    .map(async (context) => (await context).newPage()));

  // parsing unicode data
  const unicodeData = (await parseUnicodeData(unicodeDirectory))
    // code points filtered out because they cannot be inserted into Postgres
    .filter(({codeValue}) => !['0000', 'D800', 'DB7F', 'DB80', 'DBFF', 'DC00', 'DFFF'].includes(codeValue));
  const browserCollationString = `[${unicodeData
      .map(({codeValue}) => 
      `String.fromCodePoint(parseInt('${codeValue}', 16))`).join(',')
    }].sort(new Intl.Collator('${intlLocale}', ${intlSettings}).compare)`;

  // creating sorted arrays of all characters
  const nodeCollation = unicodeData.map(({codeValue}) => String.fromCodePoint(parseInt(codeValue, 16))).sort(new Intl.Collator(intlLocale, JSON.parse(intlSettings)).compare);
  const chromeCollation = await pages[0].evaluate(browserCollationString);
  const webkitCollation = await pages[1].evaluate(browserCollationString);
  const firefoxCollation = await pages[2].evaluate(browserCollationString);
  const postgresCollation = (await pool.query(`SELECT character from unicode.character ORDER BY character COLLATE "${postgresCollationName}";`))
    .rows.map(({character}) => character);

  // Comparing the sorted arrays; only the first pair (Chrome vs. Node) is shown.
  // Logs the number of positions at which the two arrays hold different characters.
  console.log(
    chromeCollation
      .map((c, i) => [c, nodeCollation[i], c.codePointAt(0), nodeCollation[i].codePointAt(0)])
      .filter(([c, r]) => c !== r)
      .length,
  );
  // ...
}
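Since only the first pair is shown above, the remaining comparisons could follow the same pattern. The sketch below is one possible way to produce the pairwise counts; `countMismatches` and `pairwiseMismatches` are hypothetical helper names, and the toy arrays merely stand in for the real sorted results.

```typescript
// Counts the positions at which two equally long sorted arrays differ.
function countMismatches(a: string[], b: string[]): number {
  if (a.length !== b.length) {
    throw new Error('arrays must have the same length');
  }
  return a.filter((c, i) => c !== b[i]).length;
}

// Compares every pair of named results once and returns one
// "<name> - <name> <count>" line per pair.
function pairwiseMismatches(results: Record<string, string[]>): string[] {
  const names = Object.keys(results);
  const lines: string[] = [];
  for (let i = 0; i < names.length; i++) {
    for (let j = i + 1; j < names.length; j++) {
      const count = countMismatches(results[names[i]], results[names[j]]);
      lines.push(`${names[i]} - ${names[j]} ${count}`);
    }
  }
  return lines;
}

// Toy data standing in for the real sorted arrays.
console.log(pairwiseMismatches({
  Chrome: ['a', 'b', 'c'],
  Node: ['a', 'c', 'b'],
  Firefox: ['a', 'b', 'c'],
}));
// → [ 'Chrome - Node 2', 'Chrome - Firefox 0', 'Node - Firefox 2' ]
```

Note that a positional count overstates the difference when a single character is merely shifted, because every following position then mismatches as well.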

edited Apr 5, 2024 at 9:21

asked Apr 5, 2024 at 7:33

Florat


Can you share the code involved?

– Nico Haase, commented Apr 5, 2024 at 7:43
