6

I have a table foo and would like to set its bar column to a random string in every row. I've got the following query:

update foo
set bar = (select string_agg(substring('0123456789bcdfghjkmnpqrstvwxyz', round(random() * 30)::integer, 1), '')
           from generate_series(1, 9));

But it generates the random string once and reuses it for all rows. How can I make it generate a separate random string for each row?

I know I can make it a function like this:

create function generate_bar() returns text language sql as $$
  select string_agg(substring('0123456789bcdfghjkmnpqrstvwxyz', round(random() * 30)::integer, 1), '')
  from generate_series(1, 9)
$$;

and then call the function in the update query. But I'd prefer to do it without a function.

4
  • 1
    Could you please refer this link Commented Jan 4, 2021 at 10:20
  • 2
    I'm not following. Why should I refer to the link? Commented Jan 4, 2021 at 10:24
  • 1
    Because using a custom function is the cleanest way to do this. Commented Jan 4, 2021 at 14:45
  • Right. Created as a function. Commented Jan 4, 2021 at 15:06

4 Answers

3

For a random lowercase hexadecimal string (digits 0-9 and letters a-f) of up to 32 characters, use:

UPDATE "foo" SET "bar"= substr(md5(random()::text), 0, XXX);

and replace XXX with the desired string length plus one (substr positions start at 1, so starting at 0 uses up one slot of the length). For example, to fill every row with a 32-character string:

UPDATE "foo" SET "bar"= substr(md5(random()::text), 0, 33);

14235ccd21a408149cfbab0a8db19fb2 might be a value that fills one of the rows. Each row gets its own random string, but the values are not guaranteed to be unique.
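The "length plus one" rule follows from substr being 1-based. A small sketch of equivalent spellings, using only built-in PostgreSQL functions (the column aliases are made up for illustration):

```sql
-- substr(s, 0, 33) covers positions 0..32; position 0 holds no character,
-- so 32 characters come back. Starting at 1 needs no "plus one".
SELECT substr(md5(random()::text), 0, 33) AS via_offset_zero,  -- 32 characters
       substr(md5(random()::text), 1, 32) AS via_offset_one,   -- 32 characters
       left(md5(random()::text), 8)       AS first_eight;      -- 8 characters
```

Each column calls random() separately, so the three values differ from one another.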

To generate strings longer than 32 characters, combine the above with CONCAT.
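A minimal sketch of that combination, assuming a 40-character target (the length is picked only for illustration):

```sql
-- Two md5 digests give 64 hex characters; left() trims to the wanted length.
-- random() is called twice per row, so the two halves are independent.
UPDATE "foo"
SET "bar" = left(concat(md5(random()::text), md5(random()::text)), 40);
```

Add more md5(random()::text) arguments to concat if you need more than 64 characters.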


2

The problem is that the Postgres optimizer is being too smart: it decides that it can execute the subquery only once for all rows. Arguably it is missing something obvious here. The random() function is volatile, which makes the subquery volatile, so evaluating it once is not appropriate behavior.
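The effect can be seen without any table. This is a small sketch using generate_series; the aliases are made up for illustration:

```sql
-- Uncorrelated scalar subquery: evaluated once, so every row shows the same value.
SELECT i, (SELECT random()) AS same_for_all_rows
FROM generate_series(1, 3) AS s(i);

-- Direct call: random() is volatile and is evaluated for each row.
SELECT i, random() AS differs_per_row
FROM generate_series(1, 3) AS s(i);
```

Running EXPLAIN on the first query should show the subquery as an InitPlan, which is how Postgres plans subqueries that do not reference the outer row.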

One way to get around this is to use a correlated subquery. Here is an example:

update foo
    set bar = array_to_string(array(select string_agg(substring('0123456789bcdfghjkmnpqrstvwxyz', round(random() * 30)::integer, 1), '')
                                    from generate_series(1, 9)
                                    where foo.bar is distinct from 'something'
                                   ), '');

Here is a db<>fiddle.


0

Not as good as the answer, but if you only need a random string of a few letters, you could also use:

UPDATE foo
    SET bar = CONCAT(
        -- floor(random() * 26) + 1 yields 1..26; round() could yield 27, which is past the end
        SUBSTRING('abcdefghijklmnopqrstuvwxyz', floor(random() * 26)::integer + 1, 1),
        SUBSTRING('abcdefghijklmnopqrstuvwxyz', floor(random() * 26)::integer + 1, 1),
        SUBSTRING('abcdefghijklmnopqrstuvwxyz', floor(random() * 26)::integer + 1, 1)
        );


0

Here's a sane function that picks from allowed characters:

CREATE OR REPLACE FUNCTION random_string(int) RETURNS TEXT as $$
select
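  -- floor(...) + 1 keeps the index in 1..length(characters); casting the raw product rounds and can overshoot by one
  string_agg(substr(characters, floor(random() * length(characters))::integer + 1, 1), '') as random_word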
from (values('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789')) as symbols(characters)
  join generate_series(1, $1) on 1 = 1
$$ language sql;  

Then use it as:

UPDATE mytable SET col1 = random_string(10), col2 = random_string(20);

Minimal test:

CREATE OR REPLACE FUNCTION random_string(int) RETURNS TEXT as $$
select
  string_agg(substr(characters, floor(random() * length(characters))::integer + 1, 1), '') as random_word
from (values('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789    --')) as symbols(characters)
  join generate_series(1, $1) on 1 = 1
$$ language sql;  
DROP TABLE IF EXISTS tmp;
CREATE TABLE "tmp" ("i" INTEGER, "j" INTEGER, "s" TEXT, "t" TEXT);
INSERT INTO "tmp" (i, j, s, t) SELECT i, i*2, 'a', 'b' FROM generate_series(1, 10) as s(i);
SELECT * FROM "tmp";
UPDATE "tmp" SET s = random_string(10), t = random_string(20);
SELECT * FROM "tmp";

which outputs:

CREATE FUNCTION
DROP TABLE
CREATE TABLE
INSERT 0 10
 i  | j  | s | t 
----+----+---+---
  1 |  2 | a | b
  2 |  4 | a | b
  3 |  6 | a | b
  4 |  8 | a | b
  5 | 10 | a | b
  6 | 12 | a | b
  7 | 14 | a | b
  8 | 16 | a | b
  9 | 18 | a | b
 10 | 20 | a | b
(10 rows)

UPDATE 10
 i  | j  |     s      |          t           
----+----+------------+----------------------
  1 |  2 | pqb0jVp i  | PImey082XovRskbK5mxY
  2 |  4 | DqOtVlf5r4 | 13MPe1WAiTi4Pr pEGHK
  3 |  6 | AITONX Xzg | VTU4gKsN4fuoRR8dVb7o
  4 |  8 | PcmsD5t1g- | JV4ohJ DtKGKwc kRGJ
  5 | 10 | oJ-RtapI-q | G XBIP2UqGpxOSroY3s7
  6 | 12 | ScecWoJ6jy | JDWdjTFBm0rseuVwqdJa
  7 | 14 | 3bigPU7GHG | 1u VEgNIhXYf ZZa7z2W
  8 | 16 |   4vLHduh- | Zk20QXq1t  Jb2fevaQ 
  9 | 18 | sW t7Jzr3v | Cvr3aD wd H8jdgHvSq
 10 | 20 | F3ylfYcqe4 | 0ccHaM9XW-Qzg2tV-gI0
(10 rows)

so we see that both columns were updated with different values each time.
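Rather than eyeballing the output, a quick check like the following sketch can compare row and distinct-value counts on the same tmp table (the aliases are made up for illustration):

```sql
-- If distinct_s and distinct_t both equal total_rows, no value was reused across rows.
-- Random collisions are possible in principle, so equal counts are expected but not guaranteed.
SELECT count(*)          AS total_rows,
       count(DISTINCT s) AS distinct_s,
       count(DISTINCT t) AS distinct_t
FROM "tmp";
```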

According to my quick benchmark in "SQL Populate table with random data", it is about 10x slower than md5(random()::text).
