Why is a Postgres function so slow when the single query is fast?

Question

I have a function that returns employees in 'Create' status.

CREATE OR REPLACE FUNCTION get_probation_contract(AccountOrEmpcode TEXT, FromDate DATE, ToDate DATE)
  RETURNS TABLE("EmpId" INTEGER, "EmpCode" CHARACTER VARYING,
  "DomainAccount" CHARACTER VARYING, "JoinDate" DATE,
  "ContractTypeCode" CHARACTER VARYING, "ContractTypeName" CHARACTER VARYING,
  "ContractFrom" DATE, "ContractTo" DATE, "ContractType" CHARACTER VARYING,
  "Signal" CHARACTER VARYING) AS $$
BEGIN
  RETURN QUERY
  EXECUTE 'SELECT
    he.id                                                                       "EmpId",
    rr.code                                                                     "EmpCode",
    he.login                                                                    "DomainAccount",
    he.join_date                                                                "JoinDate",
    contract_type.code                                                          "ContractTypeCode",
    contract_type.name                                                          "ContractTypeName",
    contract.date_start                                                         "ContractFrom",
    contract.date_end                                                           "ContractTo",
    CASE WHEN contract_group.code = ''1'' THEN ''Probation''
    WHEN contract_group.code IN (''3'', ''4'', ''5'') THEN ''Official''
    WHEN contract_group.code = ''2'' THEN ''Collaborator'' END :: CHARACTER VARYING "ContractType",
    ''CREATE'' :: CHARACTER VARYING                                               "Signal"
  FROM
    hr_employee he
    INNER JOIN resource_resource rr
      ON rr.id = he.resource_id
    INNER JOIN hr_contract contract
      ON contract.employee_id = he.id AND contract.date_start = (
      SELECT max(date_start) "date_start"
      FROM hr_contract cc
      WHERE cc.employee_id = contract.employee_id
    )
    INNER JOIN hr_contract_type contract_type
      ON contract_type.id = contract.type_id
    INNER JOIN hr_contract_type_group contract_group
      ON contract_group.id = contract_type.contract_type_group_id
  WHERE
    contract_group.code = ''1''


    AND
    ($1 IS NULL OR $1 = '''' OR rr.code = $1 OR
     he.login = $1)
    AND (
      (he.join_date BETWEEN $2 AND $3)
      OR (he.join_date IS NOT NULL AND (contract.date_start BETWEEN $2 AND $3))
      OR (he.create_date BETWEEN $2 AND $3 AND he.create_date > he.join_date)
    )
    AND rr.active = TRUE
' USING AccountOrEmpcode, FromDate, ToDate;
END;
$$ LANGUAGE plpgsql;

It takes 37 seconds to execute:

SELECT *
 FROM get_probation_contract('', '2014-01-01', '2014-06-01');

When I run the query on its own:

    SELECT
      he.id                                                                       "EmpId",
      rr.code                                                                     "EmpCode",
      he.login                                                                    "DomainAccount",
      he.join_date                                                                "JoinDate",
      contract_type.code                                                          "ContractTypeCode",
      contract_type.name                                                          "ContractTypeName",
      contract.date_start                                                         "ContractFrom",
      contract.date_end                                                           "ContractTo",
      CASE WHEN contract_group.code = '1' THEN 'Probation'
      WHEN contract_group.code IN ('3', '4', '5') THEN 'Official'
      WHEN contract_group.code = '2' THEN 'Collaborator' END :: CHARACTER VARYING "ContractType",
      'CREATE' :: CHARACTER VARYING                                               "Signal"
    FROM
      hr_employee he
      INNER JOIN resource_resource rr
        ON rr.id = he.resource_id
      INNER JOIN hr_contract contract
        ON contract.employee_id = he.id AND contract.date_start = (
        SELECT max(date_start) "date_start"
        FROM hr_contract
        WHERE employee_id = he.id
      )
      INNER JOIN hr_contract_type contract_type
        ON contract_type.id = contract.type_id
      INNER JOIN hr_contract_type_group contract_group
        ON contract_group.id = contract_type.contract_type_group_id
    WHERE
      contract_group.code = '1'
      AND (
      (he.join_date BETWEEN '2014-01-01' AND '2014-06-01')
      OR (he.join_date IS NOT NULL AND (contract.date_start BETWEEN '2014-01-01' AND '2014-01-06'))
      OR (he.create_date BETWEEN '2014-01-01' AND '2014-01-06' AND he.create_date > he.join_date)
    )
    AND rr.active = TRUE

It takes 5 seconds to complete.

How can I optimize the function above, and why is the function so much slower than the single query even though I use EXECUTE 'SELECT ...' inside the function?

Each table has an index on its id column.

You seem to have missed a part in your test or in your paste of the query: after contract_group.code = '1' you have AND etc. in the function. Also provide the plan output and the indexes in place, to see if there is a hint there. — Norbert, Commented Mar 17, 2015 at 3:05
Thanks for your comment. I added the query and indexing information. — giaosudau, Commented Mar 17, 2015 at 3:13
Sorry, you are still missing a part in your test query: AND ($1 IS NULL OR $1 = '''' OR rr.code = $1 OR he.login = $1) Plus your hint "using AccountOrEmpcode, FromDate, ToDate " seems to be missing in your test. — Norbert, Commented Mar 17, 2015 at 4:33
I don't see your version of Postgres in the question? And only id columns are indexed? — Erwin Brandstetter, Commented Mar 17, 2015 at 12:44

Pavel Stehule · Accepted Answer · 2015-03-17 05:38:18Z

A possible reason is blind optimization of prepared statements (embedded SQL). Newer PostgreSQL releases handle this a little better, although it can still be an issue there. The execution plan for embedded SQL in PL/pgSQL is reused across calls, so it is optimized for the most frequent value rather than for the value actually passed. Sometimes this difference causes very large slowdowns.

In that case you can use dynamic SQL, that is, the EXECUTE statement. Dynamic SQL plans are used for a single execution only and are built with the actual parameter values. That should fix this issue.

Example of embedded SQL with reused prepared plans:

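CREATE OR REPLACE FUNCTION fx1(_surname text)
RETURNS int AS $$
BEGIN
  -- Embedded SQL: PL/pgSQL prepares this statement and may reuse its plan across calls.
  RETURN (SELECT count(*) FROM people WHERE surname = _surname);
END;
$$ LANGUAGE plpgsql;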

Example with dynamic SQL:

CREATE OR REPLACE FUNCTION fx2(_surname text)
RETURNS int AS $$
DECLARE result int;
BEGIN
  EXECUTE 'SELECT count(*) FROM people WHERE surname = $1' INTO result
      USING _surname;
  RETURN result;
END;
$$ LANGUAGE plpgsql;

The second function can be faster if your dataset contains one extremely common surname: the common plan will then be a sequential scan, but most of the time you will ask for some other surname, where you would want an index scan. Dynamic query parametrization (like ($1 IS NULL OR $1 = '''' OR rr.code = $1 OR) has the same effect.
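To look at the two kinds of plan outside a function, a prepared statement can stand in for embedded SQL. This is only a sketch: the people table contents, the skewed data and the index name are made up for illustration, and plan_cache_mode needs PostgreSQL 12 or later, which is newer than this answer.

```sql
-- Hypothetical table for the fx1/fx2 examples, filled with heavily skewed data:
-- about 99% of rows share one surname, the rest are unique.
CREATE TABLE people (id serial PRIMARY KEY, surname text);
INSERT INTO people (surname)
SELECT CASE WHEN g % 100 = 0 THEN 'Rare' || g ELSE 'Smith' END
FROM generate_series(1, 100000) AS g;
CREATE INDEX people_surname_idx ON people (surname);
ANALYZE people;

-- A prepared statement behaves like embedded SQL in PL/pgSQL: its plan can be cached.
PREPARE count_by_surname(text) AS
  SELECT count(*) FROM people WHERE surname = $1;

-- Custom plan: planned with the supplied parameter value.
SET plan_cache_mode = force_custom_plan;
EXPLAIN EXECUTE count_by_surname('Rare100');

-- Generic plan: planned without looking at the supplied value.
SET plan_cache_mode = force_generic_plan;
EXPLAIN EXECUTE count_by_surname('Rare100');

RESET plan_cache_mode;
DEALLOCATE count_by_surname;
```

If the two EXPLAIN outputs differ, that is the kind of gap described above; if they are the same, plan caching is probably not the cause for that query.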

That does not seem to add up. The OP is already using EXECUTE. — Erwin Brandstetter, Commented Mar 17, 2015 at 12:44

Erwin Brandstetter · Accepted Answer · 2015-03-17 12:47:51Z


Your queries are not the same.

The first one has

WHERE cc.employee_id = contract.employee_id

where the second one has:

WHERE employee_id = he.id

And also:

($1 IS NULL OR $1 = '''' OR rr.code = $1 OR
 he.login = $1)

Please test again with identical queries and identical values.
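To capture the plan that actually runs inside the function, the auto_explain module can log the plans of nested statements. A minimal sketch, assuming the module is available on the server and the session is allowed to LOAD it (typically this needs superuser):

```sql
-- Load auto_explain for this session only.
LOAD 'auto_explain';
SET auto_explain.log_min_duration = 0;        -- log the plan of every statement
SET auto_explain.log_analyze = on;            -- include actual row counts and timings
SET auto_explain.log_nested_statements = on;  -- include statements executed inside functions
SET client_min_messages = log;                -- show the logged plans in this session as well

-- Same call as in the question; the plan of the inner query is logged.
SELECT *
FROM get_probation_contract('', '2014-01-01', '2014-06-01');
```

Comparing the logged plan with EXPLAIN (ANALYZE) of the standalone query, run with the same predicates and the same dates, should show where the two diverge.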
