The most interesting thing for me in this article was the mention of `MERGE` almost in passing at the end.
> I'm not a big fan of using the constraint names in SQL, so to overcome both limitations I'd use MERGE instead:
```
db=# MERGE INTO urls t
USING (VALUES (1000004, 'https://hakibenita.com')) AS s(id, url)
ON t.url = s.url
WHEN MATCHED THEN UPDATE SET id = s.id
WHEN NOT MATCHED THEN INSERT (id, url) VALUES (s.id, s.url);
MERGE 1
```
I use `insert ... on conflict do update ...` all the time to handle upserts, but it seems like merge may be more powerful and able to work in more scenarios. I hadn't heard of it before.
Great article, shows a lot of interesting PostgreSQL features. I have used PostgreSQL and MySQL for decades, and this article showed me that I have barely scratched the surface of what is possible.
It's interesting how both virtual columns and hash indexes work, but feel like they're bolted on, vs being made part of the whole ecosystem so that they work seamlessly.
The article explains why they want to avoid this option:
> Starting at version 14, PostgreSQL supports generated columns - these are columns that are automatically populated with an expression when we insert the row. Sounds exactly like what we need but there is a caveat - the result of the expression is materialized - this means additional storage, which is what we were trying to save in the first place!
Thanks, missed that part. I would still be interested in knowing how much additional storage that adds, if the OP is interested in updating the article.
>Currently, constraint exclusion is enabled by default only for cases that are often used to implement table partitioning via inheritance trees. Turning it on for all tables imposes extra planning overhead that is quite noticeable on simple queries, and most often will yield no benefit for simple queries.
PG's lack of plan caching strikes again, this sort of thing is not a concern in other DB's that reuse query plans.
PG does reuse plans, but only if you prepare a query and run it more than 5 times on that connection. See plan_cache_mode[0] and the PREPARE docs it links to. This works great on simple queries that run all the time.
It sometimes really stinks on some queries since the generic plan can't "see" the parameter values anymore. E.g. if you have an index on (customer_id, item_id) and run a query where `customer_id = $1 AND item_id = ANY($2)` ($2 is an array parameter), the generic query plan doesn't know how many elements are in the array and can decide to do an elaborate plan like a bitmap index scan instead of a nested loop join. I've seen the generic plan flip-flop in a situation like this and have a >100x load difference.
The plan cache is also per-connection, so you still have to plan a query multiple times. This is another reason why consolidating connections in PG is important.
The most interesting thing for me in this article was the mention of `MERGE` almost in passing at the end.
> I'm not a big fan of using the constraint names in SQL, so to overcome both limitations I'd use MERGE instead:
``` db=# MERGE INTO urls t USING (VALUES (1000004, 'https://hakibenita.com')) AS s(id, url) ON t.url = s.url WHEN MATCHED THEN UPDATE SET id = s.id WHEN NOT MATCHED THEN INSERT (id, url) VALUES (s.id, s.url); MERGE 1 ```
I use `insert ... on conflict do update ...` all the time to handle upserts, but it seems like merge may be more powerful and able to work in more scenarios. I hadn't heard of it before.
Great article, shows a lot of interesting PostgreSQL features. I have used PostgreSQL and MySQL for decades, and this article showed me that I have barely scratched the surface of what is possible.
Remarkably fresh content.
It's interesting how both virtual columns and hash indexes work, but feel like they're bolted on, vs being made part of the whole ecosystem so that they work seamlessly.
I think a stored generated column allows you to create an index on it directly. Isn't it better approach?
The article explains why they want to avoid this option:
> Starting at version 14, PostgreSQL supports generated columns - these are columns that are automatically populated with an expression when we insert the row. Sounds exactly like what we need but there is a caveat - the result of the expression is materialized - this means additional storage, which is what we were trying to save in the first place!
Thanks, missed that part. I would still be interested in knowing how much additional storage that adds, if the OP is interested in updating the article.
>I think a stored generated column allows you to create an index on it directly. Isn't it better approach?
Is it also possible to create index (maybe partial index) on expressions?
I assume it would increase the storage usage, which they say they are trying to avoid in that example.
some points from this article that I didn't know before.
>Currently, constraint exclusion is enabled by default only for cases that are often used to implement table partitioning via inheritance trees. Turning it on for all tables imposes extra planning overhead that is quite noticeable on simple queries, and most often will yield no benefit for simple queries.
PG's lack of plan caching strikes again, this sort of thing is not a concern in other DB's that reuse query plans.
PG does reuse plans, but only if you prepare a query and run it more than 5 times on that connection. See plan_cache_mode[0] and the PREPARE docs it links to. This works great on simple queries that run all the time.
It sometimes really stinks on some queries since the generic plan can't "see" the parameter values anymore. E.g. if you have an index on (customer_id, item_id) and run a query where `customer_id = $1 AND item_id = ANY($2)` ($2 is an array parameter), the generic query plan doesn't know how many elements are in the array and can decide to do an elaborate plan like a bitmap index scan instead of a nested loop join. I've seen the generic plan flip-flop in a situation like this and have a >100x load difference.
The plan cache is also per-connection, so you still have to plan a query multiple times. This is another reason why consolidating connections in PG is important.
0: https://www.postgresql.org/docs/current/runtime-config-query...