Add grouped-linemerge generalization by leijurv · Pull Request #2482 · osm2pgsql-dev/osm2pgsql

leijurv · 2026-06-01T02:11:14Z

Summary

This PR adds a new generalization strategy, grouped-linemerge, to osm2pgsql-gen. It maintains a table of merged linestrings, the equivalent of:

SELECT col1, col2, ..., (ST_Dump(ST_LineMerge(ST_Collect(geom)))).geom
FROM src
GROUP BY col1, col2, ...
WHERE condition

The key is that this table is global, rather than local to your tile. It is kept up to date incrementally as the source data changes (and it never re-scans the entire planet to do so).

It works similar to the existing generalizations rivers and vector-union. You configure it from your flex style like osm2pgsql.run_gen('grouped-linemerge', {...}), and you run osm2pgsql-gen the same way (after import, and after update).

Motivation

The motivation is merging ways in OSM-Carto with the same rendering. Here's the relevant issue: openstreetmap-carto/openstreetmap-carto#951

OSM ways are often extremely fragmented, only going a block or two at a time in urban areas, due to relations and tagging changes. This significantly impairs rendering (see images near the bottom of that issue). We could LineMerge these ways within each tile as it's rendered, but this makes it really challenging to have consistent geometries for long ways that span metatile boundaries. (see openstreetmap-carto/openstreetmap-carto#5229 and openstreetmap-carto/openstreetmap-carto@master...leijurv:openstreetmap-carto:recursive-cte)

This PR allows us to have a global LineMerge for all ways, grouped only by the colunms that actually affect rendering (like name, ref, highway, layer, etc). There are no tile-boundary artifacts, and it can be maintained incrementally quite efficiently.

How it works

Initial build

The initial build is very simple - we just run that exact query to do a grouped ST_LineMerge across the whole world. This takes a few minutes, because the number of ways in each group is often not so large. We also make an index on the start and end points of each way, this is not strictly required but it significantly helps performance of the incremental update.

Incremental updates

Generalizations are able to receive the expire list of tiles whose geometry has changed. The expire list includes tiles where something was deleted, changed, or added. One possible approach here would be to select all roads in the entire planet that match any grouping key that was touched, and re-LineMerge them all. I ruled this out because some road names are very frequent. Imagine if moving a node on "Main Street" in some town, resulted in re-scanning every road named "Main Street" in the entire world. I assumed this was not an option from a performance perspective. Instead, I used a recursive CTE to spider through the geometry from the expired tile. Each changed area becomes a seed from which it walks outwards. To bound this walk, it must match an exact endpoint, and it must match every element of the grouping columns. This finds the full connected component, which replaces the old connected component geometry.

Usage

-- An expire output on the source geometry feeds the incremental updates:
local exp = osm2pgsql.define_expire_output({ maxzoom = 18, table = 'expire_roads' })
-- ... attach `expire = {{ output = exp }}` to the source table's geometry column ...

-- A destination table holding the grouping columns + the merged geometry
-- (no id column; it's maintained by osm2pgsql-gen):
osm2pgsql.define_table({
  name = 'coalesced_roads',
  columns = {
      { column = 'highway', type = 'text' }, { column = 'name', type = 'text' },
      { column = 'ref', type = 'text' },     { column = 'layer', type = 'int4' },
      -- … any other grouping columns …
      { column = 'way', type = 'linestring', not_null = true },
  }
})

function osm2pgsql.process_gen()
  osm2pgsql.run_gen('grouped-linemerge', {
      src_table        = 'planet_osm_line',
      dest_table       = 'coalesced_roads',
      geom_column      = 'way',
      group_by_columns = 'highway, name, ref, layer',  -- comma-separated
      where            = 'name IS NOT NULL OR ref IS NOT NULL', -- optional pre-filter
      expire_list      = 'expire_roads',  -- } required in
      zoom             = 18,              -- } append mode
  })
end

Then:

osm2pgsql      -O flex -S style.lua --slim … data.osm.pbf   # import (creates the tables)
osm2pgsql-gen          -S style.lua                         # initial build
osm2pgsql   -a -O flex -S style.lua --slim … changes.osc.gz # update
osm2pgsql-gen -a       -S style.lua                         # incremental update

Details

I believe this works in general for incremental updates, but the recursive CTE was challenging to get correct and performant. The reason for the new index is so that we can look up ways by their exact endpoint match. Without the new index, we'd use the existing GiST to lookup the way, which means that even if we ask PostGIS for ST_Intersect or some such, it will actually mechanically do && on the way's entire bbox. This works, to be clear, but it means that if I want to find a way that ends on an exact point, GiST will actually give me all the ways whose entire bbox covers that point, which means less than one percent of the ways that the index gives me are actually eligible. See here openstreetmap-carto/openstreetmap-carto#951 (comment) for discussion about that.

Tests

I put in tests/test-gen-grouped-linemerge.cpp a differential fuzz test. It performs hundreds of random connect/disconnects in a small grid of points. I intentionally set it up so the degree of each node could vary from 0 to 4, which stresses how ST_LineMerge does not merge nodes with a degree above 2. At all times, the incrementally updated geometry must exactly match what ST_LineMerge would have done.

I have run this locally and it works beautifully. Here's the commit to run it leijurv/openstreetmap-carto@b79c0fa and it does add a bunch of labels (looks like this openstreetmap-carto/openstreetmap-carto#951 (comment))

joto · 2026-06-01T15:55:18Z

From a first quick glimpse this looks really interesting. I'll need a while to digest this though. Can you look at the build failures and fix them?

leijurv · 2026-06-01T20:39:17Z

That's great to hear that it's interesting!!!

I've fixed the clang-tidy, and I've fixed a performance mistake.

Here are some numbers for what this does, and how long it takes, over downtown Los Angeles.

Note that in practice, we would never expire an entire tile at Z13. More likely Z18 or Z19. I'm just showing it for clarity.

Note that the most relevant column is probably walk, which is how long the CTE takes. Note that even in a nasty torture test, a large region of los angeles, that made the CTE go to a recursion depth of 433, the time it took is very small, 0.08 seconds in the worst case. I think this is a really good sign - but - I don't actually know what the performance requirements are here. Roughly, fetching all the ways in the affected tile, took about as long as the recursion (at higher zooms), so hopefully that's fine?

As we get to higher zooms, the reason why region doesn't get faster, is the issue I mentioned earlier. The GiST index on a linestring indexes the entire bbox. Therefore, even at "zoom level infinity" (looking up a single point), the GiST will unavoidably give you every way whose entire bbox covers that point (way && point), and then Postgres will then filter it down to ST_Intersects or whatever the query actually was. So, by the time we get to Z17 or Z18, it's almost like we are querying a single point, but there are tons of ways in a dense urban area whose full min-to-max bbox covers that point. (and this is why the cte needs a new index - otherwise every one of those up to 433 lookups would take perhaps 30ms each)

zoom	seed ways	ways traversed	nodes	CTE recursion depth	groups	output geoms	region	walk	collect	delete	insert	total
z13	1,793	3,792	4,430	433	450	783	38ms	81ms	39ms	28ms	49ms	234 ms
z14	616	1,855	2,014	433	135	212	29ms	46ms	23ms	17ms	18ms	133 ms
z15	171	410	452	89	44	60	30ms	20ms	8ms	6ms	9ms	74 ms
z16	50	185	200	38	17	25	30ms	17ms	10ms	5ms	4ms	67 ms
z17	15	152	161	38	9	11	28ms	18ms	11ms	5ms	4ms	66 ms
z18	5	88	92	38	4	6	29ms	16ms	6ms	7ms	4ms	61 ms
z19	4	88	92	38	4	6	31ms	16ms	6ms	7ms	4ms	63 ms
z20	2	58	60	38	2	2	30ms	17ms	9ms	5ms	3ms	64 ms

joto · 2026-06-02T08:09:36Z

If I understand this correctly you tested this on Los Angeles, did you also run tests on the full planet (or larger areas)? From my experience it is often the case that something works fine on a small dataset because basically the whole dataset fits into memory, but performance goes dramatically down if you run it on the full planet, even though each individual query only looks at local data.

Generally the performance requirements are that we need to be able to keep up with minutely changes on a reasonable machine. And ther still needs to be enough capacity left so that we can do the actual querying of the database for rendering. That's rather vague, I know.

leijurv · 2026-06-02T16:10:35Z

I have downloaded California, New York, New Jersey, Texas, Poland, Netherlands, Sweden, some of Japan, and some of England. It's larger than what fits into my ram.

I think it's reasonable to compare "how long does it take to get all lines && the updated tile" with "how long is the rest of the incremental update"?

I will look into trying to measure performance "cold" with nothing cached in memory.

If performance is a problem, I know for sure I could make it much faster if the generalization could receive the exact ways that were edited, like carto's flex lua code, rather than Z/X/Y tiles. That way the slow part (the &&tile_box) can be skipped entirely, and the remaining steps would only walk the network of the actually edited way(s).

leijurv · 2026-06-02T17:23:44Z

Here is an updated table. I restarted postgres between each run, and I used drop_caches as well. It is several times slower which is probably what you expected? The last paragraph of the last comment ^ still applies. I did some rough tests and I can get the total time to maybe 80ms cold, 40ms warm for a single way (recursion depth was a few dozen, I picked a long way. and zoom level is now irrelevant) if the generalization could know the exact endpoints of the way that got expired (way_add, modify, delete).

expired tile	region	walk	collect	delete	insert	total
z13	74	267	42	206	53	642 ms
z14	66	176	20	95	23	382 ms
z15	85	96	9	53	16	259 ms
z16	73	76	6	41	16	212 ms
z17	65	58	6	29	6	164 ms
z18	69	55	6	28	6	165 ms
z19	67	54	7	28	5	161 ms
z20	71	60	9	27	8	175 ms

leijurv · 2026-06-02T17:50:52Z

But surely this can be afforded? Doesn't osm.org have on the order of ~1 change per second? And much of this is selects that are latency-bound not bandwidth-bound so given multithreading this should be basically fine I'd hope

joto · 2026-06-03T13:01:48Z

If performance is a problem, I know for sure I could make it much faster if the generalization could receive the exact ways that were edited, like carto's flex lua code, rather than Z/X/Y tiles. That way the slow part (the &&tile_box) can be skipped entirely, and the remaining steps would only walk the network of the actually edited way(s).

You don't need to base this on tiles, you can base this on the ways that changed! For instance, you can have a separate table where you store (the "way" processing function) all the ids of ways that changed, then use that table as input for the rest of the processing in the "gen" step. But it is not as easy as one would think to implement this. You have to correctly handle deleted ways and all the different kinds of change, including cases like this: Say you have two ways that are not connected, now a third way is changed which makes it connect to both existing and unchanged ways allowing all of them to be merged.

so given multithreading this should be basically fine I'd hope

All of this is I/O bound so multithreading is not going to help.

leijurv · 2026-06-03T17:49:32Z

Okay, I will try to figure out how to get a table of ways that changed 👍

For the case of a new way that connects existing ways, yes that's the rub. Your case can already happen currently, when a way is itself larger than the tile. It's actually even worse than that, because ST_LineMerge will stop merging when the degree of a node goes from 2 to 3, so adding a way can actually split a road geometry. There were broadly two ways to do this: 1. write extremely complicated recursive CTE queries to try to fix up exactly the "edit" and no further (in your case, it would just connect those two geometries without recursively exploring them on the input table side) or 2. explore the entire geometry, delete the entire geometry, re-ST_LineMerge it. I picked the latter in this PR.

I actually think multithreading would help specifically because of how the recursive CTE requires many serial lookups, that is I/O bound but there is a difference between being bound by I/O latency for dependent/sequential random lookup versus being bound by I/O throughput (whether the lookups are sequential or random). So if you were to have several tiles that are all expired, you would probably get a speedup with each thread expiring one of them, versus one thread expiring all sequentially (note: this PR doesn't quite do that, it uses joins to get postgres to do many tiles "at once" internally). But my point on multithreading was not actually about two threads that are both doing expiring, rather about render threads not being blocked by expiring threads, because the expiring threads (and the render threads) are doing dependent lookups through the GiST structure with inherent latency in the processor-side logic from one read to the next.

In any event I will think about this more leijurv/openstreetmap-carto@b79c0fa This is how I had it, but I think I'll need to use a totally different method (not tile based at all). My initial hope is that whenever a way is added, updated, or deleted, I can simply add the exact (x,y) of both endpoints to a table of geometries to be updated, then the generalization can re-explore the geometry from there? That would be the happy / easy path, but I am probably missing something.

lonvia

The mix of tile expiry and recursive query is interesting. I actually think that the solution is simpler because of your use of tile-based expiry instead of way-based. I did an implementation with way-based expiry for waymarkedtrails and only got it to work by tracking the IDs of the source ways for each merged way. That would make this implementation significantly more complex.

ST_LineMerge has a few quirks which might lead to unexpected results but that is something that can be easily addressed later.

Performance stands an falls with the complexity of the merged ways, which in turn depends on the source table and the grouping. I'd like to run the import query on a planet to see how long it takes. That should give us a ballpark figure how expensive merging is.

lonvia · 2026-06-05T16:08:41Z

+      AND ST_Y(ST_StartPoint(d."{geom_column}")) = n.y)
+     OR (ST_X(ST_EndPoint(d."{geom_column}")) = n.x
+      AND ST_Y(ST_EndPoint(d."{geom_column}")) = n.y) )
+)");


Is this delete really necessary? Are there any cases of updating that wouldn't be covered by the DELETE below in line335?

Yes because the recursive CTE scan, that we insert in Step 5, includes fragments that are entirely outside the current tile.

Consider the following four ways, that will LineMerge from A,B,C,D to AB,C,D

| / | C | / --A--•--B--• | \ | D | \ ^ tile boundary

The line 335 delete will only delete the AB geometry. The line 325 delete is what deletes C, D, which is necessary, because Step 5 will reinsert all AB,C,D.

And conversely, the reason why 335 is needed as well as 325, is much easier to justify: when you delete a way that is entirely within this tile, line 335 deletes it from the linemerged table too.

The important question here is what exactly has changed. If A has changed its tags or geometry then both tiles right and left of the boundary will be expired and in your set or regions to recompute. While your CTE recurses outside the expiry tile area, it does not recurse to a merged way that doesn't intersect with a expired tile.

I might be wrong of course, which is why we are going to need lots and lots of tests for the update case, once the implementation has settled down a bit.

In this first example, A has not changed. But we don't know that: imagine this tile expired for some unrelated reason. The tile to the right has not expired.

The point is that if all I know is "something happened in this tile", I will explore every network of ways, some of which have changed, most of which haven't, but I don't know the difference.

lonvia · 2026-06-05T16:13:58Z

+            R"(CREATE INDEX IF NOT EXISTS "{idx_endpt}" ON {src} USING btree)"
+            R"( (ST_X(ST_EndPoint("{geom_column}")),)"
+            R"( ST_Y(ST_EndPoint("{geom_column}"))) {index_predicate})");
+    }


I understand your reasoning for an extra index here but I think this still could be a geometry index on the points, i.e.

CREATE INDEX ... USING gist(ST_StartPoint(geometry))

The you can use ST_Intersect on the start and end points below making the code a lot more readable.

Yes you are correct. GiST(ST_StartPoint(geometry)) is workable to replace btree(ST_X(ST_StartPoint(geometry)), ST_Y(ST_StartPoint(geometry))), however the btree is significantly faster. The reason is that we are only ever doing exact lookups with ==, so the GiST overhead to support range lookups && is unnecessary.

I do not have any concrete numbers for this use case, but back when I was doing a very similar recursive CTE here openstreetmap-carto/openstreetmap-carto#951 (comment) I found that the GiST made the recursive CTE overall about 2 or 3 times slower than the btree

I am happy to go either way but first I think I need a more up-to-date performance answer in tradeoff against what I completely agree is a loss to readability.

In order to get a real performance answer I think I will need to set up a whole planet osm in Postgres, and apply real osmchanges, which I have been putting off but might do this weekend

Ah, I thought the main performance loss was because of the relatively large bboxes for ways. A gist of points wouldn't have that problem. But I wouldn't be surprised if gist is slower overall.

Is it possible to put a ST_Point in a btree and get equality comparison? That would give you the best of both worlds.

Is it possible to put a ST_Point in a btree and get equality comparison?

Yes. I ran a quick test just now. Using the same data as this comment #2482 (comment) where I expire a Z13-size area (this tile) and explore 4430 total endpoints (some of those roads extend quite far)

It's interesting that hot vs cold doesn't matter much for GiST - my interpretation is that the btree index is more I/O-bound, while the GiST index does more nontrivial CPU work while traversing to its goal. The speedup from cold to warm is similar though, in total time (not relative time). So I would suspect that the total number of disk reads is about the same between btree and gist, it's just that gist is doing way more computations along the way.

Variant cold (median) cold (min) warm warm per-probe (×4430 loops) index disk space

A — (x,y) btree ~247 ms 235 ms 67 ms 0.002 ms 364MB

B — point btree ~282 ms 229 ms 110 ms 0.007 ms 576MB

C — point gist ~1070 ms 1033 ms 928 ms 0.09 ms 479MB

I have committed b0dd099

It is an improvement to readability, a bit slower to walk, but surprisingly a bit faster to delete the old geometries, compare to #2482 (comment)

expired tile region walk collect delete insert total

z13 71 303 69 155 90 688 ms

z14 70 158 33 72 36 368 ms

z15 74 71 16 40 14 215 ms

z16 73 73 8 32 12 197 ms

z17 84 66 11 27 16 203 ms

z18 71 52 7 24 7 161 ms

z19 82 71 16 36 13 218 ms

z20 71 59 6 29 7 172 ms

lonvia · 2026-06-05T17:38:39Z

+        AND ST_Y(ST_StartPoint(l."{geom_column}")) = n.y)
+       OR (ST_X(ST_EndPoint(l."{geom_column}")) = n.x
+        AND ST_Y(ST_EndPoint(l."{geom_column}")) = n.y) )
+     AND {where}


One way to make this recursive CTE cheaper is to join here only with nodes added in the last recursion. You need to be very, very careful though not to create endless loops when doing that. So maybe an optimization for later rather than now.

I think this is true already, I've looked at it in EXPLAIN ANALYZE and WorkTable Scan on nodes n (rows=43 loops=319) shows each iteration joins only ~43 frontier rows, not the 13613 accumulated. If it were re-joining the whole set, that scan would show thousands of rows. It works because I am doing WITH RECURSIVE ... UNION rather than WITH RECURSIVE ... UNION ALL.

leijurv · 2026-06-05T19:00:59Z

(I have been focusing more on ST_LineMerge at render time (see here: openstreetmap-carto/openstreetmap-carto#951 (comment)) but I am still interested in this PR as an alternative possible path)

The mix of tile expiry and recursive query is interesting. I actually think that the solution is simpler because of your use of tile-based expiry instead of way-based. I did an implementation with way-based expiry for waymarkedtrails and only got it to work by tracking the IDs of the source ways for each merged way. That would make this implementation significantly more complex.

I admit I do not know how osm2pgsql works and clearly you know more than me

But this is my plan:

Right now we find deleted ways, updated ways, and created ways, by "rounding up" to a Z/X/Y tile. i.e. we get a "low resolution picture" of where edits have happened.

Instead, I'm imagining saving in the exact same way, at the exact same time, expiry data that is simply more precise: the exact ST_X(ST_StartPoint(way)), ST_Y(ST_StartPoint(way)) and ST_X(ST_EndPoint(way)), ST_Y(ST_EndPoint(way)) (including for deleted ways!). In my mind, the queries in this PR would be exactly(?) the same, only higher precision because they are a collection of exact points, rather than polygon geometries. Like _glm_region is a collection of ST_Point rather than a collection of tile bboxes. Possibly I am missing something?

lonvia · 2026-06-05T19:22:52Z

The tricky part with updates here is that your algorithm runs while the tables are in a in-between state: the source table has already all the updated tags and geometries. Deleted ways are gone from the source table without a trace. The destination table, on the other hand, has still the state from before the update was applied. So a simple endpoint matching won't really work. You need to track former way states and deleted ways as well. And that is somewhat tricky with osm2pgsql.

The tile expiry kindly solves the problem for you because it does the expiry always on the old and the new geometry. Thus, the expired region cover where ways were before and after the update, nicely side-stepping the issue at the price of invalidating a bit more than necessary.

lonvia · 2026-06-05T19:26:30Z

I did a quick test on the planet with a table of named highway using highway type, name and ref as grouping criterion. The source table has 114_456_912 rows. Computing the destination with grouping and line merge takes 9min30s and produces a table with 35_462_494 rows. Utterly feasible.

leijurv · 2026-06-05T20:45:02Z

Deleted ways are gone from the source table without a trace.

Wait sorry, I don't mean to be difficult, but just to clarify: Yes, by the time the generalization runs, a deleted way is completely gone from planet_osm_line. But, isn't it possible to have a table, like glm_expired_points, which is just a bunch of ST_Points, where whenever a way is deleted, I add both endpoints to this table? Then when the generalization runs, it can find the geometries that were made from this now-deleted way, because this endpoint will be an internal point of those geometries?

I thought this was possible because joto said above: For instance, you can have a separate table where you store (the "way" processing function) all the ids of ways that changed, then use that table as input for the rest of the processing in the "gen" step.

Pseudocode:

def way_created(way):
    INSERT INTO glm_expired_points(pt) VALUES (ST_StartPoint(way));
    INSERT INTO glm_expired_points(pt) VALUES (ST_EndPoint(way));

def way_updated(old_way, new_way): # not sure how this works
    INSERT INTO glm_expired_points(pt) VALUES (ST_StartPoint(old_way));
    INSERT INTO glm_expired_points(pt) VALUES (ST_EndPoint(old_way));
    INSERT INTO glm_expired_points(pt) VALUES (ST_StartPoint(new_way));
    INSERT INTO glm_expired_points(pt) VALUES (ST_EndPoint(new_way));

def way_deleted(way):
    INSERT INTO glm_expired_points(pt) VALUES (ST_StartPoint(way));
    INSERT INTO glm_expired_points(pt) VALUES (ST_EndPoint(way));

I don't think I need to store the way itself, or delay its deletion/update in any way. I only need to know its endpoints.

The destination table, on the other hand, has still the state from before the update was applied. So a simple endpoint matching won't really work.

Endpoint exact matching would not work because an endpoint of an input way is probably became internal point of an output way. Instead, I would have a GiST on the post-linemerge output table. The lookup of "a way was deleted (its linestring is lost), but I do know what its endpoints were" --> "which linemerged geometries are now expired and need to be rescanned" is just ST_Intersects I think

Computing the destination with grouping and line merge takes 9min30s and produces a table with 35_462_494 rows. Utterly feasible.

Great!

leijurv · 2026-06-06T06:25:03Z

Okay, I implemented a version that uses exact endpoints, rather than tiles. ddb7de4 Here is an explanation.

This generalization needs to know which connected components have been changed, so that it can re-merge only those. Currently in this PR, it gets expired Z/X/Y tiles, and so it must re-walk every road that touches that tile. This is quite slow due to the GiST tile && way issue that I have mentioned a few times so I won't repeat it. And of course, even if you make your tile very small like Z19, the vast majority of these re-walks will be useless red herring, where the tile was edited for some unrelated reason. And making the tiles very small is itself inefficient because so many of them will be expired by OSM edits. But, if we instead search by exact endpoint, all these issues are solved. We only search from exact node equality, and starting from that point, the search is limited to where it matches the highway,ref,name,etc columns that we group by.

A few concerns addressed

Why not do this on the Lua side? Why change core C++? The flex geometry API doesn't have a method to get the startpoint/endpoint of a way. Could add one. But then still, deleting a way is still a problem, it doesn't call process_way. By the time you call process_deleted_way, the exact node locations could be gone (?). So, doing this in Lua would need new accessors and a new deleted geometry hook. It is just a less natural way to hook into deleted way endpoints.

Why not expire only tiles at the endpoints of the way, not the middle? If I edit a long way, maybe we could expire just exactly two tiny tiles, like Z20, one at each endpoint? This is still not perfectly quantized, I cannot look up the way with ==, I still need && which has the issues mentioned before. And you would still need the Lua endpoint accessor still.

Why add this code to expire output rather than a totally new system? It is true that endpoints are not the same as tiles but they are triggered by the same event on the same column. It is able to reuse quite a lot of code while staying backwards compatible.

Summary: define_expire_output will write the exact startpoint,endpoint of each changed way as a ST_Point, the generalization consumes those points and proceeds through the existing steps.

Performance? I've printed several tables above, but the short version is that there was a noise floor around 150ms where even for a single node update where the expiry is tiny I can't get it faster because of all the time spent on all the unrelated ways and slow GiST lookups. With this version, picking single random ways in the same tile (this one), it takes an average of ~68ms to fully reexplore the network. Note that this is a dense road network so the recursion depth here ranged from 1 to 51, and the total ways looked up ranged from 2 to 74. And this parallelizes quite well, if I go from 1 point to 10 points, it only gets about 2x slower not 10x (it goes to ~125ms). If I 10x it again, to 100 random points, it goes to average ~407ms. So osmchanges with many edited nodes are not a problem even if they all are touching dense road networks. And note every measurement in this paragraph was completely cold - I restart Postgres between every query, and I flush my filesystem cache too. If I rerun it warm, it's almost 3x faster, to average ~26ms for 1 point.

I'd be perfectly happy to revert this though, if it's judged to not be worth it.

grouped linemerge

f4a33ec

leijurv mentioned this pull request Jun 1, 2026

Segmented roads lead to too high or too low density of shields and names openstreetmap-carto/openstreetmap-carto#951

Open

leijurv added 2 commits June 1, 2026 09:55

tidy

370a094

fix performance mistake in join order

9ca4b5e

lonvia reviewed Jun 5, 2026

View reviewed changes

switch to btree(st_point)

b0dd099

endpoint based grouped linemerge

ddb7de4

Variant	cold (median)	cold (min)	warm	warm per-probe (×4430 loops)	index disk space
A — (x,y) btree	~247 ms	235 ms	67 ms	0.002 ms	364MB
B — point btree	~282 ms	229 ms	110 ms	0.007 ms	576MB
C — point gist	~1070 ms	1033 ms	928 ms	0.09 ms	479MB

expired tile	region	walk	collect	delete	insert	total
z13	71	303	69	155	90	688 ms
z14	70	158	33	72	36	368 ms
z15	74	71	16	40	14	215 ms
z16	73	73	8	32	12	197 ms
z17	84	66	11	27	16	203 ms
z18	71	52	7	24	7	161 ms
z19	82	71	16	36	13	218 ms
z20	71	59	6	29	7	172 ms

Uh oh!

Conversation

leijurv commented Jun 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Motivation

How it works

Initial build

Incremental updates

Usage

Details

Tests

Uh oh!

joto commented Jun 1, 2026

Uh oh!

leijurv commented Jun 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

joto commented Jun 2, 2026

Uh oh!

leijurv commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

leijurv commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

leijurv commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

joto commented Jun 3, 2026

Uh oh!

leijurv commented Jun 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

lonvia left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

leijurv Jun 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

leijurv Jun 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

leijurv commented Jun 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

lonvia commented Jun 5, 2026

Uh oh!

lonvia commented Jun 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

leijurv commented Jun 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

leijurv commented Jun 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

leijurv commented Jun 1, 2026 •

edited

Loading

leijurv commented Jun 1, 2026 •

edited

Loading

leijurv commented Jun 2, 2026 •

edited

Loading

leijurv commented Jun 2, 2026 •

edited

Loading

leijurv commented Jun 2, 2026 •

edited

Loading

leijurv commented Jun 3, 2026 •

edited

Loading

leijurv Jun 5, 2026 •

edited

Loading

leijurv Jun 5, 2026 •

edited

Loading

leijurv commented Jun 5, 2026 •

edited

Loading

lonvia commented Jun 5, 2026 •

edited

Loading

leijurv commented Jun 5, 2026 •

edited

Loading

leijurv commented Jun 6, 2026 •

edited

Loading