
Performance and memory usage hint

Hi all,

I thought I would share a helpful hint after trying to figure out why shortest_path performance on some of my tables was much slower than on others, and why it was consuming a huge amount of RAM (and swap) in the process.

Let's say you have a table with 50,000 edges. Run two quick queries: 'SELECT source FROM <yourtable> ORDER BY source DESC LIMIT 50' and 'SELECT target FROM <yourtable> ORDER BY target DESC LIMIT 50'. Make sure that the highest numbers in the source/target fields are more or less in line with the number of edges in the table. (They will probably be somewhat lower than the number of edges, actually.)
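
If it helps, here is the same check as a single query; the table name edges and the source/target column names are just placeholders for your own schema:

    SELECT count(*)    AS edge_count,
           max(source) AS max_source,
           max(target) AS max_target
    FROM edges;

If max_source or max_target is wildly larger than edge_count, that's the red flag.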

I had entered a dummy value like 9999999 in one of those fields for a couple of edges as a quick fix during data cleanup. The shortest_path queries still ran correctly, but they were consuming a huge amount of memory. I spent a bunch of time trying to tune PostgreSQL's memory parameters and then noticed an old post on this discussion forum that alluded to a similar problem. Once I changed the dummy values to more reasonable ones, my shortest_path queries ran in about a quarter of the time (from about 2 seconds down to 500 ms).
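
If you want to hunt down the offending rows, something like the query below should work (again assuming a table named edges with a gid primary key; the solver presumably sizes its internal structures by the largest vertex ID, so even one outlier ID blows up memory although the graph itself is small):

    -- find edges whose vertex IDs are far above the edge count
    SELECT gid, source, target
    FROM edges
    WHERE source > (SELECT count(*) FROM edges)
       OR target > (SELECT count(*) FROM edges);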

Hope this helps someone!

Jordan
ridethecity.com

  • Message #1001

    Thanks a lot for your hint and for sharing this with everyone! It's a very helpful one.
    You're right: very high vertex numbers can decrease performance a lot, especially if you have lots of data.

    So renumbering the source/target IDs and keeping those numbers low is a good idea. It's usually not an issue if you run assign_vertex_id to create the network topology; see the sketch below.
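
    For reference, a minimal sketch of that step, assuming the pgRouting 1.x assign_vertex_id(edge_table, tolerance, geometry_column, id_column) signature and placeholder table/column names:

        -- rebuilds source/target with compact, sequential vertex IDs
        SELECT assign_vertex_id('edges', 0.0001, 'the_geom', 'gid');

    After that, the source/target values stay in line with the number of vertices in the network.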