Bienvenue sur PostGIS.fr

Bienvenue sur PostGIS.fr , le site de la communauté des utilisateurs francophones de PostGIS.

PostGIS ajoute le support d'objets géographique à la base de données PostgreSQL. En effet, PostGIS "spatialise" le serverur PostgreSQL, ce qui permet de l'utiliser comme une base de données SIG.

Maintenu à jour, en fonction de nos disponibilités et des diverses sorties des outils que nous testons, nous vous proposons l'ensemble de nos travaux publiés en langue française.

source: trunk/workshop-foss4g/indexing.rst @ 11

Revision 1, 8.5 KB checked in by djay, 13 years ago (diff)

Initial import of the svn tree

RevLine 
[1]1.. _indexing:
2
3Section 14: Spatial Indexing
4============================
5
6Recall that spatial index is one of the three key features of a spatial database. Indexes are what make using a spatial database for large data sets possible. Without indexing, any search for a feature would require a “sequential scan” of every record in the database. Indexing speeds up searching by organizing the data into a search tree which can be quickly traversed to find a particular record.
7
8Spatial indices are one of the greatest assets of PostGIS.  In the previous example building spatial joins requires comparing whole tables with each other. This can get very costly: joining two tables of 10,000 records each without indexes would require 100,000,000 comparisons; with indexes the cost could be as low as 20,000 comparisons.
9
10When we loaded the ``nyc_census_blocks`` table, the pgShapeLoader automatically created a spatial index called ``nyc_census_blocks_the_geom_gist``
11
12To demonstrate how important indexes are for performance, let's search ``nyc_census_blocks`` **without** our spatial index.
13
14Our first step is to remove the index.
15
16.. code-block:: sql
17
18  DROP INDEX nyc_census_blocks_the_geom_gist;
19 
20.. note::
21
22   The ``DROP INDEX`` statement drops an existing index from the database system. For more information, see the PostgreSQL `documentation <http://www.postgresql.org/docs/7.4/interactive/sql-dropindex.html>`_.
23   
24Now, watch the "Timing" meter at the lower right-hand corner of the pgAdmin query window and run the following. Our query searches through every single census block in order to identify the Broad Street entry.
25
26.. code-block:: sql
27
28  SELECT blocks.blkid
29  FROM nyc_census_blocks blocks
30  JOIN nyc_subway_stations subways
31  ON ST_Contains(blocks.the_geom, subways.the_geom)
32  WHERE subways.name = 'Broad St';
33 
34::
35
36       blkid     
37 -----------------
38  360610007003006
39 
40The ``nyc_census_blocks`` table is very small (only a few thousand records) so even without an index, the query only takes **55 ms** on my test computer.
41
42Now add the spatial index back in and run the query again.
43
44.. code-block:: sql
45
46  CREATE INDEX nyc_census_blocks_the_geom_gist ON nyc_census_blocks USING GIST (the_geom);
47
48.. note:: The ``USING GIST`` clause tells PostgreSQL to use the generic index structure (GIST) when building the index.  If you receive an error that looks like ``ERROR: index row requires 11340 bytes, maximum size is 8191`` when creating your index, you have likely neglected to add the ``USING GIST`` clause.
49
50On my test computer the time drops to **9 ms**. The larger your table, the larger the relative speed improvement of an indexed query will be.
51
52How Spatial Indexes Work
53------------------------
54
55Standard database indexes create a hierarchical tree based on the values of the column being indexed. Spatial indexes are a little different -- they are unable to index the geometric features themselves  and instead index the bounding boxes of the features.
56
57.. image:: ./indexing/bbox.png
58
59In the figure above, the number of lines that intersect the yellow star is **one**, the red line. But the bounding boxes of features that intersect the yellow box is **two**, the red and blue ones.
60
61The way the database efficiently answers the question "what lines intersect the yellow star" is to first answer the question "what boxes intersect the yellow box" using the index (which is very fast) and then do an exact calculation of "what lines intersect the yellow star" **only for those features returned by the first test**.
62
63For a large table, this "two pass" system of evaluating the approximate index first, then carrying out an exact test can radically reduce the amount of calculations necessary to answer a query.
64
65Both PostGIS and Oracle Spatial share the same "R-Tree" [#RTree]_ spatial index structure. R-Trees break up data into rectangles, and sub-rectangles, and sub-sub rectangles, etc.  It is a self-tuning index structure that automatically handles variable data density and object size.
66
67.. image:: ./indexing/index-01.png
68
69Index-Only Queries
70------------------
71
72Most of the commonly used functions in PostGIS (:command:`ST_Contains`, :command:`ST_Intersects`, :command:`ST_DWithin`, etc) include an index filter automatically. But some functions (e.g., :command:`ST_Relate`) do not include and index filter.
73
74To do a bounding-box search using the index (and no filtering), make use of the :command:`&&` operator. For geometries, the :command:`&&` operator means "bounding boxes overlap or touch" in the same way that for number the :command:`=` operator means "values are the same".
75
76Let's compare an index-only query for the population of the 'West Village' to a more exact query. Using :command:`&&` our index-only query looks like the following:
77
78.. code-block:: sql
79
80  SELECT Sum(popn_total)
81  FROM nyc_neighborhoods neighborhoods
82  JOIN nyc_census_blocks blocks
83  ON neighborhoods.the_geom && blocks.the_geom
84  WHERE neighborhoods.name = 'West Village';
85 
86::
87
88  50325
89 
90Now let's do the same query using the more exact :command:`ST_Intersects` function.
91
92.. code-block:: sql
93
94  SELECT Sum(popn_total)
95  FROM nyc_neighborhoods neighborhoods
96  JOIN nyc_census_blocks blocks
97  ON ST_Intersects(neighborhoods.the_geom, blocks.the_geom)
98  WHERE neighborhoods.name = 'West Village';
99 
100::
101
102  27141
103
104A much lower answer! The first query summed up every block that intersected the neighborhood's bounding box; the second query only summed up those blocks that intersected the neighborhood itself.
105
106Analyzing
107---------
108
109The PostgreSQL query planner intelligently chooses when to use or not to use indexes to evaluate a query. Counter-intuitively, it is not always faster to do an index search: if the search is going to return every record in the table, traversing the index tree to get each record will actually be slower than just linearly reading the whole table from the start.
110
111In order to figure out what situation it is dealing with (reading a small part of the table versus reading a large portion of the table), PostgreSQL keeps statistics about the distribution of data in each indexed table column.  By default, PostgreSQL gathers statistics on a regular basis. However, if you dramatically change the make-up of your table within a short period of time, the statistics will not be up-to-date.
112
113To ensure your statistics match your table contents, it is wise the to run the ``ANALYZE`` command after bulk data loads and deletes in your tables. This force the statistics system to gather data for all your indexed columns.
114
115The ``ANALYZE`` command asks PostgreSQL to traverse the table and update its internal statistics used for query plan estimation (query plan analysis will be discussed later).
116
117.. code-block:: sql
118
119   ANALYZE nyc_census_blocks;
120   
121Vacuuming
122---------
123
124It's worth stressing that just creating an index is not enough to allow PostgreSQL to use it effectively.  VACUUMing must be performed whenever a new index is created or after a large number of UPDATEs, INSERTs or DELETEs are issued against a table.  The ``VACUUM`` command asks PostgreSQL to reclaim any unused space in the table pages left by updates or deletes to records.
125
126Vacuuming is so critical for the efficient running of the database that PostgreSQL provides an "autovacuum" option.
127
128Enabled by default, autovacuum both vacuums (recovers space) and analyzes (updates statistics) on your tables at sensible intervals determined by the level of activity.  While this is essential for highly transactional databases, it is not advisable to wait for an autovacuum run after adding indices or bulk-loading data.  If a large batch update is performed, you should manually run ``VACUUM``.
129
130Vacuuming and analyzing the database can be performed separately as needed.  Issuing ``VACUUM`` command will not update the database statistics; likewise issuing an ``ANALYZE`` command will not recover unused table rows.  Both commands can be run against the entire database, a single table, or a single column.
131
132.. code-block:: sql
133
134   VACUUM ANALYZE nyc_census_blocks;
135
136Function List
137-------------
138
139`geometry_a && geometry_b <http://postgis.org/docs/ST_Geometry_Overlap.html>`_: Returns TRUE if A's bounding box overlaps B's.
140
141`geometry_a = geometry_b <http://postgis.org/docs/ST_Geometry_EQ.html>`_: Returns TRUE if A's bounding box is the same as B's.
142
143`ST_Intersects(geometry_a, geometry_b) <http://postgis.org/docs/ST_Intersects.html>`_: Returns TRUE if the Geometries/Geography "spatially intersect" - (share any portion of space) and FALSE if they don't (they are Disjoint).
144
145.. rubric:: Footnotes
146
147.. [#RTree] http://postgis.org/support/rtree.pdf
148
Note: See TracBrowser for help on using the repository browser.