mirror of https://github.com/postgres/postgres.git (synced 2025-10-24 00:03:18 -04:00)
			
		
		
		
	Reverse PG_BINARY defines
parent cc2b5e5815
commit a305c7d675
					
				
							
								
								
									
doc/FAQ_BSDI (normal file, 34 lines)
									
								
							
							
						
						
									
									
								
							| @ -0,0 +1,34 @@ | ||||
| This outlines how to increase the number of shared memory buffers | ||||
| supported by BSD/OS.  By default, only 4MB of shared memory is supported | ||||
| by BSDI. | ||||
| 
 | ||||
| Keep in mind that shared memory is not pageable.  It is locked in RAM. | ||||
| 
 | ||||
| Bruce Momjian (pgman@candle.pha.pa.us) | ||||
| 
 | ||||
| --------------------------------------------------------------------------- | ||||
| 
 | ||||
| Increase SHMMAXPGS by 1024 for every additional 4MB of shared | ||||
| memory: | ||||
| 
 | ||||
| /sys/sys/shm.h:69:#define       SHMMAXPGS       1024    /* max hardware pages... | ||||
| 
 | ||||
| The default setting of 1024 is for a maximum of 4MB of shared memory. | ||||
| 
 | ||||
| For those running 4.1 or later, just recompile the kernel and reboot.  | ||||
| For those running earlier releases, there are more steps outlined below. | ||||
| 
 | ||||
| --------------------------------------------------------------------------- | ||||
| 
 | ||||
| Use bpatch to find the sysptsize value for the current kernel.  | ||||
| This is computed dynamically at bootup. | ||||
| 
 | ||||
| 	$ bpatch -r sysptsize | ||||
| 	0x9 = 9 | ||||
| 
 | ||||
| Next, change SYSPTSIZE to a hard-coded value.  Use the bpatch value | ||||
| plus 1 for every additional 4MB of shared memory you desire. | ||||
| 
 | ||||
| /sys/i386/i386/i386_param.c:28:#define  SYSPTSIZE 0        /* dynamically... | ||||
| 
 | ||||
| sysptsize cannot be changed by sysctl on the fly. | ||||
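The arithmetic in this FAQ can be sketched in a few lines of Python. This is only an illustration, not part of BSD/OS: the function names are made up here, and the sketch assumes the ratio stated above (1024 pages per 4MB) and that the desired total is a multiple of 4MB.

```python
# Hypothetical helpers; the names are illustrative and not part of BSD/OS.
DEFAULT_SHMMAXPGS = 1024   # default in /sys/sys/shm.h, good for 4MB
MB_PER_STEP = 4            # every extra 1024 pages adds 4MB


def shmmaxpgs_for(total_mb):
    """SHMMAXPGS needed for total_mb of shared memory (a multiple of 4MB)."""
    if total_mb < MB_PER_STEP or total_mb % MB_PER_STEP:
        raise ValueError("total_mb must be a positive multiple of 4")
    return DEFAULT_SHMMAXPGS * (total_mb // MB_PER_STEP)


def sysptsize_for(bpatch_value, total_mb):
    """Hard-coded SYSPTSIZE: the bpatch value plus 1 per additional 4MB."""
    if total_mb < MB_PER_STEP or total_mb % MB_PER_STEP:
        raise ValueError("total_mb must be a positive multiple of 4")
    additional_steps = total_mb // MB_PER_STEP - 1
    return bpatch_value + additional_steps


print(shmmaxpgs_for(16))      # 4096
print(sysptsize_for(9, 16))   # 12
```

With the bpatch output shown above (9), 16MB of shared memory would call for SHMMAXPGS 4096 and, on releases before 4.1, SYSPTSIZE 12.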
| @ -1055,3 +1055,534 @@ Hiroshi Inoue | ||||
| Inoue@tpf.co.jp | ||||
| 
 | ||||
| 
 | ||||
| From owner-pgsql-hackers@hub.org Thu Jan 20 18:45:32 2000 | ||||
| Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) | ||||
| 	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id TAA00672 | ||||
| 	for <pgman@candle.pha.pa.us>; Thu, 20 Jan 2000 19:45:30 -0500 (EST) | ||||
| Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.14 $) with ESMTP id TAA01989 for <pgman@candle.pha.pa.us>; Thu, 20 Jan 2000 19:39:15 -0500 (EST) | ||||
| Received: from localhost (majordom@localhost) | ||||
| 	by hub.org (8.9.3/8.9.3) with SMTP id TAA00957; | ||||
| 	Thu, 20 Jan 2000 19:35:19 -0500 (EST) | ||||
| 	(envelope-from owner-pgsql-hackers) | ||||
| Received: by hub.org (bulk_mailer v1.5); Thu, 20 Jan 2000 19:33:34 -0500 | ||||
| Received: (from majordom@localhost) | ||||
| 	by hub.org (8.9.3/8.9.3) id TAA00581 | ||||
| 	for pgsql-hackers-outgoing; Thu, 20 Jan 2000 19:32:37 -0500 (EST) | ||||
| 	(envelope-from owner-pgsql-hackers@postgreSQL.org) | ||||
| Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) | ||||
| 	by hub.org (8.9.3/8.9.3) with ESMTP id TAA98940 | ||||
| 	for <pgsql-hackers@postgreSQL.org>; Thu, 20 Jan 2000 19:31:49 -0500 (EST) | ||||
| 	(envelope-from tgl@sss.pgh.pa.us) | ||||
| Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1]) | ||||
| 	by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id TAA25390 | ||||
| 	for <pgsql-hackers@postgreSQL.org>; Thu, 20 Jan 2000 19:31:32 -0500 (EST) | ||||
| To: pgsql-hackers@postgreSQL.org | ||||
| Subject: [HACKERS] Some notes on optimizer cost estimates | ||||
| Date: Thu, 20 Jan 2000 19:31:32 -0500 | ||||
| Message-ID: <25387.948414692@sss.pgh.pa.us> | ||||
| From: Tom Lane <tgl@sss.pgh.pa.us> | ||||
| Sender: owner-pgsql-hackers@postgreSQL.org | ||||
| Status: OR | ||||
| 
 | ||||
| I have been spending some time measuring actual runtimes for various | ||||
| sequential-scan and index-scan query plans, and have learned that the | ||||
| current Postgres optimizer's cost estimation equations are not very | ||||
| close to reality at all. | ||||
| 
 | ||||
| Presently we estimate the cost of a sequential scan as | ||||
| 
 | ||||
| 	Nblocks + CPU_PAGE_WEIGHT * Ntuples | ||||
| 
 | ||||
| --- that is, the unit of cost is the time to read one disk page, | ||||
| and we have a "fudge factor" that relates CPU time per tuple to | ||||
| disk time per page.  (The default CPU_PAGE_WEIGHT is 0.033, which | ||||
| is probably too high for modern hardware --- 0.01 seems like it | ||||
| might be a better default, at least for simple queries.)  OK, | ||||
| it's a simplistic model, but not too unreasonable so far. | ||||
| 
 | ||||
| The cost of an index scan is measured in these same terms as | ||||
| 
 | ||||
| 	Nblocks + CPU_PAGE_WEIGHT * Ntuples + | ||||
| 	  CPU_INDEX_PAGE_WEIGHT * Nindextuples | ||||
| 
 | ||||
| Here Ntuples is the number of tuples selected by the index qual | ||||
| condition (typically, it's less than the total table size used in | ||||
| sequential-scan estimation).  CPU_INDEX_PAGE_WEIGHT essentially | ||||
| estimates the cost of scanning an index tuple; by default it's 0.017 or | ||||
| half CPU_PAGE_WEIGHT.  Nblocks is estimated as the index size plus an | ||||
| appropriate fraction of the main table size. | ||||
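The two estimates above can be written out as a small Python sketch. It is an illustration of the formulas as quoted in this message, not the optimizer's actual code; the function names are invented, and Nblocks for the index scan is taken as a caller-supplied number because the message does not spell out how the "appropriate fraction" is computed.

```python
# Defaults quoted in the message above.
CPU_PAGE_WEIGHT = 0.033
CPU_INDEX_PAGE_WEIGHT = 0.017


def seqscan_cost(nblocks, ntuples):
    """Nblocks + CPU_PAGE_WEIGHT * Ntuples, in units of one page read."""
    return nblocks + CPU_PAGE_WEIGHT * ntuples


def indexscan_cost(nblocks, ntuples, nindextuples):
    """Same units; ntuples is the number selected by the index qual."""
    return (nblocks
            + CPU_PAGE_WEIGHT * ntuples
            + CPU_INDEX_PAGE_WEIGHT * nindextuples)


# Made-up table: 1000 blocks and 100000 tuples; the index qual selects
# 1000 tuples and the caller estimates 50 blocks touched.
print(round(seqscan_cost(1000, 100000), 1))      # 4300.0
print(round(indexscan_cost(50, 1000, 1000), 1))  # 100.0
```

The numbers in the usage lines are invented purely to exercise the formulas.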
| 
 | ||||
| There are two big problems with this: | ||||
| 
 | ||||
| 1. Since main-table tuples are visited in index order, we'll be hopping | ||||
| around from page to page in the table.  The current cost estimation | ||||
| method essentially assumes that the buffer cache plus OS disk cache will | ||||
| be 100% efficient --- we will never have to read the same page of the | ||||
| main table twice in a scan, due to having discarded it between | ||||
| references.  This of course is unreasonably optimistic.  Worst case | ||||
| is that we'd fetch a main-table page for each selected tuple, but in | ||||
| most cases that'd be unreasonably pessimistic. | ||||
| 
 | ||||
| 2. The cost of a disk page fetch is estimated at 1.0 unit for both | ||||
| sequential and index scans.  In reality, sequential access is *much* | ||||
| cheaper than the quasi-random accesses performed by an index scan. | ||||
| This is partly a matter of physical disk seeks, and partly a matter | ||||
| of benefitting (or not) from any read-ahead logic the OS may employ. | ||||
| 
 | ||||
| As best I can measure on my hardware, the cost of a nonsequential | ||||
| disk read should be estimated at 4 to 5 times the cost of a sequential | ||||
| one --- I'm getting numbers like 2.2 msec per disk page for sequential | ||||
| scans, and as much as 11 msec per page for index scans.  I don't | ||||
| know, however, if this ratio is similar enough on other platforms | ||||
| to be useful for cost estimating.  We could make it a parameter like | ||||
| we do for CPU_PAGE_WEIGHT ... but you know and I know that no one | ||||
| ever bothers to adjust those numbers in the field ... | ||||
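One way to picture the parameter suggested here is to charge each non-sequential page fetch at a multiple of a sequential one. The sketch below is only an assumption about how that could look: the name RANDOM_PAGE_WEIGHT is invented for this example, and the ratio is derived from the two timings quoted above, which were measured on one machine only.

```python
# Timings quoted in the message above (one machine only).
SEQ_MS_PER_PAGE = 2.2      # sequential scans
RANDOM_MS_PER_PAGE = 11.0  # "as much as" figure for index scans

# Hypothetical parameter name; roughly 5 with the timings above.
RANDOM_PAGE_WEIGHT = RANDOM_MS_PER_PAGE / SEQ_MS_PER_PAGE


def weighted_page_cost(seq_pages, random_pages,
                       random_weight=RANDOM_PAGE_WEIGHT):
    """Page-fetch cost in units of one sequential page read."""
    return seq_pages + random_weight * random_pages


print(round(weighted_page_cost(1000, 0), 1))   # 1000.0
print(round(weighted_page_cost(0, 1000), 1))   # 5000.0
```

Passing random_weight explicitly makes it easy to try the 4x end of the range mentioned above.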
| 
 | ||||
| The other effect that needs to be modeled, and currently is not, is the | ||||
| "hit rate" of buffer cache.  Presumably, this is 100% for tables smaller | ||||
| than the cache and drops off as the table size increases --- but I have | ||||
| no particular thoughts on the form of the dependency.  Does anyone have | ||||
| ideas here?  The problem is complicated by the fact that we don't really | ||||
| know how big the cache is; we know the number of buffers Postgres has, | ||||
| but we have no idea how big a disk cache the kernel is keeping.  As near | ||||
| as I can tell, finding a hit in the kernel disk cache is not a lot more | ||||
| expensive than having the page sitting in Postgres' own buffers --- | ||||
| certainly it's much much cheaper than a disk read. | ||||
| 
 | ||||
| BTW, if you want to do some measurements of your own, try turning on | ||||
| PGOPTIONS="-d 2 -te".  This will dump a lot of interesting numbers | ||||
| into the postmaster log, if your platform supports getrusage(). | ||||
| 
 | ||||
| 			regards, tom lane | ||||
| 
 | ||||
| ************ | ||||
| 
 | ||||
| From owner-pgsql-hackers@hub.org Thu Jan 20 20:26:33 2000 | ||||
| Received: from hub.org (hub.org [216.126.84.1]) | ||||
| 	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id VAA06630 | ||||
| 	for <pgman@candle.pha.pa.us>; Thu, 20 Jan 2000 21:26:32 -0500 (EST) | ||||
| Received: from localhost (majordom@localhost) | ||||
| 	by hub.org (8.9.3/8.9.3) with SMTP id VAA35022; | ||||
| 	Thu, 20 Jan 2000 21:22:08 -0500 (EST) | ||||
| 	(envelope-from owner-pgsql-hackers) | ||||
| Received: by hub.org (bulk_mailer v1.5); Thu, 20 Jan 2000 21:20:35 -0500 | ||||
| Received: (from majordom@localhost) | ||||
| 	by hub.org (8.9.3/8.9.3) id VAA34569 | ||||
| 	for pgsql-hackers-outgoing; Thu, 20 Jan 2000 21:19:38 -0500 (EST) | ||||
| 	(envelope-from owner-pgsql-hackers@postgreSQL.org) | ||||
| Received: from hercules.cs.ucsb.edu (hercules.cs.ucsb.edu [128.111.41.30]) | ||||
| 	by hub.org (8.9.3/8.9.3) with ESMTP id VAA34534 | ||||
| 	for <pgsql-hackers@postgreSQL.org>; Thu, 20 Jan 2000 21:19:26 -0500 (EST) | ||||
| 	(envelope-from xun@cs.ucsb.edu) | ||||
| Received: from xp10-06.dialup.commserv.ucsb.edu (root@xp10-06.dialup.commserv.ucsb.edu [128.111.253.249]) | ||||
| 	by hercules.cs.ucsb.edu (8.8.6/8.8.6) with ESMTP id SAA04655 | ||||
| 	for <pgsql-hackers@postgreSQL.org>; Thu, 20 Jan 2000 18:19:22 -0800 (PST) | ||||
| Received: from xp10-06.dialup.commserv.ucsb.edu (xun@localhost) | ||||
| 	by xp10-06.dialup.commserv.ucsb.edu (8.9.3/8.9.3) with ESMTP id SAA22377 | ||||
| 	for <pgsql-hackers@postgreSQL.org>; Thu, 20 Jan 2000 18:19:40 -0800 | ||||
| Message-Id: <200001210219.SAA22377@xp10-06.dialup.commserv.ucsb.edu> | ||||
| To: pgsql-hackers@postgreSQL.org | ||||
| Reply-to: xun@cs.ucsb.edu | ||||
| Subject: Re. [HACKERS] Some notes on optimizer cost estimates | ||||
| Date: Thu, 20 Jan 2000 18:19:40 -0800 | ||||
| From: Xun Cheng <xun@cs.ucsb.edu> | ||||
| Sender: owner-pgsql-hackers@postgreSQL.org | ||||
| Status: OR | ||||
| 
 | ||||
| I'm very glad you bring up this cost estimate issue. | ||||
| Recent work in database research has argued that a more | ||||
| detailed disk access cost model should be used for | ||||
| large queries, especially joins. | ||||
| Traditional cost estimates only consider the number of | ||||
| disk pages accessed. However, a more detailed model | ||||
| would consider three parameters: avg. seek, avg. latency | ||||
| and avg. page transfer. For an old disk, typical values are | ||||
| SEEK=9.5 milliseconds, LATENCY=8.3 ms, TRANSFER=2.6ms. | ||||
| A sequential continuous reading of a table (assuming | ||||
| 1000 continuous pages) would cost | ||||
| (SEEK+LATENCY+1000*TRANSFER=2617.8ms); while quasi-randomly | ||||
| reading 200 times with 2 continuous pages/time would | ||||
| cost (SEEK+200*LATENCY+400*TRANSFER=2700ms). | ||||
| Someone from IBM lab re-studied the traditional | ||||
| ad hoc join algorithms (nested, sort-merge, hash) using the detailed cost model | ||||
| and found some interesting results. | ||||
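The two figures above can be reproduced with a short Python sketch. It follows the formulas exactly as written in this message (one seek, then one rotational latency per run of contiguous pages); the function names are invented for the example. Evaluated exactly, the second formula comes to 2709.5 ms, which the message gives as 2700ms.

```python
# "Old disk" values quoted in the message above, in milliseconds.
SEEK_MS = 9.5
LATENCY_MS = 8.3
TRANSFER_MS = 2.6


def sequential_read_ms(pages):
    """One seek and one latency, then a transfer per contiguous page."""
    return SEEK_MS + LATENCY_MS + pages * TRANSFER_MS


def quasi_random_read_ms(runs, pages_per_run):
    """The message's formula: one seek, one latency per run, and a
    transfer for every page read."""
    return SEEK_MS + runs * LATENCY_MS + runs * pages_per_run * TRANSFER_MS


print(round(sequential_read_ms(1000), 1))      # 2617.8
print(round(quasi_random_read_ms(200, 2), 1))  # 2709.5
```

Under these assumptions, reading 400 pages in 200 scattered runs costs about as much as reading 1000 pages in one sequential pass.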
| 
 | ||||
| >I have been spending some time measuring actual runtimes for various | ||||
| >sequential-scan and index-scan query plans, and have learned that the | ||||
| >current Postgres optimizer's cost estimation equations are not very | ||||
| >close to reality at all. | ||||
| 
 | ||||
| One interesting question I'd like to ask is if this non-closeness | ||||
| really affects the optimal choice of postgresql's query optimizer. | ||||
| And if so, to what degree? My point is that | ||||
| if the optimizer estimates the cost of a sequential scan as 10 and | ||||
| the cost of an index scan as 20 while the actual costs are 10 vs. 40, | ||||
| it should be ok because the optimizer would still choose sequential-scan | ||||
| as it should. | ||||
| 
 | ||||
| >1. Since main-table tuples are visited in index order, we'll be hopping | ||||
| >around from page to page in the table. | ||||
| 
 | ||||
| I'm not sure about the implementation in postgresql. One thing you might | ||||
| be able to do is to first collect all must-read page addresses from  | ||||
| the index scan and then order them before the actual ordered page fetching. | ||||
| It would at least avoid the same page being read twice (not entirely | ||||
| true depending on the context (like in join) and algo.) | ||||
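The idea in the paragraph above can be sketched in Python as follows. This is not a description of how Postgres executes index scans; it is a minimal illustration that assumes the index scan yields (page number, item offset) pairs, and the function name is invented.

```python
def group_by_page(tids):
    """Collect every (page, offset) pair produced by the index scan, then
    return the pages in ascending order, each with its sorted offsets, so
    that every heap page is fetched once and in physical order."""
    by_page = {}
    for page, offset in tids:
        by_page.setdefault(page, []).append(offset)
    return [(page, sorted(by_page[page])) for page in sorted(by_page)]


# Pairs in the order an index might return them (made-up values).
index_order = [(7, 2), (3, 1), (7, 5), (1, 4), (3, 9)]
print(group_by_page(index_order))
# [(1, [4]), (3, [1, 9]), (7, [2, 5])]
```

The price of this approach is that the tuples no longer come back in index order, and the whole list of page addresses has to be held before any heap page is read.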
| 
 | ||||
| >The current cost estimation | ||||
| >method essentially assumes that the buffer cache plus OS disk cache will | ||||
| >be 100% efficient --- we will never have to read the same page of the | ||||
| >main table twice in a scan, due to having discarded it between | ||||
| >references.  This of course is unreasonably optimistic.  Worst case | ||||
| >is that we'd fetch a main-table page for each selected tuple, but in | ||||
| >most cases that'd be unreasonably pessimistic. | ||||
| 
 | ||||
| This is actually the motivation that I asked before if postgresql | ||||
| has a raw disk facility. That way we have much control on this cache | ||||
| issue. Of course only if we can provide some algo. better than OS | ||||
| cache algo. (depending on the context, like large joins), a raw disk | ||||
| facility will be worthwhile (besides the recoverability). | ||||
| 
 | ||||
| Actually I have another question for you guys which is somehow related | ||||
| to this cost estimation issue. You know the difference between OLTP | ||||
| and OLAP. My question is how you target postgresql on both kinds | ||||
| of applications or just OLTP. From what I know OLTP and OLAP would | ||||
| have a big difference in query characteristics and thus  | ||||
| optimization difference. If postgresql is only targeted on | ||||
| OLTP, the above cost estimation issue might not be that | ||||
| important. However for OLAP, large tables and large queries are | ||||
| common and optimization would be difficult. | ||||
| 
 | ||||
| xun | ||||
| 
 | ||||
| 
 | ||||
| ************ | ||||
| 
 | ||||
| From owner-pgsql-hackers@hub.org Thu Jan 20 20:41:44 2000 | ||||
| Received: from hub.org (hub.org [216.126.84.1]) | ||||
| 	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id VAA07020 | ||||
| 	for <pgman@candle.pha.pa.us>; Thu, 20 Jan 2000 21:41:43 -0500 (EST) | ||||
| Received: from localhost (majordom@localhost) | ||||
| 	by hub.org (8.9.3/8.9.3) with SMTP id VAA40222; | ||||
| 	Thu, 20 Jan 2000 21:34:08 -0500 (EST) | ||||
| 	(envelope-from owner-pgsql-hackers) | ||||
| Received: by hub.org (bulk_mailer v1.5); Thu, 20 Jan 2000 21:32:35 -0500 | ||||
| Received: (from majordom@localhost) | ||||
| 	by hub.org (8.9.3/8.9.3) id VAA38388 | ||||
| 	for pgsql-hackers-outgoing; Thu, 20 Jan 2000 21:31:38 -0500 (EST) | ||||
| 	(envelope-from owner-pgsql-hackers@postgreSQL.org) | ||||
| Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) | ||||
| 	by hub.org (8.9.3/8.9.3) with ESMTP id VAA37422 | ||||
| 	for <pgsql-hackers@postgreSQL.org>; Thu, 20 Jan 2000 21:31:02 -0500 (EST) | ||||
| 	(envelope-from tgl@sss.pgh.pa.us) | ||||
| Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1]) | ||||
| 	by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id VAA26761; | ||||
| 	Thu, 20 Jan 2000 21:30:41 -0500 (EST) | ||||
| To: "Hiroshi Inoue" <Inoue@tpf.co.jp> | ||||
| cc: pgsql-hackers@postgreSQL.org | ||||
| Subject: Re: [HACKERS] Some notes on optimizer cost estimates  | ||||
| In-reply-to: <000b01bf63b1$093cbd40$2801007e@tpf.co.jp>  | ||||
| References: <000b01bf63b1$093cbd40$2801007e@tpf.co.jp> | ||||
| Comments: In-reply-to "Hiroshi Inoue" <Inoue@tpf.co.jp> | ||||
| 	message dated "Fri, 21 Jan 2000 10:44:20 +0900" | ||||
| Date: Thu, 20 Jan 2000 21:30:41 -0500 | ||||
| Message-ID: <26758.948421841@sss.pgh.pa.us> | ||||
| From: Tom Lane <tgl@sss.pgh.pa.us> | ||||
| Sender: owner-pgsql-hackers@postgreSQL.org | ||||
| Status: ORr | ||||
| 
 | ||||
| "Hiroshi Inoue" <Inoue@tpf.co.jp> writes: | ||||
| > I've wondered why we couldn't analyze database without vacuum. | ||||
| > We couldn't run vacuum light-heartedly because it acquires an | ||||
| > exclusive lock for the target table.  | ||||
| 
 | ||||
| There is probably no real good reason, except backwards compatibility, | ||||
| why the ANALYZE function (obtaining pg_statistic data) is part of | ||||
| VACUUM at all --- it could just as easily be a separate command that | ||||
| would only use read access on the database.  Bruce is thinking about | ||||
| restructuring VACUUM, so maybe now is a good time to think about | ||||
| splitting out the ANALYZE code too. | ||||
| 
 | ||||
| > In addition, vacuum error occurs with analyze option in most | ||||
| > cases AFAIK.  | ||||
| 
 | ||||
| Still, with current sources?  What's the error message?  I fixed | ||||
| a problem with pg_statistic tuples getting too big... | ||||
| 
 | ||||
| 			regards, tom lane | ||||
| 
 | ||||
| ************ | ||||
| 
 | ||||
| From tgl@sss.pgh.pa.us Thu Jan 20 21:10:28 2000 | ||||
| Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) | ||||
| 	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id WAA08412 | ||||
| 	for <pgman@candle.pha.pa.us>; Thu, 20 Jan 2000 22:10:26 -0500 (EST) | ||||
| Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1]) | ||||
| 	by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id WAA27080; | ||||
| 	Thu, 20 Jan 2000 22:10:28 -0500 (EST) | ||||
| To: Bruce Momjian <pgman@candle.pha.pa.us> | ||||
| cc: Hiroshi Inoue <Inoue@tpf.co.jp>, pgsql-hackers@postgresql.org | ||||
| Subject: Re: [HACKERS] Some notes on optimizer cost estimates  | ||||
| In-reply-to: <200001210248.VAA07186@candle.pha.pa.us>  | ||||
| References: <200001210248.VAA07186@candle.pha.pa.us> | ||||
| Comments: In-reply-to Bruce Momjian <pgman@candle.pha.pa.us> | ||||
| 	message dated "Thu, 20 Jan 2000 21:48:57 -0500" | ||||
| Date: Thu, 20 Jan 2000 22:10:28 -0500 | ||||
| Message-ID: <27077.948424228@sss.pgh.pa.us> | ||||
| From: Tom Lane <tgl@sss.pgh.pa.us> | ||||
| Status: OR | ||||
| 
 | ||||
| Bruce Momjian <pgman@candle.pha.pa.us> writes: | ||||
| > It is nice that ANALYZE is done during vacuum.  I can't imagine why you | ||||
| > would want to do an analyze without adding a vacuum to it.  I guess | ||||
| > that's why I made them the same command. | ||||
| 
 | ||||
| Well, the main bad thing about ANALYZE being part of VACUUM is that | ||||
| it adds to the length of time that VACUUM is holding an exclusive | ||||
| lock on the table.  I think it'd make more sense for it to be a | ||||
| separate command. | ||||
| 
 | ||||
| I have also been thinking about how to make ANALYZE produce a more | ||||
| reliable estimate of the most common value.  The three-element list | ||||
| that it keeps now is a good low-cost hack, but it really doesn't | ||||
| produce a trustworthy answer unless the MCV is pretty darn C (since | ||||
| it will never pick up on the MCV at all until there are at least | ||||
| two occurrences in three adjacent tuples).  The only idea I've come | ||||
| up with is to use a larger list, which would be slower and take | ||||
| more memory.  I think that'd be OK in a separate command, but I | ||||
| hesitate to do it inside VACUUM --- VACUUM has its own considerable | ||||
| memory requirements, and there's still the issue of not holding down | ||||
| an exclusive lock longer than you have to. | ||||
| 
 | ||||
| 			regards, tom lane | ||||
| 
 | ||||
| From Inoue@tpf.co.jp Thu Jan 20 21:08:32 2000 | ||||
| Received: from sd.tpf.co.jp (sd.tpf.co.jp [210.161.239.34]) | ||||
| 	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id WAA08225 | ||||
| 	for <pgman@candle.pha.pa.us>; Thu, 20 Jan 2000 22:08:29 -0500 (EST) | ||||
| Received: from cadzone ([126.0.1.40] (may be forged)) | ||||
|           by sd.tpf.co.jp (2.5 Build 2640 (Berkeley 8.8.6)/8.8.4) with SMTP | ||||
|    id MAA04148; Fri, 21 Jan 2000 12:08:30 +0900 | ||||
| From: "Hiroshi Inoue" <Inoue@tpf.co.jp> | ||||
| To: "Bruce Momjian" <pgman@candle.pha.pa.us>, "Tom Lane" <tgl@sss.pgh.pa.us> | ||||
| Cc: <pgsql-hackers@postgreSQL.org> | ||||
| Subject: RE: [HACKERS] Some notes on optimizer cost estimates | ||||
| Date: Fri, 21 Jan 2000 12:14:10 +0900 | ||||
| Message-ID: <001301bf63bd$95cbe680$2801007e@tpf.co.jp> | ||||
| MIME-Version: 1.0 | ||||
| Content-Type: text/plain; | ||||
| 	charset="iso-8859-1" | ||||
| Content-Transfer-Encoding: 7bit | ||||
| X-Priority: 3 (Normal) | ||||
| X-MSMail-Priority: Normal | ||||
| X-Mailer: Microsoft Outlook 8.5, Build 4.71.2173.0 | ||||
| X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2314.1300 | ||||
| In-Reply-To: <200001210248.VAA07186@candle.pha.pa.us> | ||||
| Importance: Normal | ||||
| Status: OR | ||||
| 
 | ||||
| > -----Original Message----- | ||||
| > From: Bruce Momjian [mailto:pgman@candle.pha.pa.us] | ||||
| >  | ||||
| > > "Hiroshi Inoue" <Inoue@tpf.co.jp> writes: | ||||
| > > > I've wondered why we couldn't analyze database without vacuum. | ||||
| > > > We couldn't run vacuum light-heartedly because it acquires an | ||||
| > > > exclusive lock for the target table.  | ||||
| > >  | ||||
| > > There is probably no real good reason, except backwards compatibility, | ||||
| > > why the ANALYZE function (obtaining pg_statistic data) is part of | ||||
| > > VACUUM at all --- it could just as easily be a separate command that | ||||
| > > would only use read access on the database.  Bruce is thinking about | ||||
| > > restructuring VACUUM, so maybe now is a good time to think about | ||||
| > > splitting out the ANALYZE code too. | ||||
| >  | ||||
| > I put it in vacuum because at the time I didn't know how to do such | ||||
| > things and vacuum already scanned the table.  I just linked on the | ||||
| > scan.  Seemed like a good idea at the time. | ||||
| >  | ||||
| > It is nice that ANALYZE is done during vacuum.  I can't imagine why you | ||||
| > would want to do an analyze without adding a vacuum to it.  I guess | ||||
| > that's why I made them the same command. | ||||
| >  | ||||
| > If I made them separate commands, both would have to scan the table, | ||||
| > though the analyze could do it without the exclusive lock, which would | ||||
| > be good. | ||||
| > | ||||
| 
 | ||||
| The functionality of VACUUM and ANALYZE is quite different. | ||||
| I would prefer not to burden VACUUM with any more analysis work than | ||||
| it does now; that would probably mean longer locks and more aborts. | ||||
| Various kinds of analysis would become possible by splitting out ANALYZE. | ||||
|   | ||||
| Regards. | ||||
| 
 | ||||
| Hiroshi Inoue | ||||
| Inoue@tpf.co.jp | ||||
| 
 | ||||
| From owner-pgsql-hackers@hub.org Fri Jan 21 11:01:59 2000 | ||||
| Received: from hub.org (hub.org [216.126.84.1]) | ||||
| 	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id MAA07821 | ||||
| 	for <pgman@candle.pha.pa.us>; Fri, 21 Jan 2000 12:01:57 -0500 (EST) | ||||
| Received: from localhost (majordom@localhost) | ||||
| 	by hub.org (8.9.3/8.9.3) with SMTP id LAA77357; | ||||
| 	Fri, 21 Jan 2000 11:52:25 -0500 (EST) | ||||
| 	(envelope-from owner-pgsql-hackers) | ||||
| Received: by hub.org (bulk_mailer v1.5); Fri, 21 Jan 2000 11:50:46 -0500 | ||||
| Received: (from majordom@localhost) | ||||
| 	by hub.org (8.9.3/8.9.3) id LAA76756 | ||||
| 	for pgsql-hackers-outgoing; Fri, 21 Jan 2000 11:49:50 -0500 (EST) | ||||
| 	(envelope-from owner-pgsql-hackers@postgreSQL.org) | ||||
| Received: from eclipse.pacifier.com (eclipse.pacifier.com [199.2.117.78]) | ||||
| 	by hub.org (8.9.3/8.9.3) with ESMTP id LAA76594 | ||||
| 	for <pgsql-hackers@postgreSQL.org>; Fri, 21 Jan 2000 11:49:01 -0500 (EST) | ||||
| 	(envelope-from dhogaza@pacifier.com) | ||||
| Received: from desktop (dsl-dhogaza.pacifier.net [216.65.147.68]) | ||||
| 	by eclipse.pacifier.com (8.9.3/8.9.3pop) with SMTP id IAA00225; | ||||
| 	Fri, 21 Jan 2000 08:47:26 -0800 (PST) | ||||
| Message-Id: <3.0.1.32.20000121081044.01036290@mail.pacifier.com> | ||||
| X-Sender: dhogaza@mail.pacifier.com | ||||
| X-Mailer: Windows Eudora Pro Version 3.0.1 (32) | ||||
| Date: Fri, 21 Jan 2000 08:10:44 -0800 | ||||
| To: xun@cs.ucsb.edu, pgsql-hackers@postgreSQL.org | ||||
| From: Don Baccus <dhogaza@pacifier.com> | ||||
| Subject: Re: Re. [HACKERS] Some notes on optimizer cost estimates | ||||
| In-Reply-To: <200001210219.SAA22377@xp10-06.dialup.commserv.ucsb.edu> | ||||
| Mime-Version: 1.0 | ||||
| Content-Type: text/plain; charset="us-ascii" | ||||
| Sender: owner-pgsql-hackers@postgreSQL.org | ||||
| Status: OR | ||||
| 
 | ||||
| At 06:19 PM 1/20/00 -0800, Xun Cheng wrote: | ||||
| >I'm very glad you bring up this cost estimate issue. | ||||
| >Recent work in database research has argued that a more | ||||
| >detailed disk access cost model should be used for | ||||
| >large queries, especially joins. | ||||
| >Traditional cost estimates only consider the number of | ||||
| >disk pages accessed. However, a more detailed model | ||||
| >would consider three parameters: avg. seek, avg. latency | ||||
| >and avg. page transfer. For an old disk, typical values are | ||||
| >SEEK=9.5 milliseconds, LATENCY=8.3 ms, TRANSFER=2.6ms. | ||||
| >A sequential continuous reading of a table (assuming | ||||
| >1000 continuous pages) would cost | ||||
| >(SEEK+LATENCY+1000*TRANSFER=2617.8ms); while quasi-randomly | ||||
| >reading 200 times with 2 continuous pages/time would | ||||
| >cost (SEEK+200*LATENCY+400*TRANSFER=2700ms). | ||||
| >Someone from IBM lab re-studied the traditional | ||||
| >ad hoc join algorithms (nested, sort-merge, hash) using the detailed cost model | ||||
| >and found some interesting results. | ||||
| 
 | ||||
| One complication when doing an index scan is that you are | ||||
| accessing two separate files (table and index), which can frequently | ||||
| be expected to cause a considerable increase in average seek time. | ||||
| 
 | ||||
| Oracle and other commercial databases recommend spreading indices and | ||||
| tables over several spindles if at all possible in order to minimize | ||||
| this effect. | ||||
| 
 | ||||
| I suspect it also helps their optimizer make decisions that are | ||||
| more consistently good for customers with the largest and most | ||||
| complex databases and queries, by making cost estimates more predictably | ||||
| reasonable. | ||||
| 
 | ||||
| Still...this doesn't help with the question about the effect of the | ||||
| filesystem cache.  I wandered around the web for a little bit | ||||
| last night, and found one summary of a paper by Ousterhout on the | ||||
| effect of the Solaris cache on a fileserver serving diskless workstations. | ||||
| There was reference to the hierarchy involved (i.e. the local workstation | ||||
| cache is faster than the fileserver's cache which has to be read via | ||||
| the network which in turn is faster than reading from the fileserver's | ||||
| disk).  It appears the rule-of-thumb for the cache-hit ratio on reads, | ||||
| presumably based on measuring some internal Sun systems, used in their | ||||
| calculations was 80%. | ||||
| 
 | ||||
| Just a datapoint to think about. | ||||
| 
 | ||||
| There's also considerable operating system theory on paging systems | ||||
| that might be useful for thinking about trying to estimate the | ||||
| Postgres cache/hit ratio.  Then again, maybe Postgres could just | ||||
| keep count of how many pages of a given table are in the cache at | ||||
| any given time?  Or simply keep track of the current ratio of hits | ||||
| and misses? | ||||
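The last suggestion, simply keeping a running count of hits and misses, could look something like the Python sketch below. It is only an illustration of the bookkeeping; the class and method names are invented and nothing here reflects how the Postgres buffer manager is actually instrumented.

```python
class HitRatioCounter:
    """Running tally of buffer cache hits and misses (hypothetical)."""

    def __init__(self):
        self.hits = 0
        self.misses = 0

    def record(self, hit):
        """Call once per page request; hit is True if it was cached."""
        if hit:
            self.hits += 1
        else:
            self.misses += 1

    def ratio(self):
        """Fraction of requests served from cache, or None if no data."""
        total = self.hits + self.misses
        return self.hits / total if total else None


counter = HitRatioCounter()
for was_hit in [True] * 8 + [False] * 2:
    counter.record(was_hit)
print(counter.ratio())   # 0.8
```

A counter like this only sees Postgres' own buffers; hits in the kernel's disk cache would still be counted as misses.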
| 
 | ||||
| >>I have been spending some time measuring actual runtimes for various | ||||
| >>sequential-scan and index-scan query plans, and have learned that the | ||||
| >>current Postgres optimizer's cost estimation equations are not very | ||||
| >>close to reality at all. | ||||
| 
 | ||||
| >One interesting question I'd like to ask is if this non-closeness | ||||
| >really affects the optimal choice of postgresql's query optimizer. | ||||
| >And if so, to what degree? My point is that | ||||
| >if the optimizer estimates the cost of a sequential scan as 10 and | ||||
| >the cost of an index scan as 20 while the actual costs are 10 vs. 40, | ||||
| >it should be ok because the optimizer would still choose sequential-scan | ||||
| >as it should. | ||||
| 
 | ||||
| This is crucial, of course - if there are only two types of scans  | ||||
| available, whatever heuristic is used only has to be accurate enough | ||||
| to pick the right one.  Once the choice is made, it doesn't really | ||||
| matter (from the optimizer's POV) just how long it will actually take, | ||||
| the time will be spent and presumably it will be shorter than the | ||||
| alternative. | ||||
| 
 | ||||
| How frequently will the optimizer choose wrongly if: | ||||
| 
 | ||||
| 1. All of the tables and indices were in PG buffer cache or filesystem | ||||
|    cache? (i.e. fixed access times for both types of scans) | ||||
| 
 | ||||
| or | ||||
| 
 | ||||
| 2. The table's so big that only a small fraction can reside in RAM | ||||
|    during the scan and join, which means that the non-sequential | ||||
|    disk access pattern of the indexed scan is much more expensive. | ||||
| 
 | ||||
| Also, if you pick sequential scans more frequently based on a presumption | ||||
| that index scans are expensive due to increased average seek time, how | ||||
| often will this penalize the heavy-duty user that invests in extra | ||||
| drives and lots of RAM? | ||||
| 
 | ||||
| ... | ||||
| 
 | ||||
| >>The current cost estimation | ||||
| >>method essentially assumes that the buffer cache plus OS disk cache will | ||||
| >>be 100% efficient --- we will never have to read the same page of the | ||||
| >>main table twice in a scan, due to having discarded it between | ||||
| >>references.  This of course is unreasonably optimistic.  Worst case | ||||
| >>is that we'd fetch a main-table page for each selected tuple, but in | ||||
| >>most cases that'd be unreasonably pessimistic. | ||||
| > | ||||
| >This is actually the motivation that I asked before if postgresql | ||||
| >has a raw disk facility. That way we have much control on this cache | ||||
| >issue. Of course only if we can provide some algo. better than OS | ||||
| >cache algo. (depending on the context, like large joins), a raw disk | ||||
| >facility will be worthwhile (besides the recoverability). | ||||
| 
 | ||||
| Postgres does have control over its buffer cache.  The one thing that | ||||
| raw disk I/O would give you is control over where blocks are placed, | ||||
| meaning you could more accurately model the cost of retrieving them. | ||||
| So presumably the cache could be tuned to the allocation algorithm | ||||
| used to place various structures on the disk. | ||||
| 
 | ||||
| I still wonder just how much gain you get by this approach, compared | ||||
| to, say, simply spending $2,000 on a gigabyte of RAM.  Heck, PCs even | ||||
| support a couple gigs of RAM now. | ||||
| 
 | ||||
| >Actually I have another question for you guys which is somehow related | ||||
| >to this cost estimation issue. You know the difference between OLTP | ||||
| >and OLAP. My question is how you target postgresql on both kinds | ||||
| >of applications or just OLTP. From what I know OLTP and OLAP would | ||||
| >have a big difference in query characteristics and thus  | ||||
| >optimization difference. If postgresql is only targeted on | ||||
| >OLTP, the above cost estimation issue might not be that | ||||
| >important. However for OLAP, large tables and large queries are | ||||
| >common and optimization would be difficult. | ||||
| 
 | ||||
| 
 | ||||
| 
 | ||||
| - Don Baccus, Portland OR <dhogaza@pacifier.com> | ||||
|   Nature photos, on-line guides, Pacific Northwest | ||||
|   Rare Bird Alert Service and other goodies at | ||||
|   http://donb.photo.net. | ||||
| 
 | ||||
| ************ | ||||
| 
 | ||||
|  | ||||
| @ -1403,7 +1403,7 @@ From owner-pgsql-hackers@hub.org Sat Jan 22 02:31:03 2000 | ||||
| Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) | ||||
| 	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id DAA06743 | ||||
| 	for <pgman@candle.pha.pa.us>; Sat, 22 Jan 2000 03:31:02 -0500 (EST) | ||||
| Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.2 $) with ESMTP id DAA07529 for <pgman@candle.pha.pa.us>; Sat, 22 Jan 2000 03:25:13 -0500 (EST) | ||||
| Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id DAA07529 for <pgman@candle.pha.pa.us>; Sat, 22 Jan 2000 03:25:13 -0500 (EST) | ||||
| Received: from localhost (majordom@localhost) | ||||
| 	by hub.org (8.9.3/8.9.3) with SMTP id DAA31900; | ||||
| 	Sat, 22 Jan 2000 03:19:53 -0500 (EST) | ||||
| @ -1475,7 +1475,7 @@ From tgl@sss.pgh.pa.us Sat Jan 22 10:31:02 2000 | ||||
| Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) | ||||
| 	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id LAA20882 | ||||
| 	for <pgman@candle.pha.pa.us>; Sat, 22 Jan 2000 11:31:00 -0500 (EST) | ||||
| Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) by renoir.op.net (o1/$Revision: 1.2 $) with ESMTP id LAA26612 for <pgman@candle.pha.pa.us>; Sat, 22 Jan 2000 11:12:44 -0500 (EST) | ||||
| Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id LAA26612 for <pgman@candle.pha.pa.us>; Sat, 22 Jan 2000 11:12:44 -0500 (EST) | ||||
| Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1]) | ||||
| 	by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id LAA20569; | ||||
| 	Sat, 22 Jan 2000 11:11:26 -0500 (EST) | ||||
| @ -1499,3 +1499,43 @@ Or equivalently, vacuum after updating all the rows. | ||||
| 
 | ||||
| 			regards, tom lane | ||||
| 
 | ||||
| From tgl@sss.pgh.pa.us Thu Jan 20 23:51:49 2000 | ||||
| Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) | ||||
| 	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id AAA13919 | ||||
| 	for <pgman@candle.pha.pa.us>; Fri, 21 Jan 2000 00:51:47 -0500 (EST) | ||||
| Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1]) | ||||
| 	by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id AAA03644; | ||||
| 	Fri, 21 Jan 2000 00:51:51 -0500 (EST) | ||||
| To: Bruce Momjian <pgman@candle.pha.pa.us> | ||||
| cc: PostgreSQL-development <pgsql-hackers@postgreSQL.org> | ||||
| Subject: Re: vacuum timings  | ||||
| In-reply-to: <200001210543.AAA13592@candle.pha.pa.us>  | ||||
| References: <200001210543.AAA13592@candle.pha.pa.us> | ||||
| Comments: In-reply-to Bruce Momjian <pgman@candle.pha.pa.us> | ||||
| 	message dated "Fri, 21 Jan 2000 00:43:49 -0500" | ||||
| Date: Fri, 21 Jan 2000 00:51:51 -0500 | ||||
| Message-ID: <3641.948433911@sss.pgh.pa.us> | ||||
| From: Tom Lane <tgl@sss.pgh.pa.us> | ||||
| Status: ORr | ||||
| 
 | ||||
| Bruce Momjian <pgman@candle.pha.pa.us> writes: | ||||
| > I loaded 10,000,000 rows into CREATE TABLE test (x INTEGER);  Table is | ||||
| > 400MB and index is 160MB. | ||||
| 
 | ||||
| > With index on the single in4 column, I got: | ||||
| > 	 78 seconds for a vacuum | ||||
| > 	121 seconds for vacuum after deleting a single row | ||||
| > 	662 seconds for vacuum after deleting the entire table | ||||
| 
 | ||||
| > With no index, I got: | ||||
| > 	 43 seconds for a vacuum | ||||
| > 	 43 seconds for vacuum after deleting a single row | ||||
| > 	 43 seconds for vacuum after deleting the entire table | ||||
| 
 | ||||
| > I find this quite interesting. | ||||
| 
 | ||||
| How long does it take to create the index on your setup --- ie, | ||||
| if vacuum did a drop/create index, would it be competitive? | ||||
| 
 | ||||
| 			regards, tom lane | ||||
| 
 | ||||
|  | ||||
| @ -8,7 +8,7 @@ | ||||
|  * Portions Copyright (c) 1996-2000, PostgreSQL, Inc | ||||
|  * Portions Copyright (c) 1994, Regents of the University of California | ||||
|  * | ||||
|  * $Id: c.h,v 1.71 2000/06/02 15:57:40 momjian Exp $ | ||||
|  * $Id: c.h,v 1.72 2000/06/02 16:33:17 momjian Exp $ | ||||
|  * | ||||
|  *------------------------------------------------------------------------- | ||||
|  */ | ||||
| @ -896,7 +896,7 @@ extern char *vararg_format(const char *fmt,...); | ||||
|  * ---------------------------------------------------------------- | ||||
|  */ | ||||
| 
 | ||||
| #ifndef __CYGWIN32__ | ||||
| #ifdef __CYGWIN32__ | ||||
| #define	PG_BINARY	0 | ||||
| #define	PG_BINARY_R	"rb" | ||||
| #define	PG_BINARY_W	"wb" | ||||
|  | ||||