pgsql: Use perfect hashing, instead of binary search, for keyword looku


Use perfect hashing, instead of binary search, for keyword lookup.

We've been speculating for a long time that hash-based keyword lookupought to be faster than binary search, but up to now we hadn't founda suitable tool for generating the hash function. Joerg Sonnenbergerprovided the inspiration, and sample code, to show us that rolling our

own generator wasn't a ridiculous idea. Hence, do that.

The method used here requires a lookup table of approximately 4 bytesper keyword, but that's less than what we saved in the predecessor commitafb0d0712, so it's not a big problem. The time savings is indeedsignificant: preliminary testing suggests that the total time for raw

parsing (flex + bison phases) drops by ~20%.

Patch by me, but it owes its existence to Joerg Sonnenberger;
thanks also to John Naylor for review.

Discussion: https://postgr.es/m/20190103163340.GA15803@britannica.bec.de

Branch------

master

Details-------

https://git.postgresql.org/pg/commitdiff/c64d0cd5ce24a344798534f1bc5827a9199b7a6e

Modified Files--------------src/common/Makefile | 9 +-src/common/kwlookup.c | 73 +++---src/include/common/kwlookup.h | 4 +src/include/parser/kwlist.h | 3 +-src/interfaces/ecpg/preproc/Makefile | 13 +-src/interfaces/ecpg/preproc/c_keywords.c | 51 ++--src/interfaces/ecpg/preproc/c_kwlist.h | 3 +-src/interfaces/ecpg/preproc/ecpg_kwlist.h | 3 +-src/pl/plpgsql/src/Makefile | 13 +-src/pl/plpgsql/src/pl_reserved_kwlist.h | 5 +-src/pl/plpgsql/src/pl_unreserved_kwlist.h | 7 +-src/tools/PerfectHash.pm | 376 ++++++++++++++++++++++++++++++src/tools/gen_keywordlist.pl | 53 ++++-src/tools/msvc/Solution.pm | 10 +-

14 files changed, 516 insertions(+), 107 deletions(-)

On Wed, Jan 9, 2019 at 7:48 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:> Use perfect hashing, instead of binary search, for keyword lookup.>> We've been speculating for a long time that hash-based keyword lookup> ought to be faster than binary search, but up to now we hadn't found> a suitable tool for generating the hash function. Joerg Sonnenberger> provided the inspiration, and sample code, to show us that rolling our> own generator wasn't a ridiculous idea. Hence, do that.>> The method used here requires a lookup table of approximately 4 bytes> per keyword, but that's less than what we saved in the predecessor commit> afb0d0712, so it's not a big problem. The time savings is indeed> significant: preliminary testing suggests that the total time for raw> parsing (flex + bison phases) drops by ~20%.>> Patch by me, but it owes its existence to Joerg Sonnenberger;

> thanks also to John Naylor for review.

Wow. That is a VERY significant improvement.

-- Robert Haas

EnterpriseDB: http://www.enterprisedb.com


The Enterprise PostgreSQL Company