Dashboard > ApacheDS Addons > ... > Developer's Guide > Search Algorithm Discussion
Search Algorithm Discussion
Added by Alex Karasulu, last edited by Alex Karasulu on Oct 29, 2005  (view change)
Labels: 
(None)


Trustin Lee at gleamynode.net says: (4:32:03 AM)
   What is the difference between evaluators in 'event' package and ones in 'partition.impl.btree'?

Trustin Lee at gleamynode.net says: (4:32:12 AM)
   That was I was curious hehe

Trustin Lee at gleamynode.net says: (4:32:28 AM)
   That was what I was curious about. 

Alex Karasulu says: (4:32:30 AM)
   well the evaluators in the database leverage indices 

Alex Karasulu says: (4:32:42 AM)
   they also pull candidates from the database

Trustin Lee at gleamynode.net says: (4:33:09 AM)
   I see.  So one is in-memory version and the other isn't.

Alex Karasulu says: (4:33:13 AM)
   see the system of evaluators in the DB are coupled to enumerators

Alex Karasulu says: (4:33:21 AM)
   partly

Alex Karasulu says: (4:34:09 AM)
   let me continue

Trustin Lee at gleamynode.net says: (4:34:37 AM)
   k

Alex Karasulu says: (4:34:44 AM)
   see the system is coupled so an enumerator is found to generate candidate ids

Alex Karasulu says: (4:34:55 AM)
   this enumerator will run off of an index

Alex Karasulu says: (4:35:09 AM)
   either a user provided index if avialable or a system index

Alex Karasulu says: (4:35:15 AM)
   its like a cursor in a database

Alex Karasulu says: (4:35:19 AM)
   it walks the index

Alex Karasulu says: (4:35:34 AM)
   but tries to pick the index with the least scan count

Alex Karasulu says: (4:35:46 AM)
   the least scan count for one of the ExprNodes

Alex Karasulu says: (4:36:14 AM)
   what happens in the search engine is first the ExprNode tree is annotated 

Alex Karasulu says: (4:36:23 AM)
   this is done by the optimizer

Alex Karasulu says: (4:37:30 AM)
   it basically walks the tree checking each node to see first if an index is available for the attribute ID then it asks the index associated built on the attribute how many items it has with that exprnode's value

Alex Karasulu says: (4:37:41 AM)
   so if I have (ou=blah) and an index on ou

Alex Karasulu says: (4:37:52 AM)
   the optimizer asks the ou index how many blahs it has

Alex Karasulu says: (4:37:57 AM)
   this is a constant time operation

Trustin Lee at gleamynode.net says: (4:38:03 AM)
   I see.

Alex Karasulu says: (4:38:14 AM)
   so all nodes in the tree are labelled this way

Alex Karasulu says: (4:38:40 AM)
   btw the algorithm does special things to even compute a value for branch nodes

Alex Karasulu says: (4:39:00 AM)
   for example an OR branch node would add all the annotations on children

Alex Karasulu says: (4:39:10 AM)
   I think

Alex Karasulu says: (4:39:13 AM)
   

Trustin Lee at gleamynode.net says: (4:39:20 AM)
   

Alex Karasulu says: (4:39:20 AM)
   yeah yeah

Alex Karasulu says: (4:39:29 AM)
   a AND node would take the worst case

Trustin Lee at gleamynode.net says: (4:39:38 AM)
   Right.

Alex Karasulu says: (4:39:40 AM)
   of the childresn with the highest value

Alex Karasulu says: (4:39:57 AM)
   because this is the highest number that can be returned by that logical branch 

Alex Karasulu says: (4:40:07 AM)
   this number is the number of matching candidates

Alex Karasulu says: (4:40:27 AM)
   now with these annotations we feed the tree to the search engine

Alex Karasulu says: (4:40:47 AM)
   the engine finds a single enumerator to generate candidate IDs 

Alex Karasulu says: (4:40:58 AM)
   the cheapest enumerator

Trustin Lee at gleamynode.net says: (4:41:03 AM)
   Hmm..

Alex Karasulu says: (4:41:30 AM)
   this enumerator must be for a leaf expression

Alex Karasulu says: (4:42:03 AM)
   so as user pulls entries from a NamingEnumeration the enumerator walks the index generating index records with IDs 

Alex Karasulu says: (4:42:08 AM)
   candidate IDS

Alex Karasulu says: (4:42:09 AM)
   IDs

Alex Karasulu says: (4:42:35 AM)
   the rest of the expr nodes basically act as evaluators

Alex Karasulu says: (4:42:51 AM)
   sometimes the entry may never get pulled from the master table

Alex Karasulu says: (4:43:23 AM)
   if enough indices are available then a tests can be conducted using index values instead of pulling out the entire entry from the master table

Alex Karasulu says: (4:43:41 AM)
   intersting isn't it

Trustin Lee at gleamynode.net says: (4:43:57 AM)
   interesting.

Trustin Lee at gleamynode.net says: (4:44:01 AM)
   Quite complex.

Alex Karasulu says: (4:44:26 AM)
   well it is because we try not to have to cache the entire result set in membory

Alex Karasulu says: (4:44:58 AM)
   basically we pull entries from the store as they are needed to push them out the door back to the client

Alex Karasulu says: (4:45:28 AM)
   if we cached the result set before a return then you can easily make the server shit itself

Alex Karasulu says: (4:45:30 AM)
   

Site running on a free Atlassian Confluence Open Source Project License granted to Safehaus. Evaluate Confluence today.
Powered by Atlassian Confluence, the Enterprise Wiki. (Version: 2.5.4 Build:#809 Jun 12, 2007) - Bug/feature request - Contact Administrators