Saturday, June 20, 2020

AEM Oak queries and indexing

Oak does not index content by default, so indexes have to be defined for the queries you intend to run.

Indexes are configured as nodes in the repository under the oak:index node, with the node type oak:QueryIndexDefinition.

Property Index :

1. Go to CRXDE and, under oak:index, create a new node named PropertyIndex of type oak:QueryIndexDefinition
2. Set the properties of the node - type:property and propertyNames:names of the properties to index
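
The two steps above can be sketched as data. This is only an illustration of the properties the node ends up with, not a way to create it in the repository; the function name and the property name myProperty are made up for the example.

```python
import json


def property_index_definition(property_names):
    """Return the properties of a property index node as a plain dict."""
    if not property_names:
        raise ValueError("at least one property name is required")
    return {
        "jcr:primaryType": "oak:QueryIndexDefinition",
        "type": "property",
        # propertyNames is multi-valued: one entry per indexed property
        "propertyNames": list(property_names),
    }


# "myProperty" is a placeholder; use the property your queries filter on.
definition = property_index_definition(["myProperty"])
print(json.dumps({"PropertyIndex": definition}, indent=2))
```

The printed JSON mirrors what you would see for the node under oak:index in CRXDE.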

Ordered Index :

Deprecated. Use a Lucene index instead.

Lucene Full Index :

The index is updated asynchronously by a background thread.

1. Go to CRXDE and, under oak:index, create a new node named LuceneIndex of type oak:QueryIndexDefinition
2. Set the properties of the node - type:lucene and async:async

Lucene Property Index :

1. Go to CRXDE and, under oak:index, create a new node named LucenePropertyIndex of type oak:QueryIndexDefinition
2. Set the properties of the node - type:lucene, async:async, fulltextEnabled:false, includePropertyNames:names of the properties to index
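
As a rough sketch, the difference between the Lucene full index and the Lucene property index comes down to two extra properties. The helper below only assembles the property sets listed in the steps above; its name and the property name myProperty are assumptions for illustration.

```python
import json


def lucene_index_definition(include_property_names=None):
    """Return the properties of a Lucene index node as a plain dict.

    Without property names this is the full index from the post;
    with property names it is the Lucene property index.
    """
    definition = {
        "jcr:primaryType": "oak:QueryIndexDefinition",
        "type": "lucene",
        "async": "async",  # updated by the background indexing thread
    }
    if include_property_names is not None:
        definition["fulltextEnabled"] = False
        definition["includePropertyNames"] = list(include_property_names)
    return definition


full_index = lucene_index_definition()
# "myProperty" is a placeholder property name.
property_index = lucene_index_definition(["myProperty"])
print(json.dumps({"LuceneIndex": full_index,
                  "LucenePropertyIndex": property_index}, indent=2))
```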

You can provide a custom analyzer for Lucene. An analyzer is composed of a tokenizer, token filters and char filters.
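
To make the three parts concrete, here is a toy pipeline in plain Python. It is not Lucene's API and not how Oak configures analyzers; every function name here is invented. It only shows the order in which the parts run: char filters work on the raw text, the tokenizer splits it, and token filters work on the resulting tokens.

```python
import re


def strip_html(text):
    """Char filter: replace markup with spaces before tokenizing."""
    return re.sub(r"<[^>]+>", " ", text)


def whitespace_tokenizer(text):
    """Tokenizer: split the filtered text into tokens."""
    return text.split()


def lowercase_filter(tokens):
    """Token filter: normalize case."""
    return [token.lower() for token in tokens]


def stop_filter(tokens, stop_words=frozenset({"the", "a", "an"})):
    """Token filter: drop very common words."""
    return [token for token in tokens if token not in stop_words]


def analyze(text, char_filters, tokenizer, token_filters):
    for char_filter in char_filters:
        text = char_filter(text)
    tokens = tokenizer(text)
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens


tokens = analyze("<p>The Quick Fox</p>",
                 [strip_html],
                 whitespace_tokenizer,
                 [lowercase_filter, stop_filter])
print(tokens)  # ['quick', 'fox']
```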

Solr can also be used, either as an embedded Solr configuration or as an external Solr server.

You can debug AEM queries using query debugging or the MBean output in the JMX console:

http://serveraddress:port/system/console/jmx


Best Practices :

Use the Explain Query tool to check how a query is executed.
In components, prefer traversal or prefetched results over queries.
Make sure the indexes that your queries need are in place.
Where possible, break large queries down into smaller ones and combine their results.
Set limits on query execution with JVM parameters:
  • -Doak.queryLimitInMemory=500000
  • -Doak.queryLimitReads=100000
Use Lucene indexes wherever possible.
Consider Solr when the capacity of the AEM server is limited.
Use an external Solr server only when required, as it introduces latency.
Optimize indexes so that queries run faster: for example, use evaluatePathRestrictions, support sorting in the index, put only the required content in the index, define rules per node type, and restrict indexes to the paths where queries run.
If your node store is at a different place, enable CopyOnRead.
Do not reindex Oak indexes unless the index configuration has changed or an index binary is missing or corrupted.
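
The advice about splitting large queries can be sketched as follows. This is a minimal illustration under stated assumptions: run_query stands in for whatever executes a JCR-SQL2 statement in your environment, fake_run_query is a made-up stub so the example runs on its own, and the paths and the cq:Page node type are only sample values.

```python
def build_statement(root_path, node_type="cq:Page"):
    """Build a JCR-SQL2 statement restricted to one subtree."""
    return ("SELECT * FROM [%s] AS n WHERE ISDESCENDANTNODE(n, '%s')"
            % (node_type, root_path))


def query_subtrees(root_paths, run_query, limit=1000):
    """Run one small query per subtree and merge the hits, de-duplicated."""
    merged = {}
    for root_path in root_paths:
        for path in run_query(build_statement(root_path), limit):
            merged.setdefault(path, True)  # keeps first-seen order
    return list(merged)


# Stub backend (hypothetical): maps a subtree to the paths it "finds".
FAKE_RESULTS = {
    "/content/site/en": ["/content/site/en/a", "/content/site/en/b"],
    "/content/site": ["/content/site/en/a", "/content/site/fr/a"],
}


def fake_run_query(statement, limit):
    for root_path, hits in FAKE_RESULTS.items():
        if "'" + root_path + "'" in statement:
            return hits[:limit]
    return []


paths = query_subtrees(["/content/site/en", "/content/site"], fake_run_query)
print(paths)
```

Each small query stays within the configured read limits more easily than one query over the whole tree, and the merge step removes hits that overlapping subtrees return twice.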

Text Pre-Extraction of Binaries

Text pre-extraction is the process of extracting and processing text from binaries, directly from the data store, in an isolated process.

It is useful when a Lucene index is reindexed over a large volume of binaries with readable text, such as PDFs and Word documents, and full-text search is expected.

It is also useful when deploying a new Lucene index.