The SmartIndexing Algorithm
The kernel of the TableProcessor object has been completely rewritten around a new SmartIndexing algorithm. When used together with the caching feature, it speeds up batch table processing by 50 (!) times or more compared with AWS HTML Producer 3.5.
Thanks to the new algorithm, TableProcessor builds an incremental index of all tables it encounters in the HTML text while processing a table. When you parse other tables on the same web page, the parser does not need to scan the whole HTML source to find the required table; it looks the table up in the index and gets its position. "Incremental" here means that the index is not necessarily built for the whole page up front; it covers only the tables that were encountered while searching for the requested one. If a table that is not yet indexed must be processed, the indexer indexes all tables from the end of the previously indexed block up to the requested table's position, so the index is never rebuilt from the beginning.
Because the index is a simple, compact VB collection (it contains objects representing the HTML tables in the source text), it works very fast.
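The incremental indexing idea described above can be illustrated with a minimal sketch. This is not the TableProcessor implementation: it is written in Python rather than VB, and the class name IncrementalTableIndex, the method position_of, and the regular expression are all hypothetical names chosen for the illustration. As a simplification, it stores only the start offset of each opening table tag and counts nested tables like any other.

```python
import re

# Assumption for this sketch: a table starts wherever "<table" appears.
TABLE_OPEN = re.compile(r"<table\b", re.IGNORECASE)


class IncrementalTableIndex:
    """Lazily records the start offset of each <table> tag in an HTML string."""

    def __init__(self, html):
        self.html = html
        self.positions = []     # start offsets of the tables indexed so far
        self.scanned_to = 0     # offset where the next scan resumes
        self.exhausted = False  # True once the end of the source was reached

    def position_of(self, n):
        """Return the offset of the n-th table (0-based), or None if absent."""
        # Extend the index only as far as needed, resuming from the end of
        # the previously indexed block instead of rescanning from the start.
        while len(self.positions) <= n and not self.exhausted:
            match = TABLE_OPEN.search(self.html, self.scanned_to)
            if match is None:
                self.exhausted = True
                break
            self.positions.append(match.start())
            self.scanned_to = match.end()
        return self.positions[n] if n < len(self.positions) else None
```

In this sketch, asking for the second table indexes only the first two tables. A later request for the first table is answered straight from the index, and a request for the third table resumes scanning just after the second one.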