
About the Table Processor Object

Almost every web page today organizes its information with HTML tables. AWS HTML Producer has a robust built-in HTML table processor that helps you process HTML tables and extract any information they hold, so with this component you can easily retrieve the information you need from almost any web site. The TableProcessor object parses whole HTML tables, recognizes cells and extracts their values. Along with the cell values, TableProcessor also provides information on the structure of the table. This chapter gives an overview of the TableProcessor object and shows how it can be used to extract information from web pages.

The table processing feature of AWS HTML Producer is built in several levels, which keeps the design of the solution clear and ready for future extension:

1. HTML tags level. This is the lowest level of table processing. It parses the HTML input, finds table-related HTML tags (<table>, <tr>, <td> and <th>) and extracts their values and attributes. The HTMLParserASP object operates on this level.

2. HTML table level. This is the middle level. It catches the HTML parser's notifications about the tags found and extracts whole rows and cells. It sends notifications to the high level when the table itself, its rows and its cells are found. The TableParser object was developed for these tasks. It is private to the component, as only the TableProcessor object needs its services.

3. Table cells level. This is the high level. It processes the table as a whole: it populates the table cell collection, supports merged cells and provides other features.
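The layering above can be illustrated with a small, self-contained Python sketch. It is not the HTML Producer API: the class and function names are hypothetical, the standard-library HTMLParser stands in for the tag level, a subclass plays the role of the table level by turning tag events into cell notifications, and a plain function plays the high level that fills a flat cell list. Zero-based indices are an assumption, and nested tables and spans are ignored here.

```python
from html.parser import HTMLParser


class TableTagParser(HTMLParser):
    """Tag level and table level: reacts to table tags and reports whole cells."""

    def __init__(self, on_cell):
        super().__init__()
        self.on_cell = on_cell  # notification callback to the high level
        self.row = -1           # index of the current <tr> (zero-based, assumed)
        self.col = -1           # index of the current cell within that row
        self.text = None        # collects text while inside <td> or <th>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.row += 1
            self.col = -1
        elif tag in ("td", "th"):
            self.col += 1
            self.text = []

    def handle_data(self, data):
        if self.text is not None:
            self.text.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self.text is not None:
            # A complete cell was found: notify the high level.
            self.on_cell(self.col, self.row, "".join(self.text).strip())
            self.text = None


def collect_cells(html):
    """High level: builds a flat list of (col, row, value) tuples."""
    cells = []
    parser = TableTagParser(lambda col, row, value: cells.append((col, row, value)))
    parser.feed(html)
    parser.close()
    return cells


cells = collect_cells(
    "<table><tr><th>City</th><th>Temp</th></tr>"
    "<tr><td>Rome</td><td>77 F</td></tr></table>"
)
print(cells)
```

Keeping the tag handling separate from the cell collection means the high level never sees individual tags, which is the point of the three-level split described above.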

The high-level procedures work with an HTML table as a set of objects (rows and cells), not as separate HTML tags (<table>, <tr>, <td>, <th>). However, in contrast to the DOM (Document Object Model) implemented in web browsers, TableProcessor does not build a table-rows-cells tree representation of the table. The tree model can be a good solution for applications that support scripting, where a clear hierarchical model of the table makes scripts easier to write. However, if you only want to point at the cell that contains the data you need, building a tree is a heavy and inconvenient solution. Imagine that you just want to get the values from a column of monthly revenue figures in a yearly balance table. With the tree, you would first have to select a row, then a cell, and only then would you be able to get a value. This is a clear, fully object-oriented model, but it is too heavyweight for simple data gathering tasks.

TableProcessor uses a 1-D model that maintains a plain collection of all table cells. Each table cell is represented by a TableCell object, which has two special properties that give its position within the table: ColIndex holds the index of the column that contains the cell, and RowIndex holds the index of its row. This lets you point to any cell you want within a table. The image on the left illustrates this.

TableProcessor recognizes table cells and creates a TableCell object for each one. A TableCell has no methods and stores only properties, the parameters of a cell. As cells are found, they are put into the TableCells collection, which stores all the table cells linearly, in 1-D form. To retrieve a cell, use the GetCell method of the TableCells collection, which returns the cell at the specified column index (ColIndex argument) and row index (RowIndex argument):

Set Cell = Cells.GetCell(1,2)

You can find the coordinates of a cell in a known table using the TableProcessor Console (see the TableProcessor Object Tutorial).

Once you have a cell, its value is easy to access:

Value = Cell.Value 

And that’s all!
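To make the flat model concrete, here is a minimal Python sketch of a 1-D cell collection addressed by column and row index. It only mirrors the idea of TableCell, TableCells and GetCell described above; the Python names, the `column` helper, the zero-based indices and the sample balance data are all assumptions made for illustration, not part of the product.

```python
from dataclasses import dataclass


@dataclass
class TableCell:
    col_index: int  # column that contains the cell (zero-based, assumed)
    row_index: int  # row that contains the cell (zero-based, assumed)
    value: str


class TableCells:
    """A plain, linear collection of cells; no table-rows-cells tree is built."""

    def __init__(self, cells):
        self._cells = list(cells)

    def get_cell(self, col_index, row_index):
        """Return the cell at the given coordinates, or None if there is none."""
        for cell in self._cells:
            if cell.col_index == col_index and cell.row_index == row_index:
                return cell
        return None

    def column(self, col_index):
        """Hypothetical helper: values of one column, in stored order."""
        return [c.value for c in self._cells if c.col_index == col_index]


# Hypothetical balance table: a header row followed by two months.
cells = TableCells([
    TableCell(0, 0, "Month"), TableCell(1, 0, "Revenue"),
    TableCell(0, 1, "January"), TableCell(1, 1, "1200"),
    TableCell(0, 2, "February"), TableCell(1, 2, "1350"),
])

print(cells.get_cell(1, 2).value)  # one cell, addressed directly
revenue = cells.column(1)[1:]      # the whole Revenue column without its header
print(revenue)
```

Reading a whole column is a single filter over the flat list, with no need to select a row first and then a cell.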

TableProcessor can process not only simple tables like the one above, but also tables with multiple merged (spanned) cells. But because each cell is represented by one TableCell object, which cannot be merged, a trick is used to preserve the original table structure.

Look at the following weather table:

            | Moscow             | New York           | Rome
Temperature | 63 F (17 C)        | 92 F (33 C)        | 77 F (25 C)
Humidity    | 52%
Wind        | N at 5 mph (8 kph) | N at 3 mph (5 kph) | N at 3 mph (5 kph)

You can see that the Humidity row has three merged cells: Moscow, New York and Rome all have the same humidity level. How is this table represented in the TableCells collection? TableProcessor creates a separate TableCell object (that is, a separate cell in the resulting table) for each of the cells that are merged into one, and they all get the same value (“52%” in this case). However, these cells are not equivalent.

Usually, the HTML source contains one cell that holds the value and has a Rowspan (or Colspan) attribute, which “expands” the boundaries of this cell and “gives” its value to the neighboring cells merged with it. The cell that gives its value to the others is known as the master cell, and the cells that receive this value are known as merged, or spanned, cells. The TableCell object has the IsSpanned property, which shows whether the cell is a merged cell that belongs to a master cell. If it is set to True, you can get the coordinates of the master cell from the MasterCellColIndex and MasterCellRowIndex properties.

This is the weather table as it is represented in the TableCells collection. The master cell and the merged cells are labeled:

            | Moscow             | New York           | Rome
Temperature | 63 F (17 C)        | 92 F (33 C)        | 77 F (25 C)
Humidity    | 52% (master)       | 52% (merged)       | 52% (merged)
Wind        | N at 5 mph (8 kph) | N at 3 mph (5 kph) | N at 3 mph (5 kph)
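The expansion of a merged cell into one master cell and several spanned cells can be sketched in Python as follows. This is an illustration of the idea only, not the component's implementation: the input format (rows of value, colspan, rowspan tuples), the function name and the zero-based indices are assumptions, and the field names simply echo IsSpanned, MasterCellColIndex and MasterCellRowIndex.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TableCell:
    col_index: int
    row_index: int
    value: str
    is_spanned: bool = False
    master_col_index: Optional[int] = None
    master_row_index: Optional[int] = None


def expand_spans(rows):
    """rows: list of rows, each a list of (value, colspan, rowspan) tuples."""
    cells = {}  # (col, row) -> TableCell
    for r, row in enumerate(rows):
        c = 0
        for value, colspan, rowspan in row:
            # Skip positions already filled by a span from an earlier cell.
            while (c, r) in cells:
                c += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    is_master = dr == 0 and dc == 0
                    cells[(c + dc, r + dr)] = TableCell(
                        c + dc, r + dr, value,
                        is_spanned=not is_master,
                        master_col_index=None if is_master else c,
                        master_row_index=None if is_master else r,
                    )
            c += colspan
    # Return a flat, 1-D list ordered row by row.
    return sorted(cells.values(), key=lambda cell: (cell.row_index, cell.col_index))


weather = [
    [("", 1, 1), ("Moscow", 1, 1), ("New York", 1, 1), ("Rome", 1, 1)],
    [("Temperature", 1, 1), ("63 F (17 C)", 1, 1), ("92 F (33 C)", 1, 1), ("77 F (25 C)", 1, 1)],
    [("Humidity", 1, 1), ("52%", 3, 1)],  # one cell spanning three columns
    [("Wind", 1, 1), ("N at 5 mph (8 kph)", 1, 1), ("N at 3 mph (5 kph)", 1, 1), ("N at 3 mph (5 kph)", 1, 1)],
]

cells = expand_spans(weather)
humidity = [cell for cell in cells if cell.row_index == 2]
for cell in humidity:
    print(cell.col_index, cell.value, "merged" if cell.is_spanned else "own value")
```

Every position in the grid ends up with its own cell, so a lookup by column and row index works the same way for merged and ordinary cells, while the master coordinates keep the original structure recoverable.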

TableProcessor also offers great flexibility in choosing which HTML table to process among the other tables on a page. You can select a table in five ways:

  • By its number on the page. Just count the tables from the top of the page.

  • By table name. If the table has the Name attribute specified, you can use it to select the table.

  • By custom attribute. If the table has any attribute with a unique value, you can specify the name of this attribute together with its value, and the table will be found.

  • By search string (new). You can specify a text string that matches HTML text within the table you want to process. Use this to find a table by a text string it contains.

  • By regular expression pattern (new). This works like the previous option, but lets you take advantage of the power of regular expressions when matching text inside the table to be selected. In this mode you need to provide a valid regular expression pattern.

Take a look at an example:

MyTableProcessor.SelectCriterion = SelectCriteria.SelectBySearchString
MyTableProcessor.SearchString = "World Weather"
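The selection criteria listed above can be pictured with a short Python sketch. It does not use the HTML Producer API: the `select_table` function, the dictionary layout for a table and the sample page are hypothetical, and counting tables from 1 is an assumption. Selection by name is treated as selection by the `name` attribute.

```python
import re


def select_table(tables, *, number=None, attribute=None, search_string=None, pattern=None):
    """Return the first table matching the single criterion given, or None."""
    if number is not None:
        # Position on the page, counted from the top (1-based, assumed).
        return tables[number - 1] if 1 <= number <= len(tables) else None
    for table in tables:
        if attribute is not None:
            name, value = attribute
            if table["attrs"].get(name) == value:
                return table
        elif search_string is not None:
            if search_string in table["html"]:
                return table
        elif pattern is not None:
            if re.search(pattern, table["html"]):
                return table
    return None


# Hypothetical page with two tables.
tables = [
    {"attrs": {"name": "nav"},
     "html": "<table name='nav'><tr><td>Home</td></tr></table>"},
    {"attrs": {"id": "wx"},
     "html": "<table id='wx'><tr><th>World Weather</th></tr>"
             "<tr><td>Rome 77 F</td></tr></table>"},
]

by_number = select_table(tables, number=2)
by_name = select_table(tables, attribute=("name", "nav"))
by_text = select_table(tables, search_string="World Weather")
by_regex = select_table(tables, pattern=r"\d+ F")
print(by_text["attrs"], by_regex["attrs"])
```

Search by string or pattern is the most forgiving option when a page gives its tables no names or distinguishing attributes, since it depends only on text the table is known to contain.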

Copyright © AW/Systems, LLC 2002—2004.  |   Privacy Policy
