Parse HTML

Actions

| Name | Description |
| --- | --- |
| parse collection | Parse a collection of HTML files |
| parse file | Parse an HTML file |
| remove parse | Remove a parsed HTML file, including its sections and sentences |


Inputs

| Action | Name | Description |
| --- | --- | --- |
| parse collection | collection-item | The item representing a collection whose documents are to be parsed |
| parse file | file-item | An item representing the original file from which the HTML file to be parsed was obtained |
| parse file | html-item | An item representing the HTML text to be parsed. It is the subject of a triple whose verb is sys:fileHasContent and whose object is the HTML text. |
| parse file | parse-source | The identification number of the source where the sections and sentences of the parsed file are to be stored |
| parse file | reference-url | The URL to be used to look up references |
| remove parse | parsed-file-item | The item representing the parsed file to be removed |
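
The html-item input is described in terms of a triple, so a small sketch may help make that shape concrete. The following is only an illustration under stated assumptions: the `Triple` type, the `html_text_for` helper, the item identifiers, and the example values are all hypothetical, and only the verb `sys:fileHasContent` and the input names come from the table above.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    """A (subject, verb, object) statement; a hypothetical in-memory model."""
    subject: str
    verb: str
    obj: str


# Hypothetical store: the html-item is the subject, sys:fileHasContent is the
# verb, and the HTML text itself is the object.
triples: List[Triple] = [
    Triple("item:html-1", "sys:fileHasContent",
           "<html><body><p>Hello.</p></body></html>"),
]


def html_text_for(item: str, store: List[Triple]) -> Optional[str]:
    """Return the object of the item's sys:fileHasContent triple, or None."""
    for t in store:
        if t.subject == item and t.verb == "sys:fileHasContent":
            return t.obj
    return None


# Inputs for the "parse file" action, keyed by the names in the table.
# All values are made-up placeholders.
parse_file_inputs = {
    "file-item": "item:file-1",
    "html-item": "item:html-1",
    "parse-source": 42,
    "reference-url": "https://example.org/refs",
}

html_text = html_text_for(parse_file_inputs["html-item"], triples)
```

Here `html_text` holds the text that a "parse file" call would presumably operate on. How the real system stores and queries triples is not specified in this reference.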


Outputs

| Action | Name | Description |
| --- | --- | --- |
| parse collection | parsed-collection-item | An item representing the collection that has been parsed |
| parse file | parsed-file-item | An item representing the results of parsing the HTML text. The sections and sentences are associated with it by triples. |
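
As a rough illustration of how sections and sentences could be associated with a parsed-file-item by triples, the sketch below builds such triples from already-extracted section text. Everything in it is an assumption made for the example: the `ex:` verbs, the item naming scheme, the `link_parse_results` helper, and the naive regex sentence splitter are hypothetical and are not taken from the actual parser.

```python
import re
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, verb, object)


def link_parse_results(parsed_file_item: str,
                       sections: List[Tuple[str, str]]) -> List[Triple]:
    """Build triples linking a parsed file to its sections and sentences.

    `sections` is a list of (title, text) pairs. The verbs used here
    (ex:hasSection, ex:hasTitle, ex:hasSentence, ex:hasText) are invented
    for this sketch.
    """
    triples: List[Triple] = []
    for s_idx, (title, text) in enumerate(sections, start=1):
        section_item = f"{parsed_file_item}/section-{s_idx}"
        triples.append((parsed_file_item, "ex:hasSection", section_item))
        triples.append((section_item, "ex:hasTitle", title))
        # Naive split on whitespace that follows sentence-ending punctuation.
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
        for n_idx, sentence in enumerate(sentences, start=1):
            sentence_item = f"{section_item}/sentence-{n_idx}"
            triples.append((section_item, "ex:hasSentence", sentence_item))
            triples.append((sentence_item, "ex:hasText", sentence))
    return triples


result = link_parse_results(
    "item:parsed-1",
    [("Intro", "First sentence. Second one!")],
)
```

With this layout, removing a parse would amount to deleting every triple reachable from the parsed-file-item, which matches the description of remove parse as removing the file together with its sections and sentences.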