Parse HTML

Actions

Name | Description
parse collection | Parse a collection of HTML files
parse file | Parse an HTML file
remove document and its parse | Remove a document together with the parsed HTML file produced from it, including that file's sections and sentences
remove parse | Remove a parsed HTML file, including its sections and sentences, but do not remove the document
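
The difference between the two remove actions can be illustrated with a toy model. This is only a sketch: ToyStore and its methods are hypothetical names invented for illustration, and the real service stores items and triples rather than Python dictionaries.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStore:
    """Hypothetical in-memory stand-in for the real item store."""
    documents: set = field(default_factory=set)
    # parsed-file item -> (source document item, sections, sentences)
    parses: dict = field(default_factory=dict)

    def add_parse(self, parsed_file, document, sections, sentences):
        self.documents.add(document)
        self.parses[parsed_file] = (document, list(sections), list(sentences))

    def remove_parse(self, parsed_file):
        # Drops the parse with its sections and sentences; the document stays.
        self.parses.pop(parsed_file, None)

    def remove_document_and_its_parse(self, document):
        # Drops every parse made from the document, then the document itself.
        for key in [k for k, v in self.parses.items() if v[0] == document]:
            del self.parses[key]
        self.documents.discard(document)
```

In this model, remove_parse leaves the document available for parsing again, while remove_document_and_its_parse leaves nothing behind.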


Inputs

Action | Name | Description
parse collection | collection-item | The item representing the collection whose documents are to be parsed
parse file | file-item | An item representing the original file from which the HTML file to be parsed was obtained
parse file | html-item | An item representing the HTML text to be parsed. It is the subject of a triple whose verb is sys:fileHasContent and whose object is the HTML text
parse file | parse-source | The identification number of the source where the sections and sentences of the parsed file are to be stored
parse file | reference-url | The URL to be used to look up references
remove document and its parse | document-item | The item representing the document that is to be removed along with its parse
remove parse | parsed-file-item | The item representing the parsed file to be removed
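
The html-item input is described as the subject of a triple whose verb is sys:fileHasContent. A minimal sketch of that relationship follows, assuming triples can be modeled as plain (subject, verb, object) records. The Triple type and the html_text_for helper are hypothetical and are not part of the service's API.

```python
from typing import Iterable, NamedTuple, Optional


class Triple(NamedTuple):
    """Hypothetical plain-record model of a stored triple."""
    subject: str
    verb: str
    obj: str


def html_text_for(html_item: str, triples: Iterable[Triple]) -> Optional[str]:
    """Return the HTML text attached to html_item, or None if there is none."""
    for triple in triples:
        if triple.subject == html_item and triple.verb == "sys:fileHasContent":
            return triple.obj
    return None
```

Only the verb name sys:fileHasContent comes from the description above; the item identifiers passed in are whatever the caller uses.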


Outputs

Action | Name | Description
parse collection | parsed-collection-item | An item representing the collection that has been parsed
parse file | parsed-file-item | An item representing the results of parsing the HTML text. The sections and sentences are associated with it by triples.

Local Run Sequence

["python3", "-m", "html_parser", "$API", "$CRED", "$KEY"]
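
The run sequence contains the placeholders $API, $CRED, and $KEY. The sketch below shows one way to fill them in before launching the module. It assumes the placeholders are simple name substitutions; where the values come from is not stated here, and expand_run_sequence is a hypothetical helper.

```python
from string import Template

RUN_SEQUENCE = ["python3", "-m", "html_parser", "$API", "$CRED", "$KEY"]


def expand_run_sequence(sequence, values):
    """Replace each $NAME placeholder with values["NAME"].

    Raises KeyError if a placeholder has no value, so a missing
    setting fails before the command is launched.
    """
    return [Template(arg).substitute(values) for arg in sequence]


# Possible use, assuming the three values are set as environment variables:
#   import os, subprocess
#   subprocess.run(expand_run_sequence(RUN_SEQUENCE, os.environ), check=True)
```

Passing the command as a list rather than a shell string avoids quoting problems if a value contains spaces.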