Actions
| Name | Description |
|---|---|
| parse collection | Parse a collection of HTML files |
| parse file | Parse an HTML file |
| remove document and its parse | Remove a parsed HTML file, including its sections and sentences, and also remove the document it was parsed from |
| remove parse | Remove a parsed HTML file, including its sections and sentences, but do not remove the document |
Inputs
| Action | Name | Description |
|---|---|---|
| parse collection | collection-item | The item representing the collection whose documents are to be parsed |
| parse file | file-item | An item representing the original file from which the HTML file to be parsed was obtained |
| parse file | html-item | An item representing the HTML text to be parsed. It is the subject of a triple whose verb is sys:fileHasContent and whose object is the HTML text |
| parse file | parse-source | The identification number of the source where the sections and sentences of the parsed file are to be stored |
| parse file | reference-url | The URL to be used to look up references |
| remove document and its parse | document-item | The item representing the document that is to be removed with its parse |
| remove parse | parsed-file-item | The item representing the parsed file to be removed |
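
The inputs listed above can be collected into a simple lookup so that a request is checked before an action is invoked. The following is a minimal sketch, not part of the parser itself: the `ACTION_INPUTS` mapping, the `missing_inputs` helper, and the item identifiers are hypothetical, and it assumes every listed input is required, which the table does not state.

```python
# Input names per action, copied from the Inputs table.
# Assumption: every listed input is required; the table does not say so.
ACTION_INPUTS = {
    "parse collection": ["collection-item"],
    "parse file": ["file-item", "html-item", "parse-source", "reference-url"],
    "remove document and its parse": ["document-item"],
    "remove parse": ["parsed-file-item"],
}


def missing_inputs(action, inputs):
    """Return the input names for `action` that `inputs` does not supply."""
    if action not in ACTION_INPUTS:
        raise ValueError(f"unknown action: {action!r}")
    return [name for name in ACTION_INPUTS[action] if name not in inputs]


# Hypothetical values; real item identifiers will look different.
request = {
    "file-item": "item:1001",
    "html-item": "item:1002",
    "parse-source": 7,
}

# The html-item triple described in the table, written as (subject, verb, object).
html_triple = (
    request["html-item"],
    "sys:fileHasContent",
    "<html><body><p>Hi.</p></body></html>",
)

print(missing_inputs("parse file", request))
```

With the request shown, the check reports `['reference-url']`, the one parse file input that was left out.
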
Outputs
| Action | Name | Description |
|---|---|---|
| parse collection | parsed-collection-item | An item representing the collection that has been parsed |
| parse file | parsed-file-item | An item representing the results of parsing the HTML text. The sections and sentences are associated with it by triples. |
Local Run Sequence
| ["python3", "-m", "html_parser", "$API", "$CRED", "$KEY"] |
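
The run sequence above is an argument list with `$API`, `$CRED`, and `$KEY` placeholders. The document does not say what these stand for or how they are filled in, so the sketch below assumes they are substituted from named values (by default, environment variables of the same names) before the process is started. The helper names `expand_run_sequence` and `run_locally` are hypothetical.

```python
import os
import subprocess
from string import Template

# The run sequence exactly as listed above.
RUN_SEQUENCE = ["python3", "-m", "html_parser", "$API", "$CRED", "$KEY"]


def expand_run_sequence(sequence, values):
    """Substitute $NAME placeholders in each argument using `values`."""
    # Template.substitute raises KeyError if a placeholder has no value.
    return [Template(arg).substitute(values) for arg in sequence]


def run_locally(values=None):
    """Launch the parser; placeholders default to environment variables."""
    # Assumption: API, CRED, and KEY are available as environment variables.
    source = os.environ if values is None else values
    argv = expand_run_sequence(RUN_SEQUENCE, source)
    return subprocess.run(argv, check=True)


# Illustrative values only; they show the expansion without starting a process.
example = {"API": "https://example.invalid/api", "CRED": "user", "KEY": "secret"}
print(expand_run_sequence(RUN_SEQUENCE, example))
```

Passing the arguments as a list, rather than as a single shell string, keeps the credential values from being reinterpreted by a shell.
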