Package net.bioclipse.managers
Class JSoupManager
java.lang.Object
net.bioclipse.managers.JSoupManager
- All Implemented Interfaces:
IBactingManager, net.bioclipse.managers.business.IBioclipseManager
Manager for JSoup functionality to parse HTML content.
-
Constructor Summary
Constructors -
Method Summary
Modifier and Type | Method | Description
List<String> | doi() | Lists the DOIs of the articles associated with this manager.
org.jsoup.nodes.Document | parse(String htmlFile) | Parses a file with HTML content from the workspace into the JSoup Document.
org.jsoup.nodes.Document | parseString(String htmlString) | Parses a string with HTML content into the JSoup Document.
String | removeHTMLTags(String htmlString) | Takes an HTML string and removes all tags.
org.jsoup.select.Elements | select(Element doc, String cssSelector) | Selects a subsection of the Document and returns it as an Elements object.
-
Constructor Details
-
JSoupManager
Creates a new JSoupManager.
- Parameters:
workspaceRoot - location of the workspace, e.g. "."
-
-
Method Details
-
parseString
Parses a string with HTML content into the JSoup Document.
- Parameters:
htmlString - the HTML as String
- Returns:
the HTML content as Document
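As a rough illustration of the kind of result parseString is described as returning, the sketch below parses an HTML string with the jsoup library directly. It assumes that the manager hands the string to jsoup's standard parser, which this page does not confirm; the class name ParseStringSketch and its helper methods are hypothetical and not part of the Bacting API.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class ParseStringSketch {

    // Hypothetical stand-in for parseString: parse an HTML string into a jsoup Document.
    // Assumed (unverified) manager equivalent: new JSoupManager(".").parseString(htmlString)
    static Document parseHtml(String htmlString) {
        return Jsoup.parse(htmlString);
    }

    // Returns the content of the <title> element of the parsed document.
    static String titleOf(String htmlString) {
        return parseHtml(htmlString).title();
    }

    // Returns the text of the <body> element, without markup.
    static String bodyTextOf(String htmlString) {
        return parseHtml(htmlString).body().text();
    }

    public static void main(String[] args) {
        String html = "<html><head><title>Demo page</title></head>"
                + "<body><p>Hello, Bioclipse!</p></body></html>";
        System.out.println(titleOf(html));    // prints "Demo page"
        System.out.println(bodyTextOf(html)); // prints "Hello, Bioclipse!"
    }
}
```

The returned Document can then be queried further, for example with CSS selectors.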
-
parse
public org.jsoup.nodes.Document parse(String htmlFile) throws net.bioclipse.core.business.BioclipseException
Parses a file with HTML content from the workspace into the JSoup Document.
- Parameters:
htmlFile - the name of the HTML file in the workspace
- Returns:
the HTML content as Document
- Throws:
net.bioclipse.core.business.BioclipseException - when the file could not be read
-
removeHTMLTags
Takes an HTML string and removes all tags.
- Parameters:
htmlString - the HTML as String
- Returns:
the text content of the HTML, with all tags removed
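One plausible way to obtain tag-free text is to parse the HTML and ask jsoup for the document's text. The sketch below does this with plain jsoup; whether removeHTMLTags is implemented this way is an assumption, and the class RemoveTagsSketch and the method stripTags are hypothetical names used only for illustration.

```java
import org.jsoup.Jsoup;

public class RemoveTagsSketch {

    // Hypothetical stand-in for removeHTMLTags: parse the HTML, keep only its text.
    // Assumed (unverified) manager equivalent: new JSoupManager(".").removeHTMLTags(htmlString)
    static String stripTags(String htmlString) {
        return Jsoup.parse(htmlString).text();
    }

    public static void main(String[] args) {
        // Inline markup is dropped; the surrounding text is kept.
        System.out.println(stripTags("<p>Hello <b>world</b>!</p>")); // prints "Hello world!"
    }
}
```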
-
select
Selects a subsection of the Document and returns it as an Elements object.
- Parameters:
doc - JSoup document to select from as Element
cssSelector - String with a Cascading Style Sheet selector instruction
- Returns:
the selected content
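To show what a CSS selector instruction can look like, the sketch below runs a selector against a small parsed document using jsoup's own select call. It assumes the manager's select delegates to jsoup's selector engine, which is not stated on this page; SelectSketch and selectTexts are hypothetical names.

```java
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class SelectSketch {

    // Hypothetical helper: parse the HTML, apply the CSS selector,
    // and return the text of each matching element.
    // Assumed (unverified) manager equivalent: manager.select(doc, cssSelector)
    static List<String> selectTexts(String htmlString, String cssSelector) {
        Document doc = Jsoup.parse(htmlString);
        Elements hits = doc.select(cssSelector);
        return hits.eachText();
    }

    public static void main(String[] args) {
        String html = "<ul><li class=\"hit\">first</li>"
                + "<li>skip</li>"
                + "<li class=\"hit\">second</li></ul>";
        // "li.hit" matches only <li> elements carrying the class "hit".
        System.out.println(selectTexts(html, "li.hit")); // prints [first, second]
    }
}
```

Because a Document is itself an Element in jsoup, the same selector call works on a whole document or on any element inside it.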
-
getManagerName
- Specified by:
getManagerName in interface net.bioclipse.managers.business.IBioclipseManager
-
doi
Description copied from interface: IBactingManager
Lists the DOIs of the articles associated with this manager.
- Specified by:
doi in interface IBactingManager
- Returns:
a List of String with DOIs
-