DLESE Tools v1.6.0
java.lang.Object
  extended by org.dlese.dpc.util.HTMLParser
public class HTMLParser
The HTMLParser class contains methods for parsing an HTML document. These methods extract the text of the document, as well as the contents of Meta tags, header tags (h1, h2, h3, ... h6), the Title tag, all the links in the page, etc. The following example HTML document at http://www.abc.org is used to help explain the methods in this API:
Middle school students can learn about hurricane science and safety with the Hurricane Strike module, while more advanced students can utilize the multimedia technology of the online meteorology guide Hurricanes.
One of ABC's newest collections, the NASA Scientific Visualization Studio, offers data, images and animations from previous Atlantic storms.
Constructor Summary | |
---|---|
HTMLParser(String resourcelocn) | Constructs an HTMLParser object from a URL or the name of an HTML file. |
HTMLParser(String htmlcontent, String charset) | Constructs an HTMLParser object from a String containing HTML, using the given charset. |
Method Summary | | |
---|---|---|
String[] | getAllLinks() | Returns a String array of all the links in the HTML document. |
String | getHeaderText() | Returns all the text in the HTML page that is contained within header tags (h1 through h6). |
String | getImgAlts() | Returns a String containing the text of the alt attribute of every img tag in the HTML document. |
String | getLinkTitles() | Returns a String containing the text of the title attribute of every link in the HTML document. |
String | getMetaTagContentByName(String name) | Returns the content of the Meta tag whose name equals name. |
String | getTitleText() | Returns the title of the HTML page. |
String | getWholeText() | Returns the text of the whole HTML document, stripped of all HTML tags. |
boolean | hasMetaTagName(String name) | Returns true if the HTML document contains a Meta tag whose name equals name; otherwise returns false. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public HTMLParser(String resourcelocn) throws org.htmlparser.util.ParserException
Parameters:
resourcelocn - either a URL or the name of an HTML file
Throws:
org.htmlparser.util.ParserException
Example:
    HTMLParser hp = new HTMLParser("http://www.dlese.org");
    HTMLParser hp2 = new HTMLParser("testthis.htm");

public HTMLParser(String htmlcontent, String charset) throws org.htmlparser.util.ParserException
Parameters:
htmlcontent - String containing the HTML to be parsed
charset - the character encoding; if null, the default encoding is used
Throws:
org.htmlparser.util.ParserException
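The first constructor accepts "either a URL or the name of an HTML file". How the DLESE class tells the two apart is not documented here, so the following is only a hypothetical sketch: the class name ResourceLocation, its methods, the "scheme://" test, and the UTF-8 default are all assumptions chosen for illustration, not part of the DLESE API.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of the DLESE API: opens a resource location as a Reader. */
final class ResourceLocation {
    // "scheme://..." where the scheme is a letter followed by letters, digits, '+', '.' or '-'
    private static final Pattern URL_LIKE = Pattern.compile("[A-Za-z][A-Za-z0-9+.-]*://.*");

    /** Returns true if the location looks like a URL rather than a file name. */
    static boolean isUrl(String resourcelocn) {
        return URL_LIKE.matcher(resourcelocn).matches();
    }

    /** Opens a URL or a local file; UTF-8 is an assumed default encoding. */
    static Reader open(String resourcelocn) throws IOException {
        if (isUrl(resourcelocn)) {
            return new BufferedReader(new InputStreamReader(
                    URI.create(resourcelocn).toURL().openStream(), StandardCharsets.UTF_8));
        }
        return Files.newBufferedReader(Path.of(resourcelocn), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        for (String locn : new String[] {"http://www.dlese.org", "testthis.htm"}) {
            System.out.println(locn + " -> " + (isUrl(locn) ? "URL" : "file"));
        }
    }
}
```

Checking for an explicit scheme keeps plain file names such as testthis.htm from being misread as relative URLs.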
Method Detail |
---|
public String getHeaderText() throws org.htmlparser.util.ParserException
Throws:
org.htmlparser.util.ParserException
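A minimal sketch of the idea behind getHeaderText(), assuming only the JDK. Judging from its ParserException, the DLESE class is built on the org.htmlparser library; this example swaps in the JDK's callback-based Swing parser (javax.swing.text.html.parser.ParserDelegator) so it runs without extra jars. The class HeaderTextSketch, its static headerText(String) method, and the single-space separator between headers are hypothetical; the real method's output format is not documented here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

/** Hypothetical sketch: collects the text inside h1..h6 with the JDK's Swing HTML parser. */
final class HeaderTextSketch {
    private static final Set<HTML.Tag> HEADERS = Set.of(
            HTML.Tag.H1, HTML.Tag.H2, HTML.Tag.H3, HTML.Tag.H4, HTML.Tag.H5, HTML.Tag.H6);

    static String headerText(String html) {
        List<String> pieces = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private int depth = 0; // greater than 0 while inside a header element

            @Override
            public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                if (HEADERS.contains(t)) depth++;
            }

            @Override
            public void handleEndTag(HTML.Tag t, int pos) {
                if (HEADERS.contains(t) && depth > 0) depth--;
            }

            @Override
            public void handleText(char[] data, int pos) {
                String s = new String(data).trim();
                if (depth > 0 && !s.isEmpty()) pieces.add(s);
            }
        };
        try {
            // true: ignore any charset declared in a Meta tag, since the input is already a String
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected when reading from a StringReader
        }
        return String.join(" ", pieces); // assumed separator: a single space
    }

    public static void main(String[] args) {
        System.out.println(headerText("<h1>Hurricanes</h1><p>Safety tips</p><h2>Storm Science</h2>"));
    }
}
```

Because the callback sees start and end tags in document order, a simple depth counter is enough to decide whether a text chunk sits inside a header.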
public String getTitleText() throws org.htmlparser.util.ParserException
Throws:
org.htmlparser.util.ParserException
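The same callback approach can extract the page title. This is again a sketch built on the JDK's Swing parser rather than on org.htmlparser; the class TitleTextSketch, the whitespace collapsing, and returning "" when no title exists are assumptions, not documented behavior of getTitleText().

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

/** Hypothetical sketch: returns the text of the title element, or "" if there is none. */
final class TitleTextSketch {
    static String titleText(String html) {
        StringBuilder title = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private boolean inTitle = false;

            @Override
            public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                if (t == HTML.Tag.TITLE) inTitle = true;
            }

            @Override
            public void handleEndTag(HTML.Tag t, int pos) {
                if (t == HTML.Tag.TITLE) inTitle = false;
            }

            @Override
            public void handleText(char[] data, int pos) {
                if (inTitle) title.append(data).append(' ');
            }
        };
        try {
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected when reading from a StringReader
        }
        // Collapse runs of whitespace and trim the ends.
        return title.toString().trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        System.out.println(titleText(
                "<html><head><title>Hurricane Resources</title></head><body><h1>Storms</h1></body></html>"));
    }
}
```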
public boolean hasMetaTagName(String name) throws org.htmlparser.util.ParserException
Parameters:
name - name of the Meta tag
Throws:
org.htmlparser.util.ParserException
public String getMetaTagContentByName(String name) throws org.htmlparser.util.ParserException
Parameters:
name - name of the Meta tag
Throws:
org.htmlparser.util.ParserException
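hasMetaTagName(String name) and getMetaTagContentByName(String name) both amount to looking up a Meta tag by its name attribute. The sketch below collects every Meta name/content pair into a map using the JDK's Swing parser (a stand-in for org.htmlparser). MetaTagSketch, hasMetaName, and metaContent are hypothetical names. Exact (case-sensitive) name matching, "first occurrence wins", and returning null for a missing name are assumptions about behavior the DLESE documentation does not specify.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

/** Hypothetical sketch: Meta tag lookups by name with the JDK's Swing HTML parser. */
final class MetaTagSketch {
    /** Maps each Meta tag's name attribute to its content attribute; the first occurrence wins. */
    static Map<String, String> metaTags(String html) {
        Map<String, String> meta = new LinkedHashMap<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            // <meta> is an empty element, so the parser reports it as a "simple" tag.
            @Override
            public void handleSimpleTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                if (t != HTML.Tag.META) return;
                Object name = a.getAttribute(HTML.Attribute.NAME);
                Object content = a.getAttribute(HTML.Attribute.CONTENT);
                if (name != null) {
                    meta.putIfAbsent(name.toString(), content == null ? "" : content.toString());
                }
            }
        };
        try {
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected when reading from a StringReader
        }
        return meta;
    }

    /** Hypothetical counterpart of hasMetaTagName(String name). */
    static boolean hasMetaName(String html, String name) {
        return metaTags(html).containsKey(name);
    }

    /** Hypothetical counterpart of getMetaTagContentByName(String name); null if absent. */
    static String metaContent(String html, String name) {
        return metaTags(html).get(name);
    }

    public static void main(String[] args) {
        String page = "<html><head><title>T</title>"
                + "<meta name=\"keywords\" content=\"hurricanes, storms\"></head><body><p>x</p></body></html>";
        System.out.println(hasMetaName(page, "keywords") + " " + metaContent(page, "keywords"));
    }
}
```

Building the map once and answering both questions from it mirrors how the two DLESE methods share the same name parameter.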
public String[] getAllLinks() throws org.htmlparser.util.ParserException
Throws:
org.htmlparser.util.ParserException
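A sketch of collecting links into a String array, as getAllLinks() is described as doing. It uses the JDK's Swing parser in place of org.htmlparser. LinksSketch and allLinks are hypothetical names, and two assumptions are built in: only the href attributes of a tags count as links, and relative URLs are returned as written rather than resolved against the page address.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

/** Hypothetical sketch: returns the href of every a tag, in document order. */
final class LinksSketch {
    static String[] allLinks(String html) {
        List<String> links = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                Object href = a.getAttribute(HTML.Attribute.HREF);
                // Assumption: <a name="..."> anchors without href are not links.
                if (t == HTML.Tag.A && href != null) links.add(href.toString());
            }
        };
        try {
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected when reading from a StringReader
        }
        return links.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String page = "<p><a href=\"http://www.dlese.org\">DLESE</a> <a href=\"testthis.htm\">local</a></p>";
        for (String link : allLinks(page)) System.out.println(link);
    }
}
```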
public String getLinkTitles() throws org.htmlparser.util.ParserException
Throws:
org.htmlparser.util.ParserException
public String getImgAlts() throws org.htmlparser.util.ParserException
Throws:
org.htmlparser.util.ParserException
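getLinkTitles() and getImgAlts() both gather one attribute from one kind of tag, so a single helper can cover both. This sketch uses the JDK's Swing parser instead of org.htmlparser. The names AttributeTextSketch, linkTitles, imgAlts, and collect are hypothetical, and joining the values with single spaces is an assumption about the output format.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

/** Hypothetical sketch: joins one attribute's values across all tags of one kind. */
final class AttributeTextSketch {
    static String linkTitles(String html) {
        return collect(html, HTML.Tag.A, HTML.Attribute.TITLE);
    }

    static String imgAlts(String html) {
        return collect(html, HTML.Tag.IMG, HTML.Attribute.ALT);
    }

    private static String collect(String html, HTML.Tag tag, HTML.Attribute attr) {
        List<String> values = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private void take(HTML.Tag t, MutableAttributeSet a) {
                Object v = a.getAttribute(attr);
                if (t == tag && v != null) values.add(v.toString().trim());
            }

            @Override
            public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                take(t, a); // elements with content, such as <a>
            }

            @Override
            public void handleSimpleTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                take(t, a); // empty elements, such as <img>
            }
        };
        try {
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected when reading from a StringReader
        }
        return String.join(" ", values); // assumed separator: a single space
    }

    public static void main(String[] args) {
        String page = "<p><a href=\"a.htm\" title=\"Hurricane Strike\">one</a> "
                + "<img src=\"x.gif\" alt=\"Storm track\"></p>";
        System.out.println(linkTitles(page) + " / " + imgAlts(page));
    }
}
```

Overriding both handleStartTag and handleSimpleTag matters because the Swing parser reports empty elements like img only through handleSimpleTag.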
public String getWholeText() throws org.htmlparser.util.ParserException
Throws:
org.htmlparser.util.ParserException
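Stripping all tags, as getWholeText() is described as doing, reduces to keeping every text callback and discarding the tags. This sketch uses the JDK's Swing parser instead of org.htmlparser. WholeTextSketch and wholeText are hypothetical names, and joining trimmed text chunks with single spaces is an assumption. In this sketch the title text is included along with the body text. Whether the DLESE method does the same is not stated.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

/** Hypothetical sketch: returns the document's text with all tags removed. */
final class WholeTextSketch {
    static String wholeText(String html) {
        List<String> pieces = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                String s = new String(data).trim();
                if (!s.isEmpty()) pieces.add(s); // tags never reach this callback
            }
        };
        try {
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected when reading from a StringReader
        }
        return String.join(" ", pieces); // assumed separator: a single space
    }

    public static void main(String[] args) {
        System.out.println(wholeText("<html><head><title>Hurricanes</title></head><body>"
                + "<p>Middle school students can learn about <b>hurricane</b> science.</p></body></html>"));
    }
}
```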