Hi, I'd like to extract some data from an HTML page with Html Agility Pack.
The HTML page is full of these list items. The data I need is shown in red.
The code at the bottom works fine for extracting the image URL and the name, but I don't know how to extract the InnerText from the div, because its class name is not the same for every list item. Does anybody know how to do this?
Code:
<li>
<img src="http://website.com/up/uploads/image/MDTGAoMHKsTjliLLr_170x136.jpg" width="170" height="136" alt="maria's profile" class="png">
<a href="/maria/" class="corners"> </a>
<div class="thumbnail_label thumbnail_label_f">F</div>
<div class="details">
<div class="title">
<a href="/maria/">maria</a>
<span class="age genderf">28</span>
</div>
<ul class="sub-info">
<li class="location">California, United States</li>
<li class="member">28374</li>
</ul>
</div>
</li>
Code:
<div class="thumbnail_label thumbnail_label_a">A</div>
<div class="thumbnail_label thumbnail_label_f">F</div>
<div class="thumbnail_label thumbnail_label_m">M</div>
etc.
vb.net Code:
Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
    Dim htmldoc As New HtmlDocument
    htmldoc.Load("C:\test.htm")

    For Each li As HtmlNode In htmldoc.DocumentNode.SelectNodes("//li")
        Dim imageNode As HtmlNode = li.SelectSingleNode(".//img")
        If imageNode IsNot Nothing Then
            Dim src As HtmlAttribute = imageNode.Attributes("src") '// Image url

            Dim nameNode As HtmlNode = li.SelectSingleNode(".//a")
            If nameNode IsNot Nothing Then
                Dim href As HtmlAttribute = nameNode.Attributes("href") '// Name

                '???
                Dim divNode As HtmlNode = li.SelectSingleNode(".//div[@class='thumbnail_label']")
                If divNode IsNot Nothing Then

                End If
            End If
        End If
    Next
End Sub
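
A likely reason the last lookup finds nothing is that @class='thumbnail_label' only matches when the whole class attribute is exactly that string, while the divs above carry two class names ("thumbnail_label thumbnail_label_f" and so on). One possible way around this, as a rough sketch only, is to match on the shared part of the class with the XPath contains() function. The module name LabelExtractor and the function ExtractLabels are made up for illustration, and the sketch assumes the HTML is passed in as a string rather than loaded from C:\test.htm:

```vbnet
Imports System.Collections.Generic
Imports HtmlAgilityPack

Module LabelExtractor
    ' Hypothetical helper: returns the text of the label div ("A", "F", "M", ...)
    ' for every list item that contains one.
    Function ExtractLabels(ByVal html As String) As List(Of String)
        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)
        Dim labels As New List(Of String)

        ' SelectNodes returns Nothing when there is no match, so check before looping.
        Dim items As HtmlNodeCollection = doc.DocumentNode.SelectNodes("//li")
        If items Is Nothing Then Return labels

        For Each li As HtmlNode In items
            ' contains() matches any class attribute that includes "thumbnail_label",
            ' so thumbnail_label_a, thumbnail_label_f and thumbnail_label_m all qualify.
            Dim divNode As HtmlNode = li.SelectSingleNode(".//div[contains(@class, 'thumbnail_label')]")
            If divNode IsNot Nothing Then
                labels.Add(divNode.InnerText.Trim())
            End If
        Next

        Return labels
    End Function
End Module
```

Because "//li" also picks up the nested location and member items, the IsNot Nothing check is what skips them. If other classes on the page merely contain the text thumbnail_label as part of a longer name, a stricter test such as contains(concat(' ', normalize-space(@class), ' '), ' thumbnail_label ') may be the safer choice.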