Channel: VBForums - Visual Basic .NET

[RESOLVED] Extract DIV from html with Html Agility Pack

Hi, I'd like to extract some data from an HTML page with Html Agility Pack.

The HTML page is full of list items like this one. The data I need is shown in red.

Code:

<li>
  <img src="http://website.com/up/uploads/image/MDTGAoMHKsTjliLLr_170x136.jpg" width="170" height="136"
       alt="maria's profile" class="png">

  <a href="/maria/" class="corners">&nbsp;</a>

  <div class="thumbnail_label thumbnail_label_f">F</div>

  <div class="details">
    <div class="title">
      <a href="/maria/">maria</a>
      <span class="age genderf">28</span>
    </div>
    <ul class="sub-info">
      <li class="location">California, United States</li>
      <li class="member">28374</li>
    </ul>
  </div>
</li>

The code at the bottom works fine for extracting the image URL and the name, but I don't know how to extract the InnerText of the thumbnail_label div, because its class attribute is not the same for every list item. Does anybody know how to do this?

Code:

<div class="thumbnail_label thumbnail_label_a">A</div>
<div class="thumbnail_label thumbnail_label_f">F</div>
<div class="thumbnail_label thumbnail_label_m">M</div>
etc.

vb.net Code:
Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
    Dim htmldoc As New HtmlDocument
    htmldoc.Load("C:\test.htm")

    For Each li As HtmlNode In htmldoc.DocumentNode.SelectNodes("//li")

        Dim imageNode As HtmlNode = li.SelectSingleNode(".//img")
        If imageNode IsNot Nothing Then
            Dim src As HtmlAttribute = imageNode.Attributes("src")
            '// Image url
            Debug.Print(src.Value)

            Dim nameNode As HtmlNode = li.SelectSingleNode(".//a")
            If nameNode IsNot Nothing Then
                Dim href As HtmlAttribute = nameNode.Attributes("href")
                '// Name
                Debug.Print(href.Value.Replace("/", String.Empty))

                '??? This is the part that does not work: the XPath compares the
                '    whole class attribute, so it never equals 'thumbnail_label'
                '    and divNode is always Nothing.
                Dim divNode As HtmlNode = li.SelectSingleNode(".//div[@class='thumbnail_label']")
                If divNode IsNot Nothing Then
                    Debug.Print(divNode.InnerText)
                End If

            End If
        End If
    Next
End Sub
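
The `'???` lookup above comes back as Nothing because `@class='thumbnail_label'` compares the whole attribute value, and on this page the value is always something like `thumbnail_label thumbnail_label_f`. One possible way around it, offered as a minimal sketch rather than the thread's confirmed answer, is to test whether the class list contains the shared `thumbnail_label` token using standard XPath 1.0 functions. The module name and the inline sample HTML below are made up for illustration; in the real code the document would still be loaded from `C:\test.htm`.

```vbnet
Imports System
Imports HtmlAgilityPack

Module ThumbnailLabelSketch
    Sub Main()
        ' Hypothetical inline sample standing in for C:\test.htm.
        Dim html As String =
            "<ul>" &
            "<li><img src=""a.jpg""><div class=""thumbnail_label thumbnail_label_a"">A</div></li>" &
            "<li><img src=""f.jpg""><div class=""thumbnail_label thumbnail_label_f"">F</div></li>" &
            "<li><img src=""m.jpg""><div class=""thumbnail_label thumbnail_label_m"">M</div></li>" &
            "</ul>"

        Dim htmldoc As New HtmlDocument()
        htmldoc.LoadHtml(html)

        ' SelectNodes returns Nothing (not an empty collection) when nothing matches.
        Dim items As HtmlNodeCollection = htmldoc.DocumentNode.SelectNodes("//li")
        If items Is Nothing Then Return

        For Each li As HtmlNode In items
            ' Pad the class list with spaces and look for the whole token
            ' " thumbnail_label ", so the extra _a / _f / _m class is ignored.
            Dim divNode As HtmlNode = li.SelectSingleNode(
                ".//div[contains(concat(' ', normalize-space(@class), ' '), ' thumbnail_label ')]")
            If divNode IsNot Nothing Then
                Console.WriteLine(divNode.InnerText.Trim())
            End If
        Next
    End Sub
End Module
```

A shorter predicate, `.//div[contains(@class,'thumbnail_label')]`, should also work for the markup shown here, but it is a plain substring test and would match any other class name that happens to contain `thumbnail_label`. In the original Button1_Click handler only the XPath string on the `divNode` line would need to change.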
