Loop through HTML elements to set or retrieve values

So, in this week’s installment, we’ll look at some basic HTML parsing methods and also how to fill out forms and submit them via code. I still see a lot of people asking how to get the text from a specific hyperlink or set the value of an input box on a web page. In this post, I’ll try to cover the method I use most when working with HTML parsing. I’ll show you how to get the link text from a hyperlink, how to set the text of an input box or textarea field, and how to click form buttons to submit forms.

Every HTML element (anchors, divs, images, inputs, and so on) has what are called “attributes.” Here is an example of some general HTML code that shows you the use of attributes:

<input type="text" name="log" id="user_login" class="input" value="" size="20" tabindex="10" />
<input type="password" name="pwd" id="user_pass" class="input" value="" size="20" tabindex="20" />
<input type="submit" name="wp-submit" id="wp-submit" class="button-primary" value="Log In" tabindex="100" />

In each of those lines above, every word that comes before an equals sign (=) is the name of an attribute, and the quoted text after the equals sign is that attribute’s value. Each HTML element has specific attributes, some of which are common among all of them, but I do not want to go into that with this post. A basic understanding of what they are and how they’ll be used in our VB world is all that is needed for this post.

So now that we understand attributes and are familiar with their syntax, placement, and function, let’s look at how we can set them and retrieve them using VB. In VB, there are two methods that will be your “go to” tactics for doing this: .SetAttribute and .GetAttribute (can you guess which one gets and which one sets? *wink*)

Set the value of an input box or textarea:
There are 2 ways to do this. One option is to use the .GetElementById method of the HTML Document. If you’re lucky, the web page you’re working with will use the ID attribute of every HTML element in the HTML code. This makes it a lot easier to parse it with VB. Here is an example of setting the value of an input box with the ID of “id”:

WebBrowser1.Document.GetElementById("id").SetAttribute("value", "New Value")

What we’ve done there is fetch the HTML element whose ID is “id” and set its “value” attribute to “New Value.” For input boxes, the value is what is shown inside the input box.
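
One caveat with the one-liner: .GetElementById returns Nothing when no element on the page has that ID, and WebBrowser1.Document is itself Nothing until a page has loaded, so in either case the call throws a NullReferenceException. Here is a minimal defensive sketch. The names FormHelpers and TrySetValue are made up for illustration, and it assumes a Windows Forms project:

```vbnet
Imports System.Windows.Forms

Module FormHelpers
    ' Hypothetical helper (not part of the WebBrowser API): sets the "value"
    ' attribute of the element with the given ID. Returns False instead of
    ' throwing when no document is loaded or no element has that ID.
    Public Function TrySetValue(doc As HtmlDocument, elementId As String, newValue As String) As Boolean
        If doc Is Nothing Then Return False

        Dim element As HtmlElement = doc.GetElementById(elementId)
        If element Is Nothing Then Return False

        element.SetAttribute("value", newValue)
        Return True
    End Function
End Module
```

With the sample login form from earlier, you could call it as TrySetValue(WebBrowser1.Document, "user_login", "myname") and check the Boolean it returns.
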
The other way to set the value of an input box with VB is to loop through the HTML collection of inputs and find the one you need based on an attribute value. The following code chunk should be put in your black book of code tricks as you’ll be using it a lot if HTML parsing is something you do often:

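Dim theElementCollection As HtmlElementCollection = WebBrowser1.Document.GetElementsByTagName("input")
For Each curElement As HtmlElement In theElementCollection
    ' Only touch the input we want; "log" is the name of the first input in the sample HTML above
    If curElement.GetAttribute("name") = "log" Then
        curElement.SetAttribute("value", "New Value")
    End If
Next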

Without getting into the details, the above code merely gets all the elements with the tag “input” and stores them in an “HTML Element Collection”. This allows us to then loop through this collection of “inputs” and do what we’d like with each one. Here are a couple of ways to get different tags:

To get all hyperlinks:

.GetElementsByTagName("a")

To get all inputs:

.GetElementsByTagName("input")

To get all divs:

.GetElementsByTagName("div")

To get all spans:

.GetElementsByTagName("span")

To get all images:

.GetElementsByTagName("img")

The For Each loop then walks through the collection, and for each element (curElement) you have the aforementioned methods available to get or do what you need. Using .SetAttribute allows you to set the value of any attribute for that element, while .GetAttribute allows you to retrieve the value of any attribute. In addition to retrieving the attribute values, VB also allows you to fetch other things like the .InnerHtml (HTML between the element’s tags), the .InnerText (text between the element’s tags), .OuterHtml (the element’s HTML including its own opening and closing tags), and .OuterText (the element’s text with the tags stripped; when read, it matches .InnerText).
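
As a rough sketch of the reading side, the loop below collects the visible text and the href of every hyperlink on the page using .InnerText and .GetAttribute. The names LinkHelpers and GetLinks are hypothetical, and the "text -> href" output format is just something I picked for the example:

```vbnet
Imports System.Collections.Generic
Imports System.Windows.Forms

Module LinkHelpers
    ' Hypothetical helper: returns one "link text -> href" string per hyperlink.
    Public Function GetLinks(doc As HtmlDocument) As List(Of String)
        Dim links As New List(Of String)
        If doc Is Nothing Then Return links ' No page loaded yet

        For Each curElement As HtmlElement In doc.GetElementsByTagName("a")
            ' GetAttribute returns an empty string when the attribute is missing,
            ' e.g. for named anchors like <a name="top">; skip those.
            Dim href As String = curElement.GetAttribute("href")
            If String.IsNullOrEmpty(href) Then Continue For

            ' InnerText can be Nothing, e.g. for a link that only wraps an image.
            Dim linkText As String = If(curElement.InnerText, String.Empty)
            links.Add(linkText.Trim() & " -> " & href)
        Next

        Return links
    End Function
End Module
```

Calling GetLinks(WebBrowser1.Document) after the page has finished loading gives you a list you can loop over or drop into a ListBox.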

Clicking an HTML element such as a button or hyperlink:
Now let’s look at how to “click” things with our code. You can pretty much click anything you want. Many people often ask, “What if the link or button calls a JavaScript function?” Simple answer: “Doesn’t matter.” As we’ll be “clicking” the link or button just as a visitor would, the normal “happenings” that would occur are going to happen as they usually would. It’s not like we’re having to call the JavaScript function directly or something…

So.. the HTML element we’ll be using this for most commonly is the “input” button, which will usually have an attribute of “type”. When looking to click a button, the attribute “type” will usually have a value of “submit”. That is the one we want!

Pop Quiz: Question: How many ways are there to do this? Answer: 2!

We can address the input button by ID if it is provided in the HTML code, or we can loop through the collection of Input elements. If we have to take the loop route, what we would do is test the .GetAttribute(“type”) value to see if it is equal to “submit”. If it is, then we’ll “click” it. Here’s how that would look:

Dim theElementCollection As HtmlElementCollection = WebBrowser1.Document.GetElementsByTagName("input")
For Each curElement As HtmlElement In theElementCollection
    If curElement.GetAttribute("type").ToLower = "submit" Then
        curElement.InvokeMember("click")
        Exit For ' Stop after the first submit button so we don't click several
    End If
Next

We call the .InvokeMember method on the HTML element which basically translates to “perform the following action on this element”. In our case, the action is to “click” it. This works for input buttons, hyperlinks, images, or anything else that you would be able to click normally with a mouse!
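
For the ID route, a sketch could look like the following. It uses the same .InvokeMember call, just on an element fetched with .GetElementById. The names ClickHelpers and TryClick are made up for this example:

```vbnet
Imports System.Windows.Forms

Module ClickHelpers
    ' Hypothetical helper: "clicks" the element with the given ID.
    ' Returns False when no document is loaded or the ID is not on the page.
    Public Function TryClick(doc As HtmlDocument, elementId As String) As Boolean
        If doc Is Nothing Then Return False

        Dim element As HtmlElement = doc.GetElementById(elementId)
        If element Is Nothing Then Return False

        ' Same call as in the loop version above.
        element.InvokeMember("click")
        Return True
    End Function
End Module
```

For the sample login form from the top of the post, that would be TryClick(WebBrowser1.Document, "wp-submit").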

While this isn’t the most in-depth look at HTML automation, hopefully it will give you a rough idea of the procedures used most commonly to set an HTML field’s value, or retrieve a particular value from the HTML. I make use of this “go to” HTML loop in my Scraper class to make it even easier to use!

Comments welcome.


24 thoughts on “Loop through HTML elements to set or retrieve values”

  1. Ray

    Excellent HTML parsing info – thank you!

    Was wondering if you knew how to ‘trap’ a web page when a user clicks on a link (from a webbrowser control embedded on a WinForm) and it then opens a ‘new’ browser. So that’s why I say ‘trap’ because you need to (1) create a new WinForm with another webbrowser control in it (on the fly), and (2) open the new page in that 2nd webbrowser instance.

    I know in VB6 this should work:

    Private Sub WebBrowser1_NewWindow2(ppDisp As Object, Cancel As Boolean)
    Dim frm As Form1
    Set frm = New Form1
    Set ppDisp = frm.WebBrowser1.Object
    frm.Show
    End Sub

    But I’m trying to accomplish this in VB2010.

    Thank you in advance for your help.

    Best regards,
    -Ray

  2. Steve Post author

    Hi Ray,
    That’s a bit tricky. It could get quite complex if you were wanting to account for Javascript “links” and links that have a “Target” attribute assigned. If you don’t need to worry about these 2 cases, you could “trap” the .ActiveElement of the WebBrowser Document and e.Cancel in the Navigating event of the WebBrowser.

    You could then launch a new form (with WebBrowser) and onLoad have it Navigate to the Url of the ActiveElement you captured in the previous Form.

    If you don’t need to account for those 2 special cases, I might be able to throw some code together for you.

    Thanks for reading,
    Steve

  3. Ray

    So this is what I have so far:

    VB code:

    Private Sub Button1_Click(sender As System.Object, e As System.EventArgs) Handles Button1.Click
    WebBrowser1.Document.GetElementById("lookupId").SetAttribute("value", "123")
    WebBrowser1.Navigate("javascript:lookupRequest();")
    End Sub

    This is a part of the embedded web page (note this cannot be altered, as I don’t own the page).

    Request ID Lookup: Go

    So at this point, I can find the ‘lookupId’ field and set its attribute, i.e “123” then press the ‘go’ button (which launches a 2nd browser). However, I’m guessing I need that new web page to be in a browser instance that I create. Would greatly appreciate anything you can do to help, even a basic template. Thank you.

    1. Kevin Rollins

      I actually found my answer a couple hours after posting the question. I used-
      WebBrowser1.Document.Window.Frames(1).Document.GetElementsByTagName("a")

      Thank you very much for creating such an informative site and considering my questions.

      Thanks,
      Kevin

  4. Kevin Rollins

    Like other comments made, there is excellent HTML parsing information contained on your page. I am learning a good amount of information on this topic, and your post cleared up a lot of questions I have.
    The issue I am having is dealing with frames that compose a particular webpage. The page that I am working with has a frameset, as determined by looking at the main file I refer to as the ‘index.html’ file. Using the techniques I have learned from your post, I can read from that HTML file fine using the VB line-
    ElementCollection = WebBrowser1.Document.GetElementsByTagName("frame")

    I can also Refer to the HTML file in the frameset that contains the objects and information I need by using-
    WebBrowser1.Document.All(1).InnerHtml

    I put this information in a message box and can view all of the innerHtml as one big string. What I need is to collect all the objects and loop through one at a time until I find the one desired.

    I am trying to use your collection line again, but this time addressing a particular frame HTML file. If I use the line-
    ElementCollection = WebBrowser1.Document.All(1).GetElementsByTagName("a")

    It returns nothing.

    Basically, I need to find a way to collect HTML elements from a website that contains multiple frames and specify which frame I want to read from.

    Any help or suggestions would be greatly appreciated.
    Thank you very much,
    Kevin

  5. Steve Post author

    Hi Kevin,
    Take a look at the comment just above yours. Ray had the same requirement and you can accomplish it with:

    WebBrowser1.Document.Window.Frames(1).Document.GetElementsByTagName("a")

    WebBrowser1.Document.Window.Frames(1) is how you address a particular frame (the “1” is the index of the frame in the Frame collection on the page).

    You can loop through these with a simple For Loop:
    For i As Integer = 0 To WebBrowser1.Document.Window.Frames.Count - 1
        Dim theElementCollection As HtmlElementCollection = WebBrowser1.Document.Window.Frames(i).Document.GetElementsByTagName("a")
        For Each curElement As HtmlElement In theElementCollection
            MsgBox(curElement.GetAttribute("href").ToLower) 'Use whatever attribute you want here
        Next
    Next

  6. mel

    hi steve could you teach me how to scrape yahoo groups links in my yahoo groups coz im new to programming im using vb10 btw.

  7. Steve Post author

    Try something like:

    Dim theElementCollection As HtmlElementCollection = WebBrowser1.Document.GetElementsByTagName("a")
    For Each curElement As HtmlElement In theElementCollection
        If curElement.GetAttribute("href").ToLower.Contains("/group") And curElement.GetAttribute("href").ToLower.Contains("/yguid") Then
            MsgBox(curElement.InnerText.ToString)
        End If
    Next

  8. Steve Post author

    You’ll have to post the html code for the “next page” link/button and I’ll be able to help you out.

    Thanks for reading! Feel free to Google +1 me if I’ve helped at all.

  9. mel

    Public Function GetLinksYahoo()
    Dim ScrapedData As New List(Of String)
    Dim theElementCollection As HtmlElementCollection = wbyahoo.Document.GetElementsByTagName("a")
    For Each curElement As HtmlElement In theElementCollection
    If curElement.GetAttribute("href").ToLower.Contains("/group") And curElement.GetAttribute("href").ToLower.Contains("/yguid") Then
    ScrapedData.Add(curElement.OuterHtml)
    End If
    Next
    Return ScrapedData
    For Each a In ScrapedData
    yahoolistbox.Items.Add(a)
    Next
    End Function
    Private Sub btnmyyahoo_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles btnmyyahoo.Click
    Do Until wbyahoo.ReadyState = WebBrowserReadyState.Complete
    Application.DoEvents()
    Loop
    Call GetLinksYahoo()
    End Sub

    not working man

  10. Steve Post author

    First off, you gotta help me out with “not working”.. that could mean so many things… errors, unintended output, no output, hard drive got formatted, neighbor stole your car, etc…

    It looks like you’re doing the “Return ScrapedData” before adding the items to your Listbox. This means it will exit the function before the adding takes place (this may be the problem you’re seeing).

    Is it not finding any of the links or is it not adding it to the Listbox? Need more detail on what’s not working.

  11. Steve Post author

    Have you placed a “stop” on this line: ScrapedData.Add(curElement.OuterHtml)

    Does curElement.OuterHtml hold any value?

    Place your “Return” statement after the loop:
    For Each a In ScrapedData
    yahoolistbox.Items.Add(a)
    Next
    Return ScrapedData

  12. Steve Post author

    Also, you do not need the word “Call” in front of the Function/Method. That’s a “scripting” thing.. it’s not needed. I noticed also that you’re returning the List from the function, but not assigning it to anything.

    This makes me think that either A) You don’t actually need it returned for what you’re trying to accomplish (adding the items to a listbox) or B) You are not fully understanding the Function/Method difference.

    If you don’t need it returned, you can change your Function to a Method and remove the Return line.

    What is the URL of the page you are scraping? (if it is public)

    I only have MSN Messenger / Skype (sourcematters)

  13. mel

    if i use your scraperdemo it actually get my specific links but it has two links that i dont want to get, and it does not get the innertext

