Loop through HTML elements to set or retrieve values

So, in this week’s installment, we’ll look at some basic HTML parsing methods and at how to fill out forms and submit them via code. I still see a lot of people asking how to get the text from a specific hyperlink or how to set the value of an input box on a web page. In this post, I’ll cover the method I use most when working with HTML parsing. I’ll show you how to get the link text from a hyperlink, how to set the text of an input box or textarea field, and how to click form buttons to submit forms.

Every HTML element, whether an anchor, div, img, or input, can have what are called “attributes.” Here is an example of some general HTML code that shows attributes in use:

<input type="text" name="log" id="user_login" class="input" value="" size="20" tabindex="10" />
<input type="password" name="pwd" id="user_pass" class="input" value="" size="20" tabindex="20" />
<input type="submit" name="wp-submit" id="wp-submit" class="button-primary" value="Log In" tabindex="100" />

In each of the lines above, every word that comes before an equals sign (=) is an attribute name, and the quoted text after it is that attribute’s value. In the first line, for instance, the attribute type has the value “text” and id has the value “user_login”. Each HTML element has its own set of attributes, some of which are common to all elements, but I won’t go into that in this post. A basic understanding of what attributes are and how we’ll use them in our VB world is all you need here.

So now that we understand attributes and are familiar with their syntax, placement, and function, let’s look at how we can set them and retrieve them using VB. In VB, there are two methods that will be your “go to” tactics for doing this: .SetAttribute and .GetAttribute (can you guess which one gets and which one sets? *wink*)
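As a quick illustration of .GetAttribute against the sample login form above, here is a minimal sketch that lists the id, name and type of every input on a loaded page. The module and function names (AttributeDemo, DescribeInputs) are made up for this example, and it assumes a Windows Forms project where the page has already finished loading in a WebBrowser control.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Windows.Forms

' Hypothetical helper module for illustration only.
Module AttributeDemo
    ' Returns one line per <input> element describing a few of its attributes.
    Public Function DescribeInputs(doc As HtmlDocument) As List(Of String)
        Dim result As New List(Of String)
        If doc Is Nothing Then Return result ' page not loaded yet

        For Each el As HtmlElement In doc.GetElementsByTagName("input")
            ' GetAttribute returns an empty string (not Nothing) when the attribute is absent.
            result.Add(String.Format("id={0}, name={1}, type={2}",
                                     el.GetAttribute("id"),
                                     el.GetAttribute("name"),
                                     el.GetAttribute("type")))
        Next
        Return result
    End Function
End Module
```

Called from the WebBrowser’s DocumentCompleted event as AttributeDemo.DescribeInputs(WebBrowser1.Document), the sample form should give one line each for user_login, user_pass and wp-submit.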

Set the value of an input box or textarea:
There are 2 ways to do this. One option is to use the .GetElementById method of the HTML Document. If you’re lucky, the web page you’re working with will use the ID attribute of every HTML element in the HTML code. This makes it a lot easier to parse it with VB. Here is an example of setting the value of an input box with the ID of “id”:

WebBrowser1.Document.GetElementById("id").SetAttribute("value", "New Value")

What we’ve done there is fetch the HTML element whose ID is “id” and set its “value” attribute to “New Value.” For input boxes, the value is what is shown inside the input box.
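One thing to watch with the one-liner above: if the page hasn’t finished loading, or no element has that ID, .GetElementById returns Nothing and the .SetAttribute call throws a NullReferenceException. A slightly more defensive version might look like this (SetByIdDemo and TrySetValueById are hypothetical names, not part of the framework):

```vbnet
Imports System
Imports System.Windows.Forms

' Hypothetical helper module for illustration only.
Module SetByIdDemo
    ' Sets the "value" attribute of the element with the given ID.
    ' Returns False instead of throwing when the document isn't loaded
    ' or the ID isn't on the page.
    Public Function TrySetValueById(doc As HtmlDocument, elementId As String, newValue As String) As Boolean
        If doc Is Nothing Then Return False
        Dim el As HtmlElement = doc.GetElementById(elementId)
        If el Is Nothing Then Return False ' no element with that ID
        el.SetAttribute("value", newValue)
        Return True
    End Function
End Module
```

For example, TrySetValueById(WebBrowser1.Document, "user_login", "myname") would fill the username box from the sample HTML, and a False result tells you the field wasn’t found.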
The other way to set the value of an input box with VB is to loop through the HTML collection of inputs and find the one you need based on an attribute value. The following code chunk should be put in your black book of code tricks as you’ll be using it a lot if HTML parsing is something you do often:

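Dim theElementCollection As HtmlElementCollection = WebBrowser1.Document.GetElementsByTagName("input")
For Each curElement As HtmlElement In theElementCollection
    If curElement.GetAttribute("name") = "log" Then ' match on whichever attribute identifies your input
        curElement.SetAttribute("value", "New Value")
    End If
Next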

Without getting into the details, the above code gets all the elements with the tag “input” and stores them in an HtmlElementCollection. This lets us loop through that collection of inputs and do what we’d like with each one. Here is how to get some other common tags:

To get all hyperlinks: .GetElementsByTagName("a")
To get all inputs: .GetElementsByTagName("input")
To get all divs: .GetElementsByTagName("div")
To get all spans: .GetElementsByTagName("span")
To get all images: .GetElementsByTagName("img")

The For loop then walks through the collection, and for each element (curElement) you can use the aforementioned methods to get or do what you need. .SetAttribute sets the value of any attribute on that element, while .GetAttribute retrieves the value of any attribute. Besides attribute values, each HtmlElement also exposes properties such as .InnerHtml (the HTML between the element’s opening and closing tags), .InnerText (the text between the element’s tags), .OuterHtml (the element’s HTML including its own tags), and .OuterText (the element’s text; setting it replaces the whole element rather than just its contents).
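Putting the loop and these properties together, here is a sketch of the “get the link text from a hyperlink” task mentioned at the start: it collects the visible text (.InnerText) and the href of every anchor on the page. LinkDemo and GetLinks are hypothetical names, and the page is assumed to be fully loaded.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Windows.Forms

' Hypothetical helper module for illustration only.
Module LinkDemo
    ' Collects (link text, href) pairs for every <a> element on the page.
    Public Function GetLinks(doc As HtmlDocument) As List(Of KeyValuePair(Of String, String))
        Dim links As New List(Of KeyValuePair(Of String, String))
        If doc Is Nothing Then Return links

        For Each link As HtmlElement In doc.GetElementsByTagName("a")
            ' Guard against Nothing, e.g. for an anchor that only wraps an image.
            Dim text As String = If(link.InnerText, String.Empty).Trim()
            links.Add(New KeyValuePair(Of String, String)(text, link.GetAttribute("href")))
        Next
        Return links
    End Function
End Module
```

From there you could filter the list, for example keeping only entries whose href contains a certain path, before doing anything else with them.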

Clicking an HTML element such as a button or hyperlink:
Now let’s look at how to “click” things with our code. You can click pretty much anything you want. Many people ask, “What if the link or button calls a JavaScript function?” Simple answer: it doesn’t matter. Because we’re “clicking” the link or button just as a visitor would, whatever normally happens on a real click happens here too. We never have to call the JavaScript function directly.
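To make that concrete, here is a minimal sketch that clicks a hyperlink by its visible text, whether the link points to a URL or runs JavaScript. ClickLinkDemo and TryClickLinkByText are hypothetical names, and matching on the link text is just one choice; matching on an attribute such as href or id works the same way.

```vbnet
Imports System
Imports System.Windows.Forms

' Hypothetical helper module for illustration only.
Module ClickLinkDemo
    ' Clicks the first <a> whose visible text matches linkText (case-insensitive).
    ' Any onclick JavaScript runs as part of the click.
    Public Function TryClickLinkByText(doc As HtmlDocument, linkText As String) As Boolean
        If doc Is Nothing Then Return False

        For Each link As HtmlElement In doc.GetElementsByTagName("a")
            Dim text As String = If(link.InnerText, String.Empty).Trim()
            If String.Equals(text, linkText, StringComparison.OrdinalIgnoreCase) Then
                link.InvokeMember("click")
                Return True ' stop after the first match
            End If
        Next
        Return False
    End Function
End Module
```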

So, the HTML element we’ll use this on most often is the “input” button, which usually has a “type” attribute. For a button that submits a form, that “type” attribute will usually have the value “submit”. That is the one we want!

Pop Quiz: Question: How many ways are there to do this? Answer: 2!

We can address the input button by ID if it is provided in the HTML code, or we can loop through the collection of Input elements. If we have to take the loop route, what we would do is test the .GetAttribute(“type”) value to see if it is equal to “submit”. If it is, then we’ll “click” it. Here’s how that would look:

Dim theElementCollection As HtmlElementCollection = WebBrowser1.Document.GetElementsByTagName("input")
For Each curElement As HtmlElement In theElementCollection
    If curElement.GetAttribute("type").ToLower = "submit" Then
        curElement.InvokeMember("click")
    End If
Next

We call the .InvokeMember method on the HTML element which basically translates to “perform the following action on this element”. In our case, the action is to “click” it. This works for input buttons, hyperlinks, images, or anything else that you would be able to click normally with a mouse!
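For the ID route mentioned earlier, no loop is needed. This sketch (ClickByIdDemo and TryClickById are hypothetical names) looks the element up by ID and only clicks it if it exists:

```vbnet
Imports System
Imports System.Windows.Forms

' Hypothetical helper module for illustration only.
Module ClickByIdDemo
    ' Clicks the element with the given ID; returns False if it isn't on the page.
    Public Function TryClickById(doc As HtmlDocument, elementId As String) As Boolean
        If doc Is Nothing Then Return False
        Dim el As HtmlElement = doc.GetElementById(elementId)
        If el Is Nothing Then Return False
        el.InvokeMember("click") ' same as a user clicking it
        Return True
    End Function
End Module
```

With the sample HTML from the start of the post, TryClickById(WebBrowser1.Document, "wp-submit") would click the Log In button.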

While this isn’t the most in-depth look at HTML automation, hopefully it will give you a rough idea of the procedures used most commonly to set an HTML field’s value, or retrieve a particular value from the HTML. I make use of this “go to” HTML loop in my Scraper class to make it even easier to use!
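To tie the pieces together, here is a sketch of a complete form that fills in and submits the sample login form from the top of this post. The URL, user name and password are placeholders, and LoginForm and FillAndSubmit are hypothetical names. The key design point is doing the work in the DocumentCompleted event, since the elements don’t exist until the page has loaded, and using a flag so the form isn’t submitted again when the result page loads.

```vbnet
Imports System
Imports System.Windows.Forms

' Hypothetical end-to-end sketch; URL and credentials are placeholders.
Public Class LoginForm
    Inherits Form

    Private ReadOnly browser As New WebBrowser With {.Dock = DockStyle.Fill}
    Private submitted As Boolean = False ' DocumentCompleted fires again after the post-back

    Public Sub New()
        Controls.Add(browser)
        AddHandler browser.DocumentCompleted, AddressOf OnDocumentCompleted
    End Sub

    Protected Overrides Sub OnLoad(e As EventArgs)
        MyBase.OnLoad(e)
        browser.Navigate("https://example.com/wp-login.php") ' placeholder URL
    End Sub

    Private Sub OnDocumentCompleted(sender As Object, e As WebBrowserDocumentCompletedEventArgs)
        ' Frames raise their own DocumentCompleted; only act for the top-level page, and only once.
        If Not e.Url.Equals(browser.Url) OrElse submitted Then Return
        If FillAndSubmit(browser.Document, "myUser", "myPassword") Then submitted = True
    End Sub

    ' Fills the sample form's fields and clicks its submit button.
    ' Does nothing and returns False unless all three elements are present.
    Public Shared Function FillAndSubmit(doc As HtmlDocument, user As String, pass As String) As Boolean
        If doc Is Nothing Then Return False
        Dim login As HtmlElement = doc.GetElementById("user_login")
        Dim pwd As HtmlElement = doc.GetElementById("user_pass")
        Dim submit As HtmlElement = doc.GetElementById("wp-submit")
        If login Is Nothing OrElse pwd Is Nothing OrElse submit Is Nothing Then Return False

        login.SetAttribute("value", user)
        pwd.SetAttribute("value", pass)
        submit.InvokeMember("click")
        Return True
    End Function

    <STAThread>
    Public Shared Sub Main()
        Application.EnableVisualStyles()
        Application.Run(New LoginForm())
    End Sub
End Class
```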

Comments welcome.

24 thoughts on “Loop through HTML elements to set or retrieve values”

  1. Excellent HTML parsing info – thank you!

    Was wondering if you knew how to ‘trap’ a web page when a user clicks on a link (from a webbrowser control embedded on a WinForm) and it then opens a ‘new’ browser. So that’s why I say ‘trap’ because you need to (1) create a new WinForm with another webbrowser control in it (on the fly), and (2) open the new page in that 2nd webbrowser instance.

    I know in VB6 this should work:

    Private Sub WebBrowser1_NewWindow2(ppDisp As Object, Cancel As Boolean)
    Dim frm As Form1
    Set frm = New Form1
    Set ppDisp = frm.WebBrowser1.Object
    End Sub

    But I’m trying to accomplish this in VB2010.

    Thank you in advance for your help.

    Best regards,

  2. Hi Ray,
    That’s a bit tricky. It could get quite complex if you were wanting to account for Javascript “links” and links that have a “Target” attribute assigned. If you don’t need to worry about these 2 cases, you could “trap” the .ActiveElement of the WebBrowser Document and e.Cancel in the Navigating event of the WebBrowser.

    You could then launch a new form (with WebBrowser) and onLoad have it Navigate to the Url of the ActiveElement you captured in the previous Form.

    If you don’t need to account for those 2 special cases, I might be able to throw some code together for you.

    Thanks for reading,

  3. So this is what I have so far:

    VB code:

    Private Sub Button1_Click(sender As System.Object, e As System.EventArgs) Handles Button1.Click
    WebBrowser1.Document.GetElementById(“lookupId”).SetAttribute(“value”, “123”)
    End Sub

    This is a part of the embedded web page (note that this cannot be altered, as I don’t own the page).

    Request ID Lookup: Go

    So at this point, I can find the ‘lookupId’ field and set its attribute (i.e. “123”), then press the ‘go’ button (which launches a 2nd browser). However, I’m guessing I need that new web page to be in a browser instance that I create. I would greatly appreciate anything you can do to help, even a basic template. Thank you.

    1. I actually found my answer a couple hours after posting the question. I used-

      Thank you very much for creating such an informative site and considering my questions.


  4. Like other comments made, there is excellent HTML parsing information contained on your page. I am learning a good amount of information on this topic, and your post cleared up a lot of questions I have.
    The issue I am having is dealing with frames that compose a particular webpage. The page that I am working with has a frameset, as determined by looking at the main file I refer to as the ‘index.html’ file. Using the techniques I have learned from your post, I can read from that HTML file fine using the VB line-
    ElementCollection = WebBrowser1.Document.GetElementsByTagName(“frame”)

    I can also refer to the HTML file in the frameset that contains the objects and information I need by using-

    I put this information in a message box and can view all of the innerHtml as one big string. What I need is to collect all the objects and loop through one at a time until I find the one desired.

    I am trying to use your collection line again, but this time addressing a particular frame HTML file. If I use the line-
    ElementCollection = WebBrowser1.Document.All(1).GetElementsByTagName(“a”)

    It returns nothing.

    Basically, I need to find a way to collect HTML elements from a website that contains multiple frames and specify which frame I want to read from.

    Any help or suggestions would be greatly appreciated.
    Thank you very much,

  5. Hi Kevin,
    Take a look at the comment just above yours. Ray had the same requirement and you can accomplish it with:


    WebBrowser1.Document.Window.Frames(1) is how you address a particular frame (the “1” is the index of the frame in the Frame collection on the page).

    You can loop through these with a simple For loop (a frame served from a different domain may throw an UnauthorizedAccessException when you access its Document):
    For i As Integer = 0 To WebBrowser1.Document.Window.Frames.Count - 1
        Dim theElementCollection As HtmlElementCollection = WebBrowser1.Document.Window.Frames(i).Document.GetElementsByTagName("a")
        For Each curElement As HtmlElement In theElementCollection
            MsgBox(curElement.GetAttribute("href").ToLower) 'Use whatever attribute you want here
        Next
    Next

  6. Try something like:

    Dim theElementCollection As HtmlElementCollection = WebBrowser1.Document.GetElementsByTagName("a")
    For Each curElement As HtmlElement In theElementCollection
        If curElement.GetAttribute("href").ToLower.Contains("/group") AndAlso curElement.GetAttribute("href").ToLower.Contains("/yguid") Then
            ' This link matches both checks; read what you need from curElement here
        End If
    Next

  7. You’ll have to post the html code for the “next page” link/button and I’ll be able to help you out.

    Thanks for reading! Feel free to Google +1 me if I’ve helped at all.

  8. Public Function GetLinksYahoo()
    Dim ScrapedData As New List(Of String)
    Dim theElementCollection As HtmlElementCollection = wbyahoo.Document.GetElementsByTagName(“a”)
    For Each curElement As HtmlElement In theElementCollection
    If curElement.GetAttribute(“href”).ToLower.Contains(“/group”) And curElement.GetAttribute(“href”).ToLower.Contains(“/yguid”) Then
    End If
    Return ScrapedData
    For Each a In ScrapedData
    End Function
    Private Sub btnmyyahoo_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles btnmyyahoo.Click
    Do Until wbyahoo.ReadyState = WebBrowserReadyState.Complete
    Call GetLinksYahoo()
    End Sub

    not working man

  9. First off, you gotta help me out with “not working”.. that could mean so many things… errors, unintended output, no output, hard drive got formatted, neighbor stole your car, etc…

    It looks like you’re doing the “Return ScrapedData” before adding the items to your Listbox. This means it will exit the function before the adding takes place (this may be the problem you’re seeing).

    Is it not finding any of the links or is it not adding it to the Listbox? Need more detail on what’s not working.

  10. Have you placed a “stop” (breakpoint) on this line: ScrapedData.Add(curElement.OuterHtml)

    Does curElement.OuterHtml hold any value?

    Place your “Return” statement after the loop:
    For Each a In ScrapedData
        ' ...add each item to your ListBox here...
    Next
    Return ScrapedData

  11. Also, you do not need the word “Call” in front of the Function/Method. It’s a holdover from older versions of VB and is optional in VB.NET. I also noticed that you’re returning the List from the function but not assigning it to anything.

    This makes me think that either A) you don’t actually need it returned for what you’re trying to accomplish (adding the items to a Listbox), or B) the difference between a Function and a Sub isn’t quite clear yet.

    If you don’t need it returned, you can change your Function to a Sub and remove the Return line.

    What is the URL of the page you are scraping? (if it is public)

    I only have MSN Messenger / Skype (sourcematters)

  12. If I use your scraperdemo, it does get my specific links, but it also gets two links that I don’t want, and it does not get the InnerText.
