How does the site know what part of the list to serve up? Clicking on the link for 'D' gives us this address: So the only difference in the address is the letters themselves.
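Since only the letter changes between these addresses, the full set can be generated instead of clicked through. The following is a minimal sketch under stated assumptions: `BASE_URL` is a placeholder rather than Pfizer's real address, `letter_url` is a helper name invented for illustration, and the letter is assumed to be passed as the value of the `enPdNm` parameter that appears in the list address.

```ruby
require 'uri'

# ASSUMPTION: placeholder base address, not the real Pfizer URL.
BASE_URL = 'http://example.com/list'

# Hypothetical helper: build the list address for one letter,
# assuming the letter is the value of the enPdNm parameter.
def letter_url(letter, base = BASE_URL)
  "#{base}?#{URI.encode_www_form('enPdNm' => letter)}"
end

# One address per letter of the alphabet.
letter_urls = ('A'..'Z').map { |letter| letter_url(letter) }
puts letter_urls.first(4)
```

`URI.encode_www_form` is used rather than plain string concatenation so that any value needing escaping would still produce a valid query string.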
If you click through other pages of the list, you'll see that the one constant is this: With this as the base address, what follows the question mark are the parameters that tell Pfizer's website what results to show. For the list of alphabetized names, the name of the parameter is enPdNm. So just as it's possible to skip the link-clicking part to navigate the list, it's possible to automate the changing of these parameters instead of manually typing them.

The particular datapoint you want will always be nested within the same tags. Finding a webpage's structure is best done through your browser's web development tools, or plugins such as Firefox's immensely useful Firebug.

The Pfizer website gives us the option of paging through their entire list, from start to finish, 10 entries at a time. Clicking on the "Last" link reveals a couple of things about the website: Upon closer inspection of the page, you'll notice that the names in the left-most column don't refer to health care providers, but to the person to whom the check was made out. In most cases, it's the same name as the doctor; in others, it's the name of the clinic, company, or university represented by the doctor. The right column lists the actual doctor and provides a link to a page that includes the breakdown of payments for that doctor. In this list view, however, the payment details are not always the same as what's shown on each doctor's page.

In your browser's address bar, you'll notice that the website incorporates a new parameter, called "hcpdisplayName".

Adding on the iPageNo parameter will get us from page to page.

    enPdNm=All&iPageNo='

    # We found this by looking at Pfizer's listing
    LAST_PAGE_NUMBER = 485

    # create a subdirectory called 'pfizer-list-pages'
    LIST_PAGES_SUBDIR = 'pfizer-list-pages'
    Dir.mkdir(LIST_PAGES_SUBDIR) unless File.exist?(LIST_PAGES_SUBDIR)

    # So, from 1 to 485, we'll open the same address on Pfizer's site,
    # but change the last number
    for page_number in 1..
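The code in this section is cut off before the loop body, so here is a minimal sketch of how the download step might look. It is an illustration, not the original script: `BASE_LIST_URL` is a placeholder address, `list_page_url` and `save_list_pages` are invented helper names, and the one-file-per-page naming and the pause between requests are assumptions.

```ruby
require 'net/http'
require 'uri'
require 'fileutils'

# ASSUMPTION: placeholder address; substitute the real list address,
# which should end with the iPageNo parameter.
BASE_LIST_URL = 'http://example.com/list?enPdNm=All&iPageNo='
LAST_PAGE_NUMBER = 485
LIST_PAGES_SUBDIR = 'pfizer-list-pages'

# Hypothetical helper: the address for one page is the base address
# with the page number appended.
def list_page_url(page_number, base = BASE_LIST_URL)
  "#{base}#{page_number}"
end

# Hypothetical helper: fetch each page in `pages` and save it as
# <dir>/<page_number>.html. `fetch` is any callable that takes a URL
# and returns the page body, so it can be swapped out for testing.
def save_list_pages(pages, dir = LIST_PAGES_SUBDIR, pause: 1,
                    fetch: ->(url) { Net::HTTP.get(URI(url)) })
  FileUtils.mkdir_p(dir)
  pages.map do |page_number|
    path = File.join(dir, "#{page_number}.html")
    File.write(path, fetch.call(list_page_url(page_number)))
    sleep(pause) # wait between requests to go easy on the server
    path
  end
end

# Usage (makes real HTTP requests, so it is left commented out):
# save_list_pages(1..LAST_PAGE_NUMBER)
```

Saving the raw pages to disk first means the parsing step can be rerun and debugged without downloading all 485 pages again.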