Description
Sometimes we need to extract an image from inside a table on a web page. In this quick tutorial, you will learn how to interact with and extract such elements.
Instructions
Follow these steps:
1. Look at the page structure and check how the image is displayed inside the table. Typically, it is contained in a tag such as <img src='sourcepage'>.
In the image below, you can see an <img src=""> tag in each row of the table.
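If you want to confirm the page structure outside the automation tool, the check in step 1 can be sketched in Python using only the standard library. This is an illustrative sketch, not part of the workflow: the class and function names, the sample HTML, and the example.com URLs are all made up for the example. It collects the src attribute of every img tag that sits inside a table and ignores images elsewhere on the page.

```python
from html.parser import HTMLParser


class TableImageCollector(HTMLParser):
    """Collect the src attribute of every <img> nested inside a <table>."""

    def __init__(self):
        super().__init__()
        self.table_depth = 0  # greater than 0 while the parser is inside a <table>
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.table_depth += 1
        elif tag == "img" and self.table_depth > 0:
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)

    def handle_endtag(self, tag):
        if tag == "table" and self.table_depth > 0:
            self.table_depth -= 1


def extract_table_image_sources(html):
    """Return the image sources found inside tables, in document order."""
    collector = TableImageCollector()
    collector.feed(html)
    collector.close()
    return collector.sources


# Hypothetical sample page: one image outside the table, two inside it.
SAMPLE_HTML = """
<img src="logo.png">
<table>
  <tr><td>First</td><td><img src='https://example.com/images/first.png'></td></tr>
  <tr><td>Second</td><td><img src="https://example.com/images/second.png"/></td></tr>
</table>
"""

print(extract_table_image_sources(SAMPLE_HTML))
```

Note that this sketch only sees the HTML it is given. If the table lives inside an iframe, the HTML of the iframe's own document would have to be passed in instead.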
2. Create the workflow to extract the image from that table. We can divide our workflow into three main sections.
In section 1, you open the website and perform the required navigation. Note that the website in this case uses an iframe, so you must locate the iframe before doing the extraction.
Section 2 uses the Get element property activity to extract the src property from the img tag. To do so, use the Browser Picker to select the element on the web page. In the example, the selector returns two elements, which are the two images on the web page.
In the properties of the Get element property activity, select src in the "Property name" field.
The result is an array with the image URLs in it:
In section 3, you download the images. To do that, loop through the array returned by the Get element property activity and use the Copy file activity.
curl <url> -o <output_path>
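For comparison, the download loop can also be sketched in Python with the standard library. This is a minimal sketch under stated assumptions, not the implementation of the Copy file activity: the function names are hypothetical, the file name is simply taken from the last segment of each URL, and two URLs that end in the same file name would overwrite each other.

```python
import os
import shutil
import urllib.request
from urllib.parse import unquote, urlparse


def file_name_from_url(url, fallback="image"):
    """Derive a local file name from the last path segment of the URL."""
    name = os.path.basename(unquote(urlparse(url).path))
    return name or fallback


def download_images(urls, output_dir):
    """Download every URL in the list into output_dir and return the saved paths."""
    os.makedirs(output_dir, exist_ok=True)
    saved = []
    for url in urls:
        target = os.path.join(output_dir, file_name_from_url(url))
        # Stream the response to disk, the equivalent of: curl <url> -o <output_path>
        with urllib.request.urlopen(url) as response, open(target, "wb") as out:
            shutil.copyfileobj(response, out)
        saved.append(target)
    return saved
```

A call such as download_images(image_urls, "downloads") would take the array produced in the previous section (here named image_urls for illustration) and write one file per URL into the downloads folder.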
The final result is simple: the image files are downloaded to the specified path.