How to Scrape Currency Data in Python
Introduction
Whether you’re building an application that needs to scrape data or are just interested in learning how it works, this is the perfect tutorial for you.
We’ll be going over how to scrape currency data from investopedia.com.
If you haven’t done so yet, read Investopedia’s guide to the basics of forex markets and exchange rates [here](https://www.investopedia.com/terms/f/forexmarketsandexchangerates.asp) before continuing with this tutorial.
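First, we’ll import our libraries: requests and BeautifulSoup
The first step is to import our libraries: requests and BeautifulSoup (installed as the beautifulsoup4 package and imported from bs4). We’ll also create a variable called url that holds the URL of the page with the currency data (the site we’re scraping).
```python
import json  # not used in the steps below; handy if you want to save results as JSON

import requests
from bs4 import BeautifulSoup

url = "https://investopedia.com/link-to-currency-pairs"  # placeholder; replace with the page you want to scrape
```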
We’ll be using the popular USD/EUR currency pair, but you can use any currency pair
Next, we need to decide which data to scrape. We’ll use the USD/EUR currency pair, but it doesn’t matter if you’d rather convert USD to GBP, USD to AUD, USD to CAD, and so on, as long as the pair is available on the page you’re scraping.
I’ll use USD/EUR for this tutorial because it’s simple and most people are familiar with that exchange rate.
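Only the pair changes from one run to the next, so it helps to keep it in a single variable. The snippet below is a small sketch built around a hypothetical helper, `parse_pair`, which splits a pair such as USD/EUR into its base and quote codes. The helper name and the three-letter-code check are our own assumptions, not something the site requires.

```python
def parse_pair(pair: str) -> tuple[str, str]:
    """Split a pair such as "USD/EUR" (or "usd-eur") into (base, quote) codes."""
    parts = pair.upper().replace("-", "/").split("/")
    # Assumption: both sides are three-letter alphabetic codes like USD or EUR
    if len(parts) != 2 or not all(len(p) == 3 and p.isalpha() for p in parts):
        raise ValueError(f"expected a pair like 'USD/EUR', got {pair!r}")
    base, quote = parts
    return base, quote


# Swap in any pair you like; only this string needs to change
PAIR = "USD/EUR"
base, quote = parse_pair(PAIR)
print(base, quote)  # USD EUR
```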
Now we’ll query the Investopedia website
We’ll use the requests package we imported to query the Investopedia website, and BeautifulSoup to parse the returned HTML into a tree of Python objects that we can search. We’ll also create an empty investment_object variable to hold our collection of dictionary objects. Now let’s query Investopedia, grab the HTML of the page stored in url, and run the script:
```shell
$ python scrape_investing.py
```
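The query itself isn't shown above, so here is a minimal sketch of that step using requests. `fetch_html` is a hypothetical helper name, and the URL is the placeholder from the import step. The User-Agent header and the 10-second timeout are assumptions you may need to adjust for the site you're scraping.

```python
import requests

# Placeholder from the import step; replace with the real page URL
url = "https://investopedia.com/link-to-currency-pairs"


def fetch_html(page_url: str, timeout: float = 10.0) -> str:
    """Download a page and return its HTML as text."""
    # Assumption: some sites reject requests that lack a browser-like User-Agent
    headers = {"User-Agent": "Mozilla/5.0 (compatible; currency-scraper-tutorial)"}
    response = requests.get(page_url, headers=headers, timeout=timeout)
    response.raise_for_status()  # raise requests.HTTPError for 4xx/5xx responses
    return response.text
```

Calling `html = fetch_html(url)` in your script downloads the page. Because of `raise_for_status()`, a failed request stops the script instead of passing an error page on to the parser.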
Now that we’ve queried the website, let’s parse its HTML with Beautiful Soup
In this section, we’ll use the BeautifulSoup library to parse the HTML obtained by our web scraping script. We’ll also make use of pandas, a Python library that makes it easy to work with tabular data (like CSV files).
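To get started, make sure our dependencies are imported: requests, beautifulsoup4 (imported from bs4), and pandas. Then we pass the HTML returned by our request to BeautifulSoup and store the parsed result in a variable called soup. If you’d rather work offline, you can save the page from your browser as an HTML file and read that file’s contents instead.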
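Here is a minimal sketch of creating `soup`. The HTML below is a made-up stand-in for the downloaded page: it is not Investopedia's real markup, and the rate is a dummy value. In your script you'd pass the text returned by the request instead.

```python
from bs4 import BeautifulSoup

# Made-up sample HTML standing in for the downloaded page (dummy values)
html = """
<html><body>
  <table id="rates">
    <tr><th>Pair</th><th>Rate</th></tr>
    <tr><td>USD/EUR</td><td>0.9</td></tr>
  </table>
</body></html>
"""

# "html.parser" is Python's built-in parser, so no extra install is needed
soup = BeautifulSoup(html, "html.parser")
print(soup.find("table")["id"])  # rates
```

If you have lxml installed, `BeautifulSoup(html, "lxml")` is a faster alternative to the built-in parser.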
Now we’ve got to find the appropriate table to use within the HTML
The right table depends on which currency data we need. If you’re looking for rates quoted against USD, for example, look for a table whose rows are currency pairs that include USD (like USD/EUR). If you want all of a country’s exchange rates, or just its rate against USD, look for a table that lists the rates alongside a column with the currency names.
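One rough way to automate that search is to check each table's text for the currency code you care about. This is only a sketch: `find_currency_table` is a hypothetical helper, and the sample tables are invented. A plain substring test also matches longer codes that contain the one you pass (USDT contains USD, for example), so you may want a stricter check against a specific column.

```python
from typing import Optional

from bs4 import BeautifulSoup
from bs4.element import Tag


def find_currency_table(soup: BeautifulSoup, code: str = "USD") -> Optional[Tag]:
    """Return the first <table> whose text mentions the currency code, or None."""
    for table in soup.find_all("table"):
        # Join cell text with spaces and compare case-insensitively
        if code.upper() in table.get_text(" ", strip=True).upper():
            return table
    return None


# Invented sample page with two tables; only the second one lists USD pairs
sample = """
<table id="coins"><tr><th>Coin</th><th>Price</th></tr><tr><td>BTC</td><td>1</td></tr></table>
<table id="fx"><tr><th>Pair</th><th>Rate</th></tr><tr><td>USD/EUR</td><td>0.9</td></tr></table>
"""
soup = BeautifulSoup(sample, "html.parser")
table = find_currency_table(soup, "USD")
print(table["id"])  # fx
```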
Now that we’ve got the right table, let’s find all of our headers within that table
We’ll do this with a for loop that goes through each row of currency_headings_table and checks whether each column name is already a key in our headers dictionary. If it is, we append the row index to that key’s list; otherwise, we add a new key-value pair whose value is a list containing that index:
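```python
# currency_headings_table is assumed to be the BeautifulSoup <table> Tag found in the previous step
headers = {}  # column name -> list of row indices where that header appears
for i, row in enumerate(currency_headings_table.find_all("tr")):
    for cell in row.find_all("th"):
        col_name = cell.get_text(strip=True)
        if col_name in headers:
            headers[col_name].append(i)  # seen before (e.g. a repeated header row): add this row index
        else:
            headers[col_name] = [i]  # first occurrence: start a new list
```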
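The headers dictionary tells us which columns the table has. A common way to collect the data rows themselves is to zip each row's cells with the header names, which gives one dictionary per row that pandas can load directly. The sketch below assumes the first row holds the `<th>` header cells and the data rows use `<td>`. `table_to_records` and the sample table are hypothetical.

```python
from bs4 import BeautifulSoup


def table_to_records(table) -> list[dict[str, str]]:
    """Turn a <table> into one dict per data row, keyed by the <th> header names."""
    rows = table.find_all("tr")
    # Assumption: the first row holds the <th> header cells
    header_names = [th.get_text(strip=True) for th in rows[0].find_all("th")]
    records = []
    for row in rows[1:]:
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        if len(cells) == len(header_names):  # skip malformed or spacer rows
            records.append(dict(zip(header_names, cells)))
    return records


# Invented sample table with dummy rates
sample = """
<table>
  <tr><th>Pair</th><th>Rate</th></tr>
  <tr><td>USD/EUR</td><td>0.9</td></tr>
  <tr><td>USD/GBP</td><td>0.8</td></tr>
</table>
"""
table = BeautifulSoup(sample, "html.parser").find("table")
print(table_to_records(table))
# [{'Pair': 'USD/EUR', 'Rate': '0.9'}, {'Pair': 'USD/GBP', 'Rate': '0.8'}]
```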
Conclusion
Now that we have our final_dict dictionary, we can convert its values into something more usable. We’ll use pandas to do this, then save the data as a CSV file so that you can open it in Excel or Google Sheets and use it however you want. Alternatively, you can use an exchange rate API, which will save you a lot of trouble and often costs only a few dollars.
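Here is a minimal sketch of that pandas step. It assumes the scraped rows are a list of dictionaries, like the ones a row-by-row loop over the table produces. The rows are made-up values, and the output filename `currency_rates.csv` is an arbitrary choice.

```python
import pandas as pd

# Made-up rows in the shape produced by the scraping steps (dummy values)
records = [
    {"Pair": "USD/EUR", "Rate": "0.9"},
    {"Pair": "USD/GBP", "Rate": "0.8"},
]

df = pd.DataFrame(records)
# Scraped cells are text; convert rates to floats (unparseable values become NaN)
df["Rate"] = pd.to_numeric(df["Rate"], errors="coerce")
df.to_csv("currency_rates.csv", index=False)  # opens in Excel or Google Sheets
print(df)
```

`index=False` keeps pandas from writing its row numbers as an extra first column in the CSV.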