I have this template to track my stock portfolio (try it out!). As the crowd is getting numb to COVID-19 (You SHOULD NOT) and the market is going up, I bought a bit more and opened mine after quite a while. And then...
Problem
The column to calculate dividend is busted!
OK, Google
Well, the error is very obvious, it says Error: Could not fetch url: <some url>
. Something is wrong with the given URL.
The dividend data I used is fetched from Yahoo finance, where the typical link looks like this:
https://finance.yahoo.com/quote/DIS
, in which the last is a stock symbol (DIS is the symbol for Disney, yeah!). Then I used IMPORTHTML
as the built-in function in Google Spreadsheets to fetch the page, after a couple parse operations, I can clean the text and get dividend of each share from the table.
I tried to access the url via browser, which works perfectly fine! Hmm... Such werid, much confusing. Then I noticed as I clicked the summary tab of the page, the URL became https://finance.yahoo.com/quote/DIS?p=DIS
(noticed it appended ?p=DIS
).
Maybe this one would work? I replaced the formula with an extra ?p=<stock symbol>
suffix. However, it still didn't work.
OK, Google. Time to do some search. Later I found some suggestion of changing https to http. Alright, mine is already https, what if I change from https to http instead? After I changed all the URLs from https to http, it worked. I also tried only change https to http in the original url, however, it doesn't work either. To be more specific:
The worked URL:
http://finance.yahoo.com/quote/DIS?p=DIS
Those didn't work:
http://finance.yahoo.com/quote/DIS
https://finance.yahoo.com/quote/DIS
https://finance.yahoo.com/quote/DIS?p=DIS
Is it? IMPORTXML is the Cure
From the experiment, looks like I should use http? Not exactly. The new URLs failed later in the afternoon that day. Besides, it was a solution I found while shooting in the dark, I didn't really understand it.
Then I found there is another API, called IMPORTXML
(XML, not HTML here). I used to think it is for XMLs, turned out it supports XPath and can get content from a URL.
Solution
Awesome! Immediately I changed the formula to IMPORTXML
and query the original URL, which is https://finance.yahoo.com/quote/DIS
, with an XPath as //*[@id='quote-summary']/div[2]/table/tbody/tr[6]/td[2]
. It means the 2nd column of the 6th row of the tbody of the table of the 2nd div that is in an element with ID as 'quote-summary' (read from the tail).
Indeed it Worked!
The conclusion here is, unless you are dealing with a simple page, always use IMPORTXML
instead! My guess is IMPORTHTML
doesn't handle complicate page very well. Besides, with XPath you can do a lot more fancy work.
If you have better solutions, please leave a comment.