Could not Fetch URL in Google Spreadsheets

I have this template to track my stock portfolio (try it out!). As the crowd is getting numb to COVID-19 (You SHOULD NOT) and the market is going up, I bought a bit more and opened mine after quite a while. And then...

Problem

The column to calculate dividend is busted!

OK, Google

Well, the error is very obvious, it says Error: Could not fetch url: <some url>. Something is wrong with the given URL.

The dividend data I used is fetched from Yahoo finance, where the typical link looks like this: https://finance.yahoo.com/quote/DIS, in which the last is a stock symbol (DIS is the symbol for Disney, yeah!). Then I used IMPORTHTML as the built-in function in Google Spreadsheets to fetch the page, after a couple parse operations, I can clean the text and get dividend of each share from the table.

I tried to access the url via browser, which works perfectly fine! Hmm... Such werid, much confusing. Then I noticed as I clicked the summary tab of the page, the URL became https://finance.yahoo.com/quote/DIS?p=DIS (noticed it appended ?p=DIS).

Maybe this one would work? I replaced the formula with an extra ?p=<stock symbol> suffix. However, it still didn't work.

OK, Google. Time to do some search. Later I found some suggestion of changing https to http. Alright, mine is already https, what if I change from https to http instead? After I changed all the URLs from https to http, it worked. I also tried only change https to http in the original url, however, it doesn't work either. To be more specific:

The worked URL:
http://finance.yahoo.com/quote/DIS?p=DIS

Those didn't work:
http://finance.yahoo.com/quote/DIS
https://finance.yahoo.com/quote/DIS
https://finance.yahoo.com/quote/DIS?p=DIS

Is it? IMPORTXML is the Cure

From the experiment, looks like I should use http? Not exactly. The new URLs failed later in the afternoon that day. Besides, it was a solution I found while shooting in the dark, I didn't really understand it.

Then I found there is another API, called IMPORTXML (XML, not HTML here). I used to think it is for XMLs, turned out it supports XPath and can get content from a URL.

Solution

Awesome! Immediately I changed the formula to IMPORTXML and query the original URL, which is https://finance.yahoo.com/quote/DIS, with an XPath as //*[@id='quote-summary']/div[2]/table/tbody/tr[6]/td[2]. It means the 2nd column of the 6th row of the tbody of the table of the 2nd div that is in an element with ID as 'quote-summary' (read from the tail).

Indeed it Worked!

The conclusion here is, unless you are dealing with a simple page, always use IMPORTXML instead! My guess is IMPORTHTML doesn't handle complicate page very well. Besides, with XPath you can do a lot more fancy work.

If you have better solutions, please leave a comment.