Screen scraping function for return YTD?

You could park the xml query off in the side of your spreadsheet and then make a reference to the cell where the TTM Yield is displayed and it will just pull that skipping the text loaded to the cell below (or above).
 
Last edited:
This is the same problem as before where there are multiple HTML elements in the page that meet the query criteria:
"//table/tbody[1]/tr[1]/td[1]/span".

If you look at the page source, you will see that these tables are inside of divs, so you could try putting a div[x] specification in front of the table spec, but you can also see that there are multiple nested divs on the page above the table you want to target, so you'd have to count all those in order to figure out what x should be. That would work, but it's fragile because I don't think we can trust Morningstar not to rearrange things and add or remove divs whenever they want to. So let's see if we can find a more robust way to get this data.

This is the HTML for the number you want to extract:
<span vkey="ttmYield" class="gr_text1">1.84%</span>

That vkey attribute looks promising because it seems to be a field name that describes the data in the span. Using Find shows that there are no other occurrences of vkey="ttmYield" on the page. So great, now we have a unique identifier for this number and all we need is an xpath query on the attribute. Fortunately that's pretty easy:

//span[@vkey='ttmYield']

Now you will only have trouble if Morningstar renames this data, which seems unlikely. Though of course they could decide to stop displaying it altogether as target2019 pointed out. If it's only available on a page that requires login, then Google's IMPORTXML function won't work.
 
//span[@vkey='ttmYield']

.
Once again Cathy workled it out. worked into the query I get the following for VTSAX which does indeed return the value we want:

=IMPORTXML("https://quotes.morningstar.com/fundq/c-header?&t=XNAS:VTSAX&region=usa&culture=en-US&version=RET&cur=&test=QuoteiFrame", "//table/tbody[1]/tr[1]/td[1]//span[@vkey='ttmYield']")

If you can give us the path to similar data on the VTI table maybe we can solve for ETFs as well. Here is the URL for the VTI table. The element we want is Yield. No vkey elements
 
Last edited:
=IMPORTXML("https://quotes.morningstar.com/fundq/c-header?&t=XNAS:VTSAX&region=usa&culture=en-US&version=RET&cur=&test=QuoteiFrame", "//table/tbody[1]/tr[1]/td[1]//span[@vkey='ttmYield']")

Actually you don't need "//table/tbody[1]/tr[1]/td[1]" in the query for this page. Just "//span[@vkey='ttmYield']" is sufficient. It's better to just use the shorter version in case the page is ever reformatted in the future.

I'll look at the VTI page and let you know what I find. I don't mind parsing the HTML as long as you find the pages that have the data. That's the part I find tedious.
 
Here's the one for VTI:

=IMPORTXML("https://etfs.morningstar.com/etfq/quote-banner?&t=ARCX:VTI&region=usa&culture=en-US&version=RET&cur=&test=QuoteiFrame
&e=eyJlbmMiOiJBMTI4R0NNIiwiYWxnIjoiUlNBLU9BRVAifQ.P4uPJX5PajdOYbzhWxMOJrjFXS4aNrDDONDV9_utUuEQy1ki5DGw62tS5wgj56
ELPTBEAzRWWJ9StootAmhSB5QtjpDW6A08N7QaZPcJ4eJ-LsgOOnhsOs0hC6TEDE8F2EcxJwlDt3H-aydYN2KmEKK7MyFe6QyvO4A4Kla5CPE.1rFZvFBp7pG9SYjZ.6lTz5KXYpz0EdOsQ9PxTy3gCYcdap1wx9tLP6t7AVnLs7H5_M0hMsEnn98sn
A85K6W8h7Jz4gIqg0gH4PivqRDKQ0ZSye4J6Q6AIIIvORV4FptwDY5ysw6dTkuKTK9Qe6MHQzL2YiTQN6436rep2pZqFrHlRCTRs_rmbtGaW0
QGQGIv5lMoc6JS8YAHPxv5_TY390HeoRnTLnlldzJhFQgVyDmvW0_g5K832EdE.v3u2HDkDsQLIxias-8R_fQ&_=1542989580222", "//span[@id='Yield']")

I added some line breaks in here because the length of the URL caused formatting problems on the forum page. If it doesn't work in Sheets, just delete the breaks in the URL.
 
Last edited:
I keep getting "#N/A" Imported xml content cannot be parsed.

This just seems to be too big a problem....and looking for the breaks is giving me a headache!! :LOL:
 
Here's the one for VTI:

=IMPORTXML("https://etfs.morningstar.com/etfq/quote-banner?&t=ARCX:VTI&region=usa&culture=en-US&version=RET&cur=&test=QuoteiFrame
&e=eyJlbmMiOiJBMTI4R0NNIiwiYWxnIjoiUlNBLU9BRVAifQ.P4uPJX5PajdOYbzhWxMOJrjFXS4aNrDDONDV9_utUuEQy1ki5DGw62tS5wgj56
ELPTBEAzRWWJ9StootAmhSB5QtjpDW6A08N7QaZPcJ4eJ-LsgOOnhsOs0hC6TEDE8F2EcxJwlDt3H-aydYN2KmEKK7MyFe6QyvO4A4Kla5CPE.1rFZvFBp7pG9SYjZ.6lTz5KXYpz0EdOsQ9PxTy3gCYcdap1wx9tLP6t7AVnLs7H5_M0hMsEnn98sn
A85K6W8h7Jz4gIqg0gH4PivqRDKQ0ZSye4J6Q6AIIIvORV4FptwDY5ysw6dTkuKTK9Qe6MHQzL2YiTQN6436rep2pZqFrHlRCTRs_rmbtGaW0
QGQGIv5lMoc6JS8YAHPxv5_TY390HeoRnTLnlldzJhFQgVyDmvW0_g5K832EdE.v3u2HDkDsQLIxias-8R_fQ&_=1542989580222", "//span[@id='Yield']")

I added some line breaks in here because the length of the URL caused formatting problems on the forum page. If it doesn't work in Sheets, just delete the breaks in the URL.
Damn, I got close. I wasn't sure of what your @ was doing but guessed it was a pointer to the quoted entry. I looked at the source and got down to [@id='Yield'] but it came back empty. When I just eliminate the table stuff it works fine.

Thank you. My sole experience with HTML was hand writing it for a home page circa 1994. I need a very simple page to make sense of it. Even these reduced M* table pages confuse me.
 
Marko. I had to put Cathy's code in Notepad+ and eliminate the page breaks. I will paste the code below and see if the forum accepts it. For VTI copied directly from my cell:

=IMPORTXML("https://etfs.morningstar.com/etfq/quote-banner?&t=ARCX:VTI&region=usa&culture=en-US&version=RET&cur=&test=QuoteiFrame&e=eyJlbmMiOiJBMTI4R0NNIiwiYWxnIjoiUlNBLU9BRVAifQ.P4uPJX5PajdOYbzhWxMOJrjFXS4aNrDDONDV9_utUuEQy1ki5DGw62tS5wgj56ELPTBEAzRWWJ9StootAmhSB5QtjpDW6A08N7QaZPcJ4eJ-LsgOOnhsOs0hC6TEDE8F2EcxJwlDt3H-aydYN2KmEKK7MyFe6QyvO4A4Kla5CPE.1rFZvFBp7pG9SYjZ.6lTz5KXYpz0EdOsQ9PxTy3gCYcdap1wx9tLP6t7AVnLs7H5_M0hMsEnn98snA85K6W8h7Jz4gIqg0gH4PivqRDKQ0ZSye4J6Q6AIIIvORV4FptwDY5ysw6dTkuKTK9Qe6MHQzL2YiTQN6436rep2pZqFrHlRCTRs_rmbtGaW0QGQGIv5lMoc6JS8YAHPxv5_TY390HeoRnTLnlldzJhFQgVyDmvW0_g5K832EdE.v3u2HDkDsQLIxias-8R_fQ&_=1542989580222", "//span[@id='Yield']")

If it works for you and you want it for another ticker, post the ticker and I will see if I can do it.
 
I keep getting "#N/A" Imported xml content cannot be parsed.

This just seems to be too big a problem....and looking for the breaks is giving me a headache!! :LOL:

@Marko -- I made a sheet that has all of the links I contributed to this thread: ImportXML Tests

It should be easier to copy/paste from there.
 
Marko. I had to put Cathy's code in Notepad+ and eliminate the page breaks. I will paste the code below and see if the forum accepts it. For VTI copied directly from my cell:

If it works for you and you want it for another ticker, post the ticker and I will see if I can do it.

It works!! But apparently only for ETFs, not regular funds.
Please! Don't spend/waste another minute on this....
 
It works!! But apparently only for ETFs, not regular funds.
Please! Don't spend/waste another minute on this....
One of the earlier ones works for mutual funds. You can see it on Cathy’s sheet. St least for Vanguard you can just change the ticker symbol. ETFs have custom links. By the way the Google Finance function for TTM works fine for funds but not for ETFs.

Playing around with this is fun.
 
I experimented a bit with those extra long URLs. It seems like everything after the ticker symbol is just controlling the output format. We don't care about fonts and whether tables have grids and whatnot, so we can just delete all that stuff.

I updated my spreadsheet to use these URLs instead of the long ones and concatenate in the exchange and symbol:

-http://performance.morningstar.com/perform/Performance/fund/trailing-total-returns.action?&t=XNAS:VTIAX
-https://quotes.morningstar.com/fundq/c-header?&t=XNAS:VTSAX
-https://etfs.morningstar.com/etfq/quote-banner?&t=ARCX:VTI

Sheet Here
 
Last edited:
Thanks Cathy. Much nicer. I checked a couple of mutual funds and all I have checked appear to use XNAS exchange. ETFs vary on the exchanges so you have to inspect the page with Chrome Development to find out where they are going. But with your shorter URLs it is very simple to format the queries.

Well, it's been fun but I can't imagine what use I will be able to make of this hard won knowledge other than these few functions. Back to cycling and books.
 
Thanks Cathy. Much nicer. I checked a couple of mutual funds and all I have checked appear to use XNAS exchange. ETFs vary on the exchanges so you have to inspect the page with Chrome Development to find out where they are going. But with your shorter URLs it is very simple to format the queries.

Well, it's been fun but I can't imagine what use I will be able to make of this hard won knowledge other than these few functions. Back to cycling and books.

Don/Cathy, thanks for all of this!
 
Thanks for this topic. I've been struggling with keeping YTD percentages up when using the google function.

I put this formula in and it works!... most of the time. Well, at least one mutual fund I have seems to return something drastically different that actual.

The fund is Fidelity® Contrafund® Fund Class K (FCNKX). When I just google the fund letters the YTD is 5.55%. When using the formula it returns .73%.

The formula I'm using is =IMPORTXML(concatenate("http://performance.morningstar.com/perform/Performance/fund/trailing-total-returns.action?&t=XNAS:",B7,"&Region=usa&culture=en-US&ops=clear&cur=&s=0P00001MJB&ndec=2&ep=true&ali g n=d&annlz=true&comparisonRemove=false&loccat=&taxa dj=&benchmarkSecId=&benchmarktype="), "//table/tbody[1]/tr[1]/td[5]")

Where b7 is FCNKX.

Am I missing something? I saw the latter post regarding exchanges, but when I used that formula I got nothing.

Thanks,

Bob D
 
Thanks for this topic. I've been struggling with keeping YTD percentages up when using the google function.

I put this formula in and it works!... most of the time. Well, at least one mutual fund I have seems to return something drastically different that actual.

The fund is Fidelity® Contrafund® Fund Class K (FCNKX). When I just google the fund letters the YTD is 5.55%. When using the formula it returns .73%.

The formula I'm using is =IMPORTXML(concatenate("http://performance.morningstar.com/perform/Performance/fund/trailing-total-returns.action?&t=XNAS:",B7,"&Region=usa&culture=en-US&ops=clear&cur=&s=0P00001MJB&ndec=2&ep=true&ali g n=d&annlz=true&comparisonRemove=false&loccat=&taxa dj=&benchmarkSecId=&benchmarktype="), "//table/tbody[1]/tr[1]/td[5]")

Where b7 is FCNKX.

Am I missing something? I saw the latter post regarding exchanges, but when I used that formula I got nothing.

Thanks,

Bob D

.73% is as of 11/21/2018.
5.55% is as of 10/31/2018.
16.91% is as of 9/30/2018.

For the current day, simplify the formula to:
=IMPORTXML(concatenate("http://performance.morningstar.com/perform/Performance/fund/trailing-total-returns.action?&t=XNAS:",B7), "//table/tbody[1]/tr[1]/td[5]")

If you want the data as of the end of previous month, 10/31/2018, use this:
=IMPORTXML(concatenate("http://performance.morningstar.com/perform/Performance/fund/trailing-total-returns.action?align=m&t=XNAS:",B7), "//table/tbody[1]/tr[1]/td[5]")

For end of previous quarter, 9/30/2018, it's this:
=IMPORTXML(concatenate("http://performance.morningstar.com/perform/Performance/fund/trailing-total-returns.action?align=q&t=XNAS:",B7), "//table/tbody[1]/tr[1]/td[5]")
 
Thanks Cathy. I am mostly looking for current YTD, so I guess as of the day (or day before) that I am running it.

Thanks for checking. Wow, I checked the link below as well as yahoo finance and they both have 5.48% YTD, which has changed from my earlier post, so I guess it has updated for todays short trading day.

1. https://www.google.com/search?rlz=1..........0i71j35i39j0i67j0i131j0i10.IJzuSLvqyrM

-OR-

2. https://finance.yahoo.com/quote/FCNTX/

But how misleading can it get! I went directly to Fidelity, logged in and checked and it states .68% YTD, which lines up with your formula results! I trust Fidelity, and now your formula, for the most accurate YTD numbers.

Thanks for your follow up. I am now more confident in the numbers and will incorporate them into my google sheet.

Bob D
 
.73% is as of 11/21/2018.
5.55% is as of 10/31/2018.
16.91% is as of 9/30/2018.

For the current day, simplify the formula to:
=IMPORTXML(concatenate("http://performance.morningstar.com/perform/Performance/fund/trailing-total-returns.action?&t=XNAS:",B7), "//table/tbody[1]/tr[1]/td[5]")


OK Cathy explain this one to me. When I look at the above formula on the original post and when I copy it to my spreadsheet it doesn't work. That is because I don't see or copy the close parenthesis, comma, quote sign and first forward slash in front of table. What I see and copy is /table... instead of ), "//table... I kept getting an error in my spreadsheet. I corrected that in Notepad+ and it works fine in the spreadsheet

But, when I quote your post in the forum (the screen I am on now) the formula shows properly.
Here is a screen capture of what I see and copy in the original post:
 

Attachments

  • CathySnip.JPG
    CathySnip.JPG
    26.7 KB · Views: 9
Last edited:
donheff,
1) There may be hidden characters. If you copy and paste into Word, and show hidden characters, you'll see it if it is there.
2) There are two editors in use on E-R. One or both has the potential to take your input and transform it to something else.
3) Backslash character is tricky. Double backslash can trigger a security input filter?
 
Well, it is even weirder than that. In my reply as I see it now the ),"/ are not present in the quote box. But when I edited the reply to add my version as follows and posted the edited reply, Cathy's ),"/ appeared in the quote box when posted. When I re-edited and pulled out my copy of the formula her "/ disappeared again. Go figure.

Edit: Now I see that it doesn't show up in the quote box even though it shows up in the preview screen. Here is my version:

=IMPORTXML(concatenate("http://performance.morningstar.com/perform/Performance/fund/trailing-total-returns.action?&t=XNAS:",B7), "//table/tbody[1]/tr[1]/td[5]")
 
Last edited:
donheff,
1) There may be hidden characters. If you copy and paste into Word, and show hidden characters, you'll see it if it is there.
2) There are two editors in use on E-R. One or both has the potential to take your input and transform it to something else.
3) Backslash character is tricky. Double backslash can trigger a security input filter?
Maybe hidden characters explains the problem I had editing in the spreadsheet. When I tried to enter the correct characters as soon as I hit return the formula changed from what I was trying to put in and I got errors. When I edited the formula in Notepad+ and pasted it in it worked fine.

Edit: I now see that my "corrected" displays improperly also.

Edit: Here is a version without the concatenate. Lets see how it displays:

=IMPORTXML("http://performance.morningstar.com/perform/Performance/fund/trailing-total-returns.action?&t=XNAS:FCNKX", "//table/tbody[1]/tr[1]/td[5]")

Edit: that displays properly. It looks like the '),"/' sequence is causing the problem.
 
Last edited:
Your browser is on the lookout for possible security breach. Google docs, same.

Quote marks can be a problem too. Sometimes you have a smart quote, and it will cause a problem in functions.
 
I don't know what to tell you, because all the posts where you pasted in the formula or it's in a reply quote look fine to me. The only one where I can see a problem is the screenshot image. I'm using the latest Chrome on a Windows 10 laptop, and for most of these formulas I just typed them in the forum editor. A few I copied/pasted from Google Sheets.

I'd check your browser and see if you have any add-ons installed that might mess with the HTML. I'm thinking of something like the Skype add-on that tries to detect phone numbers in web pages and turn them into buttons that will start a call. Or the Pinterest add-on that sticks a "save to Pinterest" icon on every image. Or look at this thread in a different browser and see if that changes things.

Just for testing:
- here is a right-paren comma double-quote forward-slash forward-slash: ),"//
- here is a right-paren comma double-quote forward-slash: ),"/
- here is a right-paren comma double-quote: ),"
- here is a right-paren comma: ),
 
It is clearly something in my browser, although I too am using Chrome and don't have any extensions that it would seem could cause the problem. It is something in the security setting since I get a Not Secure warning when I load the page. The characters are obscured in Chrome (for me) and in Edge but are not obscured in an old copy of Firefox that probably is not up to date.

Incidentally, my Chrome doesn't obscure the characters in your just testing post so the problem is somehow involved in the context of the original functions - probably the association with the overall URL and the potential for some nefarious hack from those characters.
 
Back
Top Bottom