Formula to Retrieve Earnings Date

Crusader (joined Jul 8, 2020)
I am requesting help in putting together an “=ImportHTML” formula for Google Sheets to get the “Earnings Date” for ticker symbol EA (or any other valid ticker symbol) from CNBC (https://www.cnbc.com/quotes/?symbol=EA&qsearchterm=). In this case, I am trying to import the date “2020-07-30.”


The formula, as I understand it, will be similar to the one below:


=IMPORTHTML("http://www.cnbc.com/quotes/?symbol=ea&qsearchterm=" & A4, "//table/tbody[1]/tr[3]/td[1]")


My time and effort have yet to yield results. Suggestions would be greatly appreciated.
 

I'm no expert on this Sheets function, but based on a little reading and testing, it looks like your syntax should be more like (and this still won't work as you desire, but it does retrieve data from the website)...

=IMPORTHTML("http://www.cnbc.com/quotes/?symbol=ea", "TABLE", 1)

"TABLE" must be explicitly used as a parameter, assuming you wish to retrieve a table of data. If you wish to retrieve a list from the website, you would use "LIST" as the parameter instead.

The third parameter must be a number. If you wish it to be a derived number, you would have to derive it first in a different cell of the spreadsheet and then reference that cell in the formula.

I found data from that website using tables 1 and 5 only. Tables 2, 3, and 4 appear to be empty. Anything above 5 doesn't seem to be a valid table on the page (it gives an error).

I suspect they are doing fancy stuff on the website that makes it difficult or impossible to retrieve all the data on it. For example, I found a reference saying you can only retrieve data from the webpage as it first displays on loading. It's possible that some of the data on that page is rendered in a secondary call to some data source.

Again, no expert here. But that's my two cents, since no expert has chimed in yet with an answer.
 
This is not going to work the way you're trying to do it. If you select "view source" in your browser, you'll see that the actual data you want is in a JSON string like this:

var symbolInfo = {"symbol":"EA","curmktstatus":"POST_MKT","mappedSymbol":{"xsi:nil":"true"},"source":"Last NASDAQ LS, VOL From CTA","EventData":{"yrhiind":"N","announce_time":"A","next_earnings_date_today":"N","yrloind":"N","next_earnings_date":"2020-07-30","is_halted":"N"},...

When your browser loads the page, it executes some JavaScript code that parses that JSON string and inserts the data elements into the tables you see.

When Google Sheets pulls that page with IMPORTHTML, it gets the raw HTML and doesn't run the JavaScript to recreate the tables. So when you ask it to retrieve the table you want, it can only get the placeholders, which are empty.
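To make the point above concrete, here is a minimal Python sketch of pulling the date straight out of that embedded JSON string instead of the (empty) tables. The sample HTML is trimmed from the "var symbolInfo = ..." line quoted above; the real page has more fields, and this simple regex assumes the object ends at the first "};".

```python
import json
import re

# Trimmed sample of the "var symbolInfo = {...};" line from the page source.
# The real page contains many more fields; this is just for illustration.
raw_html = (
    'var symbolInfo = {"symbol":"EA","curmktstatus":"POST_MKT",'
    '"EventData":{"announce_time":"A","next_earnings_date":"2020-07-30",'
    '"is_halted":"N"}};'
)

def extract_earnings_date(html: str) -> str:
    """Pull the symbolInfo JSON object out of the raw page source and
    return its next_earnings_date field."""
    match = re.search(r"var symbolInfo = (\{.*?\});", html)
    if match is None:
        raise ValueError("symbolInfo JSON not found in page source")
    info = json.loads(match.group(1))
    return info["EventData"]["next_earnings_date"]

print(extract_earnings_date(raw_html))  # 2020-07-30
```

Google Sheets can't run arbitrary Python, but the same idea could be done in an Apps Script custom function: fetch the raw HTML, find the JSON, and parse it, bypassing the JavaScript-rendered tables entirely.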
 
I have forgotten everything I learned in that thread. Now I just beg cathy63 to solve my problems. :)
 
Thank you everyone for support/suggestions.

I am totally with cathy63: pulling this information the "standard" way will not work - JSON technology has made sure of that. I looked at the link posted by target2019 for scraping similar data from Morningstar; converting the formula from "returnytd" to earnings date is way beyond my skill level - but I will look into it. Until then, my manual operation lives on.

I am open to using Morningstar or another reliable website to get the information I need.
 
Thank you, target2019.

I am not skilled enough to put together a complex importHTML formula; however, it looks like I have a better chance of a formula with https://money.cnn.com.

Will share the formula - if I come up with one. Until then, please keep the suggestions coming.
 
Crusader,
I could not find the earnings date on those Morningstar pages.

However, this page (and Yahoo) have the date.
https://money.cnn.com/quote/quote.html?symb=EA

I am just not savvy enough to create the importHTML or importXML statement.

I can, however, easily get an error. LOL.

The Yahoo Finance page has the same problem as the CNBC page where the data is loaded from JSON. But, the CNN page works! Here's a function that returns the date.

=importxml("https://money.cnn.com/quote/quote.html?symb=EA","//*[@id='wsod_snapshotView']/div[5]/div[2]/div/table/tr[1]/td[2]")

I have no idea why CNN has the earnings report set for one day later than everyone else, since it appears the report will actually be after market close on 7/29, but hopefully Crusader doesn't need exact accuracy on this. Thanks for suggesting these other pages, target2019.

(If copy/paste of the above text does not work, it may be due to your browser suppressing some characters when displaying code in forum posts, which has been a problem in the past. Refer to the image below for the exact function.)
 

Attachments

  • Annotation 2020-07-18 173511.jpg

Thank you and God Bless You, cathy63: your formula worked!

Also thanks to target2019 and others for suggestions and support.

I consider this post "resolved." Since I am not familiar with this forum, I don't know if I can mark it as such.
 
You're welcome, I'm happy to have helped! This is more of a discussion forum, rather than a tech support site, so we don't have a "resolved" button or anything like that.

Now that you've found us, I hope you'll stick around, introduce yourself, and explore more of the discussions about early retirement.
 
This is a great way to end the weekend, after picking away at the various import functions in google sheets. I was able to add yet one more cathy63 function to my finance functions worksheet.

One thing I found was that browsers have an "Inspect" command. Right-click on the web page somewhere and you'll see it.

It can be frustrating looking through the developer tools. But I found the element in that source which cathy63 is referencing. Of course her function works, but I am confused by the following.

Her xpath (this is the 2nd part of her importXML function):
Code:
//*[@id='wsod_snapshotView']/div[5]/div[2]/div/table/tr[1]/td[2]
Brave browser - "copy xpath" while selecting the earnings date in the source:
Code:
//*[@id="wsod_snapshotView"]/div[4]/div[2]/div/table/tbody/tr[1]/td[2]
I get why a single quote is used instead of double quote.
I understand that tbody isn't necessary.

But what of the "div[4]"? The working function uses "div[5]"...
:facepalm:
 
I got stuck on this same issue yesterday when I was working on it. I used Chrome's dev tools to grab the XPath and I got the same one you did, with div[4], and that didn't work, which was puzzling because looking at the HTML it seemed correct.

To debug it, I tried "//*[@id='wsod_snapshotView']/div", which returns all the divs under the selected id. Now I could see that the data I wanted was actually in the second column of row 5, so once I had that it wasn't too hard to build a working XPath query with div[5] instead of [4].

My theory about why this happens is that the ImportXML function is actually executing on Google's servers, not in the browser that's running Google Sheets. So some Google server is sending a GET request to the server at https://money.cnn.com and I suspect the cnn.com server has some logic that formats the response a little differently for each browser type. The Google server probably doesn't identify itself as Chrome, so it doesn't get back exactly what we see in the dev tools window.
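That theory is easy to illustrate with a toy example. In the Python sketch below, both documents are invented for the demo (they are not the actual CNN markup); the point is just that if the server emits one extra div (say, an ad wrapper) for some clients but not others, the same positional XPath lands on a different element.

```python
import xml.etree.ElementTree as ET

# Markup as the browser's dev tools show it: the earnings cell is the
# 4th div child. (Invented for the demo, not the real CNN page.)
as_browser_sees_it = """
<div id="wsod_snapshotView">
  <div>header</div><div>chart</div><div>news</div>
  <div><span>EARNINGS 7/30</span></div>
</div>"""

# Markup as the server might send it to an unrecognized client, with one
# extra div inserted: the same cell is now the 5th child.
as_server_sends_it = """
<div id="wsod_snapshotView">
  <div>header</div><div>chart</div><div>ad</div><div>news</div>
  <div><span>EARNINGS 7/30</span></div>
</div>"""

browser_root = ET.fromstring(as_browser_sees_it)
server_root = ET.fromstring(as_server_sends_it)

# The browser's "copy XPath" counts children in the page it rendered...
print(browser_root.find("./div[4]/span").text)  # EARNINGS 7/30
# ...but against what the server sends, the data sits one position later.
print(server_root.find("./div[5]/span").text)   # EARNINGS 7/30
```

So a copied-from-dev-tools XPath is a starting point, not a guarantee; when it misses, listing all the divs (as described above) is a good way to find the real index.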
 
This also works.
Code:
=importxml("https://money.cnn.com/quote/quote.html?symb=EA","//*[@class='wsod_dataTable wsod_dataTableBig wsod_quoteFinancialsModule']/tr[1]/td[2]")
Have to be more careful using @class, though.

To continue with the question of why div[5] vs. div[4], I think it could be that once the selector (//) finds the string, the Google importXML function counts end tags (/div). But the 'copy xpath' feature in browser inspector is counting opening tags (div).

In any event, we are all good.
 
Damn, you guys are dragging me back into the Chrome Developer Tools again.
 
Thank you cathy63; I plan to visit often and learn.
 
I found another working formula. Unfortunately I lost the web page that had a neat description of ETF info and how to fetch YTD performance from the site below.
Code:
=index(IMPORTXML("http://www.etf.com/AOR/table","//span[@class='cell col4 perfYtd']"),1,1)

You'll have to change out AOR for your ETF symbol. If you go to the web page source, it is easy enough to find other information in tables on the page.

One error I ran into was not seeing the capital 'Y' in that formula. So be careful with any changes you make.

Here's 1-year performance:
Code:
=index(IMPORTXML("http://www.etf.com/AOR/table","//span[@class='cell col5 perf1Yr']"),1,1)
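The warning about being careful with @class (and with that capital 'Y') is because an XPath class predicate is an exact string match, not a CSS-style class lookup. A small Python sketch, using invented markup rather than the real etf.com page:

```python
import xml.etree.ElementTree as ET

# Invented snippet mimicking the span/class structure described above;
# not the actual etf.com markup.
snippet = """
<div>
  <span class="cell col4 perfYtd">12.3%</span>
  <span class="cell col5 perf1Yr">8.7%</span>
</div>"""

root = ET.fromstring(snippet)

# [@class='...'] only matches if the attribute value is identical,
# capitalization and spacing included.
print(root.find(".//span[@class='cell col4 perfYtd']").text)  # 12.3%
# A lowercase 'y' matches nothing at all -- the capital 'Y' matters.
print(root.find(".//span[@class='cell col4 perfytd']"))       # None
```

The same exact-match rule applies inside IMPORTXML, which is why a one-letter case change silently returns an error instead of the number you wanted.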
 
I was "caught" sending a link in one of my posts; served time (3+ weeks). :facepalm: I am back, but I understand I am still under "suspicion" until I post X number of [clean] messages.

Hopefully, this is one of them: if it goes through, I will post more addressing the formula cathy63 so graciously shared.
 
I just skimmed this thread out of curiosity. This is basically "screen scraping," which is highly likely to break the next time the web site tweaks its UI. So the solution that works today may not work next month or next year.

A better solution would be, if possible, to find a site that publishes an API and fetch the information that way. Still no guarantee that it will not break in the future, but it would almost certainly not break as frequently. You might want to muck around at CRSP (The Center for Research in Security Prices) to see if the information is available and you can find a way to access it without spending a bunch of money.
 
Screen scraping refers to literally transferring the screen contents to a processing application.
https://www.techopedia.com/definition/16597/screen-scraping

The formulas and methods in this thread use APIs and built-in functions of Google Sheets, etc. The emphasis is on free and DIY.

All data exchange methods break eventually, even costly subscriptions. There are really long threads on B*Heads about these matters, so there is enough interest to resolve the breaks and find a fix, or even another function.

"The Journey is the Reward" is how I think of this. Even though the web page source has become intricate, to say the least, I still tinker behind the scenes by viewing source and looking for information about what produced the viewed results.
 