IMPORTXML Function – Google Sheets

The IMPORTXML function brings XML data from the web into your Google Sheet. Once the XML comes into the spreadsheet, you can use it in other functions. Copy the spreadsheet with the examples into your Google Drive to follow along.

The XML language contains structured data. HTML, the language of the web, is a form of XML. Other XML formats include CSV, TSV, RSS, and ATOM feeds.

ℹ️ The XML data must be available to anyone and therefore cannot be protected by a username and password.

Syntax

=IMPORTXML(URL, XPath_query, [locale])

  • URL – Website address of the XML data
  • XPath_query – The location of the data within the website. This query uses the XPath language.
  • [locale] – Optional language or region code. If this is blank, the function uses the URL’s locale.

Related Functions

IMPORTRANGE – Import data from another Google Sheet

IMPORTHTML – Import an HTML table or list

IMPORTDATA – Import data from the web in the CSV or TSV format

IMPORTFEED – Import an RSS or ATOM feed

Errors

#N/ACould not fetch URL – The site with the data is blocking access. You cannot fix this problem inside Google Sheets.

Video Tutorial

Video thumbnail
YouTube Video about IMPORTXML

Examples

Example 1 – Use IMPORTXML to Import a List of Holidays

In this example, we will import an official list of federal holidays using a government site.

The IMPORTXML formula bring in holidays
List of Holidays

=IMPORTXML("https://www.opm.gov/policy-data-oversight/pay-leave/federal-holidays/#url=2022","/html[1]/body[1]/div[1]/div[1]/div[3]/form[2]/section[3]/div[1]/table[1]/tbody[1]/tr/td[1]")

The formula looks complicated. However, there are only two components. The URL and the XPath_query.

First, the source website is https://www.opm.gov/policy-data-oversight/pay-leave/federal-holidays/#url=2022

Secondly, for the URL, the trick is finding a site that won’t block the formula. The best way to tell is try the IMPORTXML function and hope you don’t get a Could not fetch URL error. Remember to surround the URL (and the XPath_query) with quotes.

ℹ️ In subsequent years, we should be able to change the year (2022) at the end of the URL, and the holidays will update automatically.

On the other hand, the XPath_query is more challenging to get than the URL. I used a chrome extension called Selectors hub. Once installed, I copied the XPath for the first column in the table of dates.

Using Selectors Hub to copy the XPath
Copying the XPath

Using this tool, we get an XPath of /html[1]/body[1]/div[1]/div[1]/div[3]/form[2]/section[3]/div[1]/table[1]/tbody[1]/tr/td[1]. You can read more about XPath here, but this code is telling Google Sheets to get the first column (td[1]) from each row (tr) in the table.

XML data coming into two columns
Data in Two Columns

The function placed the data from the column on the web into two columns in our spreadsheet. We’ll take this as a gift because this leaves the dates as valid numbers in the first column that we can use in other functions such as WORKDAY, WORKDAY.INTL, NETWORKDAYS, and NETWORKDAYS.INTL.

Example 2 – Use IMPORTXML to List All Links on a Website

Next, we will use IMPORTXML to pull the links from a page. For the best example, of course, we will use this website’s functions page at https://sheetshelp.com/functions/!

Sheetshelp page with links pointed out
Links on sheetshelp.com

As you can see, there are several links on the functions page. We will use the XPath of //a/@href, which grabs the a elements with their link text.

The output of the IMPORTXML function
Links Exported from sheethelp.com/functions

=IMPORTXML("https://sheetshelp.com/functions/","//a/@href")

The links include every link in the source code, including a link to #content. The #content link is for accessibility using a keyboard; you cannot see it without using the Tab key.

Conclusion

Clearly, the applications for IMPORTXML are endless, but the general technique is the same. Find a URL that works and grab the correct XPath. After doing that, you can bring XML into your sheet.

Leave a Comment