Content Extraction

Learn how to parse response data and extract content from it via JSONPath, XPath, or regular expressions for use in subsequent requests.


Content extraction is the ability to parse response data and extract content from it. This comes in handy when subsequent requests need dynamically generated content, such as access tokens.

To extract content from a response, set the extraction option in the request options parameter. You can define more than one content extraction per request. For each extraction:

- specify the desired type (for example jsonpath, xpath, regexp, or header)
- specify the name used to reference the extracted value
- specify the expression to evaluate

For example:

session.get("/tokens", {
  extraction: {
    jsonpath: {
      "accessToken": "authorization.token",
      "checksum": "authorization.checksum"
    }
  }
});

To use the extracted content in a subsequent request, read it with the getVar function.

session.get("/ping?token=" + session.getVar("accessToken"));

Note

Each launched client executes only one session, and clients don’t share anything, so you cannot reuse data extracted from responses across sessions or clients.

JSONPath

JSONPath is a query language for finding information in a JSON document.

In your test case definition you can use JSONPath as follows.

Given this JSON response:

{
  "authorization": {
    "token": "s3cret-access-token"
  }
}

and these request options to extract the access token:

session.get("/tokens", {
  extraction: {
    jsonpath: {
      "accessToken": "$.authorization.token",
    }
  }
});

This makes accessToken available as a variable to the current client within the same session:

session.get("/ping?token=" + session.getVar("accessToken"));

JSONPath Support

Note

We only support a subset of what JSONPath describes.

The leading $. that indicates the document root is optional and implied.
Dot notation (attribute1.subattribute2.subattribute3) is supported, as is bracket notation for more complex field names (attribute1["ProductName with $"]).
Array positions can be accessed via brackets (attribute1[0]).
Script expressions are not supported.
You can use simple list filters in the form of $.something[?@.attribute==value].attribute
- attribute may be one or more attributes (e.g. attr1 or attr1.attr2.attr3)
- value can be a simple number or a quoted string
- see below for examples
If the result of your JSONPath expression is not true, false, a string, or a number, the result is re-encoded as a JSON string. Note that this preserves neither the whitespace nor the field order of the original input.
If your expression does not match anything, you get an empty string ("") or, when using list filters, an empty list ("[]"). You can use conditionals to check for empty matches.

Example condition to check for an empty match when using list filters:
```
session.if(session.getVar("listResult"), "=", "[]", function(context) {
  /* listResult did not match */
});
```
Example condition for an empty match for normal jsonpath expressions:
```
session.if(session.getVar("accessToken"), "=", "", function(context) {
  /* accessToken did not match */
});
```
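
The re-encoding rule from the list above can be illustrated outside of a test case definition with plain JavaScript. This is only a sketch: the helper name encodeResult is hypothetical, and it assumes the re-encoded value looks like compact JSON.stringify output, which may differ in detail from what the extraction actually produces.

```javascript
// Hypothetical helper, not part of the test case API. It mimics how a
// JSONPath result might be turned into a variable value.
function encodeResult(value) {
  const type = typeof value;
  // true, false, strings and numbers are kept as simple values.
  if (type === "string" || type === "number" || type === "boolean") {
    return String(value);
  }
  // No match: normal expressions yield an empty string.
  if (value === undefined) {
    return "";
  }
  // Objects and arrays are re-encoded as JSON (assumed to be compact).
  // The whitespace of the original response is lost here.
  return JSON.stringify(value);
}

// A response body with irregular whitespace.
const body = '{ "properties": {\n    "color":   "spacegray"\n  } }';
const doc = JSON.parse(body);

console.log(encodeResult(doc.properties));       // {"color":"spacegray"}
console.log(encodeResult(doc.properties.color)); // spacegray
console.log(encodeResult(doc.missing) === "");   // true
```

Because whitespace and field order are not preserved, it is safer not to compare a re-encoded variable against a literal copy of the original response text.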

Here is another example that selects elements from an array. Given this JSON document:

{
  "products": [
    {
      "id": "8e7cd50a-ff44-4518-a02b-51d2dceaafb1",
      "type": "42",
      "name": "iPhone 6",
      "properties": {
        "color": "spacegray"
      },
      "available": true
    },
    {
      "id": "5e41653c-de17-4d6a-9616-5d80257d4b7e",
      "type": "23",
      "name": "iPhone 5",
      "properties": {
        "color": "white"
      },
      "available": false
    }
  ]
}

Accessing items by index
- $.products[0].id will match 8e7cd50a-ff44-4518-a02b-51d2dceaafb1
- $.products[1].id will match 5e41653c-de17-4d6a-9616-5d80257d4b7e
- $.products[3].id will result in an empty string
- $.products[random(length(@))].id will pick a random item from products and return its id
Selecting by filter
- $.products[?@.type==23].name will match iPhone 5
- $.products[?@.type==42].properties.color will match spacegray
- $.products[?@.name=="iPhone 6"].type will match 42
- $.products[?@.name=="iAndroid 2099"].type will result in [], as nothing matched the filter
- $.products[?@.available].name will return iPhone 6
- $.products[?@.available==false].name will return iPhone 5
- $.products[?@.available!=true].name will return iPhone 5
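
To make the semantics of these expressions easier to follow, here is a plain JavaScript sketch that computes comparable results on the same document. It is not how the extraction is implemented, and the helper names idByIndex and filterProducts are hypothetical. The sketch compares values as strings because the examples above match the number 23 against the string "23"; that the extraction compares values this way internally is an assumption. The sketch always returns arrays for filters and does not reproduce how single matches are rendered.

```javascript
const doc = {
  products: [
    {
      id: "8e7cd50a-ff44-4518-a02b-51d2dceaafb1",
      type: "42",
      name: "iPhone 6",
      properties: { color: "spacegray" },
      available: true,
    },
    {
      id: "5e41653c-de17-4d6a-9616-5d80257d4b7e",
      type: "23",
      name: "iPhone 5",
      properties: { color: "white" },
      available: false,
    },
  ],
};

// Roughly $.products[index].id: an out-of-range index gives "".
function idByIndex(index) {
  const item = doc.products[index];
  return item === undefined ? "" : item.id;
}

// Roughly $.products[?@.attribute==value]: keep the items whose
// attribute equals the value when both are compared as strings
// (an assumption, see the note above).
function filterProducts(attribute, value) {
  return doc.products.filter((p) => String(p[attribute]) === String(value));
}

console.log(idByIndex(1)); // 5e41653c-de17-4d6a-9616-5d80257d4b7e
console.log(idByIndex(3) === ""); // true
console.log(filterProducts("type", 23).map((p) => p.name)); // [ 'iPhone 5' ]
console.log(filterProducts("name", "iAndroid 2099").map((p) => p.type)); // []
console.log(filterProducts("available", false).map((p) => p.name)); // [ 'iPhone 5' ]
```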

Note

JSONPath functions are currently in beta, so their usage is limited. Feel free to give feedback on more use-cases and ideas.

Inside the filters you can use the following functions:

length(element) - length() returns the length of the passed element. Normally this is used with @ to access the length of the current element in a filter.
random(int) - random() returns a random integer between zero (inclusive) and the provided parameter (exclusive). This can be used in combination with length() to pick a random element from a list: $.products[random(length(@))].id.
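
The half-open range described for random() is what makes random(length(@)) a valid array index in every case. A plain JavaScript sketch of the same idea, with the hypothetical helper name randomBelow standing in for random():

```javascript
// Returns a random integer from 0 (inclusive) to max (exclusive),
// matching the range described for random(int) above.
function randomBelow(max) {
  return Math.floor(Math.random() * max);
}

const products = ["iPhone 6", "iPhone 5"];

// Equivalent in spirit to $.products[random(length(@))]:
// the index is always between 0 and products.length - 1.
const picked = products[randomBelow(products.length)];
console.log(picked);
```

Because the upper bound is exclusive, the index can never point past the last element of the list.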

XPath

XPath (the XML Path language) is a language for finding information in an XML document.

In your test case definition you can use XPath as follows.

Given this response:

<?xml version="1.0" encoding="UTF-8"?>
<users>

  <user>
    <firstname>Giada</firstname>
    <lastname>De Laurentiis</lastname>
    <email>giadalaurentiis@example.com</email>
  </user>

</users>

and these request options:

session.get("/tokens", {
  extraction: {
    xpath: {
      "email": "/users/user[1]/email"
    }
  }
});

This makes email available as a dynamic data source within the same session:

session.put("/user?email=" + session.getVar("email"));

Regular Expression

Regular expressions are used to find a matching string.

In your test case definition you can use regexp as follows.

Given this response:

<p>Welcome john.doe@example.com!</p>
<p>You can confirm your account email through the link below:</p>
<p>
  <a href="http://test/users/confirmation?confirmation_token=noXuMgKe/i5pPP4wdv5Kq&amp;locale=en">
    Confirm my account
  </a>
</p>

and these request options:

session.get("/data/test.html", {
  extraction: {
    regexp: {
      "confirmationToken": "confirmation_token=([\\w_\\/-]*)",
    }
  },
});

This makes confirmationToken available as a dynamic data source within the same session:

session.get("/ping?token=" + session.getVar("confirmationToken"));

Note that your regular expression MUST contain a match group. The first match group is assigned to the variable.
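
To check what the first match group captures before putting an expression into a test case, you can try it locally. The sketch below uses JavaScript's own RegExp on a shortened copy of the response above. The extraction may be evaluated by a different regular expression engine, so treat this as an approximation. Falling back to an empty string when nothing matches is an assumption.

```javascript
// Shortened copy of the response body shown above.
const body = [
  "<p>Welcome john.doe@example.com!</p>",
  '<a href="http://test/users/confirmation?confirmation_token=noXuMgKe/i5pPP4wdv5Kq&amp;locale=en">',
].join("\n");

// Same pattern string as in the request options above. The doubled
// backslashes are needed because the pattern is written as a string.
const pattern = new RegExp("confirmation_token=([\\w_\\/-]*)");

const match = pattern.exec(body);

// match[1] is the first match group. The empty-string fallback for
// "no match" is an assumption made for this sketch.
const confirmationToken = match ? match[1] : "";

console.log(confirmationToken); // noXuMgKe/i5pPP4wdv5Kq
```

The capture stops at the & character because it is not part of the character class, which is why locale=en does not end up in the variable.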

Regular Expressions on Headers

Since the regexp extraction only applies to the response body, use the regexpheader extraction to work with headers, e.g. to grab parts of a Link header.

session.get("https://example.com/api/", {
  extraction: {
    "regexpheader": {
      "docid": "Link: .*/id/(.*)",
    },
  },
});
session.assert("doc_present", session.getVar("docid"), "!=", "");

session.get("/doc/:docid", {
  params: {
    docid: session.getVar("docid"),
  }
});

Assuming the first request returns a Link: https://api.example.com/doc/id/4711 header, the regexpheader extraction stores 4711 in the variable docid.

Each regexpheader extraction is checked against each header in random order, and the first match is used.
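
The matching described above can be approximated in plain JavaScript as follows. The helper name extractFromHeaders is hypothetical, headers are assumed to be matched as "Name: value" lines (as the Link example suggests), and the sketch walks the headers in array order rather than in random order.

```javascript
// Hypothetical helper: returns the first match group of the first
// header line that matches, or "" if no header matches (assumption).
function extractFromHeaders(headerLines, expression) {
  const pattern = new RegExp(expression);
  for (const line of headerLines) {
    const match = pattern.exec(line);
    if (match) {
      return match[1];
    }
  }
  return "";
}

const headers = [
  "Content-Type: application/json",
  "Link: https://api.example.com/doc/id/4711",
];

console.log(extractFromHeaders(headers, "Link: .*/id/(.*)")); // 4711
console.log(extractFromHeaders(headers, "X-Missing: (.*)") === ""); // true
```

Since the real check order is random, an expression that could match more than one header may yield different values between runs. Starting the expression with the header name, as in the example above, avoids that.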

HTTP Response Header

HTTP response headers contain meta information about the response message.

In your test case definition you can extract HTTP response header field values using the request option header as follows.

Given these response headers:

HTTP/1.1 200 OK
Content-Type: text/html
Content-Length: 211
Connection: keep-alive
Status: 200 OK
Date: Fri, 12 Feb 2016 08:43:01 GMT
X-Powered-By: Phusion Passenger 5.0.23
Server: nginx/1.8.0 + Phusion Passenger 5.0.23

and these request options:

session.get("/tokens", {
  extraction: {
    header: {
      "serverHeader": "server"
    }
  },
});

This makes the value of the HTTP response header Server, nginx/1.8.0 + Phusion Passenger 5.0.23, available as a dynamic data source named serverHeader:

session.get("/ping?server=" + session.getVar("serverHeader"));

Keep in mind that dynamic data sources are available in the same session only.
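
The example above asks for server while the response carries Server. HTTP header field names are case-insensitive, so both spellings refer to the same header. A plain JavaScript sketch of such a lookup, with the hypothetical helper name headerValue and an assumed empty-string result for missing headers:

```javascript
// Subset of the response headers shown above.
const responseHeaders = {
  "Content-Type": "text/html",
  "Server": "nginx/1.8.0 + Phusion Passenger 5.0.23",
};

// Hypothetical helper: looks up a header value while ignoring the
// case of the header name. Returning "" for a missing header is an
// assumption made for this sketch.
function headerValue(headers, name) {
  const wanted = name.toLowerCase();
  for (const [key, value] of Object.entries(headers)) {
    if (key.toLowerCase() === wanted) {
      return value;
    }
  }
  return "";
}

console.log(headerValue(responseHeaders, "server"));
// nginx/1.8.0 + Phusion Passenger 5.0.23
```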

Cookie Extraction

Similarly, cookie extraction lets you extract cookie values:

session.get("/login", {
  extraction: {
    cookie: {
      "varAuthToken": "Api-Token"
    }
  }
});

This example copies the Api-Token cookie value from the response into the dynamic data source varAuthToken.

Body Extraction

Note

This feature is very costly and should be used sparingly.

You can also store the whole body in a variable:

session.get("/profile/me.html", {
  extraction: {
    body: {
      "varContentBody": true
    }
  }
});

This example will fill the variable varContentBody with the response body of /profile/me.html.

Last modified August 3, 2022
