Drop #382 (2023-12-04): A HARd Day's Night

Record; Sanitize; Playback

Okta is a widely used identity and access management (IAM) company that provides a combined software and service offering. It helps companies manage and secure user authentication into modern applications, and helps developers build identity controls into applications, websites, web services, and devices.

Earlier this year, attackers gained access to Okta’s customer support management system and viewed files uploaded by certain Okta customers as part of recent support cases. These files were HTTP Archive (HAR) files, which company support personnel use to replicate customer browser activity during troubleshooting sessions. HAR files can contain sensitive data, including cookies and session tokens, that malicious actors can use to impersonate valid users.

This breach hurt quite a number of organizations, employees, and customers. It also put some focus on HAR files and caused a spurt of new development projects around them.

So, today, we have a super-focused Drop on some ways to record, sanitize, and playback/inspect these potentially catastrophically destructive bits of JSON.

Three goes at an effective summary failed to generate anything useful, so no TL;DR today.

HAR Files Redux

A HAR file is a (potentially) large blob of JSON that represents a comprehensive log of a web browser’s interaction with a site. It’s like a detailed diary of your browser’s conversation with the internet, recording everything from the initial hello to the final goodbye.

So, what does a HAR file do? It keeps track of each resource loaded by the browser, along with timing information for each resource. This includes the URL of each request, request and response headers, timings, and any error messages or status codes encountered during the communication. It’s a spiffy tool for debugging authentication issues, as it can identify where things get stuck. It’s like having a translator who can interpret the complex language of web interactions.
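
To make that structure a bit more concrete, here is a minimal Python sketch that walks a tiny, hand-written HAR-style document and prints one summary line per request. The sample data (the `example.com` URL, the cookie value, the timings) and the `summarize` helper are made up for illustration; a real export from a browser has many more fields per entry.

```python
import json

# A hand-written, heavily trimmed HAR-style document (illustrative values only).
SAMPLE_HAR = """
{
  "log": {
    "version": "1.2",
    "creator": {"name": "example-browser", "version": "0.0"},
    "entries": [
      {
        "startedDateTime": "2023-12-04T10:00:00.000Z",
        "time": 120.5,
        "request": {
          "method": "GET",
          "url": "https://example.com/",
          "headers": [{"name": "Cookie", "value": "session=abc123"}]
        },
        "response": {"status": 200, "statusText": "OK", "headers": []},
        "timings": {"send": 1.0, "wait": 100.0, "receive": 19.5}
      }
    ]
  }
}
"""


def summarize(har: dict) -> list[str]:
    """Return one 'METHOD URL -> STATUS (N ms)' line per recorded entry."""
    lines = []
    for entry in har["log"]["entries"]:
        req, resp = entry["request"], entry["response"]
        lines.append(
            f'{req["method"]} {req["url"]} -> {resp["status"]} ({entry["time"]} ms)'
        )
    return lines


if __name__ == "__main__":
    for line in summarize(json.loads(SAMPLE_HAR)):
        print(line)
```

Note the `Cookie` header sitting right there in the request: that is exactly the kind of value that makes these files risky to share.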

These files are primarily used for identifying performance issues, such as bottlenecks and slow load times, and page rendering problems. They can help web developers, site analysts, security teams, and compliance auditors analyze network traffic and site communications between a browser and web server. They can also be used to investigate security vulnerabilities, such as cross-site scripting (XSS). It’s like having a digital Columbo who can ferret out the “bad guys” slowing down your website or causing security issues.

If you’re a web developer or a site analyst, HAR files can provide you with a fast, behind-the-scenes snapshot of how a site is performing, or help you quickly identify where something may be off or need fine-tuning for better performance. If you’re into cybersecurity, HAR files can help you examine network traffic for security analysis, identifying any suspicious activity or potential site vulnerabilities.

I primarily use them to identify and reverse-engineer “hidden” APIs of various websites/apps I use.

Record

Safari, Chrome, and Firefox all support recording, saving, and importing HAR files. The section header shows the Safari Developer Tools “Network” tab view of me loading up my main site (https://rud.is). This will look very similar across all three major browsers. Tapping on any entry lets you see tons of detail, like this:

You can save out the entire session into a HAR file via the context menu:

As we noted back in March, some proxy servers — like the most excellent mitmproxy — can also export all the interceptions to a HAR file.

A HAR file captures a snapshot of the interactions between a web browser and a web server, which can include the following information:

  • Complete Request and Response Headers: This includes all data sent and received, such as method types (GET, POST, etc.), status codes, URLs, cookies, and more.

  • Payload Content: This refers to the actual data exchanged between the client and server, which can be crucial for diagnosing issues related to data submission or retrieval.

  • Timing Information: HAR files provide detailed timing breakdowns for each phase of the interaction, including DNS lookup, connection time, SSL handshake, and content download. This information can help identify performance bottlenecks.
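
As a rough illustration of how that timing breakdown can be used, the sketch below finds the slowest phase for each request and adds up the time spent per phase across a session. It assumes the HAR 1.2 `timings` field names (`blocked`, `dns`, `connect`, `ssl`, `send`, `wait`, `receive`) and the convention that `-1` means a phase did not apply. The `ENTRIES` sample and the helper names are invented for the example.

```python
# HAR 1.2 timing phases, in milliseconds. A value of -1 means "not applicable".
PHASES = ("blocked", "dns", "connect", "ssl", "send", "wait", "receive")

# Invented sample data: a cold first request, then one on a reused connection.
ENTRIES = [
    {"timings": {"blocked": 2.0, "dns": 30.0, "connect": 45.0, "ssl": 25.0,
                 "send": 1.0, "wait": 200.0, "receive": 15.0}},
    {"timings": {"blocked": -1, "dns": -1, "connect": -1, "ssl": -1,
                 "send": 0.5, "wait": 80.0, "receive": 300.0}},
]


def applicable(timings: dict) -> dict:
    """Keep only the phases that carry a real (non-negative) measurement."""
    out = {}
    for phase in PHASES:
        value = timings.get(phase)
        if isinstance(value, (int, float)) and value >= 0:
            out[phase] = float(value)
    return out


def slowest_phase(timings: dict):
    """Return (phase, ms) for the largest measured phase, or None if empty."""
    phases = applicable(timings)
    if not phases:
        return None
    name = max(phases, key=phases.get)
    return name, phases[name]


def phase_totals(entries: list) -> dict:
    """Sum each phase across all entries (per phase, not a grand total)."""
    totals = dict.fromkeys(PHASES, 0.0)
    for entry in entries:
        for phase, ms in applicable(entry.get("timings", {})).items():
            totals[phase] += ms
    return totals


if __name__ == "__main__":
    for entry in ENTRIES:
        print(slowest_phase(entry["timings"]))
    print(phase_totals(ENTRIES))
```

The totals are kept per phase on purpose: in HAR 1.2 the `ssl` time, when present, is also counted inside `connect`, so adding every phase together would double count the handshake.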

To get the most out of the next section, record a HAR file of a browser session where you log in to something. Just make sure you delete that file once you’re done playing.

Sanitize

The Okta breach gave us some new tools to help make these HAR files a bit safer than they are in their raw, initially saved form.

  • Cloudflare’s HAR Sanitizer (Introducing HAR Sanitizer: secure HAR sharing) launched a client-side-processing-only HAR sanitizer that looks for certain keywords to target for sanitizing. (It does what it says on the tin.)

  • gobeyondidentity/har-sanitize scans HAR files to identify potential session cookies that may be unsafe to share with third parties. By flagging these cookies, the program aims to prevent the inadvertent sharing of sensitive information. These are the keywords it looks for. (It does what it says on the tin, except the CLI binary name in the README is wrong. It should be har-sanitize, not sanitize_har.)

  • frontegg/harmor (landing page) provides either an interactive walkthrough to help you sanitize items on a very granular basis, or allows batch processing of HAR files with a “check template” (which it can auto-create from an interactive session). Their landing page shows what it looks for, but the tool does not seem to have been tested well, and I can’t recommend relying on it in its current state.

Of all three, Cloudflare’s seems to be the most comprehensive, but har-sanitize is also safe to rely on as of the publishing date of this Drop.
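
To give a feel for the general keyword-matching idea, here is a deliberately small Python sketch. It is not the logic of any of the three tools above: the header names, the keyword list, the `[REDACTED]` placeholder, and the `sanitize` helper are all assumptions made up for this example, and the real tools ship their own, longer lists.

```python
import copy

# Assumed, illustrative lists. Not taken from any of the tools above.
SENSITIVE_HEADERS = {"cookie", "set-cookie", "authorization"}
SENSITIVE_KEYWORDS = ("token", "session", "password", "secret")
REDACTED = "[REDACTED]"


def _redact_pairs(pairs: list, always: bool = False) -> None:
    """Blank the 'value' of name/value pairs whose name looks sensitive."""
    for pair in pairs:
        name = pair.get("name", "").lower()
        if (
            always
            or name in SENSITIVE_HEADERS
            or any(keyword in name for keyword in SENSITIVE_KEYWORDS)
        ):
            pair["value"] = REDACTED


def sanitize(har: dict) -> dict:
    """Return a redacted copy of a HAR dict; the input is left untouched."""
    clean = copy.deepcopy(har)
    for entry in clean["log"]["entries"]:
        for message in (entry.get("request", {}), entry.get("response", {})):
            _redact_pairs(message.get("headers", []))
            # Every parsed cookie value is treated as sensitive.
            _redact_pairs(message.get("cookies", []), always=True)
    return clean
```

This only touches headers and the parsed cookie lists. URLs, query strings, POST bodies, and response content can carry tokens too and are left alone here, which is a big part of why a purpose-built tool is the better choice for anything you intend to share.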

Playback/Inspect

You can import HAR files back into browsers, and I highly suggest doing so with the one you sanitized (if you’re playing along at home) so you can validate the results.
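
Eyeballing the result in a browser is worth doing, but a scripted check can catch leftovers you might scroll past. The sketch below is one possible approach, with hypothetical helper names: collect the cookie values and `Authorization` header values from the original HAR, then look for any of them anywhere in the sanitized copy. The `min_len` cutoff is an arbitrary assumption to avoid matching on trivially short values.

```python
def collect_secrets(har: dict, min_len: int = 8) -> set:
    """Gather cookie values and Authorization header values from a HAR dict."""
    secrets = set()
    for entry in har["log"]["entries"]:
        for message in (entry.get("request", {}), entry.get("response", {})):
            for cookie in message.get("cookies", []):
                secrets.add(cookie.get("value", ""))
            for header in message.get("headers", []):
                if header.get("name", "").lower() == "authorization":
                    secrets.add(header.get("value", ""))
    # Very short values would match all over the place, so skip them.
    return {secret for secret in secrets if len(secret) >= min_len}


def iter_strings(node):
    """Yield every string value found anywhere in a nested JSON structure."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_strings(value)
    elif isinstance(node, list):
        for value in node:
            yield from iter_strings(value)


def leaked(original: dict, sanitized: dict) -> set:
    """Return the secrets from `original` that still appear in `sanitized`."""
    secrets = collect_secrets(original)
    return {
        secret
        for text in iter_strings(sanitized)
        for secret in secrets
        if secret in text
    }
```

Load both files with `json.load` and call `leaked(original, sanitized)`; an empty set is what you want to see. It will not spot values hiding inside base64-encoded bodies, so treat it as one more check rather than proof.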

The aforementioned mitmproxy also supports loading HAR files.

There are also many libraries for dealing with HAR files in Python, Go, Rust, JavaScript, and R (the main languages we tend to cover in these Drops):

  • Python: The haralyzer library provides a framework for analyzing web pages based on a HAR file. It contains classes like HarParser, HarPage, and HarEntry for representing a full file, a single page, and an entry in a page, respectively. Each HarEntry has a request and response that contains items such as the headers, status code, timings, etc.

  • Go: There are a couple of libraries available. Hargo is a library and command-line utility that parses HAR files, converts them to curl format, and can serve as a load test driver. Another library is har, which is part of the Chrome DevTools Protocol and provides commands, types, and events for the HAR domain.

  • Rust: The har-rs library provides serialization and deserialization for HAR files. Another library, har-analyzer, is a tool to analyze HAR files, particularly useful for analyzing which domain/url is slow or blocked.

  • JavaScript: Har2Postman is a JavaScript library that converts HAR files into Postman collections, which can be useful for automating HTTP requests and, optionally, including tests that assert the correct response from the URL called. @types/har-format provides a set of types that make it easy to [de]serialize HAR content. Both Cloudflare and HARmor use this, so poke there for example code.

  • R: The HARtools package provides tools and utilities to interact with HAR files. It allows you to read HAR files and interact with the resulting output. NOTE: This one is no longer on CRAN, but you can install it from GitHub.

They’re just heavily nested JSON files, though, so you can use anything that works with JSON to read them. Here’s a jq filter to pull out all the request URLs:

$ jq '.log.entries[].request.url' rud.is.har
"https://rud.is/"
"https://rsms.me/inter/inter.css"
"https://rud.is/css/main.css"
"https://rud.is/ukr-shield.png"
"https://rud.is/cap.svg"
"https://rud.is/js/main.js"
"https://rsms.me/inter/font-files/Inter-Regular.woff2?v=4.0"
"https://rsms.me/inter/font-files/Inter-Italic.woff2?v=4.0"
"https://rud.is/images/preloader.gif"
"https://rsms.me/inter/font-files/Inter-Medium.woff2?v=4.0"
"https://rsms.me/inter/font-files/Inter-Light.woff2?v=4.0"
"https://rsms.me/inter/font-files/Inter-ExtraLight.woff2?v=4.0"
"https://rsms.me/inter/font-files/Inter-Thin.woff2?v=4.0"
"https://rsms.me/inter/font-files/Inter-SemiBold.woff2?v=4.0"
"https://rsms.me/inter/font-files/Inter-Bold.woff2?v=4.0"
"https://rsms.me/inter/font-files/Inter-ExtraBold.woff2?v=4.0"
"https://rsms.me/inter/font-files/Inter-Black.woff2?v=4.0"

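If jq isn’t handy, the same extraction takes only a few lines with Python’s standard library. This is a small sketch, not a replacement for the libraries above: `request_urls` mirrors the jq filter, and `requests_per_host` is an extra, hypothetical helper that counts requests per hostname. The `SAMPLE` data is a trimmed stand-in built from a few of the URLs in the output above.

```python
from collections import Counter
from urllib.parse import urlsplit

# Trimmed stand-in for a real file. To use your own export instead:
#   import json
#   with open("rud.is.har", encoding="utf-8-sig") as fh:
#       har = json.load(fh)
SAMPLE = {
    "log": {
        "entries": [
            {"request": {"url": "https://rud.is/"}},
            {"request": {"url": "https://rsms.me/inter/inter.css"}},
            {"request": {"url": "https://rud.is/css/main.css"}},
        ]
    }
}


def request_urls(har: dict) -> list:
    """Same idea as the jq filter: every request URL, in recorded order."""
    return [entry["request"]["url"] for entry in har["log"]["entries"]]


def requests_per_host(har: dict) -> Counter:
    """Count how many requests went to each hostname."""
    return Counter(urlsplit(url).hostname for url in request_urls(har))


if __name__ == "__main__":
    for url in request_urls(SAMPLE):
        print(url)
    print(requests_per_host(SAMPLE).most_common())
```

The per-host count is a quick way to see which third parties a page pulls from, which is often the first thing worth checking in a capture.
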
FIN

The internet is a very unsafe place, so please be extra careful when sharing any information over it with anyone. This goes double for HAR files. ☮️
