Secure auth for your URLs in logs

Deepjyoti Barman @deepjyoti30
Mar 2, 2022 • 3:01 PM UTC

Everyone who writes programs at a professional level uses some kind of logger to let the user (or perhaps the developer) know exactly what the program is doing. However, logs can contain a lot of things; sometimes they contain URLs, or sensitive data like usernames and passwords.

Let's take an example scenario: we use Elasticsearch with a client library and give it an ES URL to connect to. If nothing is reachable at that URL, the logger will end up logging that the URL is dead.

So what do we do in such a case? We just write code to mask the authentication part from the URL ;-p

The problem

Let's dive a bit deeper into the issue. Say we have a logger running that logs all kinds of data, and at times it also logs URLs. This is fine if the logs are local and not exposed. However, in a lot of cases (mostly enterprise software), the logs are backed up somewhere so that devs can access them to debug or to look at what exactly is happening under the hood.

Let's say one of our loggers logs the line: http://127.0.0.1:9200 is dead. This is fine, since we are just printing the URL. However, what if we pass a URL that contains sensitive data, like auth, in the URL itself? We would get something like http://foo:bar@127.0.0.1:9200 is dead.

If someone gets access to these logs, they would pretty easily be able to access the service behind the URL, since we are basically handing them the username, the password and the URL.

GitHub Actions actually takes this into consideration: if there's anything like a URL containing auth details, it automatically masks it.

How do we fix it

There are a few approaches to masking these auth details in the URL. We can do one of the following:

  1. Remove auth part altogether
  2. Replace auth with ***
  3. Replace only the password with *** and keep the username

Before we start

We need to check if a string is even a URL. There are some nice answers on this Stack Overflow question on how to do that. For now, we will stick to a simpler option to see if a string is a URL.

In environments like codebases where URLs are passed in from envs, it's very unlikely to get complex URLs, so a simpler regex is good enough.

The following code checks if a string is a URL in a naive way:

// Check if the string looks like a URL. The error is ignored because
// the pattern is a constant that is known to compile.
isURL, _ := regexp.MatchString(`^https?://(www\.)?.+\..+$`, passedString)

// isURL indicates whether the string is a URL

So the regex is:

^https?://(www\.)?.+\..+$
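To try the check out, here is a minimal runnable sketch. It uses the same naive pattern, with the dot after www escaped so that it only matches a literal dot. The helper name looksLikeURL is made up for this example, and the pattern is compiled once at package level instead of on every call.

```go
package main

import (
	"fmt"
	"regexp"
)

// urlPattern is the naive URL check, compiled once and reused.
var urlPattern = regexp.MustCompile(`^https?://(www\.)?.+\..+$`)

// looksLikeURL is a hypothetical helper: it reports whether s passes
// the naive check. It does not validate the URL in any stricter sense.
func looksLikeURL(s string) bool {
	return urlPattern.MatchString(s)
}

func main() {
	for _, s := range []string{
		"http://127.0.0.1:9200",
		"https://www.example.com",
		"http://localhost:9200",
		"not a url",
	} {
		fmt.Printf("%q -> %v\n", s, looksLikeURL(s))
	}
}
```

One thing to keep in mind: the pattern needs at least one dot after the scheme, so http://localhost:9200 is reported as not being a URL. If hosts without a dot show up in your environment, the check has to be relaxed.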

Remove auth part

This is the simpler option, in the sense that the regex is pretty simple in this case. We check the string to see if it is a URL, and if it is, we match the auth part and remove it.

Let's say we have the string http://foo:bar@127.0.0.1:9200. The following code converts it to http://127.0.0.1:9200:

re := regexp.MustCompile(`//.+:.+@`)
cleanedString := re.ReplaceAllString(passedString, "//")

The regex for removing the username and password from a URL (if they are passed) is:

//.+:.+@

Using the above regex, we can match the part of the URL that has a username and password after the // and before the @. Then we can replace this match with //.
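Here is the same idea as a small runnable sketch. The helper stripAuth is a made-up name. The second pattern, strictAuthPattern, is not from the code above; it is an assumption of mine that forbids /, @ and whitespace inside the username and password, which matters because .+ is greedy and can swallow more than one URL on the same log line.

```go
package main

import (
	"fmt"
	"regexp"
)

// authPattern is the pattern from above: "//", username, ":", password, "@".
var authPattern = regexp.MustCompile(`//.+:.+@`)

// strictAuthPattern is an assumed tighter variant: the username and the
// password may not contain "/", "@" or whitespace (and the username no ":"),
// so a match cannot run past the end of a single URL.
var strictAuthPattern = regexp.MustCompile(`//[^/@\s:]+:[^/@\s]+@`)

// stripAuth is a hypothetical helper that removes the user:password@ part
// matched by re, if there is one.
func stripAuth(re *regexp.Regexp, s string) string {
	return re.ReplaceAllString(s, "//")
}

func main() {
	one := "http://foo:bar@127.0.0.1:9200 is dead"
	two := "http://foo:bar@a.example:9200 and http://baz:qux@b.example:9200"

	fmt.Println(stripAuth(authPattern, one))       // http://127.0.0.1:9200 is dead
	fmt.Println(stripAuth(authPattern, two))       // http://b.example:9200
	fmt.Println(stripAuth(strictAuthPattern, two)) // http://a.example:9200 and http://b.example:9200
}
```

With a single URL per line both patterns behave the same. With two URLs on one line, the greedy pattern matches from the first // all the way to the last @ and drops the first URL entirely, which is probably not what we want in a log. The stricter variant assumes that special characters in the password are percent-encoded, which is how they should appear in a URL anyway.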

Replace auth part with stars

Let's take our regex a bit further and replace the auth part instead of removing it. This can be done pretty easily: we use the same code as above and just change the replacement string.

re := regexp.MustCompile(`//.+:.+@`)
cleanedString := re.ReplaceAllString(passedString, "//***:***@")

The above code will replace the auth part with //***:***@, which means:

http://foo:bar@127.0.0.1:9200 -> http://***:***@127.0.0.1:9200
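As a runnable sketch, the masking step could look like this. The helper name maskAuth is made up. I am using ReplaceAllLiteralString here instead of ReplaceAllString, which is a small design choice of mine: it inserts the replacement verbatim, so a $ in the replacement text can never be interpreted as a group reference by accident.

```go
package main

import (
	"fmt"
	"regexp"
)

// authPattern is the same pattern as before: "//", username, ":", password, "@".
var authPattern = regexp.MustCompile(`//.+:.+@`)

// maskAuth is a hypothetical helper that replaces user:password@ with
// ***:***@. The replacement is inserted literally, without "$" expansion.
func maskAuth(s string) string {
	return authPattern.ReplaceAllLiteralString(s, "//***:***@")
}

func main() {
	fmt.Println(maskAuth("http://foo:bar@127.0.0.1:9200 is dead"))
	// http://***:***@127.0.0.1:9200 is dead
}
```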

Replace password from URL and keep username

Now, this is my favorite. How about we keep the username in the URL and just replace the password? It can obviously be done, but more importantly, it makes the whole URL look better, plus it's easier for the viewer to know which URL we are referring to (in case there is more than one in the environment).

So, what we do here is use regex substitution.

Now, what are substitutions exactly? We define a pattern inside the pattern we wrote (a capture group) and give this nested pattern a name, like a variable. Then, while replacing the pattern, we can refer to that name, so we basically reuse a variable defined during the match step.

Let's understand it better with some code.

re := regexp.MustCompile(`\/\/(?P<username>.+):.+@`)
cleanedVar := re.ReplaceAllString(stringedVar, "//${username}:***@")

So in the above code, we are saying that whatever the nested pattern .+ matches should be named username, and then we reference it in the replacement string using ${username}, which basically means we are putting the matched username back into the string where we need it.

In the above case, the URL change will be:

http://foo:bar@127.0.0.1:9200 -> http://foo:***@127.0.0.1:9200
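To make this runnable end to end, here is a minimal sketch. The helper maskPassword is a made-up name. The second pattern, strictPasswordPattern, is an assumption of mine and not part of the code above: it stops the username at the first colon, because with a greedy .+ a password that itself contains a colon ends up partly inside the username group.

```go
package main

import (
	"fmt"
	"regexp"
)

// passwordPattern is the pattern from above with the named group "username".
var passwordPattern = regexp.MustCompile(`//(?P<username>.+):.+@`)

// strictPasswordPattern is an assumed tighter variant: the username ends at
// the first ":", and neither part may contain "/", "@" or whitespace.
var strictPasswordPattern = regexp.MustCompile(`//(?P<username>[^/@\s:]+):[^/@\s]+@`)

// maskPassword is a hypothetical helper: it keeps the username captured by
// re and replaces the password with ***.
func maskPassword(re *regexp.Regexp, s string) string {
	// ${username} is expanded to whatever the named group matched.
	return re.ReplaceAllString(s, "//${username}:***@")
}

func main() {
	fmt.Println(maskPassword(passwordPattern, "http://foo:bar@127.0.0.1:9200 is dead"))
	// http://foo:***@127.0.0.1:9200 is dead

	// A password with a colon in it: the greedy group takes "foo:b".
	fmt.Println(maskPassword(passwordPattern, "http://foo:b:r@127.0.0.1:9200"))
	// http://foo:b:***@127.0.0.1:9200

	fmt.Println(maskPassword(strictPasswordPattern, "http://foo:b:r@127.0.0.1:9200"))
	// http://foo:***@127.0.0.1:9200
}
```

The braces in ${username} are worth keeping. Go reads a bare $name as the longest run of letters, digits and underscores, so a reference that is directly followed by such a character would be taken as a different (and probably empty) group.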

So basically we show the URL like http://foo:***@127.0.0.1:9200. Isn't that neat?
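Since the whole point is to keep this out of the logs, one way to wire it in is to wrap the logger's output. The following is only a sketch under a few assumptions: it uses Go's standard log package, the names maskingWriter and render are made up, the pattern is a tightened variant of the one above (no /, @ or whitespace in the auth part), and it relies on each log line reaching the writer in a single Write call.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"regexp"
	"strings"
)

// passwordPattern keeps the username in a named group and matches the password.
var passwordPattern = regexp.MustCompile(`//(?P<username>[^/@\s:]+):[^/@\s]+@`)

// maskingWriter is a hypothetical io.Writer that masks URL passwords in
// everything written through it before passing it on to out.
type maskingWriter struct {
	out io.Writer
}

func (w maskingWriter) Write(p []byte) (int, error) {
	masked := passwordPattern.ReplaceAll(p, []byte("//${username}:***@"))
	if _, err := w.out.Write(masked); err != nil {
		return 0, err
	}
	// Report the original length so the caller sees a complete write.
	return len(p), nil
}

// render is a hypothetical helper for this demo: it logs msg through a
// masking logger and returns what actually got written.
func render(msg string) string {
	var buf strings.Builder
	logger := log.New(maskingWriter{out: &buf}, "", 0) // no prefix, no timestamp
	logger.Print(msg)
	return buf.String()
}

func main() {
	fmt.Print(render("http://foo:bar@127.0.0.1:9200 is dead"))
	// http://foo:***@127.0.0.1:9200 is dead
}
```

In a real program the strings.Builder would be os.Stderr or a file. The nice part of doing it at the writer level is that individual log calls don't have to remember to mask anything.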

Regex substitutions are a gem and everyone should absolutely know how they're used in order to do magic in one line ;-)
