Secure design for storing logs in Azure – part 1

Time to revive this blog. It’s been way too inactive lately while I’ve been busy on customer projects and figuring out where my blog should go. A lot of what I previously blogged about is now covered in the Azure documentation and in a large number of other blogs. I’ve decided to shift focus to describing the challenges I run into at work and the solutions we come up with. So in short: move away from theory and onto real-world solutions. Let’s go!

In the past few years, logging and traceability have become more and more of a requirement from the customers I work with. This is even more true when they’re doing Azure projects.

Over and over again, I’m asked if we can create a secured and trusted space in Azure, where logs can be stored and made tamper-proof.

In this post I will share my design for creating a secure area in your Azure environment, where you can place your logs. There are many ways of doing this, but this is my preferred proposal that can easily be adjusted to customer needs.

Why secure logs?

They’re just logs, why do we need to secure them? I’ve heard that question over and over again. The answer is simple: if your logs aren’t secure, you can’t trust their integrity, and you can’t prove what happened in case of a leak, hack or misconfiguration.

Consider a hack. Someone gained access to a database of yours hosted in Azure, containing customer information protected under GDPR. GDPR requires that you inform customers in case of a leak. With Azure SQL Auditing you can store information about all queries that were executed on your database. You can see all the information you need to investigate the incident: timestamp, client IP, username, database and query. Without this information you wouldn’t know how the hacker connected to your database, and you wouldn’t know what data was leaked as part of the query.

The basics:

You will need the following to do this configuration:

  • Azure AD P2 license
  • A minimum of 2 Azure subscriptions

The Azure AD P2 license is for Azure AD PIM. You can obtain this through other licenses too, like EMS E5 and M365 E5.

The design

Good security is based on a good design. The drawing below shows the concept I’m basing my implementations on. Your design can be different from this, but the concept is the same: we need a locked-down subscription that no one has access to.

We do this by creating a separate subscription just for our logs. This subscription is placed in a Management Group that is under strict control. No one should have permanent access to the Management Group, or to its parent groups. Consider the design below: if someone has Contributor access on the Root Management Group, they’ll inherit permissions to manage the locked-down subscription.
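To make the inheritance point concrete, here is a small toy model in Python. It does not call Azure at all; the `Scope` class, the "IT" management group and the user names are hypothetical and only illustrate how a role assigned high up in the hierarchy flows down to every subscription beneath it.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Scope:
    """A management group or subscription in a simplified hierarchy."""
    name: str
    parent: Optional["Scope"] = None
    # Role assignments made directly at this scope: principal -> roles
    assignments: dict[str, set[str]] = field(default_factory=dict)

    def assign(self, principal: str, role: str) -> None:
        self.assignments.setdefault(principal, set()).add(role)

    def effective_roles(self, principal: str) -> set[str]:
        """Roles held at this scope, including roles inherited from parents."""
        roles = set(self.assignments.get(principal, set()))
        if self.parent is not None:
            roles |= self.parent.effective_roles(principal)
        return roles


# Hypothetical hierarchy; the names are illustrative only.
root = Scope("Root Management Group")
it_mg = Scope("IT", parent=root)
secure_logs = Scope("Secure Logs subscription", parent=it_mg)

# A single assignment at the root...
root.assign("alice", "Contributor")

# ...is inherited all the way down to the locked-down subscription.
print(secure_logs.effective_roles("alice"))
print(secure_logs.effective_roles("bob"))
```

This is why the parent groups have to be kept just as clean as the Secure Logs subscription itself.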

To control access, you’ll need a good Identity Management tool – Azure AD is one, and with an Azure AD P2 license you gain access to Azure AD Privileged Identity Management, or just AAD PIM, which is exactly what we need here.

Secure Logs design

As you can see, I have two subscriptions in this example. One is for the production environment of a BI application, the other one is for logging data from the BI application. This way the BI department can deploy their application and resources, without having access to the IT subscriptions and resources, including the Secure Logs subscription.

That’s the management group and subscription layout. Now come the resources.

In the Secure Logs subscription, you will find a Log Analytics workspace, which will be used for storing logs. In the Prod subscription, a SQL Server with a database is deployed, and Auditing is enabled.

Secure Logs design - resources

For Azure SQL, you can enable auditing on either the server level or the database level. If you enable it on the server level, all databases deployed on the server will inherit these audit settings. If you enable it on the database level, you will need to configure it for each database individually. I prefer enabling server-level auditing and making sure no one has permissions to change these settings. You can do this by using the correct Azure RBAC roles for the people who need access to Azure SQL servers and databases.

To make sure you catch changes in configuration of auditing, you should enable Azure Activity Log collection too, and use Azure Policy to alert you of changes to the configuration.
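As a rough sketch of what "catching changes" means, the Python snippet below filters a list of Activity Log style events for operations that write SQL auditing settings. The event dictionaries are made-up sample data with a simplified, flattened shape (the `operationName` and `caller` keys are my assumption for illustration), so adjust the field names to whatever your export actually contains.

```python
# Resource provider operations that change SQL auditing settings,
# at server level and at database level (compared in lower case).
AUDIT_WRITE_OPERATIONS = {
    "microsoft.sql/servers/auditingsettings/write",
    "microsoft.sql/servers/databases/auditingsettings/write",
}


def auditing_changes(events):
    """Return the events whose operation writes SQL auditing settings."""
    return [
        event for event in events
        if event.get("operationName", "").lower() in AUDIT_WRITE_OPERATIONS
    ]


# Made-up sample events for illustration; this is not real log output.
sample_events = [
    {"operationName": "Microsoft.Sql/servers/databases/write",
     "caller": "deploy@example.com"},
    {"operationName": "Microsoft.Sql/servers/auditingSettings/write",
     "caller": "admin@example.com"},
]

for event in auditing_changes(sample_events):
    print(f"Auditing settings changed by {event['caller']}")
```

The comparison is done in lower case because the casing of operation names is not always consistent between sources.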

All of the above configuration will be described in the next two posts. Until then, stay secure!


Azure Backup – 24h limit for restores with MARS agent

I recently had to help a customer with a restore from Azure. They were hit by ransomware and got their file server encrypted. Not an issue: they had Azure Backup configured as a file backup of the full VM (the vhdx files), so it could be restored. Then came this message:

Recovery volume is available till 31-01-2019 14:34:42. Mount time is extended to a maximum of 24 hours in case of an ongoing file-copy.

And as we feared, the restore job was interrupted and failed after exactly 24 hours. There are 2 ways to work around this issue:

Registry key

The first workaround is to add a registry key to the server where you are restoring the file, with the following information:
Path: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows Azure Backup\Config\CloudBackupProvider
Name: RecoveryJobTimeOut 
Value: Number of hours you need for your recovery job to run, 36 for example

This is the recommended workaround. Please note that the dialog box will still tell you that you have a 24-hour limit, but this is not the case! It’s just a static warning in the client.
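If you would rather script the registry change than click through regedit, a minimal Python sketch could build the matching `reg add` command. The post above does not state the value type, so `REG_DWORD` is an assumption on my part, and `build_reg_command` is just a hypothetical helper name. The script only prints the command; run the printed line yourself from an elevated prompt on the server.

```python
import subprocess

# Same key as above, using the HKLM abbreviation that reg.exe accepts.
KEY_PATH = r"HKLM\SOFTWARE\Microsoft\Windows Azure Backup\Config\CloudBackupProvider"


def build_reg_command(hours: int) -> list[str]:
    """Build the `reg add` argument list that sets RecoveryJobTimeOut."""
    if hours <= 0:
        raise ValueError("hours must be a positive integer")
    return [
        "reg", "add", KEY_PATH,
        "/v", "RecoveryJobTimeOut",
        "/t", "REG_DWORD",  # assumption: the value type is DWORD
        "/d", str(hours),
        "/f",               # overwrite an existing value without prompting
    ]


# Print the command line for a 36-hour timeout instead of executing it.
print(subprocess.list2cmdline(build_reg_command(36)))
```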

Pause file copy

Another option is to pause the file copy. It’s just a simple file copy from a source to a destination. If you pause it before the 24-hour limit is reached and the recovery volume is dismounted, you can start a new restore, mount the volume again, and continue the file copy.

I don’t recommend doing this, since it’s easy to mess up and then you would have to start over.

The best solution

So these are just workarounds. The best solution is to build a backup solution that works as needed, and in my experience a restore time of more than 24 hours is rarely acceptable.
You can estimate how long a recovery will take, since we know some of the numbers:

  • Bandwidth – how much bandwidth do you have? Azure is limited to ~60 Mbps when restoring, so even if you have more bandwidth it won’t help.
  • Restore size – how much data do you need to restore?

In this situation we had about 620 GB to restore, and 100 Mbps internet. Due to the 60 Mbps limitation the restore time was around 25 hours, so we weren’t that far from the limit.
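The estimate can be done as simple arithmetic. The Python sketch below assumes decimal units (1 GB = 8,000 megabits) and ignores protocol overhead, so it gives a lower bound rather than an exact figure; `estimate_restore_hours` is a hypothetical helper name. For 620 GB at 60 Mbps it lands at about 23 hours, which fits the roughly 25 hours we saw once real-world overhead is included.

```python
def estimate_restore_hours(size_gb: float, link_mbps: float,
                           cap_mbps: float = 60.0) -> float:
    """Estimate restore duration in hours, ignoring protocol overhead."""
    # The slower of your own link and the restore cap sets the pace.
    effective_mbps = min(link_mbps, cap_mbps)
    size_megabits = size_gb * 1000 * 8  # decimal GB -> megabits
    seconds = size_megabits / effective_mbps
    return seconds / 3600


hours = estimate_restore_hours(size_gb=620, link_mbps=100)
print(f"Estimated restore time: {hours:.1f} hours")
```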
There were some lessons learned for the customer in this case, and we’re reconfiguring their backup now. The file server is no longer backed up at the VM level, but rather at the file level. This is much better for a server like this, where normally they would only need to restore a few files.