16 Nov 2023

Snowflake: Ensuring data security at scale

Snowflake positions itself as the optimal solution for organisations prioritising secure and scalable cloud data management. The security features of the Snowflake Cloud Data Platform, including encryption, access controls, and data masking, play a crucial role in safeguarding sensitive information during the digital revolution.
Andrew Kopera | 6 min read

Safeguarding data integrity and confidentiality amidst the digital revolution

It’s fair to say we’re living in a Data Era. 

Industries are generating an increasing volume of data every day, and with these increases comes both a desire and need to interrogate, analyse and report on this data. 

This invariably means sharing and moving data around, which inherently introduces security risks. 

For any industry, data security should be paramount - no company wants to make the news for falling victim to a data breach. 

In today's world, we all work with large volumes of data: personal and business data such as financial information, intellectual property, confidential customer data, healthcare data, and more. Any data held digitally is potentially at risk. Safeguarding data is essential to protecting the integrity, security, and confidentiality of sensitive information, maintaining compliance with regulatory requirements, and ensuring business continuity. 

At The Virtual Forge we are proud to have achieved ISO 27001 accreditation. ISO 27001 is the world’s best known standard for information security management systems, and covers three principles of information security. 

The first is confidentiality: ensuring only the right people can access the information held by an organisation. The second is integrity: guaranteeing that the data the organisation uses to pursue its business, or keeps safe for others, is reliably stored and not erased or damaged. The third is availability: ensuring the organisation and its clients can access the information whenever necessary, so that business purposes and customer expectations are satisfied.

How can Snowflake help with data security?

In recent years, Snowflake Cloud Data Platform has become an increasingly popular cloud-based data warehousing solution for businesses. One of the key reasons for this popularity is the platform's robust security features, which allow companies to store and analyse their sensitive data while also keeping it safe from unauthorised access. 

Implementing robust security measures like those provided within Snowflake, or utilising systems like Snowflake where these features are already built in, can help organisations mitigate security risks and prevent potential data breaches and losses.

Snowflake provides a number of security features, including end-to-end encryption, role-based access control, object-level access control, multi-factor authentication, federated authentication, time-based access control, compliance certifications, data masking, and secure data sharing. 

This article will go into more detail on some of these features and demonstrate their usage. We'll cover loading data into Snowflake, data masking, encryption, multi-factor authentication, and access controls.

Bringing Data into Snowflake

One important aspect of Snowflake's security is its ability to load external data into the platform in a secure manner. In the following section we will discuss how Snowflake handles external data loading and how it ensures the security of sensitive data.

When consuming data in Snowflake, organisations can hold the source data externally (in an existing cloud storage provider) or load the data into Snowflake's internal storage. 

This is separate from loading data to tables in Snowflake, which is discussed further in a later section of this document. Using internal storage means that Snowflake will manage the storage infrastructure, so there’s no need to provision or manage cloud storage resources separately, such as AWS S3 buckets or Azure blob storage. 

When deciding between internal and external storage in Snowflake, the following factors should be considered (a short sketch of both options follows the list):

  • Performance and Scalability: Snowflake's internal storage is optimised for its platform, and you may see better performance with internal storage. However, the performance difference may vary depending on your specific workload and data.
  • Data Location: If your data is already stored in an external cloud storage system, using external storage can eliminate the need for data migration and potential data duplication - Snowflake can query that external data directly as long as it is in a compatible format (JSON, CSV, Parquet, etc.).
  • Cost: Snowflake's pricing model includes both storage and compute costs. With internal storage, Snowflake bundles storage costs along with compute costs, while with external storage, you may have separate costs for storage and Snowflake compute.
  • Integration: If you have existing tools, services, or processes that work directly with the external storage provider, using external storage can provide better overall integration across your solution.
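
One common way these two options surface in practice is through Snowflake stages. As a hedged sketch (the stage, bucket and integration names are hypothetical, and the storage integration is assumed to already exist), an internal named stage leaves storage management to Snowflake, while an external stage points at your own cloud bucket:

-- Internal named stage: storage is provisioned and managed by Snowflake
CREATE STAGE INTERNAL_DEMO_STAGE;

-- External stage: the files stay in your own S3 bucket (names are placeholders)
CREATE STAGE EXTERNAL_DEMO_STAGE
  URL = 's3://example-bucket/landing/'
  STORAGE_INTEGRATION = EXAMPLE_S3_INTEGRATION;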

Data Loading

When loading data into Snowflake, one option is to load the data in an encrypted format or to enable masking, both discussed in more detail later in this post. This allows businesses to take advantage of Snowflake's analytics capabilities in the same way as when loading the data into Snowflake's internal storage. 

However, this analysis can also be done while keeping sensitive data external to Snowflake. In cases like this, Snowflake acts as a secure gateway to the data, utilising industry-standard transport-level security protocols. 

This allows businesses to analyse the data without storing it in Snowflake's internal storage, addressing concerns about managing security and access across multiple copies of sensitive data in multiple systems. 

Snowflake IAM User

Utilising External tables enables querying existing data stored in external cloud storage for analysis without first loading it into Snowflake, with the source of truth for the data remaining in the external cloud storage. 

Let’s take an example of a healthcare company that wants to analyse medical data from a third party. The third party is willing to share the data but needs to ensure patient personal information is kept confidential. 

In this situation, the healthcare company could stage the data externally to Snowflake, keeping the source data in their chosen cloud storage platform (AWS, GCP, or Azure). This way the data can be analysed using Snowflake, directly accessing the external storage provider (with granular user and/or role level security on both sides of the transaction), or by loading only specific data fields into Snowflake's internal storage. 

With external storage, this means that if the third party has their data stored in a cloud service in structured or semi-structured files, you have the ability to query that data directly (Snowflake supports CSV, XML, JSON, Parquet, Avro and ORC file types) before bringing any data into tables in a database within Snowflake. 
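
As a hedged sketch of this pattern (the stage, file format, table and column names below are hypothetical), the staged files can be queried in place and only the non-sensitive fields loaded into an internal table:

-- Query the CSV files directly on the external stage, without loading them
SELECT T.$1 AS PATIENT_REF, T.$3 AS DIAGNOSIS_CODE
FROM @EXTERNAL_DEMO_STAGE (FILE_FORMAT => 'DEMO_CSV_FORMAT') T;

-- Load only selected, non-sensitive columns into an internal table
COPY INTO DIAGNOSES (PATIENT_REF, DIAGNOSIS_CODE)
FROM (SELECT T.$1, T.$3 FROM @EXTERNAL_DEMO_STAGE T)
FILE_FORMAT = (FORMAT_NAME = 'DEMO_CSV_FORMAT');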

Data Masking

Data that is loaded internally into Snowflake can also be ‘masked’. Data masking is a technique used to protect sensitive data by obscuring or replacing it with dummy data. It is commonly used to comply with data privacy regulations (such as HIPAA or GDPR) or simply to protect confidential information from unauthorised access. Data masking in Snowflake is a column-level feature that makes use of definable policies to selectively obfuscate data in table columns and views. These policies can then be applied at either a user or role level.

Below is a quick example of how data masking can be applied, using some basic (fabricated) patient data stored in a database. In this example we have an administrative or otherwise specialised user, perhaps on a production environment, who can query the patient data and see the full records. 

Patient data stored in a database

Now let’s assume we have developers, who should not have access to personally identifiable information. 

Masking policies are applied to specific data items in this database so that developer-level database users can still query the data, but the values in columns considered to contain PII are obfuscated. Running exactly the same query as above, against the same data, will return the same rows with these data points masked.

Example of data masking applied

The data considered PII in this example is masked but not encrypted - the underlying data is still held in the Snowflake instance in plain-text form; the values are simply hidden from specific users or roles when they query it. Snowflake also provides data encryption, authentication, and access controls, adding further levels of security.
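
As an illustrative sketch of how such a policy might be defined (the policy, table, column and role names are hypothetical), a masking policy is created once and then attached to any column that should be obfuscated for non-privileged roles:

-- Return the real value only to roles allowed to see PII; mask it for everyone else
CREATE MASKING POLICY PII_STRING_MASK AS (VAL STRING) RETURNS STRING ->
  CASE
    WHEN CURRENT_ROLE() IN ('ACCOUNTADMIN', 'PROD_SUPPORT') THEN VAL
    ELSE '*** MASKED ***'
  END;

-- Attach the policy to a sensitive column
ALTER TABLE PATIENTS MODIFY COLUMN LAST_NAME SET MASKING POLICY PII_STRING_MASK;

With a policy like this in place, developer roles querying the PATIENTS table would see the masked placeholder, while the listed privileged roles would continue to see the original values.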

Encryption

Snowflake encrypts all data, both at rest and in-transit. 

Data at rest (data stored in database tables/objects and internal data storage) is encrypted using industry-standard AES-256 encryption. This is handled automatically and requires no configuration by the user, meaning all data stored in Snowflake is encrypted. Snowflake also uses SSL/TLS encryption to secure data in transit between the client and the server. This means that when connecting to Snowflake, whether through Snowflake's web UI or through separate tools using standard open database connectivity, queries and commands issued and data returned are always encrypted. 

Additionally, Snowflake supports client-side encryption, where a client encrypts data before copying it into a cloud storage staging area. The client-side encryption follows the protocol defined by the cloud storage service. By using client-side encryption, you retain control over the encryption keys and can ensure that sensitive data is encrypted before it reaches the Snowflake environment. This can be particularly beneficial when dealing with highly sensitive data or when you have specific compliance or security requirements. It's important to note that client-side encryption in Snowflake is a client-driven process, and you are responsible for managing the encryption and decryption operations outside of Snowflake. Snowflake itself does not have access to the encryption keys or perform any cryptographic operations on the client-side data. 

The following summarises the client-side encryption protocol:

  1. The user establishes a confidential master key, collaborating with Snowflake in the process.
  2. Utilising the client interface offered by the cloud storage service, a randomly generated encryption key is created and employed to encrypt the file before transferring it to the cloud storage. Subsequently, the random encryption key undergoes encryption using the customer's master key.
  3. Following this process, both the encrypted file and the encrypted random key are transmitted to the cloud storage service. The encrypted random key is stored alongside the file's metadata.

Downloading and using the encrypted file is a multi-step process. First, the encrypted file and encrypted key need to be downloaded. Then, the encrypted key can be decrypted using the customer's master key. Now that the key is decrypted, it can be used to decrypt the encrypted file. All of this encryption and decryption happens on the client side - neither the cloud provider nor the ISP will see the unencrypted data. Users are able to upload data encrypted on the client side using any tool or system supporting client-side encryption.
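
In practice, when loading client-side encrypted files from an external stage, the master key from step 1 is typically supplied when the stage is declared. Below is a minimal sketch, assuming an S3 bucket; the URL, credentials and Base64-encoded key are placeholders:

-- Hypothetical external stage holding client-side encrypted files
CREATE STAGE ENCRYPTED_STAGE
  URL = 's3://example-bucket/encrypted-landing/'
  CREDENTIALS = (AWS_KEY_ID = '<aws_key_id>' AWS_SECRET_KEY = '<aws_secret_key>')
  ENCRYPTION = (TYPE = 'AWS_CSE' MASTER_KEY = '<base64_encoded_master_key>');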

Authentication

Snowflake supports multi-factor authentication (MFA) for all user accounts. 

This means users have to pass more than one login challenge in order to gain access to a Snowflake account (such as a password plus confirmation via a mobile device app using push notifications or generated passcodes). Snowflake also supports other types of authentication such as federated authentication, single sign-on (SSO), and key-pair authentication, with single sign-on supporting OAuth 2.0. External OAuth integrates the customer's OAuth 2.0 server to provide a seamless SSO experience, enabling external client access to Snowflake. Snowflake supports external authorisation servers, custom clients, and partner applications including Okta, Microsoft Azure AD, Microsoft Power BI, and Sigma. Snowflake's integration with External OAuth servers is cloud-agnostic. 

For the supported providers, the external authorisation process can be summarised in the following steps; a configuration sketch follows the list.

  1. Set up a trusted relationship by configuring both the external authorisation server and the security integration within Snowflake.
  2. When a user endeavours to access Snowflake data using their business intelligence application, the application initiates user verification.
  3. Upon successful verification, the authorisation server dispatches a JSON Web Token (OAuth token) to the client application.
  4. The Snowflake driver transmits a connection string to Snowflake, incorporating the OAuth token.
  5. Snowflake authenticates the OAuth token to ensure its validity.
  6. Following token validation, Snowflake conducts a user lookup.
  7. Once verified, Snowflake establishes a session for the user, enabling access to data based on their designated role.
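
As a hedged configuration sketch for step 1 (the integration name, tenant ID and audience are placeholders, and the exact parameters depend on your identity provider), an External OAuth security integration for Azure AD might look like the following:

-- Trust OAuth tokens issued by the customer's Azure AD tenant
CREATE SECURITY INTEGRATION EXTERNAL_OAUTH_AZURE
  TYPE = EXTERNAL_OAUTH
  ENABLED = TRUE
  EXTERNAL_OAUTH_TYPE = AZURE
  EXTERNAL_OAUTH_ISSUER = 'https://sts.windows.net/<tenant_id>/'
  EXTERNAL_OAUTH_JWS_KEYS_URL = 'https://login.microsoftonline.com/<tenant_id>/discovery/v2.0/keys'
  EXTERNAL_OAUTH_AUDIENCE_LIST = ('<audience_uri>')
  EXTERNAL_OAUTH_TOKEN_USER_MAPPING_CLAIM = 'upn'
  EXTERNAL_OAUTH_SNOWFLAKE_USER_MAPPING_ATTRIBUTE = 'login_name';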

These features help to ensure only authorised users have access to the data stored in or accessible to Snowflake. 

Access Controls

Snowflake allows organisations to define access to their data at a very granular level. This can be defined using role-based access as well as discretionary access controls. Organisations can specify which users or groups of users can access certain data or database objects, as well as what actions those users can perform (such as read-only or full access), ensuring only fully authorised users can access sensitive data. Snowflake additionally employs object-level and time-based access controls. Object-level control means that permissions can be granted or denied on specific objects such as tables, views or columns. Time-based access controls allow administrators to control when users can access data, which is particularly useful for administering temporary users who require access to data for a specified, limited time.

Below is a very quick demonstration of how to create a role to which users can be assigned. We'll create a role, then give that role permission to use a warehouse. The warehouse is a Snowflake compute engine which a user/role must have access to in order to execute SQL statements. We will then give that role access to a database and a schema. 

1 - Create a new role called ‘developer’

CREATE ROLE DEVELOPER;

2 - Give that new role access to a Snowflake compute engine named ‘compute_wh’

GRANT USAGE ON WAREHOUSE COMPUTE_WH TO ROLE DEVELOPER;

3 - Allow the new role to use a database named ‘snowflake_projects’

GRANT USAGE ON DATABASE SNOWFLAKE_PROJECTS TO ROLE DEVELOPER;

4 - Allow the role to use the schema ‘demo’ and have the permissions to create new tables in that schema

GRANT USAGE, CREATE TABLE ON SCHEMA SNOWFLAKE_PROJECTS.DEMO TO ROLE DEVELOPER;

5 - Check the grants that have been assigned to the role ‘developer’

SHOW GRANTS TO ROLE DEVELOPER;

The image below shows the results from step 5 and confirms the grants that have been assigned to the developer role.

Results from step 5, confirming the grants that have been assigned to the developer role
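
To put the role into use, it would then be granted to one or more users (the user name below is hypothetical):

GRANT ROLE DEVELOPER TO USER JANE_DOE;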

Auditing

All of the topics covered so far discuss configuring Snowflake to prevent security breaches. Auditing is another essential aspect of data management and security. This usually involves tracking and recording access to data, as well as changes made to the data, to ensure they are authorised and comply with data protection regulations. Snowflake logs all user activity, including user logins and query history. Query history includes information such as who has queried the data, the time of execution, and the results obtained. Login history additionally records user login events, including the IP address and client application used for the access. 
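
As a hedged example of querying this history programmatically rather than through the web UI, the SNOWFLAKE.ACCOUNT_USAGE schema exposes views such as QUERY_HISTORY and LOGIN_HISTORY:

-- Who ran which queries over the last seven days
SELECT USER_NAME, QUERY_TEXT, START_TIME
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE START_TIME >= DATEADD('DAY', -7, CURRENT_TIMESTAMP())
ORDER BY START_TIME DESC;

-- Recent login events, including the client IP and client application
SELECT USER_NAME, CLIENT_IP, REPORTED_CLIENT_TYPE, EVENT_TIMESTAMP
FROM SNOWFLAKE.ACCOUNT_USAGE.LOGIN_HISTORY
ORDER BY EVENT_TIMESTAMP DESC;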

Below are some screenshots showing Snowflake's activity tracking. The first image is from the main screen of the Query History section, which lists the SQL statements that have been executed - both DML and DDL. The example below shows a simple read query.

Query History section

By clicking on the SQL Text shown, you are taken to more detailed information, including the results of the statement executed. 

Detailed view of SQL Text

Snowflake auditing is a powerful feature that enables organisations to track and monitor activity within their Snowflake data platform and to maintain a secure and compliant data environment. The detailed audit trail helps in identifying potential security breaches, ensuring regulatory compliance, and enhancing overall data governance. With Snowflake auditing you can gain insight into user behaviour, troubleshoot issues and maintain the integrity and security of your data. 

Final thoughts: summing up Snowflake’s security architecture

This technical article has provided an overview of Snowflake's cutting-edge security architecture and practices. Snowflake has demonstrated a relentless commitment to data protection, privacy, and compliance, making it a leading choice for organisations seeking a secure and scalable cloud data platform.

Throughout this post, we have delved into various security layers, including data encryption, access controls, authentication mechanisms, and auditing capabilities, highlighting how Snowflake leverages these elements to safeguard data against internal and external threats. We have also explored Snowflake's robust compliance framework, ensuring adherence to various industry standards and regulations, offering customers peace of mind in a complex regulatory landscape.

Snowflake's unique multi-cloud, multi-region architecture adds an additional layer of redundancy and resilience, reducing the risk of data loss or downtime. Its separation of compute and storage, coupled with an automatic scaling mechanism, ensures that performance and security are maintained without compromise.

As the data landscape continues to evolve, the importance of secure, scalable, and compliant data management cannot be overstated. Snowflake's commitment to these principles, coupled with a persistent focus on innovation, positions it as a front-runner in the field of cloud data security.

In conclusion, Snowflake's comprehensive security features, commitment to compliance, and innovative approach to data management make it an ideal choice for organisations looking to navigate the complexities of the modern data ecosystem while keeping their data safe and accessible.

Thousands of customers deploy Snowflake Cloud Data Platform to derive all the insights from all their data by all their business users. Snowflake equips organisations with a single, integrated platform that offers the only data warehouse built for any cloud; instant, secure, and governed access to their entire network of data; and a core architecture to enable many other types of data workloads, such as developing modern data applications. Find out more here.

How can The Virtual Forge help?

If you are evaluating Snowflake as a possible data cloud platform, you are probably aware of some of the transformative advantages it can bring. Snowflake represents a contemporary solution to the evolving data challenges that have surfaced in recent decades due to the escalating volume of data and our growing dependence on it.

At The Virtual Forge, we are prepared to assist you in efficiently managing your data through the Snowflake Data Cloud platform. Whether your focus is on data migration, creation, curation, traditional warehousing, BI reporting, or implementing cutting-edge AI solutions, our expertise stands ready to aid you in your data-driven journey.
