How to implement Snowflake role-based access control (RBAC)?

In Snowflake, the role-based access control (RBAC) model lets you grant and revoke access to specific objects and operations: privileges are granted to roles, and roles are granted to users.

To implement the role-based access model in Snowflake, you can follow these steps:

  1. Create roles with the appropriate access privileges for each level of access you want to grant. For example, you can create roles for read-only access, data loading, and administration.
  2. Assign users to the appropriate roles based on their job responsibilities and access needs.
  3. Create objects such as databases, schemas, tables, and views, and assign appropriate privileges to the roles created in step 1.
  4. Use the GRANT and REVOKE statements to assign or revoke access to specific objects and operations, as shown in the example below.
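As a minimal sketch, the statements for a read-only role might look like the following (the role, database, schema, and user names here are illustrative):

-- Create a read-only role
create role analyst_ro;

-- Allow the role to use the database and schema, and read all tables
grant usage on database sales_db to role analyst_ro;
grant usage on schema sales_db.public to role analyst_ro;
grant select on all tables in schema sales_db.public to role analyst_ro;

-- Assign the role to a user
grant role analyst_ro to user jane_doe;

-- Revoke access if it is no longer needed
revoke select on all tables in schema sales_db.public from role analyst_ro;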

You can also use the Snowflake web interface to manage access and monitor access activity.

It’s important to test the changes you’ve made before applying them to the production environment, and to have a rollback plan in case of any issues.

It’s also important to keep in mind that Snowflake supports several access control mechanisms, such as object-level, column-level, and row-level access control, so you need to understand the use case and choose the appropriate one.
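For example, column-level security is implemented with masking policies. Here is a minimal sketch, assuming a hypothetical customers table with an email column and an ANALYST_ROLE that is allowed to see raw values:

-- Mask email values for every role except ANALYST_ROLE
create masking policy email_mask as (val string) returns string ->
  case
    when current_role() = 'ANALYST_ROLE' then val
    else '*** MASKED ***'
  end;

-- Attach the policy to the column
alter table customers modify column email set masking policy email_mask;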

Snowflake interview questions

  1. What is Snowflake and how does it differ from other data warehousing solutions?
  2. How does Snowflake handle data loading and ETL processes?
  3. Can you explain the architecture of Snowflake and how it achieves scalability and performance?
  4. How does Snowflake handle data security and compliance?
  5. How does Snowflake handle data warehousing and analytics workloads?
  6. Can you walk me through a real-world use case where Snowflake was used to solve a business problem?
  7. How does Snowflake integrate with other tools and technologies in a data ecosystem?
  8. How does Snowflake handle data governance and metadata management?
  9. How does Snowflake handle data archiving and data retention?
  10. Can you explain Snowflake’s pricing model and how it differs from other data warehousing solutions?

Data warehouse modeling techniques

There are several data warehouse modeling techniques that can be used to design and optimize a data warehouse:

  1. Star Schema: A star schema organizes data into a central fact table and a set of dimension tables. The fact table contains the measures or facts of the data, such as sales or revenue, while the dimension tables contain the attributes or characteristics of the data, such as time or location. This approach provides a simple and intuitive structure that can be easily understood by business users (see the DDL sketch at the end of this section).
  2. Snowflake Schema: A snowflake schema is a variant of the star schema, where dimension tables are normalized to reduce data redundancy. This approach reduces the storage space required and improves data integrity, but it can make the structure more complex and harder to understand.
  3. Third Normal Form (3NF): A 3NF data warehouse model is based on the principles of normalization, which is a process of organizing data into separate tables to eliminate data redundancy and improve data integrity. This approach provides a logical, consistent, and stable structure, but it can make the data warehouse more complex and harder to understand.
  4. Data Vault: Data vault is a data modeling technique that uses a hub-and-spoke structure to store historical data in a central location, with links to satellite tables containing the attributes of the data. This approach provides a scalable and flexible structure that can handle large amounts of data and accommodate changes to the data over time.
  5. Kimball Dimensional Modeling: The Kimball dimensional modeling approach is a widely adopted method for designing data warehouses and data marts. It is based on the principles of the star schema, but emphasizes the importance of modeling data at the lowest level of granularity and using the business process as the driving force behind the design.

These are some of the common techniques used for data warehouse modeling. Depending on the specific requirements, the nature of the data, its size and complexity, and the use case, different techniques can be combined or used in isolation to build an efficient data warehouse.
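As an illustration of the star schema described above, here is a minimal DDL sketch (all table and column names are invented for the example):

-- Dimension tables hold descriptive attributes
create table dim_date (
    date_key      integer primary key,
    calendar_date date,
    month_name    varchar,
    year_number   integer
);

create table dim_store (
    store_key  integer primary key,
    store_name varchar,
    city       varchar
);

-- The fact table holds measures plus foreign keys to the dimensions
create table fact_sales (
    date_key   integer references dim_date (date_key),
    store_key  integer references dim_store (store_key),
    units_sold integer,
    revenue    number(12,2)
);

Queries then join the fact table to whichever dimensions they need, for example revenue by month and city.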

Designing an enterprise data lake on AWS S3

An AWS S3-based data lake is a popular method for storing and managing large amounts of structured and unstructured data in a centralized, cost-effective and scalable way. Here are some strategies that can be used when designing an enterprise data lake on AWS S3:

  1. Data Ingestion: Implement a robust data ingestion strategy that can handle the volume, variety, and velocity of data being ingested into the data lake. This can include using services like AWS Glue, AWS Kinesis, and AWS Lambda to automate the data ingestion process, as well as implementing data validation and quality checks.
  2. Data Storage: Use S3 storage classes to store different types of data in the data lake. For example, use S3 Standard for frequently accessed data, S3 Standard-Infrequent Access (Standard-IA) for data that is accessed less frequently, and S3 Glacier for archival data. This can help to optimize storage costs and performance.
  3. Data Governance: Implement data governance policies and procedures to ensure that data in the data lake is accurate, consistent, and compliant with regulatory requirements. This can include using AWS Glue Data Catalog for metadata management, AWS Lake Formation for data lake governance and security, and AWS KMS for encryption.
  4. Data Processing: Use AWS Glue, AWS EMR or AWS Lambda to process data in the data lake, and use AWS Glue Data Catalog to keep track of the data lineage.
  5. Data Access: Use services like Amazon Athena, Amazon Redshift, and Amazon QuickSight to allow business users and analysts to access and analyze data in the data lake (see the Athena example at the end of this section).
  6. Data Backup and Archiving: Use S3 versioning and cross-region replication to protect against accidental deletion and regional failures, and use S3 lifecycle policies to transition older data to archival storage classes such as S3 Glacier.
  7. Data Security: Use AWS IAM, AWS KMS and other security features provided by AWS to secure the data lake and ensure that only authorized users and applications have access to the data.

These strategies can help you to create a robust and scalable enterprise data lake on AWS S3 that can handle large amounts of data, while providing cost-effective storage, efficient data processing and governance, and secure data access.
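As a small sketch of the data access layer (item 5 above), Athena can query data directly in S3 through an external table. The bucket, paths, and schema below are hypothetical:

-- Define an external table over Parquet files in S3
CREATE EXTERNAL TABLE IF NOT EXISTS sales_events (
    event_id   string,
    event_time timestamp,
    amount     double
)
PARTITIONED BY (event_date string)
STORED AS PARQUET
LOCATION 's3://my-data-lake/curated/sales_events/';

-- Register partitions before the first query
-- (assumes Hive-style folders, e.g. .../event_date=2024-01-01/)
MSCK REPAIR TABLE sales_events;

-- Query the data in place
SELECT event_date, SUM(amount) AS total_amount
FROM sales_events
WHERE event_date = '2024-01-01'
GROUP BY event_date;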

Snowflake: Get a list of all databases and schemas

To get a list of all Snowflake databases:

show databases;

To get a list of all schemas in the current database:

show schemas;

The issue with the above is that it can’t list all schemas across the whole account, i.e., from all databases at once.

To get all schemas in an account, you can run the following (note that it only reports objects the current role has privileges on):

show schemas in account;

If you want to filter the results, you can use result_scan immediately after running the show command (a metadata query). It may look something like this:

select "database_name" as DATABASE_NAME
"name" as SCHEMA_NAME
from table(result_scan(last_query_id()))
where SCHEMA_NAME not in ('INFORMATION_SCHEMA') -- optional filter(s)
;
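Alternatively, if your role can read the SNOWFLAKE database, the ACCOUNT_USAGE schema exposes a queryable SCHEMATA view (note that ACCOUNT_USAGE views have some latency and include dropped objects unless filtered):

select catalog_name as database_name,
       schema_name
from snowflake.account_usage.schemata
where deleted is null  -- exclude dropped schemas
order by 1, 2;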