🔍

Amazon Athena

Analytics

Interactive SQL Queries on S3 Data

Athena is a serverless query engine that lets you ask SQL questions directly against files in S3. It provides an exploratory analysis layer without first loading that data into another store.

Architecture Diagram: Data Flow

S3 (data lake) · Glue Catalog · Athena · Results (S3) · BI tools · IAM

Why do you need it?

If logs and event files already live in S3 but every simple question still requires preparing a separate analytics cluster, the first step of analysis becomes heavier than it should be. When you cannot query files directly, even small investigations start late.

Why did this approach emerge?

In the past, even a small query against logs or a data lake meant provisioning analysis servers first. Serverless query services such as Athena, which query stored files in place, emerged to cut that setup cost.

How does it work inside?

Athena scans the data in S3 directly, runs your SQL against it, and writes the results back to S3. It looks up schema information (table definitions) in the Glue Data Catalog, so you can start querying without first building an ETL pipeline, and BI tools can consume the results afterward.
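The flow described above (submit SQL, let Athena scan S3, read the result location back) can be sketched in Python. This is a minimal illustration, not a production client: it assumes `client` behaves like `boto3.client("athena")` and uses only its `start_query_execution` and `get_query_execution` calls. The function name `run_athena_query`, the table `access_logs`, the database `weblogs`, and the bucket `example-athena-results` are hypothetical names chosen for the example.

```python
import time


def run_athena_query(client, sql, database, output_location,
                     poll_seconds=1.0, max_polls=60):
    """Submit SQL to Athena, wait for completion, return the S3 result path.

    `client` is assumed to behave like boto3.client("athena").
    """
    # Table schemas are resolved from the named Glue Data Catalog database;
    # the result file is written under the given S3 location.
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Queries run asynchronously, so poll until a terminal state is reached.
    for _ in range(max_polls):
        response = client.get_query_execution(QueryExecutionId=query_id)
        execution = response["QueryExecution"]
        state = execution["Status"]["State"]
        if state == "SUCCEEDED":
            return execution["ResultConfiguration"]["OutputLocation"]
        if state in ("FAILED", "CANCELLED"):
            reason = execution["Status"].get("StateChangeReason", "no reason given")
            raise RuntimeError(f"Athena query {query_id} {state}: {reason}")
        time.sleep(poll_seconds)
    raise TimeoutError(f"Athena query {query_id} did not finish in time")


# Example call (hypothetical names; needs boto3 and AWS credentials):
#
#   import boto3
#   path = run_athena_query(
#       boto3.client("athena"),
#       sql="SELECT status, COUNT(*) AS hits FROM access_logs GROUP BY status",
#       database="weblogs",
#       output_location="s3://example-athena-results/adhoc/",
#   )
```

Passing the client in as an argument keeps the sketch testable without AWS access. The returned S3 path is where a BI tool or a follow-up job would pick up the result file.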

Boundaries & Distinctions

Athena and Redshift both support SQL analytics, but the usage pattern is different. Athena is a serverless query engine that reads files directly, while Redshift is a warehouse where data is loaded for repeated analysis. If the key need is querying S3 files without loading them first, look at Athena; if the key need is repeated reporting on prepared data, look at Redshift.

When should you use it?

Athena is well suited to log exploration, ad-hoc data lake analysis, investigative SQL, and initial validation of batch reports, where you need to ask questions directly over stored files. It is a poor fit for heavy aggregations that run repeatedly and need consistently fast response times.

Log analysis · Data lake queries · Cost analysis · Ad-hoc analysis