
 

 

Background

 

Distilling the Knowledge in a Neural Network

A very simple way to improve the performance of almost any machine learning algorithm is to train many different models on the same data and then to average their predictions. Unfortunately, making predictions using a whole ensemble of models is cumbersome

arxiv.org

 

While reading Machine Learning Design Patterns, I came across a technique called distilling in Chapter 4, so I looked it up.

 

Summary

Until now, large-scale machine learning systems have used the same model for both training and deployment, which has the drawback of heavy resource use at inference time. The paper tries to overcome this constraint by transferring the knowledge of a large model into a single small model. This is what "distillation" refers to.

A conventional hard label such as [1,0,0] states an exact probability. In reality, though, there may be a dog that looks like a cat, so the idea is that a label like [0.6,0.4,0] can also carry meaningful knowledge.

In short, the values produced by the final layer of the large model are used as soft targets for training.

Distillation

For a classification task, a conventional neural network applies softmax at the output layer to convert the logits z_i into class probabilities: q_i = exp(z_i / T) / Σ_j exp(z_j / T).

T (the temperature) is normally set to 1. The higher T is, the softer the output class probabilities become.

You can see that the class values get softer as T grows.
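To make the effect of T concrete, here is a minimal sketch in plain Python. The logits are made-up numbers and the helper name softmax_with_temperature is my own, not something from the paper.

```python
import math

def softmax_with_temperature(logits, T=1.0):
    """Return softmax(logits / T) as a list of probabilities."""
    scaled = [z / T for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three classes
logits = [5.0, 2.0, 0.5]
for T in (1.0, 5.0, 20.0):
    probs = softmax_with_temperature(logits, T)
    print(f"T={T}: {[round(p, 3) for p in probs]}")
```

As T increases, the largest probability shrinks and the smaller ones grow, while the ranking of the classes stays the same.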

In the distillation scheme proposed in the paper, the large model is called the teacher model, and the small model built for a specific purpose is called the student (distilled model).

Set T to a high value, feed the transfer dataset to the teacher model, and obtain the soft labels (1). Run the student model at the same T to obtain its soft predictions (2), and run the student at T = 1 to obtain its hard predictions (3). The paper reports that a weighted sum of two cross entropies works well: one between (1) and (2), and one between (3) and the ground-truth labels.
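The weighted objective can be sketched as below. This is a minimal illustration under my own assumptions: the values T=4.0 and alpha=0.9 are arbitrary, and the function names are hypothetical. The T**2 factor on the soft term follows the paper's remark that soft-target gradients scale as 1/T^2.

```python
import math

def softmax(logits, T=1.0):
    scaled = [z / T for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target, pred):
    """H(target, pred) = -sum_i target_i * log(pred_i)."""
    return -sum(t * math.log(p) for t, p in zip(target, pred) if t > 0)

def distillation_loss(student_logits, teacher_logits, true_label, T=4.0, alpha=0.9):
    soft_targets = softmax(teacher_logits, T)   # (1) teacher at temperature T
    soft_preds = softmax(student_logits, T)     # (2) student at the same T
    hard_preds = softmax(student_logits, 1.0)   # (3) student at T = 1
    one_hot = [1.0 if i == true_label else 0.0 for i in range(len(student_logits))]
    soft_loss = cross_entropy(soft_targets, soft_preds)
    hard_loss = cross_entropy(one_hot, hard_preds)
    # T**2 keeps the soft term's gradient magnitude comparable when T changes
    return alpha * (T ** 2) * soft_loss + (1.0 - alpha) * hard_loss

teacher = [6.0, 2.0, -1.0]  # hypothetical teacher logits
print(distillation_loss([5.0, 2.5, -0.5], teacher, true_label=0))
```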

The gradient of the cross entropy C with respect to a student logit z_i is ∂C/∂z_i = (q_i − p_i) / T. Here v_i is the logit of the large model, p_i is the soft-label probability computed from v_i, and q_i is the student probability computed from z_i. So the problem of computing the cross entropy between q_i and p_i turns into the problem of approximating the difference between logits.

In other words, distillation computes the gradient from the difference between the teacher's and the student's outputs.

If T is large compared with the logits, the first-order Taylor approximation exp(x) ≈ 1 + x gives ∂C/∂z_i ≈ (1/T) [ (1 + z_i/T) / (N + Σ_j z_j/T) − (1 + v_i/T) / (N + Σ_j v_j/T) ], where N is the number of classes.

If we further assume that the logits are zero-mean, the sums Σ_j z_j and Σ_j v_j vanish, so

the gradient with respect to the student output becomes ∂C/∂z_i ≈ (z_i − v_i) / (N T^2), which is inversely proportional to N T^2. The conclusion is that when T is large enough and the logits are zero-mean, distillation is equivalent to minimizing the squared logit difference 1/2 (z_i − v_i)^2, scaled by 1/(N T^2).

Conversely, at low T distillation pays much less attention to matching logits that are far more negative than the average. The paper argues that when the student is much too small to capture all of the teacher's knowledge, an intermediate temperature works best.
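The high-temperature claim can be checked numerically. The sketch below compares the exact gradient (q_i − p_i) / T with the approximation (z_i − v_i) / (N T^2) for zero-mean logits. The logit values and function names are my own illustrative choices.

```python
import math

def softmax(logits, T):
    scaled = [z / T for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def exact_gradient(z, v, T):
    """dC/dz_i = (q_i - p_i) / T with q = softmax(z/T), p = softmax(v/T)."""
    q, p = softmax(z, T), softmax(v, T)
    return [(qi - pi) / T for qi, pi in zip(q, p)]

def high_T_approximation(z, v, T):
    """dC/dz_i ~= (z_i - v_i) / (N * T^2), valid for large T and zero-mean logits."""
    n = len(z)
    return [(zi - vi) / (n * T ** 2) for zi, vi in zip(z, v)]

z = [1.0, -0.5, -0.5]  # hypothetical zero-mean student logits
v = [2.0, -1.0, -1.0]  # hypothetical zero-mean teacher logits
for T in (1.0, 10.0, 100.0):
    print(T, exact_gradient(z, v, T), high_T_approximation(z, v, T))
```

The two columns disagree at T = 1 and get closer as T grows, which is what the derivation predicts.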

 

 

 

 

Contributions

The main contributions of this paper can be summarized in three points.

1. Formalizing knowledge transfer through soft targets

Conventional classifiers used only the ground-truth label (a one-hot vector) as the training signal. This paper instead takes the view that the model's entire output probability distribution is knowledge.

In particular, the softmax distribution produced by the teacher model contains the following information.

  • Similarity between classes
  • How uncertain the model is
  • Implicit structure of the data distribution

When the student model is trained on this, it does not merely learn to get the answer right but imitates the teacher's decision structure itself.

This became the starting point for much of the later Knowledge Distillation research.


2. A temperature-based mechanism for controlling the probability distribution

The paper introduces a temperature T into the softmax function to control the shape of the probability distribution.

  • T=1: the ordinary softmax
  • T>1: the distribution becomes flatter

Raising the temperature lowers the model's confidence and exposes the relative relationships between classes more clearly. As a result the student learns the probability structure itself rather than just the correct answer.

Showing through this approximation analysis that, in the high-temperature limit, KD reduces to shrinking the difference between logits is also an important contribution.


3. A way to compress ensemble models

The paper shows that a high-performing teacher obtained by ensembling several models can be compressed into a single student model.

This has the following practical implications.

  • Use a large model in the training stage
  • Use a lightweight model in the deployment stage
  • Faster inference and lower resource use

In other words, the technique makes it possible to structurally separate training from deployment, which matters a great deal in industry.

์‹คํ—˜๊ฒฐ๊ณผ

MNIST

MNIST ์‹คํ—˜์—์„œ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์€ ์„ค์ •์„ ์‚ฌ์šฉํ•˜์˜€์Šต๋‹ˆ๋‹ค.

  • Teacher ๋ชจ๋ธ: ์—ฌ๋Ÿฌ ๊ฐœ์˜ ๋ชจ๋ธ์„ ์•™์ƒ๋ธ”ํ•œ ๊ณ ์„ฑ๋Šฅ ๋„คํŠธ์›Œํฌ
  • Student ๋ชจ๋ธ: ์ƒ๋Œ€์ ์œผ๋กœ ์ž‘์€ ๋„คํŠธ์›Œํฌ

๊ฒฐ๊ณผ์ ์œผ๋กœ Student ๋ชจ๋ธ์€ ๋‹ค์Œ๊ณผ ๊ฐ™์€ ํŠน์ง•์„ ๋ณด์˜€์Šต๋‹ˆ๋‹ค.

  1. ์ผ๋ฐ˜์ ์ธ hard label ํ•™์Šต ๋Œ€๋น„ ๋” ๋‚ฎ์€ error rate
  2. Teacher์˜ ์„ฑ๋Šฅ์— ๊ทผ์ ‘ํ•œ ์ •ํ™•๋„ ๋‹ฌ์„ฑ
  3. ๊ณผ์ ํ•ฉ ๊ฐ์†Œ ํšจ๊ณผ

ํŠนํžˆ ๋ฐ์ดํ„ฐ๊ฐ€ ์ถฉ๋ถ„ํ•˜์ง€ ์•Š์€ ์ƒํ™ฉ์—์„œ๋„ soft target์„ ์‚ฌ์šฉํ•˜๋ฉด ์ผ๋ฐ˜ํ™” ์„ฑ๋Šฅ์ด ๊ฐœ์„ ๋˜๋Š” ๊ฒฝํ–ฅ์„ ๋ณด์˜€์Šต๋‹ˆ๋‹ค.

์ด๋Š” soft target์ด ์ผ์ข…์˜ regularizer ์—ญํ• ์„ ์ˆ˜ํ–‰ํ•œ๋‹ค๋Š” ๊ฒƒ์„ ์˜๋ฏธํ•ฉ๋‹ˆ๋‹ค.


Speech Recognition

The speech recognition experiment used a large acoustic model as the teacher.

The teacher was a very complex, large network and was inefficient to deploy directly.

The student model trained with distillation had:

  • Fewer parameters
  • Faster inference
  • Accuracy close to the teacher's

In particular, training on soft targets converged more stably than training on hard labels.

This is an example showing that KD applies not only to plain classification but also to sequence-based problems.

References:

[Paper review] Distilling the Knowledge in a Neural Network


Background


While operating a vLLM server, I heard that an RCE vulnerability had been reported for versions below 0.14.0, so I patched the version. Then I read a post saying that one of the earlier vulnerabilities allowed RCE through model loading, and I started looking into how that is even possible. I learned that with some deployment formats and frameworks, loading a model does not just read weights but can also run Python code, so I am writing this post to organize what I found.

 

 

CVE-2025-66448: vLLM Config Trust Bypass RCE | Miggo

The vulnerability lies in the __init__ method of the Nemotron_Nano_VL_Config class, located in the now-removed file vllm/transformers_utils/configs/nemotron_vl.py. The commit ffb08379d8870a1a81ba82b72797f196838d0c86 addresses the vulnerability by completel

www.miggo.io

 

Model deployment formats

When developing AI models, the point where more problems arise than in training itself is deployment. A trained model is not just code; it carries weight data ranging from hundreds of MB to tens of GB together with an execution structure. The question of what form to store and deliver the model in is the starting point of model deployment formats.

In the early days models were only used inside the framework that trained them, so the in-memory object was simply serialized as it was. As models grew and collaboration and external sharing increased, new requirements naturally appeared. The biggest one is that a model must load identically in a different environment. Building and training a model is not much of a problem unless you are assembling a full pipeline, but portability came to matter for inference. So people began exporting only the model file, and various model deployment formats appeared to meet these requirements.

PyTorch .pt / .pth

PyTorch saves a model by turning the Python object directly into data, which is called serialization. Like other formats, it solves the reproducibility requirement, so it is convenient at the research level. Internally, however, it uses pickle, and a pickle stream can instruct the loader to call arbitrary Python callables, so loading such an object from an untrusted source has the critical problem that RCE is possible.

The official Python documentation describes pickle as a module for serialization and deserialization, and it also warns that pickle is not secure against maliciously constructed data. Other places where it is used might be worth looking up some other time.

pickle: Python object serialization

So it is better to avoid pickle-based PyTorch model files in deployment environments.

PyTorch can serialize the whole model object as shown below, or it can save only the parameters.

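import torch

model = torch.nn.Linear(4, 2)  # placeholder model for illustration

# Serialize the whole model object
torch.save(model, 'model.pth')
# Recent PyTorch defaults to weights_only=True; loading a full object needs False (trusted files only)
model = torch.load('model.pth', weights_only=False)

# Serialize only the model parameters
torch.save(model.state_dict(), 'model.pth')
model.load_state_dict(torch.load('model.pth', weights_only=True))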

Possible vulnerabilities

import torch.nn as nn
import torch.nn.functional as F

# Define model
class TheModelClass(nn.Module):
    def __init__(self):
        super(TheModelClass, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

# Initialize model
model = TheModelClass()

With a model like the one above, torch.save(model, ...) pickles the TheModelClass instance. Pickle does not store the class's source code; it stores a reference to the class plus instructions for rebuilding the object. Those instructions can name any importable callable (for example through a __reduce__ method), so a malicious file can make torch.load() run code with a different purpose the moment it is loaded. This is the problem with saving the whole object instead of model.state_dict(). That is why PyTorch recommends saving only the parameters with torch.save(model.state_dict(), 'model.pth'). Note that the state_dict file is still written with pickle, so it should be loaded with weights_only=True.
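The mechanism is easy to reproduce with the standard library alone. The sketch below is a harmless demonstration: the class name Payload is made up, and eval("6 * 7") stands in for whatever an attacker would actually run.

```python
import pickle

class Payload:
    """An object whose pickle stream says: 'to rebuild me, call eval("6 * 7")'."""
    def __reduce__(self):
        # (callable, args): pickle will call callable(*args) during loading
        return (eval, ("6 * 7",))

blob = pickle.dumps(Payload())

# Loading the bytes calls eval() right away; no Payload instance is ever created.
result = pickle.loads(blob)
print(type(result).__name__, result)
```

The loader never needs the Payload class at all. The bytes alone decide which callable is invoked, which is why a model file from an untrusted source is equivalent to untrusted code.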

Huggingface .safetensors

safetensors is a format for saving and loading weights quickly. It contains none of what causes vulnerabilities in other formats, in particular the Python object storage and executable structures that come with pickle. A safetensors file consists of a header and a data block.

The header is tensor metadata in JSON, and the data block is the binary region where the weights live. You can open a safetensors file and check this yourself:

https://huggingface.co/Qwen/Qwen3-ASR-1.7B/tree/main

 


 

The second safetensors file in that repository is the smallest one I found, so it is a good one to test with.

from safetensors import safe_open

# Placeholder path: point this at a downloaded .safetensors file
safetensors_file = "model.safetensors"
with safe_open(safetensors_file, framework="pt") as f:
  tensor_names = list(f.keys())
  print(f"tensor list {tensor_names}")

  for key in tensor_names:
    tensor = f.get_tensor(key)
    print(f"tensor name {key} dtype : {tensor.dtype}")
    print(f"tensor name {key} shape : {tensor.shape}")
tensor list ['thinker.model.layers.5.mlp.gate_proj.weight', 'thinker.model.layers.5.mlp.up_proj.weight', 'thinker.model.layers.5.post_attention_layernorm.weight', 'thinker.model.layers.5.self_attn.k_norm.weight', 'thinker.model.layers.5.self_attn.k_proj.weight', 'thinker.model.layers.5.self_attn.o_proj.weight', 'thinker.model.layers.5.self_attn.q_norm.weight', 'thinker.model.layers.5.self_attn.q_proj.weight', 'thinker.model.layers.5.self_attn.v_proj.weight', 'thinker.model.layers.6.input_layernorm.weight', 'thinker.model.layers.6.mlp.down_proj.weight', 'thinker.model.layers.6.mlp.gate_proj.weight', 'thinker.model.layers.6.mlp.up_proj.weight', 'thinker.model.layers.6.post_attention_layernorm.weight', 'thinker.model.layers.6.self_attn.k_norm.weight', 'thinker.model.layers.6.self_attn.k_proj.weight', 'thinker.model.layers.6.self_attn.o_proj.weight', 'thinker.model.layers.6.self_attn.q_norm.weight', 'thinker.model.layers.6.self_attn.q_proj.weight', 'thinker.model.layers.6.self_attn.v_proj.weight', 'thinker.model.layers.7.input_layernorm.weight', 'thinker.model.layers.7.mlp.down_proj.weight', 'thinker.model.layers.7.mlp.gate_proj.weight', 'thinker.model.layers.7.mlp.up_proj.weight', 'thinker.model.layers.7.post_attention_layernorm.weight', 'thinker.model.layers.7.self_attn.k_norm.weight', 'thinker.model.layers.7.self_attn.k_proj.weight', 'thinker.model.layers.7.self_attn.o_proj.weight', 'thinker.model.layers.7.self_attn.q_norm.weight', 'thinker.model.layers.7.self_attn.q_proj.weight', 'thinker.model.layers.7.self_attn.v_proj.weight', 'thinker.model.layers.8.input_layernorm.weight', 'thinker.model.layers.8.mlp.down_proj.weight', 'thinker.model.layers.8.mlp.gate_proj.weight', 'thinker.model.layers.8.mlp.up_proj.weight', 'thinker.model.layers.8.post_attention_layernorm.weight', 'thinker.model.layers.8.self_attn.k_norm.weight', 'thinker.model.layers.8.self_attn.k_proj.weight', 'thinker.model.layers.8.self_attn.o_proj.weight', 
'thinker.model.layers.8.self_attn.q_norm.weight', 'thinker.model.layers.8.self_attn.q_proj.weight', 'thinker.model.layers.8.self_attn.v_proj.weight', 'thinker.model.layers.9.input_layernorm.weight', 'thinker.model.layers.9.mlp.down_proj.weight', 'thinker.model.layers.9.mlp.gate_proj.weight', 'thinker.model.layers.9.mlp.up_proj.weight', 'thinker.model.layers.9.post_attention_layernorm.weight', 'thinker.model.layers.9.self_attn.k_norm.weight', 'thinker.model.layers.9.self_attn.k_proj.weight', 'thinker.model.layers.9.self_attn.o_proj.weight', 'thinker.model.layers.9.self_attn.q_norm.weight', 'thinker.model.layers.9.self_attn.q_proj.weight', 'thinker.model.layers.9.self_attn.v_proj.weight', 'thinker.model.norm.weight']
tensor name thinker.model.layers.5.mlp.gate_proj.weight dtype : torch.bfloat16
tensor name thinker.model.layers.5.mlp.gate_proj.weight shape : torch.Size([6144, 2048])
tensor name thinker.model.layers.5.mlp.up_proj.weight dtype : torch.bfloat16
tensor name thinker.model.layers.5.mlp.up_proj.weight shape : torch.Size([6144, 2048])
tensor name thinker.model.layers.5.post_attention_layernorm.weight dtype : torch.bfloat16
tensor name thinker.model.layers.5.post_attention_layernorm.weight shape : torch.Size([2048])
tensor name thinker.model.layers.5.self_attn.k_norm.weight dtype : torch.bfloat16
tensor name thinker.model.layers.5.self_attn.k_norm.weight shape : torch.Size([128])
tensor name thinker.model.layers.5.self_attn.k_proj.weight dtype : torch.bfloat16
tensor name thinker.model.layers.5.self_attn.k_proj.weight shape : torch.Size([1024, 2048])
tensor name thinker.model.layers.5.self_attn.o_proj.weight dtype : torch.bfloat16
tensor name thinker.model.layers.5.self_attn.o_proj.weight shape : torch.Size([2048, 2048])
tensor name thinker.model.layers.5.self_attn.q_norm.weight dtype : torch.bfloat16
tensor name thinker.model.layers.5.self_attn.q_norm.weight shape : torch.Size([128])
tensor name thinker.model.layers.5.self_attn.q_proj.weight dtype : torch.bfloat16
tensor name thinker.model.layers.5.self_attn.q_proj.weight shape : torch.Size([2048, 2048])
tensor name thinker.model.layers.5.self_attn.v_proj.weight dtype : torch.bfloat16
tensor name thinker.model.layers.5.self_attn.v_proj.weight shape : torch.Size([1024, 2048])
tensor name thinker.model.layers.6.input_layernorm.weight dtype : torch.bfloat16
tensor name thinker.model.layers.6.input_layernorm.weight shape : torch.Size([2048])
tensor name thinker.model.layers.6.mlp.down_proj.weight dtype : torch.bfloat16
tensor name thinker.model.layers.6.mlp.down_proj.weight shape : torch.Size([2048, 6144])
tensor name thinker.model.layers.6.mlp.gate_proj.weight dtype : torch.bfloat16
tensor name thinker.model.layers.6.mlp.gate_proj.weight shape : torch.Size([6144, 2048])
tensor name thinker.model.layers.6.mlp.up_proj.weight dtype : torch.bfloat16
tensor name thinker.model.layers.6.mlp.up_proj.weight shape : torch.Size([6144, 2048])
tensor name thinker.model.layers.6.post_attention_layernorm.weight dtype : torch.bfloat16

You can see that the file holds only data about the weights. Because the format was designed to rule out RCE at the source, I am not aware of any code-execution vulnerability reported against the safetensors file format itself.
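The header-plus-data layout can also be read without the safetensors library. As far as I understand the format, a file starts with an 8-byte little-endian unsigned integer giving the header length, followed by the JSON header, followed by the raw tensor bytes. The sketch below hand-assembles a tiny file in memory and parses it back. The helper names are hypothetical and the real library handles more details (alignment, the optional __metadata__ entry, validation).

```python
import json
import struct

def build_safetensors_bytes(tensors):
    """Hand-assemble a minimal safetensors-style byte string (illustration only)."""
    header, data = {}, b""
    for name, (dtype, shape, raw) in tensors.items():
        header[name] = {"dtype": dtype, "shape": shape,
                        "data_offsets": [len(data), len(data) + len(raw)]}
        data += raw
    header_bytes = json.dumps(header).encode("utf-8")
    return struct.pack("<Q", len(header_bytes)) + header_bytes + data

def read_header(blob):
    """Return (header dict, offset where the data block starts)."""
    (header_len,) = struct.unpack("<Q", blob[:8])
    header = json.loads(blob[8:8 + header_len].decode("utf-8"))
    return header, 8 + header_len

raw = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)  # four float32 values
blob = build_safetensors_bytes({"layer.weight": ("F32", [2, 2], raw)})

header, data_start = read_header(blob)
begin, end = header["layer.weight"]["data_offsets"]
values = struct.unpack("<4f", blob[data_start + begin:data_start + end])
print(header)
print(values)
```

Nothing in this path evaluates code: the loader parses JSON and slices bytes, which is the design property that separates it from pickle.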

Microsoft ONNX(Open Neural Network Exchange)

ONNX is an open-source format designed so that models can be exchanged across many machine learning frameworks. With ONNX, developers can build a model in PyTorch, TensorFlow, or another framework and then convert it for use elsewhere with little effort. It, too, was developed in the spirit of making deployment smoother.

import torch
import torchvision.models as models
import onnx

# Load a pretrained PyTorch model (the `weights` argument replaces the deprecated pretrained=True)
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.eval()

# Create dummy input data
x = torch.randn(1, 3, 224, 224, requires_grad=True)

# Convert the model to the ONNX format
torch.onnx.export(model,               # model to run
                  x,                   # model input (a tuple also works for multiple inputs)
                  "resnet18.onnx",     # file name for the exported model
                  export_params=True,  # whether to store the trained weights in the model file
                  opset_version=10,    # ONNX opset version to target
                  do_constant_folding=True,  # optimization: whether to apply constant folding
                  input_names = ['input'],   # names for the model inputs
                  output_names = ['output'], # names for the model outputs
                  dynamic_axes={'input' : {0 : 'batch_size'},    # input dimension that varies with batch size
                                'output' : {0 : 'batch_size'}})  # output dimension that varies with batch size

Possible ONNX vulnerabilities

Until recently, the reported ONNX vulnerabilities included almost none in ONNX itself, and no RCE at all. Even those were, strictly speaking, C/C++ runtime-type vulnerabilities. The recently published path traversal vulnerability is likewise described as a flaw in the library code that processes ONNX models rather than a problem with the ONNX format.
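To show what this class of bug looks like, here is a minimal sketch of the kind of check a loader needs before touching a path taken from a model file. It is my own illustration, not the actual ONNX fix; the function name resolve_inside and the directory name are hypothetical.

```python
import os

def resolve_inside(base_dir, location):
    """Return the absolute path for `location`, or raise if it escapes base_dir."""
    base = os.path.realpath(base_dir)
    target = os.path.realpath(os.path.join(base, location))
    # After resolving ".." and symlinks, the target must still live under base
    if os.path.commonpath([base, target]) != base:
        raise ValueError(f"external data path escapes model directory: {location!r}")
    return target

print(resolve_inside("model_dir", "weights/part0.bin"))
try:
    resolve_inside("model_dir", "../../etc/passwd")
except ValueError as err:
    print("rejected:", err)
```

A loader that joins the model directory with an attacker-controlled location string and skips a check like this can be steered to read or overwrite files outside the model directory.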

 

 

ONNX Path Traversal Vulnerability Exploited | Matt T. posted on the topic | LinkedIn

CVE-2025-51480 Path Traversal vulnerability in onnx.external_data_helper.save_external_data in ONNX 1.17.0 allows attackers to overwrite arbitrary files by supplying crafted external_data.location paths containing traversal sequences, bypassing intended di

www.linkedin.com

 

GGUF / GGML

GGML (Georgi Gerganov Machine Learning Format)

GGML is a lightweight machine learning library developed by Georgi Gerganov. It is a C/C++ project designed to run inference efficiently on CPUs for neural networks, including large language models. As the Hugging Face introduction also emphasizes, GGML was built to minimize the complexity and heavy dependencies of existing deep learning frameworks.

General-purpose frameworks such as PyTorch and TensorFlow are very powerful, but they require large library dependencies and a complicated build environment. That may not be a problem on servers, but it is a burden on personal PCs, internal networks, offline environments, and resource-constrained systems. To address this, GGML chose a structure with almost no external dependencies and a simple C-based implementation.

GGML's core philosophy is "small, simple, and predictable execution". GGML consists of only a few source files, and the compiled binary is also very small. Because it can run models without a separate Python runtime or a large framework, it is highly portable. It builds and runs fairly easily on Linux, macOS, and Windows, and also on ARM architectures and Apple Silicon.

Another important characteristic is memory efficiency. GGML removes unnecessary overhead from tensor representation and computation and uses a CPU-cache-friendly memory layout. One of the reasons GGML gained wide attention is its strong quantization support. It is designed to compress float32 models down to around int8, int5, or int4, cutting memory use substantially while keeping inference quality at a practical level.
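The idea behind block quantization can be sketched in a few lines. This is a simplified symmetric int8 scheme written for illustration; it is not GGML's actual implementation, and the block contents and function names are made up.

```python
def quantize_block(values):
    """Symmetric int8 quantization of one block: one float scale plus int8 codes."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return scale, codes

def dequantize_block(scale, codes):
    """Recover approximate float values from the scale and the int8 codes."""
    return [scale * c for c in codes]

block = [0.12, -0.50, 0.33, 0.0, 0.49, -0.07, 0.25, -0.31]  # hypothetical weights
scale, codes = quantize_block(block)
restored = dequantize_block(scale, codes)
max_error = max(abs(a - b) for a, b in zip(block, restored))
print(scale, codes)
print(max_error)
```

Each weight is stored as one small integer plus a scale shared by the block, so the memory cost per weight drops from 32 bits to roughly 8, at the price of a rounding error of at most half a quantization step.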

Because of these properties, GGML is used as an inference-centered library rather than for training. Its purpose is to run already-trained models quickly with as few resources as possible, and projects such as llama.cpp, whisper.cpp, GPT4All, LM Studio, and Ollama use GGML as their low-level compute engine. In that role GGML is closer to a low-level runtime responsible for executing models than to a simple model format.

Structurally, GGML keeps a context that manages tensors and the compute graph, and performs computation based on that graph. It is also designed to support multiple backends such as CPU, CUDA, and Metal, and manages memory allocation and compute scheduling separately per backend. Thanks to this structure it stays lightweight while offering more flexibility than its simplicity suggests.

GGML also has limitations. As a C/C++ library it is harder to use, which can be a barrier for users accustomed to Python frameworks. In addition, its model metadata was limited, and managing extra information such as the tokenizer, special tokens, and RoPE settings together with the weights was inconvenient. These limitations became more of a problem as models grew more complex.

Against this background, GGML gradually evolved into GGUF (GGML Unified Format). GGUF keeps GGML's philosophy while being designed to carry the metadata needed to run a model in a clearer and more extensible way. The llama.cpp ecosystem now recommends GGUF over GGML, and the GGML file format is gradually becoming a legacy format.

To summarize, GGML is less about the deployment-format idea of "storing a model safely" and more a library focused on "running a model lightly and efficiently". Because it is far removed from Python object serialization and loading executable code, it is structurally mostly unrelated to vulnerabilities like pickle-style RCE. As with every other execution engine, though, the final stability and security depend on the runtime implementation and on how it is operated.

GGUF (GGML Unified Format)

GGUF is an improved format based on GGML. As the name suggests, it aims to be a "unified" format, includes more metadata, and is more extensible. Even for file names it defines a naming convention:

<BaseName><SizeLabel><FineTune><Version><Encoding><Type><Shard>.gguf, with the components that are present separated by "-". The file structure was also improved so that more metadata can be included.

There is too much to say about GGUF, so I will cover it separately. The short version is that GGML is a deployment format specialized for serving transformer models, and GGUF is a format that refines the management side on top of it.

Possibility of vulnerabilities in GGML / GGUF

Neither GGML nor GGUF contains Python objects, and for the same reason neither contains pickle data or any script. So the kind of vulnerability where the model file itself makes the loader execute code does not arise from the format. Bugs in the code that parses these files are a separate matter.
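To see why loading such a file is only parsing, here is a sketch that reads the fixed-size start of a GGUF file. As I understand the GGUF specification (version 2 and later), a file begins with the 4-byte magic "GGUF", a uint32 version, a uint64 tensor count, and a uint64 metadata key-value count, little-endian by default. The function name and the sample values are my own, and a real loader goes on to read the metadata and tensor info that follow.

```python
import struct

GGUF_MAGIC = b"GGUF"

def parse_gguf_header(blob):
    """Parse the fixed-size start of a GGUF file (assumed v2+ layout, little-endian)."""
    if blob[:4] != GGUF_MAGIC:
        raise ValueError("not a GGUF file: bad magic")
    version, tensor_count, kv_count = struct.unpack("<IQQ", blob[4:24])
    return {"version": version,
            "tensor_count": tensor_count,
            "metadata_kv_count": kv_count}

# Hand-built header bytes for illustration: version 3, 2 tensors, 5 metadata entries
sample = GGUF_MAGIC + struct.pack("<IQQ", 3, 2, 5)
print(parse_gguf_header(sample))
```

The risk that remains is the ordinary one for binary parsers: counts and offsets from the file must be validated before they are used to allocate or index memory.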

While looking into this I found that there really are a lot of frameworks and formats, so I asked GPT to organize them for me, and it even pulled in ones I did not know were used anywhere.

Format | Main use | Contents | Code execution possible | Security risk | Pros | Cons | Recommended
safetensors | HF, internal networks, secure environments | Pure tensor weights | ❌ None | ⭐ Very low | No pickle, fast mmap, safe | Stores weights only | ✅ Strongly recommended
PyTorch .pt / .pth | Research / development | Python object + weights | 🔥 Possible | 🔥🔥🔥 | Flexible saving | pickle-based RCE | ❌ Do not deploy
HF .bin (pytorch_model.bin) | Older HF models | Pickled weights | 🔥 Possible | 🔥🔥🔥 | Compatibility | Effectively a .pt | ❌
ONNX .onnx | Inference / serving | Static graph + weights | ❌ | ⭐ Low | Framework independent, fast | Limited dynamic structure | ✅ For inference
TorchScript .ts / .pt | PyTorch serving | IR graph + weights | ⚠️ Limited | ⚠️ Medium | Removes Python | Hard to debug | ⚠️ Limited
TensorFlow SavedModel | TF serving | Graph + weights | ❌ | ⭐ Low | Best fit for TF Serving | Tied to TF | ⚠️
HDF5 .h5 | Keras | Weights + structure | ❌ | ⭐ Low | Simple | Limits with large models | ⚠️
GGUF / GGML | llama.cpp | Quantized weights | ❌ | ⭐ Low | CPU friendly | No training | ✅ Local use
MLflow model | MLOps | Model + metadata + code | 🔥 Possible | 🔥🔥 | Easy to manage | Includes code | ⚠️ Verification required
Triton model repo | NVIDIA Triton | Model + config | ❌ | ⭐ Low | High-performance serving | Complex configuration | ✅
Docker image | Deployment | Model + code + OS | 🔥🔥🔥 | 🔥🔥🔥 | Reproducibility | Large attack surface | ⚠️ Internal verification
HF repo (entire) | Sharing | Weights + Python | 🔥🔥🔥 | 🔥🔥🔥 | Convenience | trust_remote_code | ❌ Not without verification
LoRA / Adapter | Fine-tuning | Weight delta | ❌ | ⭐ Low | Lightweight | Needs a base model | ✅

So the conclusion is that unified formats and frameworks were adopted to meet various requirements around models, and the vulnerabilities that arise there are mostly the ones you would expect from using pickle serialization.

If pickle serialization is not used, critical problems like RCE are unlikely to come from the model file itself. Vulnerabilities in the model runtime or serving framework are an entirely different area, though, so keep them in mind when using these tools.
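When pickle cannot be avoided, the Python documentation describes restricting which globals an unpickler may resolve. The sketch below follows that pattern with the standard library only. The allow-list contents and the EvalPayload class are illustrative choices of mine, and this reduces risk rather than making untrusted pickles safe.

```python
import builtins
import io
import pickle

SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Only resolve a small allow-list of harmless builtins
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")

def restricted_loads(data):
    """Like pickle.loads(), but refuses any global outside the allow-list."""
    return RestrictedUnpickler(io.BytesIO(data)).load()

class EvalPayload:
    """Hypothetical malicious object: asks the loader to call eval()."""
    def __reduce__(self):
        return (eval, ("6 * 7",))

print(restricted_loads(pickle.dumps([1, 2, range(3)])))  # plain data loads fine
try:
    restricted_loads(pickle.dumps(EvalPayload()))
except pickle.UnpicklingError as err:
    print("blocked:", err)
```

The same idea, an allow-list of types that may be reconstructed, is what makes loading weights in a restricted mode safer than unpickling an arbitrary object.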

If anything here is wrong, please let me know!


 

 

 

 

 

Learning data driven discretizations for partial differential equations

The numerical solution of partial differential equations (PDEs) is challenging because of the need to resolve spatiotemporal features over wide length and timescales. Often, it is computationally intractable to resolve the finest features in the solution.

arxiv.org

 

Background

While reading Chapter 4 (model training design patterns) of Machine Learning Design Patterns, I read about cases where overfitting can be useful. When there is no closed-form solution but the data can reflect reality exactly, for example when it comes from a simulator, an ML model helps overcome the computational overhead, and in that case overfitting is actually appropriate. The book also cites the uniform approximation theorem (more commonly called the universal approximation theorem), which states that such a function can be approximated by a network with one hidden layer and a suitable activation function. As an extension of this, the book goes on to suggest a Monte Carlo approach: since in the real world it is often impossible to tabulate every input, sample the input space to generate the data.

It then says that if machine learning is used in place of classical Monte Carlo methods, one can build a data-driven discretization of a PDE. That is, instead of hand-crafting the numerical approximation of the PDE that models a real-world phenomenon, machine learning is used to learn how the available field values should be combined. It is an attempt to approximate a function without writing the function down.

The paper I review in this post, Learning data-driven discretizations for partial differential equations, demonstrates this effect well.

 

Summary

Numerical solutions of PDEs struggle to resolve spatiotemporal features that span many scales (the curse of dimensionality), because doing so requires far too much computation. The only existing option was to settle for an approximation even if it was imperfect, and even that is enormously difficult. So instead of hand-deriving an approximation of the PDE, the paper proposes a data-driven discretization: starting from the known governing equation, a neural network is used to tackle this problem.

To restate, the traditional approach to solving a PDE asked how accurately a well-defined PDE could be approximated. The new viewpoint is to learn the discretized operator from data rather than deriving an approximation of the PDE analytically.

In general most physical phenomena can be expressed as PDEs, and a continuous physical quantity u(x, t) that changes over time can be written as ∂u/∂t = F(t, x; u, ∂u/∂x, ∂²u/∂x², ...). The question is how to solve this on a computer.

 

 

Because a computer cannot handle continuous data directly, the field is represented discretely on a grid, while trying to stay as close to the continuous problem as possible. With finite differences (FD), the PDE then becomes a system of ODEs in time.

 

 

์—ฌ๊ธฐ์„œ F ๋Š” ์ด์ œ ์ด๊ฒƒ๋“ค์„ gird ์ฃผ๋ณ€์˜ ๊ฐ’๋“ค๋กœ ๊ทผ์‚ฌํ•ด์•ผ ํ•˜๋Š”๋ฐ,

์—ฌ๊ธฐ์„œ a(i)^n ์„ ์‹ ๊ฒฝ๋ง์œผ๋กœ ํ•™์Šตํ•ด์„œ ๊ตญ์†Œ์  ๊ตฌ์กฐ์— ๋”ฐ๋ผ ํ˜„์‹ค์— ๋‹ค๊ฐ€๊ฐ€๋„๋ก ์„ค๊ณ„ํ•˜๋Š” ๊ฒƒ ์ž…๋‹ˆ๋‹ค. ๊ณ ํ•ด์ƒ๋„ ์‹œ๋ฎฌ๋ ˆ์ด์…˜ ๋ฐ์ดํ„ฐ๋กœ ํ•™์Šต ๋ฐ์ดํ„ฐ๋ฅผ ๋งŒ๋“ค๊ณ  ํ•ด๋‹น ์‹์— ๋“ฑ์žฅํ•˜๋Š” ๋ฏธ๋ถ„ํ•ญ๋“ค์˜ ์ด์‚ฐ๊ทผ์‚ฌ์‹( ์—ฌ๊ธฐ์„œ๋Š” sigma alpha ^ n v(i) ) ์„ ํ•™์Šตํ•ฉ๋‹ˆ๋‹ค. ์ด ๊ณผ์ •์ด ๊ณ„์‚ฐ๋น„์šฉ ์ธก๋ฉด์—์„œ์˜ ํŠธ๋ ˆ์ด๋“œ ์˜คํ”„๋ฅผ ๋งŒ๋“œ๋Š”๋ฐ ์ด๊ฑด ์ „์ฒด ํ•„๋“œ๊ฐ€ ์•„๋‹Œ ์ž‘์€ ๋ถ€๋ถ„์—์„œ ๊ณ ํ•ด์ƒ๋„ ์‹œ๋ฎฌ๋ ˆ์ด์…˜ ๋ฐ์ดํ„ฐ๋ฅผ ์ถ”์ถœํ•ด ์™„ํ™”ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด๋ ‡๊ฒŒ ํ•˜๋ฉด ํ›จ์”ฌ ํฐ ์‹œ์Šคํ…œ์—[์„œ๋„ ๋‚ฎ์€ ๊ณต๊ฐ„์—์„œ ์–ป์€ ๋ฐ์ดํ„ฐ๋กœ ๊ณ„์‚ฐํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
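To make the weighted-sum form concrete, here is a minimal sketch of a derivative computed as Σ_k α_k v_{i+k} on a uniform grid. It uses the fixed coefficients of the standard 3-point central difference; the function name `stencil_derivative` and the test signal are my own illustration, not code from the paper. In the paper's setting a network would instead output a different coefficient vector for each grid point.

```python
import math

def stencil_derivative(v, alphas, dx):
    """Approximate dv/dx at interior points as sum_k alphas[k] * v[i + k - 1] / dx."""
    out = []
    for i in range(1, len(v) - 1):
        window = v[i - 1:i + 2]  # the three neighboring grid values
        out.append(sum(a * w for a, w in zip(alphas, window)) / dx)
    return out

# Fixed, equation-independent coefficients of the 3-point central difference.
central = [-0.5, 0.0, 0.5]

dx = 0.01
xs = [k * dx for k in range(101)]
v = [math.sin(x) for x in xs]

approx = stencil_derivative(v, central, dx)
exact = [math.cos(x) for x in xs[1:-1]]
max_err = max(abs(a - e) for a, e in zip(approx, exact))
print(max_err)
```

On a well-resolved smooth signal like this the fixed stencil is already accurate; the learned coefficients are meant to help when the grid is too coarse for that to hold.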

→ The claim is that if the real phenomenon on the solution manifold is approximated well in a small region, the local discretization rule (the ML model) can be reused in a larger system. So what matters is not a perfect fit, but capturing (approximating) the dynamics of the real physical system well within a local region.

In fact, the authors claim that in direct comparisons the neural network approach performs better than approximating with the standard equation-based schemes.

Contributions

The biggest contribution of this paper is not a PDE solver as such, but the redefinition of PDE discretization as something to be learned. Conventional numerical analysis tries to approximate differential operators universally, whereas this work learns from data a local discretization rule that only needs to be valid on the solution manifold where solutions actually live.

Specifically, the contributions are as follows.

First, it states the concept of equation-specific discretization clearly. It moves away from the conventional view that finite-difference coefficients must be universal, and by allowing discretization coefficients that vary with the equation and the local state, it reaches high accuracy even in under-resolved conditions.

Second, it presents a learning perspective based on the solution manifold. The neural network does not approximate the entire function space; it parameterizes only the low-dimensional manifold on which physically possible solutions lie. The paper shows that this preserves the real physical dynamics even when the grid resolution is lowered.

Experimental results

To show how much data-driven discretization improves on conventional numerical methods for real PDE problems, the paper runs experiments on several representative nonlinear PDEs. The most important one is the 1D Burgers' equation. The equation is simple, but it contains complex nonlinear behavior such as shock formation that makes numerical approximation hard, so it is an ideal testbed.

Burgers' equation is written as follows.
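The equation image is missing from this copy of the post. As I recall, the paper uses the viscous, forced form of Burgers' equation with viscosity η and forcing f(x, t); treat the exact notation below as my reconstruction rather than a quotation.

$$ \frac{\partial v}{\partial t} + \frac{\partial}{\partial x}\left(\frac{v^2}{2} - \eta \frac{\partial v}{\partial x}\right) = f(x, t) $$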

 

On Burgers' equation, even at a much coarser resolution the learned solver did not diverge, the error went down, and the position and shape of the shocks were integrated stably. This is because the neural network captured the structure well even in a nonlinear PDE with hard-to-predict features such as shocks.

The same experiment also showed that the method works well on domains much larger than the training domain. Early in the paper I worried: the model is trained only on real data from a local region, so will it really work everywhere? But Burgers' equation was solved on a domain 10 times larger than the training domain and still worked well, which shows that the model generalized.

The conclusion from the experiments is that data-driven discretization is more accurate, simpler, and more stable than traditional numerical methods.

Future work

The authors say two challenges remain.

The first is speed. The learned FD scheme is implemented with many convolution operations. The authors think other machine learning approaches could be much faster, and pre-trained linear filters have reportedly shown gains of tens of times or more.

The second is higher-dimensional problems. In 2D and 3D the number of grid points grows with the square or cube of the resolution, so reducing the computational overhead further should pay off even more.


 

 

 

Batch Normalization

https://arxiv.org/pdf/1502.03167

 

 

Background

Batch normalization, proposed in 2015, is an idea for reducing the ICS (Internal Covariate Shift) problem. Covariate shift is the claim that inference performance can suffer when the distribution of the training data differs from the distribution of the data seen at inference time. The term Internal Covariate Shift came from arguing that the same thing happens inside a neural network. The image below should make this intuitive.

As data passes through the network its distribution changes, and the effect gets worse as the number of layers grows, so problems with training or inference performance naturally become more likely. Batch Normalization deals with the fact that the distribution differs across the training data by normalizing each batch with that batch's mean and variance.

According to a video by Dongbin Na that I used as a reference, batch normalization in practice reduced the dependence on hyperparameters, sped up training, and helped models generalize, that is, reflect the real phenomenon well instead of only handling the task on the training data.

However, although the paper claimed that it reduces ICS, it reportedly never actually demonstrated this. A later paper, How Does Batch Normalization Help Optimization?, was written to examine that claim.

https://arxiv.org/pdf/1805.11604

 

 

First, the paper shows that, in general, a network with Batch Norm reaches high accuracy much more steeply.

The histograms on the right show the distribution at each layer. In the rightmost column, Standard + Noisy BatchNorm, the distribution changes abruptly from Layer 3 onward, so ICS is clearly occurring. Even so, the graph on the left shows that training performance is excellent.

That is, even when noise is injected right after the Batch Norm layers to deliberately create covariate shift, the network with BatchNorm outperforms the plain network. The experiment therefore rebuts the earlier paper's claim that Batch Norm works by resolving ICS, and it even shows that Batch Norm improves performance when ICS is large.

The paper also proposes a way to measure ICS from the parameter gradients, but that strays too far from the purpose of this post, so I will not cover it. If you are curious, please refer to the paper.

So why does it perform well even though it does not resolve ICS? The paper attributes it to the smoothing effect of Batch Norm.

It says that the loss landscape becomes much more predictable, which makes training more effective.

 

 

Batch Normalization Layer

Normalization is performed using the mean and variance of the mini-batch. The actual output is then produced with gamma and beta, which are the parameters that are actually learned. During training, gamma and beta move in the direction that minimizes the loss.

The reason normalization uses learnable parameters lies in the behavior of activation functions. Taking sigmoid as an example, it behaves almost linearly over a certain interval, so inputs normalized to zero mean and unit variance mostly land in that near-linear region around 0. Gamma and beta let the network preserve non-linearity and give the normalization layer an appropriately scaled and shifted output. The takeaway is that when normalizing a layer's input, you have to watch out for ending up in the linear regime.

Batch Normalization Layer: training vs. inference

A batch normalization layer plays a different role in the network during training and during inference. During training, gamma and beta have to be learned and the statistics come from the current mini-batch. At inference no learning is needed, so the parameters are frozen, the batch statistics are replaced by fixed estimates collected during training, and the output is determined entirely by the learned values.

From step 7 onward, the BN layers that were in the network in training mode are switched to inference mode (by freezing the parameters).
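The training/inference split can be sketched with a toy single-feature layer. This is only an illustration under my own assumptions: the class name `BatchNorm1D` is hypothetical, gamma and beta are kept fixed instead of being trained, and the exponential moving average with a `momentum` parameter is a common implementation convention rather than something taken from the post.

```python
import math

class BatchNorm1D:
    """Toy batch norm over a single feature (illustrative, not a framework API)."""

    def __init__(self, momentum=0.1, eps=1e-5):
        self.gamma = 1.0          # learnable scale (kept fixed in this sketch)
        self.beta = 0.0           # learnable shift (kept fixed in this sketch)
        self.running_mean = 0.0   # statistics accumulated during training
        self.running_var = 1.0
        self.momentum = momentum
        self.eps = eps
        self.training = True

    def __call__(self, xs):
        if self.training:
            # Training mode: use the statistics of the current mini-batch.
            mean = sum(xs) / len(xs)
            var = sum((x - mean) ** 2 for x in xs) / len(xs)
            m = self.momentum
            self.running_mean = (1 - m) * self.running_mean + m * mean
            self.running_var = (1 - m) * self.running_var + m * var
        else:
            # Inference mode: use the frozen statistics, not the batch.
            mean, var = self.running_mean, self.running_var
        scale = self.gamma / math.sqrt(var + self.eps)
        return [scale * (x - mean) + self.beta for x in xs]

bn = BatchNorm1D(momentum=1.0)   # momentum=1.0 simply copies the last batch's statistics
bn([1.0, 2.0, 3.0])              # one training step
bn.training = False              # switch to inference mode
single = bn([3.0])               # a batch of one is still normalized sensibly
print(single)
```

In training mode a batch of one would always normalize to 0, because the batch variance is 0; the frozen statistics are what make single-sample inference meaningful.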

Batch Normalization Data Flow

Input data (X)

$$ X = \begin{bmatrix} 1 & 2 \\ 2 & 4 \\ 3 & 6 \end{bmatrix} $$

Data arriving as one batch

shape: (3, 2)

→ 3 samples, each a 2-dimensional vector


Passing through the linear layer

Let the weights and bias be:

$$ W = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad b = [0,\ 0] $$

That is, a linear layer that changes nothing:

$$ Z = XW + b = X $$

Result:

Z =
[
 [1, 2],
 [2, 4],
 [3, 6]
]

The shape is unchanged: (3, 2)


Batch Normalization

1️⃣ Batch Mean (μ)

Per-feature mean:

$$ \mu = [(1+2+3)/3,\ (2+4+6)/3] = [2,\ 4] $$


2️⃣ Batch Variance (σ²)

$$ \sigma^2 = [((1-2)^2+(2-2)^2+(3-2)^2)/3,\ ((2-4)^2+(4-4)^2+(6-4)^2)/3] = [2/3,\ 8/3] $$


3️⃣ Normalize (x̂)

$$ \hat{x} = \frac{x - \mu}{\sqrt{\sigma^2 + \epsilon}} \quad (\text{ignoring } \epsilon) $$

Per-sample computation

First sample

$$ [1,2] \rightarrow [-1/\sqrt{2/3},\ -2/\sqrt{8/3}] \approx [-1.22,\ -1.22] $$

Second sample

$$ [2,4] \rightarrow [0,\ 0] $$

Third sample

$$ [3,6] \rightarrow [1.22,\ 1.22] $$

Result:

X_hat =
[
 [-1.22, -1.22],
 [ 0.00,  0.00],
 [ 1.22,  1.22]
]

These values are then passed through the layer's gamma and beta (scale and shift) operation. In this way, batch norm computes the mean and variance per feature across the mini-batch and applies them to the original data to perform normalization.
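The hand calculation above can be checked with a few lines of plain Python. This is a minimal sketch: the function name `batch_norm` is my own, gamma = 1 and beta = 0 are assumed so that the output equals x̂, and ε is set to 0 to match the calculation.

```python
import math

X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]

def batch_norm(rows, gamma, beta, eps=1e-5):
    """Normalize each feature (column) with statistics taken across the batch (rows)."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    var = [sum((r[j] - mean[j]) ** 2 for r in rows) / n for j in range(d)]
    out = [
        [gamma[j] * (r[j] - mean[j]) / math.sqrt(var[j] + eps) + beta[j] for j in range(d)]
        for r in rows
    ]
    return out, mean, var

Y, mu, var = batch_norm(X, gamma=[1.0, 1.0], beta=[0.0, 0.0], eps=0.0)
print(mu)   # per-feature mean
print(var)  # per-feature variance
print(Y)    # normalized batch
```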

Layer Normalization

arxiv.org

Layer Normalization was proposed to address the difficulty of applying Batch Norm to RNNs. An RNN computes step by step over time, so BN, which normalizes using per-feature statistics across the mini-batch, cannot reflect the context of each individual sequence.

The biggest problem is that with RNNs, NLP, or speech data, the samples in a batch have different lengths.

Sample 1: "나는 밥을 먹었다" (I ate a meal)                                        (length 4)
Sample 2: "오늘" (Today)                                                         (length 1)
Sample 3: "어제 비가 와서 우산을 썼다" (It rained yesterday, so I used an umbrella)   (length 6)

If BN is applied to layer outputs like these, the padded positions of the shorter samples (for example, position 2 onward in sample 2) are 0 and still enter the statistics, so the meaning of the data is not reflected properly. The same issue applies to time-series data. The paper argues that for data unlike images or grade tables (where Korean scores are compared with Korean scores and math with math), that is, data where one feature influences other features or other data points, LN performs better because it is unaffected by batch size and reflects the meaning of the data well.

 

Differences from BN

Batch Normalization computes the mean and variance per mini-batch and normalizes with them. Layer Normalization (LN), as the name suggests, normalizes per layer, or more precisely over the features inside a single sample. In other words, the axis of normalization is completely different.

  • Batch Normalization
    • Axis for mean and variance: batch direction
    • Uses many samples of the same feature together
  • Layer Normalization
    • Axis for mean and variance: feature direction
    • Computed within a single sample only

For a single sample x = [x₁, x₂, ..., x_d]:

$$ \mu = \frac{1}{d} \sum_{i=1}^{d} x_i $$

$$ \sigma^2 = \frac{1}{d} \sum_{i=1}^{d} (x_i - \mu)^2 $$

$$ \hat{x}_i = \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon}} $$

As with Batch Normalization, scale and shift parameters are then applied:

$$ y_i = \gamma_i \hat{x}_i + \beta_i $$

The important point here is that γ and β exist only along the feature dimension and are independent of the batch size.

Let's run the same samples through the layer following the formulas above.

Layer Normalization Data Flow

Input data (X)

$$ X = \begin{bmatrix} 1 & 2 \\ 2 & 4 \\ 3 & 6 \end{bmatrix} $$

shape: (3, 2)

→ 3 samples, each a 2-dimensional vector


Passing through the linear layer

The weights and bias are set the same as before.

$$ Z = X $$


Applying Layer Normalization

Layer Normalization computes the mean and variance independently for each sample.

First sample [1, 2]

$$ \mu = (1 + 2) / 2 = 1.5 $$

$$ \sigma^2 = ((1 - 1.5)^2 + (2 - 1.5)^2) / 2 = 0.25 $$

Normalized result:

$$ [1, 2] \rightarrow [-1, 1] $$


Second sample [2, 4]

$$ \mu = 3,\quad \sigma^2 = 1 $$

Normalized result:

$$ [2, 4] \rightarrow [-1, 1] $$


Third sample [3, 6]

$$ \mu = 4.5,\quad \sigma^2 = 2.25 $$

Normalized result:

$$ [3, 6] \rightarrow [-1, 1] $$


Layer Normalization result

X_hat =
[
 [-1,  1],
 [-1,  1],
 [-1,  1]
]
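The same result can be reproduced in plain Python. This is a small sketch with a hypothetical helper `layer_norm`; γ = 1 and β = 0 are assumed, and ε is set to 0 to match the hand calculation.

```python
import math

X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]

def layer_norm(rows, eps=1e-5):
    """Normalize each sample (row) with statistics taken over its own features."""
    out = []
    for r in rows:
        mean = sum(r) / len(r)
        var = sum((x - mean) ** 2 for x in r) / len(r)
        out.append([(x - mean) / math.sqrt(var + eps) for x in r])
    return out

X_hat = layer_norm(X, eps=0.0)
print(X_hat)
```

Every row comes out the same because each sample here is a scalar multiple of [1, 2], and per-sample normalization removes that scale. Batch norm on the same input kept the three samples distinct.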

Why Layer Normalization suits the Transformer better than Batch Normalization

1. Variable sequence length and masking

The Transformer's self-attention has to handle variable-length sequences: each input sentence has a different length. To deal with this, shorter sentences are padded and an attention mask is used.

Applying Batch Normalization to this structure causes serious problems. BN computes the mean and variance across the entire batch and sequence dimensions, so, as seen above, the zero vectors of meaningless padding tokens are included in the statistics. As a result the normalization statistics are distorted by sentence length, and even the same sentence can be normalized differently depending on how much padding it gets.
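A tiny numeric sketch of the distortion, using made-up values for one feature: three real tokens, then the same tokens with three zero-valued padding positions pooled into the statistics. The helper name `mean_and_var` and the numbers are only for illustration.

```python
def mean_and_var(values):
    """Population mean and variance of a list of numbers."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var

real_tokens = [2.0, 4.0, 6.0]            # one feature across the real tokens
padded = real_tokens + [0.0, 0.0, 0.0]   # same feature with three padding positions

stats_real = mean_and_var(real_tokens)
stats_padded = mean_and_var(padded)
print(stats_real)
print(stats_padded)
```

The mean drops from 4 to 2 and the variance doubles purely because of padding, so the same real tokens would be normalized differently depending on how long the other sentences in the batch are.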

 

 

In contrast, Layer Normalization normalizes only over the feature dimension of each token. Because the mean and variance are computed inside a single token, padding tokens and sequence length have no effect at all on the normalization statistics. Each token is normalized independently, so the meaning of the data is preserved and normalization stays consistent regardless of batch or sequence structure.

2. Autoregressive decoding and batch size mismatch

At inference time the Transformer decoder runs autoregressively so that it cannot look at future information: it generates the next token one at a time based on the tokens generated so far. In this process the batch size is 1 in most cases, which, as the Layer Normalization paper shows, is a critical problem for Batch Normalization.

Layer Normalization works stably regardless of batch size. Whether the batch size is 1 or 32 the normalization result is the same, and the behavior observed during training carries over unchanged to inference. This property is decisive for the generation quality of the Transformer decoder.

3. Structural mismatch with residual connections

Each Transformer block uses a residual connection: y = x + Sublayer(LN(x)). This structure matters because of gradient flow. In backpropagation, ∂y/∂x = 1 + ∂Sublayer(LN(x))/∂x, which guarantees a path (the identity mapping) through which the gradient can always flow directly. This is the key mechanism that addresses vanishing gradients in deep networks.

If Batch Normalization is used on the residual path, its output depends on batch statistics, so batch-dependent noise is injected into the residual path. This destabilizes gradient flow and, especially in deep Transformers, can cause exploding or vanishing gradients. In fact, Post-LN Transformers (LN applied after the residual) are known to become unstable to train as depth grows, while Pre-LN Transformers (LN applied before the residual) train more stably. BN fundamentally conflicts with these properties of the residual connection.

Layer Normalization normalizes each sample independently and therefore does not depend on the batch. It does not interfere with gradient flow on the residual path, and stable training is possible even in deep Transformers with dozens of layers. This structural fit is another important reason the Transformer uses Layer Normalization.


Background

When I was developing a customer-facing chatbot, the bar for hallucination was strict, so the system was designed to say it did not know when it did not know and to hand the conversation over to a human agent.

As a result, whenever a customer did anything slightly outside what the chain structure expected, the bot dodged the question (answered "I don't know" and connected an agent), and satisfaction with the service dropped. To respond to questions more flexibly, we decided to migrate from the chain structure to a ReAct agent.

Each chain required its input to follow a fixed DTO, but by the time a request reached the Tool wrapping the chain, the LLM had already mangled the DTO and the arguments could not be passed to the Tool. At the time, output was controlled by prompt alone. The bot looked like it was answering well, but tracing the agent's tool calls with LangSmith showed that internally some data was being dropped and calls were being repeated. A stronger prompt would probably have helped somewhat, but in the end the problem led to slower responses and higher token costs. Since customer reactions and the actual business outcomes were unaffected, it was deprioritized and left as technical debt.

After joining my current company I came across the concept of structured output, and I want to check whether it is reliable enough to use in real customer-facing work.

How Structured Output works

First, structured output is a feature that lets you receive an LLM's output as JSON, a Pydantic model, a dataclass, or a similar form. It also supports error handling: if the model answers out of range or with the wrong data type, a validation error can be raised, which makes it possible to produce a useful error message.

Used well, this lets you raise an Error only in specific cases (the format does not match, or a field receives an invalid value). The usual response is to retry, and on retry the output matches in most cases. The most critical caveat is that although the data is filled in according to the structure, nothing guarantees that the values are actually correct.
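The difference between "fits the structure" and "is actually correct" can be shown with a standard-library dataclass. This is a sketch under my own assumptions: the `Review` class, its allowed sentiment values, and the `try_build` helper are hypothetical and are not part of LangChain or Pydantic.

```python
from dataclasses import dataclass

ALLOWED = {"positive", "negative", "neutral"}

@dataclass
class Review:
    title: str
    sentiment: str
    score: float

    def __post_init__(self):
        # Structural checks only: allowed label, numeric type, and range.
        if self.sentiment not in ALLOWED:
            raise ValueError(f"sentiment must be one of {sorted(ALLOWED)}")
        if isinstance(self.score, bool) or not isinstance(self.score, (int, float)):
            raise ValueError("score must be a number")
        if not 0.0 <= self.score <= 1.0:
            raise ValueError("score must be between 0 and 1")

def try_build(**fields):
    """Return (object, None) on success or (None, error message) on failure."""
    try:
        return Review(**fields), None
    except ValueError as err:
        return None, str(err)

# Rejected: the label is outside the schema.
bad, bad_err = try_build(title="t", sentiment="angry", score=0.2)

# Accepted: the structure is valid even though the label contradicts the title.
wrong_but_valid, ok_err = try_build(title="Great film", sentiment="negative", score=0.1)

print(bad_err)
print(wrong_but_valid)
```

The second object passes every check, which is exactly the caveat above: validation catches malformed output, not wrong content.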

Order of operations

  1. Receive the model and the schema.
  2. LangChain selects a strategy internally.
    1. Tool calling strategy: used when the model does not support structured output natively. The model returns a tool call as JSON, and LangChain parses it and converts it into an object that matches the schema. The tool call itself consumes extra tokens, so cost and response time increase. https://platform.openai.com/docs/guides/structured-outputs
    2. Provider strategy: used when the model supports structured output natively.
  3. LangChain or the agent generates the response.
  4. Validate the result: a Pydantic- or JSON-based parser checks that it parsed according to the schema.
  5. If parsing succeeds, the result is placed in structured_response and returned as the final result.

 

 

 

Schema input / strategy selection

Let's start with the part that receives the schema, using the example below.

The example uses a Pydantic schema; the Pydantic class is passed as the schema argument of the with_structured_output method.

from pydantic import BaseModel, Field
from langchain_openai import ChatOpenAI

class ReviewSummary(BaseModel):
    title: str = Field(..., description="Review title")
    sentiment: str = Field(..., description="One of positive/negative/neutral")
    score: float = Field(..., description="Sentiment score between 0 and 1")

# Connects to the OpenAI API, or to a vLLM OpenAI-compatible server URL
model = ChatOpenAI(
    model="gpt-4o-mini",  # any model works
    temperature=0
)

structured_model = model.with_structured_output(ReviewSummary)

# Placeholder input; the review text used in the original run is not shown
user_input = "The plot dragged and the acting was flat."
result = structured_model.invoke(user_input)

print(result)
print(type(result))
------------
title='Movie review summary'
sentiment='negative'
score=0.15
<class '__main__.ReviewSummary'>
------------

For some models that support structured output, the method simply ends by binding a tool that converts the schema into the vendor's supported format, as below.

class ChatAnthropic(BaseChatModel):
    # ---------- (omitted) ----------
    def with_structured_output():
        # ---------- (omitted) ----------
        if method == "function_calling":
            formatted_tool = convert_to_anthropic_tool(schema)
            tool_name = formatted_tool["name"]
            if self.thinking is not None and self.thinking.get("type") == "enabled":
                llm = self._get_llm_for_structured_output_when_thinking_is_enabled(
                    schema,
                    formatted_tool,
                )
            else:
                llm = self.bind_tools(
                    [schema],
                    tool_choice=tool_name,
                    ls_structured_output_format={
                        "kwargs": {"method": "function_calling"},
                        "schema": formatted_tool,
                    },
                )

 

@dataclass(init=False)
class ProviderStrategy(Generic[SchemaT]):
    """Use the model provider's native structured output method."""

    schema: type[SchemaT]
    """Schema for native mode."""

    schema_spec: _SchemaSpec[SchemaT]
    """Schema spec for native mode."""

    def __init__(
        self,
        schema: type[SchemaT],
    ) -> None:
        """Initialize ProviderStrategy with schema."""
        self.schema = schema
        self.schema_spec = _SchemaSpec(schema)

 

Models that are not covered by a provider use ToolStrategy; this is generally the case for models run on local serving frameworks such as vLLM.

class ChatOllama(BaseChatModel):
    # --- (omitted) ---
    def with_structured_output():
        # --- (omitted) ---
        if is_pydantic_schema:
            schema = cast("TypeBaseModel", schema)
            if issubclass(schema, BaseModelV1):
                response_format = schema.schema()
            else:
                response_format = schema.model_json_schema()
            llm = self.bind(
                format=response_format,
                ls_structured_output_format={
                    "kwargs": {"method": method},
                    "schema": schema,
                },
            )
@dataclass(init=False)
class ToolStrategy(Generic[SchemaT]):
    """Use a tool calling strategy for model responses."""

    schema: type[SchemaT]
    """Schema for the tool calls."""

    schema_specs: list[_SchemaSpec[SchemaT]]
    """Schema specs for the tool calls."""

    tool_message_content: str | None
    """The content of the tool message to be returned when the model calls
    an artificial structured output tool."""

    handle_errors: (
        bool | str | type[Exception] | tuple[type[Exception], ...] | Callable[[Exception], str]
    )
    

ToolStrategy uses the bind method to get at the runnable object and performs the tool call at that point.

You can step in and call the vendor's tool explicitly, but the decisive difference in strategy selection is ultimately that the method chosen by default when calling with_structured_output differs.

When the model does not support structured output

def with_structured_output(
        self,
        schema: dict | type,
        *,
        method: Literal["function_calling", "json_mode", "json_schema"] = "json_schema",
        include_raw: bool = False,
        **kwargs: Any,
    ) -> Runnable[LanguageModelInput, dict | BaseModel]:
        r"""Model wrapper that returns outputs formatted to match the

 

When the model supports structured output

def with_structured_output(
        self,
        schema: dict | type,
        *,
        include_raw: bool = False,
        method: Literal["function_calling", "json_schema"] = "function_calling",
        **kwargs: Any,
    ) -> Runnable[LanguageModelInput, dict | BaseModel]:
        """Model wrapper that returns outputs formatted to match the given schema.

 

When structured output is supported, method defaults to function_calling, so the request is handled through the API vendor's function calling:

if method == "function_calling":
    formatted_tool = convert_to_anthropic_tool(schema)
    tool_name = formatted_tool["name"]
    if self.thinking is not None and self.thinking.get("type") == "enabled":
        llm = self._get_llm_for_structured_output_when_thinking_is_enabled(
            schema,
            formatted_tool,
        )
    else:
        llm = self.bind_tools(
            [schema],
            tool_choice=tool_name,
            ls_structured_output_format={
                "kwargs": {"method": "function_calling"},
                "schema": formatted_tool,
            },
        )

This path uses the bind_tools method.

In the opposite case, json_schema is selected by default and the bind method is used: instead of a tool call, a new object is created in the runnable sequence with redefined call options.

elif method == "json_schema":
            if schema is None:
                msg = (
                    "schema must be specified when method is not 'json_mode'. "
                    "Received None."
                )
                raise ValueError(msg)
            if is_pydantic_schema:
                schema = cast("TypeBaseModel", schema)
                if issubclass(schema, BaseModelV1):
                    response_format = schema.schema()
                else:
                    response_format = schema.model_json_schema()
                llm = self.bind(
                    format=response_format,
                    ls_structured_output_format={
                        "kwargs": {"method": method},
                        "schema": schema,
                    },
                )
                output_parser = PydanticOutputParser(pydantic_object=schema)  # type: ignore[arg-type]

##bind example
"""
        Example:
            ```python
            from langchain_ollama import ChatOllama
            from langchain_core.output_parsers import StrOutputParser

            model = ChatOllama(model="llama3.1")

            # Without bind
            chain = model | StrOutputParser()

            chain.invoke("Repeat quoted words exactly: 'One two three four five.'")
            # Output is 'One two three four five.'

            # With bind
            chain = model.bind(stop=["three"]) | StrOutputParser()

            chain.invoke("Repeat quoted words exactly: 'One two three four five.'")
            # Output is 'One two'
            ```
"""

You can see that it sets response_format itself. This is the logic that receives the schema and selects a strategy.

Now let's look at how each strategy produces structured output.

 

Response generation by strategy

  1. ToolcallingStrategy
    class ToolStrategy(Generic[SchemaT]):
        schema: type[SchemaT]
        schema_specs: list[_SchemaSpec[SchemaT]]
        tool_message_content: str | None
        handle_errors: bool | ...
    
    LangChain uses schema_spec to generate a fake tool schema, and this fake tool is passed to the model under a name such as structured_output. The model then responds in the form below. LangChain parses the JSON, validates it against the Pydantic model or dataclass, and on failure raises a ValidationError and asks the model again.
    {
      "tool": "structured_output",
      "arguments": {
          "title": "some text",
          "score": 0.82
      }
    }
    When the model is asked again after such an error, a chain or node that carries the full context wastes a great many tokens and delays the response. The tool calling strategy is selected when the model does not natively support structured output.
  2. ProviderStrategy
    @dataclass(init=False)
    class ProviderStrategy(Generic[SchemaT]):
        """Use the model provider's native structured output method."""
    
        schema: type[SchemaT]
        """Schema for native mode."""
    
        schema_spec: _SchemaSpec[SchemaT]
        """Schema spec for native mode."""
    
    This is the case where the model itself supports structured output. LangChain passes the schema to the model as is, receives the response, and only parses it, converting to and from the form each vendor expects. Responses from OpenAI, Anthropic, and Gemini are stable, so the model rarely has to be asked again.
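The retry cost described for the tool calling strategy can be sketched without any LLM. Everything here is hypothetical: `fake_model` stands in for a model call whose first reply is malformed, and `parse_and_validate` stands in for the schema check. It is not LangChain's implementation, only the shape of the loop.

```python
import json

def fake_model(prompt, attempt):
    """Hypothetical stand-in for an LLM call: the first reply has a wrong type."""
    if attempt == 0:
        return '{"title": "some text", "score": "high"}'
    return '{"title": "some text", "score": 0.82}'

def parse_and_validate(raw):
    """Parse the JSON arguments and check the field types."""
    data = json.loads(raw)
    if not isinstance(data.get("title"), str):
        raise ValueError("title must be a string")
    score = data.get("score")
    if isinstance(score, bool) or not isinstance(score, (int, float)):
        raise ValueError("score must be a number")
    return data

def call_with_retries(prompt, max_retries=2):
    """Call the model, feeding each validation error back into the next prompt."""
    calls = 0
    last_error = None
    for attempt in range(max_retries + 1):
        raw = fake_model(prompt, attempt)
        calls += 1
        try:
            return parse_and_validate(raw), calls
        except ValueError as err:  # json.JSONDecodeError is a ValueError subclass
            last_error = err
            prompt = f"{prompt}\nPrevious output was invalid: {err}"
    raise RuntimeError(f"no valid output after {calls} calls: {last_error}")

result, calls = call_with_retries("Summarize the review.")
print(result, calls)
```

Each retry resends the whole prompt plus the error message, which is why a context-heavy chain or node gets expensive quickly when validation fails.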

Structured Output test

OpenAI lists several advantages for structured output, and the third one impressed me most: you no longer need a strongly worded prompt just to keep the format. The legacy chatbot, perhaps because nobody knew about this feature at the time, was forcing the output format by prompt.

Checking when structured output support was introduced, the tool calling strategy dates from around mid-to-late 2023, and the provider strategy was first supported by OpenAI on August 6, 2024, starting with the gpt-4o model. Anthropic and Gemini followed in turn.

The stable version of LangChain was released in January 2024 and the legacy chatbot started development around then, which means nobody tracked new technology for close to a year and six months after the system was first built.

So let's check how different it really is to force the output by prompt versus parsing the output form with structured output.

Forcing the output format with prompt engineering: test

system_prompt = """Your task is to generate only JSON that exactly matches the Pydantic model schema below.
You must perform logical step-by-step reasoning (Chain-of-Thought) internally to solve the problem.
However, never output that reasoning; the final output must be only JSON that fully matches the schema below.

Output format rules:
1. Output only in JSON format.
2. Never output any explanation, sentence, or extra text outside the JSON.
3. All fields must be included: name, age, address, phone_number
4. Field types must match the schema 100%.
   - name: string
   - age: integer
   - address: string
   - phone_number: string
5. Do not insert meaningless values, null, None, undefined, and so on; fill in real values.
6. JSON key names must be exactly the same as in the schema; do not change letter case.
7. Do not output comments, markdown, or blank lines outside the JSON.
8. Never explain the example; the final output must be only JSON in the same format as the example.

Pydantic model schema:

class Gender(str, Enum):
    male = "male"
    female = "female"
    other = "other"

class Address(BaseModel):
    street: str = Field(description="Street name and number")
    city: str = Field(description="City name")
    state: str = Field(description="State/Province")
    postal_code: str = Field(description="Postal/ZIP code")
    country: str = Field(description="Country name")

class UserProfile(BaseModel):
    name: str = Field(description="The user's full name")
    age: int = Field(description="The user's age")
    gender: Gender = Field(description="The user's gender")
    email: str = Field(description="The user's email address")
    phone_number: str = Field(description="The user's primary phone number")
    addresses: List[Address] = Field(description="List of user's addresses")
    date_of_birth: date = Field(description="The user's birth date")
    interests: List[str] = Field(default_factory=list, description="List of user's interests")
    is_active: bool = Field(default=True, description="Whether the user is active")
    bio: Optional[str] = Field(default=None, description="Short biography of the user")
    friends_ids: Optional[List[int]] = Field(default_factory=list, description="List of friend's user IDs")
    account_created: date = Field(description="Date when the user account was created")

[Input example 1]
Age is 27 and gender is male.
The email address is taejung.park@example.com and the mobile number is 010-1234-5678.
There are two addresses: Yeongdeungpo-ro 123, Yeongdeungpo-gu, Seoul and Gangnam-daero 456, Gangnam-gu, Seoul.
The date of birth is May 14, 1996, and the interests are reading, movies, and hiking.
The active status is True, and the bio is "Hello, I work as a developer in Seoul."
The friend IDs are 101, 102, 103, and the account was created on August 1, 2020.

[Output example 1]
{
  "name": "Park Taejung",
  "age": 27,
  "gender": "male",
  "email": "taejung.park@example.com",
  "phone_number": "010-1234-5678",
  "addresses": [
    {
      "street": "Yeongdeungpo-ro 123",
      "city": "Seoul",
      "state": "Yeongdeungpo-gu",
      "postal_code": "07200",
      "country": "South Korea"
    },
    {
      "street": "Gangnam-daero 456",
      "city": "Seoul",
      "state": "Gangnam-gu",
      "postal_code": "06100",
      "country": "South Korea"
    }
  ],
  "date_of_birth": "1996-05-14",
  "interests": ["reading", "movies", "hiking"],
  "is_active": true,
  "bio": "Hello, I work as a developer in Seoul.",
  "friends_ids": [101, 102, 103],
  "account_created": "2020-08-01"
}


[Input example 2]
Hello. Here is the information for the user Kim Hana.
Age is 30 and gender is female.
The email is kim.hana@example.com and the mobile number is 010-9876-5432.
There are two addresses: Mia-ro 11, Gangbuk-gu, Seoul and Sampyeong-dong 22, Bundang-gu, Seongnam-si, Gyeonggi.
The date of birth is September 10, 1993, and the interests are yoga, movies, and travel.
The active status is True, and the bio is "Hello, I am a freelance designer."
The friend IDs are 201, 202, 203, and the account was created on March 15, 2019.

[Output example 2]
{
  "name": "Kim Hana",
  "age": 30,
  "gender": "female",
  "email": "kim.hana@example.com",
  "phone_number": "010-9876-5432",
  "addresses": [
    {"street": "Mia-ro 11", "city": "Seoul", "state": "Gangbuk-gu", "postal_code": "01000", "country": "South Korea"},
    {"street": "Sampyeong-dong 22", "city": "Seongnam-si", "state": "Bundang-gu", "postal_code": "13500", "country": "South Korea"}
  ],
  "date_of_birth": "1993-09-10",
  "interests": ["yoga", "movies", "travel"],
  "is_active": true,
  "bio": "Hello, I am a freelance designer.",
  "friends_ids": [201, 202, 203],
  "account_created": "2019-03-15"
}



Referring to all the rules and examples above, from now on output only JSON that fully matches the Pydantic UserInfo schema, whatever input arrives.
Use your reasoning only internally and never expose it externally."""

 

Test: passing a Pydantic model to Structured Output

Even the official documentation says that “structured output can make mistakes” and that you should “describe the schema as thoroughly as you can.” So whenever the LLM has to classify something, or data has to be forced into a fixed format, I recommend using Pydantic.

For a simple prompt, both approaches return the right result.

Now let's think of a scenario that comes up in real work and test it. The data an LLM has to ingest can be more complex than you would expect. In particular, when several DTOs are nested together, the combined DTO grows very quickly. Taking three DTOs as an example, let's check whether the data is parsed properly when it is supplied as natural language rather than JSON.

from datetime import date
from enum import Enum
from typing import List, Optional

from pydantic import BaseModel, Field


class Gender(str, Enum):
    male = "male"
    female = "female"
    other = "other"


class Address(BaseModel):
    street: str = Field(description="Street name and number")
    city: str = Field(description="City name")
    state: str = Field(description="State/Province")
    postal_code: str = Field(description="Postal/ZIP code")
    country: str = Field(description="Country name")


class UserProfile(BaseModel):
    name: str = Field(description="The user's full name")
    age: int = Field(description="The user's age")
    gender: Gender = Field(description="The user's gender")
    email: str = Field(description="The user's email address")
    phone_number: str = Field(description="The user's primary phone number")
    addresses: List[Address] = Field(description="List of user's addresses")
    date_of_birth: date = Field(description="The user's birth date")
    interests: List[str] = Field(default_factory=list, description="List of user's interests")
    is_active: bool = Field(default=True, description="Whether the user is active")
    bio: Optional[str] = Field(default=None, description="Short biography of the user")
    friends_ids: Optional[List[int]] = Field(default_factory=list, description="List of friend's user IDs")
    account_created: date = Field(description="Date when the user account was created")

 

The input was as follows.

Please turn the information for a user named Park Junho into JSON.
He is 24 years old and male, his email is park.junho@example.com,
and his phone number is 010-1111-2222.
His addresses are Marine City 5, Haeundae-gu, Busan and Beomeo-ro 88, Suseong-gu, Daegu.
His birthday is December 1, 2000, and his interests are gaming, coding, and soccer.
The user is inactive (False), and his bio is: a university student who dreams of becoming a game developer.
His friend IDs are 301 and 302, and the account was created on June 20, 2021.

 

 

Even for complex structured data, forcing the shape through the prompt parses mostly well. Looking at the result, however, postal_code contains data that was never in the input.

So what about the query that uses structured output?

It parses just as well. Even as the DTO gets more complex, a good model manages to parse almost everything. There was one difference, though: with the with_structured_output method, postal_code is left blank, whereas with the prompt-forced approach dummy data is filled in even though the actual input contains no postal code.
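
The postal_code difference described here can be checked mechanically. Below is a minimal sketch, not part of the original experiment: find_ungrounded is a hypothetical helper, and the sample text and the postal codes 48000 and 42000 are made-up stand-ins for the dummy data. It flags values of selected keys that never appear in the input text.

```python
def find_ungrounded(source_text, record, keys_to_check, path=""):
    """Return (path, value) pairs for checked keys whose non-empty
    string value does not occur anywhere in source_text."""
    hits = []
    if isinstance(record, dict):
        for key, value in record.items():
            child = f"{path}.{key}" if path else key
            if key in keys_to_check and isinstance(value, str):
                if value and value not in source_text:
                    hits.append((child, value))
            else:
                hits.extend(find_ungrounded(source_text, value, keys_to_check, child))
    elif isinstance(record, list):
        for index, item in enumerate(record):
            hits.extend(find_ungrounded(source_text, item, keys_to_check, f"{path}[{index}]"))
    return hits


# Hypothetical input text and outputs, for illustration only.
source_text = "Addresses: Marine City 5, Haeundae-gu, Busan and Beomeo-ro 88, Suseong-gu, Daegu."
prompt_forced = {"addresses": [
    {"street": "Marine City 5", "postal_code": "48000"},
    {"street": "Beomeo-ro 88", "postal_code": "42000"},
]}
structured = {"addresses": [
    {"street": "Marine City 5", "postal_code": ""},
    {"street": "Beomeo-ro 88", "postal_code": ""},
]}

print(find_ungrounded(source_text, prompt_forced, {"postal_code"}))
print(find_ungrounded(source_text, structured, {"postal_code"}))
```

A verbatim substring check like this only suits fields that should be copied from the input, such as postal codes or phone numbers. Normalized fields like dates or enums would need a different comparison.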

 

Can Structured Output Be Trusted?

From what we have seen so far, parsing works a little better with structured output. It is also much more concise, so using the structured output feature looks considerably more advantageous than forcing the format through the prompt.

As of December 2, 2025:

https://llm-stats.com/

 


 

Many local models are now better than the gpt-4o model used in this experiment, and I expect locally hosted models of around 30B parameters to work well too. So in terms of raw performance, which strategy is superior does not seem to matter much.

Even so, the tool-calling strategy can trigger retries fairly often, so if your environment gives you access to an API with native structured output support, I think you should try ProviderStrategy first.

We now know that structured output is better than forcing the output structure through the prompt. The decisive question remains: can structured output be trusted? While looking for an answer, I recently read the article below.

https://www.philschmid.de/why-engineers-struggle-building-agents

 

Why (Senior) Engineers Struggle to Build AI Agents

Traditional software engineering is deterministic, while AI agents operate probabilistically. This fundamental difference creates challenges for engineers accustomed to strict interfaces and predictable outcomes.

www.philschmid.de

 

The article starts from the claim that senior developers build AI agents more slowly than junior developers, and the reasons behind it call for a somewhat philosophical reading.

The reason is that the philosophy and habits of traditional software engineering (strict control, determinism), in other words "right is right, wrong is wrong, and wrong has to be fixed," get in the way of AI agent development. The author, Philipp Schmid, argues that the more senior engineers are, the more they tend to remove the LLM's uncertainty with code, and that this makes them slower than juniors.

As I understand it, the point is this: forcing the context of text data into something structured stops the LLM from doing what it is good at, performance drops, and trying to remove the cause of that drop with more code drags you into a quagmire.

So the author proposes the following mindset for developing agents.

 

  1. Text is the new state
    • Trap: forcing natural-language input into structured data (e.g. true/false) loses context.
    • Fix: keep feedback (e.g. “Approved, focus on the US market”) as text so behavior can be adjusted dynamically.
  2. Hand over control
    • Trap: hard-coding the flow (e.g. a subscription-cancellation route) fails on non-linear interactions.
    • Fix: trust the agent (LLM) to judge intent from context.
  3. Errors are just inputs
    • Trap: halting the program on an error (the traditional way) wastes an expensive run.
    • Fix: feed the error back so the agent can attempt to recover by itself.
  4. From unit tests to evals
    • Trap: binary tests (TDD) are meaningless for a probabilistic system (infinitely many valid answers).
    • Fix: manage variability through reliability (Pass@k), quality (LLM Judge), and tracing (Eval).
  5. Agents evolve, APIs do not
    • Trap: human-centric APIs (implicit context) make agents hallucinate.
    • Fix: make things explicit with detailed semantic typing (e.g. “user_email_address”) and docstrings. Agents can adapt to tool changes.
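
Point 3 in the list above ("errors are just inputs") can be sketched as a small retry loop. This is an illustration under stated assumptions, not the article's code: validate, extract_with_feedback, and fake_model are hypothetical names, the two-field schema is invented for brevity, and fake_model stands in for a real LLM call.

```python
import json

REQUIRED_FIELDS = {"name": str, "age": int}  # simplified, hypothetical schema


def validate(raw):
    """Parse raw JSON and check required fields; raise ValueError with a readable message."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"output was not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("output must be a JSON object")
    for field, expected in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), expected):
            raise ValueError(f"field '{field}' must be of type {expected.__name__}")
    return data


def extract_with_feedback(call_model, user_text, max_attempts=3):
    """Call the model; on a validation error, append the error as a new message and retry."""
    messages = [user_text]
    for attempt in range(1, max_attempts + 1):
        raw = call_model(messages)
        try:
            return validate(raw), attempt
        except ValueError as error:
            # The error is not fatal: it becomes the next input.
            messages.append(f"Your previous output was rejected: {error}. Please try again.")
    raise RuntimeError(f"no valid output after {max_attempts} attempts")


def fake_model(messages):
    # Stand-in for an LLM: wrong type on the first call, corrected after feedback.
    if len(messages) == 1:
        return '{"name": "Park Junho", "age": "24"}'
    return '{"name": "Park Junho", "age": 24}'


result, attempts = extract_with_feedback(fake_model, "Park Junho is 24 years old.")
print(result, attempts)
```

The cap on attempts keeps the loop from running forever when the model never recovers.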

 

The conclusion is to accept that what you build is probabilistic: instead of forcing every edge case in code, build a resilient system in which the LLM can correct even those through its own feedback, and manage that process.

So, coming back to the point, the answer to "can structured output be trusted?" seems to be "close to yes." And rather than judging it as trustworthy or not, we should focus on how far it can be trusted.

The feature mostly works well (with gpt-4o or better models), so it seems best to test it in your own production environment, work out whether the edge cases are manageable, and then use it.
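
One way to put a number on "how far can it be trusted" is the Pass@k metric mentioned in the list above. The sketch below uses a commonly cited estimator, 1 - C(n-c, k) / C(n, k); treating a run as "passing" when its output validates against the schema is my assumption, and pass_at_k is an illustrative name.

```python
from math import comb


def pass_at_k(n, c, k):
    """Estimate the chance that at least one of k sampled outputs is valid,
    given that c out of n recorded trial runs were valid."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failures exist, so any k samples include a success.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Example: 10 recorded runs, 5 of which produced schema-valid output.
print(pass_at_k(10, 5, 1))
print(pass_at_k(10, 5, 3))
```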

Opinions on this will probably differ a lot from person to person. Please leave yours in the comments!



Category: dictionary

On the surface this is a problem you solve with a dictionary, but there are two more issues to deal with.

1. Finding a key by its value.

2. Fixing the time limit exceeded error caused by input()


keypoint : python input / finding a key by value


code

import sys

n, m = map(int, sys.stdin.readline().split())

# Keep two dictionaries so lookups work in both directions without a scan.
pocketmon_list = dict()      # number (as str) -> name
rev_pocketmon_list = dict()  # name -> number (as str)
for cnt in range(1, n + 1):
    name = sys.stdin.readline().strip()
    pocketmon_list[str(cnt)] = name
    rev_pocketmon_list[name] = str(cnt)

for _ in range(m):
    tmp_input = sys.stdin.readline().strip()
    if tmp_input.isdigit():
        print(pocketmon_list[tmp_input])
    else:
        print(rev_pocketmon_list[tmp_input])

Key points

1. Finding a key by value

 

 

[Python] How to find a key by value in a Python dictionary

The dictionary structure is specialized for finding a value from its key. In terms of a language dictionary, looking up the meaning of a word is easy, but finding the word that has a given meaning is very hard.

star7sss.tistory.com

 

 

See the post above. The conclusion is that the only way to find a key directly from its value is an exhaustive search with a for loop, which is why the solution keeps a second dictionary that maps names back to numbers.
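
As a small illustration of that trade-off (the sample data and the names key_for_value and number_by_pokemon are mine, not from the problem): a for-loop search has to scan the entries on every query, while a reversed dictionary is built once and then answers each query with a single lookup.

```python
pokemon_by_number = {"1": "Bulbasaur", "2": "Ivysaur", "3": "Venusaur"}


def key_for_value(mapping, target):
    """Exhaustive search: walk every item until the value matches."""
    for key, value in mapping.items():
        if value == target:
            return key
    return None


# Build the reverse mapping once; this assumes the values are unique.
number_by_pokemon = {name: number for number, name in pokemon_by_number.items()}

print(key_for_value(pokemon_by_number, "Ivysaur"))
print(number_by_pokemon["Venusaur"])
```

With many queries over a large dictionary, the repeated scans are what push a solution toward the time limit.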

 

2. Why is input() slower than sys.stdin.readline().strip()?

input() is the built-in function Python provides for user input, and it has the following characteristics.

1. It returns what was entered as a string and automatically removes the trailing newline.

2. It can take a prompt message as an argument.

 

Handling the prompt and stripping the newline on every single call adds overhead, and over many lines of input that overhead adds up to a lot of time.

In contrast, readline() returns the string with the newline still attached, so it does less work per call, and that is where the time difference comes from. With readline you can remove the newline yourself using strip().
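
The newline behavior is easy to see with io.StringIO as a stand-in for sys.stdin (an assumption made here so the sketch runs without piped input):

```python
import io

stream = io.StringIO("Pikachu\nEevee\n")  # stand-in for sys.stdin

first = stream.readline()           # newline is still attached
second = stream.readline().strip()  # newline removed explicitly

print(repr(first))
print(repr(second))
```

Note that strip() also removes leading and trailing spaces; rstrip("\n") removes only the newline if the surrounding whitespace matters.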

 

The code below shows the time difference between the two functions.

 

import sys
import time

# Each loop consumes 100000 lines, so feed at least 200000 lines on stdin.

# Using sys.stdin.readline()
start = time.time()
for _ in range(100000):
    line = sys.stdin.readline().strip()
end = time.time()
print(f'sys.stdin.readline() time: {end - start} s')

# Using input()
start = time.time()
for _ in range(100000):
    line = input()
end = time.time()
print(f'input() time: {end - start} s')

 

 

Time taken to process 100000 lines of input:
input(): 12.3456 s
sys.stdin.readline(): 0.4567 s

The difference is enormous. So when you run into a time limit exceeded error in Python, try replacing input() with sys.stdin.readline().strip().

 


import sys
# Wrap the call in a function so that a new line is read on every call.
def input():
    return sys.stdin.readline().strip()

 

This way every existing input() call picks up the faster reader without any other code changes.
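
One pitfall when rebinding input is worth a quick check. In the sketch below, io.StringIO stands in for sys.stdin (an assumption so that it runs without piped input), and frozen and live are illustrative names. Writing readline().strip without the final parentheses reads one line immediately and keeps the strip method of that single string, so every later call returns the same text. Wrapping the call in a function reads a fresh line each time.

```python
import io

stream = io.StringIO("a\nb\nc\n")  # stand-in for sys.stdin

# Reads "a\n" right here and keeps the bound strip method of that one string.
frozen = stream.readline().strip
frozen_results = [frozen(), frozen()]


# Calls readline() again on every invocation.
def live():
    return stream.readline().strip()


live_results = [live(), live()]

print(frozen_results)
print(live_results)
```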
