
[BERT] Training a Summarization Model with BERT on Google Colab (1)

ttaerrim 2021. 7. 24. 15:57

This post was written with reference to the KoBertSum model.

 

KoBertSum is a Korean summarization model: it adapts BertSum, which performs well on both extractive (ext) and abstractive (abs) summarization, so that it can be applied to Korean data. For my graduation project I chose an AI meeting-minutes service, and since the plan is to use a BERT model to generate summaries for its minutes-summarization feature, I am currently working with BERT... (no idea whether it will actually work)

 

 

Project setup

 

First, mount Google Drive, then clone the KoBertSum repository.

from google.colab import drive
drive.mount('/content/drive')
%cd /content/drive/MyDrive
!git clone https://github.com/uoneway/KoBertSum.git

 

Also download the data from the Korean document extractive summarization competition and put it in ext/data/raw.
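If the competition files were downloaded to some other folder, a small helper can copy them into place. This is only a minimal sketch: the function name `stage_raw_data` and the paths in the usage comment are my own assumptions, not part of KoBertSum, and it simply copies every regular file it finds without checking file names.

```python
import shutil
from pathlib import Path


def stage_raw_data(download_dir, raw_dir):
    """Copy every regular file from download_dir into raw_dir.

    Returns the sorted list of copied file names. Subdirectories are skipped.
    """
    src = Path(download_dir)
    dst = Path(raw_dir)
    dst.mkdir(parents=True, exist_ok=True)  # create ext/data/raw if it is missing
    copied = []
    for path in sorted(src.iterdir()):
        if path.is_file():
            shutil.copy2(path, dst / path.name)
            copied.append(path.name)
    return copied


# Hypothetical usage (both paths are assumptions; adjust to your own Drive layout):
# stage_raw_data("/content/drive/MyDrive/competition_data",
#                "/content/drive/MyDrive/KoBertSum/ext/data/raw")
```

The returned list makes it easy to confirm at a glance that the expected files actually landed in ext/data/raw.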

 

 

Installing the required libraries

 

First, install the required libraries.

python main.py -task install

 

 

๋ฐ์ดํ„ฐ Preprocessing

 

python main.py -task make_data -n_cpus 2

Run the command above. If you get a lot of package uninstall errors, install the packages manually.

 

Everything was fine on Colab, but when I first tried this in an IDE I got a lot of errors, so I additionally installed torch, multiprocess, transformers, pyrouge, and sentencepiece myself.

pip install torch multiprocess transformers pyrouge sentencepiece
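Before re-running the preprocessing, it can help to check which of these packages are still missing from the current environment. The sketch below uses only the standard library; the helper name `missing_packages` is my own, and it assumes each package's import name matches the name used with pip, which holds for the five listed here as far as I know.

```python
import importlib.util


def missing_packages(names):
    """Return the names that cannot be found as importable top-level modules."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# The packages installed manually above.
required = ["torch", "multiprocess", "transformers", "pyrouge", "sentencepiece"]
print("Missing:", missing_packages(required))
```

An empty list means every package can be located; anything still listed needs to be installed with pip.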

 

mecab was not installed on Colab, so I installed it separately.

!sudo apt-get install curl git
!bash <(curl -s https://raw.githubusercontent.com/konlpy/konlpy/master/scripts/mecab.sh)

 

Once all the required libraries are installed, run the preprocessing again; the results are saved to ext/data/bert_data/train_abs and ext/data/bert_data/valid_abs.
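To confirm that preprocessing actually produced something, you can count the files in the two output directories. This is a rough sketch: `count_files` is a hypothetical helper of mine, and the relative paths assume the current working directory is the KoBertSum repository root.

```python
from pathlib import Path


def count_files(directories):
    """Map each directory to its number of regular files (0 if it does not exist)."""
    counts = {}
    for directory in directories:
        path = Path(directory)
        if path.is_dir():
            counts[str(directory)] = sum(1 for f in path.iterdir() if f.is_file())
        else:
            counts[str(directory)] = 0
    return counts


# Output directories named in this post, relative to the repository root.
outputs = ["ext/data/bert_data/train_abs", "ext/data/bert_data/valid_abs"]
print(count_files(outputs))
```

A count of 0 for either directory suggests the make_data step did not finish, or that it was run from a different working directory.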