# BERT finetuning tasks in 5 minutes with Cloud TPU

<table class="tfo-notebook-buttons" align="left" >
 <td>
    <a target="_blank" href="https://colab.research.google.com/github/tensorflow/tpu/blob/master/tools/colab/bert_finetuning_with_cloud_tpus.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
  <td>
    <a target="_blank" href="https://github.com/tensorflow/tpu/blob/master/tools/colab/bert_finetuning_with_cloud_tpus.ipynb"><img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" />View source on GitHub</a>
  </td>
</table>


**BERT**, or **B**idirectional **E**ncoder **R**epresentations from **T**ransformers, is a new method of pre-training language representations which obtains state-of-the-art results on a wide array of Natural Language Processing (NLP) tasks. The academic paper can be found here: https://arxiv.org/abs/1810.04805.

This Colab demonstrates how to use a free Colab Cloud TPU to fine-tune sentence and sentence-pair classification tasks built on top of pretrained BERT models.

**Note:**  You will need a GCP (Google Cloud Platform) account and a GCS (Google Cloud 
Storage) bucket for this Colab to run.

Please follow the [Google Cloud TPU quickstart](https://cloud.google.com/tpu/docs/quickstart) to create a GCP account and a GCS bucket. You have [$300 free credit](https://cloud.google.com/free/) to get started with any GCP product. You can learn more about Cloud TPU at https://cloud.google.com/tpu/docs.

Once you finish the setup, let's start!

**Firstly**, we need to set up the Colab TPU runtime environment, verify that a TPU device is successfully connected, and upload credentials to the TPU so that it can access the GCS bucket.

In [0]:
import datetime
import json
import os
import pprint
import random
import string
import sys
import tensorflow as tf

assert 'COLAB_TPU_ADDR' in os.environ, 'ERROR: Not connected to a TPU runtime; please see the first cell in this notebook for instructions!'
TPU_ADDRESS = 'grpc://' + os.environ['COLAB_TPU_ADDR']
print('TPU address is', TPU_ADDRESS)

from google.colab import auth
auth.authenticate_user()
with tf.Session(TPU_ADDRESS) as session:
  print('TPU devices:')
  pprint.pprint(session.list_devices())

  # Upload credentials to TPU.
  with open('/content/adc.json', 'r') as f:
    auth_info = json.load(f)
  tf.contrib.cloud.configure_gcs(session, credentials=auth_info)
  # Now credentials are set for all future sessions on this TPU.

TPU address is grpc://10.77.133.178:8470
TPU devices:
[_DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:CPU:0, CPU, -1, 11193819257324014272),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 17179869184, 11743471906924308223),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:0, TPU, 17179869184, 8817390715541045037),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:1, TPU, 17179869184, 13647766437076739427),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:2, TPU, 17179869184, 12291117123528086541),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:3, TPU, 17179869184, 5167520003982709723),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:4, TPU, 17179869184, 1806457377803209548),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:5, TPU, 17179869184, 10367945529533593525),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:6, TPU, 17179869184, 1182173078
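
The setup cell uses `assert` to detect a missing TPU runtime, and assertions are skipped when Python runs with `-O`. As a minimal sketch, the same check can be written as a small function that raises an explicit error. The helper name `tpu_address_from_env` is hypothetical (it is not part of Colab or TensorFlow), and the address in the usage line is simply the one printed above.

```python
import os


def tpu_address_from_env(environ=None):
    """Return the gRPC address of the Colab TPU or raise a descriptive error.

    `environ` defaults to os.environ; passing a plain dict makes the
    function easy to try out without a TPU runtime.
    """
    environ = os.environ if environ is None else environ
    addr = environ.get('COLAB_TPU_ADDR')
    if not addr:
        raise RuntimeError(
            'Not connected to a TPU runtime; select a TPU accelerator '
            'in the Colab runtime settings and rerun this cell.')
    return 'grpc://' + addr


# Usage with a stand-in environment (hypothetical value for illustration).
print(tpu_address_from_env({'COLAB_TPU_ADDR': '10.77.133.178:8470'}))
```

In the notebook itself the call would be `TPU_ADDRESS = tpu_address_from_env()`.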

**Secondly**, prepare and import BERT modules.

In [0]:
!rm -rf bert

In [0]:
import sys

!test -d bert || git clone https://github.com/shalmolighosh/bert/
if 'bert' not in sys.path:
  sys.path += ['bert']

In [0]:
#!cat bert/run_classifier.py | grep -C 10 _read_csv

In [0]:
from google.colab import drive
drive.mount('/content/gdrive')

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).


In [0]:
!ls gdrive/My\ Drive/BERT/Data

Atheism  CC  FM  HC  LA


**Thirdly**, prepare for training:

*  Specify the task and locate its training data (downloaded for the GLUE tasks, read from Google Drive otherwise).
*  Specify the pretrained BERT model.
*  Specify the GCS bucket and create an output directory for model checkpoints and eval results.



In [0]:
TASK = 'HC' #@param {type:"string"}
assert TASK in ('MRPC', 'CoLA','Atheism','CC','HC','LA','FM','ALL'), 'TASK must be one of MRPC, CoLA, Atheism, CC, HC, LA, FM or ALL.'
# Download glue data.
if TASK=='MRPC' or TASK == 'CoLA':
  ! test -d download_glue_repo || git clone https://gist.github.com/60c2bdb54d156a41194446737ce03e2e.git download_glue_repo
  !python download_glue_repo/download_glue_data.py --data_dir='glue_data' --tasks=$TASK
  TASK_DATA_DIR = 'glue_data/' + TASK

elif TASK!='ALL':
  TASK_DATA_DIR = 'gdrive/My\ Drive/BERT/Data/' + TASK

else:
  TASK_DATA_DIR = 'gdrive/My\ Drive/BERT/Data/'
  
  
print('***** Task data directory: {} *****'.format(TASK_DATA_DIR))
!ls $TASK_DATA_DIR

# Available pretrained model checkpoints:
#   uncased_L-12_H-768_A-12: uncased BERT base model
#   uncased_L-24_H-1024_A-16: uncased BERT large model
#   cased_L-12_H-768_A-12: cased BERT base model
BERT_MODEL = 'uncased_L-24_H-1024_A-16' #@param {type:"string"}
BERT_PRETRAINED_DIR = 'gs://cloud-tpu-checkpoints/bert/' + BERT_MODEL
print('***** BERT pretrained directory: {} *****'.format(BERT_PRETRAINED_DIR))
!gsutil ls $BERT_PRETRAINED_DIR

BUCKET = 'bert-large-pair' #@param {type:"string"}
assert BUCKET, 'Must specify an existing GCS bucket name'
OUTPUT_DIR = 'gs://{}/bert/models/{}/{}_new'.format(BUCKET,BERT_MODEL ,TASK)
tf.gfile.MakeDirs(OUTPUT_DIR)
print('***** Model output directory: {} *****'.format(OUTPUT_DIR))


***** Task data directory: gdrive/My\ Drive/BERT/Data/HC *****
test_preprocessed.csv  train_preprocessed.csv
***** BERT pretrained directory: gs://cloud-tpu-checkpoints/bert/uncased_L-24_H-1024_A-16 *****
gs://cloud-tpu-checkpoints/bert/uncased_L-24_H-1024_A-16/bert_config.json
gs://cloud-tpu-checkpoints/bert/uncased_L-24_H-1024_A-16/bert_model.ckpt.data-00000-of-00001
gs://cloud-tpu-checkpoints/bert/uncased_L-24_H-1024_A-16/bert_model.ckpt.index
gs://cloud-tpu-checkpoints/bert/uncased_L-24_H-1024_A-16/bert_model.ckpt.meta
gs://cloud-tpu-checkpoints/bert/uncased_L-24_H-1024_A-16/checkpoint
gs://cloud-tpu-checkpoints/bert/uncased_L-24_H-1024_A-16/vocab.txt
***** Model output directory: gs://bert-large-pair/bert/models/uncased_L-24_H-1024_A-16/HC_new *****
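
The output path printed above is assembled from the bucket, the model name and the task. A minimal sketch of that layout as a helper follows; the function name `model_output_dir` is hypothetical, and the `_new` suffix is copied from the cell above rather than being any BERT convention.

```python
def model_output_dir(bucket, bert_model, task):
    """Build the GCS directory used for checkpoints and eval results."""
    if not bucket:
        raise ValueError('Must specify an existing GCS bucket name')
    return 'gs://{}/bert/models/{}/{}_new'.format(bucket, bert_model, task)


# Reproduces the directory printed by the cell above.
print(model_output_dir('bert-large-pair', 'uncased_L-24_H-1024_A-16', 'HC'))
```

Keeping the model name in the path means checkpoints from the base and large models do not overwrite each other when the same task is rerun.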


In [0]:
#!gsutil cp gs://bert-final/bert/models/Atheism/* gs://bert-large-pair/bert/models/uncased_L-24_H-1024_A-16/Atheism

**Now, let's play!**

In [0]:
# Setup task specific model and TPU running config.

import modeling
import optimization
import run_classifier
import tokenization

if TASK!='ALL':
  TASK_DATA_DIR = 'gdrive/My Drive/BERT/Data/' + TASK
else:
  TASK_DATA_DIR = 'gdrive/My Drive/BERT/Data/'
  
# Model Hyper Parameters
TRAIN_BATCH_SIZE = 32
EVAL_BATCH_SIZE = 8
LEARNING_RATE = 2e-5
NUM_TRAIN_EPOCHS = "11" #@param {type:"string"}
NUM_TRAIN_EPOCHS = int(NUM_TRAIN_EPOCHS)
WARMUP_PROPORTION = 0.1
MAX_SEQ_LENGTH = 128
# Model configs
SAVE_CHECKPOINTS_STEPS = 1000
ITERATIONS_PER_LOOP = 1000
NUM_TPU_CORES = 8
VOCAB_FILE = os.path.join(BERT_PRETRAINED_DIR, 'vocab.txt')
CONFIG_FILE = os.path.join(BERT_PRETRAINED_DIR, 'bert_config.json')
INIT_CHECKPOINT = os.path.join(BERT_PRETRAINED_DIR, 'bert_model.ckpt')
DO_LOWER_CASE = BERT_MODEL.startswith('uncased')

processors = {
  "cola": run_classifier.ColaProcessor,
  "mnli": run_classifier.MnliProcessor,
  "mrpc": run_classifier.MrpcProcessor,
  "hc"  : run_classifier.SemProcessor,
    "atheism" : run_classifier.SemProcessor,
    "fm" : run_classifier.SemProcessor,
    "cc" : run_classifier.SemProcessor,
    "la" : run_classifier.SemProcessor,
    "all" : run_classifier.SemProcessor
}
print(processors[TASK.lower()])

tokenizer = tokenization.FullTokenizer(vocab_file=VOCAB_FILE, do_lower_case=DO_LOWER_CASE)

tpu_cluster_resolver = tf.contrib.cluster_resolver.TPUClusterResolver(TPU_ADDRESS)
run_config = tf.contrib.tpu.RunConfig(
    cluster=tpu_cluster_resolver,
    model_dir=OUTPUT_DIR,
    save_checkpoints_steps=SAVE_CHECKPOINTS_STEPS,
    tpu_config=tf.contrib.tpu.TPUConfig(
        iterations_per_loop=ITERATIONS_PER_LOOP,
        num_shards=NUM_TPU_CORES,
        per_host_input_for_training=tf.contrib.tpu.InputPipelineConfig.PER_HOST_V2))

if TASK == 'ALL':
  train_examples = []
  full_forms = {'HC' : 'hillary clinton', 'CC' : 'climate change is a concern','Atheism' : 'Atheism', 'LA' : 'Legalisation of Abortion', 'FM' : 'Feminist Movement'}
  for key,value in full_forms.items():
    processor = run_classifier.SemProcessor(use_pair=True, topic = value)
    label_list = processor.get_labels()
    train_examples += processor.get_train_examples(TASK_DATA_DIR+key)   

else:
  full_forms = {'HC' : 'hillary clinton', 'CC' : 'climate change is a concern','Atheism' : 'Atheism', 'LA' : 'Legalisation of Abortion', 'FM' : 'Feminist Movement'}
  processor = processors[TASK.lower()](use_pair=False,
                                       topic=full_forms[TASK])
  label_list = processor.get_labels()
  train_examples = processor.get_train_examples(TASK_DATA_DIR)


print("Number of train examples :",len(train_examples))
  
num_train_steps = int(
    len(train_examples) / TRAIN_BATCH_SIZE * NUM_TRAIN_EPOCHS)
num_warmup_steps = int(num_train_steps * WARMUP_PROPORTION)

model_fn = run_classifier.model_fn_builder(
    bert_config=modeling.BertConfig.from_json_file(CONFIG_FILE),
    num_labels=len(label_list),
    init_checkpoint=INIT_CHECKPOINT,
    learning_rate=LEARNING_RATE,
    num_train_steps=num_train_steps,
    num_warmup_steps=num_warmup_steps,
    use_tpu=True,
    use_one_hot_embeddings=True)

estimator = tf.contrib.tpu.TPUEstimator(
    use_tpu=True,
    model_fn=model_fn,
    config=run_config,
    train_batch_size=TRAIN_BATCH_SIZE,
    eval_batch_size=EVAL_BATCH_SIZE)

<class 'run_classifier.SemProcessor'>
Number of train examples : 639
INFO:tensorflow:Using config: {'_model_dir': 'gs://bert-large-pair/bert/models/uncased_L-24_H-1024_A-16/HC_new', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 1000, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
cluster_def {
  job {
    name: "worker"
    tasks {
      key: 0
      value: "10.77.133.178:8470"
    }
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fcce37e1208>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': 'grpc://10.77.133.178:8470', '_evaluation_master': 'grpc://10.77.133.178:8470', '_is_chief': True, '_num_ps_replicas': 0, '_
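
The step counts passed to `model_fn_builder` follow directly from the dataset size and the hyperparameters. The sketch below repeats that arithmetic for the 639 HC training examples reported above; the function name `training_schedule` is hypothetical.

```python
def training_schedule(num_examples, batch_size, num_epochs, warmup_proportion):
    """Mirror the notebook's step arithmetic: both values are truncated."""
    num_train_steps = int(num_examples / batch_size * num_epochs)
    num_warmup_steps = int(num_train_steps * warmup_proportion)
    return num_train_steps, num_warmup_steps


# 639 examples, TRAIN_BATCH_SIZE=32, NUM_TRAIN_EPOCHS=11, WARMUP_PROPORTION=0.1
steps, warmup = training_schedule(639, 32, 11, 0.1)
print(steps, warmup)  # 219 21
```

With these settings the learning rate warms up over the first 21 of 219 steps. Note that `SAVE_CHECKPOINTS_STEPS` and `ITERATIONS_PER_LOOP` are both 1000, which is larger than the whole run.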

In [0]:
import csv
def read_csv(input_file, quotechar=None):
  """Reads a comma-separated value file."""
  with tf.gfile.Open(input_file, "r") as f:
    # csv.reader defaults to a comma delimiter; quotechar is currently unused.
    reader = csv.reader(f)
    lines = []
    for line in reader:
      if sys.version_info[0]==2:
        line = list(unicode(cell, 'utf-8') for cell in line)
      lines.append(line)
    return lines


In [0]:
#lines = read_csv(TASK_DATA_DIR+'/train_preprocessed.csv')

In [0]:
#lines

In [0]:
# Train the model.
print('MRPC/CoLA on BERT base model normally takes about 2-3 minutes. Please wait...')
train_features = run_classifier.convert_examples_to_features(
    train_examples, label_list, MAX_SEQ_LENGTH, tokenizer)
print('***** Started training at {} *****'.format(datetime.datetime.now()))
print('  Num examples = {}'.format(len(train_examples)))
print('  Batch size = {}'.format(TRAIN_BATCH_SIZE))
tf.logging.info("  Num steps = %d", num_train_steps)
train_input_fn = run_classifier.input_fn_builder(
    features=train_features,
    seq_length=MAX_SEQ_LENGTH,
    is_training=True,
    drop_remainder=True)
#estimator.train(input_fn=train_input_fn, max_steps=num_train_steps)
print('***** Finished training at {} *****'.format(datetime.datetime.now()))

MRPC/CoLA on BERT base model normally takes about 2-3 minutes. Please wait...
INFO:tensorflow:Writing example 0 of 639
INFO:tensorflow:*** Example ***
INFO:tensorflow:guid: train-0
INFO:tensorflow:tokens: [CLS] rt gunn jessica because i want young american women to be able to be proud of the 1st woman president [SEP]
INFO:tensorflow:input_ids: 101 19387 22079 8201 2138 1045 2215 2402 2137 2308 2000 2022 2583 2000 2022 7098 1997 1996 3083 2450 2343 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0
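
The log above shows each example padded to `MAX_SEQ_LENGTH`: `input_ids` is filled with zeros after `[SEP]`, and `input_mask` marks real tokens with 1 and padding with 0. A minimal sketch of that padding step follows, using the 22 token ids from the logged example. The function name `pad_features` is hypothetical, and this is an illustration of the idea rather than the code in `run_classifier.py`.

```python
def pad_features(input_ids, max_seq_length):
    """Zero-pad token ids and build the matching attention mask."""
    if len(input_ids) > max_seq_length:
        raise ValueError('sequence is longer than max_seq_length')
    input_mask = [1] * len(input_ids)          # 1 for every real token
    padding = [0] * (max_seq_length - len(input_ids))
    return input_ids + padding, input_mask + padding


# Token ids of the first training example, from [CLS] (101) to [SEP] (102).
ids = [101, 19387, 22079, 8201, 2138, 1045, 2215, 2402, 2137, 2308, 2000,
       2022, 2583, 2000, 2022, 7098, 1997, 1996, 3083, 2450, 2343, 102]
padded_ids, mask = pad_features(ids, 128)
print(len(padded_ids), sum(mask))  # 128 22
```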

In [0]:
estimator = tf.contrib.tpu.TPUEstimator(
    use_tpu=True,
    model_fn=model_fn,
    config=run_config,
    train_batch_size=TRAIN_BATCH_SIZE,
    eval_batch_size=EVAL_BATCH_SIZE,
    predict_batch_size = EVAL_BATCH_SIZE)

INFO:tensorflow:Using config: {'_model_dir': 'gs://bert-large-pair/bert/models/uncased_L-24_H-1024_A-16/HC_new', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 1000, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
cluster_def {
  job {
    name: "worker"
    tasks {
      key: 0
      value: "10.77.133.178:8470"
    }
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fcce2de4c50>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': 'grpc://10.77.133.178:8470', '_evaluation_master': 'grpc://10.77.133.178:8470', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop

In [0]:
# Eval the model.
if TASK != 'ALL':
  eval_examples = processor.get_dev_examples(TASK_DATA_DIR)
  
else:
  eval_examples = []
  full_forms = {'HC' : 'hillary clinton', 'CC' : 'climate change is a concern','Atheism' : 'Atheism', 'LA' : 'Legalisation of Abortion', 'FM' : 'Feminist Movement'}
  for key,value in full_forms.items():
    processor = run_classifier.SemProcessor(use_pair=True, topic = value)
    eval_examples += processor.get_dev_examples(TASK_DATA_DIR+key)   

eval_features = run_classifier.convert_examples_to_features(
    eval_examples, label_list, MAX_SEQ_LENGTH, tokenizer)
print('***** Started evaluation at {} *****'.format(datetime.datetime.now()))
print('  Num examples = {}'.format(len(eval_examples)))
print('  Batch size = {}'.format(EVAL_BATCH_SIZE))
# Eval will be slightly WRONG on the TPU because it will truncate
# the last batch.
eval_steps = int(len(eval_examples) / EVAL_BATCH_SIZE)
eval_input_fn = run_classifier.input_fn_builder(
    features=eval_features,
    seq_length=MAX_SEQ_LENGTH,
    is_training=False,
    drop_remainder=True)
#result = estimator.evaluate(input_fn=eval_input_fn, steps=eval_steps)
print('***** Finished evaluation at {} *****'.format(datetime.datetime.now()))
#output_eval_file = os.path.join(OUTPUT_DIR, "eval_results.txt")
#with tf.gfile.GFile(output_eval_file, "w") as writer:
#  print("***** Eval results *****")
#  for key in sorted(result.keys()):
#    print('  {} = {}'.format(key, str(result[key])))
#    writer.write("%s = %s\n" % (key, str(result[key])))


INFO:tensorflow:Writing example 0 of 295
INFO:tensorflow:*** Example ***
INFO:tensorflow:guid: dev-0
INFO:tensorflow:tokens: [CLS] # mt ##p # meet the press how is del ##eti ##ng emails part of the government record different from eras ##ing parts of a tape ? # nixon # # p ##2 # [SEP]
INFO:tensorflow:input_ids: 101 1001 11047 2361 1001 3113 1996 2811 2129 2003 3972 20624 3070 22028 2112 1997 1996 2231 2501 2367 2013 28500 2075 3033 1997 1037 6823 1029 1001 11296 1001 1001 1052 2475 1001 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
INFO:tensorflow:segment_ids: 0 0 0 0
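
The comment in the evaluation cell warns that the last batch is truncated. The sketch below works out what that means for the 295 dev examples logged above, assuming every evaluation step consumes exactly one full batch; the function name `eval_coverage` is hypothetical.

```python
def eval_coverage(num_examples, batch_size):
    """Return (steps, examples evaluated, examples dropped)."""
    steps = int(num_examples / batch_size)
    evaluated = steps * batch_size
    return steps, evaluated, num_examples - evaluated


# 295 dev examples with EVAL_BATCH_SIZE=8
print(eval_coverage(295, 8))  # (36, 288, 7)
```

Under that assumption, 7 of the 295 examples would not contribute to `estimator.evaluate`. Choosing an eval batch size that divides the number of examples evenly is one way to avoid the loss.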

In [0]:
#?? estimator

In [0]:
preds = estimator.predict(
              input_fn=eval_input_fn)

In [0]:
all_preds = []
for pred in preds:
  all_preds.append(pred)


INFO:tensorflow:Querying Tensorflow master (grpc://10.77.133.178:8470) for TPU system metadata.
INFO:tensorflow:Found TPU system:
INFO:tensorflow:*** Num TPU Cores: 8
INFO:tensorflow:*** Num TPU Workers: 1
INFO:tensorflow:*** Num TPU Cores Per Worker: 8
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:CPU:0, CPU, -1, 11193819257324014272)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 17179869184, 11743471906924308223)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:0, TPU, 17179869184, 8817390715541045037)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:1, TPU, 17179869184, 13647766437076739427)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:2, TPU, 17179869184, 12291117123528086541)
INFO:tensorflow:*** Available Device: _DeviceAttribut

In [0]:
import numpy as np
np.argmax(all_preds[0]['probabilities'])

0

In [0]:
test = eval_examples

In [0]:
test[0].text_a

'# mtp # meet the press how is deleting emails part of the government record different from erasing parts of a tape ? # nixon # # p2 #'

In [0]:
import numpy as np
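# Each row is one class (0 and 1); columns are [correct, predicted, gold] counts.
matrix = np.array([[0,0,0],[0,0,0]])
for i in range(len(all_preds)):
  gold = int(test[i].label)
  pred = np.argmax(all_preds[i]['probabilities'])
  if gold<2:
    matrix[gold][2]+=1  # examples whose gold label is this class
  if pred < 2:
    matrix[pred][1]+=1  # examples predicted as this class
    if gold == pred:
      matrix[gold][0]+=1  # correct predictions for this class
  
print(matrix)
# correct/(predicted+gold) is half of a class's F1, so a+b is the mean F1 of classes 0 and 1.
# The 1e-5 term guards against division by zero.
a = matrix[0][0]/(matrix[0][1]+matrix[0][2]+1e-5)
b = matrix[1][0]/(matrix[1][1]+matrix[1][2]+1e-5)
print("fscore - ",a+b)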
  

[[151 203 172]
 [ 26  32  45]]
fscore -  0.7403290043290043
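
The printed score can be read as the average F1 of classes 0 and 1, with class 2 left out of the average. The sketch below recomputes it from the counts printed above through explicit precision and recall, and checks that the shortcut `correct / (predicted + gold)` summed over the two classes gives the same number. The function name `class_f1` is hypothetical.

```python
def class_f1(correct, predicted, gold):
    """F1 for one class from its correct, predicted and gold counts."""
    precision = correct / predicted if predicted else 0.0
    recall = correct / gold if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rows are classes 0 and 1; columns are [correct, predicted, gold].
matrix = [[151, 203, 172], [26, 32, 45]]

f1_scores = [class_f1(*row) for row in matrix]
macro_f1 = sum(f1_scores) / len(f1_scores)

# Shortcut used in the notebook: F1 / 2 == correct / (predicted + gold).
shortcut = sum(c / (p + g) for c, p, g in matrix)

print(f1_scores)
print(macro_f1, shortcut)
```

Both values agree with the score printed above up to floating-point rounding. The per-class F1 values show that class 0 is predicted considerably better than class 1, which the single averaged number hides.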


In [0]:
labels_dict = ["oppose","support","neutral"]
tweets = ["tweet"]+[t.text_a for t in test]
gold_labels = ["correct"]+[labels_dict[int(t.label)] for t in test]
pred_labels = ["predicted"]+[labels_dict[np.argmax(t['probabilities'])] for t in all_preds]

In [0]:
tweets[0],gold_labels[0],pred_labels[0]

('tweet', 'correct', 'predicted')

In [0]:
np.savetxt('{}.csv'.format(TASK), [p for p in zip(tweets, gold_labels, pred_labels)], delimiter='\t', fmt='%s')
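
`np.savetxt` with `fmt='%s'` writes the fields as-is, so a tweet that happened to contain a tab character would shift the columns when the file is read back. As a hedged alternative, the standard `csv` module quotes such fields automatically. The function name `write_predictions_tsv` is hypothetical, and unlike the lists built above it expects the tweets and labels without the header entry at index 0.

```python
import csv
import io


def write_predictions_tsv(handle, tweets, gold_labels, pred_labels):
    """Write one tab-separated row per example, preceded by a header row."""
    writer = csv.writer(handle, delimiter='\t', lineterminator='\n')
    writer.writerow(['tweet', 'correct', 'predicted'])
    for row in zip(tweets, gold_labels, pred_labels):
        writer.writerow(row)


# Round trip through an in-memory buffer; the first tweet contains a tab.
buffer = io.StringIO()
write_predictions_tsv(
    buffer,
    ['a tweet\twith a tab', 'plain tweet'],
    ['oppose', 'support'],
    ['oppose', 'neutral'])
rows = list(csv.reader(io.StringIO(buffer.getvalue()), delimiter='\t'))
print(rows)
```

To write to disk instead, open the file with `open(path, 'w', newline='')` and pass the handle to the function.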

In [0]:
!ls

adc.json  bert	gdrive	HC.csv	sample_data


In [0]:
import pandas as pd
df = pd.read_csv("{}.csv".format(TASK),sep='\t')
df.head()

Unnamed: 0,tweet,correct,predicted
0,# mtp # meet the press how is deleting emails ...,oppose,oppose
1,jd son 78 andrew b roe ring andrew why do you ...,oppose,oppose
2,the white male vote is solidly gop the black v...,oppose,neutral
3,ny investing big banker buds need to ratchet u...,oppose,oppose
4,gop why should i believe you on this ? the gop...,oppose,oppose


In [0]:
df.tail()

Unnamed: 0,tweet,correct,predicted
290,hillary clinton looking forward too hearing yo...,support,support
291,mata hari krishna i'm loving it too ! draw tha...,neutral,neutral
292,"finney k ca n't stand msnbc anymore , but hope...",support,oppose
293,hillary ca n't create jobs ! last time she had...,oppose,oppose
294,it 's amazing to me how if you want a secure b...,neutral,neutral


In [0]:
from google.colab import files
files.download('{}.csv'.format(TASK))

In [0]:
df.to_csv('gdrive/My Drive/BERT/HC.csv')

In [0]:
SAVE_DATA_DIR = 'gdrive/My Drive/BERT/'

In [0]:
df.to_csv('gdrive/My Drive/BERT/{}.csv'.format(TASK))