Integrating ChatGPT in R: A Comprehensive Guide

 Nirmal Ghimire, Ph.D.

This is how I often set up my global options that help me keep my output nice and clean. Click on the Show Code button to find the actual code.

Note: You can copy and paste the code provided in this report.

ChatGPT is surrounded by enormous hype at the time of writing. It makes the news every single day. If you are a college or high school student, you have probably used it. If not, you have heard about it but are still trying to figure out why it is such a big deal. In academia, ChatGPT can analyze data, write essays or sections of an article, and much more. Some recent publications have even listed ChatGPT as a co-author. If not used properly, it may change the way we look at the world and/or ourselves. That is scary indeed. But the fact is, it is here to stay.

I am not going to discuss that today. If you want to know more about ChatGPT and its possible impact in various areas, please read this Wikipedia article: https://en.wikipedia.org/wiki/ChatGPT.

Instead, I am going to show some simple steps to integrate ChatGPT in R. Here’s how the process unfolds:

Step 1: Get the ChatGPT API Key

To run ChatGPT from the R console, we just need an Application Programming Interface (API) key. For this, you need an existing ChatGPT account. If you do not have one, create one. Here’s the link to either log in or sign up: https://chat.openai.com/auth/login.

Once you are logged in to ChatGPT, you will land on this page: https://platform.openai.com/overview

Then click on the Personal button at the top right-hand side of the window, then View API Keys. You should be able to see your existing API keys. If you have none, click on the Create new secret key button to create one. Then copy the key and head over to your favorite R IDE (like RStudio). The API key is a long string of characters and looks something like this:

                                    sksuAyOFEE......................Zzvblzs

You can access the complete information about setting up chatGPT here: https://platform.openai.com/docs/api-reference.

It is easy to set up in Python but not as straightforward in R. For newcomers to R, here’s one way to set this up. Remember, you have to follow these instructions every time you want to run ChatGPT in R, until somebody comes up with a permanent fix.

Note: The good news is that people are working on it. We will likely have a ChatGPT package for R with an easy interface very soon.

Step 2: Load Required Libraries

We need to send an HTTP POST request to establish a connection with our ChatGPT account. This is done using the POST() function from the {httr} package, which is built on top of curl. For now, we just need the {httr} package. This package makes your journey easy when you want to bring the HTTP world into R. For example, it offers authenticate() for logging in and helpers to control additional components of a request.

It works absolutely fine with ChatGPT, but the output will be very messy. As we know, cleaning output requires other packages as well, so I am also going to load the {stringr} package. I am going to show the first output without {stringr}, and the second as a slightly cleaner output.
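The loading step itself is short. Here is a minimal sketch, assuming the two packages may or may not be installed yet; the names required_pkgs, is_available, and missing_pkgs are made up for illustration.

```r
# Packages used in this post
required_pkgs <- c("httr", "stringr")

# Check which of them are installed (TRUE means "available")
is_available <- vapply(required_pkgs, requireNamespace,
                       FUN.VALUE = logical(1), quietly = TRUE)
missing_pkgs <- required_pkgs[!is_available]

if (length(missing_pkgs) > 0) {
  # Install these once with install.packages(), then rerun this chunk
  message("Please install first: ", paste(missing_pkgs, collapse = ", "))
} else {
  library(httr)     # POST(), add_headers(), content()
  library(stringr)  # str_trim()
}
```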

Step 3: Run chatGPT

Please follow these steps to run ChatGPT successfully.

3.a. Save the ChatGPT API Key as an Object

Now that I have installed and loaded the {httr} package, I am ready to try ChatGPT. First, let’s run it and look at the original output without any cleanup. To do so, I need to copy my API key and save it as an object.

I saved my API key as chatGPT_API. I have to remember this name for future use.

Note: API keys for ChatGPT or any other service are personal credentials. You should not share them with anyone.
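One way to avoid pasting the key into every script is to keep it in an environment variable and read it with Sys.getenv(). This is a sketch under assumptions rather than the exact code behind this post: the variable name OPENAI_API_KEY is assumed, and the key shown is only a placeholder.

```r
# Option 1: paste the key directly (placeholder shown, not a real key)
# chatGPT_API <- "sk-your-secret-key"

# Option 2: read it from an environment variable.
# OPENAI_API_KEY is an assumed name; set it once per session with
# Sys.setenv(OPENAI_API_KEY = "sk-your-secret-key") or in your ~/.Renviron file.
chatGPT_API <- Sys.getenv("OPENAI_API_KEY")

# Sys.getenv() returns "" when the variable is not set
if (!nzchar(chatGPT_API)) {
  message("No API key found. Set OPENAI_API_KEY before sending requests.")
}
```

Keeping the key out of the script also makes it harder to share it by accident when you share your code.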

3.b. Send the POST Request and Run ChatGPT

We then need to establish the connection between R and our ChatGPT account. The chunk below is a little complex, as there are a few things going on at the same time:

  • Establish the connection using the ChatGPT API endpoint
  • Get authorization using your personal API key (chatGPT_API, in my case)
  • Set the content type (I am using JSON)
  • Customize the request options (such as the model)
  • Ask ChatGPT a question
  • Save the output as an object (I am going to save it as chatGPT_response)

My question to ChatGPT is “Why do people cry?”. Let’s see what ChatGPT’s answer is.

chatGPT_response <- POST(
  # OpenAI chat completions endpoint (you can copy and paste it)
  url = "https://api.openai.com/v1/chat/completions",
  # Authorize with the personal API key
  add_headers(Authorization = paste("Bearer", chatGPT_API)),
  # Content type: the request body is sent as JSON
  content_type_json(),
  # Encode the body as JSON
  encode = "json",
  # The request body: which model to use and the message(s) to send
  body = list(
    model = "gpt-3.5-turbo-0301", # a gpt-3.5-turbo snapshot, which is very fast
    messages = list(list(role = "user", content = "Why do people cry?"))
  )
)
# Print ChatGPT's full response
content(chatGPT_response)
$id
[1] "chatcmpl-6qDlWHDv6Pkfb0ht1hEbQKg194N0b"

$object
[1] "chat.completion"

$created
[1] 1677903990

$model
[1] "gpt-3.5-turbo-0301"

$usage
$usage$prompt_tokens
[1] 12

$usage$completion_tokens
[1] 108

$usage$total_tokens
[1] 120


$choices
$choices[[1]]
$choices[[1]]$message
$choices[[1]]$message$role
[1] "assistant"

$choices[[1]]$message$content
[1] "\n\nAs an AI language model, I do not have feelings and emotions like humans do. However, I can provide information on why people cry. \n\nPeople cry for various reasons, including expressing emotions such as sadness, grief, joy, frustration, or relief. Crying can be a physical response to emotional stimuli, such as a sad movie or a personal experience. Tears can also be a response to physical pain, allergies, or extreme temperatures. Additionally, crying can serve a practical purpose for cleansing the eyes and promoting overall eye health."


$choices[[1]]$finish_reason
[1] "stop"

$choices[[1]]$index
[1] 0

I got the output. Somewhere within this pile of information is the actual answer ChatGPT gave to my question. But I can hardly read it: the lines do not break, and there is a lot of unnecessary output.

3.c. Clean the Output Using {stringr}

I will now use {stringr} to clean the output and display just the message.

# Selecting the portion we want to display
answer_one <- content(chatGPT_response)$choices[[1]]$message$content
# cleaning the selected output
answer_one <- stringr::str_trim(answer_one)
# Printing the message as a character string
cat(answer_one)
As an AI language model, I do not have feelings and emotions like humans do. However, I can provide information on why people cry. 

People cry for various reasons, including expressing emotions such as sadness, grief, joy, frustration, or relief. Crying can be a physical response to emotional stimuli, such as a sad movie or a personal experience. Tears can also be a response to physical pain, allergies, or extreme temperatures. Additionally, crying can serve a practical purpose for cleansing the eyes and promoting overall eye health.
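If you want to see what the cleaning step does without calling the API, the sketch below builds a small list by hand that mimics the shape of the response shown earlier. The object mock_response and its text are made up for illustration, and base R's trimws() stands in for stringr::str_trim() so that the chunk runs without extra packages.

```r
# A hand-made stand-in for content(chatGPT_response); not real API output
mock_response <- list(
  choices = list(
    list(
      message = list(
        role = "assistant",
        content = "\n\nPeople cry for various reasons. "
      ),
      finish_reason = "stop",
      index = 0
    )
  )
)

# Drill down to the answer text, the same way as in the step above
raw_answer <- mock_response$choices[[1]]$message$content

# Remove leading and trailing whitespace (including the "\n\n")
clean_answer <- trimws(raw_answer)

cat(clean_answer)
```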

Step 4. Put Everything Together and Create a Single Function

I have been able to use ChatGPT via R, but it takes a lot of typing. You can instead create a function that you can copy and paste whenever you want to use ChatGPT with R.

  • I created a function named hey_chatGPT, which
    • takes my question as its only argument, answer_my_question,
    • sends it to the ChatGPT API URL,
    • uses my personal API key (my_API) to get into the ChatGPT account,
    • uses the fast “gpt-3.5-turbo-0301” model,
    • saves the full response in an object named chat_GPT_answer, and
    • selects and returns just the answer given by ChatGPT, while
    • all of the above happens under the surface every time hey_chatGPT is called.

It may sound like a lot, but it is easier done than said.

Here’s how we do it:

# Get and Save Personal API
my_API <- "sk-............cMtl1Zzvblzs"
# Loading Required Libraries
library(stringr)
library(httr)
# Asking Questions to ChatGPT, Saving and Cleaning Answer
hey_chatGPT <- function(answer_my_question) {
  chat_GPT_answer <- POST(
    url = "https://api.openai.com/v1/chat/completions",
    add_headers(Authorization = paste("Bearer", my_API)),
    content_type_json(),
    encode = "json",
    body = list(
      model = "gpt-3.5-turbo-0301",
      messages = list(
        list(
          role = "user",
          content = answer_my_question
        )
      )
    )
  )
  str_trim(content(chat_GPT_answer)$choices[[1]]$message$content)
}
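The function above assumes every request succeeds. A slightly more defensive variant is sketched below. The name hey_chatGPT_safe, the api_key and model arguments, the OPENAI_API_KEY environment variable, and the error messages are all assumptions for illustration; http_error() and status_code() are {httr} helpers, and base R's trimws() is used in place of str_trim(). Exposing the model as an argument may also help, since specific model snapshots can be retired over time.

```r
hey_chatGPT_safe <- function(question,
                             api_key = Sys.getenv("OPENAI_API_KEY"),
                             model = "gpt-3.5-turbo-0301") {
  # Stop early if no key is available, before any request is sent
  if (!nzchar(api_key)) {
    stop("No API key found. Set OPENAI_API_KEY or pass api_key.", call. = FALSE)
  }

  response <- httr::POST(
    url = "https://api.openai.com/v1/chat/completions",
    httr::add_headers(Authorization = paste("Bearer", api_key)),
    httr::content_type_json(),
    encode = "json",
    body = list(
      model = model,
      messages = list(list(role = "user", content = question))
    )
  )

  # Report HTTP failures (bad key, unknown model, rate limit, ...)
  if (httr::http_error(response)) {
    stop("Request failed with HTTP status ", httr::status_code(response),
         call. = FALSE)
  }

  # Return only the trimmed answer text
  trimws(httr::content(response)$choices[[1]]$message$content)
}
```

With a valid key in place, it is called the same way as before, for example hey_chatGPT_safe("Why do people cry?").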

Step 5. Test the Single Function: “What are the difference between R and Python?”

Now the function has been created and everything is working. Let’s ask a question and see ChatGPT’s answer. I will start with something related to R itself.

response <- hey_chatGPT("What are the difference between R and Python?")
cat(response)
As an AI language model, I do not have any personal experience of using these languages. However, the following are the major differences between R and Python:

1. Application Domain: R is mainly used for statistical modeling, data analysis, and visualization, whereas Python has a broader application domain, including web development, machine learning, artificial intelligence, and automation.

2. Learning Curve: Due to its syntax and design, R has a steeper learning curve than Python. Beginners may find it challenging to learn and use R, while Python is known for its simple syntax that is easy to grasp.

3. Libraries: Both R and Python have dedicated libraries for data analysis and visualization, but R has a more extensive collection of statistical libraries, while Python has more libraries for machine learning and scientific computing.

4. Performance: Python is generally faster than R when it comes to executing code, but R excels in handling complex statistical operations.

5. Syntax: Python has a cleaner and more readable syntax than R, which can sometimes have messy and complex syntax.

In summary, R is a statistical programming language that is powerful for data analysis, while Python is more versatile and suitable for different domains, including web development and machine learning.

Wow! Here are the answers! It is siccck.

Step 6. Asking ChatGPT How to Create a Simulated Data Set

I am now going to check if I can ask chatGPT about creating a data set in R. Let’s try:

data_1 <- hey_chatGPT("How can I create a dataset with X, Y, and Z numerical variables having 200 random numbers in each using R?")
cat(data_1)
You can create a dataset with X, Y, and Z numerical variables having 200 random numbers in each using the following R code:

``` r
# Generate random data
set.seed(123)
X <- rnorm(200)
Y <- rnorm(200)
Z <- rnorm(200)

# Combine into a data frame
df <- data.frame(X, Y, Z)

# View the first few rows of the data frame
head(df)
```

This code generates 200 random numbers for each variable X, Y, and Z, combines them into a data frame, and prints the first few rows of the data. You can adjust the number of random numbers generated by changing the argument to `rnorm()`.

Amazing!!

Step 7. Check If Data Set Can be Created Using Suggested Method

Now, let’s follow the steps suggested by ChatGPT and create the data set.

set.seed(123)
X <- rnorm(200)
Y <- rnorm(200)
Z <- rnorm(200)
data <- data.frame(X, Y, Z)
head(data)
            X          Y           Z
1 -0.56047565  2.1988103 -0.07355602
2 -0.23017749  1.3124130 -1.16865142
3  1.55870831 -0.2651451 -0.63474826
4  0.07050839  0.5431941 -0.02884155
5  0.12928774 -0.4143399  0.67069597
6  1.71506499 -0.4762469 -1.65054654

That absolutely worked. What’s next?

Step 8. Running Regression

simple_lm <- lm(X ~ Y + Z, data = data)
findings_lm <- summary(simple_lm)
summary(simple_lm)

Call:
lm(formula = X ~ Y + Z, data = data)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.2847 -0.6384 -0.0621  0.5846  3.3220 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.006414   0.067071  -0.096    0.924
Y           -0.027782   0.067502  -0.412    0.681
Z           -0.031029   0.069679  -0.445    0.657

Residual standard error: 0.9471 on 197 degrees of freedom
Multiple R-squared:  0.001772,  Adjusted R-squared:  -0.008362 
F-statistic: 0.1749 on 2 and 197 DF,  p-value: 0.8397

Yes, I ran a simple linear regression on the created data. I also wanted ChatGPT to interpret the findings, but it said it could not. I tried asking the question in different ways, but none worked. Here’s an example:

Step 9. Asking chatGPT to Interpret Findings

read_table <- hey_chatGPT("read the table findings_lm.")
cat(read_table)
As an AI language model, I do not have the ability to access external files or tables. Please provide the necessary information or upload the table for me to analyze.

I understand that ChatGPT is a language model. It may not actually be capable of doing anything other than using language to communicate. Note also that my prompt contained only the name findings_lm as plain text; the API never sees the objects in my R session.

It looks like, as of today, ChatGPT used this way is not able to read data tables and explain or interpret the findings of an analysis, even one as simple as a linear model. It may be my fault as well. In other words, phrasing the request some other way, for example by including the table itself in the prompt, may actually work. Or, I am using the free version of ChatGPT, which may not be as capable as the Plus version. If not, it is one of the limitations so far.
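Since the API only receives the text of the prompt, one thing worth trying is to paste the printed regression table into the question. The sketch below does that with capture.output(); it rebuilds the same simulated data so that it runs on its own. The names table_text and prompt are made up for illustration, and how well the model interprets such a prompt is not verified here, so the final call is left commented out.

```r
# Recreate the simulated data and the model from the steps above
set.seed(123)
X <- rnorm(200)
Y <- rnorm(200)
Z <- rnorm(200)
data <- data.frame(X, Y, Z)
findings_lm <- summary(lm(X ~ Y + Z, data = data))

# Turn the printed summary into a single character string
table_text <- paste(capture.output(print(findings_lm)), collapse = "\n")

# Put the table itself into the prompt
prompt <- paste(
  "Interpret the following linear regression output from R:",
  table_text,
  sep = "\n\n"
)

# Requires the hey_chatGPT() function and a valid API key
# interpretation <- hey_chatGPT(prompt)
# cat(interpretation)
```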

What other limitations does it have? Does it draw pictures? Or can it simply copy and paste a picture from the internet? Let’s see:

Step 10. Does chatGPT Draw?

hey_chatGPT("Image Search Nepal.")
[1] "I am sorry, but as an AI language model, I am not able to perform an image search. However, you can easily search for images of Nepal by using any popular search engine such as Google, Bing, or Yahoo. Just type in your query and press enter, and you will find a plethora of images related to Nepal."
hey_chatGPT("draw a map of the United States of America")
[1] "I'm sorry, as an AI language model, I don't have the capability to create images or drawings. However, you can easily find various maps of the United States of America on the internet."

I tried to force ChatGPT to draw maps of different countries, but it could not. However, if you read the responses, ChatGPT very clearly knows what it can and cannot do. Its language use (grammar, vocabulary, style, etc.) is polished. Much better than mine! The answers are comprehensive and in complete sentences. It also offers some solutions to the problem, even though it can’t do exactly what you want it to do.

Step 11. And Finally …

Finally, my daughter had been stuck on this question for a few days. She was asking it to everybody she knew. Unfortunately, nobody could answer it to her satisfaction. The question is pretty good, so I wanted to test whether ChatGPT understands it correctly. If it does, how does it answer? And would the answer satisfy my daughter’s curiosity? Here’s what it did:

aani_question <- hey_chatGPT("
                             Is the fruit orange named orange because it is the
                             color orange or is the color orange named orange
                             because the fruit orange is orange?")
cat(aani_question)
As an AI language model, I cannot really say what came first - the fruit or the color. However, it is believed that the fruit was named first, and then the color was named after it. Before the fruit was introduced to Europe, the color was referred to as "yellow-red" or "saffron" and was not commonly used or named. Once the fruit became popular and was imported to Europe, the color began being associated with the fruit and was eventually named after it, hence the color orange.

It did absolutely understand the question! It also understood the humor in the question! LOL.

Every time I ran this prompt, the answer changed a little, and the answers got better and better as they became more grounded in facts, such as when the fruit first came to be called “orange”.
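The answers vary from run to run because the model samples its output. The chat completions API accepts a temperature setting, and lower values should make the answers more repeatable, though not necessarily identical. Below is a sketch of the request body with that option added; the name request_body is made up, the question is shortened, and the value 0 is just one choice to try.

```r
# Request body for the chat completions endpoint, with temperature added
request_body <- list(
  model = "gpt-3.5-turbo-0301",
  temperature = 0,  # lower values make the output less random
  messages = list(
    list(
      role = "user",
      content = "Is the fruit orange named after the color, or the color after the fruit?"
    )
  )
)

# This list would replace the body = list(...) argument inside hey_chatGPT()
str(request_body, max.level = 1)
```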
