
franklinbill/QTypist

 
 


QTypist

QTypist is a tool that automatically generates text input for mobile apps. It works like an intelligent typist, writing correct input text for different mobile app input scenarios.

The source code is provided in ./source code/. Please note that our LLM uses the OpenAI API, so replace the API key in the code with your own OpenAI key. Thank you very much for your support of our work!

The use of the OpenAI API is described in detail in this README, as is the method for generating the training data.

All of our linguistic patterns are shown in the following tables.

Patterns related to input widget: IWPn

| Id | Sample of linguistic patterns/rules | Examples of linguistic patterns/rules |
| --- | --- | --- |
| 1 | Please input < widget[n] >, the < widget[n] > is | Please input game name, the game name is |
| 2 | Please input < widget[det+n] >, < widget[det+n] > is | Please input your nickname, your nickname is |
| 3 | Please < widget[v+n] >, the < widget[n] > is | Please search the food, the food is |
| 4 | Please < widget[v] > | Please search |
| 5 | < widget[n] > + [MASK] + < widget[n] > | Your weight is [MASK] kg |
| 6 | < widget[n] > + [MASK] | Your age is [MASK] |
| 7 | < widget[prep] > + [MASK] | From [MASK] |

Patterns related to local context: LCPn

| Id | Sample of linguistic patterns/rules | Examples of linguistic patterns/rules |
| --- | --- | --- |
| 8 | < widget[prep] > + [MASK], < widget[prep] > + [MASK] | From [MASK], to [MASK] |
| 9 | This input is about < local[n] > | This input is about the NBA team. |
| 10 | This input is about < local[n] >, we need to < local[v+n] > | This input is about one-way flight, we need to search the flight information. |
| 11 | This input is about < local[n] >, please < local[v] > | This input is about your health, please input. |
| 12 | This input is about < local[n] >, we need to input < local[n] > | This input is about one-way train, we need to input the seat map. |
| 13 | This input is about < local[n] >, we need to know it < local[prep] > | This input is about your trip, we need to know it from. |

Patterns related to global context: GCPn

| Id | Sample of linguistic patterns/rules | Examples of linguistic patterns/rules |
| --- | --- | --- |
| 14 | This is < app name > app, in its < activity name > page, the input category is < input category >. | This is a NBA sport app, in its search the NBA team page, the input category is query category. |

Prompt generation rules

| Id | Sample of linguistic patterns/rules | Examples of linguistic patterns/rules |
| --- | --- | --- |
| 1 | < GCPtn > + < LCPtn > + < IWPtn > | This is a my movie app, in its search movie page, the input category is query category. This input is about your favorite movie in this year. Please search the movie, the movie is |
| 2 | < GCPtn > + [< LCPtn > + < IWPtn >]{n} | This is a money wallet app, in its personal income page, the input category is numeric category. This input is about your monthly income. Income is [MASK] dollar. This input is about your expenses. Expenses is [MASK] dollar. |
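To make prompt generation rule 1 concrete, here is a minimal Python sketch that fills pattern 14 (GCP), pattern 9 (LCP), and pattern 3 (IWP) and concatenates them. The `InputContext` class, its field names, and the helper functions are hypothetical names chosen for illustration; they are not taken from the QTypist source code.

```python
from dataclasses import dataclass


@dataclass
class InputContext:
    """Text extracted for one input widget (illustrative field names)."""
    app_name: str          # e.g. "a my movie"
    activity_name: str     # e.g. "search movie"
    input_category: str    # e.g. "query category"
    local_noun: str        # local-context noun phrase
    widget_verb_noun: str  # verb + noun taken from the widget
    widget_noun: str       # noun taken from the widget


def global_pattern(ctx: InputContext) -> str:
    # Pattern 14 (GCP) from the table.
    return (f"This is {ctx.app_name} app, in its {ctx.activity_name} page, "
            f"the input category is {ctx.input_category}.")


def local_pattern(ctx: InputContext) -> str:
    # Pattern 9 (LCP) from the table.
    return f"This input is about {ctx.local_noun}."


def widget_pattern(ctx: InputContext) -> str:
    # Pattern 3 (IWP) from the table; the prompt ends open so the model completes it.
    return f"Please {ctx.widget_verb_noun}, the {ctx.widget_noun} is"


def build_prompt(ctx: InputContext) -> str:
    # Rule 1: <GCP> + <LCP> + <IWP>, joined with single spaces.
    return " ".join([global_pattern(ctx), local_pattern(ctx), widget_pattern(ctx)])


if __name__ == "__main__":
    ctx = InputContext(
        app_name="a my movie",
        activity_name="search movie",
        input_category="query category",
        local_noun="your favorite movie in this year",
        widget_verb_noun="search the movie",
        widget_noun="movie",
    )
    print(build_prompt(ctx))
```

Rule 2 could be sketched the same way by repeating the local and widget patterns once per input widget on the page.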

The code and the fine-tuning data can be obtained from our source code.

Fine-tune your own GPT-3 model as follows to obtain the same effect.

1. We recommend using the OpenAI command-line interface (CLI). To install it, run:

pip install --upgrade openai

(The following instructions work for version 0.9.4 and up. Additionally, the OpenAI CLI requires Python 3.)

2. Set your OPENAI_API_KEY environment variable by adding the following line to your shell initialization script (e.g. .bashrc, .zshrc, etc.) or by running it on the command line before the fine-tuning command:

export OPENAI_API_KEY="<OPENAI_API_KEY>"

3. Prepare training data

Training data is how you teach GPT-3 what you'd like it to say. Your data must be a JSONL document, where each line is a prompt-completion pair corresponding to one training example. You can use the CLI data preparation tool to easily convert your data into this file format.

{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
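As a rough illustration of the JSONL format above, the following Python sketch writes prompt-completion pairs to a file, one JSON object per line, and reads them back. The function names and the example completion are made up for illustration; only the `prompt` and `completion` keys come from the format shown above.

```python
import json


def write_jsonl(pairs, path):
    """Write (prompt, completion) pairs as one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for prompt, completion in pairs:
            record = {"prompt": prompt, "completion": completion}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read the file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


if __name__ == "__main__":
    # The completion below is an invented placeholder, not real training data.
    pairs = [("Please input game name, the game name is", " Example Game")]
    write_jsonl(pairs, "train.jsonl")
    print(read_jsonl("train.jsonl"))
```

A file produced this way can then be passed to the data preparation tool as `<LOCAL_FILE>`.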

4. CLI data preparation tool

OpenAI provides a tool that validates your data, gives suggestions, and reformats it:

openai tools fine_tunes.prepare_data -f <LOCAL_FILE>

5. Create a fine-tuned model

openai api fine_tunes.create -t <TRAIN_FILE_ID_OR_PATH> -m curie

6. After you've started a fine-tune job, it may take some time to complete. Your job may be queued behind other jobs on OpenAI's system, and training the model can take minutes or hours depending on the model and dataset size. If the event stream is interrupted for any reason, you can resume it by running:

openai api fine_tunes.follow -i <YOUR_FINE_TUNE_JOB_ID>

7. Use a fine-tuned model

With the CLI:

openai api completions.create -m <FINE_TUNED_MODEL> -p <YOUR_PROMPT>

With cURL:

curl https://api.openai.com/v1/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"prompt": YOUR_PROMPT, "model": FINE_TUNED_MODEL}'

With Python:

import openai
openai.Completion.create(
    model=FINE_TUNED_MODEL,
    prompt=YOUR_PROMPT)
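The cURL call above can also be assembled by hand. The sketch below only builds the headers and JSON body for the completions endpoint, using the Python standard library, and does not send anything. The function name `build_completion_request` and the default values for `max_tokens` and `temperature` are assumptions for illustration; QTypist's actual settings are not stated here.

```python
import json
import os


def build_completion_request(model, prompt, max_tokens=16, temperature=0.0):
    """Return (headers, body) for POST https://api.openai.com/v1/completions.

    The key is read from the OPENAI_API_KEY environment variable set in step 2.
    """
    api_key = os.environ.get("OPENAI_API_KEY", "")
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model,              # name of your fine-tuned model
        "prompt": prompt,            # prompt built from the linguistic patterns
        "max_tokens": max_tokens,    # assumed default, tune as needed
        "temperature": temperature,  # assumed default, tune as needed
    })
    return headers, body


if __name__ == "__main__":
    headers, body = build_completion_request(
        "<FINE_TUNED_MODEL>", "Please search the food, the food is")
    print(body)
```

Keeping the key in the environment rather than in the source file avoids committing it to the repository by accident.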

Because the GPT-3 API key is tied to personal information, we will release access to our fine-tuned model after the double-blind review.

The key words of our approach are shown in the table.

The training data construction algorithm is as follows:

\begin{algorithm}
\caption{Heuristic-based training data construction}
\KwIn{$vhief$: associated view hierarchy file; $category$: category of input widget;}
\KwOut{$con$: constructed input content;}

Traverse $vhief$ to obtain EditText / TextView / Spinner / ListView\;
Obtain the coordinates of EditText / TextView / Spinner / ListView ($x_1$,$y_1$), ($x_2$,$y_2$)\;
\textcolor{gray}{// coordinates of the upper-left and lower-right corners}
Obtain text of TextView ($text$)\;
Obtain hint-text of EditText ($htext$)\;

\If{$category$ == `filled content'}{
  \If{`search, add, input, enter' not in $htext$}{
    $con \gets htext$\;
  }
}

\If{$category$ == `search list'}{
  \If{`search, input' in $htext$ and $Ey_2 > Ly_1$}{
    obtain items of listview $item$\;
    $title \gets$ gettitle($item$)\;
    $con \gets title$\;
  }
}

\If{$category$ == `popup menu'}{
  \If{$Tx_2 < Sx_1$}{
    obtain items of Spinner $item$\;
    $text \gets$ getSpinnertext($item$)\;
    $con \gets text$\;
  }
}

\If{$category$ == `setting content'}{
  \If{`setting' in $activityname$}{
    obtain items of listview $item$\;
    $T1, T2 \gets$ getTextView($item$)\;
    \If{$T1x_2 < T2x_1$ and $T1y_1 = T2y_1$}{
      $con \gets title$\;
    }
  }
}

\Return{$con$}\;
\end{algorithm}
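The first three branches of the algorithm can be sketched in Python as follows. This is only an illustrative reading, not the released implementation. It rests on these assumptions: the `Widget` class and `construct_content` function are hypothetical names; E, L, T, and S are taken to mean the EditText, ListView, TextView, and Spinner; keyword lists are matched as "any of"; and the result is returned as a list with one entry per item. The `setting content` branch is left out because the algorithm assigns $title$ there without defining it.

```python
from dataclasses import dataclass, field
from typing import List, Optional

ACTION_WORDS = ("search", "add", "input", "enter")


@dataclass
class Widget:
    """A view-hierarchy node; (x1, y1) is the upper-left corner, (x2, y2) the lower-right."""
    x1: int
    y1: int
    x2: int
    y2: int
    text: str = ""       # text of a TextView
    hint_text: str = ""  # hint-text of an EditText
    items: List[str] = field(default_factory=list)  # item texts of a ListView or Spinner


def construct_content(category: str,
                      edit_text: Widget,
                      list_view: Optional[Widget] = None,
                      spinner: Optional[Widget] = None,
                      text_view: Optional[Widget] = None) -> List[str]:
    """Return the constructed input contents, or an empty list if no rule fires."""
    htext = edit_text.hint_text.lower()

    if category == "filled content":
        # Use the hint as content only if it contains none of the action words.
        if not any(word in htext for word in ACTION_WORDS):
            return [edit_text.hint_text]

    elif category == "search list" and list_view is not None:
        # Coordinate condition copied from the algorithm: E.y2 > L.y1.
        if any(w in htext for w in ("search", "input")) and edit_text.y2 > list_view.y1:
            return list(list_view.items)

    elif category == "popup menu" and spinner is not None and text_view is not None:
        # The TextView ends before the Spinner starts: T.x2 < S.x1.
        if text_view.x2 < spinner.x1:
            return list(spinner.items)

    return []


if __name__ == "__main__":
    nickname = Widget(0, 0, 100, 50, hint_text="Your nickname")
    print(construct_content("filled content", nickname))
```

Passing the widgets explicitly keeps the sketch independent of any particular view hierarchy parser.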

Pilot Study Dataset (./motivation/)

We provide the pilot study dataset used in the motivation section in ./motivation/.

The dataset contains 7,000+ screenshots with text inputs, taken from Rico.

Because GitHub storage is limited to 2 GB (we use 90% of it), we provide only the screenshots; the corresponding view hierarchy files can be downloaded from Rico.

After the double-blind review, we will also upload all of them to our Google Drive.

Experiment Dataset (./experiment/)

We provide the experimental dataset used for the effectiveness evaluation and the usefulness evaluation in ./experiment/. The first part is the APKs from the effectiveness evaluation, which covers 106 apps; the app information is shown in the table.

Because GitHub storage is limited to 2 GB, we provide the first 85 APKs; the rest can be downloaded from Google Play using the information in the table.

After the double-blind review, we will upload all of them to our Google Drive.
