vcNLP's tests are grouped into different test types/categories, and the available tests vary by task:

- security: cyber-security / adversarial attacks (imperceptible/invisible perturbation tests; see the sketch after this list)
  - insert zero width space: insert Unicode character U+200B in a random place (between characters) in each word of the sentence.
  - insert zero width joiner: insert Unicode character U+200D in a random place (between characters) in each word of the sentence.
  - insert zero width non-joiner: insert Unicode character U+200C in a random place (between characters) in each word of the sentence.
  - replace with homoglyphs: a homoglyph is a character that is visually similar to another. For the following 12 homoglyph pairs, every occurrence of the character in each word of the sentence is replaced with its homoglyph:
    - a, U+0430 (Cyrillic Small Letter A)
    - c, U+0441 (Cyrillic Small Letter Es)
    - e, U+0435 (Cyrillic Small Letter Ie)
    - h, U+04BB (Cyrillic Small Letter Shha)
    - i, U+0456 (Cyrillic Small Letter Byelorussian-Ukrainian I)
    - j, U+03F3 (Greek Letter Yot)
    - n, U+0578 (Armenian Small Letter Vo)
    - o, U+043E (Cyrillic Small Letter O)
    - p, U+0440 (Cyrillic Small Letter Er)
    - q, U+0566 (Armenian Small Letter Za)
    - x, U+0445 (Cyrillic Small Letter Ha)
    - y, U+0443 (Cyrillic Small Letter U)
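The following is a minimal Python sketch of how these imperceptible perturbations might be implemented. The function names, lookup tables, and word-splitting logic are illustrative assumptions, not vcNLP's actual code.

```python
# Minimal sketch of the "security" perturbations above. Names and tables
# are illustrative assumptions, not vcNLP's actual implementation.
import random

ZERO_WIDTH = {
    "space": "\u200b",        # zero width space
    "joiner": "\u200d",       # zero width joiner
    "non-joiner": "\u200c",   # zero width non-joiner
}

HOMOGLYPHS = {
    "a": "\u0430", "c": "\u0441", "e": "\u0435", "h": "\u04bb",
    "i": "\u0456", "j": "\u03f3", "n": "\u0578", "o": "\u043e",
    "p": "\u0440", "q": "\u0566", "x": "\u0445", "y": "\u0443",
}

def insert_zero_width(sentence: str, char: str) -> str:
    """Insert `char` at a random position inside every word of the sentence."""
    perturbed = []
    for word in sentence.split():
        pos = random.randint(1, len(word) - 1) if len(word) > 1 else 1
        perturbed.append(word[:pos] + char + word[pos:])
    return " ".join(perturbed)

def replace_with_homoglyphs(sentence: str) -> str:
    """Replace each listed character with its visually similar homoglyph."""
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in sentence)

print(insert_zero_width("the quick brown fox", ZERO_WIDTH["space"]))
print(replace_with_homoglyphs("the quick brown fox"))
```

The perturbed sentences look identical to the originals when rendered, which is what makes these attacks hard to spot by eye.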
- mutations/errors: perturbations applied to the original sentence
  - paraphrase the sentence: we created a paraphrase generator using the T5-base pre-trained transformer model from the Huggingface model hub, trained on the ParaNMT-50 dataset (see the sketch after this list).
  - punctuation: strip punctuation and/or add a trailing ".".
  - typos: add one typo to the input by swapping two adjacent characters.
  - 2 typos: add two typos to the input by swapping two adjacent characters twice.
  - contractions: contract or expand contractions, e.g. "What is" -> "What's".
  - add random sentences to context: pertains mainly to the question-answering task.
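As a rough illustration of the paraphrase perturbation, the snippet below shows how a T5-based paraphrase generator could be loaded and queried through the Huggingface transformers library. The checkpoint path, the `paraphrase:` input prefix, and the decoding parameters are assumptions for illustration, not vcNLP's exact configuration.

```python
# Sketch: paraphrase generation with a T5 model fine-tuned for paraphrasing.
# The checkpoint path and decoding parameters are illustrative assumptions.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

MODEL_PATH = "./t5-base-paraphraser"  # hypothetical fine-tuned checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_PATH)

def paraphrase(sentence: str, n: int = 3) -> list:
    """Return up to n paraphrases of the input sentence."""
    inputs = tokenizer("paraphrase: " + sentence, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        max_length=64,
        num_beams=10,
        num_return_sequences=n,
        early_stopping=True,
    )
    return [tokenizer.decode(o, skip_special_tokens=True) for o in outputs]

print(paraphrase("What is the capital of France?"))
```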
- entityChange: change entities in the sentence (see the sketch after this list)
  - change numbers: replace integers with random integers within a 20% radius of the original.
  - change locations: replace city or country names with other cities or countries.
  - change names: replace names with other common names.
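A small sketch of the "change numbers" perturbation, assuming integers are located with a regular expression and resampled within ±20% of their original value; the regex and rounding choices are illustrative.

```python
# Sketch: replace every integer in the sentence with a random integer
# within a 20% radius of the original. Regex and rounding are illustrative.
import random
import re

def change_numbers(sentence: str) -> str:
    def resample(match):
        n = int(match.group())
        delta = max(1, round(0.2 * abs(n)))
        return str(random.randint(n - delta, n + delta))
    return re.sub(r"\d+", resample, sentence)

print(change_numbers("The flight took 120 minutes and cost 45 dollars."))
```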
- bias: create new sentences to test model fairness in sensitive / protected categories (see the sketch after this list)
  - racial bias: sentences with different races.
  - sexual orientation bias: sentences with different sexual orientations.
  - religion bias: sentences with different religions.
  - nationality bias: sentences with different nationalities.
  - gender bias: male/female failure rates should be similar across different professions.
    - Example: Anna is not a secretary, Ralph is. Who is a secretary?
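A sketch of how the gender-bias check could be evaluated for the question-answering task: the same profession template is filled with male and female answer names, and the per-group failure rates are compared. The name lists, the template wording, and the `predict` stand-in are illustrative assumptions.

```python
# Sketch: compare male vs. female failure rates on a profession template.
# Names, professions, and the predict() stand-in are illustrative assumptions.
MALE = ["Ralph", "Steven", "Daniel"]
FEMALE = ["Anna", "Patricia", "Elizabeth"]
PROFESSIONS = ["secretary", "engineer", "doctor"]

def predict(context: str, question: str) -> str:
    """Placeholder for the QA model under test (naive heuristic stand-in)."""
    return context.split(",")[1].split(" is")[0].strip()

def failure_rate(answer_names, distractor_names) -> float:
    failures = total = 0
    for answer in answer_names:
        for other in distractor_names:
            for job in PROFESSIONS:
                context = f"{other} is not a {job}, {answer} is."
                question = f"Who is a {job}?"
                total += 1
                if answer.lower() not in predict(context, question).lower():
                    failures += 1
    return failures / total

# For a fair model, the two rates should be similar for each profession.
print("male answers:  ", failure_rate(MALE, FEMALE))
print("female answers:", failure_rate(FEMALE, MALE))
```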
- mutations2: more perturbations to the original sentence (see the sketch after this list)
  - upperCase sentence: convert the entire sentence to uppercase characters.
  - lowerCase sentence: convert the entire sentence to lowercase characters.
  - ampersand replaces and: replace 'and' with '&'.
  - swap qwerty adjacent keys: swap adjacent keys on the QWERTY keyboard.
  - swap WordNet synonym: replace each word with its WordNet synonym if one exists.
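A sketch of two of these perturbations, assuming NLTK's WordNet interface for the synonym swap and a small hand-written key-adjacency table for the QWERTY swap (one plausible reading of "swap adjacent keys"); the tables and function names are illustrative.

```python
# Sketch: QWERTY adjacent-key swap and WordNet synonym replacement.
# The adjacency table is a small illustrative subset, not a full keyboard map.
import random

import nltk
from nltk.corpus import wordnet

nltk.download("wordnet", quiet=True)

QWERTY_NEIGHBORS = {  # partial map for illustration
    "a": "s", "e": "r", "i": "o", "n": "m", "o": "p",
    "s": "d", "t": "y", "u": "i",
}

def swap_qwerty(word: str) -> str:
    """Replace one random character with an adjacent QWERTY key, if possible."""
    candidates = [i for i, ch in enumerate(word) if ch.lower() in QWERTY_NEIGHBORS]
    if not candidates:
        return word
    i = random.choice(candidates)
    return word[:i] + QWERTY_NEIGHBORS[word[i].lower()] + word[i + 1:]

def wordnet_synonym(word: str) -> str:
    """Replace the word with a WordNet synonym if one exists."""
    for synset in wordnet.synsets(word):
        for lemma in synset.lemma_names():
            if lemma.lower() != word.lower():
                return lemma.replace("_", " ")
    return word

sentence = "the quick brown fox jumps over the lazy dog"
print(" ".join(swap_qwerty(w) for w in sentence.split()))
print(" ".join(wordnet_synonym(w) for w in sentence.split()))
```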
- vocabulary: create new sentences that pose various kinds of vocabulary-related tests to the model (see the sketch after this list)
  - A is <COMP> than B. Who is more or less <COMP>?; where <COMP> is a comparative adjective such as better, older, smarter, etc.
    - Example: Arthur is stranger than Susan. Who is less strange?
  - Intensifiers (very, super, extremely, etc.) and reducers (somewhat, kinda, etc.).
    - Example: Philip is excited about the project. Samuel is highly excited about the project. Who is most excited about the project?
    - Example 2: Samuel is excited about the project. Philip is somewhat excited about the project. Who is least excited about the project?
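These vocabulary tests are typically generated from templates paired with known expected answers; the sketch below illustrates that idea for the comparative template, with illustrative names, adjectives, and question phrasings.

```python
# Sketch: generating comparative-vocabulary test cases from a template,
# each with its expected answer. Names and adjectives are illustrative.
from itertools import permutations

NAMES = ["Arthur", "Susan", "Philip", "Samuel"]
COMPARATIVES = [("stranger", "strange"), ("smarter", "smart"), ("poorer", "poor")]

cases = []
for a, b in permutations(NAMES, 2):
    for comparative, base in COMPARATIVES:
        context = f"{a} is {comparative} than {b}."
        cases.append((context, f"Who is more {base}?", a))  # expected answer: A
        cases.append((context, f"Who is less {base}?", b))  # expected answer: B

print(cases[0])
print(f"{len(cases)} test cases generated")
```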
- taxonomy: create new sentences which pose taxonomy-related tests to the model
  - A is <COMP> than B. Who is antonym(<COMP>)? B
    - Example: Matthew is poorer than Margaret. Who is richer?
  - Size, shape, age, color
    - Example: There is an oval brown table in the room. What shape is the table?
  - Profession vs. Nationality
    - Example: Catherine is a Bangladeshi historian. What is Catherine's job?
  - Animal vs. Vehicle
    - Example: Daniel has a bull and a train. What animal does Daniel have?
  - Animal vs. Vehicle v2
    - Example: Jerry bought a snake. Charlotte bought a car. Who bought an animal?
  - Synonyms
    - Example: Elizabeth is very intelligent. Bill is very happy. Who is smart?
  - A is more X than B. Who is more antonym(X)? B. Who is less X? B. Who is more X? A.
    - Example: Anna is more passive than Steven. Who is less passive?
- temporal: create new sentences that involve temporal variance, i.e. before/after
  - There was a change in profession
    - Example: Both Steve and Patricia were agents, but there was a change in Steve, who is now a waitress. Who is a waitress?
  - Understanding before/after --> first/last
    - Example: Alexandra became an engineer before Alice did. Who became an engineer first?
- negation: create sentences with different types of negation (differ based on task)
  - Negation in context, may or may not be in question (question-answering task only)
    - Example: Patricia is not an accountant. Ann is. Who is an accountant?
  - Negation in question only (question-answering task only)
    - Example: Roy is an agent. Ed is an artist. Who is not an artist?
- coreference resolution (coref): create sentences that test the ability of the model to correctly link all expressions (like pronouns or nouns) that refer to the same entity in a text.
  - Basic coref, he / she
    - Example: Harold and Melissa are friends. He is an author, and she is an executive. Who is an executive?
  - Basic coref, his / her
    - Example: Kathleen and Billy are friends. His mom is an activist. Whose mom is an activist?
  - Understanding former / latter relations
    - Example: Patrick and Simon are friends. The former is an economist. Who is an economist?
- semantic role labeling (SRL): also called shallow semantic parsing or slot-filling; the process of assigning labels to words or phrases in a sentence that indicate their semantic role, such as agent, goal, or result. SRL tests differ based on the task, as indicated below.
  - Question-Answering Task:
    - Agent / object distinction
      - Example: Scott is accepted by Elizabeth. Who accepts?
    - Agent / object distinction with 3 agents
      - Example: Mike is trusted by Jim. Mike trusts Adam. Who trusts Mike?
  - QQP Duplicate Detection Task (see the sketch after this task's tests):
    - Who do X think - Who is the ... according to X
      - Example: If Bruce and Cynthia were married, would his family be happy? If Bruce and Cynthia were married, would Cynthia's family be happy?
    - Order does not matter for comparison
      - Example: 'Who do kids think is the top player in the world?', 'Who is the top player in the world according to kids?'
    - Order does not matter for symmetric relations
      - Example: 'Are seals cheaper than bats?', 'What is cheaper, bats or seals?'
    - Order does matter for asymmetric relations
      - Example: 'Is Ben abusive to Sally?', 'Is Sally abusive to Ben?'
    - Traditional SRL: active / passive swap
      - Example: 'Did Steven return the castle?', 'Was the castle returned by Steven?'
    - Traditional SRL: wrong active / passive swap
      - Example: 'Did Kathleen miss the school?', 'Was Kathleen missed by the school?'
    - Traditional SRL: active / passive swap with people
      - Example: 'Does Jean trust Martha?', 'Is Martha trusted by Jean?'
    - Traditional SRL: wrong active / passive swap with people
      - Example: 'Does Martin like Arthur?', 'Is Martin liked by Arthur?'
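One way to read these QQP tests is as expectation checks over question pairs: symmetric rewrites and active/passive swaps should be labeled duplicates, while asymmetric swaps should not. The sketch below assumes that framing; `is_duplicate` is a trivial stand-in for whatever model is under test, and the expected labels simply restate the examples above.

```python
# Sketch: evaluating QQP duplicate-detection test pairs against expected labels.
# is_duplicate() is a placeholder for the model under test.
def is_duplicate(q1: str, q2: str) -> bool:
    """Trivial stand-in: exact string match instead of a real QQP model."""
    return q1.strip().lower() == q2.strip().lower()

TEST_PAIRS = [
    # (question 1, question 2, expected duplicate label)
    ("Are seals cheaper than bats?", "What is cheaper, bats or seals?", True),
    ("Did Steven return the castle?", "Was the castle returned by Steven?", True),
    ("Is Ben abusive to Sally?", "Is Sally abusive to Ben?", False),
]

failures = [(q1, q2) for q1, q2, expected in TEST_PAIRS
            if is_duplicate(q1, q2) != expected]
print(f"{len(failures)} of {len(TEST_PAIRS)} test pairs failed")
```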
  - Sentiment Classification Task:
    - My opinion is what matters
      - Example: I had heard you were beautiful, I think you are lousy.
    - Q & A: yes
      - Example: Do I think that pilot is unhappy? Yes
    - Q & A: yes (neutral)
      - Example: Do I think that is a private customer service? Yes
    - Q & A: no
      - Example: Do I think it was a difficult aircraft? No
    - Q & A: no (neutral)
      - Example: Did I find the company? No