vcNLP's tests are grouped into different test types/categories, and the available tests vary by task:

- security: cyber-security / adversarial attacks (imperceptible/invisible perturbation tests; see the sketch after this list)
  - insert zero width space: insert Unicode character U+200B in a random place (between characters) in each word of the sentence.
  - insert zero width joiner: insert Unicode character U+200D in a random place (between characters) in each word of the sentence.
  - insert zero width non-joiner: insert Unicode character U+200C in a random place (between characters) in each word of the sentence.
  - replace with homoglyphs: a homoglyph is a character that is visually similar to another. For the following 12 homoglyph pairs, every occurrence of the character in each word of the sentence is replaced with its homoglyph:
    - a, U+0430 (Cyrillic Small Letter A)
    - c, U+0441 (Cyrillic Small Letter Es)
    - e, U+0435 (Cyrillic Small Letter Ie)
    - h, U+04BB (Cyrillic Small Letter Shha)
    - i, U+0456 (Cyrillic Small Letter Byelorussian-Ukrainian I)
    - j, U+03F3 (Greek Letter Yot)
    - n, U+0578 (Armenian Small Letter Vo)
    - o, U+043E (Cyrillic Small Letter O)
    - p, U+0440 (Cyrillic Small Letter Er)
    - q, U+0566 (Armenian Small Letter Za)
    - x, U+0445 (Cyrillic Small Letter Ha)
    - y, U+0443 (Cyrillic Small Letter U)
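The following is a minimal Python sketch of how these imperceptible perturbations might be implemented. The function names, lookup tables, and word-splitting logic are illustrative assumptions, not vcNLP's actual code.

```python
# Minimal sketch of the "security" perturbations above. Names and tables
# are illustrative assumptions, not vcNLP's actual implementation.
import random

ZERO_WIDTH = {
    "space": "\u200b",        # zero width space
    "joiner": "\u200d",       # zero width joiner
    "non-joiner": "\u200c",   # zero width non-joiner
}

HOMOGLYPHS = {
    "a": "\u0430", "c": "\u0441", "e": "\u0435", "h": "\u04bb",
    "i": "\u0456", "j": "\u03f3", "n": "\u0578", "o": "\u043e",
    "p": "\u0440", "q": "\u0566", "x": "\u0445", "y": "\u0443",
}

def insert_zero_width(sentence: str, char: str) -> str:
    """Insert `char` at a random position inside every word of the sentence."""
    perturbed = []
    for word in sentence.split():
        pos = random.randint(1, len(word) - 1) if len(word) > 1 else 1
        perturbed.append(word[:pos] + char + word[pos:])
    return " ".join(perturbed)

def replace_with_homoglyphs(sentence: str) -> str:
    """Replace each listed character with its visually similar homoglyph."""
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in sentence)

print(insert_zero_width("the quick brown fox", ZERO_WIDTH["space"]))
print(replace_with_homoglyphs("the quick brown fox"))
```

The perturbed sentences look identical to the originals when rendered, which is what makes these attacks hard to spot by eye.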
- mutations/errors: perturbations applied to the original sentence
  - paraphrase the sentence: we created a paraphrase generator using the T5-base pre-trained transformer model from the Huggingface model hub, trained on the ParaNMT-50 dataset (see the sketch after this list).
  - punctuation: strip punctuation and/or add a trailing ".".
  - typos: add one typo to the input by swapping two adjacent characters.
  - 2 typos: add two typos to the input by swapping two adjacent characters twice.
  - contractions: contract or expand contractions, e.g. "What is" -> "What's".
  - add random sentences to context: pertains mainly to the question-answering task.
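As a rough illustration of the paraphrase perturbation, the snippet below shows how a T5-based paraphrase generator could be loaded and queried through the Huggingface transformers library. The checkpoint path, the `paraphrase:` input prefix, and the decoding parameters are assumptions for illustration, not vcNLP's exact configuration.

```python
# Sketch: paraphrase generation with a T5 model fine-tuned for paraphrasing.
# The checkpoint path and decoding parameters are illustrative assumptions.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

MODEL_PATH = "./t5-base-paraphraser"  # hypothetical fine-tuned checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_PATH)

def paraphrase(sentence: str, n: int = 3) -> list:
    """Return up to n paraphrases of the input sentence."""
    inputs = tokenizer("paraphrase: " + sentence, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        max_length=64,
        num_beams=10,
        num_return_sequences=n,
        early_stopping=True,
    )
    return [tokenizer.decode(o, skip_special_tokens=True) for o in outputs]

print(paraphrase("What is the capital of France?"))
```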
- entityChange: change entities in the sentence (see the sketch after this list)
  - change numbers: replace integers with random integers within a 20% radius of the original.
  - change locations: replace city or country names with other cities or countries.
  - change names: replace names with other common names.
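A small sketch of the "change numbers" perturbation, assuming integers are located with a regular expression and resampled within ±20% of their original value; the regex and rounding choices are illustrative.

```python
# Sketch: replace every integer in the sentence with a random integer
# within a 20% radius of the original. Regex and rounding are illustrative.
import random
import re

def change_numbers(sentence: str) -> str:
    def resample(match):
        n = int(match.group())
        delta = max(1, round(0.2 * abs(n)))
        return str(random.randint(n - delta, n + delta))
    return re.sub(r"\d+", resample, sentence)

print(change_numbers("The flight took 120 minutes and cost 45 dollars."))
```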
- bias: create new sentences to test model fairness in sensitive / protected categories (see the sketch after this list)
  - racial bias: sentences with different races.
  - sexual orientation bias: sentences with different sexual orientations.
  - religion bias: sentences with different religions.
  - nationality bias: sentences with different nationalities.
  - gender bias: male/female failure rates should be similar across different professions.
    - Example: Anna is not a secretary, Ralph is. Who is a secretary?
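A sketch of how the gender-bias check could be evaluated for the question-answering task: the same profession template is filled with male and female answer names, and the per-group failure rates are compared. The name lists, the template wording, and the `predict` stand-in are illustrative assumptions.

```python
# Sketch: compare male vs. female failure rates on a profession template.
# Names, professions, and the predict() stand-in are illustrative assumptions.
MALE = ["Ralph", "Steven", "Daniel"]
FEMALE = ["Anna", "Patricia", "Elizabeth"]
PROFESSIONS = ["secretary", "engineer", "doctor"]

def predict(context: str, question: str) -> str:
    """Placeholder for the QA model under test (naive heuristic stand-in)."""
    return context.split(",")[1].split(" is")[0].strip()

def failure_rate(answer_names, distractor_names) -> float:
    failures = total = 0
    for answer in answer_names:
        for other in distractor_names:
            for job in PROFESSIONS:
                context = f"{other} is not a {job}, {answer} is."
                question = f"Who is a {job}?"
                total += 1
                if answer.lower() not in predict(context, question).lower():
                    failures += 1
    return failures / total

# For a fair model, the two rates should be similar for each profession.
print("male answers:  ", failure_rate(MALE, FEMALE))
print("female answers:", failure_rate(FEMALE, MALE))
```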
- mutations2: more perturbations to the original sentence (see the sketch after this list)
  - upperCase sentence: convert the entire sentence to uppercase characters.
  - lowerCase sentence: convert the entire sentence to lowercase characters.
  - ampersand replaces and: replace 'and' with '&'.
  - swap qwerty adjacent keys: swap adjacent keys on the QWERTY keyboard.
  - swap WordNet synonym: replace each word with its WordNet synonym if one exists.
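A sketch of two of these perturbations, assuming NLTK's WordNet interface for the synonym swap and a small hand-written key-adjacency table for the QWERTY swap (one plausible reading of "swap adjacent keys"); the tables and function names are illustrative.

```python
# Sketch: QWERTY adjacent-key swap and WordNet synonym replacement.
# The adjacency table is a small illustrative subset, not a full keyboard map.
import random

import nltk
from nltk.corpus import wordnet

nltk.download("wordnet", quiet=True)

QWERTY_NEIGHBORS = {  # partial map for illustration
    "a": "s", "e": "r", "i": "o", "n": "m", "o": "p",
    "s": "d", "t": "y", "u": "i",
}

def swap_qwerty(word: str) -> str:
    """Replace one random character with an adjacent QWERTY key, if possible."""
    candidates = [i for i, ch in enumerate(word) if ch.lower() in QWERTY_NEIGHBORS]
    if not candidates:
        return word
    i = random.choice(candidates)
    return word[:i] + QWERTY_NEIGHBORS[word[i].lower()] + word[i + 1:]

def wordnet_synonym(word: str) -> str:
    """Replace the word with a WordNet synonym if one exists."""
    for synset in wordnet.synsets(word):
        for lemma in synset.lemma_names():
            if lemma.lower() != word.lower():
                return lemma.replace("_", " ")
    return word

sentence = "the quick brown fox jumps over the lazy dog"
print(" ".join(swap_qwerty(w) for w in sentence.split()))
print(" ".join(wordnet_synonym(w) for w in sentence.split()))
```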
- vocabulary: create new sentences that pose various kinds of vocabulary-related tests to the model (see the sketch after this list)
  - A is <COMP> than B. Who is more or less <COMP>?; where <COMP> is a comparative adjective such as better, older, smarter, etc.
    - Example: Arthur is stranger than Susan. Who is less strange?
  - Intensifiers (very, super, extremely, etc.) and reducers (somewhat, kinda, etc.).
    - Example: Philip is excited about the project. Samuel is highly excited about the project. Who is most excited about the project?
    - Example 2: Samuel is excited about the project. Philip is somewhat excited about the project. Who is least excited about the project?
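These vocabulary tests are typically generated from templates paired with known expected answers; the sketch below illustrates that idea for the comparative template, with illustrative names, adjectives, and question phrasings.

```python
# Sketch: generating comparative-vocabulary test cases from a template,
# each with its expected answer. Names and adjectives are illustrative.
from itertools import permutations

NAMES = ["Arthur", "Susan", "Philip", "Samuel"]
COMPARATIVES = [("stranger", "strange"), ("smarter", "smart"), ("poorer", "poor")]

cases = []
for a, b in permutations(NAMES, 2):
    for comparative, base in COMPARATIVES:
        context = f"{a} is {comparative} than {b}."
        cases.append((context, f"Who is more {base}?", a))  # expected answer: A
        cases.append((context, f"Who is less {base}?", b))  # expected answer: B

print(cases[0])
print(f"{len(cases)} test cases generated")
```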
- taxonomy: create new sentences which pose taxonomy-related tests to the model
  - A is <COMP> than B. Who is antonym(<COMP>)? B
    - Example: Matthew is poorer than Margaret. Who is richer?
  - Size, shape, age, color
    - Example: There is an oval brown table in the room. What shape is the table?
  - Profession vs. Nationality
    - Example: Catherine is a Bangladeshi historian. What is Catherine's job?
  - Animal vs. Vehicle
    - Example: Daniel has a bull and a train. What animal does Daniel have?
  - Animal vs. Vehicle v2
    - Example: Jerry bought a snake. Charlotte bought a car. Who bought an animal?
  - Synonyms
    - Example: Elizabeth is very intelligent. Bill is very happy. Who is smart?
  - A is more X than B. Who is more antonym(X)? B. Who is less X? B. Who is more X? A.
    - Example: Anna is more passive than Steven. Who is less passive?
- temporal: create new sentences that involve temporal variance, i.e. before/after
  - There was a change in profession
    - Example: Both Steve and Patricia were agents, but there was a change in Steve, who is now a waitress. Who is a waitress?
  - Understanding before/after --> first/last
    - Example: Alexandra became an engineer before Alice did. Who became an engineer first?
- negation: create sentences with different types of negation (differ based on task)
  - Negation in context, may or may not be in question (question-answering task only)
    - Example: Patricia is not an accountant. Ann is. Who is an accountant?
  - Negation in question only (question-answering task only)
    - Example: Roy is an agent. Ed is an artist. Who is not an artist?
- coreference resolution (coref): create sentences that test the ability of the model to correctly link all expressions (like pronouns or nouns) that refer to the same entity in a text.
  - Basic coref, he / she
    - Example: Harold and Melissa are friends. He is an author, and she is an executive. Who is an executive?
  - Basic coref, his / her
    - Example: Kathleen and Billy are friends. His mom is an activist. Whose mom is an activist?
  - Understanding former / latter relations
    - Example: Patrick and Simon are friends. The former is an economist. Who is an economist?
- semantic role labeling (SRL): also called shallow semantic parsing or slot-filling; the process of assigning labels to words or phrases in a sentence that indicate their semantic role, such as agent, goal, or result. SRL tests differ based on the task, as indicated below.
  - Question-Answering Task:
    - Agent / object distinction
      - Example: Scott is accepted by Elizabeth. Who accepts?
    - Agent / object distinction with 3 agents
      - Example: Mike is trusted by Jim. Mike trusts Adam. Who trusts Mike?
  - QQP Duplicate Detection Task (see the sketch after this task's tests):
    - Who do X think - Who is the ... according to X
      - Example: If Bruce and Cynthia were married, would his family be happy? If Bruce and Cynthia were married, would Cynthia's family be happy?
    - Order does not matter for comparison
      - Example: 'Who do kids think is the top player in the world?', 'Who is the top player in the world according to kids?'
    - Order does not matter for symmetric relations
      - Example: 'Are seals cheaper than bats?', 'What is cheaper, bats or seals?'
    - Order does matter for asymmetric relations
      - Example: 'Is Ben abusive to Sally?', 'Is Sally abusive to Ben?'
    - Traditional SRL: active / passive swap
      - Example: 'Did Steven return the castle?', 'Was the castle returned by Steven?'
    - Traditional SRL: wrong active / passive swap
      - Example: 'Did Kathleen miss the school?', 'Was Kathleen missed by the school?'
    - Traditional SRL: active / passive swap with people
      - Example: 'Does Jean trust Martha?', 'Is Martha trusted by Jean?'
    - Traditional SRL: wrong active / passive swap with people
      - Example: 'Does Martin like Arthur?', 'Is Martin liked by Arthur?'
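One way to read these QQP tests is as expectation checks over question pairs: symmetric rewrites and active/passive swaps should be labeled duplicates, while asymmetric swaps should not. The sketch below assumes that framing; `is_duplicate` is a trivial stand-in for whatever model is under test, and the expected labels simply restate the examples above.

```python
# Sketch: evaluating QQP duplicate-detection test pairs against expected labels.
# is_duplicate() is a placeholder for the model under test.
def is_duplicate(q1: str, q2: str) -> bool:
    """Trivial stand-in: exact string match instead of a real QQP model."""
    return q1.strip().lower() == q2.strip().lower()

TEST_PAIRS = [
    # (question 1, question 2, expected duplicate label)
    ("Are seals cheaper than bats?", "What is cheaper, bats or seals?", True),
    ("Did Steven return the castle?", "Was the castle returned by Steven?", True),
    ("Is Ben abusive to Sally?", "Is Sally abusive to Ben?", False),
]

failures = [(q1, q2) for q1, q2, expected in TEST_PAIRS
            if is_duplicate(q1, q2) != expected]
print(f"{len(failures)} of {len(TEST_PAIRS)} test pairs failed")
```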
  - Sentiment Classification Task:
    - My opinion is what matters
      - Example: I had heard you were beautiful, I think you are lousy.
    - Q & A: yes
      - Example: Do I think that pilot is unhappy? Yes
    - Q & A: yes (neutral)
      - Example: Do I think that is a private customer service? Yes
    - Q & A: no
      - Example: Do I think it was a difficult aircraft? No
    - Q & A: no (neutral)
      - Example: Did I find the company? No