
vcNLP's tests are grouped into different test types/categories, and the available tests vary by task:

  • security: cyber-security / adversarial attacks (imperceptible/invisible perturbation tests); a code sketch of these perturbations follows the homoglyph list below

    • insert zero width space: insert the Unicode character U+200B at a random position (between characters) in each word of the sentence.

    • insert zero width joiner: insert the Unicode character U+200D at a random position (between characters) in each word of the sentence.

    • insert zero width non-joiner: insert the Unicode character U+200C at a random position (between characters) in each word of the sentence.

    • replace with homoglyphs: A homoglyph is a character that is visually similar to another. For each of the following 12 homoglyph pairs, we replace every occurrence of the first character, in every word of the sentence, with its homoglyph:

      1. a, U+0430 (Cyrillic Small Letter A)

      2. c, U+0441 (Cyrillic Small Letter Es)

      3. e, U+0435 (Cyrillic Small Letter Ie)

      4. h, U+04BB (Cyrillic Small Letter Shha)

      5. i, U+0456 (Cyrillic Small Letter Byelorussian-Ukrainian I)

      6. j, U+03F3 (Greek Letter Yot)

      7. n, U+0578 (Armenian Small Letter Vo)

      8. o, U+043E (Cyrillic Small Letter O)

      9. p, U+0440 (Cyrillic Small Letter Er)

      10. q, U+0566 (Armenian Small Letter Za)

      11. x, U+0445 (Cyrillic Small Letter Ha)

      12. y, U+0443 (Cyrillic Small Letter U)
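
    vcNLP's own implementation is not shown on this page; the following is a minimal Python sketch of how these imperceptible perturbations could be generated. The character map mirrors the 12 homoglyph pairs listed above, and the function names are illustrative.

      import random

      ZERO_WIDTH = {
          "space": "\u200b",       # zero width space (U+200B)
          "joiner": "\u200d",      # zero width joiner (U+200D)
          "non_joiner": "\u200c",  # zero width non-joiner (U+200C)
      }

      # The 12 homoglyph pairs listed above (Latin character -> confusable)
      HOMOGLYPHS = {
          "a": "\u0430", "c": "\u0441", "e": "\u0435", "h": "\u04bb",
          "i": "\u0456", "j": "\u03f3", "n": "\u0578", "o": "\u043e",
          "p": "\u0440", "q": "\u0566", "x": "\u0445", "y": "\u0443",
      }

      def insert_zero_width(sentence, kind="space"):
          """Insert the chosen zero-width character at a random position inside each word."""
          zw = ZERO_WIDTH[kind]
          out = []
          for word in sentence.split():
              if len(word) > 1:
                  pos = random.randint(1, len(word) - 1)  # between characters
                  word = word[:pos] + zw + word[pos:]
              out.append(word)
          return " ".join(out)

      def replace_with_homoglyphs(sentence):
          """Replace every mapped character with its visually similar confusable."""
          return "".join(HOMOGLYPHS.get(ch, ch) for ch in sentence)

      print(insert_zero_width("The quick brown fox"))
      print(replace_with_homoglyphs("The quick brown fox"))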

  • mutations/errors: perturbations to the original sentence (see the code sketch after this list)

    • paraphrase the sentence: We created a paraphrase generator using the T5-base pre-trained transformer model from the Hugging Face model hub, and fine-tuned it on the ParaNMT-50 dataset.

    • punctuation: strip punctuation and/or add a trailing period (".").

    • typos: add one typo to the input by swapping two adjacent characters.

    • 2 typos: add two typos to the input by swapping two adjacent characters, twice.

    • contractions: contract or expand contractions, e.g. What is -> What's.

    • add random sentences to context: This pertains mainly to the question-answering task.
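
    The exact mutation code is not documented here; below is a minimal Python sketch, with illustrative function names, of the typo, punctuation, and contraction perturbations described above.

      import random
      import re
      import string

      def add_typos(sentence, count=1):
          """Introduce `count` typos, each by swapping two adjacent characters."""
          chars = list(sentence)
          for _ in range(count):
              i = random.randint(0, len(chars) - 2)
              chars[i], chars[i + 1] = chars[i + 1], chars[i]
          return "".join(chars)

      def strip_punctuation(sentence):
          """Remove all punctuation characters from the sentence."""
          return sentence.translate(str.maketrans("", "", string.punctuation))

      def expand_contractions(sentence):
          """Expand a few common contractions (illustrative, not exhaustive)."""
          rules = {r"What's": "What is", r"can't": "cannot", r"won't": "will not"}
          for pattern, repl in rules.items():
              sentence = re.sub(pattern, repl, sentence)
          return sentence

      print(add_typos("Where is the library?", count=2))
      print(strip_punctuation("Hello, world!"))
      print(expand_contractions("What's the capital of France?"))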

  • entityChange: change entities in the sentence (see the code sketch after this list)

    • change numbers: replace integers with random integers within ±20% of the original value.

    • change locations: replace city or country names with other cities or countries.

    • change names: replace names with other common names.
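
    The entity-change tests can be pictured with a short sketch like the one below; the ±20% rule for numbers follows the description above, while the name and location pools are illustrative stand-ins for whatever lists vcNLP actually uses.

      import random
      import re

      # Illustrative replacement pools; location and name changes work analogously to
      # number changes, by substituting an entry from a list.
      CITIES = ["Paris", "Lagos", "Tokyo", "Lima"]
      NAMES = ["John", "Mary", "Ahmed", "Priya"]

      def change_numbers(sentence, radius=0.2):
          """Replace each integer with a random integer within ±radius of its value."""
          def repl(match):
              n = int(match.group())
              low, high = int(n * (1 - radius)), int(n * (1 + radius))
              return str(random.randint(min(low, high), max(low, high)))
          return re.sub(r"\d+", repl, sentence)

      print(change_numbers("The order of 120 units shipped in 14 days."))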

  • bias: create new sentences to test model fairness with respect to sensitive / protected categories (a template-expansion sketch follows this list)

    • racial bias: sentences with different races.​

    • sexual orientation bias: sentences with different sexual orientations.

    • religion bias: sentences with different religions.

    • nationality bias: sentences with different nationalities.

    • gender bias: male/female failure rates should be similar across different professions.

      • Example: Anna is not a secretary, Ralph is. Who is a secretary?​
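
    How vcNLP expands its bias templates is not spelled out on this page; the sketch below assumes a CheckList-style fill-in approach and shows how gender-bias cases like the example above could be produced. The name and profession lists are illustrative.

      from itertools import product

      TEMPLATE = "{a} is not a {profession}, {b} is. Who is a {profession}?"
      FEMALE_NAMES = ["Anna", "Maria"]
      MALE_NAMES = ["Ralph", "David"]
      PROFESSIONS = ["secretary", "engineer", "nurse"]

      def gender_bias_cases():
          """Yield (question, expected_answer, answer_gender) triples; failure rates
          are then compared between the male and female groups per profession."""
          for profession in PROFESSIONS:
              for f, m in product(FEMALE_NAMES, MALE_NAMES):
                  yield TEMPLATE.format(a=f, b=m, profession=profession), m, "M"
                  yield TEMPLATE.format(a=m, b=f, profession=profession), f, "F"

      for question, answer, group in list(gender_bias_cases())[:4]:
          print(group, "|", question, "->", answer)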

  • mutations2: more perturbations to the original sentence (see the code sketch after this list)

    • upperCase sentence: convert the entire sentence to uppercase characters.

    • lowerCase sentence: convert the entire sentence to lowercase characters.

    • ampersand replaces and: replace 'and' with '&'.

    • swap qwerty adjacent keys: replace characters with keys adjacent to them on the QWERTY keyboard.

    • swap WordNet synonym: replace each word with its WordNet synonym if one exists.
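
    A minimal sketch of two of these perturbations is shown below; the QWERTY adjacency map is deliberately partial, and the synonym lookup uses NLTK's WordNet interface, which may differ from vcNLP's actual implementation.

      import random
      from nltk.corpus import wordnet  # requires a prior nltk.download("wordnet")

      # Partial QWERTY adjacency map (illustrative, not complete)
      QWERTY_NEIGHBORS = {
          "a": "qwsz", "s": "awedxz", "e": "wsdr", "o": "iklp", "t": "rfgy",
      }

      def qwerty_typo(sentence):
          """Replace one random character with a key adjacent to it on the QWERTY keyboard."""
          chars = list(sentence)
          candidates = [i for i, c in enumerate(chars) if c.lower() in QWERTY_NEIGHBORS]
          if candidates:
              i = random.choice(candidates)
              chars[i] = random.choice(QWERTY_NEIGHBORS[chars[i].lower()])
          return "".join(chars)

      def wordnet_synonym(word):
          """Return a WordNet synonym for `word` if one exists, otherwise the word itself."""
          for synset in wordnet.synsets(word):
              for lemma in synset.lemma_names():
                  if lemma.lower() != word.lower():
                      return lemma.replace("_", " ")
          return word

      print(qwerty_typo("the weather is great"))
      print(wordnet_synonym("smart"))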

  • vocabulary: create new sentences which pose various kinds of vocabulary-related tests to the model

    • A is <COMP> than B. Who is more or less <COMP>? Here <COMP> is a comparative adjective such as better, older, smarter, etc.

      • Example: Arthur is stranger than Susan. Who is less strange?​

    • Intensifiers (very, super, extremely, etc.) and reducers (somewhat, kinda, etc.).

      • Example: Philip is excited about the project. Samuel is highly excited about the project. Who is most excited about the project?​

      • Example 2: Samuel is excited about the project. Philip is somewhat excited about the project. Who is least excited about the project?

  • taxonomy: create new sentences which pose taxonomy-related tests to the model (a generation sketch follows this list)

    • A is COMP than B. Who is antonym(COMP)? B​

      • Example: Matthew is poorer than Margaret. Who is richer?

    • Size, shape, age, color​

      • Example: There is an oval brown table in the room. What shape is the table?​

    • Profession vs. Nationality​

      • Example: Catherine is a Bangladeshi historian. What is Catherine's job?​

    • Animal vs. Vehicle​

      • Example: Daniel has a bull and a train. What animal does Daniel have?​

    • Animal vs. Vehicle v2​

      • Example: Jerry bought a snake. Charlotte bought a car. Who bought an animal?​

    • Synonyms​

      • Example: Elizabeth is very intelligent. Bill is very happy. Who is smart?​

    • A is more X than B. Who is more antonym(X)? B. Who is less X? B. Who is more X? A.​

      • Example: Anna is more passive than Steven. Who is less passive?​
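
    The generator behind these taxonomy templates is not shown in the docs; the sketch below assumes WordNet as the antonym source and illustrates how an "A is COMP than B. Who is antonym(COMP)?" case (expected answer: B) could be built. Names and word forms are illustrative.

      from nltk.corpus import wordnet  # requires a prior nltk.download("wordnet")

      def antonym(word):
          """Return a WordNet antonym for `word`, or None if no antonym is found."""
          for synset in wordnet.synsets(word):
              for lemma in synset.lemmas():
                  if lemma.antonyms():
                      return lemma.antonyms()[0].name()
          return None

      def comparative_case(name_a, name_b, comp, comp_antonym):
          """Build the context, the antonym question, and the expected answer (B)."""
          context = f"{name_a} is {comp} than {name_b}."
          question = f"Who is {comp_antonym}?"
          return context, question, name_b

      print(antonym("poor"))  # typically 'rich'
      print(comparative_case("Matthew", "Margaret", "poorer", "richer"))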

  • temporal: create new sentences that involve temporal variance, i.e. before/after​

    • There was a change in profession​

      • Example: Both Steve and Patricia were agents, but there was a change in Steve, who is now a waitress. Who is a waitress?

    • Understanding before/after --> first/last​

      • Example: Alexandra became an engineer before Alice did. Who became an engineer first?

  • negation: create sentences with different types of negation (differ based on task)​

    • Negation in context, may or may not be in question (question-answering task only)​

      • Example: Patricia is not an accountant. Ann is. Who is an accountant?​

    • Negation in question only (question-answering task only)​

      • Example: Roy is an agent. Ed is an artist. Who is not an artist?​

  • coreference resolution (coref): create sentences that test the ability of the model to correctly link all expressions (like pronouns or nouns) that refer to the same entity in a text.​

    • Basic coref, he / she​

      • Example: Harold and Melissa are friends. He is an author, and she is an executive. Who is an executive?​

    • Basic coref, his / her​

      • Example: Kathleen and Billy are friends. His mom is an activist. Whose mom is an activist?​

    • Understanding former / latter relations​

      • Example: Patrick and Simon are friends. The former is an economist. Who is an economist?​

  • semantic role labeling (SRL): also called shallow semantic parsing or slot-filling, SRL is the process of assigning labels to words or phrases in a sentence that indicate their semantic role, such as that of an agent, goal, or result. SRL tests differ based on task as indicated below.

    • Question-Answering Task:​

      • Agent / object distinction​

        • Example: Scott is accepted by Elizabeth. Who accepts?​

      • Agent / object distinction with 3 agents​

        • Example: Mike is trusted by Jim. Mike trusts Adam. Who trusts Mike?​

    • QQP Duplicate Detection Task:

      • Who do X think - Who is the ... according to X​

        • Example: If Bruce and Cynthia were married, would his family be happy? If Bruce and Cynthia were married, would Cynthia's family be happy?​

      • Order does not matter for comparison​

        • Example: 'Who do kids think is the top player in the world?', 'Who is the top player in the world according to kids?'

      • Order does not matter for symmetric relations​

        • Example: 'Are seals cheaper than bats?', 'What is cheaper, bats or seals?'​

      • Order does matter for asymmetric relations​

        • Example: 'Is Ben abusive to Sally?', 'Is Sally abusive to Ben?'​

      • Traditional SRL: active / passive swap

        • Example: 'Did Steven return the castle?', 'Was the castle returned by Steven?'

      • Traditional SRL: wrong active / passive swap

        • Example: 'Did Kathleen miss the school?', 'Was Kathleen missed by the school?'

      • Traditional SRL: active / passive swap with people

        • Example: 'Does Jean trust Martha?', 'Is Martha trusted by Jean?'

      • Traditional SRL: wrong active / passive swap with people

        • Example: 'Does Martin like Arthur?', 'Is Martin liked by Arthur?'

    • Sentiment Classification Task:

      • My opinion is what matters

        • Example: I had heard you were beautiful, I think you are lousy.

      • Q & A: yes

        • Example: Do I think that pilot is unhappy? Yes

      • Q & A: yes (neutral)

        • Example: Do I think that is a private customer service? Yes

      • Q & A: no

        • Example: Do I think it was a difficult aircraft? No

      • Q & A: no (neutral)

        • Example: Did I find the company? No

 
