Valid N-gram & POS Service Documentation

This is an API service that extracts valid n-grams (of size 1, 2, 3, or 4) from a set of words or sentences. The service can also assign the most frequent POS (part-of-speech) tag to each n-gram extracted from the given input.

This service can be useful for:

Valid N-gram Extraction
POS Assignment
Verification
Filtering Sentences
Checking Validity of Words

Languages Supported

JavaScript
Python
PHP

Why Python?

N-gram Extraction and Verification

The user can provide six input parameters:

content
content_type
ngram_n
delimeter
verify_for
ngram_n_max

{
    "content" : "animalia is a book;boy",
    "delimeter": ";",
    "content_type": "ngram",
    "ngram_n": 2,
    "ngram_n_max": 3,
    "verify_for" : "ngram"
}

The main service url : https://ngrampos.vipresearch.ca/ngram_pos/service/word_service/

content : The sentence, single word, or group of words to process. Multiple items can be separated by a delimiter, which is optional.

content_type : Two options are available, "ngram" and "pos".

delimeter : (Optional) The separator between the sentences or words. If it is not specified, a space acts as the delimiter.

verify_for : (Optional) If "ngram" is selected, the results cover the n-grams; if "pos" is selected, the results cover the POS tags and their details. If this argument is not provided, the system returns results for both n-grams and POS.

ngram_n : The specific n-gram size to extract from the sentences (1, 2, 3, or 4).

ngram_n_max : All n-grams of size 1 up to the given value are extracted and processed.
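
The parameter rules above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the service: the helper name build_payload and its validation rules are assumptions drawn from the descriptions above (in particular, limiting ngram_n_max to 1 through 4 is assumed, and the service itself may accept or reject other values). The key delimeter is spelled as it appears in the parameter list.

```python
import json

ALLOWED_TYPES = {"ngram", "pos"}   # values documented for content_type and verify_for
ALLOWED_N = {1, 2, 3, 4}           # n-gram sizes documented for ngram_n


def build_payload(content, content_type="ngram", ngram_n=None,
                  ngram_n_max=None, delimeter=None, verify_for=None):
    """Hypothetical helper: assemble the JSON body described above."""
    if content_type not in ALLOWED_TYPES:
        raise ValueError("content_type must be 'ngram' or 'pos'")
    if verify_for is not None and verify_for not in ALLOWED_TYPES:
        raise ValueError("verify_for must be 'ngram' or 'pos'")
    # Assumption: ngram_n_max follows the same 1..4 range as ngram_n.
    for name, value in (("ngram_n", ngram_n), ("ngram_n_max", ngram_n_max)):
        if value is not None and value not in ALLOWED_N:
            raise ValueError(f"{name} must be one of 1, 2, 3, 4")

    payload = {"content": content, "content_type": content_type}
    # Optional keys are only sent when the caller supplies them.
    optional = {"ngram_n": ngram_n, "ngram_n_max": ngram_n_max,
                "delimeter": delimeter, "verify_for": verify_for}
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


print(json.dumps(build_payload("animalia is a book;boy", delimeter=";",
                               ngram_n_max=3, verify_for="ngram"), indent=4))
```

Leaving out the optional keys, rather than sending empty values, lets the service apply its own defaults (a space as the delimiter, results for both n-grams and POS).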

[
    {
        "ngram_asked": {
            "amount": 3,
            "valid": {
                "animalia": [
                    {
                        "valid_ngram": "True",
                        "pos": "NNS",
                        "ngram": 1
                    }
                ],
                "book": [
                    {
                        "valid_ngram": "True",
                        "pos": "NN",
                        "ngram": 1
                    }
                ],
                "boy": [
                    {
                        "valid_ngram": "True",
                        "pos": "NN",
                        "ngram": 1
                    }
                ]
            },
            "validity": ["True","True","True"],
            "valid_ngram": ["animalia","book","boy"],
            "valid_ngram_n": [1,1,1],
            "valid_ngram_pos": ["NN","NN"],
            "invalid_ngram_pos": ["NNS"],
            "invalid_ngram": [],
            "invalid_ngram_n": []
        }
    }
]
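
A response of this shape can be flattened into simple rows for further processing. The sketch below is illustrative only: the function name summarize_ngrams is hypothetical, and it assumes the response always follows the structure of the sample above (a list of objects with an "ngram_asked" key, and validity flags delivered as the strings "True" and "False").

```python
def summarize_ngrams(response):
    """Return (ngram, n, pos) tuples for every entry listed under "valid"."""
    rows = []
    for item in response:                       # the response is a list of objects
        asked = item.get("ngram_asked", {})
        for ngram, entries in asked.get("valid", {}).items():
            for entry in entries:
                # Flags arrive as strings in the sample, not JSON booleans.
                if entry.get("valid_ngram") == "True":
                    rows.append((ngram, entry["ngram"], entry["pos"]))
    return rows


# Trimmed copy of the sample response shown above.
sample = [{"ngram_asked": {"amount": 3, "valid": {
    "animalia": [{"valid_ngram": "True", "pos": "NNS", "ngram": 1}],
    "book": [{"valid_ngram": "True", "pos": "NN", "ngram": 1}],
    "boy": [{"valid_ngram": "True", "pos": "NN", "ngram": 1}]}}}]

for row in summarize_ngrams(sample):
    print(row)
```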

verify_for : If it is set to "pos", we obtain the following results

[
  {
      "pos_asked": {
          "invalid": {
              "animalia": [
                  {
                      "valid_pos": "False",
                      "pos": "NNS",
                      "Full-form": "Noun(plural)",
                      "pos_frequency": null,
                      "ngram": 1
                  }
              ]
          },
          "valid": {
              "book": [
                  {
                      "valid_pos": "True",
                      "pos": "NN",
                      "Full-form": "Noun(singular)",
                      "pos_frequency": 51402,
                      "ngram": 1
                  }
              ],
              "boy": [
                  {
                      "valid_pos": "True",
                      "pos": "NN",
                      "Full-form": "Noun(singular)",
                      "pos_frequency": 51402,
                      "ngram": 1
                  }
              ]
          },
          "amount": 2,
          "valid_pos": ["NN"],
          "invalid_pos": ["NNS"]
      }
  }
]

Parts of Speech Validity

When content_type is set to "pos", the service returns information about the given part-of-speech tags.

The JSON result carries the following for each tag:

Abbreviation
Frequency
Validity

{
  "content" : "NN,JJ-IN",
  "delimeter": ",",
  "content_type": "pos"
}

The JSON response from the above API call looks like:

{
    "pos": {
        "amount": 2,
        "valid": {
            "NN": [
                {
                    "valid_pos": "True",
                    "Full-form": "Noun(singular)",
                    "pos_frequency": 51402
                }
            ]
        },
        "invalid": {
            "JJ-IN": [
                {
                    "valid_pos": "False",
                    "Full-form": "Adjective-Preposition",
                    "pos_frequency": null
                }
            ]
        },
        "validity": [
            "True",
            "False"
        ],
        "valid_pos": [
            "NN"
        ],
        "invalid_pos": [
            "JJ-IN"
        ]
    }
}

The response shows the validity of each POS tag requested by the user, which can help in checking grammatical correctness.
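
Because the sample response reports validity as the strings "True" and "False" rather than JSON booleans, client code may want to convert them before use. The following is a sketch under that assumption; the function name pos_validity is hypothetical, and the sample dictionary is a trimmed copy of the response above.

```python
def pos_validity(response):
    """Map each POS tag in a content_type="pos" response to a Python bool."""
    pos = response["pos"]
    result = {}
    for group in ("valid", "invalid"):
        for tag, entries in pos.get(group, {}).items():
            # A tag counts as valid only if every entry carries the string "True".
            result[tag] = bool(entries) and all(
                entry.get("valid_pos") == "True" for entry in entries)
    return result


sample = {"pos": {
    "amount": 2,
    "valid": {"NN": [{"valid_pos": "True", "Full-form": "Noun(singular)",
                      "pos_frequency": 51402}]},
    "invalid": {"JJ-IN": [{"valid_pos": "False",
                           "Full-form": "Adjective-Preposition",
                           "pos_frequency": None}]},
}}

print(pos_validity(sample))
```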

Retrieving Top POS

This service returns the current top 15 most frequent part-of-speech tags, in JSON or CSV format depending on the user's preference.

URL : https://ngrampos.vipresearch.ca/ngram_pos/service/get_list.php

format : JSON and CSV formats are supported for the output (default: JSON).

The JSON string should look like :

{"format":"json"}

The Output in JSON :


 {
     "NN": [
       {
         "Abbrevation": "NN",
         "pos_count": 51402
       }
     ],
     "NN-NN": [
       {
         "Abbrevation": "NN-NN",
         "pos_count": 41403
       }
     ],
     "JJ-NN": [
       {
         "Abbrevation": "JJ-NN",
         "pos_count": 38855
       }
     ],
     "NNP-NNP": [
       {
         "Abbrevation": "NNP-NNP",
         "pos_count": 27907
       }
     ],
     ..........
     ......
}
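
Once decoded, a JSON result of this shape can be turned into a ranked list. This is a minimal sketch that assumes every key maps to a one-element list holding a "pos_count" field, as in the entries shown above; the function name rank_tags is hypothetical, and only the four entries visible in the sample are used.

```python
def rank_tags(data):
    """Return (tag, count) pairs sorted from most to least frequent."""
    pairs = [(tag, entries[0]["pos_count"]) for tag, entries in data.items()]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# The four entries visible in the sample output; the key "Abbrevation"
# is spelled as it appears in the response.
sample = {
    "NN": [{"Abbrevation": "NN", "pos_count": 51402}],
    "NN-NN": [{"Abbrevation": "NN-NN", "pos_count": 41403}],
    "JJ-NN": [{"Abbrevation": "JJ-NN", "pos_count": 38855}],
    "NNP-NNP": [{"Abbrevation": "NNP-NNP", "pos_count": 27907}],
}

for tag, count in rank_tags(sample):
    print(tag, count)
```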

{"format":"csv"}

The Output in CSV :

NN,NN-NN,JJ-NN,NNP-NNP,NN-IN,DT-JJ-NN,IN-DT-NN,NNP-NN,DT-NN,NNP-NNP-NNP,JJ-NN-IN,
JJ-NNS,IN-NNP,NN-IN-DT,NN-NNS
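
The CSV output can be split into individual tags and grouped by length. The sketch below makes two assumptions that the service does not state explicitly: that line breaks in the output are not significant, and that each hyphen separates the tags of consecutive words, so "DT-JJ-NN" describes a 3-gram. The function names are hypothetical.

```python
def parse_pos_csv(text):
    """Split the CSV output into a list of tags, ignoring blanks and newlines."""
    parts = text.replace("\n", ",").split(",")
    return [part.strip() for part in parts if part.strip()]


def group_by_length(tags):
    """Group tags by n-gram size, assumed to be the hyphen count plus one."""
    groups = {}
    for tag in tags:
        groups.setdefault(tag.count("-") + 1, []).append(tag)
    return groups


csv_text = """NN,NN-NN,JJ-NN,NNP-NNP,NN-IN,DT-JJ-NN,IN-DT-NN,NNP-NN,DT-NN,NNP-NNP-NNP,JJ-NN-IN,
JJ-NNS,IN-NNP,NN-IN-DT,NN-NNS"""

tags = parse_pos_csv(csv_text)
for size, members in sorted(group_by_length(tags).items()):
    print(size, members)
```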

Accessing API/Implementing in Code

In PHP :

Users can call this API in the format below :


// Request parameters; 'delimeter' is spelled as in the API's parameter list.
$input_arr = array(
    'content'    => "dogs are wonderful;enjoyable",
    'ngram_n'    => 1,
    'delimeter'  => ";",
    'verify_for' => 'ngram'
);

$json = json_encode($input_arr);
// Send the JSON string as the body of a POST request.
$options = array('http' =>
    array(
        'method'  => 'POST',
        'header'  => 'Content-Type: application/json',
        'content' => $json
    )
);
$context = stream_context_create($options);
// Use file_get_contents or cURL, then json_decode, to capture the response.
$url = "https://ngrampos.vipresearch.ca/ngram_pos/service/word_service/";
$contents = file_get_contents($url, false, $context);
$result = json_decode($contents, true);

In Python :

Users can call this API in the format below :


import requests

# Request parameters; "delimeter" is spelled as in the API's parameter list.
parameters = {"content": "stones are hard;cake",
              "content_type": "ngram",
              "ngram_n": 1,
              "delimeter": ";",
              "verify_for": "ngram"}
url = "https://ngrampos.vipresearch.ca/ngram_pos/service/word_service/"
# The parameters are sent as a JSON body.
r = requests.get(url, json=parameters)
print(r.json())