Class QuantitativeLinguistics

java.lang.Object
org.episteme.social.linguistics.quantitative.QuantitativeLinguistics

public final class QuantitativeLinguistics extends Object
Implements fundamental laws of quantitative linguistics (Zipf's Law, Heaps' Law, the Menzerath-Altmann Law) and statistical metrics for text analysis (Shannon entropy, type-token ratio).
Since:
1.0
Author:
Silvere Martin-Michiellot, Gemini AI (Google DeepMind)
  • Method Summary

    Modifier and Type
    Method
    Description
    static double
    calculateEntropy(Map<String,Long> wordFrequencies)
    Calculates the Shannon Entropy of a text based on word frequencies.
    static double
    calculateTTR(long vocabularySize, long totalTokens)
    Calculates the Type-Token Ratio (TTR).
    static double
    heapsLaw(long totalTokens, double K, double beta)
    Heaps' Law: Describes the number of distinct words (vocabulary size) in a document as a function of its length.
    static double
    menzerathAltmannLaw(double x, double a, double b, double c)
    Menzerath-Altmann Law: The more components a linguistic construct has, the smaller the components are. y = a * x^b * e^(cx)
    static double
    zipfLaw(int rank, double exponent, double constant)
    Zipf's Law: The frequency of any word is inversely proportional to its rank in the frequency table. f(r) = C / r^s

    Methods inherited from class Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Method Details

    • zipfLaw

      public static double zipfLaw(int rank, double exponent, double constant)
      Zipf's Law: The frequency of any word is inversely proportional to its rank in the frequency table. f(r) = C / r^s
      Parameters:
      rank - The rank of the word (1-indexed).
      exponent - The Zipfian exponent (usually close to 1.0).
      constant - The normalizing constant.
      Returns:
      The theoretical frequency.
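      The formula above can be sketched as a standalone method. This is an illustration of f(r) = C / r^s only, not the source of QuantitativeLinguistics; the class name ZipfSketch and the rejection of ranks below 1 are assumptions.

```java
// Illustrative sketch; ZipfSketch is a hypothetical name, not part of the documented API.
final class ZipfSketch {

    /** Returns C / r^s. Rejecting rank < 1 is an assumption about sensible behavior. */
    static double zipfLaw(int rank, double exponent, double constant) {
        if (rank < 1) {
            throw new IllegalArgumentException("rank must be >= 1 (1-indexed)");
        }
        return constant / Math.pow(rank, exponent);
    }

    public static void main(String[] args) {
        // With s = 1.0 the predicted frequency halves from rank 1 to rank 2.
        for (int r = 1; r <= 4; r++) {
            System.out.printf("rank %d -> %.2f%n", r, zipfLaw(r, 1.0, 1000.0));
        }
    }
}
```

      With exponent 1.0 and constant 1000.0, the predicted frequency at rank 4 is one quarter of the frequency at rank 1.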
    • heapsLaw

      public static double heapsLaw(long totalTokens, double K, double beta)
      Heaps' Law: Describes the number of distinct words (vocabulary size) in a document as a function of its length. V = K * N^beta
      Parameters:
      totalTokens - (N) total number of tokens in the corpus.
      K - empirically determined constant (typically 10-100 for English text).
      beta - empirically determined exponent (typically 0.4-0.6 for English text).
      Returns:
      theoretical vocabulary size (V).
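      A minimal sketch of V = K * N^beta follows. It is not the library's implementation; the class name HeapsSketch and the check for a negative token count are assumptions.

```java
// Illustrative sketch; HeapsSketch is a hypothetical name, not part of the documented API.
final class HeapsSketch {

    /** Returns K * N^beta. Rejecting a negative token count is an assumption. */
    static double heapsLaw(long totalTokens, double K, double beta) {
        if (totalTokens < 0) {
            throw new IllegalArgumentException("totalTokens must be >= 0");
        }
        return K * Math.pow((double) totalTokens, beta);
    }

    public static void main(String[] args) {
        // Vocabulary grows sublinearly: 100x more tokens gives 10x more types when beta = 0.5.
        System.out.println(heapsLaw(10_000L, 10.0, 0.5));
        System.out.println(heapsLaw(1_000_000L, 10.0, 0.5));
    }
}
```

      With K = 10 and beta = 0.5, a corpus of one million tokens is predicted to contain about 10,000 distinct words.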
    • menzerathAltmannLaw

      public static double menzerathAltmannLaw(double x, double a, double b, double c)
      Menzerath-Altmann Law: The more components a linguistic construct has, the smaller the components are. y = a * x^b * e^(cx)
      Parameters:
      x - number of components (e.g., syllables in a word).
      a - scaling coefficient that multiplies the whole expression.
      b - exponent of the power-law term x^b.
      c - rate of the exponential term e^(cx).
      Returns:
      length of components (e.g., average phonemes in a syllable).
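      The expression y = a * x^b * e^(cx) can be sketched as below. The class name MenzerathSketch is hypothetical, the parameter values in main are made up for illustration, and the code is not taken from the library.

```java
// Illustrative sketch; MenzerathSketch is a hypothetical name, not part of the documented API.
final class MenzerathSketch {

    /** Returns a * x^b * e^(c * x). */
    static double menzerathAltmannLaw(double x, double a, double b, double c) {
        return a * Math.pow(x, b) * Math.exp(c * x);
    }

    public static void main(String[] args) {
        // Made-up parameters: a negative b makes component length shrink as x grows.
        for (double x = 1.0; x <= 4.0; x += 1.0) {
            System.out.printf("x = %.0f -> y = %.3f%n", x, menzerathAltmannLaw(x, 10.0, -0.5, 0.0));
        }
    }
}
```

      Setting c = 0 reduces the formula to the pure power law y = a * x^b, which is the truncated form often fitted in practice.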
    • calculateEntropy

      public static double calculateEntropy(Map<String,Long> wordFrequencies)
      Calculates the Shannon entropy of a text from its word frequencies. Measures the unpredictability (information content) of the word distribution. H = -sum(p_i * log2(p_i)), where p_i is the relative frequency of word i.
      Parameters:
      wordFrequencies - Map of words to their occurrences.
      Returns:
      Entropy value in bits.
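      As a rough sketch of the computation, the counts are normalized to probabilities and summed as -p * log2(p). The class name EntropySketch, the handling of an empty map (returns 0.0), and the skipping of non-positive counts are assumptions, not documented behavior.

```java
import java.util.Map;

// Illustrative sketch; EntropySketch is a hypothetical name, not part of the documented API.
final class EntropySketch {

    /** Returns the Shannon entropy in bits of the distribution given by the counts. */
    static double calculateEntropy(Map<String, Long> wordFrequencies) {
        long total = 0;
        for (long count : wordFrequencies.values()) {
            if (count > 0) {
                total += count; // assumption: non-positive counts are ignored
            }
        }
        if (total == 0) {
            return 0.0; // assumption: an empty text carries no information
        }
        double entropy = 0.0;
        for (long count : wordFrequencies.values()) {
            if (count <= 0) {
                continue;
            }
            double p = (double) count / total;
            entropy -= p * (Math.log(p) / Math.log(2.0)); // log base 2 gives bits
        }
        return entropy;
    }

    public static void main(String[] args) {
        // Two equally frequent words: one bit of uncertainty per token.
        System.out.println(calculateEntropy(Map.of("yes", 10L, "no", 10L)));
    }
}
```

      A text consisting of a single repeated word has entropy 0, while n equally frequent words give log2(n) bits.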
    • calculateTTR

      public static double calculateTTR(long vocabularySize, long totalTokens)
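      Calculates the Type-Token Ratio (TTR), a simple measure of lexical diversity: the number of distinct word types divided by the total number of tokens.
      Parameters:
      vocabularySize - number of distinct word types.
      totalTokens - total number of tokens.
      Returns:
      The ratio of types to tokens.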
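      A minimal sketch of the ratio, assuming the conventional definition TTR = types / tokens. The class name TtrSketch and the choice to return 0.0 when there are no tokens are assumptions; the documented method may treat that case differently.

```java
// Illustrative sketch; TtrSketch is a hypothetical name, not part of the documented API.
final class TtrSketch {

    /** Returns vocabularySize / totalTokens as a double. */
    static double calculateTTR(long vocabularySize, long totalTokens) {
        if (totalTokens <= 0) {
            return 0.0; // assumption: avoid division by zero for an empty text
        }
        return (double) vocabularySize / totalTokens;
    }

    public static void main(String[] args) {
        // 50 distinct words in a 200-token text.
        System.out.println(calculateTTR(50L, 200L));
    }
}
```

      Because vocabulary grows sublinearly with text length, TTR values are only comparable between texts of similar size.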