java.lang.Object

org.episteme.social.linguistics.quantitative.QuantitativeLinguistics

public final class QuantitativeLinguistics extends Object

Implements fundamental laws of quantitative linguistics (Zipf's law, Heaps' law, and the Menzerath-Altmann law) and provides metrics for statistical language analysis (Shannon entropy and the type-token ratio).

Since: 1.0
Author: Silvere Martin-Michiellot, Gemini AI (Google DeepMind)

Method Summary

| Modifier and Type | Method | Description |
|---|---|---|
| static double | calculateEntropy(Map<String,Long> wordFrequencies) | Calculates the Shannon entropy of a text based on word frequencies. |
| static double | calculateTTR(long vocabularySize, long totalTokens) | Calculates the type-token ratio (TTR). |
| static double | heapsLaw(long totalTokens, double K, double beta) | Heaps' Law: Describes the number of distinct words (vocabulary size) in a document as a function of its length. |
| static double | menzerathAltmannLaw(double x, double a, double b, double c) | Menzerath-Altmann Law: The more components a linguistic construct has, the smaller the components are. y = a * x^b * e^(cx) |
| static double | zipfLaw(int rank, double exponent, double constant) | Zipf's Law: The frequency of any word is inversely proportional to a power of its rank in the frequency table. f(r) = C / r^s |

Methods inherited from class Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Method Details
- zipfLaw
  
  public static double zipfLaw(int rank, double exponent, double constant)
  
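  Zipf's Law: The frequency of any word is inversely proportional to a power of its rank in the frequency table (to the rank itself when s = 1). f(r) = C / r^s, where C is the normalizing constant and s is the exponent.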
  
  Parameters:
  
  rank - The rank of the word (1-indexed).
  
  exponent - The Zipfian exponent (usually close to 1.0).
  
  constant - The normalizing constant.
  
  Returns:
  
  The theoretical frequency.
- heapsLaw
  
  public static double heapsLaw(long totalTokens, double K, double beta)
  
  Heaps' Law: Describes the number of distinct words (vocabulary size) in a document as a function of its length. V = K * N^beta
  
  Parameters:
  
  totalTokens - total number of tokens in the corpus (N).
  
  K - empirically determined constant (typically 10-100).
  
  beta - empirically determined exponent (typically 0.4-0.6).
  
  Returns:
  
  theoretical vocabulary size (V).
- menzerathAltmannLaw
  
  public static double menzerathAltmannLaw(double x, double a, double b, double c)
  
  Menzerath-Altmann Law: The more components a linguistic construct has, the smaller the components are. y = a * x^b * e^(cx)
  
  Parameters:
  
  x - number of components (e.g., syllables in a word).
  
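  a - scaling parameter.
  
  b - exponent of the power term x^b (typically negative, which produces the decrease described by the law).
  
  c - coefficient in the exponential term e^(cx).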
  
  Returns:
  
  mean length of the components (e.g., average number of phonemes per syllable).
- calculateEntropy
  
  public static double calculateEntropy(Map<String,Long> wordFrequencies)
  
  Calculates the Shannon entropy of a text from its word frequencies, H = -sum(p_i * log2(p_i)), where p_i is the relative frequency of word i. It measures the unpredictability, or average information content, of the text.
  
  Parameters:
  
  wordFrequencies - Map from each word to its number of occurrences.
  
  Returns:
  
  Entropy value in bits.
- calculateTTR
  
  public static double calculateTTR(long vocabularySize, long totalTokens)
  
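  Calculates the type-token ratio (TTR), a simple measure of lexical diversity.
  
  Parameters:
  
  vocabularySize - number of distinct words (types, V).
  
  totalTokens - total number of tokens (N).
  
  Returns:
  
  the type-token ratio V / N.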
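To illustrate the formula documented for zipfLaw, f(r) = C / r^s, here is a minimal standalone sketch. It does not call QuantitativeLinguistics itself. The class name ZipfSketch, the method zipf, and the rejection of ranks below 1 are assumptions made for this example, not documented behavior of the library.

```java
import java.util.Locale;

/**
 * Hypothetical standalone sketch of Zipf's law, f(r) = C / r^s.
 * Not the library source: it only mirrors the documented formula of zipfLaw.
 */
final class ZipfSketch {

    static double zipf(int rank, double exponent, double constant) {
        // Assumption: ranks are 1-indexed, so rank 0 or below is rejected.
        if (rank < 1) {
            throw new IllegalArgumentException("rank must be >= 1");
        }
        return constant / Math.pow(rank, exponent);
    }

    public static void main(String[] args) {
        // With s = 1 the predicted frequency halves each time the rank doubles.
        for (int rank = 1; rank <= 8; rank *= 2) {
            System.out.printf(Locale.ROOT, "rank %d -> %.1f%n", rank, zipf(rank, 1.0, 1000.0));
        }
    }
}
```

In practice C and s would be fitted to an observed rank-frequency table. The values 1000.0 and 1.0 above are only illustrative.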
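Heaps' law, as documented for heapsLaw (V = K * N^beta), can be sketched the same way. HeapsSketch and heaps are hypothetical names. The parameter values K = 44 and beta = 0.5 were chosen inside the documented typical ranges purely for illustration and were not fitted to any corpus.

```java
import java.util.Locale;

/**
 * Hypothetical standalone sketch of Heaps' law, V = K * N^beta.
 * Mirrors the documented heapsLaw formula; it is not the library source.
 */
final class HeapsSketch {

    static double heaps(long totalTokens, double k, double beta) {
        return k * Math.pow(totalTokens, beta);
    }

    public static void main(String[] args) {
        // Illustrative parameters inside the documented typical ranges, not fitted to any corpus.
        double k = 44.0;
        double beta = 0.5;
        // With beta = 0.5, quadrupling the text length only doubles the predicted vocabulary.
        for (long n = 10_000L; n <= 640_000L; n *= 4) {
            System.out.printf(Locale.ROOT, "N = %d -> V ~ %.0f%n", n, heaps(n, k, beta));
        }
    }
}
```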
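The Menzerath-Altmann formula documented for menzerathAltmannLaw, y = a * x^b * e^(cx), can be evaluated directly, as in the sketch below. MenzerathAltmannSketch is a hypothetical name, and the parameters a = 4, b = -0.3, c = 0.02 are unfitted values picked to show the decreasing trend the law describes.

```java
import java.util.Locale;

/**
 * Hypothetical standalone sketch of the Menzerath-Altmann law, y = a * x^b * e^(c * x).
 * Mirrors the documented menzerathAltmannLaw formula; it is not the library source.
 */
final class MenzerathAltmannSketch {

    static double menzerathAltmann(double x, double a, double b, double c) {
        return a * Math.pow(x, b) * Math.exp(c * x);
    }

    public static void main(String[] args) {
        // Illustrative, unfitted parameters. Since d(ln y)/dx = b/x + c,
        // y decreases for every x < -b/c (here x < 15) when b < 0 < c.
        double a = 4.0;
        double b = -0.3;
        double c = 0.02;
        for (int x = 1; x <= 6; x++) {
            System.out.printf(Locale.ROOT, "x = %d -> y = %.3f%n", x, menzerathAltmann(x, a, b, c));
        }
    }
}
```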
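The entropy described for calculateEntropy can be sketched as follows. Each count is turned into a relative frequency p_i = count_i / total, and -p_i * log2(p_i) is summed. EntropySketch and entropyBits are hypothetical names. Returning 0 bits for an empty map and skipping zero counts are assumptions of this sketch, not documented behavior.

```java
import java.util.Map;

/**
 * Hypothetical standalone sketch of Shannon entropy over word counts:
 * H = -sum(p_i * log2(p_i)), where p_i = count_i / totalCount.
 * Mirrors what calculateEntropy documents; it is not the library source.
 */
final class EntropySketch {

    static double entropyBits(Map<String, Long> wordFrequencies) {
        long total = 0;
        for (long count : wordFrequencies.values()) {
            total += count;
        }
        // Assumption: an empty (or all-zero) map yields 0 bits.
        if (total == 0) {
            return 0.0;
        }
        double h = 0.0;
        for (long count : wordFrequencies.values()) {
            if (count > 0) { // 0 * log(0) is taken as 0
                double p = (double) count / total;
                h -= p * (Math.log(p) / Math.log(2.0)); // change of base to log2 gives bits
            }
        }
        return h;
    }

    public static void main(String[] args) {
        // Relative frequencies 1/2, 1/4, 1/4: H = 0.5*1 + 0.25*2 + 0.25*2 = 1.5 bits.
        Map<String, Long> counts = Map.of("the", 2L, "cat", 1L, "sat", 1L);
        System.out.println(entropyBits(counts));
    }
}
```

A text whose words are all equally frequent reaches the maximum, log2 of the vocabulary size. A text that repeats a single word has 0 bits.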
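A sketch of the type-token ratio described for calculateTTR follows, with a naive whitespace tokenizer to produce its two inputs. TtrSketch and ttr are hypothetical names. Rejecting a non-positive token count is an assumption, since the documentation does not say how an empty text is handled.

```java
import java.util.Arrays;
import java.util.Locale;

/**
 * Hypothetical standalone sketch of the type-token ratio, TTR = V / N.
 * Mirrors what calculateTTR documents; it is not the library source.
 */
final class TtrSketch {

    static double ttr(long vocabularySize, long totalTokens) {
        // Assumption: the ratio is undefined for an empty text, so reject it.
        if (totalTokens <= 0) {
            throw new IllegalArgumentException("totalTokens must be positive");
        }
        return (double) vocabularySize / totalTokens;
    }

    public static void main(String[] args) {
        // Naive whitespace tokenization, lower-cased; real tokenizers are more careful.
        String[] tokens = "The cat sat on the mat".toLowerCase(Locale.ROOT).split("\\s+");
        long types = Arrays.stream(tokens).distinct().count(); // the, cat, sat, on, mat
        System.out.println(ttr(types, tokens.length)); // 5 types over 6 tokens
    }
}
```

Because vocabulary grows sublinearly with text length (Heaps' law), TTR tends to fall as texts get longer. TTR values are therefore most meaningful when compared across texts of similar length.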
