Class GenBankReader

java.lang.Object
org.episteme.core.io.AbstractResourceReader<String>
org.episteme.natural.biology.loaders.GenBankReader
All Implemented Interfaces:
ResourceIO<String>, ResourceReader<String>

public class GenBankReader extends AbstractResourceReader<String>
Connector to NCBI GenBank for DNA/protein sequence retrieval.

What it does: Fetches nucleotide and protein sequences using NCBI E-utilities (efetch/esearch).

Data Source: NCBI Entrez E-utilities API
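
The efetch requests mentioned above can be sketched as plain URL construction. This is a minimal illustration, not the GenBankReader implementation: the class name `EfetchUrlSketch` and the method `efetchUrl` are hypothetical, and the choice of `nuccore`/`protein` as databases and `fasta`/`gb` as return types is an assumption about how the static helpers on this page might map onto E-utilities parameters. No network call is made.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Sketch only: builds NCBI efetch URLs. Hypothetical helper, not part of the documented API. */
final class EfetchUrlSketch {

    // Public address of the NCBI E-utilities efetch endpoint.
    static final String EFETCH_BASE =
            "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi";

    /**
     * Builds an efetch URL.
     *
     * @param db        Entrez database, e.g. "nuccore" or "protein"
     * @param accession accession number, e.g. "NC_000913.3"
     * @param rettype   return type, e.g. "fasta" or "gb"
     */
    static String efetchUrl(String db, String accession, String rettype) {
        return EFETCH_BASE
                + "?db=" + URLEncoder.encode(db, StandardCharsets.UTF_8)
                + "&id=" + URLEncoder.encode(accession, StandardCharsets.UTF_8)
                + "&rettype=" + URLEncoder.encode(rettype, StandardCharsets.UTF_8)
                + "&retmode=text";
    }

    public static void main(String[] args) {
        // Nucleotide FASTA (assumed to correspond to what getFasta requests).
        System.out.println(efetchUrl("nuccore", "NC_000913.3", "fasta"));
        // GenBank flat file (assumed to correspond to getGenBankRecord).
        System.out.println(efetchUrl("nuccore", "NC_000913.3", "gb"));
        // Protein FASTA; pass a protein accession here (assumed to correspond to getProteinFasta).
        System.out.println(efetchUrl("protein", args.length > 0 ? args[0] : "ACCESSION", "fasta"));
    }
}
```

The resulting string can be passed to any HTTP client. Whether GenBankReader adds further parameters (for example an API key or tool name) is not stated on this page.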

Usage example:

Since:
1.0
Author:
Silvere Martin-Michiellot, Gemini AI (Google DeepMind)
  • Constructor Details

    • GenBankReader

      public GenBankReader()
  • Method Details

    • loadFromSource

      protected String loadFromSource(String accession) throws Exception
      Specified by:
      loadFromSource in class AbstractResourceReader<String>
      Throws:
      Exception
    • getName

      public String getName()
      Description copied from interface: ResourceIO
      Returns the display name of this resource handler. MUST be implemented with I18N support.
      Returns:
      the display name
    • getDescription

      public String getDescription()
      Description copied from interface: ResourceIO
      Returns a short description of this resource handler. MUST be implemented with I18N support.
      Returns:
      the description
    • getLongDescription

      public String getLongDescription()
      Description copied from interface: ResourceIO
      Returns a long description of this resource handler. MUST be implemented with I18N support.
      Returns:
      the long description
    • getCategory

      public String getCategory()
      Description copied from interface: ResourceIO
      Returns the category for grouping. MUST be implemented with I18N support.
      Returns:
      the category name
    • getSupportedVersions

      public String[] getSupportedVersions()
      Description copied from interface: ResourceIO
      Returns the supported versions of the format this reader/writer handles.

      Each implementation MUST override this method to declare which versions of the underlying format are supported. The returned array should contain version strings in the format's standard notation (e.g., "3.0", "2.1", "Level 3 Version 2").

      Examples:

      • MathML: {"3.0", "2.0"}
      • SBML: {"Level 3 Version 2", "Level 3 Version 1", "Level 2 Version 5"}
      • PhyloXML: {"1.10", "1.00"}

      Returns:
      array of supported version strings, never null (empty array if version-agnostic)
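
The contract above (never null, empty array when version-agnostic) might be honored along these lines. The interface `VersionedHandlerSketch` is a hypothetical stand-in for the relevant part of ResourceIO, and both implementing classes are illustrative only. What GenBankReader itself returns is not stated on this page.

```java
/** Hypothetical stand-in for the version-related part of ResourceIO. */
interface VersionedHandlerSketch {
    String[] getSupportedVersions();
}

/** Illustrative handler for a versioned format, using the MathML example values. */
final class MathMlHandlerSketch implements VersionedHandlerSketch {
    private static final String[] VERSIONS = {"3.0", "2.0"};

    @Override
    public String[] getSupportedVersions() {
        // Return a copy so callers cannot modify the shared array.
        return VERSIONS.clone();
    }
}

/** Illustrative handler for a format without versions. */
final class VersionAgnosticHandlerSketch implements VersionedHandlerSketch {
    @Override
    public String[] getSupportedVersions() {
        // Empty array, never null, signals "version-agnostic".
        return new String[0];
    }
}
```

Returning a defensive copy is a design choice of this sketch, not a requirement stated in the contract.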
    • getResourcePath

      public String getResourcePath()
      Description copied from interface: ResourceIO
      Returns the base path where this resource is located.
    • getResourceType

      public Class<String> getResourceType()
      Description copied from interface: ResourceIO
      Returns the type of resource.
    • getFasta

      public static String getFasta(String accession)
      Fetches the sequence for the given accession in FASTA format.
      Parameters:
      accession - GenBank accession number (e.g., "NC_000913.3", "NM_001301717")
      Returns:
      FASTA formatted sequence
    • getProteinFasta

      public static String getProteinFasta(String accession)
      Fetches the protein sequence for the given accession in FASTA format.
      Parameters:
      accession - protein accession number
      Returns:
      FASTA formatted protein sequence
    • getGenBankRecord

      public static String getGenBankRecord(String accession)
      Fetches the record for the given accession in GenBank flat-file format.
      Parameters:
      accession - GenBank accession number
      Returns:
      GenBank format record
    • getSequenceInfo

      public static SequenceInfo getSequenceInfo(String accession)
      Parses the FASTA record for the given accession to extract the sequence and its metadata.
      Parameters:
      accession - GenBank accession
      Returns:
      SequenceInfo with parsed data
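
The FASTA parsing step described above can be sketched as follows. The fields of SequenceInfo are not documented on this page, so the holder class `ParsedFastaSketch` and its three fields (id, description, sequence) are assumptions made for illustration, as is the rule of splitting the header line at the first space.

```java
/** Hypothetical result holder; the real SequenceInfo fields are not documented here. */
final class ParsedFastaSketch {
    final String id;          // first token of the header line, without '>'
    final String description; // remainder of the header line, possibly empty
    final String sequence;    // sequence lines joined without line breaks

    ParsedFastaSketch(String id, String description, String sequence) {
        this.id = id;
        this.description = description;
        this.sequence = sequence;
    }
}

/** Sketch only: parses the first record of a FASTA-formatted string. */
final class FastaParseSketch {

    static ParsedFastaSketch parse(String fasta) {
        String[] lines = fasta.trim().split("\\R");
        if (!lines[0].startsWith(">")) {
            throw new IllegalArgumentException("Not a FASTA record: missing '>' header line");
        }

        // Header: ">id description..."
        String header = lines[0].substring(1).trim();
        int space = header.indexOf(' ');
        String id = space < 0 ? header : header.substring(0, space);
        String description = space < 0 ? "" : header.substring(space + 1).trim();

        // Body: concatenate sequence lines until the next record or end of input.
        StringBuilder sequence = new StringBuilder();
        for (int i = 1; i < lines.length; i++) {
            if (lines[i].startsWith(">")) {
                break; // only the first record is parsed in this sketch
            }
            sequence.append(lines[i].trim());
        }
        return new ParsedFastaSketch(id, description, sequence.toString());
    }

    public static void main(String[] args) {
        // Made-up record for illustration; not real GenBank data.
        ParsedFastaSketch info = parse(">EXAMPLE_1 illustrative record\nACGT\nTTGA\n");
        System.out.println(info.id + " | " + info.description + " | " + info.sequence);
    }
}
```

With the made-up record in `main`, the parsed id is `EXAMPLE_1`, the description is `illustrative record`, and the sequence is `ACGTTTGA`.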
    • getInstance

      public static GenBankReader getInstance()