Domain Knowledge Graph

DKG (mira.dkg)

Construction of domain knowledge graphs.

add_resource(resource)[source]

Add a resource to the default registry during the current runtime.

Return type:

None

ASKEMO (mira.dkg.askemo)

EQUIVALENCE_TYPES = {'skos:broadBarch', 'skos:exactMatch', 'skos:narrowMatch', 'skos:relatedMatch'}

Valid equivalence annotations in ASKEMO

SYNONYM_TYPES = {'oboInOwl:hasBroadSynonym': 'BROAD', 'oboInOwl:hasExactSynonym': 'EXACT', 'oboInOwl:hasNarrowSynonym': 'NARROW', 'oboInOwl:hasRelatedSynonym': 'RELATED', 'referenced_by_latex': 'RELATED', 'referenced_by_symbol': 'RELATED'}

Maps synonym predicates used in ASKEMO (keys) to their OBO synonym specificities (values)
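As a minimal sketch of how this mapping might be used, the snippet below copies the SYNONYM_TYPES value shown above into a local dictionary and looks up the OBO specificity for a synonym predicate. The helper name specificity_for is an assumption made for illustration and is not part of mira.dkg.askemo.

```python
# Local copy of the documented mapping, so the sketch runs without MIRA installed.
SYNONYM_TYPES = {
    "oboInOwl:hasBroadSynonym": "BROAD",
    "oboInOwl:hasExactSynonym": "EXACT",
    "oboInOwl:hasNarrowSynonym": "NARROW",
    "oboInOwl:hasRelatedSynonym": "RELATED",
    "referenced_by_latex": "RELATED",
    "referenced_by_symbol": "RELATED",
}


def specificity_for(predicate: str) -> str:
    """Return the OBO specificity for a synonym predicate.

    Hypothetical helper: raises KeyError for predicates that are not listed.
    """
    return SYNONYM_TYPES[predicate]


print(specificity_for("oboInOwl:hasExactSynonym"))  # EXACT
print(specificity_for("referenced_by_latex"))       # RELATED
```

Note that the two ASKEMO-specific keys (referenced_by_latex and referenced_by_symbol) both map to the RELATED specificity.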

pydantic model Term[source]

Bases: BaseModel

A term in the ASKEMO ontology.

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError if the input data cannot be parsed to form a valid model.

JSON schema:
{
   "title": "Term",
   "description": "A term in the ASKEMO ontology.",
   "type": "object",
   "properties": {
      "id": {
         "title": "Id",
         "type": "string"
      },
      "name": {
         "title": "Name",
         "type": "string"
      },
      "type": {
         "title": "Type",
         "enum": [
            "class",
            "property",
            "individual",
            "unknown"
         ],
         "type": "string"
      },
      "obsolete": {
         "title": "Obsolete",
         "default": false,
         "type": "boolean"
      },
      "description": {
         "title": "Description",
         "type": "string"
      },
      "xrefs": {
         "title": "Xrefs",
         "type": "array",
         "items": {
            "$ref": "#/definitions/Xref"
         }
      },
      "parents": {
         "title": "Parents",
         "description": "A list of CURIEs for parent terms",
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "synonyms": {
         "title": "Synonyms",
         "type": "array",
         "items": {
            "$ref": "#/definitions/Synonym"
         }
      },
      "part_ofs": {
         "title": "Part Ofs",
         "description": "A list of CURIEs for terms that this term is part of",
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "physical_min": {
         "title": "Physical Min",
         "type": "number"
      },
      "physical_max": {
         "title": "Physical Max",
         "type": "number"
      },
      "suggested_data_type": {
         "title": "Suggested Data Type",
         "type": "string"
      },
      "suggested_unit": {
         "title": "Suggested Unit",
         "type": "string"
      },
      "typical_min": {
         "title": "Typical Min",
         "type": "number"
      },
      "typical_max": {
         "title": "Typical Max",
         "type": "number"
      },
      "dimensionality": {
         "title": "Dimensionality",
         "type": "string"
      }
   },
   "required": [
      "id",
      "name",
      "type",
      "description"
   ],
   "definitions": {
      "Xref": {
         "title": "Xref",
         "description": "Represents a typed cross-reference.",
         "type": "object",
         "properties": {
            "id": {
               "title": "Id",
               "description": "The CURIE of the cross reference",
               "type": "string"
            },
            "type": {
               "title": "Type",
               "description": "The CURIE for the cross reference predicate",
               "example": "skos:exactMatch",
               "type": "string"
            }
         },
         "required": [
            "id",
            "type"
         ]
      },
      "Synonym": {
         "title": "Synonym",
         "description": "Represents a typed synonym.",
         "type": "object",
         "properties": {
            "value": {
               "title": "Value",
               "description": "The text of the synonym",
               "type": "string"
            },
            "type": {
               "title": "Type",
               "description": "The CURIE for the synonym predicate",
               "example": "skos:exactMatch",
               "type": "string"
            }
         },
         "required": [
            "value",
            "type"
         ]
      }
   }
}

Fields:
field description: str [Required]
field dimensionality: Optional[str] = None
field id: str [Required]
field name: str [Required]
field obsolete: bool = False
field parents: List[str] [Optional]

A list of CURIEs for parent terms

field part_ofs: List[str] [Optional]

A list of CURIEs for terms that this term is part of

field physical_max: Optional[float] = None
field physical_min: Optional[float] = None
field suggested_data_type: Optional[str] = None
field suggested_unit: Optional[str] = None
field synonyms: List[Synonym] [Optional]
field type: Literal['class', 'property', 'individual', 'unknown'] [Required]
field typical_max: Optional[float] = None
field typical_min: Optional[float] = None
field xrefs: List[Xref] [Optional]
property prefix: str

Get the prefix for the term.
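The sketch below is a rough stand-in for the Term model, written with a standard-library dataclass so that it runs without pydantic or MIRA. It mirrors the four required fields from the schema and a few optional ones. The class name TermSketch, the example identifier, and the rule that the prefix is the part of the CURIE before the first colon are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TermSketch:
    """Hypothetical stand-in mirroring some fields of the documented Term model."""

    # Required fields in the documented schema
    id: str
    name: str
    type: str
    description: str
    # A few of the optional fields, with the documented defaults
    obsolete: bool = False
    parents: List[str] = field(default_factory=list)
    suggested_unit: Optional[str] = None

    @property
    def prefix(self) -> str:
        # Assumption: the prefix is everything before the first colon of the CURIE.
        return self.id.split(":", 1)[0]


# Hypothetical identifier and text, for illustration only.
term = TermSketch(
    id="askemo:0000001",
    name="example term",
    type="class",
    description="A placeholder description.",
)
print(term.prefix)    # askemo
print(term.obsolete)  # False
```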

get_askemo_terms()[source]

Load the epidemiology ontology JSON.

Return type:

Mapping[str, Term]

get_askemosw_terms()[source]

Load the space weather ontology JSON.

Return type:

Mapping[str, Term]

get_askem_climate_ontology_terms()[source]

Load the climate ontology JSON.

Return type:

Mapping[str, Term]

Client (mira.dkg.client)

Neo4j client module.

class Neo4jClient(url=None, user=None, password=None)[source]

Bases: object

A client to Neo4j.

Initialize the Neo4j client.

create_tx(query, **query_params)[source]

Run a query that creates nodes and/or relations.

Parameters:
  • query (str) – The query string to be executed.

  • query_params – The parameters to be used in the query.

Returns:

The result of the query

create_single_property_node_index(index_name, label, property_name, exist_ok=False)[source]

Create a single-property node index.

Parameters:
  • index_name (str) – The name of the index to create.

  • label (str) – The label of the nodes to index.

  • property_name (str) – The node property to index.

  • exist_ok (bool) – If True, do not raise an exception if the index already exists. Default: False.
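To make the role of exist_ok concrete, the following sketch builds a Cypher statement for a single-property node index. It assumes that exist_ok corresponds to Cypher's IF NOT EXISTS clause. The function build_index_query, the index name, and the Entity label are hypothetical, and the snippet only builds the string without contacting a database.

```python
def build_index_query(
    index_name: str, label: str, property_name: str, exist_ok: bool = False
) -> str:
    """Build a Cypher statement for a single-property node index (hypothetical helper)."""
    # Assumption: exist_ok maps onto Cypher's IF NOT EXISTS clause.
    if_not_exists = " IF NOT EXISTS" if exist_ok else ""
    return (
        f"CREATE INDEX {index_name}{if_not_exists} "
        f"FOR (n:{label}) ON (n.{property_name})"
    )


print(build_index_query("node_by_id", "Entity", "id", exist_ok=True))
# CREATE INDEX node_by_id IF NOT EXISTS FOR (n:Entity) ON (n.id)
```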

query_nodes(query)[source]

Run a read-only query for nodes.

Parameters:

query (str) – The query string to be executed.

Returns:

A list of Node instances corresponding to the results of the query

Return type:

List[Node]

get_lexical()[source]

Get lexical information for all entities.

Return type:

List[Entity]

get_node_counter()[source]

Get a count of each entity type.

Return type:

Counter

search(query, limit=25, offset=0, prefixes=None, labels=None, wikidata_fallback=False)[source]

Search nodes for a given name or synonym substring.

Parameters:
  • query (str) – The query string to search (by a normalized substring search).

  • limit (int) – The number of results to return. Useful for pagination.

  • offset (int) – The offset of the entities to return. Useful for pagination.

  • prefixes (Union[None, str, Iterable[str]]) – A prefix or list of prefixes. If given, any result matching any of the prefixes will be retained.

  • labels (Union[None, str, Iterable[str]]) – A label or list of labels used for filtering results. If given, any result with any of the labels will be retained.

  • wikidata_fallback (bool) – If True, fall back to searching Wikidata when the DKG returns no results.

Return type:

List[Entity]

Returns:

A list of entity objects that match all of the query parameters
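The interplay of query, limit, offset, and prefixes can be sketched over in-memory records. This is an illustration, not the client's implementation: the real search runs against Neo4j, and the normalization rule used here (lowercasing and dropping non-alphanumeric characters) is an assumption. The function search_sketch and the ex: records are hypothetical.

```python
from typing import Dict, Iterable, List, Union


def normalize(text: str) -> str:
    # Assumption: normalization lowercases and drops non-alphanumeric characters.
    return "".join(ch for ch in text.lower() if ch.isalnum())


def search_sketch(
    entities: List[Dict],
    query: str,
    limit: int = 25,
    offset: int = 0,
    prefixes: Union[None, str, Iterable[str]] = None,
) -> List[Dict]:
    """Filter in-memory records by normalized substring, then paginate."""
    if isinstance(prefixes, str):
        prefixes = [prefixes]
    allowed = set(prefixes) if prefixes is not None else None
    needle = normalize(query)
    hits = []
    for entity in entities:
        # Match against the name and every synonym.
        names = [entity["name"], *entity.get("synonyms", [])]
        if not any(needle in normalize(name) for name in names):
            continue
        # Keep the record only if its CURIE prefix is one of the allowed prefixes.
        if allowed is not None and entity["id"].split(":", 1)[0] not in allowed:
            continue
        hits.append(entity)
    return hits[offset : offset + limit]


records = [
    {"id": "ido:0000511", "name": "infected population", "synonyms": []},
    {"id": "ex:1", "name": "Infection rate", "synonyms": ["rate of infection"]},
    {"id": "ex:2", "name": "recovered population", "synonyms": []},
]
print([r["id"] for r in search_sketch(records, "infect")])
# ['ido:0000511', 'ex:1']
print([r["id"] for r in search_sketch(records, "infect", prefixes="ido")])
# ['ido:0000511']
print([r["id"] for r in search_sketch(records, "population", limit=1, offset=1)])
# ['ex:2']
```

Requesting successive pages amounts to keeping limit fixed and increasing offset by limit each time.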

get_entity(curie)[source]

Look up an entity based on its CURIE.

Return type:

Optional[Entity]

get_transitive_closure(rels=None)[source]

Return transitive closure with respect to one or more relations.

The transitive closure is constructed as a set of pairs of node IDs ordered as (source, target), where the target is reachable from the source by following the given relations. If rels point towards taxonomical parents (e.g., subclassof, part_of), the pairs are interpreted as (taxonomical child, taxonomical ancestor).

Parameters:

rels (Optional[List[str]]) – One or more relation types to traverse. If not given, the default DKG_REFINER_RELS are used capturing taxonomical parenthood relationships.

Return type:

Set[Tuple[str, str]]

Returns:

The set of pairs constituting the transitive closure.
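The shape of the result can be illustrated with a small in-memory computation. The sketch below is not the client's implementation (which queries Neo4j); it takes hypothetical child-to-parent edges and returns every pair in which the second node is reachable from the first, so each pair reads as (taxonomical child, taxonomical ancestor).

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def transitive_closure(edges: Iterable[Tuple[str, str]]) -> Set[Tuple[str, str]]:
    """Return every (source, target) pair where target is reachable from source."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for source, target in edges:
        graph[source].add(target)

    closure: Set[Tuple[str, str]] = set()
    for start in list(graph):
        # Depth-first traversal from each node that has outgoing edges.
        stack = list(graph[start])
        seen: Set[str] = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            closure.add((start, node))
            stack.extend(graph.get(node, ()))
    return closure


# Hypothetical child -> parent edges (e.g., subclassof).
edges = [("ex:c", "ex:b"), ("ex:b", "ex:a")]
print(sorted(transitive_closure(edges)))
# [('ex:b', 'ex:a'), ('ex:c', 'ex:a'), ('ex:c', 'ex:b')]
```

The pair ('ex:c', 'ex:a') is the one added by transitivity: ex:a is not a direct parent of ex:c but is reachable through ex:b.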

get_common_parents(curie1, curie2)[source]

Return the direct parents shared by two entities.

Return type:

Optional[List[Entity]]
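A minimal in-memory sketch of the idea follows, using plain CURIE strings instead of Entity objects. The parent table and the function common_parents are hypothetical, and returning None when nothing is shared is an assumption suggested by the Optional return type.

```python
from typing import Dict, List, Optional, Set


def common_parents(
    parents: Dict[str, Set[str]], curie1: str, curie2: str
) -> Optional[List[str]]:
    """Return the direct parents that two CURIEs share, or None if there are none."""
    shared = parents.get(curie1, set()) & parents.get(curie2, set())
    # Assumption: None signals that no common parent was found.
    return sorted(shared) or None


# Hypothetical direct-parent table.
direct_parents = {
    "ex:child1": {"ex:parent", "ex:other"},
    "ex:child2": {"ex:parent"},
    "ex:orphan": set(),
}
print(common_parents(direct_parents, "ex:child1", "ex:child2"))  # ['ex:parent']
print(common_parents(direct_parents, "ex:child1", "ex:orphan"))  # None
```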

pydantic model Entity[source]

Bases: BaseModel

An entity in the domain knowledge graph.

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError if the input data cannot be parsed to form a valid model.

JSON schema:
{
   "title": "Entity",
   "description": "An entity in the domain knowledge graph.",
   "type": "object",
   "properties": {
      "id": {
         "title": "Compact URI",
         "description": "The CURIE of the entity",
         "example": "ido:0000511",
         "type": "string"
      },
      "name": {
         "title": "Name",
         "description": "The name of the entity",
         "example": "infected population",
         "type": "string"
      },
      "type": {
         "title": "Type",
         "description": "The type of the entity",
         "example": "class",
         "enum": [
            "class",
            "property",
            "individual",
            "unknown"
         ],
         "type": "string"
      },
      "obsolete": {
         "title": "Obsolete",
         "description": "Is the entity marked obsolete?",
         "example": false,
         "type": "boolean"
      },
      "description": {
         "title": "Description",
         "description": "The description of the entity.",
         "example": "An organism population whose members have an infection.",
         "type": "string"
      },
      "synonyms": {
         "title": "Synonyms",
         "description": "A list of string synonyms",
         "example": [],
         "type": "array",
         "items": {
            "$ref": "#/definitions/Synonym"
         }
      },
      "alts": {
         "title": "Alternative Identifiers",
         "description": "A list of alternative identifiers, given as CURIE strings.",
         "example": [],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "xrefs": {
         "title": "Database Cross-references",
         "description": "A list of database cross-references, given as CURIE strings.",
         "example": [],
         "type": "array",
         "items": {
            "$ref": "#/definitions/Xref"
         }
      },
      "labels": {
         "title": "Labels",
         "description": "A list of Neo4j labels assigned to the entity.",
         "example": [
            "ido"
         ],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "properties": {
         "title": "Properties",
         "description": "A mapping of properties to their values",
         "example": {},
         "type": "object",
         "additionalProperties": {
            "type": "array",
            "items": {
               "type": "string"
            }
         }
      },
      "link": {
         "title": "Link",
         "type": "string"
      }
   },
   "required": [
      "id",
      "type",
      "obsolete"
   ],
   "definitions": {
      "Synonym": {
         "title": "Synonym",
         "description": "Represents a typed synonym.",
         "type": "object",
         "properties": {
            "value": {
               "title": "Value",
               "description": "The text of the synonym",
               "type": "string"
            },
            "type": {
               "title": "Type",
               "description": "The CURIE for the synonym predicate",
               "example": "skos:exactMatch",
               "type": "string"
            }
         },
         "required": [
            "value",
            "type"
         ]
      },
      "Xref": {
         "title": "Xref",
         "description": "Represents a typed cross-reference.",
         "type": "object",
         "properties": {
            "id": {
               "title": "Id",
               "description": "The CURIE of the cross reference",
               "type": "string"
            },
            "type": {
               "title": "Type",
               "description": "The CURIE for the cross reference predicate",
               "example": "skos:exactMatch",
               "type": "string"
            }
         },
         "required": [
            "id",
            "type"
         ]
      }
   }
}

Fields:
field alts: List[str] [Optional]

A list of alternative identifiers, given as CURIE strings.

field description: Optional[str] = None

The description of the entity.

field id: str [Required]

The CURIE of the entity

field labels: List[str] [Optional]

A list of Neo4j labels assigned to the entity.

field name: Optional[str] = None

The name of the entity

field obsolete: bool [Required]

Is the entity marked obsolete?

field properties: Dict[str, List[str]] [Optional]

A mapping of properties to their values

field synonyms: List[Synonym] [Optional]

A list of string synonyms

field type: Literal['class', 'property', 'individual', 'unknown'] [Required]

The type of the entity

field xrefs: List[Xref] [Optional]

A list of database cross-references, given as CURIE strings.

as_askem_entity()[source]

Parse this term into an ASKEM Ontology-specific class.

classmethod from_data(data)[source]

Create from a data dictionary as it’s stored in neo4j.

Parameters:

data – Either a plain python dictionary or a neo4j.graph.Node object that will get unpacked. These correspond to the structure of data inside the neo4j graph, and therefore have parallel lists representing dictionaries for properties, xrefs, and synonyms.

Returns:

A MIRA entity
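The "parallel lists representing dictionaries" layout can be sketched as follows. The key names xref_types, synonym_types, property_predicates, and property_values are taken from the NodeInfo fields documented in mira.dkg.construct. The helper unpack_parallel, the assumption that the values arrive as Python lists, and the example record are hypothetical, and this is not the body of from_data itself.

```python
from typing import Dict, List


def unpack_parallel(data: Dict) -> Dict:
    """Zip parallel lists into structured records (hypothetical helper)."""
    # xrefs[i] is typed by xref_types[i]
    xrefs = [
        {"id": curie, "type": xref_type}
        for curie, xref_type in zip(data.get("xrefs", []), data.get("xref_types", []))
    ]
    # synonyms[i] is typed by synonym_types[i]
    synonyms = [
        {"value": value, "type": syn_type}
        for value, syn_type in zip(data.get("synonyms", []), data.get("synonym_types", []))
    ]
    # property_predicates[i] is the key for property_values[i]; keys may repeat
    properties: Dict[str, List[str]] = {}
    for predicate, value in zip(
        data.get("property_predicates", []), data.get("property_values", [])
    ):
        properties.setdefault(predicate, []).append(value)
    return {"xrefs": xrefs, "synonyms": synonyms, "properties": properties}


# Hypothetical node data in the parallel-list layout.
node = {
    "xrefs": ["ex:1", "ex:2"],
    "xref_types": ["skos:exactMatch", "skos:relatedMatch"],
    "synonyms": ["infected"],
    "synonym_types": ["oboInOwl:hasExactSynonym"],
    "property_predicates": ["ex:unit", "ex:unit"],
    "property_values": ["person", "people"],
}
unpacked = unpack_parallel(node)
print(unpacked["xrefs"][0])    # {'id': 'ex:1', 'type': 'skos:exactMatch'}
print(unpacked["properties"])  # {'ex:unit': ['person', 'people']}
```

Grouping repeated predicates into a list matches the Dict[str, List[str]] type of the properties field.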

Set the value of the link field based on the value of the id field. This is run by Pydantic as a post-init hook.

See also: https://stackoverflow.com/questions/54023782/pydantic-make-field-none-in-validator-based-on-other-fields-value

property prefix: str

Get the prefix.

Construct (mira.dkg.construct)

Generate the nodes and edges file for the MIRA domain knowledge graph.

After these are generated, see the /docker folder in the repository for loading a neo4j instance.

Example command for local bulk import on macOS with neo4j 4.x:

    neo4j-admin import --database=mira \
        --delimiter='TAB' \
        --force \
        --skip-duplicate-nodes=true \
        --skip-bad-relationships=true \
        --nodes ~/.data/mira/demo/import/nodes.tsv.gz \
        --relationships ~/.data/mira/demo/import/edges.tsv.gz

Then, restart the neo4j service with Homebrew:

    brew services restart neo4j

class UseCasePaths(use_case, config=None)[source]

Bases: object

A configuration containing the file paths for use case-specific files.

class NodeInfo(curie, prefix, label, synonyms, deprecated, type, definition, xrefs, alts, version, property_predicates, property_values, xref_types, synonym_types)[source]

Bases: NamedTuple

Create new instance of NodeInfo(curie, prefix, label, synonyms, deprecated, type, definition, xrefs, alts, version, property_predicates, property_values, xref_types, synonym_types)

curie: str

Alias for field number 0

prefix: str

Alias for field number 1

label: str

Alias for field number 2

synonyms: str

Alias for field number 3

deprecated: Literal['true', 'false']

Alias for field number 4

type: Literal['class', 'property', 'individual', 'unknown']

Alias for field number 5

definition: str

Alias for field number 6

xrefs: str

Alias for field number 7

alts: str

Alias for field number 8

version: str

Alias for field number 9

property_predicates: str

Alias for field number 10

property_values: str

Alias for field number 11

xref_types: str

Alias for field number 12

synonym_types: str

Alias for field number 13
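Because NodeInfo is a NamedTuple, each field can be read by name or by position, and an instance can be written directly as one row of a tab-separated file. The sketch below shows this with a shortened, hypothetical three-field stand-in (NodeRow) and an in-memory gzip buffer. The header row is illustrative only and is not the header that the Neo4j bulk importer expects.

```python
import csv
import gzip
import io
from typing import NamedTuple


class NodeRow(NamedTuple):
    """Hypothetical three-field stand-in for the 14-field NodeInfo."""

    curie: str
    prefix: str
    label: str


row = NodeRow(curie="ido:0000511", prefix="ido", label="infected population")
# "Alias for field number 0" means the name and the index give the same value.
assert row[0] == row.curie

# Write a gzipped TSV to memory instead of to nodes.tsv.gz on disk.
buffer = io.BytesIO()
with gzip.open(buffer, "wt", newline="") as handle:
    writer = csv.writer(handle, delimiter="\t")
    writer.writerow(NodeRow._fields)  # illustrative header
    writer.writerow(row)              # a NamedTuple is already a sequence of cells

# Read it back to check the round trip.
buffer.seek(0)
with gzip.open(buffer, "rt", newline="") as handle:
    rows = list(csv.reader(handle, delimiter="\t"))
print(rows)
# [['curie', 'prefix', 'label'], ['ido:0000511', 'ido', 'infected population']]
```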

upload_s3(path, *, use_case, bucket='askem-mira', s3_client=None)[source]

Upload the nodes and edges to S3.

Return type:

None

upload_neo4j_s3(use_case_paths)[source]

Upload the nodes and edges to S3.

Return type:

None

Constructing Registry (mira.dkg.construct_registry)

Constants for the MIRA Metaregistry.

get_prefixes(*, nodes_path=None, edges_path=None)[source]

Get the prefixes to use for the slim.

Return type:

Set[str]

Configuration Models (mira.dkg.models)

Configuration for the DKG.

pydantic model Config[source]

Bases: BaseModel

Configuration for a custom metaregistry instance.

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError if the input data cannot be parsed to form a valid model.

JSON schema:
{
   "title": "Config",
   "description": "Configuration for a custom metaregistry instance.",
   "type": "object",
   "properties": {
      "web": {
         "title": "Web",
         "description": "Configuration for the web application",
         "type": "object"
      },
      "registry": {
         "title": "Registry",
         "description": "Custom registry entries",
         "type": "object",
         "additionalProperties": {
            "$ref": "#/definitions/Resource"
         }
      },
      "collections": {
         "title": "Collections",
         "description": "Custom collections",
         "type": "object",
         "additionalProperties": {
            "$ref": "#/definitions/Collection"
         }
      }
   },
   "definitions": {
      "Provider": {
         "title": "Provider",
         "description": "A provider.",
         "type": "object",
         "properties": {
            "code": {
               "title": "Code",
               "description": "A locally unique code within the prefix for the provider",
               "type": "string"
            },
            "name": {
               "title": "Name",
               "description": "Name of the provider",
               "type": "string"
            },
            "description": {
               "title": "Description",
               "description": "Description of the provider",
               "type": "string"
            },
            "homepage": {
               "title": "Homepage",
               "description": "Homepage of the provider",
               "type": "string"
            },
            "uri_format": {
               "title": "URI Format",
               "description": "The URI format string, which must have at least one ``$1`` in it. Note that this field is generic enough to accept IRIs. See the URI specification (https://www.rfc-editor.org/rfc/rfc3986) and IRI specification (https://www.ietf.org/rfc/rfc3987.txt) for more information.",
               "type": "string"
            }
         },
         "required": [
            "code",
            "name",
            "description",
            "homepage",
            "uri_format"
         ]
      },
      "Attributable": {
         "title": "Attributable",
         "description": "An upper-level metadata for a researcher.",
         "type": "object",
         "properties": {
            "name": {
               "title": "Name",
               "description": "The full name of the researcher",
               "type": "string"
            },
            "orcid": {
               "title": "Open Researcher and Contributor Identifier",
               "description": "The Open Researcher and Contributor Identifier (ORCiD) provides researchers with an open, unambiguous identifier for connecting various digital assets (e.g., publications, reviews) across the semantic web. An account can be made in seconds at https://orcid.org.",
               "pattern": "^\\d{4}-\\d{4}-\\d{4}-\\d{3}(\\d|X)$",
               "type": "string"
            },
            "email": {
               "title": "Email address",
               "description": "The email address specific to the researcher.",
               "type": "string"
            },
            "github": {
               "title": "GitHub handle",
               "description": "The GitHub handle enables contacting the researcher on GitHub: the *de facto* version control in the computer sciences and life sciences.",
               "type": "string"
            }
         },
         "required": [
            "name"
         ]
      },
      "Organization": {
         "title": "Organization",
         "description": "Model for organizations.",
         "type": "object",
         "properties": {
            "ror": {
               "title": "Research Organization Registry identifier",
               "description": "ROR identifier for a record about the organization",
               "type": "string"
            },
            "wikidata": {
               "title": "Wikidata identifier",
               "description": "Wikidata identifier for a record about the organization",
               "type": "string"
            },
            "name": {
               "title": "Name",
               "description": "Name of the organization",
               "type": "string"
            },
            "partnered": {
               "title": "Partnered",
               "description": "Has this organization made a specific connection with Bioregistry?",
               "default": false,
               "type": "boolean"
            }
         },
         "required": [
            "name"
         ]
      },
      "Publication": {
         "title": "Publication",
         "description": "Metadata about a publication.",
         "type": "object",
         "properties": {
            "pubmed": {
               "title": "PubMed",
               "description": "The PubMed identifier for the article",
               "type": "string"
            },
            "doi": {
               "title": "DOI",
               "description": "The DOI for the article. DOIs are case insensitive, so these are required by the Bioregistry to be standardized to their lowercase form.",
               "type": "string"
            },
            "pmc": {
               "title": "PMC",
               "description": "The PubMed Central identifier for the article",
               "type": "string"
            },
            "title": {
               "title": "Title",
               "description": "The title of the article",
               "type": "string"
            },
            "year": {
               "title": "Year",
               "description": "The year the article was published",
               "type": "integer"
            }
         }
      },
      "Author": {
         "title": "Author",
         "description": "Metadata for an author.",
         "type": "object",
         "properties": {
            "name": {
               "title": "Name",
               "description": "The full name of the researcher",
               "type": "string"
            },
            "orcid": {
               "title": "Open Researcher and Contributor Identifier",
               "description": "The Open Researcher and Contributor Identifier (ORCiD) provides researchers with an open, unambiguous identifier for connecting various digital assets (e.g., publications, reviews) across the semantic web. An account can be made in seconds at https://orcid.org.",
               "pattern": "^\\d{4}-\\d{4}-\\d{4}-\\d{3}(\\d|X)$",
               "type": "string"
            },
            "email": {
               "title": "Email address",
               "description": "The email address specific to the researcher.",
               "type": "string"
            },
            "github": {
               "title": "GitHub handle",
               "description": "The GitHub handle enables contacting the researcher on GitHub: the *de facto* version control in the computer sciences and life sciences.",
               "type": "string"
            }
         },
         "required": [
            "name",
            "orcid"
         ]
      },
      "Resource": {
         "title": "Resource",
         "description": "Metadata about an ontology, database, or other resource.",
         "type": "object",
         "properties": {
            "prefix": {
               "title": "Prefix",
               "description": "The prefix for this resource",
               "type": "string"
            },
            "name": {
               "title": "Name",
               "description": "The name of the resource",
               "type": "string"
            },
            "description": {
               "title": "Description",
               "description": "A description of the resource",
               "type": "string"
            },
            "pattern": {
               "title": "Pattern",
               "description": "The regular expression pattern for local unique identifiers in the resource",
               "type": "string"
            },
            "uri_format": {
               "title": "URI format string",
               "description": "The URI format string, which must have at least one ``$1`` in it. Note that this field is generic enough to accept IRIs. See the URI specification (https://www.rfc-editor.org/rfc/rfc3986) and IRI specification (https://www.ietf.org/rfc/rfc3987.txt) for more information.",
               "type": "string"
            },
            "uri_format_resolvable": {
               "title": "URI format string resolvable",
               "description": "If false, denotes that the URI format string is known not to be resolvable",
               "type": "boolean"
            },
            "rdf_uri_format": {
               "title": "RDF URI format string",
               "description": "The RDF URI format string, which must have at least one ``$1`` in it. Note that this field is generic enough to accept IRIs. See the URI specification (https://www.rfc-editor.org/rfc/rfc3986) and IRI specification (https://www.ietf.org/rfc/rfc3987.txt) for more information.",
               "type": "string"
            },
            "providers": {
               "title": "Providers",
               "description": "Additional, non-default providers for the resource",
               "type": "array",
               "items": {
                  "$ref": "#/definitions/Provider"
               }
            },
            "homepage": {
               "title": "Homepage",
               "description": "The URL for the homepage of the resource, preferably using HTTPS",
               "type": "string"
            },
            "repository": {
               "title": "Repository",
               "description": "The URL for the repository of the resource",
               "type": "string"
            },
            "contact": {
               "title": "Contact",
               "description": "The contact email address for the resource. This must correspond to a specific person and not be a listserve nor a shared email account.",
               "allOf": [
                  {
                     "$ref": "#/definitions/Attributable"
                  }
               ]
            },
            "owners": {
               "title": "Owners",
               "description": "The owner of the corresponding identifier space. See also https://github.com/biopragmatics/bioregistry/issues/755.",
               "type": "array",
               "items": {
                  "$ref": "#/definitions/Organization"
               }
            },
            "example": {
               "title": "Example",
               "description": "An example local identifier for the resource, explicitly excluding any redundant usage of the prefix in the identifier. For example, a GO identifier should only look like ``1234567`` and not like ``GO:1234567``",
               "type": "string"
            },
            "example_extras": {
               "title": "Example Extras",
               "description": "Extra example identifiers",
               "type": "array",
               "items": {
                  "type": "string"
               }
            },
            "example_decoys": {
               "title": "Example Decoys",
               "description": "Extra example identifiers that explicitly fail regex tests",
               "type": "array",
               "items": {
                  "type": "string"
               }
            },
            "license": {
               "title": "License",
               "description": "The license for the resource",
               "type": "string"
            },
            "version": {
               "title": "Version",
               "description": "The version for the resource",
               "type": "string"
            },
            "part_of": {
               "title": "Part Of",
               "description": "An annotation between this prefix and a super-prefix. For example, ``chembl.compound`` is a part of ``chembl``.",
               "type": "string"
            },
            "provides": {
               "title": "Provides",
               "description": "An annotation between this prefix and a prefix for which it is redundant. For example, ``ctd.gene`` has been given a prefix by Identifiers.org, but it actually just reuses identifiers from ``ncbigene``, so ``ctd.gene`` provides ``ncbigene``.",
               "type": "string"
            },
            "download_owl": {
               "title": "OWL Download URL",
               "description": "The URL to download the resource as an ontology encoded in the OWL format. More information about this format can be found at https://www.w3.org/TR/owl2-syntax/.",
               "type": "string"
            },
            "download_obo": {
               "title": "OBO Download URL",
               "description": "The URL to download the resource as an ontology encoded in the OBO format. More information about this format can be found at https://owlcollab.github.io/oboformat/doc/obo-syntax.html.",
               "type": "string"
            },
            "download_json": {
               "title": "OBO Graph JSON Download URL",
               "description": "The URL to download the resource as an ontology encoded in the OBO Graph JSON format. More information about this format can be found at https://github.com/geneontology/obographs.",
               "type": "string"
            },
            "download_rdf": {
               "title": "RDF Download URL",
               "description": "The URL to download the resource as an RDF file, in one of many formats.",
               "type": "string"
            },
            "banana": {
               "title": "Banana",
               "description": "The `banana` is a generalization of the concept of the \"namespace embedded in local unique identifier\". Many OBO foundry ontologies use the redundant uppercased name of the ontology in the local identifier, such as the Gene Ontology, which makes the prefixes have a redundant usage as in ``GO:GO:1234567``. The `banana` tag explicitly annotates the part in the local identifier that should be stripped, if found. While the Bioregistry automatically knows how to handle all OBO Foundry ontologies' bananas because the OBO Foundry provides the \"preferredPrefix\" field, the banana can be annotated on non-OBO ontologies to more explicitly write the beginning part of the identifier that should be stripped. This allowed for solving one of the long-standing issues with the Identifiers.org resolver (e.g., for ``oma.hog``; see https://github.com/identifiers-org/identifiers-org.github.io/issues/155) as well as better annotate new entries, such as SwissMap Lipids, which have the prefix ``swisslipid`` but have the redundant information ``SLM:`` in the beginning of identifiers. Therefore, ``SLM:`` is the banana.",
               "type": "string"
            },
            "banana_peel": {
               "title": "Banana Peel",
               "description": "Delimiter used in banana",
               "type": "string"
            },
            "deprecated": {
               "title": "Deprecated",
               "description": "A flag denoting if this resource is deprecated. Currently, this is a blanket term that covers cases when the prefix is no longer maintained, when it has been rolled into another resource, when the website related to the resource goes down, or any other reason that it's difficult or impossible to find full metadata on the resource. If this is set to true, please add a comment explaining why. This flag will override annotations from the OLS, OBO Foundry, and others on the deprecation status, since they often disagree and are very conservative in calling dead resources.",
               "type": "boolean"
            },
            "mappings": {
               "title": "Mappings",
               "description": "A dictionary of metaprefixes (i.e., prefixes for registries) to prefixes in external registries. These also correspond to the registry-specific JSON fields in this model, like the ``miriam`` field.",
               "type": "object",
               "additionalProperties": {
                  "type": "string"
               }
            },
            "synonyms": {
               "title": "Synonyms",
               "description": "A list of synonyms for the prefix of this resource. These are used in normalization of prefixes and are a useful reference tool for prefixes that are written many ways. For example, ``snomedct`` has many synonyms including typos like ``SNOWMEDCT``, lexical variants like ``SNOMED_CT``, version-variants like ``SNOMEDCT_2010_1_31``, and tons of other nonsense like ``SNOMEDCTCT``.",
               "type": "array",
               "items": {
                  "type": "string"
               }
            },
            "keywords": {
               "title": "Keywords",
               "description": "A list of keywords for the resource",
               "type": "array",
               "items": {
                  "type": "string"
               }
            },
            "references": {
               "title": "References",
               "description": "A list of URLs to also see, such as publications describing the resource",
               "type": "array",
               "items": {
                  "type": "string"
               }
            },
            "publications": {
               "title": "Publications",
               "description": "A list of publications describing the resource",
               "type": "array",
               "items": {
                  "$ref": "#/definitions/Publication"
               }
            },
            "appears_in": {
               "title": "Appears In",
               "description": "A list of prefixes that use this resource for xrefs, provenance, etc.",
               "type": "array",
               "items": {
                  "type": "string"
               }
            },
            "depends_on": {
               "title": "Depends On",
               "description": "A list of prefixes that this resource depends on, e.g., ontologies that import each other.",
               "type": "array",
               "items": {
                  "type": "string"
               }
            },
            "namespace_in_lui": {
               "title": "Namespace Embedded in Local Unique Identifier",
               "description": "A flag denoting if the namespace is embedded in the LUI (if this is true and it is not accompanied by a banana, assume that the banana is the prefix in all caps plus a colon, as is standard in OBO). Currently this flag is only used to override identifiers.org in the case of ``gramene.growthstage``, ``oma.hog``, and ``vario``.",
               "type": "boolean"
            },
            "no_own_terms": {
               "title": "No Own Terms",
               "description": "A flag denoting if the resource does not mint its own identifiers. Omission or explicit marking as false means that the resource does have its own terms. This is most applicable to ontologies, specifically application ontologies, which only reuse terms from others. One example is ChIRO.",
               "type": "boolean"
            },
            "comment": {
               "title": "Comment",
               "description": "A field for a free text comment",
               "type": "string"
            },
            "contributor": {
               "title": "Contributor",
               "description": "The contributor of the prefix to the Bioregistry, including at a minimum their name and ORCiD and optionally their email address and GitHub handle. All entries curated through the Bioregistry GitHub Workflow must contain this field.",
               "allOf": [
                  {
                     "$ref": "#/definitions/Author"
                  }
               ]
            },
            "contributor_extras": {
               "title": "Contributor Extras",
               "description": "Additional contributors besides the original submitter.",
               "type": "array",
               "items": {
                  "$ref": "#/definitions/Author"
               }
            },
            "reviewer": {
               "title": "Reviewer",
               "description": "The reviewer of the prefix to the Bioregistry, including at a minimum their name and ORCiD and optionally their email address and GitHub handle. All entries curated through the Bioregistry GitHub Workflow should contain this field pointing to the person who reviewed it on GitHub.",
               "allOf": [
                  {
                     "$ref": "#/definitions/Author"
                  }
               ]
            },
            "proprietary": {
               "title": "Proprietary",
               "description": "A flag to denote if this database is proprietary and therefore cannot be included in normal quality control checks, nor can it be resolved. Omission or explicit marking as false means that the resource is not proprietary.",
               "type": "boolean"
            },
            "has_canonical": {
               "title": "Has Canonical",
               "description": "If this shares an IRI with another entry, maps to the entry that should be considered canonical",
               "type": "string"
            },
            "preferred_prefix": {
               "title": "Preferred Prefix",
               "description": "An annotation of stylization of the prefix. This appears in OBO ontologies like FBbt as well as databases like NCBIGene. If it's not given, then assume that the normalized prefix used in the Bioregistry is canonical.",
               "type": "string"
            },
            "twitter": {
               "title": "Twitter",
               "description": "The twitter handle for the project",
               "type": "string"
            },
            "mastodon": {
               "title": "Mastodon",
               "description": "The mastodon handle for the project",
               "type": "string"
            },
            "github_request_issue": {
               "title": "Github Request Issue",
               "description": "The GitHub issue for the new prefix request",
               "type": "integer"
            },
            "logo": {
               "title": "Logo",
               "description": "The URL of the logo for the project/resource",
               "type": "string"
            },
            "miriam": {
               "title": "Miriam",
               "type": "object"
            },
            "n2t": {
               "title": "N2T",
               "type": "object"
            },
            "prefixcommons": {
               "title": "Prefixcommons",
               "type": "object"
            },
            "wikidata": {
               "title": "Wikidata",
               "type": "object"
            },
            "go": {
               "title": "Go",
               "type": "object"
            },
            "obofoundry": {
               "title": "Obofoundry",
               "type": "object"
            },
            "bioportal": {
               "title": "Bioportal",
               "type": "object"
            },
            "ecoportal": {
               "title": "Ecoportal",
               "type": "object"
            },
            "agroportal": {
               "title": "Agroportal",
               "type": "object"
            },
            "cropoct": {
               "title": "Cropoct",
               "type": "object"
            },
            "ols": {
               "title": "Ols",
               "type": "object"
            },
            "aberowl": {
               "title": "Aberowl",
               "type": "object"
            },
            "ncbi": {
               "title": "Ncbi",
               "type": "object"
            },
            "uniprot": {
               "title": "Uniprot",
               "type": "object"
            },
            "biolink": {
               "title": "Biolink",
               "type": "object"
            },
            "cellosaurus": {
               "title": "Cellosaurus",
               "type": "object"
            },
            "ontobee": {
               "title": "Ontobee",
               "type": "object"
            },
            "cheminf": {
               "title": "Cheminf",
               "type": "object"
            },
            "fairsharing": {
               "title": "Fairsharing",
               "type": "object"
            },
            "biocontext": {
               "title": "Biocontext",
               "type": "object"
            },
            "edam": {
               "title": "Edam",
               "type": "object"
            },
            "re3data": {
               "title": "Re3Data",
               "type": "object"
            },
            "hl7": {
               "title": "Hl7",
               "type": "object"
            },
            "bartoc": {
               "title": "BARTOC",
               "type": "object"
            },
            "rrid": {
               "title": "RRID",
               "type": "object"
            },
            "lov": {
               "title": "LOV",
               "type": "object"
            },
            "zazuko": {
               "title": "Zazuko",
               "type": "object"
            },
            "togoid": {
               "title": "Togoid",
               "type": "object"
            },
            "integbio": {
               "title": "Integbio",
               "type": "object"
            },
            "pathguide": {
               "title": "Pathguide",
               "type": "object"
            }
         },
         "required": [
            "prefix"
         ]
      },
      "Collection": {
         "title": "Collection",
         "description": "A collection of resources.",
         "type": "object",
         "properties": {
            "identifier": {
               "title": "Identifier",
               "description": "The collection's identifier",
               "type": "string"
            },
            "name": {
               "title": "Name",
               "description": "The name of the collection",
               "type": "string"
            },
            "description": {
               "title": "Description",
               "description": "A description of the collection",
               "type": "string"
            },
            "resources": {
               "title": "Resources",
               "description": "A list of prefixes of resources appearing in the collection",
               "type": "array",
               "items": {
                  "type": "string"
               }
            },
            "authors": {
               "title": "Authors",
               "description": "A list of authors/contributors to the collection",
               "type": "array",
               "items": {
                  "$ref": "#/definitions/Author"
               }
            },
            "context": {
               "title": "Context",
               "description": "The JSON-LD context's name",
               "type": "string"
            },
            "references": {
               "title": "References",
               "description": "URL references",
               "type": "array",
               "items": {
                  "type": "string"
               }
            }
         },
         "required": [
            "identifier",
            "name",
            "description",
            "resources",
            "authors"
         ]
      }
   }
}

Fields:
field collections: Mapping[str, Collection] [Optional]

Custom collections

field registry: Mapping[str, Resource] [Optional]

Custom registry entries

field web: Mapping[str, Any] [Optional]

Configuration for the web application
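
The `banana`, `banana_peel`, and `namespace_in_lui` fields in the resource definition of the schema above describe how a redundant namespace is stripped from a local identifier. The following is a minimal sketch of that idea, not the Bioregistry implementation: the helper `strip_banana` and its signature are hypothetical, and it assumes a colon as the default peel and accepts the banana with or without the trailing delimiter.

```python
from typing import Optional


def strip_banana(
    local_id: str,
    prefix: str,
    banana: Optional[str] = None,
    banana_peel: str = ":",
    namespace_in_lui: bool = False,
) -> str:
    """Remove a redundant namespace (the "banana") from a local identifier.

    Hypothetical helper for illustration only.
    """
    if banana is None and namespace_in_lui:
        # Per the schema text: fall back to the prefix in all caps.
        banana = prefix.upper()
    if banana is None:
        return local_id
    # Accept both "SLM" and "SLM:" as the annotated banana.
    redundant = banana if banana.endswith(banana_peel) else banana + banana_peel
    if local_id.startswith(redundant):
        return local_id[len(redundant):]
    return local_id


print(strip_banana("GO:1234567", prefix="go", banana="GO"))
print(strip_banana("SLM:000000341", prefix="swisslipid", banana="SLM:"))
print(strip_banana("GO:0000001", prefix="go", namespace_in_lui=True))
```

Identifiers that do not start with the banana are returned unchanged, which matches the "if found" wording in the field description.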

pydantic model Xref[source]

Bases: BaseModel

Represents a typed cross-reference.

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError if the input data cannot be parsed to form a valid model.

Show JSON schema
{
   "title": "Xref",
   "description": "Represents a typed cross-reference.",
   "type": "object",
   "properties": {
      "id": {
         "title": "Id",
         "description": "The CURIE of the cross reference",
         "type": "string"
      },
      "type": {
         "title": "Type",
         "description": "The CURIE for the cross reference predicate",
         "example": "skos:exactMatch",
         "type": "string"
      }
   },
   "required": [
      "id",
      "type"
   ]
}

Fields:
field id: str [Required]

The CURIE of the cross reference

field type: str [Required]

The CURIE for the cross reference predicate

pydantic model Synonym[source]

Bases: BaseModel

Represents a typed synonym.

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError if the input data cannot be parsed to form a valid model.

Show JSON schema
{
   "title": "Synonym",
   "description": "Represents a typed synonym.",
   "type": "object",
   "properties": {
      "value": {
         "title": "Value",
         "description": "The text of the synonym",
         "type": "string"
      },
      "type": {
         "title": "Type",
         "description": "The CURIE for the synonym predicate",
         "example": "oboInOwl:hasExactSynonym",
         "type": "string"
      }
   },
   "required": [
      "value",
      "type"
   ]
}

Fields:
field type: str [Required]

The CURIE for the synonym predicate

field value: str [Required]

The text of the synonym
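
To make the relationship between a synonym's `type` and its OBO specificity concrete, here is a small sketch. It assumes the synonym type is one of the keys of the `SYNONYM_TYPES` constant documented under `mira.dkg.askemo` (copied below). `SynonymSketch` is a standard-library stand-in for the pydantic model and `obo_specificity` is a hypothetical helper; neither is part of MIRA.

```python
from dataclasses import dataclass

# Copied from the SYNONYM_TYPES constant documented for mira.dkg.askemo.
SYNONYM_TYPES = {
    "oboInOwl:hasBroadSynonym": "BROAD",
    "oboInOwl:hasExactSynonym": "EXACT",
    "oboInOwl:hasNarrowSynonym": "NARROW",
    "oboInOwl:hasRelatedSynonym": "RELATED",
    "referenced_by_latex": "RELATED",
    "referenced_by_symbol": "RELATED",
}


@dataclass(frozen=True)
class SynonymSketch:
    """Stand-in for the Synonym model: both fields are required."""

    value: str
    type: str


def obo_specificity(synonym: SynonymSketch) -> str:
    """Return the OBO specificity for a synonym, or raise on unknown types."""
    try:
        return SYNONYM_TYPES[synonym.type]
    except KeyError:
        raise ValueError(f"unknown synonym type: {synonym.type!r}") from None


# The synonym text here is made up for illustration.
print(obo_specificity(SynonymSketch("example synonym", "oboInOwl:hasExactSynonym")))
```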

Units (mira.dkg.units)

get_unit_terms()[source]

Get tuples for each unit.

query_wikidata(sparql)[source]

Query Wikidata’s SPARQL service.

Parameters:

sparql (str) – A SPARQL query string

Return type:

List[Mapping[str, Any]]

Returns:

A list of bindings
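
The shape of the returned bindings follows the SPARQL JSON results format. The sketch below shows how a query could be sent to the public Wikidata endpoint and how bindings are extracted from a response. It is not the body of `query_wikidata`: the helpers `build_request` and `extract_bindings` and the user-agent string are assumptions, and the request is only constructed, never sent.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, List, Mapping

# Public Wikidata SPARQL endpoint.
WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"


def build_request(sparql: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for a SPARQL query."""
    query = urllib.parse.urlencode({"query": sparql, "format": "json"})
    return urllib.request.Request(
        f"{WIKIDATA_ENDPOINT}?{query}",
        headers={
            "Accept": "application/sparql-results+json",
            "User-Agent": "dkg-docs-example/0.1",  # placeholder value
        },
    )


def extract_bindings(payload: Mapping[str, Any]) -> List[Mapping[str, Any]]:
    """Pull the list of bindings out of a SPARQL JSON results document."""
    return list(payload["results"]["bindings"])


# A hand-written response in the SPARQL JSON results format.
sample = json.loads("""
{"head": {"vars": ["item", "itemLabel"]},
 "results": {"bindings": [
   {"item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q11573"},
    "itemLabel": {"type": "literal", "xml:lang": "en", "value": "metre"}}
 ]}}
""")

request = build_request("SELECT ?item WHERE { ?item ?p ?o } LIMIT 1")
print(request.full_url)
for row in extract_bindings(sample):
    print(row["item"]["value"], row["itemLabel"]["value"])
```

Each binding maps a variable name from the query to a dictionary with at least `type` and `value` keys.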

update_unit_names_resource()[source]

Update a resource file with all unit names.

App Utilities (mira.dkg.utils)

Utilities and constants for the MIRA app.

class MiraState(client, grounder, refinement_closure, lexical_dump, vectors)[source]

Bases: object

Represents the state associated with the MIRA app.

PREFIXES = ['oboinowl', 'owl', 'rdfs', 'bfo', 'caro', 'hp', 'disdriv', 'symp', 'ido', 'vo', 'ovae', 'oae', 'trans', 'doid', 'apollosv', 'efo', 'ncit', 'cemo', 'vido', 'cido', 'idocovid19', 'idomal', 'vsmo', 'covoc', 'probonto', 'geonames']

A list of all prefixes used in MIRA

DKG_REFINER_RELS = ['subclassof', 'part_of']

A list of all relation types that are considered refinement relations

DOCKER_FILES_ROOT = PosixPath('/sw')

The root path of the MIRA app when running in a container

Web Client (mira.dkg.web_client)

web_client(endpoint, method, query_json=None, api_url=None)[source]

A wrapper for sending requests to the REST API and returning the results

Parameters:
  • endpoint (str) – The endpoint to send the request to.

  • method (Literal['get', 'post']) – Which method to use. Must be either ‘get’ or ‘post’.

  • query_json (Union[Dict[str, Any], List[Tuple[str, Any]], None]) –

    The data to send with the request. This parameter is required if method is ‘post’. If method is ‘get’ and the endpoint expects a list, this parameter must be a list of key-value tuples, i.e. [(key, value)], as per the requests API (see https://requests.readthedocs.io/en/latest/api/#requests.get). To provide a list for one parameter, repeat the key with each value of the list.

    Example: If the endpoint expects key1 to be a list and key2 to be a single value, sending [(key1, value1), (key1, value2), (key2, value3)] as query_json results in the endpoint receiving key1=[value1, value2] and key2=value3.

  • api_url (Optional[str]) – Provide the base URL to the REST API. Use this argument to override the default set in MIRA_REST_URL or rest_url from the config file.

Return type:

Union[List[Dict[str, Any]], Dict[str, Any], None]

Returns:

The data sent back from the endpoint as a json, unless the response is empty, in which case None is returned.
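
The repeated-key convention described for `query_json` can be seen with the standard library alone. This sketch does not call `web_client` or the REST API; it only shows how a list of key-value tuples is encoded into a query string and how a server-side parser would read it back.

```python
from urllib.parse import parse_qs, urlencode

# One tuple per value; key1 is repeated to express a list.
query_json = [("key1", "value1"), ("key1", "value2"), ("key2", "value3")]

query_string = urlencode(query_json)
print(query_string)  # key1=value1&key1=value2&key2=value3

# Repeated keys are collected into lists when the query string is parsed.
received = parse_qs(query_string)
print(received)
```

Note that `parse_qs` returns a list for every key, so it is up to the endpoint to treat `key2` as a single value.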

get_relations_web(relations_model, api_url=None)[source]

Get relations based on the query contained in the RelationQuery model

A wrapper that calls the REST API’s get_relations endpoint.

Parameters:
  • relations_model (RelationQuery) – An instance of a RelationQuery BaseModel.

  • api_url (Optional[str]) – Use this parameter to specify the REST API base url or to override the url set in the environment or the config.

Return type:

Union[List[RelationResponse], List[FullRelationResponse]]

Returns:

If any relation exists, a list of RelationResponse models or FullRelationResponse models if a full query was requested.

Examples

To populate the RelationQuery BaseModel, follow this example:

from mira.dkg.api import RelationQuery
from mira.dkg.web_client import get_relations_web

# Find vo:0001243 relations whose target is ncbitaxon:10090
relation_query = RelationQuery(target_curie="ncbitaxon:10090", relations="vo:0001243")
relations = get_relations_web(relations_model=relation_query)
# Show the first five results
print(relations[:5])

get_entity_web(curie, api_url=None)[source]

Get information about an entity based on its compact URI (CURIE)

A wrapper that calls the REST API’s entity endpoint.

Parameters:
  • curie (str) – The curie for an entity to get information about.

  • api_url (Optional[str]) – Use this parameter to specify the REST API base url or to override the url set in the environment or the config.

Return type:

Optional[Entity]

Returns:

An Entity model if the entity exists in the graph; otherwise None.

get_entities_web(curies)[source]

Get information about multiple entities (e.g., their names, descriptions, synonyms, alternative identifiers, database cross-references, etc.) based on their respective compact URIs (CURIEs).

A wrapper that calls the REST API’s entities endpoint.

Parameters:

curies (List[str]) – A list of curies for entities to get information about.

Return type:

List[Union[AskemEntity, Entity]]

Returns:

Returns a list of Entity models, if the entities exist in the graph.

get_lexical_web(api_url=None)[source]

Get lexical information for all entities in the graph

A wrapper that calls the REST API’s lexical endpoint.

Parameters:

api_url (Optional[str]) – Use this parameter to specify the REST API base url or to override the url set in the environment or the config.

Return type:

List[Dict[str, Any]]

Returns:

A list of all entities in the graph.

ground_web(text, namespaces=None, api_url=None)[source]

Ground text with Gilda to an ontology identifier

A wrapper that calls the REST API’s grounding POST endpoint

Parameters:
  • text (str) – The text to be grounded.

  • namespaces (Optional[List[str]]) – A list of namespaces to filter groundings to. Optional. Example: [“do”, “mondo”, “ido”].

  • api_url (Optional[str]) – Use this parameter to specify the REST API base url or to override the url set in the environment or the config.

Return type:

Optional[GroundResults]

Returns:

If the query results in at least one grounding, a GroundResults model is returned with all the results.

search_web(term, limit=25, offset=0, api_url=None)[source]

Get nodes based on a search to their name/synonyms

A wrapper that calls the REST API’s search endpoint

Parameters:
  • term (str) – The term to search for

  • limit (int) – Limit the number of results to this number. Default: 25.

  • offset (int) – The offset for the results

  • api_url (Optional[str]) – Use this parameter to specify the REST API base url or to override the url set in the environment or the config.

Return type:

List[Entity]

Returns:

A list of the matching entities.

get_transitive_closure_web(relation_types=None, api_url=None)[source]

Get a transitive closure for the given relation type(s)

Parameters:
  • relation_types (Optional[List[str]]) – A list of relation types to get the transitive closure for. Optional. Default is [“subclassof”, “part_of”].

  • api_url (Optional[str]) – Use this parameter to specify the REST API base url or to override the url set in the environment or the config.

Return type:

Set[Tuple[str, str]]

Returns:

A set of tuples of CURIEs representing the transitive closure of the relations of the requested type(s). Each pair follows the direction of the relation, i.e., it is ordered as (source, reachable target). For relations that point towards taxonomical parents (e.g., subclassof, part_of), the pairs are therefore (taxonomical child, taxonomical ancestor).
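
For readers unfamiliar with the term, a transitive closure over child-to-parent relations can be sketched as follows. This is an illustration of the returned data shape, not the way the REST API computes it; the function `transitive_closure` and the `ex:` CURIEs are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def transitive_closure(edges: Iterable[Tuple[str, str]]) -> Set[Tuple[str, str]]:
    """Compute all (child, ancestor) pairs from direct (child, parent) edges."""
    parents: Dict[str, Set[str]] = defaultdict(set)
    for child, parent in edges:
        parents[child].add(parent)

    closure: Set[Tuple[str, str]] = set()
    for start in list(parents):
        stack = list(parents[start])
        seen: Set[str] = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue  # also protects against cycles
            seen.add(node)
            closure.add((start, node))
            stack.extend(parents.get(node, ()))
    return closure


# Made-up edges: ex:a -> ex:b -> ex:c and ex:d -> ex:c.
direct = [("ex:a", "ex:b"), ("ex:b", "ex:c"), ("ex:d", "ex:c")]
print(sorted(transitive_closure(direct)))
```

The pair `("ex:a", "ex:c")` appears in the result even though no direct edge connects the two, which is what distinguishes the closure from the set of direct relations.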

is_ontological_child_web(child_curie, parent_curie, api_url=None)[source]

Check if one curie is a child term of another curie

Parameters:
  • child_curie (str) – The entity, identified by its CURIE, that is assumed to be a child term

  • parent_curie (str) – The entity, identified by its CURIE, that is assumed to be a parent term

  • api_url (Optional[str]) – Use this parameter to specify the REST API base url or to override the url set in the environment or the config

Return type:

bool

Returns:

True if the assumption that child_curie is an ontological child of parent_curie holds
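
Conceptually, this check amounts to asking whether a (child, ancestor) pair is present in a transitive closure set. The sketch below shows that idea on a hand-written closure. How the REST API performs the check internally is not stated here, so `is_ontological_child` and the `ex:` CURIEs are purely illustrative.

```python
from typing import Set, Tuple

# A made-up closure set of (child, ancestor) pairs.
closure: Set[Tuple[str, str]] = {
    ("ex:a", "ex:b"),
    ("ex:a", "ex:c"),
    ("ex:b", "ex:c"),
}


def is_ontological_child(
    child_curie: str, parent_curie: str, pairs: Set[Tuple[str, str]]
) -> bool:
    """Return True if (child, parent) appears in the closure set."""
    return (child_curie, parent_curie) in pairs


print(is_ontological_child("ex:a", "ex:c", closure))
print(is_ontological_child("ex:c", "ex:a", closure))
```

The relation is directional, so swapping the two arguments generally changes the answer.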

exception MissingBaseUrlError[source]

Bases: ValueError

Raised when the base url for the REST API is missing
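
The `api_url` parameter of the functions above overrides the value from `MIRA_REST_URL` or `rest_url` in the config file, and this exception signals that none of them supplied a base URL. The sketch below illustrates that fallback chain. `resolve_api_url` and `MissingBaseUrlSketch` are hypothetical names, the URLs are placeholders, and the relative order of environment variable and config file is an assumption.

```python
from typing import Mapping, Optional


class MissingBaseUrlSketch(ValueError):
    """Stand-in for MissingBaseUrlError; it also derives from ValueError."""


def resolve_api_url(
    api_url: Optional[str],
    environ: Mapping[str, str],
    config: Mapping[str, str],
) -> str:
    """Pick the REST API base URL: argument, then environment, then config."""
    # Assumption: the environment variable is consulted before the config file.
    base = api_url or environ.get("MIRA_REST_URL") or config.get("rest_url")
    if not base:
        raise MissingBaseUrlSketch("no base URL for the REST API was provided")
    return base.rstrip("/")


# Placeholder URLs for illustration only.
print(resolve_api_url(None, {"MIRA_REST_URL": "http://localhost:8000/"}, {}))
print(resolve_api_url("http://example.org/api", {}, {"rest_url": "http://unused"}))
```

Because the stand-in derives from `ValueError`, callers can catch it with either the specific class or a plain `except ValueError`.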