Chuvash Toponymy Database v.0.1

db_docs.md at [fca9ddb8f7]
Login

db_docs.md at [fca9ddb8f7]

File data/csv/db_experiment/v0.8/db_docs.md artifact 7969c79cb7 part of check-in fca9ddb8f7


**Toponymic Database Documentation**

## Overview
This database is designed to systematically store, analyze and eventually visualize historical and contemporary toponyms (place names) with a structured approach, focusing on spatial, linguistic, and temporal aspects. The structure consists of three interlinked tables:
1. **Spatial Table** - Stores geographical locations and metadata.
2. **Linguistic Table** - Captures toponyms, their various variants and linguistic attributes like pronunciation and etymology.
3. **Temporal Table** - Documents historical changes in toponyms over time.

---

## **1. Spatial Table**

### **Purpose:**
The spatial table contains unique geographic entities, their locations, and relevant metadata.

### **Columns:**
- **`ID`** *(int, PK)*: Unique identifier for each geographic object. In the very beginning the order followed an original alphabetically sorted table, 2023_jun_14_dubanov_toponymy, based on a book (Дубанов И.С. «Топонимический словарь Чувашии. Географические названия и термины») order, now it is basically free.
- **`LAT`, `LON`** *(float, nullable)*: Estimated geographic coordinates derived from historical and modern maps (see http://www.etomesto.ru/ , https://retromap.ru/ and various other resources).
- **`OFFNAME`** *(string)*: The most "official" available name for the location (either current or historical).
- **`LANG`** *(string)*: The language of the name, using codes such as `RUS`, `CHU`, `HIM`, `MEM`, `ERZ`, `TAT`, `UNK`.
- **`CLASS`** *(string, nullable)*: Specifies the toponymic category, e.g., oikonym (settlement), hydronym (water body), dromonym (road), etc.
- **`TYPE`** *(string, nullable)*: Specifies the sub-category, e.g., village, city (for oikonyms), river, stream (for hydronyms), etc.
- **`DISTRICT`** *(string, nullable)*: Modern administrative district (e.g., to differentiate places with the same name).
- **`DOUBT`** *(int, nullable)*: Certainty level of assigned coordinates (empty = "I'm sure", 1 = "The coordinates are doubtful").
- **`LANDMARK`** *(text, nullable)*: Relevant for microtoponyms; describes nearby visible features to help locate the object.
- **`COMMENTS`** *(text, nullable)*: Free-text notes for human workers.
- **`OTHER`** *(text, nullable)*: Reserved for future use.

---

## **2. Linguistic Table**

### **Purpose:**
The linguistic table stores different names (and their variants) of the same geographic object across different languages and sources, etymological information and other linguistic metadata.

### **Columns:**
- **`ID`** *(int, PK)*: Unique identifier for each toponym entry, independent from the spatial table.
- **`SPATID`** *(int, FK → Spatial.ID)*: Links the toponym to its corresponding spatial object.
- **`DOUSPAT`** *(int, nullable)*: Certainty of name relation to the spatial object (empty = "I'm sure", 1 = "The spatial link is doubtful").
- **`MAINID`** *(int, FK → Linguistic.ID)*: Represents the "main" name in a "nest" of interconnected toponyms (sometimes grouping is necessary, many sources give several somehow connected names for a single spatial object and we want to reflect this connections in database).
- **`TOPONYM`** *(string)*: The attested name as recorded from sources.
- **`TOPFORMS`** *(text, nullable)*: Holds different attested spelling variations (must contain **the same language text** info as the **`TOPONYM`** field) of the name from multiple sources and free text comments of database workers.
- **`DOUTOPO`** *(int, nullable)*: Certainty of the given **`TOPONYM`** form, not of the **`TOPFORMS`**, only of the **`TOPONYM`** column (empty = "I'm sure", 1 = "It is doubtful").
- **`LANG`** *(string)*: The language of the toponym, following the predefined codes.
- **`DOULANG`** *(int, nullable)*: Certainty of the language of the toponym (empty = "I'm sure", 1 = "It is doubtful").
- **`PRONUNC`** *(text, nullable)*: Reserved for pronunciation information (format yet to be decided).
- **`DOUPRON`** *(int, nullable)*: Certainty of the pronunciation information (empty = "I'm sure", 1 = "It is doubtful").
- **`ETYM`** *(text, nullable)*: Etymological explanation of the name's origin. It is always doubtful.
- **`ORIGIN`** *(string, nullable)*: Language from which the name originated (e.g., `CHU`, `RUS`, `TAT`, `OTH`, `UNK`). This one is always doubtful too.
- **`COMMENTS`** *(text, nullable)*: Free-text notes.
- **`OTHER`** *(text, nullable)*: Reserved for future use.

---

## **3. Temporal Table**

### **Purpose:**
This table records historical changes in toponyms, tracking their usage over time.

### **Columns:**
- **`ID`** *(int, PK)*: Unique identifier for each temporal record.
- **`LINGID`, `LINGNAME`** *(Composite FK → Linguistic (ID, TOPONYM))*: Together, these fields reference a specific toponym in the linguistic table.
- **`START`** *(int)*: Earliest recorded use of the name (the year of first mention according to some source: book or map or something else), full source text goes to **`FULLTEXT`** column.
- **`DOUSTART`** *(int, nullable)*: Certainty level for the `START` date (empty = "I'm sure", 1 = "It is doubtful").
- **`END`** *(int, nullable)*: The last recorded use of the name. If the object still exists, this field remains empty. The source text is stored in the **FULLTEXT** column.
- **`DOUEND`** *(int, nullable)*: Certainty level for the `END` date (empty = "I'm sure", 1 = "It is doubtful").
- **`EVENT`** *(string)*: Categorization of historical change, with predefined values:
  - `MERGEIN`: The object was absorbed into another.
  - `ACTIVE`: The name is still in use.
  - `RENAME`: The object was renamed.
  - `CEASE`: The object ceased to exist.
- **`OBJID`** *(int, nullable)*, **`OBJNAME`** *(text, nullable)*: Context-dependent:
  - **ACTIVE**: Both empty.
  - **RENAME**: Stores the information about the **new** name of the object, it is a reference to Linguistic (ID, TOPONYM).
  - **MERGEIN**: Stores the information about the **absorbing** object, again, it is a reference to Linguistic (ID, TOPONYM).
  - **CEASE**: Both empty.
- **`COMMENTS`** *(text, nullable)*: Free-text notes.
- **`OTHER`** *(text, nullable)*: Reserved for future use.
- **`FULLTEXT`** *(text)*: A complete textual description of the historical event, providing transparency by explicitly referencing the relevant source(s), for instance, the Source ID (see **4. Sources Table**) along with page numbers or other identifying details when applicable.

---

## **4. Sources Table**

### **Purpose:**
This table consolidates bibliographic or archival references that substantiate historical information in the **3. Temporal Table**. Each entry describes a single source, such as a book, article, map, or manuscript. Linking these sources to `Temporal` ensures that statements like “this settlement was called so-and-so from year X to year Y” can be properly backed by historical documentation.

### **Columns:**
- **`ID`** *(string, PK)*: Unique identifier for each source.
- **`TYPE`** *(string)*: The type of source (e.g., `BOOK`, `ARTICLE`, `MAP`, `ARCHIVE`, `WEB`).
- **`AUTHOR`** *(string, nullable)*: Author(s) or compiler(s).
- **`TITLE`** *(text)*: Title of the source (book name, article title, map reference, etc.).
- **`YEAR`** *(int, nullable)*: Publication or creation year of the source.
- **`PUBLISHER`** *(string, nullable)*: Publisher or institution name, if applicable.
- **`CITATION`** *(text, nullable)*: Full bibliographic reference or archival notation (e.g., “И. С. Дубанов, 2020, ...”).
- **`COMMENTS`** *(text, nullable)*: Additional remarks or clarifications about the source.
- **`PDF`** *(string, nullable)*: Is PDF file accessible somewhere? It is simply "yes" or "no".
- **`OTHER`** *(text, nullable)*: Reserved for future use.

### **Usage & Relationship with Temporal Table**:
- Each record in the **Temporal** table can link to one or more **Sources** to document the basis for historical data through free text in the **`FULLTEXT`** field.

---

## **Relationships and Interactions**
- **`SPATID` (Linguistic) → `ID` (Spatial)**: Links toponyms to geographic locations.
- **`LINGID` (Temporal) → `ID` (Linguistic)** and **`LINGNAME` (Temporal) → `TOPONYM` (Linguistic)**: Tracks historical changes of a specific name.
- **`OBJID` (Temporal) → `ID` (Linguistic)** and **`OBJNAME` (Temporal) → `TOPONYM` (Linguistic)**: Records specifically name changes or mergers.

---

## **Next Steps**
- Finalize the format for the `PRONUNC` field.

---

## **Various**
- Comment fields. It is assumed (in order to clarify the workflow for specific fields in the database and to more quickly achieve understanding between workers) that comments are filled in a standardized way. First, the employee puts (if there is a previous comment, then first a space, a vertical bar, another space) the three-letter code assigned to him/her, as a rule, based on his/her full name (surname, name, patronymic), then a colon, then a space, and after this prefix -- the actual text of the comment. Example: "SMY: в 27 км от райцентра, в лесах на правом берегу Суры, ныне не существует | IRI: На карте 1985 г. с retromap.ru это кордон Келейный, через реку от урочища Монашеское с более современных карт, примерно в месте впадения р. Чернушки в р. Бездну"