Chuvash Toponymy Database v.0.1: db_docs.md at [cd3bd33ee3]

File data/csv/db_docs.md artifact 558aaf40a1 part of check-in cd3bd33ee3

**Toponymic Database Documentation**

## Overview
This database is designed to systematically store, analyze and eventually visualize historical and modern toponyms (place names). It takes a structured approach, focusing on spatial, linguistic, and temporal aspects. The database consists of six interconnected tables:

1. **spatial** – geographical point locations and metadata
2. **water** – non-point geographical features like rivers, streams, and lakes
3. **linguistic** – attested toponyms, their variants, and linguistic attributes such as pronunciation and etymology
4. **temporal** – historical changes and usage periods of toponyms
5. **experiment** – a flexible structure for exploring additional data fields related to name changes
6. **sources** – metadata about maps, books, and documents that contain toponym information

---

## **1. spatial table**

### **Purpose:**
Stores discrete geographic entities, their coordinates, and relevant metadata.

### **Columns:**
- **`ID`** *(int, PK)*: Unique identifier for each geographic object. Initially followed an alphabetical order based on the book by Дубанов И. С. «Топонимический словарь Чувашии. Географические названия и термины» (2023_jun_14_dubanov_toponymy), now it is basically free.
- **`LAT`, `LON`** *(float, nullable)*: Estimated geographic coordinates derived from historical and modern maps (see http://www.etomesto.ru/ , https://retromap.ru/ and various other resources).
- **`OFFNAME`** *(string)*: The most "official" available name for the location (either current or historical).
- **`LANG`** *(string)*: The language of the name, using codes such as `RUS`, `CHU`, `HIM`, `MEM`, `ERZ`, `TAT`, `UNK`.
- **`CLASS`** *(string, nullable)*: Specifies the toponymic category, e.g., oikonym (settlement), hydronym (water body), dromonym (road), etc.
- **`TYPE`** *(string, nullable)*: Specifies the sub-category, e.g., village, city (for oikonyms), river, stream (for hydronyms), etc.
- **`DISTRICT`** *(string, nullable)*: Modern administrative district (e.g., to differentiate places with the same name).
- **`DOUBT`** *(int, nullable)*: Certainty level of assigned coordinates (empty = "I'm sure", 1 = "The coordinates are doubtful").
- **`LANDMARK`** *(text, nullable)*: Relevant for microtoponyms; describes nearby visible features to help locate the object.
- **`COMMENTS`** *(text, nullable)*: Free-text notes by database contributors.
- **`OTHER`** *(text, nullable)*: Reserved for future use.

---

## **2. water table**

### **Purpose:**
Stores metadata about rivers, streams, lakes, and other non-point water bodies.

### **Columns:**
Work In Progress

---

## ** 3. linguistic table**

### **Purpose:**
Captures attested toponyms, their variations, and relevant linguistic metadata.

### **Columns:**
- **`ID`** *(int, PK)*: Unique identifier for each toponym entry, independent from the spatial table.
- **`SPATID`** *(int, FK → spatial.ID)*: Links the toponym to a geographic location.
- **`DOUSPAT`** *(int, nullable)*: Certainty of spatial link (empty = certain, 1 = doubtful).
- **`MAINID`** *(int, FK → linguistic.ID)*: Points to the main name in a cluster of interconnected names (sometimes grouping is necessary, many sources give several somehow connected names for a single spatial object and we want to reflect this connections in database).
- **`TOPONYM`** *(string)*: The attested name as recorded from sources.
- **`TOPFORMS`** *(text, nullable)*: Alternative spellings with worker comments (must contain **the same language text** info as the **`TOPONYM`** field).
- **`DOUTOPO`** *(int, nullable)*: Certainty of the form in `TOPONYM` only (empty = certain, 1 = doubtful).
- **`LANG`** *(string)*: The language of the toponym, following the predefined codes.
- **`DOULANG`** *(int, nullable)*: Certainty of the language of the toponym (empty = certain, 1 = doubtful).
- **`PRONUNC`** *(text, nullable)*: Reserved for pronunciation data (format TBD).
- **`DOUPRON`** *(int, nullable)*: Certainty of the pronunciation data (empty = certain, 1 = doubtful).
- **`ETYM`** *(text, nullable)*: Etymological explanation of the name's origin (always treated as doubtful).
- **`ORIGIN`** *(string, nullable)*: Language from which the name originated, e.g. `CHU`, `RUS`, `TAT`, `OTH`, `UNK` (also always treated as doubtful).
- **`COMMENTS`** *(text, nullable)*: Free-text notes.
- **`OTHER`** *(text, nullable)*: Reserved for future use.

---

## **4. temporal table**

### **Purpose:**
Tracks historical changes in toponym usage over time.

### **Columns:**
- **`ID`** *(int, PK)*: Unique identifier for each historical record.
- **`LINGID`, `LINGNAME`** *(Composite FK → linguistic (ID, TOPONYM))*: References the linguistic entry.
- **`NESTID`** *(int, FK → temporal.ID)*: Used for grouping related toponyms (e.g., main village + merged settlements). Typically references the most current Russian toponym.
- **`STARTYEAR`** *(int)*: Earliest recorded use of the name (the year of first mention according to some source: book or map or something else), full source cited in `FULLTEXT`.
- **`DOUSTART`** *(int, nullable)*: Certainty of the start year.
- **`ENDYEAR`** *(int, nullable)*: Last known use of the name (if still active, leave empty), full source cited in `FULLTEXT`.
- **`DOUEND`** *(int, nullable)*: Certainty level for the `ENDYEAR` date (empty = "I'm sure", 1 = "It is doubtful").
- **`EVENT`** *(string)*: Type of historical event:
  - `MERGEIN`: Merged into another object
  - `ACTIVE`: Still in use
  - `RENAME`: Renamed
  - `CEASE`: Ceased to exist.
- **`OBJID`** *(int, nullable)*, **`OBJNAME`** *(text, nullable)*: Contextual references:
  - **ACTIVE**: Leave both empty
  - **RENAME**: Reference the new name, linguistic (ID, TOPONYM)
  - **MERGEIN**: Reference the absorbing object's name, linguistic (ID, TOPONYM)
  - **CEASE**: Leave both empty.
- **`COMMENTS`** *(text, nullable)*: Free-text notes.
- **`OTHER`** *(text, nullable)*: Reserved.
- **`FULLTEXT`** *(text)*: A complete textual description of the historical event, providing transparency by explicitly referencing the relevant source(s), for instance, the source.ID (see **6. sources table**) along with page numbers or other identifying details when applicable.

---

## **5. experiment table**

### **Purpose:**
Used for prototyping new fields and storing all details from sources about when a name was used.

### **Columns:**
*Work in progress.*

---

## **6. sources table**

### **Purpose:**
Stores metadata for bibliographic or archival sources used in the `temporal` table.

### **Columns:**
- **`ID`** *(string, PK)*: Unique identifier.
- **`TYPE`** *(string)*: Source type (`BOOK`, `ARTICLE`, `MAP`, `ARCHIVE`, `WEB`).
- **`TITLE`** *(text)*: Short informal title of the source (book name, article title, map title etc).
- **`YEAR`** *(int, nullable)*: Year of publication or creation.
- **`CITATION`** *(text, nullable)*: Full bibliographic citation.
- **`URL`** *(text, nullable)*: Web link to the source (if available).
- **`DIGCOPY`** *(string, nullable)*: Is a digital copy stored in the project archive? (`yes` or `no`).
- **`COMMENTS`** *(text, nullable)*: Additional notes.
- **`OTHER`** *(text, nullable)*: Reserved.

### **Usage & Relationship with Temporal Table**:
Each record in the temporal table should refer to one or more sources, cited via free text in the `FULLTEXT` field.

---

## **Relationships and Interactions**
- **`SPATID` (linguistic) → `ID` (spatial)**: Links names to places
- **`LINGID`, `LINGNAME` (temporal) → linguistic**: Tracks historical shifts in usage
- **`OBJID`, `OBJNAME` (temporal) → linguistic**: Tracks renaming and mergers

---

## **Next Steps**
- Finalize a format for the `PRONUNC` field.
- Define conventions for populating `DOUBT` fields.

---

## **Commenting Convention**
All comment fields should follow a consistent format. A contributor adds their initials (based on surname, name, patronymic), followed by a colon and a space. If building upon another’s comment, separate entries using ` | ` (space-pipe-space).

Example: 
```
SMY: в 27 км от райцентра, в лесах на правом берегу Суры, ныне не существует | IRI: На карте 1985 г. с retromap.ru это кордон Келейный, через реку от урочища Монашеское с более современных карт, примерно в месте впадения р. Чернушки в р. Бездну
```