ADDED data/csv/db_docs.md
Index: data/csv/db_docs.md
==================================================================
--- /dev/null
+++ data/csv/db_docs.md
@@ -0,0 +1,148 @@
+**Toponymic Database Documentation**
+
+## Overview
+This database is designed to systematically store, analyze, and eventually visualize historical and modern toponyms (place names). It takes a structured approach, focusing on spatial, linguistic, and temporal aspects. The database consists of six interconnected tables:
+
+1. **spatial** – geographical point locations and metadata
+2. **water** – non-point geographical features like rivers, streams, and lakes
+3. **linguistic** – attested toponyms, their variants, and linguistic attributes such as pronunciation and etymology
+4. **temporal** – historical changes and usage periods of toponyms
+5. **experiment** – a flexible structure for exploring additional data fields related to name changes
+6. **sources** – metadata about maps, books, and documents that contain toponym information
+
+---
+
+## **1. spatial table**
+
+### **Purpose:**
+Stores discrete geographic entities, their coordinates, and relevant metadata.
+
+### **Columns:**
+- **`ID`** *(int, PK)*: Unique identifier for each geographic object. IDs initially followed the alphabetical order of the book by Дубанов И. С. «Топонимический словарь Чувашии. Географические названия и термины» (2023_jun_14_dubanov_toponymy); newer IDs no longer follow that order.
+- **`LAT`, `LON`** *(float, nullable)*: Estimated geographic coordinates derived from historical and modern maps (see http://www.etomesto.ru/, https://retromap.ru/ and various other resources).
+- **`OFFNAME`** *(string)*: The most "official" available name for the location (either current or historical).
+- **`LANG`** *(string)*: The language of the name, using codes such as `RUS`, `CHU`, `HIM`, `MEM`, `ERZ`, `TAT`, `UNK`.
+- **`CLASS`** *(string, nullable)*: Specifies the toponymic category, e.g., oikonym (settlement), hydronym (water body), dromonym (road), etc.
+- **`TYPE`** *(string, nullable)*: Specifies the sub-category, e.g., village, city (for oikonyms), river, stream (for hydronyms), etc.
+- **`DISTRICT`** *(string, nullable)*: Modern administrative district (e.g., to differentiate places with the same name).
+- **`DOUBT`** *(int, nullable)*: Certainty level of assigned coordinates (empty = "I'm sure", 1 = "The coordinates are doubtful").
+- **`LANDMARK`** *(text, nullable)*: Relevant for microtoponyms; describes nearby visible features to help locate the object.
+- **`COMMENTS`** *(text, nullable)*: Free-text notes by database contributors.
+- **`OTHER`** *(text, nullable)*: Reserved for future use.
+
+---
+
+## **2. water table**
+
+### **Purpose:**
+Stores metadata about rivers, streams, lakes, and other non-point water bodies.
+
+### **Columns:**
+*Work in progress.*
+
+---
+
+## **3. linguistic table**
+
+### **Purpose:**
+Captures attested toponyms, their variations, and relevant linguistic metadata.
+
+### **Columns:**
+- **`ID`** *(int, PK)*: Unique identifier for each toponym entry, independent of the spatial table.
+- **`SPATID`** *(int, FK → spatial.ID)*: Links the toponym to a geographic location.
+- **`DOUSPAT`** *(int, nullable)*: Certainty of spatial link (empty = certain, 1 = doubtful).
+- **`MAINID`** *(int, FK → linguistic.ID)*: Points to the main name in a cluster of interconnected names (grouping is sometimes necessary: many sources give several related names for a single spatial object, and we want to reflect these connections in the database).
+- **`TOPONYM`** *(string)*: The attested name as recorded from sources.
+- **`TOPFORMS`** *(text, nullable)*: Alternative spellings with contributor comments (must carry text in **the same language** as the **`TOPONYM`** field).
+- **`DOUTOPO`** *(int, nullable)*: Certainty of the form in `TOPONYM` only (empty = certain, 1 = doubtful).
+- **`LANG`** *(string)*: The language of the toponym, following the predefined codes.
+- **`DOULANG`** *(int, nullable)*: Certainty of the language of the toponym (empty = certain, 1 = doubtful).
+- **`PRONUNC`** *(text, nullable)*: Reserved for pronunciation data (format TBD).
+- **`DOUPRON`** *(int, nullable)*: Certainty of the pronunciation data (empty = certain, 1 = doubtful).
+- **`ETYM`** *(text, nullable)*: Etymological explanation of the name's origin (always treated as doubtful).
+- **`ORIGIN`** *(string, nullable)*: Language from which the name originated, e.g. `CHU`, `RUS`, `TAT`, `OTH`, `UNK` (also always treated as doubtful).
+- **`COMMENTS`** *(text, nullable)*: Free-text notes.
+- **`OTHER`** *(text, nullable)*: Reserved for future use.
+
+---
+
+## **4. temporal table**
+
+### **Purpose:**
+Tracks historical changes in toponym usage over time.
+
+### **Columns:**
+- **`ID`** *(int, PK)*: Unique identifier for each historical record.
+- **`LINGID`, `LINGNAME`** *(Composite FK → linguistic (ID, TOPONYM))*: References the linguistic entry.
+- **`NESTID`** *(int, FK → temporal.ID)*: Used for grouping related toponyms (e.g., main village + merged settlements). Typically references the most current Russian toponym.
+- **`STARTYEAR`** *(int)*: Earliest recorded use of the name (the year of first mention in a source such as a book or map); the full source is cited in `FULLTEXT`.
+- **`DOUSTART`** *(int, nullable)*: Certainty of the start year.
+- **`ENDYEAR`** *(int, nullable)*: Last known use of the name (if still active, leave empty); the full source is cited in `FULLTEXT`.
+- **`DOUEND`** *(int, nullable)*: Certainty level for the `ENDYEAR` date (empty = "I'm sure", 1 = "It is doubtful").
+- **`EVENT`** *(string)*: Type of historical event:
+  - `MERGEIN`: Merged into another object
+  - `ACTIVE`: Still in use
+  - `RENAME`: Renamed
+  - `CEASE`: Ceased to exist.
+- **`OBJID`** *(int, nullable)*, **`OBJNAME`** *(text, nullable)*: Contextual references:
+  - **ACTIVE**: Leave both empty
+  - **RENAME**: Reference the new name, linguistic (ID, TOPONYM)
+  - **MERGEIN**: Reference the absorbing object's name, linguistic (ID, TOPONYM)
+  - **CEASE**: Leave both empty.
+- **`COMMENTS`** *(text, nullable)*: Free-text notes.
+- **`OTHER`** *(text, nullable)*: Reserved.
+- **`FULLTEXT`** *(text)*: A complete textual description of the historical event. It keeps the record transparent by explicitly referencing the relevant source(s), for instance the sources.ID (see **6. sources table**) along with page numbers or other identifying details when applicable.
+
+---
+
+## **5. experiment table**
+
+### **Purpose:**
+Used for prototyping new fields and storing all details from sources about when a name was used.
+
+### **Columns:**
+*Work in progress.*
+
+---
+
+## **6. sources table**
+
+### **Purpose:**
+Stores metadata for bibliographic or archival sources used in the `temporal` table.
+
+### **Columns:**
+- **`ID`** *(string, PK)*: Unique identifier.
+- **`TYPE`** *(string)*: Source type (`BOOK`, `ARTICLE`, `MAP`, `ARCHIVE`, `WEB`).
+- **`TITLE`** *(text)*: Short informal title of the source (book name, article title, map title, etc.).
+- **`YEAR`** *(int, nullable)*: Year of publication or creation.
+- **`CITATION`** *(text, nullable)*: Full bibliographic citation.
+- **`URL`** *(text, nullable)*: Web link to the source (if available).
+- **`DIGCOPY`** *(string, nullable)*: Is a digital copy stored in the project archive? (`yes` or `no`).
+- **`COMMENTS`** *(text, nullable)*: Additional notes.
+- **`OTHER`** *(text, nullable)*: Reserved.
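
The `EVENT` / `OBJID` / `OBJNAME` rules from the temporal table (section 4) can be sketched as a small consistency check. This is only a minimal sketch: it assumes each temporal record is available as a plain dict keyed by column name and that empty cells are `None` or an empty string. The names `is_empty` and `check_event_fields` are hypothetical and not part of the project.

```python
# Hypothetical helper: checks the EVENT / OBJID / OBJNAME rules of the temporal table.
# Assumption: a record is a dict keyed by column name; empty cells are None or "".

EVENTS_WITHOUT_OBJECT = {"ACTIVE", "CEASE"}   # OBJID and OBJNAME must stay empty
EVENTS_WITH_OBJECT = {"RENAME", "MERGEIN"}    # OBJID and OBJNAME must both be set


def is_empty(value):
    """Treat None and blank strings as an empty cell."""
    return value is None or str(value).strip() == ""


def check_event_fields(record):
    """Return a list of problems found in one temporal record (empty list = OK)."""
    problems = []
    event = record.get("EVENT")
    obj_id, obj_name = record.get("OBJID"), record.get("OBJNAME")

    if event in EVENTS_WITHOUT_OBJECT:
        if not (is_empty(obj_id) and is_empty(obj_name)):
            problems.append(f"{event}: OBJID and OBJNAME must be empty")
    elif event in EVENTS_WITH_OBJECT:
        if is_empty(obj_id) or is_empty(obj_name):
            problems.append(f"{event}: OBJID and OBJNAME must both be filled")
    else:
        problems.append(f"unknown EVENT value: {event!r}")

    # ENDYEAR is left empty while the name is still in use.
    if event == "ACTIVE" and not is_empty(record.get("ENDYEAR")):
        problems.append("ACTIVE: ENDYEAR should be empty")
    return problems
```

Returning a list of messages rather than raising on the first problem lets a contributor see every issue in a row at once. The check does not verify that `OBJID` actually exists in the linguistic table.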
+
+### **Usage & Relationship with Temporal Table**:
+Each record in the temporal table should refer to one or more sources, cited via free text in the `FULLTEXT` field.
+
+---
+
+## **Relationships and Interactions**
+- **`SPATID` (linguistic) → `ID` (spatial)**: Links names to places
+- **`LINGID`, `LINGNAME` (temporal) → linguistic**: Tracks historical shifts in usage
+- **`OBJID`, `OBJNAME` (temporal) → linguistic**: Tracks renaming and mergers
+
+---
+
+## **Next Steps**
+- Finalize a format for the `PRONUNC` field.
+- Define conventions for populating `DOUBT` fields.
+
+---
+
+## **Commenting Convention**
+All comment fields should follow a consistent format. A contributor adds their initials (surname, given name, patronymic), followed by a colon and a space. If building upon another’s comment, separate entries using ` | ` (space-pipe-space).
+
+Example:
+```
+SMY: в 27 км от райцентра, в лесах на правом берегу Суры, ныне не существует | IRI: На карте 1985 г. с retromap.ru это кордон Келейный, через реку от урочища Монашеское с более современных карт, примерно в месте впадения р. Чернушки в р. Бездну
+```
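
The commenting convention above can be sketched as a pair of small helpers for reading and extending a comment field. This is a minimal sketch under stated assumptions: entries are separated by exactly ` | `, and each entry starts with the contributor's initials followed by `: `. The function names `parse_comments` and `append_comment` are hypothetical and not part of the project.

```python
# Hypothetical helpers for the comment-field convention "INITIALS: text | INITIALS: text".
# Assumption: entries are separated by " | " and each starts with "INITIALS: ".

def parse_comments(field):
    """Parse 'ABC: text | DEF: text' into [('ABC', 'text'), ('DEF', 'text')]."""
    entries = []
    if not field or not field.strip():
        return entries
    for chunk in field.split(" | "):
        initials, sep, text = chunk.partition(": ")
        if not sep:
            # No "INITIALS: " prefix found; keep the text with unknown authorship.
            initials, text = "", chunk
        entries.append((initials.strip(), text.strip()))
    return entries


def append_comment(field, initials, text):
    """Add a new entry to an existing comment field using the same convention."""
    entry = f"{initials}: {text}"
    return f"{field} | {entry}" if field and field.strip() else entry
```

Splitting on the first `: ` only means a colon inside the comment text is preserved. An entry that lacks initials but contains `: ` later in its text would be misattributed, so such entries are better fixed at the source.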