ADDED data/csv/db_experiment/v0.8/db_docs.md Index: data/csv/db_experiment/v0.8/db_docs.md ================================================================== --- /dev/null +++ data/csv/db_experiment/v0.8/db_docs.md @@ -0,0 +1,118 @@ +**Toponymic Database Documentation** + +## Overview +This database is designed to systematically store, analyze and eventually visualize historical and contemporary toponyms (place names) with a structured approach, focusing on spatial, linguistic, and temporal aspects. The structure consists of three interlinked tables: +1. **Spatial Table** - Stores geographical locations and metadata. +2. **Linguistic Table** - Captures toponyms, their various variants and linguistic attributes like pronunciation and etymology. +3. **Temporal Table** - Documents historical changes in toponyms over time. + +--- + +## **1. Spatial Table** + +### **Purpose:** +The spatial table contains unique geographic entities, their locations, and relevant metadata. + +### **Columns:** +- **`ID`** *(int, PK)*: Unique identifier for each geographic object. In the very beginning the order followed an original alphabetically sorted table, 2023_jun_14_dubanov_toponymy, based on a book (Дубанов И.С. «Топонимический словарь Чувашии. Географические названия и термины») order, now it is basically free. +- **`LAT`, `LON`** *(float, nullable)*: Estimated geographic coordinates derived from historical and modern maps (see http://www.etomesto.ru/ , https://retromap.ru/ and various other resources). +- **`OFFNAME`** *(string)*: The most "official" available name for the location (either current or historical). +- **`LANG`** *(string)*: The language of the name, using codes such as `RUS`, `CHU`, `HIM`, `MEM`, `ERZ`, `TAT`, `UNK`. +- **`CLASS`** *(string, nullable)*: Specifies the toponymic category, e.g., oikonym (settlement), hydronym (water body), dromonym (road), etc. +- **`TYPE`** *(string, nullable)*: Specifies the sub-category, e.g., village, city (for oikonyms), river, stream (for hydronyms), etc. +- **`DISTRICT`** *(string, nullable)*: Modern administrative district (e.g., to differentiate places with the same name). +- **`DOUBT`** *(int, nullable)*: Certainty level of assigned coordinates (empty = "I'm sure", 1 = "The coordinates are doubtful"). +- **`LANDMARK`** *(text, nullable)*: Relevant for microtoponyms; describes nearby visible features to help locate the object. +- **`COMMENTS`** *(text, nullable)*: Free-text notes for human workers. +- **`OTHER`** *(text, nullable)*: Reserved for future use. + +--- + +## **2. Linguistic Table** + +### **Purpose:** +The linguistic table stores different names (and their variants) of the same geographic object across different languages and sources, etymological information and other linguistic metadata. + +### **Columns:** +- **`ID`** *(int, PK)*: Unique identifier for each toponym entry, independent from the spatial table. +- **`SPATID`** *(int, FK → Spatial.ID)*: Links the toponym to its corresponding spatial object. +- **`DOUSPAT`** *(int, nullable)*: Certainty of name relation to the spatial object (empty = "I'm sure", 1 = "The spatial link is doubtful"). +- **`MAINID`** *(int, FK → Linguistic.ID)*: Represents the "main" name in a "nest" of interconnected toponyms (sometimes grouping is necessary, many sources give several somehow connected names for a single spatial object and we want to reflect this connections in database). +- **`TOPONYM`** *(string)*: The attested name as recorded from sources. +- **`TOPFORMS`** *(text, nullable)*: Holds different attested spelling variations (must contain **the same language text** info as the **`TOPONYM`** field) of the name from multiple sources and free text comments of database workers. +- **`DOUTOPO`** *(int, nullable)*: Certainty of the given **`TOPONYM`** form, not of the **`TOPFORMS`**, only of the **`TOPONYM`** column (empty = "I'm sure", 1 = "It is doubtful"). +- **`LANG`** *(string)*: The language of the toponym, following the predefined codes. +- **`DOULANG`** *(int, nullable)*: Certainty of the language of the toponym (empty = "I'm sure", 1 = "It is doubtful"). +- **`PRONUNC`** *(text, nullable)*: Reserved for pronunciation information (format yet to be decided). +- **`DOUPRON`** *(int, nullable)*: Certainty of the pronunciation information (empty = "I'm sure", 1 = "It is doubtful"). +- **`ETYM`** *(text, nullable)*: Etymological explanation of the name's origin. It is always doubtful. +- **`ORIGIN`** *(string, nullable)*: Language from which the name originated (e.g., `CHU`, `RUS`, `TAT`, `OTH`, `UNK`). This one is always doubtful too. +- **`COMMENTS`** *(text, nullable)*: Free-text notes. +- **`OTHER`** *(text, nullable)*: Reserved for future use. + +--- + +## **3. Temporal Table** + +### **Purpose:** +This table records historical changes in toponyms, tracking their usage over time. + +### **Columns:** +- **`ID`** *(int, PK)*: Unique identifier for each temporal record. +- **`LINGID`, `LINGNAME`** *(Composite FK → Linguistic (ID, TOPONYM))*: Together, these fields reference a specific toponym in the linguistic table. +- **`START`** *(int)*: Earliest recorded use of the name (the year of first mention according to some source: book or map or something else), full source text goes to **`FULLTEXT`** column. +- **`DOUSTART`** *(int, nullable)*: Certainty level for the `START` date (empty = "I'm sure", 1 = "It is doubtful"). +- **`END`** *(int, nullable)*: The last recorded use of the name. If the object still exists, this field remains empty. The source text is stored in the **FULLTEXT** column. +- **`DOUEND`** *(int, nullable)*: Certainty level for the `END` date (empty = "I'm sure", 1 = "It is doubtful"). +- **`EVENT`** *(string)*: Categorization of historical change, with predefined values: + - `MERGEIN`: The object was absorbed into another. + - `ACTIVE`: The name is still in use. + - `RENAME`: The object was renamed. + - `CEASE`: The object ceased to exist. +- **`OBJID`** *(int, nullable)*, **`OBJNAME`** *(text, nullable)*: Context-dependent: + - **ACTIVE**: Both empty. + - **RENAME**: Stores the information about the **new** name of the object, it is a reference to Linguistic (ID, TOPONYM). + - **MERGEIN**: Stores the information about the **absorbing** object, again, it is a reference to Linguistic (ID, TOPONYM). + - **CEASE**: Both empty. +- **`COMMENTS`** *(text, nullable)*: Free-text notes. +- **`OTHER`** *(text, nullable)*: Reserved for future use. +- **`FULLTEXT`** *(text)*: A complete textual description of the historical event, providing transparency by explicitly referencing the relevant source(s), for instance, the Source ID (see **4. Sources Table**) along with page numbers or other identifying details when applicable. + +--- + +## **4. Sources Table** + +### **Purpose:** +This table consolidates bibliographic or archival references that substantiate historical information in the **3. Temporal Table**. Each entry describes a single source, such as a book, article, map, or manuscript. Linking these sources to `Temporal` ensures that statements like “this settlement was called so-and-so from year X to year Y” can be properly backed by historical documentation. + +### **Columns:** +- **`ID`** *(string, PK)*: Unique identifier for each source. +- **`TYPE`** *(string)*: The type of source (e.g., `BOOK`, `ARTICLE`, `MAP`, `ARCHIVE`, `WEB`). +- **`AUTHOR`** *(string, nullable)*: Author(s) or compiler(s). +- **`TITLE`** *(text)*: Title of the source (book name, article title, map reference, etc.). +- **`YEAR`** *(int, nullable)*: Publication or creation year of the source. +- **`PUBLISHER`** *(string, nullable)*: Publisher or institution name, if applicable. +- **`CITATION`** *(text, nullable)*: Full bibliographic reference or archival notation (e.g., “И. С. Дубанов, 2020, ...”). +- **`COMMENTS`** *(text, nullable)*: Additional remarks or clarifications about the source. +- **`PDF`** *(string, nullable)*: Is PDF file accessible somewhere? It is simply "yes" or "no". +- **`OTHER`** *(text, nullable)*: Reserved for future use. + +### **Usage & Relationship with Temporal Table**: +- Each record in the **Temporal** table can link to one or more **Sources** to document the basis for historical data through free text in the **`FULLTEXT`** field. + +--- + +## **Relationships and Interactions** +- **`SPATID` (Linguistic) → `ID` (Spatial)**: Links toponyms to geographic locations. +- **`LINGID` (Temporal) → `ID` (Linguistic)** and **`LINGNAME` (Temporal) → `TOPONYM` (Linguistic)**: Tracks historical changes of a specific name. +- **`OBJID` (Temporal) → `ID` (Linguistic)** and **`OBJNAME` (Temporal) → `TOPONYM` (Linguistic)**: Records specifically name changes or mergers. + +--- + +## **Next Steps** +- Finalize the format for the `PRONUNC` field. + +--- + +## **Various** +- Comment fields. It is assumed (in order to clarify the workflow for specific fields in the database and to more quickly achieve understanding between workers) that comments are filled in a standardized way. First, the employee puts (if there is a previous comment, then first a space, a vertical bar, another space) the three-letter code assigned to him/her, as a rule, based on his/her full name (surname, name, patronymic), then a colon, then a space, and after this prefix -- the actual text of the comment. Example: "SMY: в 27 км от райцентра, в лесах на правом берегу Суры, ныне не существует | IRI: На карте 1985 г. с retromap.ru это кордон Келейный, через реку от урочища Монашеское с более современных карт, примерно в месте впадения р. Чернушки в р. Бездну"