File data/archive/db_docs.md artifact 558aaf40a1 part of check-in 0eb324393e
Toponymic Database Documentation
Overview
This database is designed to systematically store, analyze and eventually visualize historical and modern toponyms (place names). It takes a structured approach, focusing on spatial, linguistic, and temporal aspects. The database consists of six interconnected tables:
- spatial – geographical point locations and metadata
- water – non-point geographical features like rivers, streams, and lakes
- linguistic – attested toponyms, their variants, and linguistic attributes such as pronunciation and etymology
- temporal – historical changes and usage periods of toponyms
- experiment – a flexible structure for exploring additional data fields related to name changes
- sources – metadata about maps, books, and documents that contain toponym information
1. spatial table
Purpose:
Stores discrete geographic entities, their coordinates, and relevant metadata.
Columns:
- ID(int, PK): Unique identifier for each geographic object. Initially followed an alphabetical order based on the book by Дубанов И. С. «Топонимический словарь Чувашии. Географические названия и термины» (2023_jun_14_dubanov_toponymy), now it is basically free.
- LAT,- LON(float, nullable): Estimated geographic coordinates derived from historical and modern maps (see http://www.etomesto.ru/ , https://retromap.ru/ and various other resources).
- OFFNAME(string): The most "official" available name for the location (either current or historical).
- LANG(string): The language of the name, using codes such as- RUS,- CHU,- HIM,- MEM,- ERZ,- TAT,- UNK.
- CLASS(string, nullable): Specifies the toponymic category, e.g., oikonym (settlement), hydronym (water body), dromonym (road), etc.
- TYPE(string, nullable): Specifies the sub-category, e.g., village, city (for oikonyms), river, stream (for hydronyms), etc.
- DISTRICT(string, nullable): Modern administrative district (e.g., to differentiate places with the same name).
- DOUBT(int, nullable): Certainty level of assigned coordinates (empty = "I'm sure", 1 = "The coordinates are doubtful").
- LANDMARK(text, nullable): Relevant for microtoponyms; describes nearby visible features to help locate the object.
- COMMENTS(text, nullable): Free-text notes by database contributors.
- OTHER(text, nullable): Reserved for future use.
2. water table
Purpose:
Stores metadata about rivers, streams, lakes, and other non-point water bodies.
Columns:
Work In Progress
** 3. linguistic table**
Purpose:
Captures attested toponyms, their variations, and relevant linguistic metadata.
Columns:
- ID(int, PK): Unique identifier for each toponym entry, independent from the spatial table.
- SPATID(int, FK → spatial.ID): Links the toponym to a geographic location.
- DOUSPAT(int, nullable): Certainty of spatial link (empty = certain, 1 = doubtful).
- MAINID(int, FK → linguistic.ID): Points to the main name in a cluster of interconnected names (sometimes grouping is necessary, many sources give several somehow connected names for a single spatial object and we want to reflect this connections in database).
- TOPONYM(string): The attested name as recorded from sources.
- TOPFORMS(text, nullable): Alternative spellings with worker comments (must contain the same language text info as the- TOPONYMfield).
- DOUTOPO(int, nullable): Certainty of the form in- TOPONYMonly (empty = certain, 1 = doubtful).
- LANG(string): The language of the toponym, following the predefined codes.
- DOULANG(int, nullable): Certainty of the language of the toponym (empty = certain, 1 = doubtful).
- PRONUNC(text, nullable): Reserved for pronunciation data (format TBD).
- DOUPRON(int, nullable): Certainty of the pronunciation data (empty = certain, 1 = doubtful).
- ETYM(text, nullable): Etymological explanation of the name's origin (always treated as doubtful).
- ORIGIN(string, nullable): Language from which the name originated, e.g.- CHU,- RUS,- TAT,- OTH,- UNK(also always treated as doubtful).
- COMMENTS(text, nullable): Free-text notes.
- OTHER(text, nullable): Reserved for future use.
4. temporal table
Purpose:
Tracks historical changes in toponym usage over time.
Columns:
- ID(int, PK): Unique identifier for each historical record.
- LINGID,- LINGNAME(Composite FK → linguistic (ID, TOPONYM)): References the linguistic entry.
- NESTID(int, FK → temporal.ID): Used for grouping related toponyms (e.g., main village + merged settlements). Typically references the most current Russian toponym.
- STARTYEAR(int): Earliest recorded use of the name (the year of first mention according to some source: book or map or something else), full source cited in- FULLTEXT.
- DOUSTART(int, nullable): Certainty of the start year.
- ENDYEAR(int, nullable): Last known use of the name (if still active, leave empty), full source cited in- FULLTEXT.
- DOUEND(int, nullable): Certainty level for the- ENDYEARdate (empty = "I'm sure", 1 = "It is doubtful").
- EVENT(string): Type of historical event:- MERGEIN: Merged into another object
- ACTIVE: Still in use
- RENAME: Renamed
- CEASE: Ceased to exist.
 
- OBJID(int, nullable),- OBJNAME(text, nullable): Contextual references:- ACTIVE: Leave both empty
- RENAME: Reference the new name, linguistic (ID, TOPONYM)
- MERGEIN: Reference the absorbing object's name, linguistic (ID, TOPONYM)
- CEASE: Leave both empty.
 
- COMMENTS(text, nullable): Free-text notes.
- OTHER(text, nullable): Reserved.
- FULLTEXT(text): A complete textual description of the historical event, providing transparency by explicitly referencing the relevant source(s), for instance, the source.ID (see 6. sources table) along with page numbers or other identifying details when applicable.
5. experiment table
Purpose:
Used for prototyping new fields and storing all details from sources about when a name was used.
Columns:
Work in progress.
6. sources table
Purpose:
Stores metadata for bibliographic or archival sources used in the temporal table.
Columns:
- ID(string, PK): Unique identifier.
- TYPE(string): Source type (- BOOK,- ARTICLE,- MAP,- ARCHIVE,- WEB).
- TITLE(text): Short informal title of the source (book name, article title, map title etc).
- YEAR(int, nullable): Year of publication or creation.
- CITATION(text, nullable): Full bibliographic citation.
- URL(text, nullable): Web link to the source (if available).
- DIGCOPY(string, nullable): Is a digital copy stored in the project archive? (- yesor- no).
- COMMENTS(text, nullable): Additional notes.
- OTHER(text, nullable): Reserved.
Usage & Relationship with Temporal Table:
Each record in the temporal table should refer to one or more sources, cited via free text in the FULLTEXT field.
Relationships and Interactions
- SPATID(linguistic) →- ID(spatial): Links names to places
- LINGID,- LINGNAME(temporal) → linguistic: Tracks historical shifts in usage
- OBJID,- OBJNAME(temporal) → linguistic: Tracks renaming and mergers
Next Steps
- Finalize a format for the PRONUNCfield.
- Define conventions for populating DOUBTfields.
Commenting Convention
All comment fields should follow a consistent format. A contributor adds their initials (based on surname, name, patronymic), followed by a colon and a space. If building upon another’s comment, separate entries using | (space-pipe-space).
Example:
SMY: в 27 км от райцентра, в лесах на правом берегу Суры, ныне не существует | IRI: На карте 1985 г. с retromap.ru это кордон Келейный, через реку от урочища Монашеское с более современных карт, примерно в месте впадения р. Чернушки в р. Бездну