Glomming a Database – Part I

The first thing to do when wrapping and glomming a database is to extract the endeme sets.

Part I – Extracting the endeme sets

To extract candidate endemes from a database, it would be nice to build a tool that extracts endeme sets from the database in preparation for glomming it. With such a tool you could build an endematic wrapper around an entire database.

Process the items of each column to identify the endeme sets:

  • use a Levenshtein (edit distance) matrix to coalesce similar items, such as spelling variants.
  • use my ‘endemes for 5000 words’ document to coalesce similar meanings.
  • use more approaches based on data science.
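The Levenshtein step above can be sketched in a few lines of Python. This is a minimal illustration, not a finished tool: `levenshtein` is the standard dynamic-programming edit distance, and `coalesce` is a hypothetical greedy helper that puts each value into the first group whose representative is within `max_distance` edits. The case and whitespace normalization and the greedy grouping are my assumptions; real column data would likely need a proper clustering pass.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b, computed one DP row at a time."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def coalesce(values, max_distance=1):
    """Hypothetical helper: greedily group values whose normalized form is
    within max_distance edits of a group's representative."""
    groups = []  # list of (representative, members) in first-seen order
    for value in values:
        key = value.strip().lower()
        for rep, members in groups:
            if levenshtein(key, rep) <= max_distance:
                members.append(value)
                break
        else:  # no existing group was close enough
            groups.append((key, [value]))
    return dict(groups)


print(coalesce(["Red", "red ", "Rde", "Blue", "Bleu"], max_distance=2))
# {'red': ['Red', 'red ', 'Rde'], 'blue': ['Blue', 'Bleu']}
```

Each resulting group is a candidate single item of an endeme set; the representatives are what a person would then review and name.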

Lookup tables may be implicit or explicit.

Table Column analyses

Column analyses:

  • general content/data columns
    – mostly unique items column [U] – no endeme sets extracted
  • id/plumbing column
  • context column
  • lookup id column
  • implicit lookup table columns
    • Bits – multiple bit columns treated as one ‘column’: process the items of several columns of the same type (bit) together [B] – 1 set extracted
    • Conflated – two or more endeme sets [C] – 2+ sets extracted
    • Denormalized – zero normal form column [D] – 1 or 2 sets extracted
    • Endematic – endematic range – 16-32 rows [E] – 1 set extracted
    • Few – few rows – under 8 [F] – a fraction of a set extracted
    • Many different items column [M] – 1 set extracted
  • Freetext
    – freetext column with repeating words [R] – 1+ sets extracted
    – freetext column with repeating concepts [R] – 1+ sets extracted
  • Look for concept sets, concepts have additional structure that endemes do not have
    – concepts are generally characterized by two, three, or four endeme sets in a row or chain
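The size-based codes in this list (U, F, E, M) can be checked mechanically by counting the distinct values in a column. The sketch below is an assumption-laden illustration: it reads the list's row counts as counts of distinct values, picks 0.9 as a hypothetical "mostly unique" cut-off, and leaves the B, C and D cases, the freetext cases, and the gaps between the stated ranges to a human.

```python
from typing import Iterable, Optional


def classify_column(values: Iterable[Optional[str]], unique_ratio: float = 0.9) -> str:
    """Rough column code from the distinct-value count (hypothetical helper).

    U: most values are unique (distinct/non-null >= unique_ratio, an assumed cut-off)
    F: few, under 8 distinct values
    E: endematic range, 16-32 distinct values
    M: many, over 64 distinct values
    ?: counts in the gaps the list leaves open (8-15, 33-64), or an empty column
    """
    non_null = [v for v in values if v is not None]
    if not non_null:
        return "?"
    distinct = len(set(non_null))
    if distinct / len(non_null) >= unique_ratio:
        return "U"
    if distinct < 8:
        return "F"
    if 16 <= distinct <= 32:
        return "E"
    if distinct > 64:
        return "M"
    return "?"


print(classify_column([str(i % 20) for i in range(200)]))  # 20 distinct values
# E
```

Returning "?" rather than guessing keeps the unstated ranges visible, so a reviewer decides where 8 to 15 or 33 to 64 distinct values belong.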

Lookup table analyses

Explicit lookup table analyses: process the lookup IDs in a database to build endeme sets.

  • explicit lookup table content column
    • Bits – multiple bit columns treated as one ‘column’: process the items of several columns of the same type (bit) together [B] – 1 set extracted
    • Conflated – two or more endeme sets [C] – 2+ sets extracted
    • Denormalized – zero normal form column [D] – 1 or 2 sets extracted
    • Endematic – endematic range – 16-32 rows [E] – 1 set extracted
    • Few – few rows – under 8 [F] – a fraction of a set extracted
    • Many – many rows – over 64 [M] – 1 set extracted
    • Unique – most rows are unique [U] – no endeme sets extracted
  • where it’s used as context
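One way to find explicit lookup tables is to follow the foreign keys: any table that other tables reference is a candidate, and its row count gives a first guess at the code from the list above. This sketch assumes a SQLite database (it uses SQLite's `PRAGMA foreign_key_list`; other engines expose the same facts through `INFORMATION_SCHEMA`), and the `color` and `product` tables in the demo are made up. Row counts alone cannot spot the B, C, D or Unique cases.

```python
import sqlite3


def explicit_lookup_tables(conn: sqlite3.Connection) -> dict:
    """Map each table referenced by a foreign key to (row_count, referencing columns)."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'")]
    refs = {}
    for table in tables:
        # Each pragma row is (id, seq, table, from, to, on_update, on_delete, match).
        for fk in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
            refs.setdefault(fk[2], []).append((table, fk[3]))
    return {
        target: (conn.execute(f'SELECT COUNT(*) FROM "{target}"').fetchone()[0], cols)
        for target, cols in refs.items()
    }


def size_code(row_count: int) -> str:
    """Row-count codes from the list above; '?' for the ranges it does not state."""
    if row_count < 8:
        return "F"
    if 16 <= row_count <= 32:
        return "E"
    if row_count > 64:
        return "M"
    return "?"


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE color (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT,
                          color_id INTEGER REFERENCES color(id));
""")
conn.executemany("INSERT INTO color (name) VALUES (?)",
                 [("red",), ("green",), ("blue",)])
for table, (count, cols) in explicit_lookup_tables(conn).items():
    print(table, count, size_code(count), cols)
# color 3 F [('product', 'color_id')]
```

The referencing columns returned alongside each lookup table are also where the "used as context" analysis would start.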

Junction table analyses

  • there’s got to be something I can do with junction tables.

Views and Reports

Endemizing reports, views, and stored procedures that return data sets (‘report’ and ‘get’ SPs)

  • Context profiles?
  • Endematic metadata – reports mostly show numbers, endemes can store relative values
  • The row is the endeme item, and the endeme indicates how it ‘compares’/‘relates’ to other items
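To make "the endeme indicates how the row compares to other items" concrete, here is a sketch that turns a numeric report into one letter string per row. It assumes an endeme can be written as an ordered string of characteristic letters, most significant first; the letter-to-column mapping, the "share of other rows with a lower value" score, and alphabetical tie-breaking are all illustrative choices, not an established method.

```python
def relative_endemes(rows, metrics):
    """rows: list of dicts holding numeric report columns.
    metrics: dict mapping a characteristic letter to a column name (hypothetical).
    Returns one letter string per row, ordered by how high that row stands on
    each metric relative to the other rows (highest relative standing first)."""
    n = len(rows)
    result = []
    for row in rows:
        scores = {}
        for letter, column in metrics.items():
            # Share of the other rows whose value is strictly lower than this row's.
            lower = sum(1 for other in rows if other[column] < row[column])
            scores[letter] = lower / (n - 1) if n > 1 else 0.0
        # Highest score first; ties broken alphabetically by letter.
        ordered = sorted(scores, key=lambda letter: (-scores[letter], letter))
        result.append("".join(ordered))
    return result


report = [
    {"sales": 100, "returns": 5, "visits": 1000},
    {"sales": 50, "returns": 20, "visits": 3000},
    {"sales": 75, "returns": 10, "visits": 2000},
]
print(relative_endemes(report, {"A": "sales", "B": "returns", "C": "visits"}))
# ['ABC', 'BCA', 'ABC']
```

The numbers themselves are dropped; what survives is each row's relative standing, which is the kind of value the bullet says an endeme can store.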

Other stored procedures (and inline SQL code)

These may be used to identify contexts and relationships between tables and columns.
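As a rough illustration of mining stored procedure and inline SQL text for relationships, the sketch below pulls `alias.column = alias.column` predicates out of a query and resolves the aliases declared after FROM and JOIN. It is a regular-expression heuristic, not a SQL parser: comma joins, schema-qualified names, subqueries and quoted identifiers are ignored, and the keyword list is my own assumption.

```python
import re

# Words that can follow a table name but are never aliases (assumed, incomplete list).
_KEYWORDS = r"(?:ON|JOIN|INNER|LEFT|RIGHT|FULL|CROSS|OUTER|WHERE|GROUP|ORDER|USING|LIMIT)\b"
_TABLE = re.compile(
    r"\b(?:FROM|JOIN)\s+(\w+)(?:\s+(?:AS\s+)?(?!" + _KEYWORDS + r")(\w+))?",
    re.IGNORECASE)
_IDENT = r"[A-Za-z_]\w*"  # identifiers only, so "10.5" is not read as table.column
_EQUALITY = re.compile(rf"\b({_IDENT})\.({_IDENT})\s*=\s*({_IDENT})\.({_IDENT})")


def join_relationships(sql: str) -> set:
    """Return {((table, column), (table, column)), ...} pairs joined by equality."""
    aliases = {}
    for table, alias in _TABLE.findall(sql):
        aliases[table.lower()] = table  # a bare table name refers to itself
        if alias:
            aliases[alias.lower()] = table
    pairs = set()
    for a1, c1, a2, c2 in _EQUALITY.findall(sql):
        left = (aliases.get(a1.lower(), a1), c1)
        right = (aliases.get(a2.lower(), a2), c2)
        pairs.add(tuple(sorted((left, right))))  # order-independent pair
    return pairs


sql = ("SELECT o.id, c.name FROM orders o "
       "JOIN customers c ON o.customer_id = c.id "
       "LEFT JOIN order_items AS i ON i.order_id = o.id "
       "WHERE o.total > 10")
for pair in sorted(join_relationships(sql)):
    print(pair)
```

Counting how often each pair appears across all procedures would give a crude measure of which relationships the code actually relies on.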

 

I wonder whether I can do the same thing with code.


The Information Instinct to Denormalize

Denormalization has something to do with information. People like to see denormalized data because they like to see its context, and context is an important part of information. Denormalization runs counter to the DRY (Don’t Repeat Yourself) principle in programming.

Examples of Denormalization for Information Purposes

Here are some examples I have seen of denormalization for information purposes, some good, some bad:

  • Endemes [fully controlled] – endemes coalesce atomic units of information, boiling them down into a format that presents them together. This one is often good, but it needs a lot of overhead/framework programming.
  • Container Name References [partially controlled] – the container’s name/reference is included in each item in the container. Sometimes good, often bad.
  • Meaningful IDs [partially controlled] – IDs that contain data, a sort of name that summarizes the natural key; summaries are information. This one is usually bad.
  • Presentation Layers [partially controlled] – presentations, screens, reports, forms, and dashboards. Usually good, sometimes bad: bad when they get in the way of taking the information in the program to the next level.
  • Zero Normal Form [mostly uncontrolled] – columns containing multiple data items, usually of different ‘types/columns/classes/fields’. Always bad.
  • Freetext [fully uncontrolled] – freetext. Usually good, though sometimes you should have a dropdown selection instead.
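A zero normal form column can often be spotted because most of its values split on the same delimiter into the same number of parts. The heuristic below is a hypothetical sketch of that idea; the delimiter list and the 80% threshold are assumptions, and freetext containing punctuation may trip it, so its output is a hint for review rather than a verdict.

```python
from collections import Counter


def zero_normal_form_hint(values, delimiters=",;|/", min_share=0.8):
    """Return (delimiter, part_count) if at least min_share of the non-empty
    values split on one delimiter into the same number (> 1) of parts,
    otherwise None. Hypothetical heuristic, not a definitive test."""
    values = [v for v in values if v]
    if not values:
        return None
    for delimiter in delimiters:
        part_counts = Counter(len(v.split(delimiter)) for v in values)
        parts, freq = part_counts.most_common(1)[0]
        if parts > 1 and freq / len(values) >= min_share:
            return delimiter, parts
    return None


print(zero_normal_form_hint(["Smith|NY|1970", "Jones|CA|1982", "Lee|TX|1990"]))
# ('|', 3)
```

A consistent part count suggests each position is its own hidden column, which is why the column analysis expects 1 or 2 endeme sets from a Denormalized [D] column.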

What is Programming?

Programming is all about defining the meanings of words and using them to do things.

Our next step is to write programs that handle the meaning of words, or something analogous to that. I could call this ‘Managing data using word meanings’, ‘Finishing the processing of the meaning of words’, or ‘Finishing the processing of words’.

I am thinking about the need to make a case for the existence of level 3. Very little work gets done on this level as far as I know. So maybe level 3 does not exist.

My Case for Level 3

My case for its existence is this: there are bits, numbers, words, and sentences.

Level 4 is all about relationships
Level 3 is all about meaning
Level 2 is all about data
Level 1 is all about storage

Levels 1, 2, and 3 all seem to have something to do with the meanings of words:
Level 1 provides the storage
Level 2 provides the processing of words using hard-coded context, for its precision.
Level 3 completes the meaning of words
Level 4 puts the words in context

Maybe Typing is the Problem

Maybe compilers are not the problem. Maybe typing is the problem: not just strong typing, any typing. We need something more flexible, and yet still processable. XML is a form of context, but it has the hard-coded typing problem. Why is hard-coded typing a problem? Because hard-coded context itself cannot be processed. Is that the real problem?

The idea of meaning for data and context to manage it

Context is critical. Relationship is critical, but it is not primary. The meaning of a word by itself is critical. We need to learn how to handle words. We have made words hard-processable. Now we need a layer on top of the hard-processable words that finishes the job of lower-level programming.

What will it take to finish the job of meaning aside from relationship?

1. Context processing – contexts are a sort of relationship, but what I mean by context here is context by class and table.
– the ability to process that context is important – is this level 3 or level 4?
– the part that is level 3 is the part that manages the contexts of levels 1 and 2.
! We need a context manager.
2. Endematic meaning – the meaning of the word (mostly) by itself, in other words the definition of a node in a graph.
3. The meaning of words
– the endeme set to be applied – is this level 4?
4. A system to store a word’s meanings and endeme sets.
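Item 4 asks for a system to store a word’s meanings and endeme sets. Here is a minimal in-memory sketch of what that could look like. The class names are hypothetical, not an existing API, and the sketch assumes an endeme is an ordered string of letters drawn from one endeme set, most significant characteristic first.

```python
from dataclasses import dataclass, field


@dataclass
class EndemeSet:
    name: str
    characteristics: dict  # letter -> label, e.g. {"B": "bird"}


@dataclass
class Endeme:
    endeme_set: EndemeSet
    letters: str  # ordered, most significant characteristic first

    def __post_init__(self):
        # Every letter must belong to the set, and none may repeat.
        unknown = set(self.letters) - set(self.endeme_set.characteristics)
        if unknown or len(set(self.letters)) != len(self.letters):
            raise ValueError(
                f"invalid endeme {self.letters!r} for set {self.endeme_set.name!r}")

    def labels(self):
        return [self.endeme_set.characteristics[c] for c in self.letters]


@dataclass
class MeaningStore:
    meanings: dict = field(default_factory=dict)  # word -> list of Endeme

    def add(self, word, endeme):
        self.meanings.setdefault(word.lower(), []).append(endeme)

    def lookup(self, word, set_name=None):
        """All stored endemes for word, optionally limited to one endeme set."""
        found = self.meanings.get(word.lower(), [])
        return [e for e in found if set_name is None or e.endeme_set.name == set_name]


kind = EndemeSet("kind", {"A": "animal", "B": "bird", "F": "flies", "P": "pet"})
store = MeaningStore()
store.add("Parrot", Endeme(kind, "BFPA"))
print(store.lookup("parrot")[0].labels())
# ['bird', 'flies', 'pet', 'animal']
```

Keeping several endemes per word, each tagged with its set, leaves room for item 3’s open question of which endeme set to apply in a given context.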

endeme – enumerated meme – this is the definition of an endeme in A.I. terms.

Every Diagram Has its Use, Here is an Overview of Software Development Diagrams

I have gone and looked at the various software development diagrams available on Wikipedia. I have focused on the graphs (nodes and connections). The natural level for graphs is level 4 (knowledge representation and ontologies). However level 4 has a broader scope than just (K.R. and O.). Here is what I was able to suss out.

Level 4 – High Level Information Programming – Knowledge Representation

Level 4 proper diagrams mostly have to do with relationships.

  • Anchor modeling
  • Concept map
  • Conceptual graph
  • Conceptual model
  • Concept-oriented model
  • Context diagrams
  • Data flow diagram – this diagram has a long and storied history. As level 4 develops, I expect to see this diagram rise in importance again.
  • Domain model
  • EXPRESS-G
  • Information flow diagram
  • Information model
  • Object model
  • Object-role modeling
  • Semantic network
  • Semantic Web
  • Top Level Ontology

Some level 4 proper diagrams are based more on hierarchies than relationships. Of course a hierarchy is a kind of relationship.

  • Argument map
  • Cladistics
  • Document Object Model

Level 4 for Other Levels

Here are some level 4 diagrams that apply to other levels. The number in parentheses is the level each diagram applies to:

  • 4(5) Fuzzy cognitive map
  • 4(3) Nets within Nets
  • 4(3) Specification and Description Language
  • 4(3) Composite Structure Diagram (UML)
  • 4(2) Abstract syntax tree
  • 4(2) Class diagram (UML)
  • 4(2) LePUS3
  • 4(2) Tree structure
  • 4(1) Entity–relationship model
  • 4(1) Hierarchical database model
  • 4(1) Network model

Diagrams for Other Levels

Here are diagrams that apply directly to other levels, although one could certainly build systems at level 4 that would process these.

Level 1 – Data Storage and Relational Databases

Level 1 isn’t really very diagrammatic. It mostly relies on grids. The diagrams that apply best seem to be level 4 diagrams that manage level 1 such as entity relationship diagrams.

  • Jackson structured programming
  • Symbol table

Level 2 – Data and Object Oriented Programming

Many of these diagrams have to do with various sorts of process flow.

  • Algebraic Petri net
  • Behavior tree
  • Business process mapping
  • Business process model
  • Control flow diagram
  • Decision tree
  • Directed acyclic graph
  • DRAKON diagram
  • Dynamic model
  • Enhanced Transition Schematic
  • Event-driven process chain
  • Finite-state machine
  • Flowchart
  • Flow process chart
  • Function model
  • Interaction overview diagram (UML)
  • Petri net
  • Sequence diagram (UML)
  • State diagram (UML)
  • State Machine
  • Use Case Diagram (UML)

Level 3 – Low Level Information Programming

I have had a lot of trouble trying to find any diagrams that apply to level 3. These are the closest, but they are very thin gruel.

  • Activity diagram (UML)
  • Composite structure diagram
  • Business logic
  • Component diagram (UML)

Level 5 – Artificial Intelligence

Knowledge representation is a substrate for artificial intelligence. I expected to find more of these. Perhaps as level 4 develops there will be more in the future.

  • C-K theory
  • Extended Enterprise Modeling Language

Level 6 – Computational Creativity

Computational Creativity is a severely underdeveloped level. I didn’t expect to find anything, but here is one that may make sense.

  • Mind map

This Library Lets You Put Endemes Into Action

I have just pushed out a new version of the information library. It is at https://sourceforge.net/projects/informationlibc/. This library will allow you to work with endemes in a POCO framework friendly way. The library is also business friendly so you can use it on the job without a license.

New features in the library include endeme lists, endeme references, and endeme fields. Also added are more unit tests, more data table methods, and some endeme-based micro languages.

What Kind of Software Developer Are You?

Technology    +-------------+          User
Focus         | User        |         Focus
              | Interface   |
+-------------+-------------+-------------+
| Integration | Object      | Information |
| Orientation | Orientation | Orientation |
+-------------+-------------+-------------+
              | Data        |
Network       | Storage     |            BI
Focus         +-------------+         Focus

There are five kinds of software developers. Many developers cover more than one kind. Object Orientation is the core of programming. The value of specialization is that you can get more work done, with higher quality and better service to users, if you assign tasks to the software developers who specialize in each kind of task. The value of information is that it can tell you what specializations each task needs, and which software developers have which specializations. The value of endemes is that they provide a framework for seamlessly gathering and using specialization information.

  • Integration oriented developers specialize in IT, integrating systems, using pre-built systems, using frameworks, maintenance programming and troubleshooting, new technology integration, and system architecture.
  • Object oriented developers specialize in data structures, architectures, framework building, middle tier development, computer languages, testing, and UML.
  • Information oriented developers specialize in domain based design, user needs, endemes, knowledge representation, business intelligence, business rules, business needs, middle tier development, user concepts, and information modeling.
  • User interface developers specialize in user interface coding, layout, UI design, UX, usability, mobile, web, desktop and user concepts.
  • Data storage developers specialize in databases, SQL, NoSQL, performance, data modeling, load balancing, and database administration.

Getting a Computer to Understand More Than Words

Somewhere between words and sentences are multi-word terms. Endemes can be used to define words in a way a computer can understand, and multi-word terms are the next step up from using endemes to define words.

Here is the stack:
1. endeme characteristics
2. words
3. multi-word terms = concepts
4. phrases
5. sentences
6. paragraphs
7. books, conversations
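A simple way to start finding multi-word term candidates for step 3 is to count adjacent word pairs and keep the ones that recur. This is a plain n-gram frequency count swapped in for illustration, not an endeme-based method; the stopword list and `min_count` are assumptions, and the tokenizer ignores sentence boundaries.

```python
import re
from collections import Counter

STOPWORDS = frozenset({"a", "an", "the", "of", "and", "to", "in", "is"})


def candidate_terms(text, n=2, min_count=2, stopwords=STOPWORDS):
    """Return {term: count} for word n-grams seen at least min_count times
    that neither start nor end with a stopword."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    grams = Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return {
        " ".join(gram): count
        for gram, count in grams.items()
        if count >= min_count and gram[0] not in stopwords and gram[-1] not in stopwords
    }


text = ("The lookup table holds colors. Each lookup table is small. "
        "A junction table links tables.")
print(candidate_terms(text))
# {'lookup table': 2}
```

Each surviving term is only a candidate; deciding that ‘lookup table’ is a concept, and which endeme sets describe it, is the part the stack above leaves to the next level.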